Machine Learning for Signal Processing
Fundamentals of Linear Algebra - 2
Class 3. 8 Sep 2016
Instructor: Bhiksha Raj
11-755/18-797


TRANSCRIPT

Page 1:

Machine Learning for Signal Processing

Fundamentals of Linear Algebra - 2

Class 3. 8 Sep 2016

Instructor: Bhiksha Raj

11-755/18-797 1

Page 2:

Overview

• Vectors and matrices • Vector spaces

• Basic vector/matrix operations • Various matrix types • Projections

• More on matrix types • Matrix determinants • Matrix inversion • Eigenanalysis • Singular value decomposition • Matrix Calculus

11-755/18-797 2

Page 3:

The importance of Bases

• Conventional 3D representation

– Each point (vector) is just a triplet of coordinates

– In reality, the coordinates are weights!

– X = x.u1 + y.u2 + z.u3

– u1 = [1 0 0], u2 = [0 1 0], u3 = [0 0 1]

• Unit vectors in each of the three directions

11-755/18-797 3

[Figure: the point (x, y, z) with unit vectors u1, u2, u3 along the x, y and z axes]

Page 4:

The importance of Bases

• Specialty of u1, u2, u3

– Every point in the space can be expressed as some

x.u1 + y.u2 + z.u3

– All three “bases” u1, u2, u3 are required

11-755/18-797 4

[Figure: the point (x, y, z) with unit vectors u1, u2, u3 along the x, y and z axes]

u1 = [1 0 0]T,   u2 = [0 1 0]T,   u3 = [0 0 1]T

X = [x y z]T = x·[1 0 0]T + y·[0 1 0]T + z·[0 0 1]T

Page 5:

The importance of Bases

• Is there any other set v1, v2, ..., vn that shares this property?

– Any point can be expressed as a.v1 + b.v2 + c.v3 + ...

– How many "v"s will we require?

11-755/18-797 5

X = a·V1 + b·V2 + c·V3

[Figure: the point (a, b, c) built from the basis vectors v1, v2, v3 as aV1 + bV2 + cV3]

Page 6:

[Figure: the same sheet of data shown against the (u1, u2, u3) axes and against the (v1, v2, v3) axes]

Basis based representation

• A “good” basis captures data structure

• Here u1, u2 and u3 all take large values for data in the set

• But in the (v1 v2 v3) set, coordinate values along v3 are always small for data on the blue sheet – v3 likely represents a “noise subspace” for these data

11-755/18-797 6

Page 7:

Basis based representation

• The most important challenge in ML: Find the best set of bases for a given data set

11-755/18-797 7

[Figure: the same sheet of data shown against the (u1, u2, u3) axes and against the (v1, v2, v3) axes]

Page 8:

Matrix as a Basis transform

• A matrix transforms a representation in terms of a standard basis u1, u2, u3 into a representation in terms of a different basis v1, v2, v3

• Finding the best bases: find the matrix that transforms the standard representation to these bases

11-755/18-797 8

[a b c]T = T [x y z]T

X = a·v1 + b·v2 + c·v3,   X = x·u1 + y·u2 + z·u3
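To make this concrete, here is a small NumPy sketch (the basis vectors below are made-up example values) that computes the coordinates (a, b, c) of a point in a new basis. With the new basis vectors as the columns of a matrix V, the transform is simply T = V⁻¹:

import numpy as np

# A made-up basis: the columns of V are the new basis vectors v1, v2, v3
V = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

X = np.array([2.0, 3.0, 1.0])   # the point in the standard basis (x, y, z)

T = np.linalg.inv(V)            # transforms standard coordinates into (a, b, c)
a, b, c = T @ X

# Check: reassembling the point from the new basis recovers X
assert np.allclose(a * V[:, 0] + b * V[:, 1] + c * V[:, 2], X)
print(a, b, c)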

Page 9:

• Going on to more mundane stuff..

11-755/18-797 9

Page 10:

Orthogonal/Orthonormal vectors

• Two vectors are orthogonal if they are perpendicular to one another

– A·B = 0

– A vector that is perpendicular to a plane is orthogonal to every vector on the plane

• Two vectors are orthonormal if

– They are orthogonal

– The length of each vector is 1.0

– Orthogonal vectors can be made orthonormal by scaling their lengths to 1.0

11-755/18-797 10

A = [x y z],   B = [u v w]

A·B = x·u + y·v + z·w = 0

Page 11:

Orthogonal matrices

• Orthogonal Matrix : AAT = ATA = I

– The matrix is square

– All row vectors are orthonormal to one another • Every vector is perpendicular to the hyperplane formed by all other vectors

– All column vectors are also orthonormal to one another

– Observation: In an orthogonal matrix if the length of the row vectors is 1.0, the length of the column vectors is also 1.0

– Observation: In an orthogonal matrix no more than one row can have all entries with the same polarity (+ve or –ve)

11-755/18-797 11

[Example: a 3×3 orthogonal matrix; all three row vectors are at 90° to one another]

Page 12:

Orthogonal Matrices

• Orthogonal matrices will retain the length and relative angles between transformed vectors

– Essentially, they are combinations of rotations, reflections and permutations

– Rotation matrices and permutation matrices are all orthogonal
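A quick NumPy check of these properties, using a 2-D rotation matrix as the example (any angle would do):

import numpy as np

theta = 0.7                                # an arbitrary rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([3.0, -1.0])
y = np.array([0.5,  2.0])

print(np.allclose(Q @ Q.T, np.eye(2)))     # True: Q QT = I
print(np.allclose(Q.T @ Q, np.eye(2)))     # True: QT Q = I

# Lengths and relative angles (dot products) are preserved by the transform
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))   # True
print(np.isclose((Q @ x) @ (Q @ y), x @ y))                   # True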

11-755/18-797 12


Page 13:

Orthogonal and Orthonormal Matrices

• If the vectors in the matrix are not unit length, it cannot be orthogonal – AAT != I, ATA != I

– AAT = Diagonal or ATA = Diagonal, but not both

– If all the vectors are of the same length, we can get AAT = ATA = Diagonal, though

• A non-square matrix cannot be orthogonal – AAT=I or ATA = I, but not both

11-755/18-797 13


Page 14:

Matrix Rank and Rank-Deficient Matrices

• Some matrices will eliminate one or more dimensions during transformation

– These are rank deficient matrices

– The rank of the matrix is the dimensionality of the transformed version of a full-dimensional object

11-755/18-797 14

P * Cone =

Page 15:

Matrix Rank and Rank-Deficient Matrices

• Some matrices will eliminate one or more dimensions during transformation – These are rank deficient matrices

– The rank of the matrix is the dimensionality of the transformed version of a full-dimensional object

11-755/18-797 15

Rank = 2 Rank = 1

Page 16:

Projections are often examples of rank-deficient transforms

11-755/18-797 16

P = W (WTW)-1 WT ; Projected Spectrogram = P*M

The original spectrogram can never be recovered: P is rank deficient

P explains all vectors in the new spectrogram as a mixture of only the 4 vectors in W; there are only a maximum of 4 linearly independent bases

Rank of P is 4

[Figure: the spectrogram M and the four note bases W]
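A small NumPy sketch of this idea (W below is a random stand-in for the four note spectra, not the actual music data): a projection matrix built from 4 independent columns has rank 4, however large the spectrogram it is applied to.

import numpy as np

rng = np.random.default_rng(0)
W = rng.random((100, 4))                  # 4 basis vectors (stand-ins for note spectra)
M = rng.random((100, 50))                 # stand-in "spectrogram": 50 frames

P = W @ np.linalg.inv(W.T @ W) @ W.T      # P = W (WTW)-1 WT
projected = P @ M

print(np.linalg.matrix_rank(P))           # 4
print(np.linalg.matrix_rank(projected))   # 4: every column is a mix of the 4 bases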

Page 17:

Non-square Matrices

• Non-square matrices add or subtract axes
– More rows than columns add axes
• But does not increase the dimensionality of the data
– Fewer rows than columns reduce axes
• May reduce dimensionality of the data

11-755/18-797 17

X = 2-D data:

X = [ x1 x2 .. xN
      y1 y2 .. yN ]

P = a 3×2 transform

PX = 3-D data, rank 2:

PX = [ x̂1 x̂2 .. x̂N
       ŷ1 ŷ2 .. ŷN
       ẑ1 ẑ2 .. ẑN ]
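The same point in a NumPy sketch (the entries of P below are arbitrary illustrative values, not the ones from the slide):

import numpy as np

rng = np.random.default_rng(1)
X = rng.random((2, 10))             # ten 2-D points, one per column
P = np.array([[0.8, 0.9],
              [0.1, 0.9],
              [0.6, 0.0]])          # 3x2: adds an axis

PX = P @ X                          # now 3-D data...
print(PX.shape)                     # (3, 10)
print(np.linalg.matrix_rank(PX))    # ...but still only rank 2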

Page 18:

Non-square Matrices

• Non-square matrices add or subtract axes – More rows than columns add axes

• But does not increase the dimensionality of the data

– Fewer rows than columns reduce axes • May reduce dimensionality of the data

11-755/18-797 18

X = 3-D data, rank 3:

X = [ x1 x2 .. xN
      y1 y2 .. yN
      z1 z2 .. zN ]

P = a 2×3 transform

PX = 2-D data, rank 2:

PX = [ x̂1 x̂2 .. x̂N
       ŷ1 ŷ2 .. ŷN ]

Page 19:

The Rank of a Matrix

• The matrix rank is the dimensionality of the transformation of a full-dimensioned object in the original space

• The matrix can never increase dimensions – Cannot convert a circle to a sphere or a line to a circle

• The rank of a matrix can never be greater than the lower of its two dimensions

11-755/18-797 19


Page 20:

The Rank of a Matrix

11-755/18-797 20

Projected Spectrogram = P * M. Every vector in it is a combination of only 4 bases.

The rank of the matrix is the smallest number of bases required to describe the output.

E.g. if note no. 4 in W could be expressed as a combination of notes 1, 2 and 3, it would provide no additional information; eliminating note no. 4 would give us the same projection.

The rank of P would then be 3!

M =

Page 21:

Matrix rank is unchanged by transposition

• If an N-dimensional object is compressed to a K-dimensional object by a matrix, it will also be compressed to a K-dimensional object by the transpose of the matrix

11-755/18-797 21

[Example: a 3×3 matrix and its transpose]

Page 22:

Matrix Determinant

• The determinant is the “volume” of a matrix

• Actually the volume of a parallelepiped formed from its row vectors
– Also the volume of the parallelepiped formed from its column vectors

• Standard formula for determinant: in text book

11-755/18-797 22

[Figure: the parallelogram spanned by row vectors (r1) and (r2), with diagonal (r1+r2)]

Page 23:

Matrix Determinant: Another Perspective

• The determinant is the ratio of N-volumes

– If V1 is the volume of an N-dimensional sphere “O” in N-dimensional space

• O is the complete set of points or vertices that specify the object

– If V2 is the volume of the N-dimensional ellipsoid specified by A*O, where A is a matrix that transforms the space

– |A| = V2 / V1

11-755/18-797 23

[Figure: a sphere of volume V1 is transformed by an example 3×3 matrix A into an ellipsoid of volume V2]

Page 24:

Matrix Determinants

• Matrix determinants are only defined for square matrices

– They characterize volumes in linearly transformed space of the same dimensionality as the vectors

• Rank deficient matrices have determinant 0

– Since they compress full-volumed N-dimensional objects into zero-volume N-dimensional objects

• E.g. a 3-D sphere into a 2-D ellipse: The ellipse has 0 volume (although it does have area)

• Conversely, all matrices of determinant 0 are rank deficient

– Since they compress full-volumed N-dimensional objects into zero-volume objects

11-755/18-797 24

Page 25:

Multiplication properties

• Properties of vector/matrix products

– Associative

– Distributive

– NOT commutative!!!

• left multiplications ≠ right multiplications

– Transposition

11-755/18-797 25

A(BC) = (AB)C        (associative)

AB ≠ BA              (not commutative)

A(B + C) = AB + AC   (distributive)

(AB)T = BT AT        (transposition)

Page 26:

Determinant properties

• Associative for square matrices

– Scaling volume sequentially by several matrices is equal to scaling once by the product of the matrices

• Volume of sum != sum of Volumes

• Commutative

– The order in which you scale the volume of an object is irrelevant

11-755/18-797 26

|ABC| = |A| |B| |C|

|AB| = |BA| = |A| |B|

|B + C| ≠ |B| + |C|
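A quick NumPy check of these determinant properties on arbitrary matrices:

import numpy as np

rng = np.random.default_rng(2)
A, B, C = rng.random((3, 3, 3))     # three arbitrary 3x3 matrices

lhs = np.linalg.det(A @ B @ C)
rhs = np.linalg.det(A) * np.linalg.det(B) * np.linalg.det(C)
print(np.isclose(lhs, rhs))                                     # True

print(np.isclose(np.linalg.det(A @ B), np.linalg.det(B @ A)))   # True: order is irrelevant
print(np.isclose(np.linalg.det(B + C),
                 np.linalg.det(B) + np.linalg.det(C)))          # generally False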

Page 27:

Matrix Inversion

• A matrix transforms an N-dimensional object to a different N-dimensional object

• What transforms the new object back to the original?

– The inverse transformation

• The inverse transformation is called the matrix inverse

11-755/18-797 28

[Figure: an object transformed by the example matrix T, and the question: what matrix T⁻¹ = ??? transforms it back?]

Page 28:

Matrix Inversion

• The product of a matrix and its inverse is the identity matrix

– Transforming an object, and then inverse transforming it gives us back the original object

11-755/18-797 29

[Figure: an object D transformed by T, then by T⁻¹, returns to D]

T⁻¹ T D = D

T T⁻¹ = T⁻¹ T = I

Page 29:

• Given the Transform T and transformed vector Y, how do we determine X?

11-755/18-797 30

The Inverse Transform and Simultaneous Equations

[Figure: T transforms X into Y]

Page 30:

Inverse Transform and Simultaneous Equation

• Inverting the transform is identical to solving simultaneous equations

11-755/18-797 31

[a b c]T = T [x y z]T

a = T11·x + T12·y + T13·z
b = T21·x + T22·y + T23·z
c = T31·x + T32·y + T33·z

Given [a b c]T, find [x y z]T

T = [ T11 T12 T13
      T21 T22 T23
      T31 T32 T33 ]
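In NumPy, inverting the transform and solving the simultaneous equations are literally the same operation (the values of T and of the right-hand side below are made up for illustration):

import numpy as np

T = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])       # an invertible example transform
abc = np.array([5.0, 7.0, 3.0])       # the observed (a, b, c)

xyz = np.linalg.solve(T, abc)         # preferred over forming inv(T) explicitly
print(xyz)
print(np.allclose(T @ xyz, abc))                   # True: the recovered (x, y, z) fits
print(np.allclose(np.linalg.inv(T) @ abc, xyz))    # same answer via the explicit inverse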

Page 31:

Inverting rank-deficient matrices

• Rank deficient matrices “flatten” objects
– In the process, multiple points in the original object get mapped to the same point in the transformed object

• It is not possible to go “back” from the flattened object to the original object – Because of the many-to-one forward mapping

• Rank deficient matrices have no inverse

11-755/18-797 32

[Example: a rank-deficient 3×3 matrix that flattens objects onto a plane]

Page 32:

Inverse Transform and Simultaneous Equation

• Inverting the transform is identical to solving

simultaneous equations

• Rank-deficient transforms result in too-few equations

– Cannot be inverted to obtain a unique solution

11-755/18-797 33

[a b]T = T [x y z]T

Given [a b]T, find [x y z]T

T = [ T11 T12 T13
      T21 T22 T23 ]

a = T11·x + T12·y + T13·z
b = T21·x + T22·y + T23·z

Page 33:

Inverse Transform and Simultaneous Equation

• Inverting the transform is identical to solving simultaneous equations

• Rank-deficient transforms result in too-few equations

– Cannot be inverted to obtain a unique solution

• Or too many equations

– Cannot be inverted to obtain an exact solution 11-755/18-797 34

[a b c]T = T [x y]T

T = [ T11 T12
      T21 T22
      T31 T32 ]

a = T11·x + T12·y
b = T21·x + T22·y
c = T31·x + T32·y

Given [a b c]T, find [x y]T

Page 34:

Rank Deficient Matrices

11-755/18-797 35

The projection matrix is rank deficient

You cannot recover the original spectrogram from the projected one..

M =

Page 35:

Revisiting Projections and Least Squares

• Projection computes a least squared error estimate

• For each vector V in the music spectrogram matrix

– Approximation: Vapprox = a·note1 + b·note2 + c·note3 ...

– Error vector E = V – Vapprox

– Squared error energy for V: e(V) = norm(E)²

• Projection computes Vapprox for all vectors such that the total error is minimized

• But WHAT ARE "a", "b" and "c"?

11-755/18-797 36

Vapprox = T·[a b c]T, where the columns of T are note1, note2 and note3

Page 36:

The Pseudo Inverse (PINV)

• We are approximating spectral vectors V as the transformation of the vector [a b c]T

– Note – we’re viewing the collection of bases in T as a transformation

• The solution is obtained using the pseudo inverse

– This gives us a LEAST SQUARES solution

• If T were square and invertible Pinv(T) = T-1, and V=Vapprox

11-755/18-797 37

V ≈ Vapprox = T·[a b c]T

[a b c]T = Pinv(T)·V
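A NumPy sketch of the least-squares fit (random stand-ins for the note spectra and for the spectral vector):

import numpy as np

rng = np.random.default_rng(3)
T = rng.random((1025, 3))           # columns: note1, note2, note3 (stand-in values)
V = rng.random(1025)                # one spectral vector to explain

abc = np.linalg.pinv(T) @ V         # least-squares weights a, b, c
V_approx = T @ abc

# np.linalg.lstsq solves the same least-squares problem directly
abc2, *_ = np.linalg.lstsq(T, V, rcond=None)
print(np.allclose(abc, abc2))                  # True
print(np.linalg.norm(V - V_approx) ** 2)       # the minimized squared error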

Page 37:

Generalization to matrices

11-755/18-797 38

Left multiplication: X = T Y  →  Y = T⁻¹ X      Right multiplication: X = Y T  →  Y = X T⁻¹

Left multiplication: X = T Y  →  Y = Pinv(T) X      Right multiplication: X = Y T  →  Y = X Pinv(T)

• Unique exact solution exists

• T must be square

• No unique exact solution exists

– At least one (if not both) of the forward and backward equations may be inexact

• T may or may not be square

Page 38:

The Pseudo Inverse

• Case 1: Too many solutions

• Pinv(X)Y picks the shortest solution 11-755/18-797 39

[a b]T = T [x y z]T

a = T11·x + T12·y + T13·z
b = T21·x + T22·y + T23·z

[x y z]T = Pinv(T) [a b]T

[Figure: a plane of solutions in (X, Y, Z) space, with the shortest solution marked. The figure is only meant for illustration: for the above equations the actual set of solutions is a line, not a plane. Pinv(T)A will be the point on the line closest to the origin.]

Page 39:

The Pseudo Inverse

• Case 2: No exact solution

• Pinv(X)Y picks the solution that results in the lowest error 11-755/18-797 40

[x y]T = Pinv(T) [a b c]T

Figure only meant for illustration: for the above equations, Pinv(T) will actually have 6 components and the error is a quadratic in 6 dimensions

[a b c]T = T [x y]T

T = [ T11 T12
      T21 T22
      T31 T32 ]

a = T11·x + T12·y
b = T21·x + T22·y
c = T31·x + T32·y

[Figure: the squared error ||X – Pinv(T)A||² plotted against (Pinv(T))11 and (Pinv(T))12, with the "Optimal" Pinv(T) at the minimum]

Page 40:

Explaining music with one note

11-755/18-797 41

Recap: P = W (WTW)-1 WT, Projected Spectrogram = P*M

Approximation: M = W*X

The amount of W in each vector = X = PINV(W).M

W.Pinv(W).M = Projected Spectrogram

W.Pinv(W) = Projection matrix!!

M =

W =

PINV(W) = (WTW)-1WT

X = PINV(W).M

Page 41:

Explanation with multiple notes

11-755/18-797 42

X = Pinv(W).M; Projected matrix = W.X = W.Pinv(W).M

M =

W =

X=PINV(W)M

Page 42:

How about the other way?

11-755/18-797 43

W = M Pinv(V);   U = W V

M =

W = ? ?

V =

U =

Page 43:

Pseudo-inverse (PINV)

• Pinv() applies to non-square matrices

• Pinv(Pinv(A)) = A

• A.Pinv(A)= projection matrix!

– Projection onto the columns of A

• If A = K x N matrix and K > N, A projects N-D vectors into a higher-dimensional K-D space

– Pinv(A) = NxK matrix

– Pinv(A).A = I in this case

• Otherwise A.Pinv(A) = I 11-755/18-797 44
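These properties are easy to check in NumPy for a tall (K > N) matrix with random (hence full-column-rank) entries:

import numpy as np

rng = np.random.default_rng(4)
A = rng.random((5, 3))              # K x N with K > N
A_pinv = np.linalg.pinv(A)          # N x K

print(A_pinv.shape)                           # (3, 5)
print(np.allclose(A_pinv @ A, np.eye(3)))     # True: Pinv(A).A = I in this case
print(np.allclose(A @ A_pinv, np.eye(5)))     # False: A.Pinv(A) is not I here...
P = A @ A_pinv
print(np.allclose(P @ P, P))                  # ...it is the projection onto A's columns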

Page 44:

Finding the Transform

• Given examples – T.X1 = Y1

– T.X2 = Y2

– .. – T.XN = YN

• Find T

11-755/18-797 45

Page 45:

Finding the Transform

• Pinv works here too

11-755/18-797 46

X = [X1 X2 .. XN]   (the example inputs, one per column)

Y = [Y1 Y2 .. YN]   (the corresponding outputs)

Y = T X   →   T = Y Pinv(X)
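A NumPy sketch: generate input/output pairs from a hidden transform, stack them as columns, and recover the transform with the pseudo-inverse.

import numpy as np

rng = np.random.default_rng(5)
T_true = rng.random((4, 3))             # the transform we pretend not to know
X = rng.random((3, 50))                 # 50 example inputs, one per column
Y = T_true @ X                          # the corresponding outputs

T_est = Y @ np.linalg.pinv(X)           # T = Y Pinv(X)
print(np.allclose(T_est, T_true))       # True (exact case: Y really is T X)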

Page 46:

Finding the Transform: Inexact

• Even works for inexact solutions

• We desire to find a linear transform T that maps X to Y

– But such a linear transform doesn’t really exist

• Pinv will give us the “best guess” for T that minimizes the total squared error between Y and TX 47

X = [X1 X2 .. XN]

Y = [Y1 Y2 .. YN]

Y ≈ T X   →   T = Y Pinv(X) minimizes Σi ||Yi - T Xi||²

Page 47:

Matrix inversion (division)

• The inverse of matrix multiplication – Not element-wise division!!

• Provides a way to “undo” a linear transformation – Inverse of the unit matrix is itself

– Inverse of a diagonal is diagonal

– Inverse of a rotation is a (counter)rotation (its transpose!)

– Inverse of a rank deficient matrix does not exist! • But pseudoinverse exists

• For square matrices: Pay attention to multiplication side!

• If the matrix is not square, or the matrix is not invertible, use a matrix pseudoinverse:

11-755/18-797 48

A B = C   →   A = C B⁻¹,   B = A⁻¹ C

A B = C   →   A = C Pinv(B),   B = Pinv(A) C

Page 48:

Eigenanalysis

• If something can go through a process mostly unscathed in character it is an eigen-something

– Sound example:

• A vector that can undergo a matrix multiplication and keep pointing the same way is an eigenvector

– Its length can change though

• How much its length changes is expressed by its corresponding eigenvalue

– Each eigenvector of a matrix has its eigenvalue

• Finding these “eigenthings” is called eigenanalysis

11-755/18-797 49

Page 49:

EigenVectors and EigenValues

• Vectors that do not change angle upon

transformation

– They may change length

– V = eigenvector

– λ = eigenvalue

M V = λ V

M = [ 1.5  0.7
      0.7  1.0 ]

Black vectors are eigen vectors
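Checking this in NumPy (using the 2×2 matrix as reconstructed above):

import numpy as np

M = np.array([[1.5, 0.7],
              [0.7, 1.0]])

eigvals, eigvecs = np.linalg.eig(M)
for i in range(2):
    v = eigvecs[:, i]
    # M v points the same way as v, just scaled by its eigenvalue
    print(np.allclose(M @ v, eigvals[i] * v))   # True
print(eigvals)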

11-755/18-797

Page 50:

Eigen vector example

11-755/18-797 51

Page 51:

Matrix multiplication revisited

• Matrix transformation “transforms” the space

– Warps the paper so that the normals to the two vectors now lie along the axes

11-755/18-797 52

A = [the example 2×2 transform matrix]

Page 52:

A stretching operation

• Draw two lines

• Stretch / shrink the paper along these lines by factors λ1 and λ2

– The factors could be negative – implies flipping the paper

• The result is a transformation of the space

11-755/18-797 53

[Figure: the two lines, with stretch factors 1.4 and 0.8]

Page 53:

A stretching operation

11-755/18-797 54

Draw two lines

Stretch / shrink the paper along these lines by factors λ1 and λ2

The factors could be negative – implies flipping the paper

The result is a transformation of the space

Page 54:

A stretching operation

11-755/18-797 55

Draw two lines

Stretch / shrink the paper along these lines by factors λ1 and λ2

The factors could be negative – implies flipping the paper

The result is a transformation of the space

Page 55:

Physical interpretation of eigen vector

• The result of the stretching is exactly the same as transformation by a matrix

• The axes of stretching/shrinking are the eigenvectors

– The degree of stretching/shrinking are the corresponding eigenvalues

• The EigenVectors and EigenValues convey all the information about the matrix

11-755/18-797 56

Page 56:

Physical interpretation of eigen vector

• The result of the stretching is exactly the same as transformation by a matrix

• The axes of stretching/shrinking are the eigenvectors

– The degree of stretching/shrinking are the corresponding eigenvalues

• The EigenVectors and EigenValues convey all the information about the matrix

11-755/18-797 57

M [V1 V2] = [λ1·V1  λ2·V2] = [V1 V2] [ λ1  0
                                        0  λ2 ]

Page 57:

Eigen Analysis

• Not all square matrices have nice eigen values and vectors
– E.g. consider a rotation matrix
– This rotates every vector in the plane
• No vector remains unchanged

• In these cases the Eigen vectors and values are complex

11-755/18-797 58

X = [x y]T,   X_new = R(θ) X = [x' y']T

R(θ) = [ cos θ  -sin θ
         sin θ   cos θ ]

Page 58:

Singular Value Decomposition

• Matrix transformations convert circles to ellipses

• Eigen vectors are vectors that do not change direction in the process

• There is another key feature of the ellipse to the left that carries information about the transform

– Can you identify it?

11-755/18-797 59

A = [the example 2×2 transform matrix]

Page 59:

Singular Value Decomposition

• The major and minor axes of the transformed ellipse define the ellipse

– They are at right angles

• These are transformations of right-angled vectors on the original circle!

11-755/18-797 60

A = [the example 2×2 transform matrix]

Page 60:

Singular Value Decomposition

• U and V are orthonormal matrices

– Columns are orthonormal vectors

• S is a diagonal matrix

• The right singular vectors in V are transformed to the left singular vectors in U

– And scaled by the singular values that are the diagonal entries of S

11-755/18-797 61

A = [the example 2×2 transform matrix]

matlab: [U,S,V] = svd(A)

A = U S VT

[Figure: the right singular vectors V1, V2 on the circle map to s1·U1 and s2·U2, the major and minor axes of the ellipse]
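The same decomposition in NumPy (on an arbitrary 2×2 example, since the slide's exact matrix is not recoverable here):

import numpy as np

A = np.array([[1.0, 0.7],
              [1.1, 1.2]])                   # an arbitrary 2x2 example

U, s, Vt = np.linalg.svd(A)                  # A = U diag(s) VT
print(np.allclose(A, U @ np.diag(s) @ Vt))   # True

# Each right singular vector maps to the correspondingly scaled left singular vector
for i in range(2):
    v_i = Vt[i, :]                           # the rows of VT are the right singular vectors
    print(np.allclose(A @ v_i, s[i] * U[:, i]))   # True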

Page 61:

Singular Value Decomposition

• The left and right singular vectors are not the same
– If A is not a square matrix, the left and right singular vectors will be of different dimensions

• The singular values are always real

• The largest singular value is the largest amount by which a vector is scaled by A
– max(|Ax| / |x|) = σmax

• The smallest singular value is the smallest amount by which a vector is scaled by A
– min(|Ax| / |x|) = σmin
– This can be 0 (for low-rank or non-square matrices)

11-755/18-797 62

Page 62:

The Singular Values

• Square matrices: product of singular values = determinant of the matrix

– This is also the product of the eigen values

– I.e. there are two different sets of axes whose products give you the area of an ellipse

• For any “broad” rectangular matrix A, the largest singular value of any square submatrix B cannot be larger than the largest singular value of A

– An analogous rule applies to the smallest singular value

– This property is utilized in various problems, such as compressive sensing 63


11-755/18-797

Page 63:

SVD vs. Eigen Analysis

• Eigen analysis of a matrix A:
– Find two vectors such that their absolute directions are not changed by the transform

• SVD of a matrix A:
– Find two vectors such that the angle between them is not changed by the transform

• For one class of matrices, these two operations are the same

11-755/18-797 64


Page 64:

A matrix vs. its transpose

• Multiplication by matrix A:

– Transforms right singular vectors in V to left singular vectors U

• Multiplication by its transpose AT:

– Transforms left singular vectors U to right singular vector V

• A AT : Converts V to U, then brings it back to V

– Result: Only scaling

11-755/18-797 65

[Figure: the example matrix A and its transpose AT]

Page 65:

Symmetric Matrices

• Matrices that do not change on transposition – Row and column vectors are identical

• The left and right singular vectors are identical – U = V – A = U S UT

• They are identical to the Eigen vectors of the matrix • Symmetric matrices do not rotate the space

– Only scaling and, if Eigen values are negative, reflection

11-755/18-797 66

Example: [ 1.5  0.7
           0.7  1.0 ]

Page 66:

Symmetric Matrices

• Matrices that do not change on transposition – Row and column vectors are identical

• Symmetric matrix: Eigen vectors and Eigen values are always real

• Eigen vectors are always orthogonal – At 90 degrees to one another

11-755/18-797 67

Example: [ 1.5  0.7
           0.7  1.0 ]

Page 67:

Symmetric Matrices

• Eigen vectors point in the direction of the major and minor axes of the ellipsoid resulting from the transformation of a spheroid

– The eigen values are the lengths of the axes

11-755/18-797 68

Example: [ 1.5  0.7
           0.7  1.0 ]

Page 68:

Symmetric matrices

• Eigen vectors Vi are orthonormal

– ViTVi = 1

– ViTVj = 0, i != j

• Listing all eigen vectors in matrix form V

– VT = V-1

– VT V = I

– V VT= I

• M Vi = λi Vi

• In matrix form: M V = V Λ

– Λ is a diagonal matrix with all the eigen values

• M = V Λ VT 11-755/18-797 69
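In NumPy (np.linalg.eigh is the routine for symmetric matrices):

import numpy as np

M = np.array([[1.5, 0.7],
              [0.7, 1.0]])                       # symmetric

lam, V = np.linalg.eigh(M)                       # real eigenvalues, orthonormal eigenvectors
print(np.allclose(V.T @ V, np.eye(2)))           # True: VT V = I
print(np.allclose(V @ np.diag(lam) @ V.T, M))    # True: M = V Lambda VT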

Page 69:

Square root of a symmetric matrix

11-755/18-797 70

C = V Λ VT

Sqrt(C) = V · Sqrt(Λ) · VT

Sqrt(C) · Sqrt(C) = V · Sqrt(Λ) · VT · V · Sqrt(Λ) · VT = V · Λ · VT = C
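A NumPy sketch of the construction (using a symmetric positive definite C so the square root is real):

import numpy as np

C = np.array([[2.0, 0.6],
              [0.6, 1.0]])                       # symmetric positive definite

lam, V = np.linalg.eigh(C)
sqrtC = V @ np.diag(np.sqrt(lam)) @ V.T          # Sqrt(C) = V Sqrt(Lambda) VT

print(np.allclose(sqrtC @ sqrtC, C))             # True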

Page 70:

Definiteness..

• SVD: Singular values are always positive!

• Eigen Analysis: Eigen values can be real or imaginary

– Real, positive Eigen values represent stretching of the space along the Eigen vector

– Real, negative Eigen values represent stretching and reflection (across origin) of Eigen vector

– Complex Eigen values occur in conjugate pairs

• A square (symmetric) matrix is positive definite if all Eigen values are real and strictly positive (greater than 0)

– Transformation can be explained as stretching and rotation

– If any Eigen value is zero, the matrix is positive semi-definite

11-755/18-797 71

Page 71:

Positive Definiteness..

• Property of a positive definite matrix: defines inner product norms

– xTAx is always positive for any non-zero vector x if A is positive definite

• Positive definiteness is a test for validity of Gram matrices

– Such as correlation and covariance matrices

– We will encounter these and other Gram matrices later

11-755/18-797 72

Page 72:

SVD on data-container matrices

• We can also perform SVD on matrices that are data containers

• X is a d x N rectangular matrix

– N vectors of dimension d

• U is an orthogonal matrix of d vectors of size d

– All vectors are length 1

• V is an orthogonal matrix of N vectors of size N

• S is a d x N diagonal matrix with non-zero entries only on diagonal

11-755/18-797 73

X = [X1 X2 ⋯ XN]

X = U S VT

Page 73:

SVD on data-container matrices

11-755/18-797 74

X = [X1 X2 ⋯ XN],   X = U S VT

[Figure: X decomposed into the blocks U, S and VT; S is diagonal, with zeros everywhere else]

|Ui| = 1.0 for every vector in U

|Vi| = 1.0 for every vector in V

Page 74:

SVD on data-container matrices

11-755/18-797 75

X = U S VT = s1·U1·V1T + s2·U2·V2T + ...

X = Σi si·Ui·ViT

Page 75:

Expanding the SVD

• Each left singular vector and the corresponding right singular vector contribute one "basic" component to the data

• The “magnitude” of its contribution is the corresponding singular value

11-755/18-797 76

X = [X1 X2 ⋯ XN],   X = U S VT

X = s1·U1·V1T + s2·U2·V2T + s3·U3·V3T + s4·U4·V4T + ...

Page 76:

Expanding the SVD

• Each left singular vector and the corresponding right singular vector contribute one "basic" component to the data

• The “magnitude” of its contribution is the corresponding singular value

• Low singular-value components contribute little, if anything

– Carry little information

– Are often just “noise” in the data 11-755/18-797 77

X = s1·U1·V1T + s2·U2·V2T + s3·U3·V3T + s4·U4·V4T + ...

Page 77:

Expanding the SVD

• Low singular-value components contribute little, if anything

– Carry little information

– Are often just “noise” in the data

• Data can be recomposed using only the “major” components with minimal change of value

– Minimum squared error between original data and recomposed data

– Sometimes eliminating the low-singular-value components will, in fact “clean” the data

11-755/18-797 78

X = s1·U1·V1T + s2·U2·V2T + s3·U3·V3T + s4·U4·V4T + ...

X ≈ s1·U1·V1T + s2·U2·V2T
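A NumPy sketch of this low-rank recomposition on a random stand-in matrix, keeping only the k largest singular-value components:

import numpy as np

rng = np.random.default_rng(6)
X = rng.random((100, 80))                   # stand-in data matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 10                                      # keep only the 10 largest components
X_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Among all rank-k matrices this is the minimum-squared-error reconstruction;
# the error equals the energy in the discarded singular values
err = np.linalg.norm(X - X_hat) ** 2
print(err, np.sum(s[k:] ** 2))              # the two numbers match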

Page 78:

An audio example

• The spectrogram has 974 vectors of dimension 1025

– A 1025 x 974 matrix!

• Decompose: M = USVT = Σi siUi ViT

• U is 1025 x 1025

• V is 974 x 974

• There are 974 non-zero singular values si

11-755/18-797 79

Page 79:

Singular Values

• Singular values for spectrogram M

– Most Singular values are close to zero

– The corresponding components are “unimportant”

11-755/18-797 80

Page 80:

An audio example

• The same spectrogram constructed from only the 25 highest singular-value components

– Looks similar
• With 100 components, it would be indistinguishable from the original

– Sounds pretty close

– Background “cleaned up”

11-755/18-797 81

Page 81:

With only 5 components

• The same spectrogram constructed from only the 5 highest-valued components

– Corresponding to the 5 largest singular values

– Highly recognizable

– Suggests that there are actually only 5 significant unique note combinations in the music

11-755/18-797 82

Page 82:

Trace

• The trace of a matrix is the sum of the diagonal entries

• It is equal to the sum of the Eigen values!

11-755/18-797 83

A = [ a11 a12 a13 a14
      a21 a22 a23 a24
      a31 a32 a33 a34
      a41 a42 a43 a44 ]

Tr(A) = a11 + a22 + a33 + a44

Tr(A) = Σi ai,i = Σi λi

Page 83:

Trace

• Often appears in Error formulae

• Useful to know some properties..

11-755/18-797 84

C and D are two 4×4 matrices;   E = C - D

error = Σi,j Ei,j² = Tr(E ET)

Page 84:

Properties of a Trace

• Linearity: Tr(A+B) = Tr(A) + Tr(B)

Tr(c.A) = c.Tr(A)

• Cycling invariance:

– Tr(ABCD) = Tr(DABC) = Tr(CDAB) = Tr(BCDA)

– Tr(AB) = Tr(BA)

• Frobenius norm F(A) = Σi,j aij² = Tr(AAT)
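A quick NumPy check of the cycling property and of the trace form of the (squared) Frobenius norm:

import numpy as np

rng = np.random.default_rng(7)
A = rng.random((3, 4))
B = rng.random((4, 3))

print(np.isclose(np.trace(A @ B), np.trace(B @ A)))      # True: Tr(AB) = Tr(BA)
print(np.isclose(np.sum(A ** 2), np.trace(A @ A.T)))     # True: sum_ij a_ij^2 = Tr(A AT)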

11-755/18-797 85

Page 85:

Decompositions of matrices

• Square A: LU decomposition
– Decompose A = L U
– L is a lower triangular matrix
• All elements above the diagonal are 0
– U is an upper triangular matrix
• All elements below the diagonal are zero
– Cholesky decomposition: A is symmetric, L = UT

• QR decomposition: A = QR
– Q is orthogonal: QQT = I
– R is upper triangular

• Generally used as tools to compute the Eigen decomposition or least squares solutions

11-755/18-797 86


Page 86:

Calculus of Matrices

• Derivative of scalar w.r.t. vector

• For any scalar z that is a function of a vector x

• The dimensions of dz / dx are the same as the dimensions of x

11-755/18-797 87

x = [x1 .. xN]T   (N x 1 vector)

dz/dx = [dz/dx1 .. dz/dxN]T   (N x 1 vector)

Page 87:

Calculus of Matrices

• Derivative of scalar w.r.t. matrix

• For any scalar z that is a function of a matrix X

• The dimensions of dz / dX are the same as the dimensions of X

11-755/18-797 88

X = [ x11 x12 x13
      x21 x22 x23 ]     (N x M matrix)

dz/dX = [ dz/dx11 dz/dx12 dz/dx13
          dz/dx21 dz/dx22 dz/dx23 ]     (N x M matrix)

Page 88:

Calculus of Matrices

• Derivative of vector w.r.t. vector

• For any Mx1 vector y that is a function of an Nx1 vector x

• dy / dx is an MxN matrix

11-755/18-797 89

y = [y1 .. yM]T   (M x 1),    x = [x1 .. xN]T   (N x 1)

dy/dx = [ dy1/dx1 .. dy1/dxN
          ..      ..  ..
          dyM/dx1 .. dyM/dxN ]     (M x N matrix)

Page 89:

Calculus of Matrices

• Derivative of vector w.r.t. matrix

• For any Mx1 vector y that is a function of an NxL matrix X

• dy / dX is an MxLxN tensor (note order)

11-755/18-797 90

y = [y1 .. yM]T,    X = [ x11 x12 x13
                          x21 x22 x23 ]

dy/dX is an M x 3 x 2 tensor here, with (i, j, k)th element = dyi / dxk,j

Page 90:

Calculus of Matrices

• Derivative of matrix w.r.t. matrix

• For any MxK matrix Y that is a function of an NxL matrix X

• dY / dX is an MxKxLxN tensor (note order)

11-755/18-797 91

Y = [ y11 y12 y13
      y21 y22 y23 ]

Each element of dY/dX is a derivative dyi,j / dxk,l of one entry of Y with respect to one entry of X

Page 91:

In general

• The derivative of an N1 x N2 x N3 x … tensor w.r.t. an M1 x M2 x M3 x … tensor

• Is an N1 x N2 x N3 x … x ML x ML-1 x … x M1 tensor

11-755/18-797 92

Page 92:

Compound Formulae

• Let Y = f ( g ( h ( X ) ) )

• Chain rule (note order of multiplication)

• The # represents a transposition operation

– That is appropriate for the tensor

11-755/18-797 93

dY/dX is built by the chain rule from the three factors

df(g(h(X))) / dg(h(X)),    dg(h(X)) / dh(X),    dh(X) / dX

each transposed (#) as appropriate for its tensor and multiplied in the required order (note the order of multiplication)

Page 93:

Example

• y is N x 1

• x is M x 1

• A is N x M

• Compute dz/dA

– On board

11-755/18-797 94

z = ||y - Ax||²
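Working this example out gives dz/dA = -2(y - Ax)xT (the standard least-squares gradient, stated here only for reference since the derivation itself is done on the board); a NumPy finite-difference check:

import numpy as np

rng = np.random.default_rng(8)
N, M = 4, 3
A = rng.random((N, M))
x = rng.random(M)
y = rng.random(N)

z = lambda A: np.sum((y - A @ x) ** 2)           # z = ||y - Ax||^2

grad_analytic = -2.0 * np.outer(y - A @ x, x)    # dz/dA = -2 (y - Ax) xT, an N x M matrix

# Finite-difference approximation of dz/dA, entry by entry
eps = 1e-6
grad_fd = np.zeros_like(A)
for i in range(N):
    for j in range(M):
        dA = np.zeros_like(A)
        dA[i, j] = eps
        grad_fd[i, j] = (z(A + dA) - z(A - dA)) / (2 * eps)

print(np.allclose(grad_analytic, grad_fd, atol=1e-5))    # True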