
Machine Learning for Data Mining
Linear Algebra Review

Andres Mendez-Vazquez

May 14, 2015


Outline

1 Introduction
  What is a Vector?

2 Vector Spaces
  Definition
  Linear Independence and Basis of Vector Spaces
  Norm of a Vector
  Inner Product
  Matrices
  Trace and Determinant
  Matrix Decomposition
  Singular Value Decomposition


What is a Vector?

An ordered tuple of numbers

x = (x_1, x_2, ..., x_n)^T

Expressing a magnitude and a direction
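As an illustration that is not part of the original slides, here is a minimal NumPy sketch of this split into magnitude and direction (the array values are made up):

import numpy as np

x = np.array([3.0, 4.0])          # an ordered tuple of numbers
magnitude = np.linalg.norm(x)     # Euclidean length of x -> 5.0
direction = x / magnitude         # unit vector pointing the same way
print(magnitude, direction)       # 5.0 [0.6 0.8]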


Vector Spaces

Definition
A vector is an element of a vector space.

Vector Space V
A vector space V is a set that contains all linear combinations of its elements:

1 If x, y ∈ V then x + y ∈ V.
2 If x ∈ V then αx ∈ V for any scalar α.
3 There exists 0 ∈ V such that x + 0 = x for any x ∈ V.

A subspace
A subspace is a subset of a vector space that is also a vector space.


Classic Example

Euclidean Space R3


Span

Definition
The span of a set of vectors {x_1, x_2, ..., x_n} is the set of all their linear combinations:

span(x_1, x_2, ..., x_n) = {α_1 x_1 + α_2 x_2 + ... + α_n x_n : α_1, ..., α_n scalars}

What examples can you imagine? Give it a shot!
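A hedged NumPy sketch (made-up vectors, not from the slides): a vector y lies in the span of x1 and x2 exactly when appending it as a column does not increase the rank:

import numpy as np

x1 = np.array([1.0, 0.0, 1.0])
x2 = np.array([0.0, 1.0, 1.0])
y  = 2 * x1 - 3 * x2              # by construction, y is in span(x1, x2)

B = np.column_stack([x1, x2])
in_span = np.linalg.matrix_rank(np.column_stack([B, y])) == np.linalg.matrix_rank(B)
print(in_span)                    # True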


Subspaces of Rn

A line through the origin in Rn

A plane through the origin in Rn


Linear Independence and Basis of Vector Spaces

Fact 1
A vector x is linearly independent of a set of vectors {x_1, x_2, ..., x_n} if it does not lie in their span.

Fact 2
A set of vectors is linearly independent if every vector is linearly independent of the rest.

The Rest
1 A basis of a vector space V is a linearly independent set of vectors whose span is equal to V.
2 If a basis has d vectors, then the vector space V has dimensionality d.
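The same rank idea gives a quick numerical check of independence and dimensionality; the vectors below are arbitrary examples, not from the slides:

import numpy as np

v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([1.0, 1.0, 0.0])
v3 = v1 + 2 * v2                  # deliberately dependent on v1 and v2

A = np.column_stack([v1, v2, v3])
print(np.linalg.matrix_rank(A))          # 2 -> {v1, v2, v3} is dependent
print(np.linalg.matrix_rank(A[:, :2]))   # 2 -> {v1, v2} is independent
# {v1, v2} is a basis of the plane it spans, so that subspace has dimension 2.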


Norm of a Vector

Definition
A norm ‖u‖ measures the magnitude of a vector.

Properties
1 Homogeneity: ‖αx‖ = |α| ‖x‖.
2 Triangle inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖.
3 Point separation: ‖x‖ = 0 if and only if x = 0.

Examples
1 Manhattan or ℓ1-norm: ‖x‖_1 = Σ_{i=1}^d |x_i|.
2 Euclidean or ℓ2-norm: ‖x‖_2 = sqrt(Σ_{i=1}^d x_i^2).
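Both norms are available directly in NumPy; a small sketch with made-up values:

import numpy as np

x = np.array([1.0, -2.0, 2.0])
l1 = np.linalg.norm(x, 1)         # Manhattan norm: |1| + |-2| + |2| = 5
l2 = np.linalg.norm(x)            # Euclidean norm: sqrt(1 + 4 + 4) = 3
print(l1, l2)

# Triangle inequality check on another made-up vector:
y = np.array([0.5, 0.5, -1.0])
print(np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y))   # True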


Examples
Example: the ℓ1-norm and the ℓ2-norm (figure).


Inner Product

Definition
The inner product between u and v is

〈u, v〉 = Σ_{i=1}^n u_i v_i.

It is the projection of one vector onto the other.

Remark: It is related to the Euclidean norm: 〈u, u〉 = ‖u‖_2^2.


Properties

Meaning
The inner product is a measure of correlation between two vectors, scaled by the norms of the vectors.

If u · v > 0, u and v are aligned; if u · v = 0, they are orthogonal; if u · v < 0, they point in opposing directions.
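A small NumPy sketch (made-up vectors, not from the slides) of how the sign of the inner product, and the cosine obtained by scaling it by the norms, reflects alignment:

import numpy as np

u = np.array([1.0, 1.0])
v_aligned = np.array([2.0, 1.0])
v_orthog  = np.array([1.0, -1.0])
v_opposed = np.array([-1.0, -2.0])

for v in (v_aligned, v_orthog, v_opposed):
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    print(np.dot(u, v), cos)      # positive / zero / negative dot product and cosine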


Definitions involving the norm

Orthonormal
The vectors in an orthonormal basis have unit Euclidean norm and are mutually orthogonal.

To express a vector x in an orthonormal basis
For example, given x = α_1 b_1 + α_2 b_2:

〈x, b_1〉 = 〈α_1 b_1 + α_2 b_2, b_1〉
         = α_1 〈b_1, b_1〉 + α_2 〈b_2, b_1〉
         = α_1 + 0

Likewise, 〈x, b_2〉 = α_2.
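A sketch in NumPy, with an assumed orthonormal basis b1, b2 of R^2 (not from the slides), showing that inner products recover the coefficients:

import numpy as np

b1 = np.array([1.0, 1.0]) / np.sqrt(2)    # orthonormal basis of R^2
b2 = np.array([1.0, -1.0]) / np.sqrt(2)

alpha1, alpha2 = 3.0, -2.0
x = alpha1 * b1 + alpha2 * b2

print(np.dot(x, b1), np.dot(x, b2))       # recovers 3.0 and -2.0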


Linear Operator

Definition
A linear operator L : U → V is a map from a vector space U to another vector space V that satisfies

L(u_1 + u_2) = L(u_1) + L(u_2) and L(αu) = αL(u)

Something Notable
If the dimension n of U and the dimension m of V are finite, L can be represented by an m × n matrix:

A = [ a_11 a_12 ... a_1n
      a_21 a_22 ... a_2n
      ...
      a_m1 a_m2 ... a_mn ]


Thus, the product of linear operators
The product of two linear operators can be seen as the multiplication of two matrices. For A as above and B an n × p matrix with entries b_ik,

AB = [ Σ_{i=1}^n a_1i b_i1   Σ_{i=1}^n a_1i b_i2   ...   Σ_{i=1}^n a_1i b_ip
       Σ_{i=1}^n a_2i b_i1   Σ_{i=1}^n a_2i b_i2   ...   Σ_{i=1}^n a_2i b_ip
       ...
       Σ_{i=1}^n a_mi b_i1   Σ_{i=1}^n a_mi b_i2   ...   Σ_{i=1}^n a_mi b_ip ]

i.e., the (j, k) entry of AB is Σ_{i=1}^n a_ji b_ik.

Note: if A is m × n and B is n × p, then AB is m × p.
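The dimension rule and the entrywise sums can be checked numerically; a sketch with random matrices (sizes are arbitrary, not from the slides):

import numpy as np

m, n, p = 3, 4, 2
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, p))

C = A @ B                                 # m x p result
# Entry (j, k) is sum_i A[j, i] * B[i, k]:
print(np.isclose(C[1, 0], np.sum(A[1, :] * B[:, 0])))   # True
print(C.shape)                            # (3, 2)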


Transpose of a Matrix
The transpose of a matrix is obtained by flipping the rows and columns:

A^T = [ a_11 a_21 ... a_m1
        a_12 a_22 ... a_m2
        ...
        a_1n a_2n ... a_mn ]

With the following properties:

(A^T)^T = A
(A + B)^T = A^T + B^T
(AB)^T = B^T A^T

Not only that, we have the inner product

〈u, v〉 = u^T v
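A quick numerical check of these identities on random matrices (illustrative only, not part of the slides):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
u = rng.standard_normal(3)
v = rng.standard_normal(3)

print(np.allclose((A @ B).T, B.T @ A.T))   # (AB)^T = B^T A^T
print(np.allclose(A.T.T, A))               # (A^T)^T = A
print(np.isclose(np.dot(u, v), u.T @ v))   # <u, v> = u^T v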


As always, we have the identity operator

The identity operator in matrix multiplication is defined as

I = [ 1 0 ... 0
      0 1 ... 0
      ...
      0 0 ... 1 ]

With properties
For any matrix A, AI = A.
I is the identity operator for the matrix product.


Column Space, Row Space and Rank

Let A be an m × n matrix
We have the following spaces...

Column space
Span of the columns of A.
Linear subspace of R^m.

Row space
Span of the rows of A.
Linear subspace of R^n.


Important facts

Something Notable
The column and row space of any matrix have the same dimension.

The rank
This common dimension is the rank of the matrix.
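A short check with a made-up matrix whose second row duplicates the first (so the rank is 2); note the rank of A equals the rank of A^T:

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],     # a multiple of the first row
              [0.0, 1.0, 1.0]])
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))   # 2 2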


Range and Null Space

Range
The set of vectors equal to Au for some u ∈ R^n:

Range(A) = {x | x = Au for some u ∈ R^n}

It is a linear subspace of R^m and is also called the column space of A.

Null Space
We have the following definition:

Null Space(A) = {u | Au = 0}

It is a linear subspace of R^n.


Important fact

Something Notable
Every vector in the null space is orthogonal to the rows of A.
The null space and row space of a matrix are orthogonal.
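A sketch of this fact; the null_space helper below (built on the SVD discussed later) is an assumption of this note, not something defined in the slides:

import numpy as np

def null_space(A, tol=1e-10):
    # Rows of V^T whose singular values are (numerically) zero span the null space.
    _, s, Vt = np.linalg.svd(A)
    rank = np.sum(s > tol)
    return Vt[rank:].T

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0]])
N = null_space(A)
print(np.allclose(A @ N, 0))   # null-space vectors are orthogonal to every row of A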


Range and Column Space

We have another interpretation of the matrix-vector product

Au = (A_1 A_2 ... A_n) (u_1, u_2, ..., u_n)^T = u_1 A_1 + u_2 A_2 + ... + u_n A_n

where A_i denotes the i-th column of A.

Thus
The result is a linear combination of the columns of A.
Actually, the range is the column space.
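A tiny check of this column-combination view on made-up values (not from the slides):

import numpy as np

A = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])
u = np.array([2.0, -1.0])

combo = u[0] * A[:, 0] + u[1] * A[:, 1]    # linear combination of the columns
print(np.allclose(A @ u, combo))           # True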


Matrix Inverse

Something Notable
For an n × n matrix A: rank + dim(null space) = n.
If dim(null space) = 0, then A is full rank.
In this case, the action of the matrix is invertible.
The inversion is also linear and consequently can be represented by another matrix A^{-1}.
A^{-1} is the only matrix such that A^{-1}A = AA^{-1} = I.
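A NumPy sketch of these statements on a small invertible matrix (the values are arbitrary, not from the slides):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
print(np.linalg.matrix_rank(A))            # 2 -> full rank, so dim(null space) = 0
A_inv = np.linalg.inv(A)
print(np.allclose(A_inv @ A, np.eye(2)))   # A^{-1} A = I
print(np.allclose(A @ A_inv, np.eye(2)))   # A A^{-1} = I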


Orthogonal Matrices

Definition
An orthogonal matrix U satisfies U^T U = I.

Properties
U has orthonormal columns.

In addition
Applying an orthogonal matrix to two vectors does not change their inner product:

〈Uu, Uv〉 = (Uu)^T Uv
         = u^T U^T U v
         = u^T v
         = 〈u, v〉


Example

A classic one
Matrices representing rotations are orthogonal.
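A 2-D rotation by an arbitrary angle makes this concrete; a short NumPy sketch (the angle and vectors are made up):

import numpy as np

theta = np.pi / 6
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation by 30 degrees

print(np.allclose(U.T @ U, np.eye(2)))            # U^T U = I
u = np.array([1.0, 2.0])
v = np.array([-3.0, 0.5])
print(np.isclose(np.dot(U @ u, U @ v), np.dot(u, v)))   # inner product preserved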


Trace and Determinant

Definition (Trace)
The trace is the sum of the diagonal elements of a square matrix.

Definition (Determinant)
The determinant of a square matrix A, denoted by |A|, is defined (expanding along any fixed row i) as

det(A) = Σ_{j=1}^n (-1)^{i+j} a_ij M_ij

where M_ij is the determinant of the matrix obtained from A by deleting row i and column j.
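A didactic sketch of this cofactor expansion (expanding along the first row); in practice one would call np.linalg.det directly, which is also used below as a cross-check. The matrix is made up:

import numpy as np

def det_cofactor(A):
    # Expansion along the first row: det(A) = sum_j (-1)^{1+j} a_{1j} M_{1j}.
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
print(np.trace(A))                           # sum of diagonal elements
print(det_cofactor(A), np.linalg.det(A))     # both give the same determinant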


Special Case

For a 2 × 2 matrix A = [ a b
                         c d ]

|A| = ad − bc

The absolute value of |A| is the area of the parallelogram given by the rows of A.


Properties of the Determinant

Basic Properties
|A| = |A^T|
|AB| = |A| |B|
|A| = 0 if and only if A is not invertible
If A is invertible, then |A^{-1}| = 1 / |A|.


Eigenvalues and Eigenvectors

Eigenvalues
An eigenvalue λ of a square matrix A satisfies

Au = λu

for some nonzero vector u, which we call an eigenvector.

Properties
Geometrically, the operator A expands (when |λ| > 1) or contracts (when |λ| < 1) its eigenvectors, but does not rotate them.

Null Space relation
If u is an eigenvector of A with eigenvalue λ, then u is in the null space of A − λI, which is consequently not invertible.


More properties

Given the previous relation
The eigenvalues of A are the roots of the equation |A − λI| = 0.

Remark: We do not calculate the eigenvalues this way.

Something Notable
Eigenvalues and eigenvectors can be complex valued, even if all the entries of A are real.


Eigendecomposition of a Matrix

Given
Let A be an n × n square matrix with n linearly independent eigenvectors p_1, p_2, ..., p_n and eigenvalues λ_1, λ_2, ..., λ_n.

We define the matrices

P = (p_1 p_2 ... p_n)

Λ = [ λ_1 0   ... 0
      0   λ_2 ... 0
      ...
      0   0   ... λ_n ]


Properties

We have that A satisfies

AP = PΛ

In addition
P is full rank.

Thus, inverting it yields the eigendecomposition

A = PΛP^{-1}
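With NumPy, np.linalg.eig returns the diagonal of Λ and the matrix P; a sketch on a made-up diagonalizable matrix (not from the slides):

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
eigvals, P = np.linalg.eig(A)        # columns of P are the eigenvectors p_i
Lam = np.diag(eigvals)

print(np.allclose(A @ P, P @ Lam))                  # AP = PΛ
print(np.allclose(A, P @ Lam @ np.linalg.inv(P)))   # A = PΛP^{-1}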


Properties of the Eigendecomposition

We have that
Not all matrices are diagonalizable (i.e., have an eigendecomposition). Example:

( 1 1
  0 1 )

Trace(A) = Trace(Λ) = Σ_{i=1}^n λ_i

|A| = |Λ| = Π_{i=1}^n λ_i

The rank of A is equal to the number of nonzero eigenvalues.

If λ is a nonzero eigenvalue of A, then 1/λ is an eigenvalue of A^{-1} with the same eigenvector.

The eigendecomposition allows us to compute matrix powers efficiently:

A^m = (PΛP^{-1})^m = PΛP^{-1} PΛP^{-1} ... PΛP^{-1} = PΛ^m P^{-1}
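A numerical check of the trace, determinant, and matrix-power properties, reusing the same kind of made-up matrix as above (not from the slides):

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
lam, P = np.linalg.eig(A)

print(np.isclose(np.trace(A), lam.sum()))        # trace = sum of eigenvalues
print(np.isclose(np.linalg.det(A), lam.prod()))  # determinant = product of eigenvalues

m = 5
A_m = P @ np.diag(lam ** m) @ np.linalg.inv(P)   # A^m = P Λ^m P^{-1}
print(np.allclose(A_m, np.linalg.matrix_power(A, m)))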


Eigendecomposition of a Symmetric Matrix

When A is symmetric, we have
If A = A^T then A is symmetric.
The eigenvalues of symmetric matrices are real.
The eigenvectors of symmetric matrices can be chosen orthonormal.
Consequently, the eigendecomposition becomes A = UΛU^T, with Λ real and U orthogonal.
The eigenvectors of A are an orthonormal basis for the column space and row space.


We can see the action of a symmetric matrix on a vector u as...

We can decompose the action Au = UΛU^T u as
Projection of u onto the column space of A (multiplication by U^T).
Scaling of each coefficient 〈U_i, u〉 by the corresponding eigenvalue (multiplication by Λ).
Linear combination of the eigenvectors scaled by the resulting coefficients (multiplication by U).

Final equation

Au = Σ_{i=1}^n λ_i 〈U_i, u〉 U_i
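A sketch verifying this expansion with np.linalg.eigh, which returns orthonormal eigenvectors for symmetric matrices (the matrix and vector are made up, not from the slides):

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])     # symmetric
lam, U = np.linalg.eigh(A)          # columns of U: orthonormal eigenvectors
u = np.array([1.0, -2.0, 0.5])

# Au = sum_i lambda_i <U_i, u> U_i
reconstruction = sum(lam[i] * np.dot(U[:, i], u) * U[:, i] for i in range(3))
print(np.allclose(A @ u, reconstruction))        # True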

It would be great to generalize this to all matrices!!!


Singular Value Decomposition

Every matrix has a singular value decomposition

A = UΣV^T

Where
The columns of U are an orthonormal basis for the column space.
The columns of V are an orthonormal basis for the row space.
Σ is diagonal, and the entries on its diagonal, σ_i = Σ_ii, are nonnegative real numbers, called the singular values of A.

The action of A on a vector u can be decomposed into

Au = Σ_{i=1}^n σ_i 〈V_i, u〉 U_i
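The corresponding NumPy routine is np.linalg.svd; a sketch on a random rectangular matrix that also checks the action formula above (sizes and seed are arbitrary):

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U diag(s) V^T

print(np.allclose(A, U @ np.diag(s) @ Vt))         # reconstruction
u = rng.standard_normal(3)
action = sum(s[i] * np.dot(Vt[i], u) * U[:, i] for i in range(len(s)))
print(np.allclose(A @ u, action))                  # Au = sum_i sigma_i <V_i, u> U_i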


Properties of the Singular Value Decomposition

First
The eigenvalues of the symmetric matrix A^T A are equal to the squares of the singular values of A:

A^T A = V Σ U^T U Σ V^T = V Σ^2 V^T

Second
The rank of a matrix is equal to the number of nonzero singular values.

Third
The largest singular value σ_1 is the solution to the optimization problem:

σ_1 = max_{x ≠ 0} ‖Ax‖_2 / ‖x‖_2
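A numerical sketch of the first and third properties on a random matrix (the probe vector is arbitrary; none of this is from the slides):

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))
s = np.linalg.svd(A, compute_uv=False)             # singular values, sorted descending
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]

print(np.allclose(eig_AtA, s ** 2))                # eigenvalues of A^T A = sigma_i^2

# sigma_1 bounds the ratio ||Ax|| / ||x|| for any nonzero x:
x = rng.standard_normal(3)
print(np.linalg.norm(A @ x) / np.linalg.norm(x) <= s[0] + 1e-12)   # True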


Properties of the Singular Value Decomposition

Remark
It can be verified that the largest singular value satisfies the properties of a norm; it is called the spectral norm of the matrix.

Finally
In statistics, analyzing data with the singular value decomposition is called Principal Component Analysis.
