
Page 1

International Workshop on Machine Learning and Text Analytics (MLTA2013)

Linear Algebra for Machine Learning and IR

Manoj Kumar Singh

DST-Centre for Interdisciplinary Mathematical Sciences (DST-CIMS)

Banaras Hindu University (BHU), Varanasi-221005, INDIA.

E-mail: [email protected]

December 15, 2013

South Asian University (SAU), New Delhi.

Page 2

Content

Vector Matrix Model in IR, ML and Other Areas

Vector Space
- Formal Definition - Linear Combination - Independence - Generator and Basis
- Dimension - Inner Product, Norm, Orthogonality - Examples: $R^n(R)$, $C^n(R)$

Linear Transformation
- Definition - Matrix and Determinant - LT using Matrix - Rank and Nullity
- Column Space and Row Space - Invertibility - Singularity and Non-Singularity
- Eigenvalue and Eigenvector - Linear Algebra

Different Types of Matrices and Matrix Algebra

Matrix Factorization

Applications

Page 3

Vector Matrix Model in IR

A collection consisting of the following five documents is queried for latent semantic indexing (q):

d1 = LSI tutorials and fast tracks.

d2 = Books on semantic analysis.

d3 = Learning latent semantic indexing.

d4 = Advances in structures and advances in indexing.

d5 = Analysis of latent structures.

Rank the documents in decreasing order of relevance to the query.

Recommendation System: Item-based collaborative filtering

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
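For concreteness, here is a minimal NumPy sketch (not from the slides) of item-based collaborative filtering on the table above: it predicts Alice's missing Item5 rating from the cosine similarity of Item5 to the other items, computed over the users with complete ratings.

```python
import numpy as np

# Rows: User1..User4 (complete ratings); columns: Item1..Item5.
R = np.array([[3, 1, 2, 3, 3],
              [4, 3, 4, 3, 5],
              [3, 3, 1, 5, 4],
              [1, 5, 5, 2, 1]], dtype=float)
alice = np.array([5, 3, 4, 4])  # Alice's ratings for Item1..Item4

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Similarity of Item5 to each of Item1..Item4.
sims = np.array([cosine(R[:, j], R[:, 4]) for j in range(4)])

# Predicted rating: similarity-weighted average of Alice's known ratings.
pred = sims @ alice / sims.sum()
print(round(pred, 2))  # ~4.1
```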

Classification

Page 4

Blind Source Separation

Cocktail Party Problem:
Humans are capable of steering hearing attention, i.e., of identifying a source of interest. BSS, however, is about separation of the sources themselves, which is very close to a solution of the cocktail party problem: what, who, from where.

Confused Computer in a Cocktail Party Situation:
In a multiple-speaker environment the microphones collect garbage, a hotchpotch of speech!

Sources s and measured signals x are related by $x = As$, where $A = [a_{ij}]_{m \times n}$ is the mixing matrix.
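A minimal NumPy sketch of the mixing model above, assuming a known square mixing matrix A so the sources can be recovered by $s = A^{-1}x$; real BSS algorithms (e.g., ICA) must estimate A from x alone.

```python
import numpy as np

t = np.linspace(0, 1, 500)
s = np.vstack([np.sin(2 * np.pi * 5 * t),              # source 1
               np.sign(np.sin(2 * np.pi * 3 * t))])    # source 2
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])                             # mixing matrix
x = A @ s                                              # measured microphone signals
s_hat = np.linalg.inv(A) @ x                           # recovery with known A
print(np.allclose(s, s_hat))                           # True
```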

Page 5

Imaging Application

[Figure 1. PSF of the components in an FPA imaging system: the input scene $I(x,y)$ passes through the lens ($h_{optics}(x,y)$), detector ($h_{det}(x,y)$), sample-and-hold circuit ($h_{sh}(x,y)$), WG array ($h_{WG}(x,y)$), electronics ($h_{elec}(x,y)$), Rx ($h_{Rx}(x,y)$), display ($h_{disp}(x,y)$) and human eye ($h_{eye}(x,y)$) to the output scene $O(x,y)$.]

Recovering the scene from the measured output: $\hat{f} = H^{-1} y$.

Page 6

Vector Space

Def.: An algebraic structure $(V, F, \oplus, +, \cdot, \odot)$ with sets $V$, $F$ and binary operations
$\oplus: V \times V \to V$, $+: F \times F \to F$, $\cdot: F \times F \to F$, $\odot: F \times V \to V$
is a vector space if:

$(V, \oplus)$ is an Abelian group:
i. Associativity: $(a \oplus b) \oplus c = a \oplus (b \oplus c)$, $\forall a, b, c \in V$
ii. Identity: $\exists e \in V$ s.t. $a \oplus e = e \oplus a = a$, $\forall a \in V$
iii. Inverse: $\forall a \in V$, $\exists a^{-1} \in V$ s.t. $a \oplus a^{-1} = a^{-1} \oplus a = e$
iv. Commutativity: $a \oplus b = b \oplus a$, $\forall a, b \in V$

$(F, +, \cdot)$ is a field:
i. $(F, +)$ is an Abelian group.
ii. $(F^*, \cdot)$ is an Abelian group, where $F^* = F - \{0\}$.
iii. Multiplication is distributive over addition: $a \cdot (b + c) = a \cdot b + a \cdot c$, $\forall a, b, c \in F$.

Scalar multiplication $\odot$ satisfies the following:
i. $\alpha \odot a \in V$, $\forall a \in V$, $\alpha \in F$
ii. $\alpha \odot (a \oplus b) = (\alpha \odot a) \oplus (\alpha \odot b)$, $\forall a, b \in V$, $\alpha \in F$
iii. $(\alpha + \beta) \odot a = (\alpha \odot a) \oplus (\beta \odot a)$, $\forall a \in V$, $\alpha, \beta \in F$
iv. $(\alpha \cdot \beta) \odot a = \alpha \odot (\beta \odot a)$, $\forall a \in V$, $\alpha, \beta \in F$
v. $1 \odot a = a$, $\forall a \in V$, where 1 is the unity element of F

Page 7

Vector Space

Note:
1. Elements of V are called vectors and elements of F scalars.
2. Vector here does not mean a vector quantity as defined in vector algebra (a directed line segment).
3. We say "vector space V over field F" and denote it as V(F).

Linear Algebra:
A vector space $(V, F, \oplus, +, \cdot, \odot)$ is called a linear algebra over field F if there is an additional operation $\ast: V \times V \to V$, called multiplication of vectors, satisfying the following postulates:
i. $a \ast b \in V$, $\forall a, b \in V$
ii. $a \ast (b \ast c) = (a \ast b) \ast c$, $\forall a, b, c \in V$
iii. $a \ast (b \oplus c) = (a \ast b) \oplus (a \ast c)$, $\forall a, b, c \in V$
iv. $\alpha \odot (a \ast b) = (\alpha \odot a) \ast b$, $\forall a, b \in V$, $\alpha \in F$

If there is an element 1 in V such that $1 \ast a = a \ast 1 = a$, $\forall a \in V$, then V is a linear algebra with identity, and 1 is called the identity of V.

The algebra V(F) is commutative if $a \ast b = b \ast a$, $\forall a, b \in V$.

Page 8

Vector Space

Linear Combination:
Let V(F) be a vector space. Any vector $a = \alpha_1 a_1 + \alpha_2 a_2 + \dots + \alpha_n a_n$, where $\alpha_1, \alpha_2, \dots, \alpha_n \in F$, is called a linear combination of the vectors $a_1, a_2, \dots, a_n$.

Subspace:
Let V(F) be a vector space. $W \subseteq V$ is called a subspace of V if W(F) is itself a vector space w.r.t. the operations in V(F).

e.g. $W = \{(x, 2y, 3z) : x, y, z \in R\}$ is a subspace of $R^3(R)$.
$W = \{(x, y, 0) : x, y \in R\}$ is a subspace of $R^3(R)$.
Let V(F) be the vector space of all $n \times 1$ matrices over F and A be an $m \times n$ matrix. Then $W = \{x \in V : Ax = 0\}$ is a subspace of V.

Generator:
Let V(F) be a vector space and $S \subseteq V$. If U is a subspace of V containing S and contained in every subspace of V containing S, then U is the smallest subspace of V containing S. This subspace U is called the subspace of V generated (or spanned) by S, and is denoted [S], i.e., U = [S].

Page 9

Vector Space

Linear Span:
Let V(F) be a vector space and $S \subseteq V$. The linear span L(S) of S is the set of all linear combinations of finite sets of elements of S:
$L(S) = \{\alpha_1 a_1 + \alpha_2 a_2 + \alpha_3 a_3 + \dots + \alpha_n a_n\}$, where $\alpha_1, \alpha_2, \dots, \alpha_n \in F$ and $a_1, a_2, \dots, a_n \in S$.

Note: L(S) is a subspace of V(F) and L(S) = [S].

Linear Dependence (LD):
Let V(F) be a vector space. $\{a_1, a_2, \dots, a_n\} \subseteq V$ is said to be LD if there exist $\alpha_1, \alpha_2, \dots, \alpha_n \in F$ s.t. $\alpha_1 a_1 + \alpha_2 a_2 + \dots + \alpha_n a_n = 0$ with some $\alpha_i \neq 0$.

Linear Independence (LI):
Let V(F) be a vector space. $\{a_1, a_2, \dots, a_n\} \subseteq V$ is said to be LI if, for $\alpha_i \in F$, $\alpha_1 a_1 + \alpha_2 a_2 + \dots + \alpha_n a_n = 0 \Rightarrow \alpha_i = 0$ for all $1 \le i \le n$.

Basis: S is a basis of the vector space V(F) if: i) S consists of LI elements; ii) V = [S] = L(S).

Dimension: V(F) is said to be finite dimensional if there is a finite subset $S \subseteq V$ such that V = L(S) = [S]. The number of elements in a basis of the finite dimensional V(F) is the dimension of V(F).

e.g. $S_1 = \{(1,0,0), (0,1,0), (0,0,1)\}$ and $S_2 = \{(1,0,0), (1,1,0), (1,1,1)\}$ are bases of $R^3(R)$.
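As a minimal sketch (NumPy assumed, not from the slides): a set of vectors is LI iff the matrix with those vectors as rows has rank equal to the number of vectors.

```python
import numpy as np

S1 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
S2 = np.array([[1, 0, 0], [1, 1, 0], [1, 1, 1]])
S3 = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 0]])  # LD: row3 = row1 + row2

for S in (S1, S2, S3):
    li = np.linalg.matrix_rank(S) == S.shape[0]
    print("LI" if li else "LD")  # LI, LI, LD
```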

Page 10

Vector Space

Inner Product:
An inner product on a vector space V(R/C) is a function $\langle \cdot,\cdot \rangle : V \times V \to R/C$ which assigns to each ordered pair of vectors a, b in V a scalar $\langle a, b \rangle$ such that:
i. $\langle a, b \rangle = \overline{\langle b, a \rangle}$  [the bar denotes complex conjugation]
ii. $\langle \alpha a + \beta b, c \rangle = \alpha \langle a, c \rangle + \beta \langle b, c \rangle$
iii. $\langle a, a \rangle \ge 0$, and $\langle a, a \rangle = 0 \iff a = 0$

e.g. $V = R^n$: the inner product of $x = (x_1, x_2, \dots, x_n)$ and $y = (y_1, y_2, \dots, y_n)$ is $\langle x, y \rangle = x_1 y_1 + x_2 y_2 + \dots + x_n y_n$.
$V = C[a,b]$: $\langle x, y \rangle = \int_a^b x(t)\, y(t)\, dt$.

Norm / Length: the length of a vector x in V(F) is $\|x\| = \sqrt{\langle x, x \rangle}$.

Distance: the distance between two vectors x, y in V(F) is $d(x,y) = \|x - y\| = \sqrt{\langle x-y, x-y \rangle}$.
Note: (V, d) is a metric space.

Orthogonality: let $(V, \langle \cdot,\cdot \rangle)$ be an inner product space and let x, y ∈ V. Vectors x and y are said to be orthogonal to each other if $\langle x, y \rangle = 0$.

Note: an orthogonal set of non-zero vectors is LI; LI does not imply orthogonality, but an LI set can be turned into an orthogonal one (Gram-Schmidt).
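A minimal Gram-Schmidt sketch (NumPy assumed, not from the slides): orthonormalize an LI set by subtracting components along the vectors already produced.

```python
import numpy as np

def gram_schmidt(vectors):
    basis = []
    for v in vectors:
        w = v - sum((v @ q) * q for q in basis)  # remove components along earlier q's
        basis.append(w / np.linalg.norm(w))      # normalize (assumes LI input)
    return np.array(basis)

V = np.array([[1.0, 0, 0], [1.0, 1, 0], [1.0, 1, 1]])
Q = gram_schmidt(V)
print(np.allclose(Q @ Q.T, np.eye(3)))           # True: rows are orthonormal
```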

Page 11

Linear Transformation

Definition (LT): Let U(F) and V(F) be two vector spaces. A Linear Transformation from U into V is a function $T: U \to V$ such that:
$T(\alpha x + \beta y) = \alpha T(x) + \beta T(y)$, $\forall \alpha, \beta \in F$ and $x, y \in U$.

Linear Operator: a Linear Operator on V(F) is a function $T: V \to V$ such that:
$T(\alpha x + \beta y) = \alpha T(x) + \beta T(y)$, $\forall \alpha, \beta \in F$ and $x, y \in V$.

Range Space of LT: let $T: U(F) \to V(F)$ be an LT. The range space of T, R(T), is given as follows:
$R(T) = \{T(x) \in V : x \in U\}$

Null Space of LT: let $T: U(F) \to V(F)$ be an LT. The null space of T, N(T), is given as follows:
$N(T) = \{x \in U : T(x) = 0 \in V\}$

Note:
1. $R(T) \subseteq V$ is a subspace of V; $N(T) \subseteq U$ is a subspace of U.
2. If U(F) is finite dimensional, then R(T) is also finite dimensional.

Rank and Nullity of LT:
1. Rank: the dimension of the range space of the LT, $\rho(T) = \dim(R(T))$.
2. Nullity: the dimension of the null space of the LT, $\nu(T) = \dim(N(T))$.
Note: for $T: U(F) \to V(F)$, $\rho(T) + \nu(T) = \dim(U)$.

Non-Singular Transform: an LT $T: U \to V$ is non-singular if $N(T) = \{0\}$, i.e., $x \in U$ and $T(x) = 0 \Rightarrow x = 0$.

Singular Transform: an LT $T: U \to V$ is singular if there exists $x \neq 0 \in U$ such that $T(x) = 0$.
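A minimal numeric check (NumPy assumed, not from the slides) of the rank-nullity relation for the LT $x \mapsto Ax$: rank from matrix_rank, nullity as $\dim(U)$ minus rank.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [2, 4, 6]])          # maps R^3 -> R^2; second row = 2 * first, rank 1
rank = np.linalg.matrix_rank(A)
nullity = A.shape[1] - rank        # rank-nullity theorem
print(rank, nullity, rank + nullity == A.shape[1])  # 1 2 True
```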

Page 12

Matrices

Definition: A set of mn elements of any field F arranged in the form of a rectangular array having m rows and n columns is called an m×n matrix over the field F:

$A = [a_{ij}]_{m \times n} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$

If m = n, the matrix is called a square matrix. The elements $a_{ij}$ for which $i = j$ constitute the principal diagonal.

Unit / Identity Matrix: $I = [a_{ij}]_{n \times n}$ with $a_{ij} = 1$ if $i = j$ and $a_{ij} = 0$ if $i \neq j$, e.g. $I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.

Diagonal Matrix: a square matrix $A = [a_{ij}]_{n \times n}$ for which $a_{ij} = 0$ for $i \neq j$, e.g. $D_{3 \times 3} = \begin{bmatrix} d_1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & d_3 \end{bmatrix}$.

Scalar Matrix: a diagonal matrix $A = [a_{ij}]_{n \times n}$ for which $a_{ii} = k$:
$S = \begin{bmatrix} k & 0 & \cdots & 0 \\ 0 & k & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & k \end{bmatrix}_{n \times n}$
If A is any matrix and S is a scalar matrix, then $SA = AS = kA$.

Page 13

Matrices

Upper Triangular Matrix: a square matrix $A = [a_{ij}]_{n \times n}$ is upper triangular if $a_{ij} = 0$ whenever $i > j$:
$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22} & a_{23} & \cdots & a_{2n} \\ 0 & 0 & a_{33} & \cdots & a_{3n} \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn} \end{bmatrix}$

Lower Triangular Matrix: a square matrix $A = [a_{ij}]_{n \times n}$ is lower triangular if $a_{ij} = 0$ whenever $i < j$:
$A = \begin{bmatrix} a_{11} & 0 & 0 & \cdots & 0 \\ a_{21} & a_{22} & 0 & \cdots & 0 \\ a_{31} & a_{32} & a_{33} & \cdots & 0 \\ \vdots & & & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix}$

Symmetric: a square matrix $A = [a_{ij}]_{n \times n}$ is symmetric if $a_{ij} = a_{ji}$, $\forall i, j$. e.g. $D_{3 \times 3} = \begin{bmatrix} a & b & c \\ b & e & d \\ c & d & f \end{bmatrix}$

Skew Symmetric: a square matrix $A = [a_{ij}]_{n \times n}$ is skew symmetric if $a_{ij} = -a_{ji}$, $\forall i, j$. e.g. $D_{3 \times 3} = \begin{bmatrix} 0 & h & g \\ -h & 0 & f \\ -g & -f & 0 \end{bmatrix}$

Page 14

Matrices

Transpose: given $A = [a_{ij}]_{m \times n}$, the n×m matrix obtained from A by changing its rows into columns and its columns into rows is called the transpose of A, denoted A' or $A^T$.
e.g. $A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \\ 3 & 4 & 2 & 1 \end{bmatrix}_{3 \times 4}$, $A^T = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 2 \\ 4 & 1 & 1 \end{bmatrix}_{4 \times 3}$

Trace: for a square matrix $A = [a_{ij}]_{n \times n}$, the sum of the main diagonal elements of A is the trace of the matrix: $tr(A) = \sum_{i=1}^{n} a_{ii}$.

Addition: if $A = [a_{ij}]_{m \times n}$ and $B = [b_{ij}]_{m \times n}$, then $C = A \pm B$ is defined as $c_{ij} = a_{ij} \pm b_{ij}$. (M, +) is an abelian group.

Scalar Mult.: $kA = Ak = [k\,a_{ij}]_{m \times n}$, where k is a scalar.

Multiplication: for $A = [a_{ij}]_{m \times n}$ and $B = [b_{ij}]_{n \times p}$, AB is possible when the number of columns in A is equal to the number of rows in B. AB is the matrix $C = [c_{ik}]_{m \times p}$ such that $c_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk}$.

Row / Column Vector Representation of Matrix:
The $i$th row of the matrix is denoted by the vector $r_i = (a_{i1}, a_{i2}, a_{i3}, \dots, a_{in})$; the $i$th column by the vector $c_i = (a_{1i}, a_{2i}, a_{3i}, \dots, a_{mi})$. Thus
$A_{m \times n} = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_m \end{bmatrix} = [c_1, c_2, c_3, \dots, c_n]$
with row vectors $r_1, r_2, \dots, r_m \in V_n(F)$ and column vectors $c_1, c_2, \dots, c_n \in V_m(F)$.

Page 15

Matrices

Row Space and Row Rank of Matrix:
Let $R = \{r_1, r_2, \dots, r_m\}$. The linear span $L(R) \subseteq V_n(F)$ is called the row space of the matrix. The row rank of the matrix is $\rho_r(A) = \dim(L(R)) \le \dim(V_n(F)) = n$.

Column Space and Column Rank of Matrix:
Let $C = \{c_1, c_2, \dots, c_n\}$. The linear span $L(C) \subseteq V_m(F)$ is called the column space of the matrix. The column rank of the matrix is $\rho_c(A) = \dim(L(C)) \le \dim(V_m(F)) = m$.

Rank of Matrix: $\rho(A) = \min(\rho_r(A), \rho_c(A))$.
When is $\rho_r(A) = m$? When $\{r_1, r_2, \dots, r_m\}$ is LI. When is $\rho_c(A) = n$? When $\{c_1, c_2, \dots, c_n\}$ is LI.

Determinant of Square Matrix:
Let f be a scalar function (not a vector or a matrix function) of the rows $x_1, x_2, \dots, x_n$ of A, called the determinant of A, satisfying the following conditions:
(i) $f(x_1, \dots, c x_i, \dots, x_n) = c\, f(x_1, \dots, x_i, \dots, x_n)$, where c is a scalar. If any row is multiplied by a scalar, this is equivalent to multiplying the whole determinant by that scalar.
(ii) $f(x_1, \dots, x_i + c x_j, \dots, x_n) = f(x_1, \dots, x_i, \dots, x_n)$. If a scalar multiple of the jth row (col.) is added to the ith row (col.), the value of the determinant remains the same.
(iii) If $x_i$ is written as a sum of two vectors, $x_i = y_i + z_i$, then $f(x_1, \dots, y_i + z_i, \dots, x_n) = f(x_1, \dots, y_i, \dots, x_n) + f(x_1, \dots, z_i, \dots, x_n)$: if the ith row (col.) is split as a sum of two vectors, the determinant becomes a sum of two determinants.
(iv) $f(e_1, e_2, \dots, e_n) = 1$, where $e_1, e_2, \dots, e_n$ are the basic unit vectors: the determinant of the identity matrix is 1.

Page 16

Determinant

The conditions (i)-(iv), called the postulates, define the determinant of a square matrix. Standard notation: |A|, det(A) = determinant of A.

Some Properties of Determinant:
i. The determinant of a 2×2 matrix $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$ is $|A| = a_{11}a_{22} - a_{12}a_{21}$.
ii. The det. of a square null matrix is zero. The determinant of a square matrix with one or more null rows or columns is zero.
iii. The determinant of a diagonal matrix is the product of the diagonal elements.
iv. The determinant of a triangular matrix is the product of the diagonal elements.
v. If any two rows (cols.) are interchanged, the value of the determinant of the new matrix is -1 times the value of the original determinant.
vi. The value of the determinant of a matrix of real numbers can be negative, positive, or zero.
vii. From postulate (ii) the value of the determinant remains the same if any multiple of any row (col.) is added to any other row (col.). Thus if one or more rows (cols.) are LD on other rows (cols.), these dependent rows (cols.) can be made null by linear operations, and hence the determinant is zero. $|A| \neq 0$ iff all rows (cols.) form an LI set of vectors, i.e., $\rho_r(A) = \rho_c(A) = n$.
viii. For n×n matrices A and B: $|AB| = |A|\,|B|$.
ix. Let A be any n×n matrix. The matrix B, if it exists, such that $AB = BA = I_n$, is denoted $B = A^{-1}$.
x. $AA^{-1} = A^{-1}A = I \Rightarrow |AA^{-1}| = |A^{-1}A| = |I| = 1 \Rightarrow |A|\,|A^{-1}| = 1 \Rightarrow |A^{-1}| = 1/|A|$.

Page 17

Cofactor Expansion

Minors:
Let $A = [a_{ij}]$ be an n×n matrix. Delete some rows and the same number of columns; the determinant of the resulting submatrix is called a minor. If the ith row and jth column are deleted, the determinant of the resulting submatrix is called the minor of $a_{ij}$.

e.g. $A = \begin{bmatrix} 2 & 0 & 1 \\ 1 & 2 & 4 \\ 0 & 1 & 5 \end{bmatrix}$; minor of $a_{11} = \begin{vmatrix} 2 & 4 \\ 1 & 5 \end{vmatrix}$, minor of $a_{22} = \begin{vmatrix} 2 & 1 \\ 0 & 5 \end{vmatrix}$.

Leading Minors: if the submatrices are formed by deleting the rows and columns from the 2nd onward, from the 3rd onward, and so on, then the corresponding minors are called the leading minors:
$|2|, \begin{vmatrix} 2 & 0 \\ 1 & 2 \end{vmatrix}, \begin{vmatrix} 2 & 0 & 1 \\ 1 & 2 & 4 \\ 0 & 1 & 5 \end{vmatrix}$

Cofactors:
Let $A = [a_{ij}]$ be an n×n matrix. The cofactor of $a_{ij}$ is defined as $(-1)^{i+j}$ times the minor of $a_{ij}$. That is, if the cofactor and minor of $a_{ij}$ are denoted by $C_{ij}$ and $M_{ij}$ respectively, then:
$C_{ij} = (-1)^{i+j} M_{ij}$

Page 18

Cofactor Expansion

Evaluation of Determinant:
Let $A = [a_{ij}]$ be an n×n matrix, and let the cofactor and minor of $a_{ij}$ be denoted by $C_{ij}$ and $M_{ij}$. Then
$|A| = a_{i1} C_{i1} + a_{i2} C_{i2} + \dots + a_{in} C_{in} = a_{i1}(-1)^{i+1} M_{i1} + a_{i2}(-1)^{i+2} M_{i2} + \dots + a_{in}(-1)^{i+n} M_{in}$, for i = 1, 2, ..., n.

Cofactor Matrix:
Let $A = [a_{ij}]$ be an n×n matrix, and let the cofactor of $a_{ij}$ be denoted by $C_{ij}$. Then the cofactor matrix of A is:
$cof(A) = \begin{bmatrix} C_{11} & C_{12} & \cdots & C_{1n} \\ C_{21} & C_{22} & \cdots & C_{2n} \\ \vdots & & & \vdots \\ C_{n1} & C_{n2} & \cdots & C_{nn} \end{bmatrix}$

Inverse of Matrix:
Let $A = [a_{ij}]$ be an n×n matrix. The inverse of A, if it exists, is given by:
$A^{-1} = \frac{1}{|A|}[cof(A)]^T$, $|A| \neq 0$

Singular and Non-Singular Matrix:
A square matrix $A = [a_{ij}]_{n \times n}$ is said to be non-singular or singular according as $|A| \neq 0$ or $|A| = 0$.
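A minimal sketch (NumPy assumed, not from the slides) of the cofactor (adjugate) formula above, checked against numpy.linalg.inv.

```python
import numpy as np

def cofactor_inverse(A):
    n = A.shape[0]
    C = np.empty_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            M = np.delete(np.delete(A, i, axis=0), j, axis=1)  # minor submatrix
            C[i, j] = (-1) ** (i + j) * np.linalg.det(M)       # cofactor C_ij
    return C.T / np.linalg.det(A)                              # cof(A)^T / |A|

A = np.array([[2.0, 0, 1], [1, 2, 4], [0, 1, 5]])
print(np.allclose(cofactor_inverse(A), np.linalg.inv(A)))      # True
```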

Page 19

Cofactor Expansion

Invertibility of Matrix:

Rank of Matrix: a number r is said to be the rank of a matrix A if it possesses the following two properties:
i. There is at least one square submatrix of A of size r×r whose det. is not zero.
ii. If the matrix contains any square submatrix of size (r+1)×(r+1), then the det. of every such submatrix must be zero.

The following statements are equivalent:
$A^{-1}$ exists $\iff AA^{-1} = A^{-1}A = I \iff |A| \neq 0 \iff$ A is non-singular $\iff \rho(A) = n \iff \rho_r(A) = n \iff \rho_c(A) = n \iff R = \{r_1, r_2, \dots, r_n\}$ is LI $\iff C = \{c_1, c_2, \dots, c_n\}$ is LI.

Page 20

LT using Matrix

Let $T: U(F) \to V(F)$ be an LT, and let $B = \{\alpha_1, \alpha_2, \dots, \alpha_n\}$ and $B' = \{\beta_1, \beta_2, \dots, \beta_m\}$ be ordered bases for U and V. Then each of the n vectors $T(\alpha_j) \in V$ is uniquely expressed as a linear combination of elements of B':
$T(\alpha_j) = a_{1j}\beta_1 + a_{2j}\beta_2 + \dots + a_{mj}\beta_m = \sum_{i=1}^{m} a_{ij}\beta_i$, i.e.

$\begin{bmatrix} T(\alpha_1) \\ T(\alpha_2) \\ \vdots \\ T(\alpha_n) \end{bmatrix} = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & & & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_m \end{bmatrix}$; $T = [T; B; B'] = [a_{ij}]_{m \times n}$ is the matrix of T relative to B, B'.

Example: Let T be an LT on the vector space $V_2(F)$ defined by T(a,b) = (a,0). Find the matrix of T relative to the standard basis B = $\{e_1, e_2\}$ = {(1,0), (0,1)}.

$T(e_1) = T(1,0) = (1,0) = 1(1,0) + 0(0,1) = 1e_1 + 0e_2$
$T(e_2) = T(0,1) = (0,0) = 0(1,0) + 0(0,1) = 0e_1 + 0e_2$

The matrix of T relative to the ordered basis B is $T_B = [T; B] = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$.

Page 21

Eigen Value and Eigen Vector

Eigen Value and Eigen Vector of LT:
Let $T: V(F) \to V(F)$ be an LT. The scalar $c \in F$ is called an eigenvalue of T if there exists $x (\neq 0) \in V$ such that
$T(x) = cx$.
The vector x is then called an eigenvector corresponding to the eigenvalue c.

$T(x) = cx \Rightarrow T(x) = cI(x)$, where I is the identity transform $\Rightarrow T(x) - cI(x) = 0 \Rightarrow (T - cI)(x) = 0 \Rightarrow T'(x) = 0$, where T' is an LT and $T' = T - cI$. There exists $x \neq 0 \in V$ such that $T'(x) = 0 \Rightarrow$ T' is singular $\Rightarrow \det(T') = 0$.

Eigen Value and Eigen Vector of Matrix:
Let A be an n×n matrix. Consider the equation
$Ax = \lambda x$
where $\lambda$ is a scalar and x is an n×1 vector. The null vector is a trivial solution of this equation. If the equation has a solution for a $\lambda$ and a non-null x, then $\lambda$ is called an eigenvalue (or characteristic root, or latent root) of A, and the non-null x satisfying the equation for that particular $\lambda$ is called an eigenvector (or characteristic vector, or latent vector) corresponding to that eigenvalue $\lambda$.

$Ax = \lambda x \Rightarrow Ax = \lambda I x \Rightarrow (A - \lambda I)x = 0$ is a homogeneous linear equation; it has a non-null solution $\iff A - \lambda I$ is singular $\iff |A - \lambda I| = 0$.

Page 22

Eigen Value and Eigen Vector

Properties:
i. The eigenvalues of a diagonal matrix are its diagonal elements.
ii. The eigenvalues of a triangular (upper or lower) matrix are its diagonal elements.
iii. The eigenvalues of a scalar matrix with diagonal elements c each are c repeated n times.
iv. The eigenvalues of an identity matrix are 1 repeated n times.
v. $|A| = \lambda_1 \lambda_2 \lambda_3 \cdots \lambda_n$.
vi. Matrix A is singular iff at least one of its eigenvalues is zero.
vii. $tr(A) = a_{11} + a_{22} + \dots + a_{nn} = \lambda_1 + \lambda_2 + \lambda_3 + \dots + \lambda_n$.
viii. A and $A^T$ have the same eigenvalues.
ix. Eigenvectors corresponding to different eigenvalues are LI.
x. If $x_1$, $x_2$ are two eigenvectors corresponding to the same eigenvalue, then $c_1 x_1 + c_2 x_2$ is also an eigenvector for the same eigenvalue.
xi. The eigenvalues of a real symmetric matrix are real.
xii. Eigenvectors corresponding to different eigenvalues of a real symmetric matrix are orthogonal.
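A minimal numeric check (NumPy assumed, not from the slides) of properties v and vii on a random matrix: the determinant equals the product of the eigenvalues and the trace equals their sum.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
lam = np.linalg.eigvals(A)         # complex eigenvalues come in conjugate pairs

print(np.isclose(np.linalg.det(A), np.prod(lam).real))  # True: |A| = prod(lam)
print(np.isclose(np.trace(A), np.sum(lam).real))         # True: tr(A) = sum(lam)
```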

Page 23

Similarity of Matrix

Def.: Let A and B be square matrices of order n. Then B is said to be similar to A if there exists a non-singular matrix P such that $B = P^{-1}AP$.

Diagonalizable Matrix:
A matrix A is said to be diagonalizable if it is similar to a diagonal matrix. Thus A is diagonalizable if there exists an invertible matrix P such that $P^{-1}AP = D$, where D is a diagonal matrix.

Note:
1. Similarity is an equivalence relation.
2. If matrix A is similar to a diagonal matrix D, then the diagonal elements of D are the eigenvalues of A.

i. An n×n matrix is diagonalizable iff it possesses n LI eigenvectors.
ii. If the eigenvalues of an n×n matrix are all distinct, then it is always similar to a diagonal matrix.
iii. Two n×n matrices with the same set of n distinct eigenvalues are similar.
iv. $P^{-1}AP = D \iff A = PDP^{-1}$ is the EVD (eigenvalue decomposition).
v. Spectral Decomposition for Symmetric Matrix: a square symmetric matrix A can be expressed in terms of its eigenvalue-eigenvector pairs $(\lambda_i, e_i)$ as
$A = \lambda_1 e_1 e_1^T + \lambda_2 e_2 e_2^T + \lambda_3 e_3 e_3^T + \dots + \lambda_n e_n e_n^T$
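A minimal sketch (NumPy assumed, not from the slides) of the spectral decomposition: rebuild a symmetric matrix from its eigenpairs via $\sum_i \lambda_i e_i e_i^T$.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])     # symmetric
lam, E = np.linalg.eigh(A)                 # columns of E are orthonormal eigenvectors

A_rebuilt = sum(l * np.outer(e, e) for l, e in zip(lam, E.T))
print(np.allclose(A, A_rebuilt))           # True
```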

Page 24

Singular Value Decomposition

A singular value and corresponding singular vectors of a rectangular matrix A are, respectively, a scalar $\sigma$ and a pair of vectors u and v that satisfy
$Av = \sigma u$ and $A^T u = \sigma v$.

With the singular values on the diagonal of a diagonal matrix $\Sigma$ and the corresponding singular vectors forming the columns of two orthogonal matrices U and V, we have
$AV = U\Sigma$ and $A^T U = V\Sigma$.

Since U and V are orthogonal, this becomes the singular value decomposition:
$A = U \Sigma V^T$

Def.: Every m×n matrix A can be written as $A = U \Sigma V^T$, where U (m×m) and V (n×n) are orthogonal matrices and $\Sigma$ is an m×n diagonal matrix.

Note:
1. The diagonal elements of $\Sigma$ are termed the singular values of A.
2. Using the SVD directly we get $A^T A = (U\Sigma V^T)^T(U\Sigma V^T) = V\Sigma^2 V^T$ and $A A^T = U\Sigma^2 U^T$. Columns of U and V represent the eigenvectors of $AA^T$ and $A^T A$ respectively, and the diagonal entries of $\Sigma^2$ represent their set of eigenvalues.
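A minimal sketch (NumPy assumed, not from the slides): the SVD of a rectangular matrix, with a check that the columns of U are eigenvectors of $AA^T$ with eigenvalues $\sigma^2$.

```python
import numpy as np

A = np.array([[1.0, 0, 1], [0, 1, 1]])     # 2 x 3
U, s, Vt = np.linalg.svd(A)                # A = U diag(s) Vt

Sigma = np.zeros_like(A)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))      # True
print(np.allclose(A @ A.T @ U, U * s**2))  # True: columns of U are eigvecs of AA^T
```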

Page 25

Matrix Factorization

LU Factorization: LU factorization, or Gaussian elimination, expresses any square matrix A as the product of a permutation of a lower triangular matrix and an upper triangular matrix:
$A = LU$
where L is a permutation of a lower triangular matrix with ones on its diagonal and U is an upper triangular matrix. Then $|A| = |L|\,|U| = u_{11} u_{22} \cdots u_{nn}$ and $A^{-1} = U^{-1} L^{-1}$.

Cholesky Factorization: the Cholesky factorization expresses a symmetric matrix as the product of a triangular matrix and its transpose:
$A = R^T R$
where R is an upper triangular matrix. Not all symmetric matrices can be factored in this way; the matrices that have such a factorization are said to be positive definite. The Cholesky factorization allows the linear system $Ax = b$ to be replaced by $R^T R x = b$, forming triangular systems of equations that are solved easily by forward and backward substitution.

QR Factorization: the orthogonal, or QR, factorization expresses any rectangular matrix as the product of an orthogonal or unitary matrix and an upper triangular matrix:
$A = QR$
where Q is orthogonal or unitary and R is upper triangular.

Page 26

APPLICATION

Documents Ranking

Page 27

Documents Ranking

A collection consisting of the following five documents:
d1 = LSI tutorials and fast tracks.
d2 = Books on semantic analysis.
d3 = Learning latent semantic indexing.
d4 = Advances in structures and advances in indexing.
d5 = Analysis of latent structures.
is queried for latent semantic indexing (q). Rank the documents in decreasing order of relevance to the query, i.e., in decreasing order of cosine similarity.

Assume that:
1. Documents are linearized, tokenized, and their stop words removed. Stemming is not used. The surviving terms are used to construct a term-document matrix A, populated with term weights $a_{ij} = L_{ij} G_i N_j$, where:
- $L_{ij} = f_{ij}$, the frequency of term i in document j. This is the so-called FREQ model.
- $G_i = \log(D/d_i)$, where D is the collection size and $d_i$ is the number of documents containing term i. This is the so-called IDF model; IDF stands for Inverse Document Frequency.
- $N_j = 1/l$; i.e., document lengths are normalized to $1/l$. In general, l is the so-called $L_2$ norm or Frobenius length.

Hence $a_{ij} = f_{ij} \log(D/d_i)\, N_j$.

Page 28

Documents Ranking

2. Query terms are scored using FREQ; i.e., $a_{iq} = L_{iq} = f_{iq}$, where $f_{iq}$ is the frequency of term i in the query q.

Procedure:
1. Compute A and q.
2. Normalize the document vectors and the query vector: $A \to A_n$, $q \to q_n$, where the subscript n denotes a normalized vector.
3. Compute $q_n^T A_n$.

Documents in collection:
d1 = LSI tutorials and fast tracks.
d2 = Books on semantic analysis.
d3 = Learning latent semantic indexing.
d4 = Advances in structures and advances in indexing.
d5 = Analysis of latent structures.

Term-Document Matrix:

             d1  d2  d3  d4  d5
LSI           1   0   0   0   0
tutorials     1   0   0   0   0
fast          1   0   0   0   0
tracks        1   0   0   0   0
books         0   1   0   0   0
semantic      0   1   1   0   0
analysis      0   1   0   0   1
learning      0   0   1   0   0
latent        0   0   1   0   1
indexing      0   0   1   1   0
advances      0   0   0   2   0
structures    0   0   0   1   1

Page 29

Documents Ranking

Step 1: Weight Matrix (each term's global weight $\log(5/d_i)$ is the same across its row; base-10 logs):

      d1          d2          d3          d4          d5
A = [ 1·log(5/1)  0           0           0           0
      1·log(5/1)  0           0           0           0
      1·log(5/1)  0           0           0           0
      1·log(5/1)  0           0           0           0
      0           1·log(5/1)  0           0           0
      0           1·log(5/2)  1·log(5/2)  0           0
      0           1·log(5/2)  0           0           1·log(5/2)
      0           0           1·log(5/1)  0           0
      0           0           1·log(5/2)  0           1·log(5/2)
      0           0           1·log(5/2)  1·log(5/2)  0
      0           0           0           2·log(5/1)  0
      0           0           0           1·log(5/2)  1·log(5/2) ]

      d1      d2      d3      d4      d5
  = [ 0.6990  0       0       0       0
      0.6990  0       0       0       0
      0.6990  0       0       0       0
      0.6990  0       0       0       0
      0       0.6990  0       0       0
      0       0.3979  0.3979  0       0
      0       0.3979  0       0       0.3979
      0       0       0.6990  0       0
      0       0       0.3979  0       0.3979
      0       0       0.3979  0.3979  0
      0       0       0       1.3980  0
      0       0       0       0.3979  0.3979 ]

q = [ 0 0 0 0 0 1 0 0 1 1 0 0 ]^T

Page 30

Documents Ranking

Step 2: Normalization. Each document vector and the query vector is divided by its Frobenius norm ($L_2$ norm, Euclidean length):

       d1      d2      d3      d4      d5
An = [ 0.5000  0       0       0       0
       0.5000  0       0       0       0
       0.5000  0       0       0       0
       0.5000  0       0       0       0
       0       0.7790  0       0       0
       0       0.4434  0.4054  0       0
       0       0.4434  0       0       0.5774
       0       0       0.7121  0       0
       0       0       0.4054  0       0.5774
       0       0       0.4054  0.2640  0
       0       0       0       0.9277  0
       0       0       0       0.2640  0.5774 ]

qn = [ 0 0 0 0 0 0.5774 0 0 0.5774 0.5774 0 0 ]^T

qn^T = [ 0 0 0 0 0 0.5774 0 0 0.5774 0.5774 0 0 ]

Page 31

Documents Ranking

Step 3: Compute $q_n^T A_n$:

              d1   d2      d3      d4      d5
q_n^T A_n = [ 0    0.2560  0.7022  0.1524  0.3334 ]

The documents rank in decreasing order of relevance as follows: d3 > d5 > d2 > d4 > d1.
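As a cross-check, a minimal NumPy sketch (not from the slides) of steps 1-3, assuming base-10 logs as in the weight matrix above.

```python
import numpy as np

# Term-document counts from the table on page 28 (12 terms x 5 docs).
F = np.array([
    [1,0,0,0,0],[1,0,0,0,0],[1,0,0,0,0],[1,0,0,0,0],  # LSI, tutorials, fast, tracks
    [0,1,0,0,0],[0,1,1,0,0],[0,1,0,0,1],              # books, semantic, analysis
    [0,0,1,0,0],[0,0,1,0,1],[0,0,1,1,0],              # learning, latent, indexing
    [0,0,0,2,0],[0,0,0,1,1],                          # advances, structures
], dtype=float)
q = np.array([0,0,0,0,0,1,0,0,1,1,0,0], dtype=float)  # latent semantic indexing

D = F.shape[1]
d = (F > 0).sum(axis=1)                    # number of docs containing each term
A = F * np.log10(D / d)[:, None]           # a_ij = f_ij * log(D / d_i)

An = A / np.linalg.norm(A, axis=0)         # normalize document vectors
qn = q / np.linalg.norm(q)                 # normalize query

print(np.round(qn @ An, 4))                # [0. 0.256 0.7022 0.1524 0.3334]
```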

Exercises

1. Repeat the above calculations, this time including all stopwords. Explain any difference in computed results.
2. Repeat the above calculations, this time scoring global weights using probabilistic IDF (IDFP): $G_i = \log((D - d_i)/d_i)$. Explain any difference in computed results.

Page 32

APPLICATION

Latent Semantic Indexing (LSI)

Using SVD

Page 33

Latent Semantic Indexing

Use LSI to cluster terms, and to find the terms that could be used to expand or reformulate the query.

Example: the collection consists of the following documents:
d1 = Shipment of gold damaged in a fire.
d2 = Delivery of silver arrived in a silver truck.
d3 = Shipment of gold arrived in a truck.
Assume that the query is: gold silver truck.

SVD: every matrix A of dimensions m×n, m ≥ n, can be decomposed as $A = U \Sigma V^T$, where:
- U has dimension m×m, and its columns are orthogonal, i.e., $UU^T = U^T U = I_{m \times m}$;
- $\Sigma$ has dimension m×n; the only non-zero elements are on the main diagonal;
- V has dimension n×n, and its columns are orthogonal, i.e., $VV^T = V^T V = I_{n \times n}$.

Reduced (rank-p) form $A \approx U_p \Sigma_p V_p^T$:
- $U_p$ is m×p, with orthogonal columns;
- $\Sigma_p$ is p×p, and diagonal;
- $V_p$ is n×p, with orthogonal columns.

BHU Banaras Hindu University 34 DST-CIMS

Latent Semantic Indexing (Procedure)

Step1: Score term weights and construct the term – document matrix A and query matrix.

d1 d2 d3

a 1 1 1arrived 0 1 1damaged 1 0 0delivery 0 1 0fire 1 0 0gold 1 0 1in 1 1 1of 1 1 1shipment 1 0 1silver 0 2 0truck 0 1 1

1 1 10 1 11 0 00 1 01 0 01 0 11 1 11 1 11 0 10 2 00 1 1

A= q=

00000100011

Page 35

Latent Semantic Indexing (Procedure)

Step 2-1: Decompose matrix A using the SVD procedure into U, Σ and V matrices, $A = U \Sigma V^T$:

    [ 1 1 1
      0 1 1
      1 0 0
      0 1 0
      1 0 0
A =   1 0 1
      1 1 1
      1 1 1
      1 0 1
      0 2 0
      0 1 1 ]

Page 36

Latent Semantic Indexing (Procedure)

Step 2-2: The factors of $A = U \Sigma V^T$ (thin SVD, p = 3):

    [ -0.42012  -0.0748    -0.04597
      -0.29949   0.200092   0.407828
      -0.12063  -0.27489   -0.4538
      -0.15756   0.304648  -0.20065
      -0.12063  -0.27489   -0.4538
U =   -0.26256  -0.37945    0.154674
      -0.42012  -0.0748    -0.04597
      -0.42012  -0.0748    -0.04597
      -0.26256  -0.37945    0.154674
      -0.31512   0.609295  -0.40129
      -0.29949   0.200092   0.407828 ]

Σ = [ 4.098872  0         0
      0         2.361571  0
      0         0         1.273669 ]

V = [ -0.49447  -0.64918  -0.57799
      -0.64582   0.719447 -0.25556
      -0.58174  -0.24691   0.774995 ]

Step 3: Rank-2 approximation: keep the first two columns of U and V and the leading 2×2 block of Σ:

     [ -0.42012  -0.0748
       -0.29949   0.200092
       -0.12063  -0.27489
       -0.15756   0.304648
       -0.12063  -0.27489
Uk =   -0.26256  -0.37945
       -0.42012  -0.0748
       -0.42012  -0.0748
       -0.26256  -0.37945
       -0.31512   0.609295
       -0.29949   0.200092 ]

Σk = [ 4.098872  0
       0         2.361571 ]

Vk = [ -0.49447  -0.64918
       -0.64582   0.719447
       -0.58174  -0.24691 ]

Page 37

Latent Semantic Indexing (Procedure)

Step 4: Find the new term vector coordinates in this reduced 2-dimensional space.

Rows of U hold the eigenvector values; these are the coordinates of the individual term vectors. Thus, from the reduced matrix Uk:

 1  a         -0.42012  -0.0748
 2  arrived   -0.29949   0.200092
 3  damaged   -0.12063  -0.27489
 4  delivery  -0.15756   0.304648
 5  fire      -0.12063  -0.27489
 6  gold      -0.26256  -0.37945
 7  in        -0.42012  -0.0748
 8  of        -0.42012  -0.0748
 9  shipment  -0.26256  -0.37945
10  silver    -0.31512   0.609295
11  truck     -0.29949   0.200092

Step 5: Find the new query vector coordinates in the reduced 2-dimensional space, using $q_k = q^T U_k \Sigma_k^{-1}$:

$q_k$ = [ 0 0 0 0 0 1 0 0 0 1 1 ] · Uk · diag(1/4.0989, 1/2.3616) = [ -0.2140  0.1821 ]
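A minimal sketch (NumPy assumed, not from the slides) of the LSI procedure above: SVD of the term-document matrix, rank-2 truncation, and projection of the query "gold silver truck" via $q_k = q^T U_k \Sigma_k^{-1}$.

```python
import numpy as np

A = np.array([[1,1,1],[0,1,1],[1,0,0],[0,1,0],[1,0,0],[1,0,1],
              [1,1,1],[1,1,1],[1,0,1],[0,2,0],[0,1,1]], dtype=float)
q = np.array([0,0,0,0,0,1,0,0,0,1,1], dtype=float)   # gold silver truck

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, sk = U[:, :k], s[:k]                # rank-2 factors

qk = q @ Uk / sk                        # query in the reduced space
print(np.round(qk, 4))                  # ~[-0.214  0.1821], up to sign
                                        # (signs of singular vectors are arbitrary)
```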

Page 38

Latent Semantic Indexing (Procedure)

Step 6: Group terms into clusters.

Grouping is done by comparing the cosine angles between any pair of term vectors. The following clusters are obtained:
1. a, in, of
2. gold, shipment
3. damaged, fire
4. arrived, truck
5. silver
6. delivery

Some vectors are not shown in the plot since they are completely superimposed; this is the case for points 1-4. If unit vectors are used and small deviations are ignored, clusters 3 and 4 and clusters 4 and 5 can be merged.

Page 39

Latent Semantic Indexing (Procedure)

Step 7: Find terms that could be used to expand or reformulate the query.

The query is gold silver truck. Note that, in relation to the query, clusters 1, 2 and 3 are far away from the query; similarity-wise these could be viewed as belonging to a "long tail". If we insist on combining these with the query, possible expanded queries could be:

gold silver truck shipment
gold silver truck damaged
gold silver truck shipment damaged
gold silver truck damaged in a fire
shipment of gold silver truck damaged in a fire, etc.

Looking around the query, the closer clusters are 4, 5, and 6. We could use these clusters to expand or reformulate the query. For example, the following are some of the expanded queries one could test:

gold silver truck arrived
delivery gold silver truck
gold silver truck delivery
gold silver truck delivery arrived, etc.

Documents containing these terms should be more relevant to the initial query.

Page 40

APPLICATION

Latent Semantic Indexing (LSI)

Exercise

Page 41

Latent Semantic Indexing (Exercise)

The SVD was the original factorization proposed for Latent Semantic Indexing (LSI): the process of replacing a term-document matrix A with a low-rank approximation Ap which reveals implicit relationships among documents that don't necessarily share common terms. Example:

Term         D1  D2  D3  D4  D5
twain        53  65   0  30   1
clemens      10  20  40  43   0
huckleberry  30  10  25  52  70

A query on clemens will retrieve D1, D2, D3, and D4.
A query on twain will retrieve D1, D2, and D4.

For p = 2, the SVD gives:

Term         D1  D2  D3  D4  D5
twain        49  65   7  34  -5
clemens      23  22  14  30  21
huckleberry  25   9  34  57  63

Now a query on clemens will retrieve all documents.
A query on twain will retrieve D1, D2, D4, and possibly D3.

The negative entry is disturbing to some and motivates the nonnegative factorizations.
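A minimal sketch (NumPy assumed, not from the slides) of the rank-2 approximation above via the truncated SVD.

```python
import numpy as np

A = np.array([[53, 65, 0, 30, 1],
              [10, 20, 40, 43, 0],
              [30, 10, 25, 52, 70]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
p = 2
Ap = U[:, :p] * s[:p] @ Vt[:p, :]   # rank-2 approximation A_p
print(np.round(Ap))                 # close to the p = 2 table above
```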

Page 42

References

1. Linear Algebra I, Module 1: Vectors and Matrices, A.M. Mathai, Centre for Mathematical Sciences (CMS), Pala.
2. Linear Algebra II, Module 2: Determinants and Eigenvalues, A.M. Mathai, Centre for Mathematical Sciences (CMS), Pala.
3. Introduction to Linear Algebra, G. Strang, Wellesley-Cambridge Press, 1993.
4. Matrix Computations, G. Golub and C. Van Loan, Johns Hopkins University Press, 1989.
5. Linear Algebra, A.R. Vasishtha and J.N. Sharma, Krishna Prakashan.
6. Matrices, A.R. Vasishtha and J.N. Sharma, Krishna Prakashan.
7. Linear Algebra, Ramji Lal, Sail Publication, Allahabad.
8. Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze, Cambridge University Press.