10-603/15-826a: multimedia databases and data mining svd - part i (definitions) c. faloutsos

60
10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Post on 21-Dec-2015

219 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

10-603/15-826A:Multimedia Databases and Data

Mining

SVD - part I (definitions)

C. Faloutsos

Page 2: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 2

Outline

Goal: ‘Find similar / interesting things’

• Intro to DB

• Indexing - similarity search

• Data Mining

Page 3: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 3

Indexing - Detailed outline• primary key indexing• secondary key / multi-key indexing• spatial access methods• fractals• text• Singular Value Decomposition (SVD)• multimedia• ...

Page 4: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 4

SVD - Detailed outline• Motivation• Definition - properties• Interpretation• Complexity• Case studies• Additional properties

Page 5: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 5

SVD - Motivation• problem #1: text - LSI: find ‘concepts’• problem #2: compression / dim. reduction

Page 6: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 6

SVD - Motivation• problem #1: text - LSI: find ‘concepts’

Page 7: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 7

SVD - Motivation• problem #2: compress / reduce

dimensionality

Page 8: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 8

Problem - specs

• ~10**6 rows; ~10**3 columns; no updates;

• random access to any cell(s) ; small error: OK

Page 9: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 9

SVD - Motivation

Page 10: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 10

SVD - Motivation

Page 11: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 11

SVD - Detailed outline• Motivation• Definition - properties• Interpretation• Complexity• Case studies• Additional properties

Page 12: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 12

SVD - Definition(reminder: matrix multiplication

1 2

3 45 6

x 1

-1

3 x 2 2 x 1

=

Page 13: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 13

SVD - Definition(reminder: matrix multiplication

1 2

3 45 6

x 1

-1

3 x 2 2 x 1

=

3 x 1

Page 14: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 14

SVD - Definition(reminder: matrix multiplication

1 2

3 45 6

x 1

-1

3 x 2 2 x 1

=

-1

3 x 1

Page 15: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 15

SVD - Definition(reminder: matrix multiplication

1 2

3 45 6

x 1

-1

3 x 2 2 x 1

=

-1

-1

3 x 1

Page 16: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 16

SVD - Definition(reminder: matrix multiplication

1 2

3 45 6

x 1

-1=

-1

-1-1

Page 17: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 17

SVD - Definition

A[n x m] = U[n x r] r x r] (V[m x r])T

• A: n x m matrix (eg., n documents, m terms)

• U: n x r matrix (n documents, r concepts)• : r x r diagonal matrix (strength of each

‘concept’) (r : rank of the matrix)• V: m x r matrix (m terms, r concepts)

Page 18: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 18

SVD - Definition• A = U VT - example:

Page 19: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 19

SVD - PropertiesTHEOREM [Press+92]: always possible to

decompose matrix A into A = U VT , where• U, V: unique (*)• U, V: column orthonormal (ie., columns are

unit vectors, orthogonal to each other)– UT U = I; VT V = I (I: identity matrix)

• : eigenvalues are positive, and sorted in decreasing order

Page 20: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 20

SVD - Example• A = U VT - example:

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 2 2

0 0 0 3 30 0 0 1 1

datainf.

retrieval

brain lung

0.18 0

0.36 0

0.18 0

0.90 0

0 0.53

0 0.800 0.27

=CS

MD

9.64 0

0 5.29x

0.58 0.58 0.58 0 0

0 0 0 0.71 0.71

x

Page 21: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 21

SVD - Example• A = U VT - example:

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 2 2

0 0 0 3 30 0 0 1 1

datainf.

retrieval

brain lung

0.18 0

0.36 0

0.18 0

0.90 0

0 0.53

0 0.800 0.27

=CS

MD

9.64 0

0 5.29x

0.58 0.58 0.58 0 0

0 0 0 0.71 0.71

x

CS-conceptMD-concept

Page 22: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 22

SVD - Example• A = U VT - example:

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 2 2

0 0 0 3 30 0 0 1 1

datainf.

retrieval

brain lung

0.18 0

0.36 0

0.18 0

0.90 0

0 0.53

0 0.800 0.27

=CS

MD

9.64 0

0 5.29x

0.58 0.58 0.58 0 0

0 0 0 0.71 0.71

x

CS-conceptMD-concept

doc-to-concept similarity matrix

Page 23: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 23

SVD - Example• A = U VT - example:

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 2 2

0 0 0 3 30 0 0 1 1

datainf.

retrieval

brain lung

0.18 0

0.36 0

0.18 0

0.90 0

0 0.53

0 0.800 0.27

=CS

MD

9.64 0

0 5.29x

0.58 0.58 0.58 0 0

0 0 0 0.71 0.71

x

‘strength’ of CS-concept

Page 24: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 24

SVD - Example• A = U VT - example:

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 2 2

0 0 0 3 30 0 0 1 1

datainf.

retrieval

brain lung

0.18 0

0.36 0

0.18 0

0.90 0

0 0.53

0 0.800 0.27

=CS

MD

9.64 0

0 5.29x

0.58 0.58 0.58 0 0

0 0 0 0.71 0.71

x

term-to-conceptsimilarity matrix

CS-concept

Page 25: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 25

SVD - Example• A = U VT - example:

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 2 2

0 0 0 3 30 0 0 1 1

datainf.

retrieval

brain lung

0.18 0

0.36 0

0.18 0

0.90 0

0 0.53

0 0.800 0.27

=CS

MD

9.64 0

0 5.29x

0.58 0.58 0.58 0 0

0 0 0 0.71 0.71

x

term-to-conceptsimilarity matrix

CS-concept

Page 26: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 26

SVD - Detailed outline• Motivation• Definition - properties• Interpretation• Complexity• Case studies• Additional properties

Page 27: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 27

SVD - Interpretation #1‘documents’, ‘terms’ and ‘concepts’:• U: document-to-concept similarity matrix• V: term-to-concept sim. matrix• : its diagonal elements: ‘strength’ of each

concept

Page 28: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 28

SVD - Interpretation #2• best axis to project on: (‘best’ = min sum of

squares of projection errors)

Page 29: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 29

SVD - Motivation

Page 30: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 30

SVD - interpretation #2

• minimum RMS error

SVD: givesbest axis to project

v1

Page 31: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 31

SVD - Interpretation #2

Page 32: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 32

SVD - Interpretation #2• A = U VT - example:

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 2 2

0 0 0 3 30 0 0 1 1

0.18 0

0.36 0

0.18 0

0.90 0

0 0.53

0 0.800 0.27

=9.64 0

0 5.29x

0.58 0.58 0.58 0 0

0 0 0 0.71 0.71

x

v1

Page 33: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 33

SVD - Interpretation #2• A = U VT - example:

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 2 2

0 0 0 3 30 0 0 1 1

0.18 0

0.36 0

0.18 0

0.90 0

0 0.53

0 0.800 0.27

=9.64 0

0 5.29x

0.58 0.58 0.58 0 0

0 0 0 0.71 0.71

x

variance (‘spread’) on the v1 axis

Page 34: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 34

SVD - Interpretation #2• A = U VT - example:

– U gives the coordinates of the points in the projection axis

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 2 2

0 0 0 3 30 0 0 1 1

0.18 0

0.36 0

0.18 0

0.90 0

0 0.53

0 0.800 0.27

=9.64 0

0 5.29x

0.58 0.58 0.58 0 0

0 0 0 0.71 0.71

x

Page 35: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 35

SVD - Interpretation #2• More details• Q: how exactly is dim. reduction done?

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 2 2

0 0 0 3 30 0 0 1 1

0.18 0

0.36 0

0.18 0

0.90 0

0 0.53

0 0.800 0.27

=9.64 0

0 5.29x

0.58 0.58 0.58 0 0

0 0 0 0.71 0.71

x

Page 36: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 36

SVD - Interpretation #2• More details• Q: how exactly is dim. reduction done?• A: set the smallest eigenvalues to zero:

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 2 2

0 0 0 3 30 0 0 1 1

0.18 0

0.36 0

0.18 0

0.90 0

0 0.53

0 0.800 0.27

=9.64 0

0 5.29x

0.58 0.58 0.58 0 0

0 0 0 0.71 0.71

x

Page 37: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 37

SVD - Interpretation #2

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 2 2

0 0 0 3 30 0 0 1 1

0.18 0

0.36 0

0.18 0

0.90 0

0 0.53

0 0.800 0.27

~9.64 0

0 0x

0.58 0.58 0.58 0 0

0 0 0 0.71 0.71

x

Page 38: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 38

SVD - Interpretation #2

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 2 2

0 0 0 3 30 0 0 1 1

0.18 0

0.36 0

0.18 0

0.90 0

0 0.53

0 0.800 0.27

~9.64 0

0 0x

0.58 0.58 0.58 0 0

0 0 0 0.71 0.71

x

Page 39: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 39

SVD - Interpretation #2

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 2 2

0 0 0 3 30 0 0 1 1

0.18

0.36

0.18

0.90

0

00

~9.64

x

0.58 0.58 0.58 0 0

x

Page 40: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 40

SVD - Interpretation #2

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 2 2

0 0 0 3 30 0 0 1 1

~

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 0 0

0 0 0 0 00 0 0 0 0

Page 41: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 41

SVD - Interpretation #2Exactly equivalent:‘spectral decomposition’ of the matrix:

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 2 2

0 0 0 3 30 0 0 1 1

0.18 0

0.36 0

0.18 0

0.90 0

0 0.53

0 0.800 0.27

=9.64 0

0 5.29x

0.58 0.58 0.58 0 0

0 0 0 0.71 0.71

x

Page 42: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 42

SVD - Interpretation #2Exactly equivalent:‘spectral decomposition’ of the matrix:

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 2 2

0 0 0 3 30 0 0 1 1

= x xu1 u2

1

2

v1

v2

Page 43: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 43

SVD - Interpretation #2Exactly equivalent:‘spectral decomposition’ of the matrix:

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 2 2

0 0 0 3 30 0 0 1 1

= u11 vT1 u22 vT

2+ +...n

m

Page 44: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 44

SVD - Interpretation #2Exactly equivalent:‘spectral decomposition’ of the matrix:

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 2 2

0 0 0 3 30 0 0 1 1

= u11 vT1 u22 vT

2+ +...n

m

n x 1 1 x m

r terms

Page 45: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 45

SVD - Interpretation #2approximation / dim. reduction:by keeping the first few terms (Q: how many?)

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 2 2

0 0 0 3 30 0 0 1 1

= u11 vT1 u22 vT

2+ +...n

m

assume: 1 >= 2 >= ...

Page 46: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 46

SVD - Interpretation #2A (heuristic - [Fukunaga]): keep 80-90% of

‘energy’ (= sum of squares of i ’s)

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 2 2

0 0 0 3 30 0 0 1 1

= u11 vT1 u22 vT

2+ +...n

m

assume: 1 >= 2 >= ...

Page 47: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 47

SVD - Detailed outline• Motivation• Definition - properties• Interpretation

– #1: documents/terms/concepts– #2: dim. reduction– #3: picking non-zero, rectangular ‘blobs’

• Complexity• Case studies• Additional properties

Page 48: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 48

SVD - Interpretation #3• finds non-zero ‘blobs’ in a data matrix

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 2 2

0 0 0 3 30 0 0 1 1

0.18 0

0.36 0

0.18 0

0.90 0

0 0.53

0 0.800 0.27

=9.64 0

0 5.29x

0.58 0.58 0.58 0 0

0 0 0 0.71 0.71

x

Page 49: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 49

SVD - Interpretation #3• finds non-zero ‘blobs’ in a data matrix

1 1 1 0 0

2 2 2 0 0

1 1 1 0 0

5 5 5 0 0

0 0 0 2 2

0 0 0 3 30 0 0 1 1

0.18 0

0.36 0

0.18 0

0.90 0

0 0.53

0 0.800 0.27

=9.64 0

0 5.29x

0.58 0.58 0.58 0 0

0 0 0 0.71 0.71

x

Page 50: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 50

SVD - Interpretation #3• Drill: find the SVD, ‘by inspection’!• Q: rank = ??

1 1 1 0 0

1 1 1 0 0

1 1 1 0 0

0 0 0 1 1

0 0 0 1 1

= x x?? ??

??

Page 51: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 51

SVD - Interpretation #3• A: rank = 2 (2 linearly independent

rows/cols)

1 1 1 0 0

1 1 1 0 0

1 1 1 0 0

0 0 0 1 1

0 0 0 1 1

= x x??

??

?? 0

0 ??

??

??

Page 52: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 52

SVD - Interpretation #3• A: rank = 2 (2 linearly independent

rows/cols)

1 1 1 0 0

1 1 1 0 0

1 1 1 0 0

0 0 0 1 1

0 0 0 1 1

= x x?? 0

0 ??

1 0

1 0

1 0

0 1

0 11 1 1 0 0

0 0 0 1 1

orthogonal??

Page 53: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 53

SVD - Interpretation #3• column vectors: are orthogonal - but not

unit vectors:

1 1 1 0 0

1 1 1 0 0

1 1 1 0 0

0 0 0 1 1

0 0 0 1 1

= x x?? 0

0 ??

1/sqrt(3) 0

1/sqrt(3) 0

1/sqrt(3) 0

0 1/sqrt(2)

0 1/sqrt(2)

1/sqrt(3) 1/sqrt(3) 1/sqrt(3) 0 0

0 0 0 1/sqrt(2) 1/sqrt(2)

Page 54: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 54

SVD - Interpretation #3• and the eigenvalues are:

1 1 1 0 0

1 1 1 0 0

1 1 1 0 0

0 0 0 1 1

0 0 0 1 1

= x x3 0

0 2

1/sqrt(3) 0

1/sqrt(3) 0

1/sqrt(3) 0

0 1/sqrt(2)

0 1/sqrt(2)

1/sqrt(3) 1/sqrt(3) 1/sqrt(3) 0 0

0 0 0 1/sqrt(2) 1/sqrt(2)

Page 55: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 55

SVD - Interpretation #3• Q: How to check we are correct?

1 1 1 0 0

1 1 1 0 0

1 1 1 0 0

0 0 0 1 1

0 0 0 1 1

= x x3 0

0 2

1/sqrt(3) 0

1/sqrt(3) 0

1/sqrt(3) 0

0 1/sqrt(2)

0 1/sqrt(2)

1/sqrt(3) 1/sqrt(3) 1/sqrt(3) 0 0

0 0 0 1/sqrt(2) 1/sqrt(2)

Page 56: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 56

SVD - Interpretation #3• A: SVD properties:

– matrix product should give back matrix A– matrix U should be column-orthonormal, i.e.,

columns should be unit vectors, orthogonal to each other

– ditto for matrix V– matrix should be diagonal, with positive

values

Page 57: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 57

SVD - Detailed outline• Motivation• Definition - properties• Interpretation• Complexity• Case studies• Additional properties

Page 58: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 58

SVD - Complexity• O( n * m * m) or O( n * n * m) (whichever

is less)• less work, if we just want eigenvalues• or if we want first k eigenvectors• or if the matrix is sparse [Berry]• Implemented: in any linear algebra package

(LINPACK, matlab, Splus, mathematica ...)

Page 59: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 59

SVD - conclusions so far• SVD: A= U VT : unique (*)• U: document-to-concept similarities• V: term-to-concept similarities• : strength of each concept• dim. reduction: keep the first few strongest

eigenvalues (80-90% of ‘energy’)– SVD: picks up linear correlations

• SVD: picks up non-zero ‘blobs’

Page 60: 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001) 60

References• Berry, Michael: http://www.cs.utk.edu/~lsi/

• Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition, Academic Press.

• Press, W. H., S. A. Teukolsky, et al. (1992). Numerical Recipes in C, Cambridge University Press.