-
C. Faloutsos 15-826
1
15-826: Multimedia Databases and Data Mining
Lecture #16: SVD - part I (definitions)
C. Faloutsos
15-826 Copyright (c) 2019 C. Faloutsos 2
Must-read Material
• Numerical Recipes in C ch. 2.6;
• MM Textbook Appendix D
http://www.cs.cmu.edu/~christos/courses/826.S08/readinglist.html
Outline
Goal: ‘Find similar / interesting things’
• Intro to DB
• Indexing - similarity search
• Data Mining
Indexing - Detailed outline
• primary key indexing
• secondary key / multi-key indexing
• spatial access methods
• fractals
• text
• Singular Value Decomposition (SVD)
• multimedia
• ...
SVD - Detailed outline
• Motivation
• Definition - properties
• Interpretation
• Complexity
• Case studies
• Additional properties
Problem
• How to find ‘concepts’ in a set of docs?
  – (~ clusters)
Conclusion
• How to find ‘concepts’ in a set of docs?
  – (~ clusters)
• SVD (= LSI) on the document-term matrix
SVD - Motivation
• problem #1: text - LSI: find ‘concepts’
• problem #2: compression / dim. reduction
SVD - Motivation
• Customer-product, for recommendation system:

               bread  lettuce  tomatoes  beef  chicken
vegetarians      1       1        1       0       0
                 2       2        2       0       0
                 1       1        1       0       0
                 5       5        5       0       0
meat eaters      0       0        0       2       2
                 0       0        0       3       3
                 0       0        0       1       1
SVD - Motivation
• problem #2: compress / reduce dimensionality
Problem - specs
• ~10**6 rows; ~10**3 columns; no updates;
• random access to any cell(s); small error: OK
SVD - Motivation
SVD - Detailed outline
• Motivation
• Definition - properties
• Interpretation
• Complexity
• Case studies
• Additional properties
SVD - Definition
(reminder: matrix multiplication)
$$
\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}
\times
\begin{bmatrix} 1 \\ -1 \end{bmatrix}
=
\begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix}
$$
[3 x 2] x [2 x 1] = [3 x 1]
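The matrix-multiplication reminder can be replayed in a few lines. This is a minimal sketch assuming NumPy is available; the variable names `A`, `x` and `y` are illustrative and not part of the slides.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])        # 3 x 2
x = np.array([[1],
              [-1]])          # 2 x 1

y = A @ x                     # (3 x 2)(2 x 1) -> 3 x 1
print(y.shape)                # (3, 1)
print(y.ravel())              # [-1 -1 -1]
```

The inner dimensions (2 and 2) must agree; the result takes the outer dimensions (3 and 1).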
SVD - Definition
• A = U L V^T
A[n x m] = U[n x r] L[r x r] (V[m x r])^T
• A: n x m matrix (e.g., n documents, m terms)
• U: n x r matrix (n documents, r concepts)
• L: r x r diagonal matrix (strength of each ‘concept’) (r: rank of the matrix)
• V: m x r matrix (m terms, r concepts)
SVD - Properties
THEOREM [Press+92]: it is always possible to decompose matrix A into A = U L V^T, where
• U, L, V: unique (*)
• U, V: column orthonormal (i.e., columns are unit vectors, orthogonal to each other)
  – U^T U = I; V^T V = I (I: identity matrix)
• L: singular values are positive, and sorted in decreasing order
SVD - Example
• A = U L V^T - example (columns of A: the terms data, inf., retrieval, brain, lung; rows 1-4: CS documents; rows 5-7: MD documents):
$$
\begin{bmatrix} 1&1&1&0&0\\ 2&2&2&0&0\\ 1&1&1&0&0\\ 5&5&5&0&0\\ 0&0&0&2&2\\ 0&0&0&3&3\\ 0&0&0&1&1 \end{bmatrix}
=
\begin{bmatrix} 0.18&0\\ 0.36&0\\ 0.18&0\\ 0.90&0\\ 0&0.53\\ 0&0.80\\ 0&0.27 \end{bmatrix}
\times
\begin{bmatrix} 9.64&0\\ 0&5.29 \end{bmatrix}
\times
\begin{bmatrix} 0.58&0.58&0.58&0&0\\ 0&0&0&0.71&0.71 \end{bmatrix}
$$
• U (first factor): doc-to-concept similarity matrix; column 1 is the CS-concept, column 2 the MD-concept
• L (second factor): 9.64 is the ‘strength’ of the CS-concept
• V^T (third factor): term-to-concept similarity matrix; row 1 is the CS-concept
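The numbers in the example can be reproduced with any linear algebra package. Below is a minimal sketch assuming NumPy; `np.linalg.svd` returns singular vectors only up to sign, so the printout uses absolute values, and the rank threshold `1e-10` is an arbitrary choice for this toy matrix.

```python
import numpy as np

# Document-term matrix from the slide (rows: 4 CS docs, 3 MD docs;
# columns: data, inf., retrieval, brain, lung).
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

# full_matrices=False gives the "economy" SVD: U is 7 x 5, Vt is 5 x 5.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the r non-zero singular values (r = rank of A).
r = int(np.sum(s > 1e-10))
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]

print(r)                         # 2
print(np.round(s, 2))            # [9.64 5.29]
print(np.round(np.abs(U), 2))    # doc-to-concept (signs dropped)
print(np.round(np.abs(Vt), 2))   # term-to-concept (signs dropped)
```

The singular values come back already sorted in decreasing order, matching the theorem's last bullet.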
SVD - Detailed outline
• Motivation
• Definition - properties
• Interpretation
• Complexity
• Case studies
• Additional properties
SVD - Interpretation #1
‘documents’, ‘terms’ and ‘concepts’:
• U: document-to-concept similarity matrix
• V: term-to-concept sim. matrix
• L: its diagonal elements: ‘strength’ of each concept
SVD – Interpretation #1
‘documents’, ‘terms’ and ‘concepts’:
Q: if A is the document-to-term matrix, what is A^T A?
A: term-to-term ([m x m]) similarity matrix
Q: A A^T ?
A: document-to-document ([n x n]) similarity matrix
SVD properties
• The columns of V are the eigenvectors of the covariance matrix A^T A (strictly a covariance matrix only when the columns of A are zero-mean)
• The columns of U are the eigenvectors of the Gram (inner-product) matrix A A^T
Further reading:
1. Ian T. Jolliffe, Principal Component Analysis (2nd ed), Springer, 2002.
2. Gilbert Strang, Linear Algebra and Its Applications (4th ed), Brooks Cole, 2005.
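The two eigenvector properties follow from the definition A = U L V^T together with column-orthonormality (U^T U = I, V^T V = I) and the fact that L is diagonal. A short derivation, using only the symbols defined in the slides:

```latex
\begin{aligned}
A^T A &= (U L V^T)^T (U L V^T) = V L \, (U^T U) \, L V^T = V L^2 V^T
  &&\Rightarrow\quad (A^T A)\, V = V L^2 \\
A A^T &= (U L V^T)(U L V^T)^T = U L \, (V^T V) \, L U^T = U L^2 U^T
  &&\Rightarrow\quad (A A^T)\, U = U L^2
\end{aligned}
```

So each column of V (resp. U) is an eigenvector of A^T A (resp. A A^T), and the corresponding eigenvalue is the squared singular value.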
SVD - Interpretation #2
• best axis to project on: (‘best’ = min sum of squares of projection errors)
• minimum RMS error
• SVD: gives best axis to project on: v1, the first singular vector
SVD - Interpretation #2
• A = U L V^T - example:
$$
\begin{bmatrix} 1&1&1&0&0\\ 2&2&2&0&0\\ 1&1&1&0&0\\ 5&5&5&0&0\\ 0&0&0&2&2\\ 0&0&0&3&3\\ 0&0&0&1&1 \end{bmatrix}
=
\begin{bmatrix} 0.18&0\\ 0.36&0\\ 0.18&0\\ 0.90&0\\ 0&0.53\\ 0&0.80\\ 0&0.27 \end{bmatrix}
\times
\begin{bmatrix} 9.64&0\\ 0&5.29 \end{bmatrix}
\times
\begin{bmatrix} 0.58&0.58&0.58&0&0\\ 0&0&0&0.71&0.71 \end{bmatrix}
$$
• v1: the first row of V^T, i.e., (0.58, 0.58, 0.58, 0, 0)
• 9.64: variance (‘spread’) on the v1 axis (more precisely, 9.64^2 is the sum of squared projections onto v1)
  – U L gives the coordinates of the points in the projection axis
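The claim that U L holds the coordinates on the projection axes can be checked by projecting the rows of A onto v1, v2 directly. A minimal sketch, assuming NumPy; `coords_UL` and `coords_AV` are illustrative names.

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                  # rank of this toy matrix

coords_UL = U[:, :k] * s[:k]           # U L: column i of U scaled by l_i
coords_AV = A @ Vt[:k, :].T            # rows of A projected onto v_1..v_k

print(np.allclose(coords_UL, coords_AV))        # True
# Sum of squared coordinates on the v1 axis equals l_1 squared.
print(np.isclose(np.sum(coords_UL[:, 0] ** 2), s[0] ** 2))  # True
```

Both routes agree because A V = U L V^T V = U L.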
SVD - Interpretation #2
• More details
• Q: how exactly is dim. reduction done?
• A: set the smallest singular values to zero:
$$
\begin{bmatrix} 1&1&1&0&0\\ 2&2&2&0&0\\ 1&1&1&0&0\\ 5&5&5&0&0\\ 0&0&0&2&2\\ 0&0&0&3&3\\ 0&0&0&1&1 \end{bmatrix}
\approx
\begin{bmatrix} 0.18&0\\ 0.36&0\\ 0.18&0\\ 0.90&0\\ 0&0.53\\ 0&0.80\\ 0&0.27 \end{bmatrix}
\times
\begin{bmatrix} 9.64&0\\ 0&0 \end{bmatrix}
\times
\begin{bmatrix} 0.58&0.58&0.58&0&0\\ 0&0&0&0.71&0.71 \end{bmatrix}
$$
which is the same as dropping the second column of U and the second row of V^T:
$$
\approx
\begin{bmatrix} 0.18\\ 0.36\\ 0.18\\ 0.90\\ 0\\ 0\\ 0 \end{bmatrix}
\times
\begin{bmatrix} 9.64 \end{bmatrix}
\times
\begin{bmatrix} 0.58&0.58&0.58&0&0 \end{bmatrix}
=
\begin{bmatrix} 1&1&1&0&0\\ 2&2&2&0&0\\ 1&1&1&0&0\\ 5&5&5&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0 \end{bmatrix}
$$
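The truncation step can be sketched as follows, assuming NumPy; `A1` is an illustrative name for the rank-1 approximation.

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 1                                       # keep only the strongest concept
A1 = (U[:, :k] * s[:k]) @ Vt[:k, :]         # u1 * l1 * v1^T

print(np.round(A1, 2) + 0.0)                # CS block kept, MD block zeroed
err = np.linalg.norm(A - A1)                # Frobenius norm of the error
print(np.isclose(err, s[1]))                # True: error = dropped l_2
```

For this matrix the approximation error equals the single dropped singular value (about 5.29), which is why the smallest singular values are the ones to drop.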
SVD - Interpretation #2
Exactly equivalent: ‘spectral decomposition’ of the matrix:
$$
\begin{bmatrix} 1&1&1&0&0\\ 2&2&2&0&0\\ 1&1&1&0&0\\ 5&5&5&0&0\\ 0&0&0&2&2\\ 0&0&0&3&3\\ 0&0&0&1&1 \end{bmatrix}
=
\begin{bmatrix} 0.18&0\\ 0.36&0\\ 0.18&0\\ 0.90&0\\ 0&0.53\\ 0&0.80\\ 0&0.27 \end{bmatrix}
\times
\begin{bmatrix} 9.64&0\\ 0&5.29 \end{bmatrix}
\times
\begin{bmatrix} 0.58&0.58&0.58&0&0\\ 0&0&0&0.71&0.71 \end{bmatrix}
$$
Preliminaries: Spectral form
$$
U \times V^T = u_1 \times v_1^T + u_2 \times v_2^T
$$
SVD – Spectral form
Exactly equivalent: ‘spectral decomposition’ of the matrix:
$$
A
= \begin{bmatrix} u_1 & u_2 \end{bmatrix}
\times
\begin{bmatrix} l_1 & 0 \\ 0 & l_2 \end{bmatrix}
\times
\begin{bmatrix} v_1 & v_2 \end{bmatrix}^T
= u_1 l_1 v_1^T + u_2 l_2 v_2^T + \dots
$$
• A is n x m; each u_i is n x 1 and each v_i^T is 1 x m
• r terms
SVD – Spectral form
approximation / dim. reduction: by keeping the first few terms (Q: how many?)
$$
A = u_1 l_1 v_1^T + u_2 l_2 v_2^T + \dots
$$
assume: l_1 >= l_2 >= ...
A (heuristic - [Fukunaga]): keep 80-90% of ‘energy’ (= sum of squares of the l_i’s)
Also, if there are clear blocks, each corresponds to a ‘concept’/singular-vector-pair
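The ‘energy’ heuristic translates into a cumulative sum over squared singular values. The helper below is a hypothetical sketch (the name `rank_for_energy` and the default `fraction=0.9` are assumptions, not from the slides), assuming NumPy and singular values sorted in decreasing order.

```python
import numpy as np

def rank_for_energy(singular_values, fraction=0.9):
    """Smallest k whose first k singular values hold `fraction` of the energy."""
    energy = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(energy) / energy.sum()
    # First index whose cumulative share reaches the requested fraction.
    k = int(np.searchsorted(cumulative, fraction)) + 1
    return min(k, len(energy))

s = [9.64, 5.29]                     # singular values of the running example
print(rank_for_energy(s, 0.90))      # 2
print(rank_for_energy(s, 0.75))      # 1
```

In the running example the first concept alone carries about 77% of the energy (93 out of 121), so an 80-90% target keeps both concepts.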
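SVD – Spectral form
Arithmetic example:
$$
\begin{bmatrix} 1&2\\ 3&4\\ 5&6 \end{bmatrix}
\times
\begin{bmatrix} 10&2\\ -10&-2 \end{bmatrix}
=
\begin{bmatrix} 1\\ 3\\ 5 \end{bmatrix} \times \begin{bmatrix} 10&2 \end{bmatrix}
+
\begin{bmatrix} 2\\ 4\\ 6 \end{bmatrix} \times \begin{bmatrix} -10&-2 \end{bmatrix}
=
\begin{bmatrix} 10&2\\ 30&6\\ 50&10 \end{bmatrix}
+
\begin{bmatrix} -20&-4\\ -40&-8\\ -60&-12 \end{bmatrix}
=
\begin{bmatrix} -10&-2\\ -10&-2\\ -10&-2 \end{bmatrix}
$$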
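The arithmetic example is an instance of a general identity: a matrix product equals the sum of outer products of the columns of the left factor with the rows of the right factor. A minimal NumPy sketch, with `B` and `C` as illustrative names for the two factors:

```python
import numpy as np

B = np.array([[1, 2],
              [3, 4],
              [5, 6]])
C = np.array([[10, 2],
              [-10, -2]])

# Column i of B times row i of C, summed over i.
outer_sum = sum(np.outer(B[:, i], C[i, :]) for i in range(B.shape[1]))

print(np.array_equal(outer_sum, B @ C))   # True
print(outer_sum)
```

In the SVD case the left factor is U L and the right factor is V^T, which gives the spectral form u1 l1 v1^T + u2 l2 v2^T + ...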
SVD - Detailed outline
• Motivation
• Definition - properties
• Interpretation
  – #1: documents/terms/concepts
  – #2: dim. reduction
  – #3: picking non-zero, rectangular ‘blobs’
• Complexity
• Case studies
• Additional properties
SVD - Interpretation #3
• finds non-zero ‘blobs’ in a data matrix = ‘communities’ (bi-partite cores, here)
$$
\begin{bmatrix} 1&1&1&0&0\\ 2&2&2&0&0\\ 1&1&1&0&0\\ 5&5&5&0&0\\ 0&0&0&2&2\\ 0&0&0&3&3\\ 0&0&0&1&1 \end{bmatrix}
=
\begin{bmatrix} 0.18&0\\ 0.36&0\\ 0.18&0\\ 0.90&0\\ 0&0.53\\ 0&0.80\\ 0&0.27 \end{bmatrix}
\times
\begin{bmatrix} 9.64&0\\ 0&5.29 \end{bmatrix}
\times
\begin{bmatrix} 0.58&0.58&0.58&0&0\\ 0&0&0&0.71&0.71 \end{bmatrix}
$$
• first blob: Row 1 to Row 4, Col 1 to Col 3
• second blob: Row 5 to Row 7, Col 4 to Col 5
SVD – Interpretation #3
• Another pictorial example
• And its connection to EigenSpokes
SVD – Interpretation #3
• It finds blocks
[figure: N fans x M idols matrix ~ (‘music lovers’ x ‘singers’) + (‘sports lovers’ x ‘athletes’) + (‘citizens’ x ‘politicians’)]
A) Even if shuffled!
B) Even if ‘salt+pepper’ noise
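Point A) can be illustrated on the small document-term example from earlier in the lecture: shuffling rows and columns leaves the singular values unchanged, and the first singular vector still marks the rows of the first block. A minimal sketch, assuming NumPy; the seed and the threshold `1e-8` are arbitrary choices.

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

rng = np.random.default_rng(0)
row_perm = rng.permutation(A.shape[0])
col_perm = rng.permutation(A.shape[1])
A_shuffled = A[row_perm][:, col_perm]      # shuffled row i = original row row_perm[i]

s = np.linalg.svd(A, compute_uv=False)
U2, s2, Vt2 = np.linalg.svd(A_shuffled, full_matrices=False)

print(np.allclose(s, s2))                  # True: same singular values

# Rows with non-zero weight in the first left singular vector,
# mapped back to their original row numbers (0-based).
block1_rows = np.flatnonzero(np.abs(U2[:, 0]) > 1e-8)
print(sorted(row_perm[block1_rows].tolist()))   # [0, 1, 2, 3]
```

The block is recovered even though its rows are no longer adjacent in the shuffled matrix.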
Toy example – EigenSpokes
[figure: EigenPlots for the toy example: u0 vs. u1 (‘fans’) and v0 vs. v1 (‘idols’)]
EigenSpokes
• Eigenvectors of adjacency matrix
[figure: N x N adjacency matrix and its eigenvectors u1, ..., ui]
SVD - Drill
• Compute SVD by hand
  – On a block-structured, toy matrix
  – Exploiting the properties of SVD
SVD - Drill
• Drill: find the SVD, ‘by inspection’!
• Q: rank = ??
$$
A = \begin{bmatrix} 1&1&1&0&0\\ 1&1&1&0&0\\ 1&1&1&0&0\\ 0&0&0&1&1\\ 0&0&0&1&1 \end{bmatrix}
= U \times L \times V^T
$$
• A: rank = 2 (2 linearly independent rows/cols), so L is 2 x 2
• first guess, one column of U and one row of V^T per block: orthogonal??
$$
U \overset{?}{=} \begin{bmatrix} 1&0\\ 1&0\\ 1&0\\ 0&1\\ 0&1 \end{bmatrix},
\qquad
V^T \overset{?}{=} \begin{bmatrix} 1&1&1&0&0\\ 0&0&0&1&1 \end{bmatrix}
$$
• column vectors: are orthogonal - but not unit vectors; after normalizing:
$$
U = \begin{bmatrix} 1/\sqrt{3}&0\\ 1/\sqrt{3}&0\\ 1/\sqrt{3}&0\\ 0&1/\sqrt{2}\\ 0&1/\sqrt{2} \end{bmatrix},
\qquad
V^T = \begin{bmatrix} 1/\sqrt{3}&1/\sqrt{3}&1/\sqrt{3}&0&0\\ 0&0&0&1/\sqrt{2}&1/\sqrt{2} \end{bmatrix}
$$
• and the singular values are:
$$
L = \begin{bmatrix} 3&0\\ 0&2 \end{bmatrix}
$$
• Q: How to check we are correct?
SVD - Drill
• A: SVD properties:
  – matrix product should give back matrix A
  – matrix U should be column-orthonormal, i.e., columns should be unit vectors, orthogonal to each other
  – ditto for matrix V
  – matrix L should be diagonal, with positive values
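The four checks listed above can be run mechanically on the hand-derived factors. A minimal sketch, assuming NumPy; the dictionary `checks` is just an illustrative way to label each property.

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0]] * 3 + [[0, 0, 0, 1, 1]] * 2, dtype=float)

a, b = 1 / np.sqrt(3), 1 / np.sqrt(2)
U = np.array([[a, 0], [a, 0], [a, 0], [0, b], [0, b]])
L = np.diag([3.0, 2.0])
Vt = np.array([[a, a, a, 0, 0],
               [0, 0, 0, b, b]])

d = np.diag(L)
checks = {
    "product gives back A": np.allclose(U @ L @ Vt, A),
    "U column-orthonormal": np.allclose(U.T @ U, np.eye(2)),
    "V column-orthonormal": np.allclose(Vt @ Vt.T, np.eye(2)),
    "L diagonal, positive, decreasing": bool(
        np.allclose(L, np.diag(d)) and np.all(d > 0) and np.all(np.diff(d) <= 0)
    ),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

All four checks should print `True` for the factors found by inspection.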
SVD - Detailed outline
• Motivation
• Definition - properties
• Interpretation
• Complexity
• Case studies
• Additional properties
SVD - Complexity
• O(n * m * m) or O(n * n * m) (whichever is less)
• less work, if we just want singular values
• or if we want first k singular vectors
• or if the matrix is sparse [Berry]
• Implemented: in any linear algebra package (LINPACK, matlab, Splus/R, mathematica ...)
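To see why asking for only the first singular vector is cheaper than the full decomposition, here is a sketch of plain power iteration on A^T A, assuming NumPy. It is used only as the simplest illustration and is not claimed to be the method used by the packages above or by [Berry]; the function name `top_singular_triplet`, the iteration count and the seed are assumptions.

```python
import numpy as np

def top_singular_triplet(A, iters=200, seed=0):
    """Approximate (u1, l1, v1) by power iteration on A^T A."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = A.T @ (A @ v)          # two matrix-vector products; A^T A is never formed
        v /= np.linalg.norm(v)
    Av = A @ v
    sigma = np.linalg.norm(Av)     # l1 = |A v1|
    return Av / sigma, sigma, v

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

u1, l1, v1 = top_singular_triplet(A)
print(round(float(l1), 2))         # 9.64
```

Each iteration costs two matrix-vector products, i.e., O(n * m) for a dense matrix and proportional to the number of non-zeros for a sparse one.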
SVD - conclusions so far
• SVD: A = U L V^T: unique (*)
• U: document-to-concept similarities
• V: term-to-concept similarities
• L: strength of each concept
• dim. reduction: keep the first few strongest singular values (80-90% of ‘energy’)
  – SVD: picks up linear correlations
• SVD: picks up non-zero ‘blobs’
Conclusion
• How to find ‘concepts’ in a set of docs?
  – (~ clusters)
• SVD (= LSI) on the document-term matrix
References
• Berry, Michael: http://www.cs.utk.edu/~lsi/
• Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition, Academic Press.
• Press, W. H., S. A. Teukolsky, et al. (1992) (3rd ed.: 2007): Numerical Recipes in C, Cambridge University Press.