250 svd dfn - carnegie mellon school of computer...

41
C. Faloutsos 15-826 1 15-826: Multimedia Databases and Data Mining Lecture #16: SVD - part I (definitions) C. Faloutsos 15-826 Copyright (c) 2019 C. Faloutsos 2 Must-read Material Numerical Recipes in C ch. 2.6; MM Textbook Appendix D

Upload: others

Post on 01-Feb-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

  • C. Faloutsos 15-826

    1

    15-826: Multimedia Databases and Data Mining

    Lecture #16: SVD - part I (definitions)C. Faloutsos

    15-826 Copyright (c) 2019 C. Faloutsos 2

    Must-read Material

    • Numerical Recipes in C ch. 2.6; • MM Textbook Appendix D

    http://www.cs.cmu.edu/~christos/courses/826.S08/readinglist.htmlhttp://www.cs.cmu.edu/~christos/courses/826.S08/readinglist.html

  • C. Faloutsos 15-826

    2

    15-826 Copyright (c) 2019 C. Faloutsos 3

    Outline

    Goal: ‘Find similar / interesting things’• Intro to DB• Indexing - similarity search• Data Mining

    15-826 Copyright (c) 2019 C. Faloutsos 4

    Indexing - Detailed outline• primary key indexing• secondary key / multi-key indexing• spatial access methods• fractals• text• Singular Value Decomposition (SVD)• multimedia• ...

  • C. Faloutsos 15-826

    3

    15-826 Copyright (c) 2019 C. Faloutsos 5

    SVD - Detailed outline• Motivation• Definition - properties• Interpretation• Complexity• Case studies• Additional properties

    15-826 Copyright (c) 2019 C. Faloutsos 6

    Problem

    • How to find ‘concepts’ in a set of doc’s?– (~ clusters)

  • C. Faloutsos 15-826

    4

    15-826 Copyright (c) 2019 C. Faloutsos 7

    Conclusion

    • How to find ‘concepts’ in a set of doc’s?– (~ clusters)

    • SVD (= LSI) on the document-term matrix

    15-826 Copyright (c) 2019 C. Faloutsos 8

    SVD - Motivation• problem #1: text - LSI: find ‘concepts’• problem #2: compression / dim. reduction

  • C. Faloutsos 15-826

    5

    15-826 Copyright (c) 2019 C. Faloutsos 9

    SVD - Motivation• problem #1: text - LSI: find ‘concepts’

    15-826 Copyright (c) 2019 C. Faloutsos P1-10

    SVD - Motivation• Customer-product, for recommendation

    system:

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    brea

    dlet

    tuce

    beef

    vegetarians

    meat eaters

    tomato

    sch

    icken

  • C. Faloutsos 15-826

    6

    15-826 Copyright (c) 2019 C. Faloutsos 11

    SVD - Motivation• problem #2: compress / reduce

    dimensionality

    15-826 Copyright (c) 2019 C. Faloutsos 12

    Problem - specs

    • ~10**6 rows; ~10**3 columns; no updates;• random access to any cell(s) ; small error: OK

  • C. Faloutsos 15-826

    7

    15-826 Copyright (c) 2019 C. Faloutsos 13

    SVD - Motivation

    15-826 Copyright (c) 2019 C. Faloutsos 14

    SVD - Motivation

  • C. Faloutsos 15-826

    8

    15-826 Copyright (c) 2019 C. Faloutsos 15

    SVD - Detailed outline• Motivation• Definition - properties• Interpretation• Complexity• Case studies• Additional properties

    15-826 Copyright (c) 2019 C. Faloutsos 16

    SVD - Definition(reminder: matrix multiplication

    1 23 45 6

    x 1-1

    3 x 2 2 x 1

    =

  • C. Faloutsos 15-826

    9

    15-826 Copyright (c) 2019 C. Faloutsos 17

    SVD - Definition(reminder: matrix multiplication

    1 23 45 6

    x 1-1

    3 x 2 2 x 1

    =

    3 x 1

    15-826 Copyright (c) 2019 C. Faloutsos 18

    SVD - Definition(reminder: matrix multiplication

    1 23 45 6

    x 1-1

    3 x 2 2 x 1

    =-1

    3 x 1

  • C. Faloutsos 15-826

    10

    15-826 Copyright (c) 2019 C. Faloutsos 19

    SVD - Definition(reminder: matrix multiplication

    1 23 45 6

    x 1-1

    3 x 2 2 x 1

    =-1-1

    3 x 1

    15-826 Copyright (c) 2019 C. Faloutsos 20

    SVD - Definition(reminder: matrix multiplication

    1 23 45 6

    x 1-1

    =-1-1-1

  • C. Faloutsos 15-826

    11

    15-826 Copyright (c) 2019 C. Faloutsos 21

    SVD - Definition• A = U L VT - example:

    15-826 Copyright (c) 2019 C. Faloutsos 22

    SVD - DefinitionA[n x m] = U[n x r] L [ r x r] (V[m x r])T

    • A: n x m matrix (eg., n documents, m terms)

    • U: n x r matrix (n documents, r concepts)• L: r x r diagonal matrix (strength of each ‘concept’) (r : rank of the matrix)

    • V: m x r matrix (m terms, r concepts)

  • C. Faloutsos 15-826

    12

    15-826 Copyright (c) 2019 C. Faloutsos 23

    SVD - PropertiesTHEOREM [Press+92]: always possible to

    decompose matrix A into A = U L VT , where• U, L, V: unique (*)• U, V: column orthonormal (ie., columns are unit

    vectors, orthogonal to each other)– UT U = I; VT V = I (I: identity matrix)

    • L: singular are positive, and sorted in decreasing order

    15-826 Copyright (c) 2019 C. Faloutsos 24

    SVD - Example• A = U L VT - example:

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    datainf.

    retrievalbrain lung

    0.18 00.36 00.18 00.90 00 0.530 0.800 0.27

    =CS

    MD

    9.64 00 5.29x

    0.58 0.58 0.58 0 00 0 0 0.71 0.71

    x

  • C. Faloutsos 15-826

    13

    15-826 Copyright (c) 2019 C. Faloutsos 25

    SVD - Example• A = U L VT - example:

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    datainf.

    retrievalbrain lung

    0.18 00.36 00.18 00.90 00 0.530 0.800 0.27

    =CS

    MD

    9.64 00 5.29x

    0.58 0.58 0.58 0 00 0 0 0.71 0.71

    x

    CS-conceptMD-concept

    15-826 Copyright (c) 2019 C. Faloutsos 26

    SVD - Example• A = U L VT - example:

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    datainf.

    retrievalbrain lung

    0.18 00.36 00.18 00.90 00 0.530 0.800 0.27

    =CS

    MD

    9.64 00 5.29x

    0.58 0.58 0.58 0 00 0 0 0.71 0.71

    x

    CS-conceptMD-concept

    doc-to-concept similarity matrix

  • C. Faloutsos 15-826

    14

    15-826 Copyright (c) 2019 C. Faloutsos 27

    SVD - Example• A = U L VT - example:

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    datainf.

    retrievalbrain lung

    0.18 00.36 00.18 00.90 00 0.530 0.800 0.27

    =CS

    MD

    9.64 00 5.29x

    0.58 0.58 0.58 0 00 0 0 0.71 0.71

    x

    ‘strength’ of CS-concept

    15-826 Copyright (c) 2019 C. Faloutsos 28

    SVD - Example• A = U L VT - example:

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    datainf.

    retrievalbrain lung

    0.18 00.36 00.18 00.90 00 0.530 0.800 0.27

    =CS

    MD

    9.64 00 5.29x

    0.58 0.58 0.58 0 00 0 0 0.71 0.71

    x

    term-to-conceptsimilarity matrix

    CS-concept

  • C. Faloutsos 15-826

    15

    15-826 Copyright (c) 2019 C. Faloutsos 29

    SVD - Example• A = U L VT - example:

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    datainf.

    retrievalbrain lung

    0.18 00.36 00.18 00.90 00 0.530 0.800 0.27

    =CS

    MD

    9.64 00 5.29x

    0.58 0.58 0.58 0 00 0 0 0.71 0.71

    x

    term-to-conceptsimilarity matrix

    CS-concept

    15-826 Copyright (c) 2019 C. Faloutsos 30

    SVD - Detailed outline• Motivation• Definition - properties• Interpretation• Complexity• Case studies• Additional properties

  • C. Faloutsos 15-826

    16

    15-826 Copyright (c) 2019 C. Faloutsos 31

    SVD - Interpretation #1‘documents’, ‘terms’ and ‘concepts’:• U: document-to-concept similarity matrix• V: term-to-concept sim. matrix• L: its diagonal elements: ‘strength’ of each

    concept

    15-826 Copyright (c) 2019 C. Faloutsos 32

    SVD – Interpretation #1‘documents’, ‘terms’ and ‘concepts’:Q: if A is the document-to-term matrix, what

    is AT A?A:Q: A AT ?A:

  • C. Faloutsos 15-826

    17

    15-826 Copyright (c) 2019 C. Faloutsos 33

    SVD – Interpretation #1‘documents’, ‘terms’ and ‘concepts’:Q: if A is the document-to-term matrix, what

    is AT A?A: term-to-term ([m x m]) similarity matrixQ: A AT ?A: document-to-document ([n x n]) similarity

    matrix

    15-826 Copyright (c) 2019 C. Faloutsos P1-34Copyright: Faloutsos, Tong (2009) 2-34

    SVD properties• V are the eigenvectors of the covariance

    matrix ATA

    • U are the eigenvectors of the Gram (inner-product) matrix AAT

    Further reading:1. Ian T. Jolliffe, Principal Component Analysis (2nd ed), Springer, 2002.2. Gilbert Strang, Linear Algebra and Its Applications (4th ed), Brooks Cole, 2005.

  • C. Faloutsos 15-826

    18

    15-826 Copyright (c) 2019 C. Faloutsos 35

    SVD - Interpretation #2• best axis to project on: (‘best’ = min sum

    of squares of projection errors)

    15-826 Copyright (c) 2019 C. Faloutsos 36

    SVD - Motivation

  • C. Faloutsos 15-826

    19

    15-826 Copyright (c) 2019 C. Faloutsos 37

    SVD - interpretation #2

    • minimum RMS error

    SVD: givesbest axis to project

    v1

    first singular

    vector

    15-826 Copyright (c) 2019 C. Faloutsos 38

    SVD - Interpretation #2

  • C. Faloutsos 15-826

    20

    15-826 Copyright (c) 2019 C. Faloutsos 39

    SVD - Interpretation #2• A = U L VT - example:

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    0.18 00.36 00.18 00.90 00 0.530 0.800 0.27

    =9.64 00 5.29x

    0.58 0.58 0.58 0 00 0 0 0.71 0.71

    xv1

    15-826 Copyright (c) 2019 C. Faloutsos 40

    SVD - Interpretation #2• A = U L VT - example:

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    0.18 00.36 00.18 00.90 00 0.530 0.800 0.27

    =9.64 00 5.29x

    0.58 0.58 0.58 0 00 0 0 0.71 0.71

    x

    variance (‘spread’) on the v1 axis

  • C. Faloutsos 15-826

    21

    15-826 Copyright (c) 2019 C. Faloutsos 41

    SVD - Interpretation #2• A = U L VT - example:

    – U L gives the coordinates of the points in the projection axis

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    0.18 00.36 00.18 00.90 00 0.530 0.800 0.27

    =9.64 00 5.29x

    0.58 0.58 0.58 0 00 0 0 0.71 0.71

    x

    15-826 Copyright (c) 2019 C. Faloutsos 42

    SVD - Interpretation #2• More details• Q: how exactly is dim. reduction done?

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    0.18 00.36 00.18 00.90 00 0.530 0.800 0.27

    =9.64 00 5.29x

    0.58 0.58 0.58 0 00 0 0 0.71 0.71

    x

  • C. Faloutsos 15-826

    22

    15-826 Copyright (c) 2019 C. Faloutsos 43

    SVD - Interpretation #2• More details• Q: how exactly is dim. reduction done?• A: set the smallest singular values to zero:1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    0.18 00.36 00.18 00.90 00 0.530 0.800 0.27

    =9.64 00 5.29x

    0.58 0.58 0.58 0 00 0 0 0.71 0.71

    x

    15-826 Copyright (c) 2019 C. Faloutsos 44

    SVD - Interpretation #2

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    0.18 00.36 00.18 00.90 00 0.530 0.800 0.27

    ~9.64 00 0x

    0.58 0.58 0.58 0 00 0 0 0.71 0.71

    x

  • C. Faloutsos 15-826

    23

    15-826 Copyright (c) 2019 C. Faloutsos 45

    SVD - Interpretation #2

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    0.18 00.36 00.18 00.90 00 0.530 0.800 0.27

    ~9.64 00 0x

    0.58 0.58 0.58 0 00 0 0 0.71 0.71

    x

    15-826 Copyright (c) 2019 C. Faloutsos 46

    SVD - Interpretation #2

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    0.180.360.180.90000

    ~9.64x

    0.58 0.58 0.58 0 0

    x

  • C. Faloutsos 15-826

    24

    15-826 Copyright (c) 2019 C. Faloutsos 47

    SVD - Interpretation #2

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    ~

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 0 00 0 0 0 00 0 0 0 0

    15-826 Copyright (c) 2019 C. Faloutsos 48

    SVD - Interpretation #2Exactly equivalent:‘spectral decomposition’ of the matrix:

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    0.18 00.36 00.18 00.90 00 0.530 0.800 0.27

    =9.64 00 5.29x

    0.58 0.58 0.58 0 00 0 0 0.71 0.71

    x

  • C. Faloutsos 15-826

    25

    Preliminaries: Spectral form

    15-826 Copyright (c) 2019 C. Faloutsos 49

    × × ×+=

    𝐔 × 𝐕& 𝑢1× 𝑣1𝑇 𝑢2× 𝑣2𝑇

    15-826 Copyright (c) 2019 C. Faloutsos 50

    SVD – Spectral formExactly equivalent:‘spectral decomposition’ of the matrix:

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    = x xu1 u2l1

    l2

    v1v2

  • C. Faloutsos 15-826

    26

    15-826 Copyright (c) 2019 C. Faloutsos 51

    SVD – Spectral formExactly equivalent:‘spectral decomposition’ of the matrix:

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    = u1l1 vT1 u2l2 vT2+ +...n

    m

    15-826 Copyright (c) 2019 C. Faloutsos 52

    SVD – Spectral formExactly equivalent:‘spectral decomposition’ of the matrix:

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    = u1l1 vT1 u2l2 vT2+ +...n

    m

    n x 1 1 x m

    r terms

  • C. Faloutsos 15-826

    27

    15-826 Copyright (c) 2019 C. Faloutsos 53

    SVD – Spectral formapproximation / dim. reduction:by keeping the first few terms (Q: how many?)

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    = u1l1 vT1 u2l2 vT2+ +...n

    m

    assume: l1 >= l2 >= ...

    15-826 Copyright (c) 2019 C. Faloutsos 54

    SVD - Spectral formA (heuristic - [Fukunaga]): keep 80-90% of ‘energy’ (= sum of squares of li ’s)

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    = u1l1 vT1 u2l2 vT2+ +...n

    m

    assume: l1 >= l2 >= ...

  • C. Faloutsos 15-826

    28

    15-826 Copyright (c) 2019 C. Faloutsos 55

    SVD - Spectral formAlso, if there are clear blocks, each corresponds

    to a ‘concept’/singular-vector-pair

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    = u1l1 vT1 u2l2 vT2+ +...n

    m

    assume: l1 >= l2 >= ...

    15-826 Copyright (c) 2019 C. Faloutsos 56

    SVD – Spectral form

    × × ×+=

    𝐔 × 𝐕& 𝑢1× 𝑣1𝑇 𝑢2× 𝑣2𝑇

  • C. Faloutsos 15-826

    29

    15-826 Copyright (c) 2019 C. Faloutsos 57

    SVD – Spectral formArithmetic example:

    1 23 45 6

    × 10 2−10 −2 =135× 10 2 +

    246× −10 −2

    =−10 −2−10 −2−10 −2

    +=

    15-826 Copyright (c) 2019 C. Faloutsos 58

    SVD - Detailed outline• Motivation• Definition - properties• Interpretation

    – #1: documents/terms/concepts– #2: dim. reduction– #3: picking non-zero, rectangular ‘blobs’

    • Complexity• Case studies• Additional properties

  • C. Faloutsos 15-826

    30

    15-826 Copyright (c) 2019 C. Faloutsos 59

    SVD - Interpretation #3• finds non-zero ‘blobs’ in a data matrix

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    0.18 00.36 00.18 00.90 00 0.530 0.800 0.27

    =9.64 00 5.29x

    0.58 0.58 0.58 0 00 0 0 0.71 0.71

    x

    15-826 Copyright (c) 2019 C. Faloutsos 60

    SVD - Interpretation #3• finds non-zero ‘blobs’ in a data matrix

    1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    0.18 00.36 00.18 00.90 00 0.530 0.800 0.27

    =9.64 00 5.29x

    0.58 0.58 0.58 0 00 0 0 0.71 0.71

    x

  • C. Faloutsos 15-826

    31

    15-826 Copyright (c) 2019 C. Faloutsos P1-61

    SVD - Interpretation #3• finds non-zero ‘blobs’ in a data matrix =• ‘communities’ (bi-partite cores, here)1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1

    Row 1

    Row 4

    Col 1

    Col 3

    Col 4Row 5

    Row 7

    SVD – Interpretation #3

    • Another pictorial example• And its connection to EigenSpokes

    15-826 Copyright (c) 2019 C. Faloutsos 62

  • C. Faloutsos 15-826

    32

    SVD – Interpretation #3

    • It finds blocks

    N fans

    Midols

    ‘music lovers’‘singers’

    ‘sports lovers’‘athletes’

    ‘citizens’‘politicians’

    ~ + +

    Copyright (c) 2019 C. Faloutsos 6315-826

    SVD – Interpretation #3

    • It finds blocks

    15-826 Copyright (c) 2019 C. Faloutsos 64

    N fans

    Midols

    ‘music lovers’‘singers’

    ‘sports lovers’‘athletes’

    ‘citizens’‘politicians’

    ~ + +

    A) Even if shuffled!

  • C. Faloutsos 15-826

    33

    SVD – Interpretation #3

    • It finds blocks

    15-826

    N fans

    Midols

    ‘music lovers’‘singers’

    ‘sports lovers’‘athletes’

    ‘citizens’‘politicians’

    ~ + +

    Copyright (c) 2019 C. Faloutsos 65

    B) Even if ‘salt+pepper’ noise

    Toy example – EigenSpokes

    Copyright (c) 2019 C. Faloutsos 66

    u0

    u1 v1

    v0

    u0 u1

    v0 v1

    EigenPlots15-826

    ‘fans’

    ‘idols’

  • C. Faloutsos 15-826

    34

    Toy example – EigenSpokes

    Copyright (c) 2019 C. Faloutsos 67

    u0

    u1 v1

    v015-826

    ‘fans’

    ‘idols’

    u0 u1

    v0 v1

    CMU SCS

    EigenSpokes• Eigenvectors of adjacency matrix

    �u1 �ui68Copyright (c) 2019 C. Faloutsos15-826

    N

    N

  • C. Faloutsos 15-826

    35

    CMU SCS

    EigenSpokes• Eigenvectors of adjacency matrix

    �u1 �ui69Copyright (c) 2019 C. Faloutsos15-826

    N

    N

    SVD - Drill

    • Compute SVD by hand– On a block-structured, toy matrix– Exploiting the properties of SVD

    15-826 Copyright (c) 2019 C. Faloutsos 70

  • C. Faloutsos 15-826

    36

    15-826 Copyright (c) 2019 C. Faloutsos 71

    SVD - Drill• Drill: find the SVD, ‘by inspection’!• Q: rank = ??

    1 1 1 0 01 1 1 0 01 1 1 0 00 0 0 1 10 0 0 1 1

    = x x?? ??

    ??

    15-826 Copyright (c) 2019 C. Faloutsos 72

    SVD - Drill• A: rank = 2 (2 linearly independent

    rows/cols)

    1 1 1 0 01 1 1 0 01 1 1 0 00 0 0 1 10 0 0 1 1

    = x x??

    ??

    ?? 00 ??

    ??

    ??

  • C. Faloutsos 15-826

    37

    15-826 Copyright (c) 2019 C. Faloutsos 73

    SVD - Drill• A: rank = 2 (2 linearly independent

    rows/cols)

    1 1 1 0 01 1 1 0 01 1 1 0 00 0 0 1 10 0 0 1 1

    = x x?? 00 ??

    1 01 01 00 10 1

    1 1 1 0 00 0 0 1 1

    orthogonal??

    15-826 Copyright (c) 2019 C. Faloutsos 74

    SVD - Drill• column vectors: are orthogonal - but not

    unit vectors:

    1 1 1 0 01 1 1 0 01 1 1 0 00 0 0 1 10 0 0 1 1

    = x x?? 00 ??

    1/sqrt(3) 1/sqrt(3) 1/sqrt(3) 0 00 0 0 1/sqrt(2) 1/sqrt(2)

    1/sqrt(3) 01/sqrt(3) 01/sqrt(3) 0

    0 1/sqrt(2)0 1/sqrt(2)

  • C. Faloutsos 15-826

    38

    15-826 Copyright (c) 2019 C. Faloutsos 75

    SVD - Drill• and the singular values are:

    1 1 1 0 01 1 1 0 01 1 1 0 00 0 0 1 10 0 0 1 1

    = x x3 00 2

    1/sqrt(3) 1/sqrt(3) 1/sqrt(3) 0 00 0 0 1/sqrt(2) 1/sqrt(2)

    1/sqrt(3) 01/sqrt(3) 01/sqrt(3) 0

    0 1/sqrt(2)0 1/sqrt(2)

    15-826 Copyright (c) 2019 C. Faloutsos 76

    SVD - Drill• Q: How to check we are correct?

    1 1 1 0 01 1 1 0 01 1 1 0 00 0 0 1 10 0 0 1 1

    = x x3 00 2

    1/sqrt(3) 1/sqrt(3) 1/sqrt(3) 0 00 0 0 1/sqrt(2) 1/sqrt(2)

    1/sqrt(3) 01/sqrt(3) 01/sqrt(3) 0

    0 1/sqrt(2)0 1/sqrt(2)

  • C. Faloutsos 15-826

    39

    15-826 Copyright (c) 2019 C. Faloutsos 77

    SVD - Drill• A: SVD properties:

    – matrix product should give back matrix A– matrix U should be column-orthonormal, i.e.,

    columns should be unit vectors, orthogonal to each other

    – ditto for matrix V– matrix L should be diagonal, with positive

    values

    15-826 Copyright (c) 2019 C. Faloutsos 78

    SVD - Detailed outline• Motivation• Definition - properties• Interpretation• Complexity• Case studies• Additional properties

  • C. Faloutsos 15-826

    40

    15-826 Copyright (c) 2019 C. Faloutsos 79

    SVD - Complexity• O( n * m * m) or O( n * n * m) (whichever

    is less)• less work, if we just want singular values• or if we want first k singular vectors• or if the matrix is sparse [Berry]• Implemented: in any linear algebra package

    (LINPACK, matlab, Splus/R, mathematica ...)

    15-826 Copyright (c) 2019 C. Faloutsos 80

    SVD - conclusions so far• SVD: A= U L VT : unique (*)• U: document-to-concept similarities• V: term-to-concept similarities• L: strength of each concept• dim. reduction: keep the first few strongest

    singular values (80-90% of ‘energy’)– SVD: picks up linear correlations

    • SVD: picks up non-zero ‘blobs’

  • C. Faloutsos 15-826

    41

    15-826 Copyright (c) 2019 C. Faloutsos 81

    Conclusion

    • How to find ‘concepts’ in a set of doc’s?– (~ clusters)

    • SVD (= LSI) on the document-term matrix

    15-826 Copyright (c) 2019 C. Faloutsos 82

    References• Berry, Michael: http://www.cs.utk.edu/~lsi/• Fukunaga, K. (1990). Introduction to Statistical

    Pattern Recognition, Academic Press.• Press, W. H., S. A. Teukolsky, et al. (1992) (3rd

    ed.: 2007): Numerical Recipes in C, Cambridge University Press.