Dimensionality Reduction
Dimensionality Reduction
Slides by Jure Leskovec 2
Dimensionality Reduction
• High-dimensional == many features
• Find concepts/topics/genres:
  – Documents: features are thousands of words, millions of word pairs
  – Surveys – Netflix: 480k users x 177k movies
Dimensionality Reduction
• Compress / reduce dimensionality:
  – 10^6 rows; 10^3 columns; no updates
  – random access to any cell(s); small error: OK
Dimensionality Reduction
• Assumption: Data lies on or near a low d-dimensional subspace
• Axes of this subspace are effective representation of the data
Why Reduce Dimensions?
• Discover hidden correlations/topics
  – e.g., words that commonly occur together
• Remove redundant and noisy features
  – not all words are useful
• Interpretation and visualization
• Easier storage and processing of the data
SVD - Definition
A[m x n] = U[m x r] Σ[r x r] (V[n x r])^T

• A: input data matrix
  – m x n matrix (e.g., m documents, n terms)
• U: left singular vectors
  – m x r matrix (m documents, r concepts)
• Σ: singular values
  – r x r diagonal matrix (strength of each 'concept')
  – (r: rank of the matrix A)
• V: right singular vectors
  – n x r matrix (n terms, r concepts)
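The definition above can be tried out directly. Below is a small numpy sketch (not part of the original deck); the matrix A is the 7 x 5 users-to-movies example used on the later slides:

```python
import numpy as np

# The 7x5 users-to-movies matrix from the slides
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

# "Economy" SVD: U is m x min(m,n), s holds singular values, Vt is min(m,n) x n
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.round(s, 2))                        # singular values, decreasing order
print(np.allclose(A, U @ np.diag(s) @ Vt))   # A is reconstructed exactly
```

The two leading singular values come out as roughly 9.64 and 5.29, matching the example decomposition shown later in the deck.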
SVD
[Diagram: the m x n matrix A factored as A = U x Σ x V^T, with U of size m x r, Σ of size r x r, and V^T of size r x n.]
SVD
  A = σ1 u1 v1^T + σ2 u2 v2^T + …

σi … scalar (singular value)
ui … (left singular) vector
vi … (right singular) vector
SVD - Properties
It is always possible to decompose a real matrix A into A = U Σ V^T, where
• U, Σ, V: unique
• U, V: column orthonormal:
  – U^T U = I; V^T V = I (I: identity matrix)
  – (columns are orthogonal unit vectors)
• Σ: diagonal
  – entries (singular values) are positive,
    and sorted in decreasing order (σ1 ≥ σ2 ≥ … ≥ 0)
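These properties are easy to verify numerically. A numpy sketch (mine, not the deck's) on the example matrix:

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Column orthonormality: U^T U = I and V^T V = I
print(np.allclose(U.T @ U, np.eye(5)))
print(np.allclose(Vt @ Vt.T, np.eye(5)))

# Singular values are nonnegative and sorted in decreasing order
print(np.all(s[:-1] >= s[1:]) and np.all(s >= 0))
```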
SVD – Example: Users-to-Movies
• A = U Σ V^T – example:

Rows of A: users; columns: Matrix, Alien, Serenity, Casablanca, Amelie.
Users 1–4 are SciFi fans; users 5–7 are Romance fans.

  A =  1 1 1 0 0        U =  0.18 0          Σ =  9.64 0
       2 2 2 0 0             0.36 0               0    5.29
       1 1 1 0 0             0.18 0
       5 5 5 0 0             0.90 0         V^T =  0.58 0.58 0.58 0    0
       0 0 0 2 2             0    0.53             0    0    0    0.71 0.71
       0 0 0 3 3             0    0.80
       0 0 0 1 1             0    0.27
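The factor values quoted on the slide can be recovered with numpy (a sketch of mine; note that SVD columns are only determined up to sign, hence the absolute values):

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Up to sign, the leading factors match the numbers on the slide
print(np.round(np.abs(U[:, :2]), 2))   # columns ~ [0.18 0.36 0.18 0.90 0 0 0], [0 0 0 0 0.53 0.80 0.27]
print(np.round(s[:2], 2))              # ~ [9.64 5.29]
print(np.round(np.abs(Vt[:2]), 2))     # rows ~ [0.58 0.58 0.58 0 0], [0 0 0 0.71 0.71]
```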
SVD – Example: Users-to-Movies
• A = U Σ V^T – same decomposition as on the previous slide.
• The first column of U and the first row of V^T form the SciFi-concept; the second column/row form the Romance-concept.
SVD - Example
• A = U Σ V^T – example (as above).
• U is the "user-to-concept" similarity matrix: the SciFi fans (users 1–4) have large entries in U's first (SciFi-concept) column; the Romance fans (users 5–7) in its second column.
SVD - Example
• A = U Σ V^T – example (as above).
• The diagonal entries of Σ give the 'strength' of each concept: σ1 = 9.64 is the 'strength' of the SciFi-concept, σ2 = 5.29 of the Romance-concept.
SVD - Example
• A = U Σ V^T – example (as above).
• V is the "movie-to-concept" similarity matrix: Matrix, Alien and Serenity load on the SciFi-concept (first row of V^T: 0.58 0.58 0.58 0 0); Casablanca and Amelie on the Romance-concept.
SVD - Interpretation #1
'movies', 'users' and 'concepts':
• U: user-to-concept similarity matrix
• V: movie-to-concept similarity matrix
• Σ: its diagonal elements give the 'strength' of each concept
SVD - Interpretation #2
SVD gives the best axis to project on:
• 'best' = minimizes the sum of squared projection errors
• i.e., minimum reconstruction error
[Figure: scatter plot of Movie 1 rating vs. Movie 2 rating; v1, the first right singular vector, is the best projection axis.]
SVD - Interpretation #2
• A = U Σ V^T – example (the decomposition from the previous slides).
• The first right singular vector v1 (first row of V^T) is the best axis through the data.
[Figure: the Movie 1 rating vs. Movie 2 rating scatter plot with v1 drawn through the data.]
SVD - Interpretation #2
• A = U Σ V^T – example (as above).
• σ1 = 9.64 measures the variance ('spread') of the data along the v1 axis.
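The 'spread along v1' claim can be checked numerically (my numpy sketch; the deck uses 'variance' loosely here, since this is the uncentered sum of squares):

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

v1 = Vt[0]                     # first right singular vector
proj = A @ v1                  # coordinate of each user along the v1 axis

# The root of the total squared spread along v1 equals sigma_1 = 9.64,
# the largest achievable over all unit-length axes
print(np.round(np.linalg.norm(proj), 2))
```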
SVD - Interpretation #2
More details
• Q: How exactly is dim. reduction done?
(Start from the full decomposition A = U Σ V^T of the example matrix.)
SVD - Interpretation #2
More details
• Q: How exactly is dim. reduction done?
• A: Set the smallest singular values to zero.
(Here the full Σ = diag(9.64, 5.29).)
SVD - Interpretation #2
More details
• Q: How exactly is dim. reduction done?
• A: Set the smallest singular values to zero:

  A ≈ U x Σ' x V^T  with  Σ' =  9.64 0
                                0    0
  (U and V^T as before)
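The zeroing step looks like this in code (a numpy sketch of mine, not from the deck). With σ2 zeroed, the last three rows of the approximation collapse to zero, and the Frobenius error equals the dropped singular value:

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 1
s_trunc = s.copy()
s_trunc[k:] = 0.0               # set the smallest singular values to zero
B = U @ np.diag(s_trunc) @ Vt   # rank-k approximation of A

print(np.round(B, 2))                       # SciFi rows survive; Romance rows go to ~0
print(np.round(np.linalg.norm(A - B), 2))   # error = sigma_2 = 5.29
```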
SVD - Interpretation #2
More details
• Q: How exactly is dim. reduction done?
• A: Set the smallest singular values to zero. Once σ2 = 0, the second column of U and the second row of V^T drop out:

       0.18
       0.36
       0.18
  A ≈  0.90   x  9.64  x  0.58 0.58 0.58 0 0
       0
       0
       0
SVD - Interpretation #2
More details
• Q: How exactly is dim. reduction done?
• A: Set the smallest singular values to zero:

  A =  1 1 1 0 0        B =  1 1 1 0 0
       2 2 2 0 0             2 2 2 0 0
       1 1 1 0 0             1 1 1 0 0
       5 5 5 0 0             5 5 5 0 0
       0 0 0 2 2             0 0 0 0 0
       0 0 0 3 3             0 0 0 0 0
       0 0 0 1 1             0 0 0 0 0

Frobenius norm: ǁMǁF = √(Σij Mij²)
ǁA−BǁF = √(Σij (Aij − Bij)²) is "small"
[Diagram: A = U Σ V^T (the full SVD) shown next to B = U S V^T (with the smallest singular values in S set to zero); B is approx. A.]
SVD – Best Low Rank Approx.
• Theorem: Let A = U Σ V^T (σ1 ≥ σ2 ≥ …, rank(A) = r), and let B = U S V^T, where
  – S = diagonal matrix with si = σi (i = 1…k), else si = 0.
  Then B is a best rank-k approximation to A:
  – B is a solution to minB ǁA−BǁF where rank(B) = k
• We will need 2 facts:
  – ǁMǁF² = Σi qii², where M = P Q R is the SVD of M (Q = diag(q11, …, qrr))
  – U Σ V^T − U S V^T = U (Σ − S) V^T
SVD – Best Low Rank Approx.
• We will need 2 facts:
  – ǁMǁF² = Σi qii², where M = P Q R is the SVD of M
  – U Σ V^T − U S V^T = U (Σ − S) V^T
• We apply:
  – P column orthonormal
  – R row orthonormal
  – Q is diagonal
SVD – Best Low Rank Approx.
• A = U Σ V^T, B = U S V^T (σ1 ≥ σ2 ≥ … ≥ 0, rank(A) = r)
  – S = diagonal matrix with si = σi (i = 1…k), else si = 0
  then B is a solution to minB ǁA−BǁF with rank(B) = k. Why?
• Using the two facts:
  U Σ V^T − U S V^T = U (Σ − S) V^T, so
  minB, rank(B)=k ǁA−BǁF = minS ǁΣ − SǁF = minsi √(Σi=1..r (σi − si)²)
• We want to choose the si to minimize Σi=1..r (σi − si)²: set si = σi (i = 1…k), else si = 0, giving
  minsi Σi=1..r (σi − si)² = Σi=k+1..r σi²
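Both facts used in the proof can be verified on the running example (my numpy sketch): the squared Frobenius norm equals the sum of squared singular values, and the best rank-k error is exactly the sum of the dropped σi²:

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Fact: ||M||_F^2 = sum of squared singular values
print(np.isclose(np.linalg.norm(A, 'fro')**2, np.sum(s**2)))

# Best rank-k error: ||A - A_k||_F^2 = sum of the dropped sigma_i^2
k = 1
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
print(np.isclose(np.linalg.norm(A - A_k, 'fro')**2, np.sum(s[k:]**2)))
```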
SVD - Interpretation #2
Equivalent: 'spectral decomposition' of the matrix:

  A = [u1 u2]  x  σ1 0   x  v1^T
                  0  σ2     v2^T

(A: the 7 x 5 users-to-movies example matrix; u1, u2 the columns of U; v1, v2 the columns of V.)
SVD - Interpretation #2
Equivalent: 'spectral decomposition' of the matrix:

  A = u1 σ1 v1^T + u2 σ2 v2^T + …

Each term σi ui vi^T is a rank-1 matrix: an m x 1 column vector times a 1 x n row vector. Keep only the first k terms.
Assume: σ1 ≥ σ2 ≥ σ3 ≥ … ≥ 0
Why is setting the small σs to zero the thing to do? The vectors ui and vi are unit length, so σi scales each term. Zeroing the small σs therefore introduces the least error.
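The sum-of-rank-1-terms view can be reproduced directly with outer products (a numpy sketch of mine). Since the example matrix has rank 2, two terms rebuild it exactly:

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A as a sum of rank-1 terms sigma_i * u_i v_i^T; two terms suffice here
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(2))
print(np.allclose(A, A_rebuilt))   # True, since rank(A) = 2
```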
SVD - Interpretation #2
Q: How many σs to keep?
A: Rule of thumb: keep 80–90% of the 'energy' (= Σi σi²).

  A = u1 σ1 v1^T + u2 σ2 v2^T + …   (assume σ1 ≥ σ2 ≥ σ3 ≥ …)
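The energy rule of thumb can be applied mechanically (my numpy sketch; the 80–90% threshold is the slide's, the variable names are mine). On this example σ1² = 93 out of a total of 121, i.e. about 77%, so a 90% threshold keeps k = 2 concepts:

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

s = np.linalg.svd(A, compute_uv=False)

energy = s**2
frac = np.cumsum(energy) / energy.sum()    # cumulative 'energy' fraction
print(np.round(frac, 3))                   # first entry ~0.769

# Keep the smallest k whose cumulative energy reaches the threshold
k = int(np.searchsorted(frac, 0.9) + 1)
print(k)                                   # 2 for a 90% threshold
```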
SVD - Complexity
• To compute the full SVD: O(nm²) or O(n²m) (whichever is less)
• But less work is needed if:
  – we just want the singular values
  – or we want only the first k singular vectors
  – or the matrix is sparse
• Implemented in linear algebra packages:
  – LINPACK, Matlab, S-Plus, Mathematica, …
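The "less work for the first k singular vectors / sparse matrices" point is what truncated solvers exploit. A sketch using scipy's `svds` (assuming scipy is available; it computes only k leading singular triplets and accepts sparse input):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

A = csr_matrix(np.array([[1, 1, 1, 0, 0],
                         [2, 2, 2, 0, 0],
                         [1, 1, 1, 0, 0],
                         [5, 5, 5, 0, 0],
                         [0, 0, 0, 2, 2],
                         [0, 0, 0, 3, 3],
                         [0, 0, 0, 1, 1]], dtype=float))

# Truncated SVD: only the k leading singular triplets are computed
U, s, Vt = svds(A, k=2)

# svds returns singular values in ascending order
print(np.round(np.sort(s)[::-1], 2))   # ~ [9.64 5.29]
```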
SVD - Conclusions so far
• SVD: A = U Σ V^T: unique
  – U: user-to-concept similarities
  – V: movie-to-concept similarities
  – Σ: strength of each concept
• Dimensionality reduction:
  – keep the few largest singular values (80–90% of the 'energy')
  – SVD picks up linear correlations
Case study: How to query?
Q: Find users that like 'Matrix' and 'Alien'
(Recall the decomposition A = U Σ V^T of the users-to-movies matrix, with columns Matrix, Alien, Serenity, Casablanca, Amelie.)
Case study: How to query?
Q: Find users that like 'Matrix'
A: Map the query into a 'concept space' – how?
Case study: How to query?
Q: Find users that like 'Matrix'
A: Map query vectors into 'concept space' – how?

  q = [5 0 0 0 0]   (ratings for Matrix, Alien, Serenity, Casablanca, Amelie)

Project into concept space: take the inner product with each 'concept' vector vi.
[Figure: q plotted in the Matrix–Alien plane, together with the concept axes v1 and v2.]
Case study: How to query?
Q: Find users that like 'Matrix'
A: Map the vector into 'concept space' – how?
[Figure: the projection q·v1 of q = [5 0 0 0 0] onto the first concept axis v1.]
Case study: How to query?
Compactly, we have: qconcept = q V

E.g., with q = [5 0 0 0 0] (Matrix, Alien, Serenity, Casablanca, Amelie):

      0.58 0
      0.58 0
  V = 0.58 0          (movie-to-concept similarities)
      0    0.71
      0    0.71

  qconcept = q V = [2.9 0]   →  q lies entirely on the SciFi-concept.
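The projection qconcept = q V is a single matrix-vector product. A numpy sketch (mine), with V hard-coded to the rounded values shown on the slide:

```python
import numpy as np

# V as given on the slide (columns = SciFi-concept, Romance-concept)
V = np.array([[0.58, 0.00],
              [0.58, 0.00],
              [0.58, 0.00],
              [0.00, 0.71],
              [0.00, 0.71]])

q = np.array([5, 0, 0, 0, 0], dtype=float)   # user who rated 'Matrix' 5

q_concept = q @ V
print(np.round(q_concept, 2))   # ~ [2.9 0], i.e. purely SciFi-concept
```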
Case study: How to query?
How would a user d who rated ('Alien', 'Serenity') be handled? The same way: dconcept = d V

E.g., with d = [0 4 5 0 0] (Matrix, Alien, Serenity, Casablanca, Amelie) and V the movie-to-concept similarity matrix as above:

  dconcept = d V = [5.22 0]   →  d also lies entirely on the SciFi-concept.
Case study: How to query?
Observation: user d, who rated ('Alien', 'Serenity'), is similar to the query "user" q, who rated only ('Matrix'), although d did not rate 'Matrix'!

  q = [5 0 0 0 0]  →  qconcept = [2.9 0]
  d = [0 4 5 0 0]  →  dconcept = [5.22 0]

In raw rating space: similarity = 0 (no movies in common). In concept space: similarity ≠ 0.
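The observation above is easy to confirm with cosine similarity (my numpy sketch, again with the slide's rounded V). The raw vectors are orthogonal, but in concept space the two users point in the same direction:

```python
import numpy as np

V = np.array([[0.58, 0.00],
              [0.58, 0.00],
              [0.58, 0.00],
              [0.00, 0.71],
              [0.00, 0.71]])

q = np.array([5, 0, 0, 0, 0], dtype=float)   # rated 'Matrix' only
d = np.array([0, 4, 5, 0, 0], dtype=float)   # rated 'Alien' and 'Serenity'

print(q @ d)   # 0.0 -- no overlap in raw rating space

qc, dc = q @ V, d @ V   # ~[2.9 0] and ~[5.22 0]
cos = qc @ dc / (np.linalg.norm(qc) * np.linalg.norm(dc))
print(round(cos, 2))    # identical direction in concept space
```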
SVD: Drawbacks
+ Optimal low-rank approximation:
  • in the Frobenius norm
− Interpretability problem:
  – a singular vector specifies a linear combination of all input columns or rows
− Lack of sparsity:
  – singular vectors are dense, even when the input matrix is sparse