Principal Component Analysis – the original paper
ADRIAN FLOREA
PAPERS WE LOVE – BUCHAREST CHAPTER – MEETUP #12
DECEMBER 16TH, 2016
Karl Pearson (1857-1936), 1901

K. Pearson, "On lines and planes of closest fit to systems of points in space", The London, Edinburgh and Dublin Philosophical Magazine and Journal of Science, Sixth Series, 2, pp. 559-572 (1901)
Dimensionality reduction (Pearson's words, 1901)
Dimensionality reduction (115 years later, 2016)

71,291 points, 200 features, 2 principal components

https://projector.tensorflow.org
Dimensionality reduction

Original data set ($n \times d$):
• n observations
• d possibly correlated features

↓ transform

New data set ($n \times r$):
• n observations
• r uncorrelated features, with r << d
• retains most of the data variability
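To make the shapes concrete, here is a minimal numpy sketch of the transform, assuming observations are stored one per row and using an arbitrary orthonormal `U_r` as a stand-in for the principal components introduced later:

```python
import numpy as np

n, d, r = 1000, 50, 2              # observations, original features, reduced features
rng = np.random.default_rng(0)
X = rng.normal(size=(n, d))        # original data set, one observation per row

# U_r: a d x r matrix whose columns are r orthonormal directions.
# Here it is just a placeholder basis from a QR factorization; PCA will
# later choose these columns as the top r principal components.
U_r, _ = np.linalg.qr(rng.normal(size=(d, r)))

A = X @ U_r                        # new data set: n x r
print(X.shape, "->", A.shape)      # (1000, 50) -> (1000, 2)
```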
Introduction
Data: $x_1, x_2, \ldots, x_n \in \mathbb{R}^d$, with $x_i = (x_{i1}, x_{i2}, \ldots, x_{id})^T$, $i = 1, \ldots, n$, collected as the columns of the matrix $X = (x_1\ x_2\ \cdots\ x_n)$.

Standard basis: $x \in \mathbb{R}^d = \mathrm{span}(e_1, e_2, \ldots, e_d)$.

Orthonormal basis: $\{u_i \in \mathbb{R}^d \mid i = 1, \ldots, d\}$ such that $u_i^T u_j = \delta_{ij}$ and $\mathbb{R}^d = \mathrm{span}(u_1, u_2, \ldots, u_d)$.

The columns of $U = (u_1\ u_2\ \cdots\ u_d)$ are an orthonormal basis, so $U$ is orthogonal: $U^T U = U U^T = I$.

Change of basis: $x = a_1 u_1 + a_2 u_2 + \cdots + a_d u_d = U a$, hence $a = U^T x$.
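A quick numerical check of this change of basis, as a sketch; the orthonormal basis here is an arbitrary one obtained from a QR factorization, purely for illustration:

```python
import numpy as np

d = 4
rng = np.random.default_rng(1)

# An arbitrary orthonormal basis of R^d (columns of U), built via QR
U, _ = np.linalg.qr(rng.normal(size=(d, d)))

x = rng.normal(size=d)            # a point expressed in the standard basis
a = U.T @ x                       # its coordinates in the basis u_1, ..., u_d

# U is orthogonal (U^T U = I) and x is recovered as x = U a
assert np.allclose(U.T @ U, np.eye(d))
assert np.allclose(U @ a, x)
```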
Goal: find an “optimal” basis
The same point has coordinates $(x_1, x_2, \ldots, x_d)^T$ in the standard basis $\mathrm{span}(e_1, e_2, \ldots, e_d)$ and coordinates $(a_1, a_2, \ldots, a_d)^T = U^T x$ in the new basis $\mathrm{span}(u_1, u_2, \ldots, u_d)$.

Keeping only $r \ll d$ components should preserve most of the "variability":

$x' = a_1 u_1 + a_2 u_2 + \cdots + a_r u_r = U_r a_r, \qquad a_r = U_r^T x$

so the approximation is the projection $x' = U_r U_r^T x = P_r x$.
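A sketch of this rank-r projection in numpy, again with an arbitrary orthonormal basis standing in for the still-to-be-chosen optimal one:

```python
import numpy as np

d, r = 10, 2
rng = np.random.default_rng(2)

U, _ = np.linalg.qr(rng.normal(size=(d, d)))
U_r = U[:, :r]                    # keep only r of the d basis vectors

x = rng.normal(size=d)
a_r = U_r.T @ x                   # reduced coordinates
x_prime = U_r @ a_r               # rank-r approximation x' = U_r U_r^T x

P_r = U_r @ U_r.T                 # projection matrix: P_r @ P_r == P_r
assert np.allclose(P_r @ P_r, P_r)
assert np.allclose(x_prime, P_r @ x)
```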
r = 1: Find a unit vector $u$ ($u^T u = 1$) that maximizes the variance of the points projected onto it.

Projection of $x_i$ on $u$:

$x_i' = \|x_i'\|\, u = \left(\|x_i\| \cos(x_i, u)\right) u = \frac{x_i^T u}{\|u\|}\, u = (u^T x_i)\, u = a_i u$

Assume the data are centered, $\bar{x} = 0$ (so $\bar{a} = u^T \bar{x} = 0$). The variance of the projections is then

$\frac{1}{n}\sum_{i=1}^{n} (a_i - \bar{a})^2 = \frac{1}{n}\sum_{i=1}^{n} (u^T x_i)^2 = \frac{1}{n}\sum_{i=1}^{n} (u^T x_i)(x_i^T u) = u^T \left(\frac{1}{n}\sum_{i=1}^{n} x_i x_i^T\right) u = u^T \left(\frac{1}{n} X X^T\right) u = u^T \Sigma u$

where $\Sigma = \mathrm{cov}(X)$. The problem becomes:

$\max_u u^T \Sigma u \quad \text{s.t.} \quad u^T u = 1$
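As a sanity check of the identity above, a short numpy sketch (columns of X are observations, matching the slides' convention):

```python
import numpy as np

n, d = 500, 3
rng = np.random.default_rng(3)
X = rng.normal(size=(d, n))             # one observation per column
X = X - X.mean(axis=1, keepdims=True)   # center: assume x-bar = 0

u = rng.normal(size=d)
u = u / np.linalg.norm(u)               # unit vector

a = u @ X                               # projections a_i = u^T x_i
Sigma = (X @ X.T) / n                   # covariance matrix (1/n) X X^T

# the variance of the projections equals the quadratic form u^T Sigma u
assert np.allclose(a.var(), u @ Sigma @ u)
```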
First principal component
Lagrangian: $L(u, \lambda) = u^T \Sigma u - \lambda (u^T u - 1)$

$\frac{\partial L(u, \lambda)}{\partial u} = 0 \;\Rightarrow\; 2 \Sigma u - 2 \lambda u = 0 \;\Rightarrow\; \Sigma u = \lambda u$

so $\lambda$ is an eigenvalue of the covariance matrix $\Sigma$, with associated eigenvector $u$, and

$\max_u u^T \Sigma u = \max_u \lambda\, u^T u = \lambda_1$

The largest eigenvalue $\lambda_1$ is the projected variance on the associated eigenvector $u_1$, the direction of most variance, called the first principal component.
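A sketch of computing the first principal component numerically; np.linalg.eigh returns eigenvalues in ascending order, so the last column is $u_1$:

```python
import numpy as np

n, d = 500, 3
rng = np.random.default_rng(4)
X = rng.normal(size=(d, n)) * np.array([[3.0], [1.0], [0.2]])  # anisotropic cloud
X = X - X.mean(axis=1, keepdims=True)

Sigma = (X @ X.T) / n
eigvals, eigvecs = np.linalg.eigh(Sigma)    # ascending eigenvalues
lambda1, u1 = eigvals[-1], eigvecs[:, -1]   # largest eigenvalue / first PC

# lambda1 is exactly the variance of the data projected on u1
assert np.allclose(u1 @ Sigma @ u1, lambda1)
```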
The equivalent optimization problem solved by Pearson:

$\min_u \mathrm{MSE}(u) \quad \text{s.t.} \quad u^T u = 1$
An equivalent objective function
$\mathrm{MSE}(u) = \frac{1}{n}\sum_{i=1}^{n} \|x_i - x_i'\|^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - x_i')^T (x_i - x_i') = \frac{1}{n}\sum_{i=1}^{n} \left(\|x_i\|^2 - 2\, x_i^T x_i' + x_i'^T x_i'\right)$

But $x_i' = (u^T x_i)\, u$, so:

$\mathrm{MSE}(u) = \frac{1}{n}\sum_{i=1}^{n} \left(\|x_i\|^2 - 2 (u^T x_i)(x_i^T u) + (u^T x_i)\, u^T u\, (x_i^T u)\right) = \frac{1}{n}\sum_{i=1}^{n} \|x_i\|^2 - \frac{1}{n}\sum_{i=1}^{n} u^T x_i x_i^T u$

using $u^T u = 1$. The first term is constant in $u$, hence

$\min_u \mathrm{MSE}(u) = \mathrm{const} - \max_u u^T \Sigma u$

Maximizing the projected variance is equivalent to minimizing the mean squared error:

$\min_u \mathrm{MSE}(u) \;\Leftrightarrow\; \max_u u^T \Sigma u$
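A numerical check of this equivalence, as a sketch: for any unit vector u, MSE(u) + u^T Σ u should equal the constant term (1/n) Σ ||x_i||²:

```python
import numpy as np

n, d = 400, 3
rng = np.random.default_rng(5)
X = rng.normal(size=(d, n))
X = X - X.mean(axis=1, keepdims=True)
Sigma = (X @ X.T) / n
const = (X ** 2).sum() / n               # (1/n) sum ||x_i||^2

for _ in range(3):                       # a few random unit directions
    u = rng.normal(size=d)
    u = u / np.linalg.norm(u)
    X_proj = np.outer(u, u @ X)          # x_i' = (u^T x_i) u, column-wise
    mse = ((X - X_proj) ** 2).sum() / n
    assert np.allclose(mse + u @ Sigma @ u, const)
```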
r = 2: We have found $u_1$, the direction of most variance (the r = 1 case). Now find a second unit direction $v$ orthogonal to $u_1$:

$\max_v v^T \Sigma v \quad \text{s.t.} \quad v^T v = 1, \; v^T u_1 = 0$

Lagrangian: $L(v, \lambda, \beta) = v^T \Sigma v - \lambda (v^T v - 1) - \beta\, v^T u_1$

$\frac{\partial L}{\partial v} = 0 \;\Rightarrow\; 2 \Sigma v - 2 \lambda v - \beta u_1 = 0$

Left-multiplying by $u_1^T$ and using $u_1^T v = 0$, $u_1^T \Sigma v = \lambda_1 u_1^T v = 0$, and $u_1^T u_1 = 1$ gives $\beta = 0$, hence $\Sigma v = \lambda v$. So $v$ is an eigenvector of $\Sigma$, and

$\max_v v^T \Sigma v = \lambda_2$

the second largest eigenvalue of $\Sigma$. The second principal component is the eigenvector associated with the second largest eigenvalue, $\lambda_2$.

The total variance is invariant to a change of basis!

$\sum_{i=1}^{d} \sigma_i^2 = \sum_{i=1}^{d} \lambda_i$
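A sketch checking this invariance: the trace of Σ (the sum of the per-feature variances) equals the sum of its eigenvalues:

```python
import numpy as np

n, d = 300, 5
rng = np.random.default_rng(6)
X = rng.normal(size=(d, n))
X = X - X.mean(axis=1, keepdims=True)
Sigma = (X @ X.T) / n

per_feature_variances = np.diag(Sigma)     # sigma_i^2
eigenvalues = np.linalg.eigvalsh(Sigma)    # lambda_i

# total variance is the same in both bases
assert np.allclose(per_feature_variances.sum(), eigenvalues.sum())
```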
PCA algorithm (see the sketch after this list)

1. Normalize the data set to zero mean, unit variance.
2. Calculate the covariance matrix $\Sigma$ of the normalized data set.
3. Find all eigenvalues of $\Sigma$ and arrange them in descending order: $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d \ge 0$.
4. Choose $r$ for the top $r$ eigenvalues (and corresponding eigenvectors/principal components): $U_r = (u_1\ u_2\ \cdots\ u_r)$ is the reduced basis of principal components, and $a_i = U_r^T x_i$, $1 \le i \le n$, is the reduced-dimensionality data set.
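A compact numpy sketch of these four steps. It is an illustration, not the speaker's code; observations are stored one per row (the transpose of the slides' column convention), and the eigendecomposition route is used, though SVD is the more common numerical implementation:

```python
import numpy as np

def pca(X, r):
    """PCA via eigendecomposition. X: n x d, one observation per row."""
    # 1. normalize to zero mean, unit variance
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. covariance matrix of the normalized data
    Sigma = (Z.T @ Z) / len(Z)
    # 3. eigenvalues arranged in descending order
    eigvals, eigvecs = np.linalg.eigh(Sigma)        # eigh: ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 4. keep the top r principal components
    U_r = eigvecs[:, :r]                            # reduced basis, d x r
    A = Z @ U_r                                     # reduced data set, n x r
    return A, U_r, eigvals

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 6)) @ rng.normal(size=(6, 6))   # correlated features
A, U_r, eigvals = pca(X, r=2)
print(A.shape)                                      # (100, 2)
```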
How to choose r?

• Proportion of explained variance: $r = \min\left\{ r \;\middle|\; \frac{\sum_{i=1}^{r} \lambda_i}{\sum_{i=1}^{d} \lambda_i} \ge \alpha \right\}$, with $\alpha \in [0.7, 0.9]$

• Kaiser criterion (Henry Felix Kaiser, 1960): keep the components with eigenvalues above 1, $r = \max\{\, i \mid \lambda_i \ge 1 \,\}$

• Average-eigenvalue variant: keep the components with $\lambda_i$ above the average $\frac{1}{d}\sum_{i=1}^{d} \lambda_i$

• Scree plot: look for the "elbow" in the plot of the sorted eigenvalues (in R: screeplot(modelname))
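A sketch of the first two selection rules applied to a toy eigenvalue spectrum; choose_r is a hypothetical helper, and alpha is the α ∈ [0.7, 0.9] threshold from the slide:

```python
import numpy as np

def choose_r(eigvals, alpha=0.9):
    """Pick r from eigenvalues sorted in descending order."""
    explained = np.cumsum(eigvals) / eigvals.sum()
    r_variance = int(np.searchsorted(explained, alpha) + 1)  # explained-variance rule
    r_kaiser = int((eigvals > 1).sum())                      # Kaiser: eigenvalues above 1
    return r_variance, r_kaiser

eigvals = np.array([3.1, 1.4, 0.8, 0.4, 0.2, 0.1])
print(choose_r(eigvals))        # (4, 2) for this toy spectrum
```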
PCA Pros & Cons PROS
• dimensionality reduction can help learning
• removes noise
• can deal with large data sets
CONS
• hard to interpret
• sample dependent
• linear (
Bibliography • A.A. Farag, S. Elhabian, "A Tutorial on Principal Component Analysis" (2009) - http://dai.fmph.uniba.sk/courses/ml/sl/PCA.pdf
• N. de Freitas, "CPSC 340 - Machine Learning and Data Mining Lectures", University of British Columbia (2012) - http://www.cs.ubc.ca/~nando/340-2012/lectures.php
• I. Georgescu, "Inteligență computațională" [Computational Intelligence], Editura ASE (2015)
• I.T. Jolliffe, "Principal Component Analysis", 2nd ed., Springer (2002)
• J.N. Kutz, "Data-Driven Modeling & Scientific Computation. Methods for Complex Systems & Big Data", Oxford University Press (2013)
• J.N. Kutz, "Singular Value Decomposition" - http://courses.washington.edu/am301/page1/page15/video.html
• K. Pearson, "On lines and planes of closest fit to systems of points in space", Philos. Mag., Sixth Series, 2, pp. 559-572 (1901) - http://stat.smmu.edu.cn/history/pearson1901.pdf
• M.J. Zaki, W. Meira Jr., “Data Mining and Analysis. Fundamental Concepts and Algorithms”, Cambridge University Press (2014)