Dimensionality Reduction II: ICA
Sam Norman-Haignere
Jan 21, 2016
Motivation
• Many signals reflect linear mixtures of multiple ‘sources’:
⇒ Audio signals from multiple speakers
⇒ Neuroimaging measures of neural activity (e.g. EEG, fMRI, calcium imaging)
• Often want to recover underlying sources from the mixed signal
⇒ ICA algorithms provide general-purpose statistical machinery (given certain key assumptions)
Classic Example: Cocktail Party Problem
• Several sounds being played simultaneously
• Microphones at different locations record the mixed signal
• ICA can recover the individual sound sources
⇒ Only true if # microphones ≥ # sources
Demo: http://research.ics.aalto.fi/ica/cocktail/cocktail_en.cgi
Why is this a classic demo?
⇒ Impressive
⇒ Assumptions of ICA fit the problem well:
1. Sound source waveforms are close to independent
2. Audio mixing is truly linear
3. Sound source waveforms have non-Gaussian amplitude distributions
[Figure: histogram of speech amplitudes with a Gaussian overlaid]
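The mixing model behind the demo can be sketched in a few lines of NumPy. This is a toy illustration, not the demo's actual data: the sources here are hypothetical sparse (Laplacian) waveforms standing in for speech, and the mixing matrix is made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical sources with sparse, non-Gaussian amplitude distributions
# (assumption 3 above), standing in for speech waveforms
n_samples = 10_000
sources = rng.laplace(size=(2, n_samples))   # shape: (n_sources, n_samples)

# Unknown linear mixing (assumption 2 above): each 'microphone' records a
# weighted sum of the sources; these weights are invented for illustration
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])                   # shape: (n_mics, n_sources)
mixtures = A @ sources                       # what the microphones record
```

ICA's job is to recover `sources` from `mixtures` without knowing `A`.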
Neuroimaging Examples: EEG
ICA is frequently used to denoise EEG timecourses:
⇒ Artifacts (e.g. eye blinks) are mostly independent of neural activity and have non-Gaussian amplitude distributions
⇒ EEG channels are modeled as a linear mixture of artifacts and neural activity
Neuroimaging Examples: fMRI
fMRI ‘voxels’ contain hundreds of thousands of neurons
⇒ Plausibly contain neural populations with distinct selectivity
fMRI responses (reflecting blood flow) are an approximately linear function of neural activity
⇒ Use component analysis to unmix responses from different neural populations?
Working Example from My Research (for Convenience)
• Measured fMRI responses to 165 natural sounds:
1. Man speaking, 2. Flushing toilet, 3. Pouring liquid, 4. Tooth-brushing, 5. Woman speaking, 6. Car accelerating, 7. Biting and chewing, 8. Laughing, 9. Typing, 10. Car engine starting, 11. Running water, 12. Breathing, 13. Keys jangling, 14. Dishes clanking, 15. Ringtone, 16. Microwave, 17. Dog barking, 18. Walking (hard surface), 19. Road traffic, 20. Zipper, 21. Cellphone vibrating, 22. Water dripping, 23. Scratching, 24. Car windows, 25. Telephone ringing, 26. Chopping food, 27. Telephone dialing, 28. Girl speaking, 29. Car horn, 30. Writing, 31. Computer startup sound, 32. Background speech, 33. Songbird, 34. Pouring water, 35. Pop song, 36. Water boiling, 37. Guitar, 38. Coughing, 39. Crumpling paper, 40. Siren, …
• For each voxel, measure the average response to each sound
[Figure: example voxel response profiles across all 165 sounds; y-axis: Response (% Signal Change)]
• Compile all voxel responses into a matrix
[Figure: 165 Sounds × 11,065 Voxels matrix of response magnitudes]
Hypothesis: perhaps a small number of neural populations, each with a canonical response to the sound set, explain the responses of thousands of voxels?
Linear Model of Voxel Responses
Voxel responses are modeled as a weighted sum of component response profiles:

v = \sum_{i=1}^{N} r_i w_i

where v is a voxel's 165-dimensional response, r_i is component i's response profile, and w_i is that component's scalar weight in the voxel.
[Figure: a voxel response vector decomposed as r_1 w_1 + r_2 w_2 + r_3 w_3 + …]
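The weighted sum is just a matrix-vector product. A minimal sketch with the working example's dimensions; the profiles and weights below are random stand-ins, not real data:

```python
import numpy as np

rng = np.random.default_rng(1)
n_sounds, n_components = 165, 3

# Hypothetical canonical response profiles r_i: one 165-vector per component
R = rng.random((n_sounds, n_components))

# Scalar weights w_i giving each component's contribution to this one voxel
w = np.array([0.8, 0.1, 0.4])

# v = sum_i r_i * w_i, computed as a single matrix-vector product
v = R @ w
```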
Matrix Factorization
Factor the response matrix into a set of components, each with:
1. A response profile to all 165 sounds
2. Voxel weights specifying the contribution of that component to each voxel
[Figure: the 165 Sounds × 11,065 Voxels data matrix approximated by (response profiles: 165 Sounds × Number of Components) × (voxel weights: Number of Components × 11,065 Voxels)]
Matrix approximation is ill-posed (many equally good solutions)
⇒ Must be constrained with additional assumptions
⇒ Different techniques make different assumptions
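The ill-posedness can be seen directly: slip any invertible matrix between the two factors and you get a different factorization of exactly the same data. Random matrices are used here purely as stand-ins:

```python
import numpy as np

rng = np.random.default_rng(2)
R = rng.normal(size=(165, 6))      # response profiles (working-example sizes)
W = rng.normal(size=(6, 11065))    # voxel weights
D = R @ W                          # the data matrix they generate

# Insert any invertible matrix M between the factors:
M = rng.normal(size=(6, 6))
R2 = R @ M
W2 = np.linalg.inv(M) @ W

# (R2, W2) reproduces D exactly, so the data alone cannot choose a
# factorization; PCA and ICA pick one via different extra assumptions.
```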
Principal Components Analysis (PCA)
For PCA to infer the underlying components, they must:
1. Have uncorrelated response profiles and voxel weights
2. Explain different amounts of response variance
Independent Components Analysis (ICA)
For ICA to infer the underlying components, they must:
1. Have non-Gaussian and statistically independent voxel weights
An Aside on Statistical Independence
Saying that voxel weights are independent means the weight of one component tells you nothing about the weight of another:

p(w_1, w_2) = p(w_1) \, p(w_2)

Statistical independence is a stronger assumption than uncorrelatedness:
⇒ All independent variables are uncorrelated
⇒ Not all uncorrelated variables are independent
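A quick numerical illustration of the gap between the two notions: with w2 = w1², w2 is completely determined by w1 (maximally dependent), yet the two are uncorrelated, because their covariance reduces to E[w1³], which vanishes for a symmetric distribution:

```python
import numpy as np

rng = np.random.default_rng(3)
w1 = rng.normal(size=100_000)
w2 = w1 ** 2                  # fully determined by w1, so NOT independent

# Uncorrelated nonetheless: cov(w1, w2) = E[w1^3] - E[w1] E[w1^2] = 0
# for a zero-mean symmetric w1
corr = np.corrcoef(w1, w2)[0, 1]
# corr is ~0 despite total dependence: p(w1, w2) != p(w1) p(w2)
```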
An Aside on Non-Gaussianity
Many ways for a distribution to be non-Gaussian:
[Figure: three voxel-weight distributions compared to a Gaussian: more sparse (peaked, heavy-tailed), less sparse (flatter), and skewed]
Non-Gaussianity and Statistical Independence
Central limit theorem (non-technical): sums of independent non-Gaussian variables become more Gaussian
Consequence: maximally non-Gaussian projections of the data are more likely to be sources
What if the sources have a Gaussian distribution?
⇒ Out of luck: sums of Gaussian variables remain Gaussian
Toy Example
[Figure: responses of two hypothetical neural populations, one speech-selective and one music-selective, to three sounds (piano, man speaking, pop song), and the resulting voxel responses]
[Figure: 3-D scatter of voxel responses, with axes Piano, Man Speaking, and Pop Song]
[Figure: the same scatter with the two “true” dimensions (the speech- and music-selective populations) overlaid]
![Page 28: Dimensionality Reduction II: ICA](https://reader030.vdocuments.mx/reader030/viewer/2022012620/61a0a6a22fb0fb6cf0620679/html5/thumbnails/28.jpg)
“True” dimensions
Limita)on of PCA
02
4
0
2
4
−1
0
1
2
3
4
5
PianoMan Speaking
Pop
Song
• PCA infers the right subspace• But the specific directions are misalignedPCA Dimensions
−3 0 3
−3
0
3
Projection on PC1Pr
ojec
tion
on P
C2
Can recover the “true” dimensions by rotating in the whitened PCA space
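Why a rotation is all that remains: after PCA whitening the data has identity covariance, and every rotation of it does too, so second-order statistics cannot single out the source directions. A sketch with synthetic Laplacian sources and a made-up mixing matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
S = rng.laplace(size=(2, 50_000))             # independent non-Gaussian sources
X = np.array([[2.0, 1.0],
              [1.0, 1.5]]) @ S                # observed linear mixture

# PCA whitening: rotate to the principal axes, scale each to unit variance
X = X - X.mean(axis=1, keepdims=True)
eigvals, eigvecs = np.linalg.eigh(np.cov(X))
Z = np.diag(eigvals ** -0.5) @ eigvecs.T @ X  # whitened data

# cov(Z) is the identity, and stays the identity under ANY further rotation:
# whitening fixes everything except the rotation that ICA must supply.
```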
Rotating PCA Dimensions
[Figure: scatter of the toy data alongside a plot of independence (negentropy) as a function of rotation angle, from 0 to 0.5 pi radians]
ICA Dimensions
ICA rotates the PCA components to maximize statistical independence / non-Gaussianity
[Figure: the data projected on IC1 and IC2 now aligns with the “true” dimensions]
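In two dimensions this rotation search can be written out explicitly: scan a grid of angles, rotate the whitened data, and keep the angle whose projections are most non-Gaussian. Kurtosis is used below as a cheap stand-in for negentropy, and the sources and mixing are synthetic:

```python
import numpy as np

rng = np.random.default_rng(5)
S = rng.laplace(size=(2, 30_000))                 # true independent sources
X = np.array([[1.0, 0.6],
              [0.4, 1.0]]) @ S                    # observed mixture

# Step 1: whiten with PCA
X = X - X.mean(axis=1, keepdims=True)
eigvals, eigvecs = np.linalg.eigh(np.cov(X))
Z = np.diag(eigvals ** -0.5) @ eigvecs.T @ X

def non_gaussianity(Y):
    # Sum of squared excess kurtosis over components (unit variance assumed)
    return np.sum((np.mean(Y ** 4, axis=1) - 3.0) ** 2)

def rotate(Y, a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s], [s, c]]) @ Y

# Step 2: brute-force scan over rotation angles; [0, pi/2) suffices because
# larger rotations only permute / sign-flip the two components
angles = np.linspace(0.0, np.pi / 2, 180, endpoint=False)
best = max(angles, key=lambda a: non_gaussianity(rotate(Z, a)))
Y = rotate(Z, best)   # recovered sources, up to order, sign, and scale
```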
What if the data is Gaussian?
[Figure: 3-D Gaussian point cloud (Piano, Man Speaking, Pop Song axes) and its projection onto the PCA1/PCA2 weights, which is circularly symmetric]
For Gaussian distributions:
⇒ Projections on PCA components are circularly symmetric
⇒ No “special directions”
For non-Gaussian distributions:
⇒ Can recover latent components by searching for “special directions” with maximally non-Gaussian projections
[Figure: projections on PC1/PC2 for Gaussian data (circular) vs. non-Gaussian data]
![Page 38: Dimensionality Reduction II: ICA](https://reader030.vdocuments.mx/reader030/viewer/2022012620/61a0a6a22fb0fb6cf0620679/html5/thumbnails/38.jpg)
A Simple 2-Step Recipe
1. PCA: whiten data
⇒ Possibly discard low-variance components
⇒ How many components to discard?
2. ICA: rotate whitened PCA components to maximize non-Gaussianity
⇒ How to measure non-Gaussianity?
⇒ How to maximize your non-Gaussianity measure?
Measuring Non-Gaussianity: Negentropy (gold standard)
• Definition: the difference in entropy from a Gaussian of the same variance:

J(y) = H(y_{\text{gauss}}) - H(y)

• The Gaussian distribution is maximally entropic (for fixed variance)
⇒ All non-Gaussian distributions have positive negentropy
• Maximizing negentropy is closely related to minimizing mutual information
• Cons: in practice, can be hard to measure and optimize
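Measuring H(y) directly requires a density estimate, which is the practical difficulty. A common workaround (the one FastICA popularized) approximates negentropy by comparing E[G(y)] against its value for a Gaussian, with a nonlinearity such as G(u) = log cosh(u). A sketch, using a sampled Gaussian reference for simplicity:

```python
import numpy as np

rng = np.random.default_rng(6)

def negentropy_approx(y, n_ref=200_000, rng=rng):
    """Approximate J(y) by (E[G(y)] - E[G(v)])^2, G(u) = log(cosh(u)),
    v standard Gaussian: nonnegative, and ~0 when y is Gaussian."""
    y = (y - y.mean()) / y.std()      # negentropy compares at fixed variance
    v = rng.normal(size=n_ref)        # sampled Gaussian reference
    G = lambda u: np.log(np.cosh(u))
    return (G(y).mean() - G(v).mean()) ** 2

sparse = rng.laplace(size=100_000)    # non-Gaussian: clearly positive score
gauss = rng.normal(size=100_000)      # Gaussian: score near zero
```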
Measuring Non-Gaussianity: Kurtosis (approximation)
• Definition: the 4th moment of the (standardized) distribution: E[y^4]
• Useful for sparse, ‘heavy-tailed’ distributions (which are common)
[Figure: a sparse non-Gaussian weight distribution compared to a Gaussian]
⇒ Many audio sources have a sparse distribution of amplitudes
[Figure: histogram of speech waveform amplitudes (Bell & Sejnowski, 1995)]
⇒ Natural images also tend to be sparse (e.g. Olshausen & Field, 1997)
• Very easy to measure and optimize
• Cons: only useful if the source distributions are sparse; sensitive to outliers
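Both properties are easy to demonstrate with synthetic data: excess kurtosis (the standardized 4th moment minus 3) is trivial to compute, and a single extreme sample dominates the 4th moment, which is the outlier sensitivity noted above:

```python
import numpy as np

rng = np.random.default_rng(7)

def excess_kurtosis(y):
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3.0   # 0 for a Gaussian, > 0 for sparse dists

gauss = rng.normal(size=100_000)   # excess kurtosis ~ 0
sparse = rng.laplace(size=100_000) # excess kurtosis ~ 3 in theory

# Outlier sensitivity: one wild sample blows up the estimate
contaminated = np.append(gauss, 1000.0)
```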
Measuring Non-Gaussianity: Skew (approximation)
• Definition: the 3rd moment of the (standardized) distribution: E[y^3]
• Useful for distributions with a single heavy tail
[Figure: a skewed non-Gaussian weight distribution compared to a Gaussian]
• Again easy to measure and optimize
• Cons: only useful if the source distributions are skewed
Measuring Non-Gaussianity
Bottom line:
• Negentropy is a general-purpose measure of non-Gaussianity, but often hard to use in practice
• Parametric measures (kurtosis, skew) can be more effective when tailored to the non-Gaussianity of the source distribution
Non-Gaussianity Maximization
• Brute-force search
⇒ e.g. iteratively rotate pairs of components to maximize non-Gaussianity
⇒ Easy to implement; effective in low-dimensional spaces
• Gradient-based methods (many variants)
⇒ More complicated to implement; effective in high dimensions
• All of these algorithms find local, not necessarily global, optima
⇒ Useful to test the stability of local optima
⇒ e.g. run the algorithm many times from random starting points
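The brute-force pairwise scheme can be sketched as a Jacobi-style sweep: repeatedly visit each pair of whitened components and rotate the pair through a grid of angles, keeping the rotation that most increases a non-Gaussianity contrast. Kurtosis is used as the contrast here for simplicity, and the 3-source demo data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(8)

def contrast(Y):
    # Non-Gaussianity: sum of squared excess kurtosis over components
    # (rows of Y have unit variance, which whitening + rotation preserve)
    return np.sum((np.mean(Y ** 4, axis=1) - 3.0) ** 2)

def pairwise_rotation_ica(Z, n_sweeps=3, n_angles=90):
    """Brute-force ICA: sweep over pairs of whitened components, rotating
    each pair through a grid of angles and keeping the best rotation."""
    Y = Z.copy()
    d = Y.shape[0]
    for _ in range(n_sweeps):
        for i in range(d):
            for j in range(i + 1, d):
                def rotated(a, i=i, j=j):
                    c, s = np.cos(a), np.sin(a)
                    T = Y.copy()
                    T[[i, j]] = np.array([[c, -s], [s, c]]) @ Y[[i, j]]
                    return T
                angles = np.linspace(0, np.pi / 2, n_angles, endpoint=False)
                # angle 0 (identity) is in the grid, so contrast never drops
                Y = max((rotated(a) for a in angles), key=contrast)
    return Y

# Demo on a synthetic 3-source problem (sources and mixing invented here)
S = rng.laplace(size=(3, 20_000))
X = rng.normal(size=(3, 3)) @ S
X = X - X.mean(axis=1, keepdims=True)
eigvals, eigvecs = np.linalg.eigh(np.cov(X))
Z = np.diag(eigvals ** -0.5) @ eigvecs.T @ X   # whitened mixture
Y = pairwise_rotation_ica(Z)
```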
A Simple 2-Step Recipe Applied to fMRI Data!
1. PCA: whiten data
⇒ Possibly discard low-variance components
⇒ How many components to discard?
2. ICA: rotate whitened PCA components to maximize non-Gaussianity
⇒ How to measure non-Gaussianity?
⇒ How to maximize your non-Gaussianity measure?
Choosing the Number of Components
Use cross-validation to select the number of components:
⇒ Project voxel responses onto the principal components (using a subset of the data)
⇒ Predict left-out data using different numbers of components
[Figure: voxel prediction accuracy (correlation r) vs. number of components; accuracy peaks at N=6 and then declines due to over-fitting]
A Simple 2-Step Recipe Applied to fMRI Data!
1. PCA: whiten data
⇒ Possibly discard low-variance components
⇒ Keep the top 6 components
2. ICA: rotate whitened PCA components to maximize non-Gaussianity
⇒ How to measure non-Gaussianity?
⇒ How to maximize your non-Gaussianity measure?
A Simple 2-Step Recipe Applied to fMRI Data!
1. PCA: whiten data
⇒ Possibly discard low-variance components
⇒ Keep the top 6 components
2. ICA: rotate whitened PCA components to maximize non-Gaussianity
⇒ Use negentropy to measure non-Gaussianity
⇒ Maximize negentropy via brute-force rotation
⇒ Feasible because:
1. Many voxels / data points (>10,000 voxels)
2. Low-dimensional data (just 6 dimensions)
Rotating the Top 6 Principal Components
• The negentropy of the principal components can be increased by rotation
• The rotation algorithm discovers a clear optimum
[Figure: non-Gaussianity (negentropy) of the principal components vs. the components discovered via iterative rotation; negentropy of a single pair of components as a function of rotation angle, −45° to 45°]
Probing the Inferred Components
We now have 6 dimensions, each with:
1. A response profile (a 165-dimensional vector)
2. A weight vector specifying its contribution to each voxel
⇒ Both the response profile and the anatomy are unconstrained
[Figure: schematic response profile across all 165 sounds (relative response, a.u.) and schematic map of component weights]
All 6 inferred components have interpretable properties
⇒ 2 components are highly selective for speech and music, respectively
[Figure: for the music-selective and speech-selective neural populations, their locations in the brain and their response magnitudes to music vs. non-music sounds and to speech vs. non-speech sounds]
Music selectivity is highly diluted in raw voxels
⇒ fMRI signals likely blur activity from different neural populations
PCA components differ qualitatively from ICA components and have less clear functional/anatomical properties
⇒ The ICA components have response profiles with substantial correlations, which PCA's assumptions rule out
Conclusions
1. Core assumptions of ICA
⇒ Measured signals reflect a linear mixture of underlying sources
⇒ Sources are non-Gaussian and statistically independent
2. When the core assumptions hold, the method is very effective and requires few additional assumptions about the nature of the underlying sources
Exercise: 2-Step Recipe Applied to the Cocktail Party
Dataset:
⇒ 5 audio soundtracks mixed from 3 sources
2-step recipe:
1. Project the data onto the top 3 principal components
2. Iteratively rotate pairs of components to maximize negentropy
⇒ Listen to the unmixed signals!
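A skeleton for step 1 of the exercise, with synthetic stand-ins for the audio (the real dataset has 5 soundtracks mixed from 3 sources; the Laplacian "waveforms" and mixing matrix below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(9)

# Stand-ins for the exercise data: 3 sources, 5 mixed 'soundtracks'
S = rng.laplace(size=(3, 20_000))    # hypothetical source waveforms
A = rng.normal(size=(5, 3))          # unknown 5x3 mixing matrix
X = A @ S                            # the 5 observed soundtracks

# Step 1: project onto the top 3 principal components and whiten (via SVD)
X = X - X.mean(axis=1, keepdims=True)
U, sv, Vt = np.linalg.svd(X, full_matrices=False)
Z = np.sqrt(X.shape[1]) * Vt[:3]     # 3 whitened components, unit variance

# Only 3 singular values are non-negligible: the 5 tracks live in a
# 3-dimensional subspace, so keeping 3 components loses nothing.

# Step 2 (the exercise): iteratively rotate pairs of rows of Z to maximize
# negentropy, then listen to the rotated components.
```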