Blind Information Processing: Microarray Data
Hyejin Kim, Dukhee Kim, Seungjin Choi
Department of Computer Science and Engineering,
Department of Chemical Engineering
POSTECH, Korea
Outline
Blind Information Processing?
Independent Component Analysis (ICA)
Application of ICA to Microarray Data
Time courses
Yeast cell cycle data
Information Processing
Blind Information Processing
Little Prior Knowledge
Latent Variable Models
Data Space (observation)
Latent Variable Space
Generative Model (FA, PPCA, ICA, GTM)
Recognition Model (PCA, ICA, SOM)
What is ICA?
ICA is a statistical method, the goal of which is to decompose given
multivariate data into a linear sum of statistically independent
components.
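For example, given a two-dimensional vector x = [x1 x2]^T, ICA aims
at finding the following decomposition:
\[
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} a_{11} \\ a_{21} \end{bmatrix} s_1
+ \begin{bmatrix} a_{12} \\ a_{22} \end{bmatrix} s_2,
\qquad
x = a_1 s_1 + a_2 s_2
\]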
where a1, a2 are basis vectors and s1, s2 are basis coefficients
Constraint: Basis coefficients s1 and s2 are statistically independent.
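The decomposition x = a1 s1 + a2 s2 can be tried on synthetic data. The following is a minimal sketch, not the method used in these slides: it assumes uniformly distributed sources, an arbitrary mixing matrix `a`, and a small NumPy-only symmetric FastICA with a tanh nonlinearity. The helper names `whiten` and `fast_ica` are illustrative.

```python
import numpy as np

def whiten(x):
    """Center x (features x samples) and whiten it to identity covariance."""
    x = x - x.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(np.cov(x))
    whitening = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return whitening @ x, whitening

def fast_ica(x, n_iter=200, seed=0):
    """Symmetric FastICA with a tanh nonlinearity (illustrative sketch)."""
    z, whitening = whiten(x)
    n, n_samples = z.shape
    rng = np.random.default_rng(seed)
    # Start from a random orthogonal matrix.
    w = np.linalg.qr(rng.standard_normal((n, n)))[0]
    for _ in range(n_iter):
        g = np.tanh(w @ z)
        g_prime = 1.0 - g ** 2
        # Fixed-point update: E[g(w z) z^T] - diag(E[g'(w z)]) w
        w_new = g @ z.T / n_samples - np.diag(g_prime.mean(axis=1)) @ w
        # Symmetric decorrelation: w <- (w w^T)^(-1/2) w, computed via SVD.
        u, _, vt = np.linalg.svd(w_new)
        w = u @ vt
    unmixing = w @ whitening
    return unmixing @ (x - x.mean(axis=1, keepdims=True)), unmixing

# Synthetic example (all values below are assumptions for illustration).
rng = np.random.default_rng(0)
s = rng.uniform(-1.0, 1.0, size=(2, 5000))   # independent basis coefficients s1, s2
a = np.array([[1.0, 0.6],
              [0.2, 1.0]])                   # columns are the basis vectors a1, a2
x = a @ s                                    # observed data: x = a1 s1 + a2 s2

s_hat, unmixing = fast_ica(x)
# If separation worked, unmixing @ a is close to a scaled permutation matrix.
print(np.round(unmixing @ a, 2))
```

ICA recovers the sources only up to permutation, sign, and scale, which is why the check looks at `unmixing @ a` rather than comparing `s_hat` to `s` directly.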
Information Geometry of ICA
[Figure: information-geometry diagram. Labels: s, y, p_y, mutual information, marginal mismatch, product manifold.]
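For reference, the mutual information named on this slide is conventionally written as the Kullback-Leibler divergence between the joint density of y and the product of its marginals. The notation below (p for the joint density, p_i for the marginals) is an assumption, since the slide's own formula did not survive extraction.

```latex
\[
I(y) \;=\; \int p(y)\,\log\frac{p(y)}{\prod_{i} p_i(y_i)}\,dy
\;=\; \mathrm{KL}\!\left(p(y)\,\Big\|\,\prod_{i} p_i(y_i)\right) \;\ge\; 0
\]
```

I(y) is zero exactly when the components of y are statistically independent, that is, when p(y) lies on the product manifold.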
PCA vs ICA
Linear transform, compression, classification
PCA: orthogonal transform; second-order statistics; optimal coding in the mean-square (MS) sense
ICA: non-orthogonal transform; higher-order statistics; related to projection pursuit; better than PCA in classification tasks?
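The contrast between second-order and higher-order statistics can be checked numerically. The sketch below assumes synthetic uniform sources and an arbitrary non-orthogonal mixing matrix; it shows that PCA yields orthogonal axes and uncorrelated outputs, while a simple higher-order statistic (the correlation between squared outputs) still reveals dependence.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(-1.0, 1.0, size=(2, 5000))   # independent sources (assumed)
a = np.array([[1.0, 0.6],
              [0.2, 1.0]])                   # non-orthogonal basis (assumed)
x = a @ s

# PCA: eigen-decomposition of the covariance matrix (second-order statistics).
x_centered = x - x.mean(axis=1, keepdims=True)
eigvals, eigvecs = np.linalg.eigh(np.cov(x_centered))
y = eigvecs.T @ x_centered                   # principal components

orthogonality = eigvecs.T @ eigvecs          # identity: PCA axes are orthogonal
cov_y = np.cov(y)                            # diagonal: PCA outputs are uncorrelated
basis_dot = a[:, 0] @ a[:, 1]                # nonzero: generating basis is not orthogonal

# Higher-order check: correlation between squared components.
dep_pca = np.corrcoef(y[0] ** 2, y[1] ** 2)[0, 1]
dep_src = np.corrcoef(s[0] ** 2, s[1] ** 2)[0, 1]

print("PCA axes orthogonal:", np.allclose(orthogonality, np.eye(2)))
print("PCA outputs uncorrelated:", np.isclose(cov_y[0, 1], 0.0, atol=1e-8))
print("dot product of generating basis vectors:", basis_dot)
print("corr(y1^2, y2^2) for PCA outputs:", dep_pca)
print("corr(s1^2, s2^2) for true sources:", dep_src)
```

In this setup the PCA outputs are uncorrelated yet not independent, which is the gap that ICA's use of higher-order statistics is meant to close.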
Example of PCA
PCA vs ICA
PCA(orthogonal coordinate)
ICA(non-orthogonal coordinate)
PCA vs ICA
[Figure: PCA and ICA coordinate axes for data in the (x1, x2) plane.]
Microarray Data (1)
Microarray Data Analysis (1)
[Figure: decomposition of the expression matrix x, with axes labeled gene, sample, and influence. Labels: gene influence profile, expression mode of a sample, gene expression profile. Panels: cdc28 mode 1, cdc28 mode 2.]
ICA: Time Courses (1)
Time courses
Yeast cell cycle data
77 by 6178 ORF expression (Spellman et al. 1998)
Each mode shows specific cell-cycle behavior
ICA modes remain inactive within some of the experiments
Dimension reduction improves the prediction of cell-cycle-regulated genes
ICA: Time Courses (2)
by Liebermeister
[Figure: ICA modes over the experiments alpha, elutriation, cdc15, and cdc28. Panels: Mode 1, 76 components; Mode 2, 76 components; Mode 1, 12 components; Mode 1, 12 components.]
PCA Results
[Figure: PC ratio plot; horizontal axis 0 to 80, vertical axis 0 to 0.35.]
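A ratio per principal component can be computed from the singular values of the centered data matrix. The sketch below assumes that "PC ratio" means the fraction of total variance explained by each component, and it uses synthetic data with 77 samples and a reduced, hypothetical number of genes rather than the yeast data itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_genes, n_latent = 77, 500, 5    # illustrative sizes, not the real data

# Synthetic low-rank expression matrix plus noise (assumption for illustration).
latent = rng.standard_normal((n_samples, n_latent))
loadings = rng.standard_normal((n_latent, n_genes))
x = latent @ loadings + 0.1 * rng.standard_normal((n_samples, n_genes))

# Center each gene, then take singular values of the centered matrix.
x_centered = x - x.mean(axis=0, keepdims=True)
singular_values = np.linalg.svd(x_centered, compute_uv=False)

# Fraction of total variance carried by each principal component.
pc_ratio = singular_values ** 2 / np.sum(singular_values ** 2)

print("number of components:", pc_ratio.size)
print("variance captured by the first 5 components:", pc_ratio[:n_latent].sum())
```

A sharp drop in `pc_ratio` after the first few components is one common heuristic for choosing how many dimensions to keep before running ICA.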
ICA Results (I)
[Figure: two FastICA panels labeled "fastICA 6 comp4" and "fastICA 6 comp6"; horizontal axis 0 to 160.]
ICA Results (II)
Conclusion
Linear models of gene expression
Model assumptions
Matrix decomposition serves simultaneously to interpret expression patterns and to cluster co-activated genes
ICA advantages
More biologically meaningful analysis
No ordering, no orthogonality
More sensitive in detecting expression patterns