Blind Information Processing: Microarray Data

Hyejin Kim, Dukhee Kim, Seungjin Choi

Department of Computer Science and Engineering, Department of Chemical Engineering, POSTECH, Korea

Post on 29-Jan-2016

Page 1: Blind Information Processing: Microarray Data Hyejin Kim, Dukhee KimSeungjin Choi Department of Computer Science and Engineering, Department of Chemical

Blind Information Processing: Microarray Data

Hyejin Kim, Dukhee Kim, Seungjin Choi

Department of Computer Science and Engineering,

Department of Chemical Engineering

POSTECH, Korea

Page 2

Outline

Blind Information Processing?

Independent Component Analysis (ICA)

Application of ICA to Microarray Data

Time courses

Yeast cell cycle data

Page 3

Information Processing

Blind Information Processing

Little Prior Knowledge

Page 4

Latent Variable Models

Data Space (observation)

Latent Variable Space

Generative Model (FA, PPCA, ICA, GTM)

Recognition Model (PCA, ICA, SOM)

Page 5

What is ICA?

ICA is a statistical method whose goal is to decompose given multivariate data into a linear sum of statistically independent components.

For example, given a two-dimensional vector x = [x1 x2]^T, ICA aims at finding the following decomposition:

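$$
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} a_{11} \\ a_{21} \end{bmatrix} s_1
+ \begin{bmatrix} a_{12} \\ a_{22} \end{bmatrix} s_2,
\qquad
x = a_1 s_1 + a_2 s_2
$$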

where a1, a2 are basis vectors and s1, s2 are basis coefficients

Constraint: Basis coefficients s1 and s2 are statistically independent.
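The two-dimensional decomposition above can be tried out numerically. The following is a minimal sketch, not the method used in the talk: it assumes two uniformly distributed sources, a hypothetical mixing matrix `a`, and a toy estimator (`ica_2d`, a name introduced here) that whitens the data and then searches for the rotation that maximises the absolute excess kurtosis of the outputs.

```python
import numpy as np

def ica_2d(x, n_angles=360):
    """Toy 2-D ICA: whiten the data, then pick the most non-Gaussian rotation.

    x has shape (2, n_samples). Returns the unmixing matrix w and y = w @ x.
    """
    x = x - x.mean(axis=1, keepdims=True)
    # Whitening matrix from the eigendecomposition of the sample covariance.
    d, e = np.linalg.eigh(np.cov(x))
    whiten = np.diag(d ** -0.5) @ e.T
    z = whiten @ x

    def abs_kurtosis(y):
        # Sum of |excess kurtosis| over the rows (rows have unit variance).
        return np.abs((y ** 4).mean(axis=1) - 3.0).sum()

    best_r, best_score = None, -np.inf
    # Angles in [0, pi/2) suffice: other rotations only permute or flip signs.
    for theta in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        c, s_ = np.cos(theta), np.sin(theta)
        r = np.array([[c, -s_], [s_, c]])
        score = abs_kurtosis(r @ z)
        if score > best_score:
            best_r, best_score = r, score

    w = best_r @ whiten
    return w, w @ x

# Assumed demo data: independent, unit-variance uniform sources s1, s2.
rng = np.random.default_rng(0)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 20000))
a = np.array([[1.0, 0.5],
              [0.3, 1.0]])      # hypothetical basis vectors a1, a2 as columns
x = a @ s                       # x = a1*s1 + a2*s2

w, y = ica_2d(x)
print(np.round(w @ a, 2))       # close to a scaled permutation if separation worked
```

The sources can only be recovered up to order, sign, and scale, which is why the check looks at `w @ a` rather than comparing `y` with `s` directly.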

Page 6

Information Geometry of ICA

s

y

yp

Mutual information

Marginal mismatch Product manifold
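The labels on this slide appear to refer to the standard information-geometric view of ICA. As a hedged reading, assuming the sources s have a product density and writing \tilde{p}_y for the product of the marginals of y (notation introduced here, not taken from the slide), the quantities can be written as:

```latex
% Mutual information: divergence of p_y from the product of its marginals
I(y) = D_{\mathrm{KL}}\!\left(p_y \,\middle\|\, \tilde{p}_y\right)
     = \int p_y(y)\,\log\frac{p_y(y)}{\prod_i p_{y_i}(y_i)}\,dy \;\ge\; 0,
\qquad \tilde{p}_y(y) = \prod_i p_{y_i}(y_i)

% Decomposition of the total mismatch (valid when p_s is a product density)
D_{\mathrm{KL}}\!\left(p_y \,\middle\|\, p_s\right)
  = \underbrace{D_{\mathrm{KL}}\!\left(p_y \,\middle\|\, \tilde{p}_y\right)}_{\text{mutual information}}
  + \underbrace{D_{\mathrm{KL}}\!\left(\tilde{p}_y \,\middle\|\, p_s\right)}_{\text{marginal mismatch}}
```

The mutual information vanishes exactly when the components of y are independent, that is, when p_y lies on the product manifold.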

Page 7

PCA vs ICA

Linear transform; compression; classification

PCA: orthogonal transform; second-order statistics; optimal coding in the mean-square (MS) sense

ICA: non-orthogonal transform; higher-order statistics; related to projection pursuit; better than PCA in classification tasks?
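The contrast between the two transforms can be illustrated with a small sketch. It assumes synthetic data generated from two non-orthogonal basis vectors (the matrix `a` is hypothetical) and shows that PCA, which uses only the covariance, always returns orthogonal axes and decorrelated coordinates, even though the generating directions are not orthogonal.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed data: two independent uniform sources mixed by non-orthogonal columns.
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 10000))
a = np.array([[1.0, 0.6],
              [0.2, 1.0]])
x = a @ s
x = x - x.mean(axis=1, keepdims=True)

# PCA uses second-order statistics only: eigenvectors of the covariance matrix.
cov = np.cov(x)
eigvals, eigvecs = np.linalg.eigh(cov)
pcs = eigvecs.T @ x                     # coordinates along the principal axes

gram = eigvecs.T @ eigvecs              # identity: the PCA axes are orthogonal
pc_cov = np.cov(pcs)                    # diagonal: the PCs are uncorrelated

# Cosine of the angle between the true (non-orthogonal) basis vectors a1, a2.
cos_mix = a[:, 0] @ a[:, 1] / (np.linalg.norm(a[:, 0]) * np.linalg.norm(a[:, 1]))

print(np.round(gram, 6))
print(np.round(pc_cov, 6))
print(round(float(cos_mix), 3))
```

Decorrelation alone does not pin down the directions a1 and a2; that is where the higher-order statistics used by ICA come in.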

Page 8

Example of PCA

Page 9

PCA vs ICA

PCA (orthogonal coordinate)

ICA (non-orthogonal coordinate)

Page 10

PCA vs ICA

[Figure: data in the (x1, x2) plane with the ICA and PCA directions marked]

Page 11

Microarray Data (1)

Page 12

Microarray Data Analysis (1)

gene influence profile

Expression mode of a sample

[Figure: matrix decomposition of x, with axes labelled gene, sample, and influence; panels "cdc28 mode 1" and "cdc28 mode 2"]

gene expression profile

Page 13

ICA: Time Courses (1)

Time courses

Yeast cell cycle data

77 by 6178 ORF expression matrix (Spellman et al. 1998)

Each mode shows specific cell-cycle behavior

ICA modes remain inactive within some of the experiments

Dimension reduction improves the prediction of cell-cycle regulated genes

Page 14

ICA: Time Courses (2)

by Liebermeister

Mode 1, 76 components

Mode 2, 76 components

Mode 1, 12 components

Mode 1, 12 components

alpha, elutriation, cdc15, cdc28

Page 15

PCA Results

[Figure: PC ratio plot; x-axis from 0 to 80, y-axis from 0 to 0.35]
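A "PC ratio" curve of this kind is usually the fraction of total variance captured by each principal component. The sketch below assumes that reading and uses a synthetic 77-row matrix as a stand-in for the expression data (it is not the Spellman data set); `pc_ratio` is a helper name introduced here.

```python
import numpy as np

def pc_ratio(x):
    """Fraction of total variance captured by each principal component.

    x has shape (n_samples, n_genes); each gene (column) is centred first.
    """
    xc = x - x.mean(axis=0, keepdims=True)
    # Singular values come back sorted in descending order.
    sv = np.linalg.svd(xc, compute_uv=False)
    var = sv ** 2
    return var / var.sum()

# Synthetic stand-in: 77 experiments x 500 genes, three strong modes plus noise.
rng = np.random.default_rng(3)
modes = rng.normal(size=(77, 3)) * np.array([5.0, 3.0, 2.0])
loadings = rng.normal(size=(3, 500))
x = modes @ loadings + 0.5 * rng.normal(size=(77, 500))

ratio = pc_ratio(x)
print(np.round(ratio[:5], 3))
```

A sharp drop after the first few components is the usual argument for reducing the dimension before running ICA.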

Page 16

ICA Results (I)

[Figure: two panels titled "fastICA 6 comp4" and "fastICA 6 comp6"; x-axis from 0 to 160]

Page 17

ICA Results (II)

Page 18

Conclusion

Linear models of gene expression

Model assumptions

Matrix decomposition simultaneously serves to interpret expression patterns and to cluster co-activated genes

ICA advantages

More biologically meaningful analysis

No order, no orthogonality

More sensitive in detecting expression patterns
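One simple way to turn a decomposition into gene clusters is to collect, for each mode, the genes whose loading is unusually large. This is a hypothetical sketch, not the procedure from the talk: the function `coactivated_genes`, the threshold of three standard deviations, and the synthetic loadings are all assumptions made for illustration.

```python
import numpy as np

def coactivated_genes(modes, n_std=3.0):
    """For each mode (row), return indices of genes with outlying loadings.

    modes has shape (n_modes, n_genes). A gene is assigned to a mode when its
    standardised loading exceeds n_std in absolute value.
    """
    clusters = []
    for row in modes:
        z = (row - row.mean()) / row.std()
        clusters.append(np.flatnonzero(np.abs(z) > n_std))
    return clusters

# Synthetic loadings: small background values plus a few strongly loaded genes.
rng = np.random.default_rng(2)
modes = rng.normal(scale=0.1, size=(2, 1000))
modes[0, [10, 20, 30]] += 5.0      # genes activated together in mode 1
modes[1, [40, 50]] -= 5.0          # genes repressed together in mode 2

clusters = coactivated_genes(modes)
for k, genes in enumerate(clusters, start=1):
    print(f"mode {k}: genes {genes.tolist()}")
```

Because the modes carry no order and need not be orthogonal, a gene may appear in more than one cluster.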