Bayesian Multi-topic Microarray Analysis with Hyperparameter Reestimation


Upload: tomonari-masada

Post on 27-Jan-2015


Tomonari MASADA (正田备也), NAGASAKI University (长崎大学)

[email protected]



Overview

Problem

Latent Process Decomposition (LPD)

Hyperparameter reestimation (MVB+)

Experiment

Results

Conclusions


Problem

Explain the differences among cells of different types (e.g., cancer vs. normal cells) by analyzing differences in the gene expression levels measured in DNA microarray experiments.


Gene expression

http://bix.ucsd.edu/bioalgorithms/slides.php


DNA microarray experiment

We can find out which genes are used (expressed) by different types of cells.


Latent Process Decomposition

latent Dirichlet allocation (LDA) [Blei et al. 01]    latent process decomposition (LPD) [Rogers et al. 05]
text mining                                           microarray analysis
document                                              sample
word                                                  gene
word frequency                                        gene expression level
latent topic                                          latent process


LPD as a multi-topic model

row = gene, column = sample, color = process


LPD as a generative model

For each sample d, draw multinomial parameters θd from a Dirichlet prior Dir(α).

    θd: mixing proportions of the processes for sample d

For each gene g in each sample d:

    Draw a process k from Mult(θd).

    Draw the expression level xdg from the Gaussian N(μgk, λgk).
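The generative process above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_lpd`, the toy parameter values, and the reading of λgk as a precision (so that the standard deviation is 1/sqrt(λgk)) are not taken from the slides.

```python
import math
import random

def sample_lpd(n_samples, mu, lam, alpha, rng):
    """Sample expression levels from an LPD-style generative process.

    mu[g][k], lam[g][k]: Gaussian mean and precision for gene g, process k.
    Assumption: lam is a precision, so the standard deviation is 1/sqrt(lam).
    """
    n_genes, n_procs = len(mu), len(mu[0])
    thetas, z, x = [], [], []
    for _ in range(n_samples):
        # theta_d ~ Dir(alpha): normalize independent Gamma(alpha, 1) draws.
        gam = [rng.gammavariate(alpha, 1.0) for _ in range(n_procs)]
        total = sum(gam)
        theta = [v / total for v in gam]
        z_d, x_d = [], []
        for g in range(n_genes):
            # z_dg ~ Mult(theta_d)
            k = rng.choices(range(n_procs), weights=theta)[0]
            # x_dg ~ N(mu_gk, 1 / lam_gk)
            x_d.append(rng.gauss(mu[g][k], 1.0 / math.sqrt(lam[g][k])))
            z_d.append(k)
        thetas.append(theta)
        z.append(z_d)
        x.append(x_d)
    return thetas, z, x

# Toy example: 3 genes, 2 processes, 5 samples (all values hypothetical).
mu = [[-1.0, 1.0], [0.0, 2.0], [3.0, -3.0]]
lam = [[4.0, 4.0], [4.0, 4.0], [4.0, 4.0]]
thetas, z, x = sample_lpd(5, mu, lam, alpha=1.0, rng=random.Random(0))
```

Because every gene in a sample draws its own process, a single sample can mix several processes, which is what makes LPD a multi-topic model.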


Inference by VB [Rogers et al. 05]

Variational Bayesian inference

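VB is used when EM cannot be applied, i.e., when the posterior over the latent variables needed in the E-step is intractable.

Instead of the log likelihood, a variational lower bound on it is maximized.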
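A toy numerical check may make the lower bound concrete. The sketch below uses a single discrete latent variable with made-up joint probabilities (not LPD itself). It shows that the bound sum_z q(z) log(p(x, z) / q(z)) never exceeds log p(x), and that it is tight when q equals the exact posterior.

```python
import math

def log_evidence(joint):
    """log p(x) = log sum_z p(x, z) for a discrete latent variable z."""
    return math.log(sum(joint))

def lower_bound(joint, q):
    """Variational lower bound: sum_z q(z) * log(p(x, z) / q(z))."""
    return sum(qz * math.log(pz / qz) for pz, qz in zip(joint, q) if qz > 0)

# Hypothetical values of p(x, z) for z = 0, 1, 2 at one fixed observation x.
joint = [0.10, 0.25, 0.05]
posterior = [p / sum(joint) for p in joint]   # exact p(z | x)
uniform = [1.0 / 3.0] * 3                     # a crude approximation q(z)

gap_uniform = log_evidence(joint) - lower_bound(joint, uniform)      # > 0
gap_posterior = log_evidence(joint) - lower_bound(joint, posterior)  # ~ 0
```

The gap is the KL divergence from q to the exact posterior, so maximizing the bound over a restricted family of q makes q as close to the posterior as that family allows.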


Variational lower bound

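Joint distribution ($n_{dk}$: number of genes in sample $d$ assigned to process $k$):

$$
p(\mathbf{x}, \mathbf{z}, \theta, \mu, \lambda \mid \alpha, \mu_0, \lambda_0, a_0, b_0)
= \prod_{d} \left[ \frac{\Gamma(K\alpha)}{\Gamma(\alpha)^{K}} \prod_{k} \theta_{dk}^{\,n_{dk}+\alpha-1} \right]
\cdot \prod_{g,k} \left[ \sqrt{\frac{\lambda_0}{2\pi}} \exp\left\{ -\frac{\lambda_0 (\mu_{gk}-\mu_0)^2}{2} \right\} \cdot \frac{b_0^{a_0}}{\Gamma(a_0)} \lambda_{gk}^{a_0-1} e^{-b_0 \lambda_{gk}} \right]
\cdot \prod_{d,g} \sqrt{\frac{\lambda_{g z_{dg}}}{2\pi}} \exp\left\{ -\frac{\lambda_{g z_{dg}} (x_{dg}-\mu_{g z_{dg}})^2}{2} \right\}
$$

Variational lower bound:

$$
\log p(\mathbf{x} \mid \alpha, \mu_0, \lambda_0, a_0, b_0)
\ge \sum_{\mathbf{z}} \int q(\mathbf{z}, \theta, \mu, \lambda) \log \frac{p(\mathbf{x}, \mathbf{z}, \theta, \mu, \lambda \mid \alpha, \mu_0, \lambda_0, a_0, b_0)}{q(\mathbf{z}, \theta, \mu, \lambda)} \, d\theta \, d\mu \, d\lambda
$$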


Inference by MVB [Ying et al. 08]

Marginalized variational Bayesian inference

Marginalizes out the multinomial parameters θd

Involves less approximation than VB

cf. collapsed variational Bayesian (CVB) inference for LDA [Teh et al. 06]


Marginalization in MVB

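Marginal joint distribution after integrating out $\theta$ ($n_{dk}$: number of genes in sample $d$ assigned to process $k$):

$$
p(\mathbf{x}, \mathbf{z}, \mu, \lambda \mid \alpha, \mu_0, \lambda_0, a_0, b_0)
= \int p(\mathbf{x}, \mathbf{z}, \theta, \mu, \lambda \mid \alpha, \mu_0, \lambda_0, a_0, b_0) \, d\theta
$$

$$
= \prod_{d} \left[ \frac{\Gamma(K\alpha)}{\Gamma\left(\sum_k n_{dk} + K\alpha\right)} \prod_{k} \frac{\Gamma(n_{dk}+\alpha)}{\Gamma(\alpha)} \right]
\cdot \prod_{g,k} \left[ \sqrt{\frac{\lambda_0}{2\pi}} \exp\left\{ -\frac{\lambda_0 (\mu_{gk}-\mu_0)^2}{2} \right\} \cdot \frac{b_0^{a_0}}{\Gamma(a_0)} \lambda_{gk}^{a_0-1} e^{-b_0 \lambda_{gk}} \right]
\cdot \prod_{d,g} \sqrt{\frac{\lambda_{g z_{dg}}}{2\pi}} \exp\left\{ -\frac{\lambda_{g z_{dg}} (x_{dg}-\mu_{g z_{dg}})^2}{2} \right\}
$$

Variational lower bound:

$$
\log p(\mathbf{x} \mid \alpha, \mu_0, \lambda_0, a_0, b_0)
\ge \sum_{\mathbf{z}} \int q(\mathbf{z}, \mu, \lambda) \log \frac{p(\mathbf{x}, \mathbf{z}, \mu, \lambda \mid \alpha, \mu_0, \lambda_0, a_0, b_0)}{q(\mathbf{z}, \mu, \lambda)} \, d\mu \, d\lambda
$$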
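The integral over θd has a closed form because the Dirichlet prior is conjugate to the multinomial. As a small illustration, the sketch below evaluates the per-sample factor Γ(Kα)/Γ(n + Kα) · Π_k Γ(n_k + α)/Γ(α) in log space. The function name is hypothetical, and a symmetric Dirichlet prior with a single parameter α is assumed.

```python
import math

def log_marginal_assignments(counts, alpha):
    """log of the integral over theta of prod_k theta_k^{n_k} * Dir(theta | alpha).

    counts[k] is the number of genes assigned to process k in one sample.
    Assumes a symmetric Dirichlet prior with scalar parameter alpha.
    """
    n_procs = len(counts)
    n_total = sum(counts)
    value = math.lgamma(n_procs * alpha) - math.lgamma(n_total + n_procs * alpha)
    for n_k in counts:
        value += math.lgamma(n_k + alpha) - math.lgamma(alpha)
    return value

# With K = 2 and alpha = 1 (uniform prior on theta), one gene in each process
# gives the integral of theta * (1 - theta) over [0, 1], which is 1/6.
p_one_each = math.exp(log_marginal_assignments([1, 1], 1.0))
```

Working with `math.lgamma` rather than `math.gamma` avoids overflow when the gene counts reach the thousands, as they do in the datasets used here.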


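Update formulas in MVB, with variational posteriors $q(\mu_{gk}) = \mathcal{N}(m_{gk}, l_{gk}^{-1})$ and $q(\lambda_{gk}) = \mathrm{Gamma}(a_{gk}, b_{gk})$, and with $\gamma_{dgk}$ denoting $q(z_{dg} = k)$:

$$
l_{gk} = \lambda_0 + \frac{a_{gk}}{b_{gk}} \sum_{d} \gamma_{dgk}
$$

$$
m_{gk} = \frac{1}{l_{gk}} \left( \lambda_0 \mu_0 + \frac{a_{gk}}{b_{gk}} \sum_{d} \gamma_{dgk} x_{dg} \right)
$$

$$
a_{gk} = a_0 + \frac{1}{2} \sum_{d} \gamma_{dgk}
$$

$$
b_{gk} = b_0 + \frac{1}{2} \sum_{d} \gamma_{dgk} \left\{ (x_{dg} - m_{gk})^2 + \frac{1}{l_{gk}} \right\}
$$

[Equation: update of $\gamma_{dgk}$]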


Our proposal: MVB+

MVB with hyperparameter reestimation

Empirical Bayes method

○ Estimate hyperparameters by maximizing the variational lower bound

Hand-tuned hyperparameter values often result in poor inference quality.


Update formulas in MVB+

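$$
\mu_0 = \frac{1}{GK} \sum_{g} \sum_{k} m_{gk}
$$

$$
b_0 = \frac{G K a_0}{\sum_{g} \sum_{k} a_{gk} / b_{gk}}
$$

$$
\Psi(a_0) = \log b_0 + \frac{1}{GK} \sum_{g} \sum_{k} \left\{ \Psi(a_{gk}) - \log b_{gk} \right\}
$$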

Inversion of digamma function is required.
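A minimal sketch of how the μ0, b0, and a0 updates might be coded follows. It covers only these three hyperparameters, since no other update is legible in the transcript. All function names are hypothetical. The digamma and trigamma routines are hand-written series approximations chosen to keep the example dependency-free; a library routine such as `scipy.special.digamma` could be substituted. The digamma inversion uses Newton's method with the initialization suggested by Minka, which is an assumption about one reasonable way to do it, not necessarily what the author used.

```python
import math

def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def trigamma(x):
    """Derivative of the digamma function for x > 0."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + inv + 0.5 * inv2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))

def inverse_digamma(y, n_newton=10):
    """Solve digamma(x) = y for x > 0 by Newton's method."""
    x = math.exp(y) + 0.5 if y >= -2.22 else -1.0 / (y - digamma(1.0))
    for _ in range(n_newton):
        x -= (digamma(x) - y) / trigamma(x)
    return x

def reestimate(m, a, b, a0):
    """One pass of the mu0, b0, a0 updates; m, a, b are flat lists over (g, k)."""
    gk = len(m)
    mu0 = sum(m) / gk
    b0 = gk * a0 / sum(ai / bi for ai, bi in zip(a, b))
    mean_log = sum(digamma(ai) - math.log(bi) for ai, bi in zip(a, b)) / gk
    a0 = inverse_digamma(math.log(b0) + mean_log)
    return mu0, a0, b0
```

Because the b0 update depends on a0 and the a0 update depends on b0, the two are coupled. Calling `reestimate` once per outer iteration of the inference, feeding the returned a0 back in, is one simple way to alternate between them.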


Hyperparameter reestimation

An outstanding trend in Bayesian modeling?

[Asuncion et al. UAI’09]

○ Reestimates the hyperparameters of LDA

○ Overturns conventional wisdom!

before: “VB < CVB < CGS”

after: “VB = CVB = CGS” (in perplexity)

[Masada et al. CIKM’09 (poster, to appear)]


Experiments

Datasets available on the Web

LK: Leukemia

○ http://www.broadinstitute.org/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&paper_id=63

D1: "Five types of breast cancer"

D2: "Three types of bladder cancer"

D3: "Healthy tissues"

○ http://www.ihes.fr/~zinovyev/princmanif2006/


Data specifications

Dataset name (abbreviation)          # of samples   # of genes
Leukemia (LK)                        72             12582
Five types of breast cancer (D1)     286            17816
Three types of bladder cancer (D2)   40             3036
Healthy tissues (D3)                 103            10383


Results

1. Can we achieve inference of better quality?

2. Can we achieve better sample clustering?

3. Are there any qualitative differences between MVB and MVB+?

[Figures: lower bound vs. # of iterations, one panel each for LK, D1, D2, and D3]

[Figures: lower bound (after convergence) vs. # of processes, one panel each for LK, D1, D2, and D3]

Sample clustering evaluation

dataset   method   precision     recall        F-score
LK        MVB+     0.934±0.007   0.931±0.010   0.932±0.009
LK        MVB      0.930±0.000   0.924±0.000   0.927±0.000
D2        MVB+     0.837±0.038   0.822±0.032   0.829±0.033
D2        MVB      0.779±0.084   0.751±0.069   0.763±0.071

(averaged over 100 trials)
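For reference, the F-score column can be related to the other two columns as follows, assuming the usual definition of the F-score as the harmonic mean of precision and recall. The slides do not state the definition or how precision and recall were computed for clusters, so this is a sketch under that assumption.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Averaged precision and recall reported for MVB+ on LK.
lk_mvb_plus = round(f_score(0.934, 0.931), 3)   # 0.932
```

Applying the formula to averaged precision and recall need not reproduce an F-score that was averaged over trials. That may explain small discrepancies between this calculation and the table when the variance across trials is large.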


Qualitative difference (LK)

row = gene, column = sample

MVB+ can preserve the diversity of genes.

[Figure panels: MVB+, MVB]


Conclusions

Formulas for hyperparameter reestimation

Improvement in inference quality

Larger variational lower bounds

Better sample clustering

Gene diversity preservation


Future work

Use more datasets to demonstrate effectiveness

Devise collapsed Gibbs sampling for LPD

Accelerate computations (OpenMP, NVIDIA CUDA)

Provide a method for gene clustering
