Subspace Differential Coexpression Analysis for the Discovery of Disease-related Dysregulations Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers and Vipin Kumar [email protected] http://www-users.cs.umn.edu/~kumar/dmbio/ Department of Computer Science and Engineering RECOMB Systems Biology 12/05/2009


Page 1: Subspace Differential Coexpression Analysis for the Discovery of Disease-related Dysregulations Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach,

Subspace Differential Coexpression Analysis for the Discovery of Disease-related Dysregulations

Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers and Vipin Kumar

[email protected]

http://www-users.cs.umn.edu/~kumar/dmbio/

Department of Computer Science and Engineering

RECOMB Systems Biology 12/05/2009

Page 2: Differential Expression (DE)

• Differential Expression (DE)
– Traditional analysis targets the changes of expression level

[Golub et al., 1999], [Pan 2002], [Cui and Churchill, 2003] etc.

[Figure: expression over samples in controls and cases; y-axis: expression level; x-axis: controls, cases]

[Figure: genes (rows) by samples (columns: controls, cases)] [Kostka & Spang, 2005]

Page 3: Differential Coexpression (DC)

• Differential Coexpression (DC)
– Targets changes of the coherence of expression

[Silva et al., 1995], [Li, 2002], [Kostka & Spang, 2005], [Rosemary et al., 2008], [Cho et al. 2009] etc.

[Figure: matrix of expression values; genes (rows) by samples (columns: controls, cases)] [Kostka & Spang, 2005]

[Figure: expression over samples in controls and cases]

Question: Is this gene interesting, i.e. associated with the phenotype?

Answer: No, in terms of differential expression (DE).

However, what if there are two other genes ...? Then the answer is yes, in terms of differential coexpression (DC).

Biological interpretations of DC: dysregulation of pathways, mutation of transcription factors, etc.

Page 4: Differential Coexpression (DC)

• Existing work on differential coexpression
– Pairs of genes with differential coexpression
  • [Silva et al., 1995], [Li, 2002], [Li et al., 2003], [Lai et al. 2004]
– Clustering based differential coexpression analysis
  • [Ihmels et al., 2005], [Watson, 2006]
– Network based analysis of differential coexpression
  • [Zhang and Horvath, 2005], [Choi et al., 2005], [Gargalovic et al. 2006], [Oldham et al. 2006], [Fuller et al., 2007], [Xu et al., 2008]
– Beyond pair-wise (size-k) differential coexpression
  • [Kostka and Spang, 2004], [Prieto et al., 2006]
– Gene-pathway differential coexpression
  • [Rosemary et al., 2008]
– Pathway-pathway differential coexpression
  • [Cho et al., 2009]

Page 5: Existing DC work is “full-space”

• Full-space differential coexpression
– Full-space measures: e.g. correlation difference
• May have limitations due to the heterogeneity of
– Causes of a disease (e.g. genetic difference)
– Populations affected (e.g. demographic difference)

Motivation: Such subspace patterns may be missed by full-space models
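To make the "correlation difference" example of a full-space measure concrete, here is a minimal sketch. It assumes expression matrices with genes as rows and samples as columns, and it uses the absolute difference of Pearson correlations between the two classes. The function name and the toy values are illustrative only and do not come from the slides.

```python
import numpy as np

def correlation_difference(expr_a, expr_b, i, j):
    """Absolute difference in the Pearson correlation of genes i and j
    between class A and class B (rows = genes, columns = samples).

    Every sample of each class enters the correlation, which is what
    makes this a full-space measure.
    """
    r_a = np.corrcoef(expr_a[i], expr_a[j])[0, 1]
    r_b = np.corrcoef(expr_b[i], expr_b[j])[0, 1]
    return abs(r_a - r_b)

# Toy data (hypothetical): two genes, four samples per class.
controls = np.array([[1.0, 2.0, 3.0, 4.0],
                     [2.0, 4.0, 6.0, 8.0]])   # genes move together
cases = np.array([[1.0, 2.0, 3.0, 4.0],
                  [4.0, 3.0, 2.0, 1.0]])      # genes move in opposite directions

diff = correlation_difference(controls, cases, 0, 1)
```

Because the correlation is taken over all samples of a class, a relationship that holds only in a subset of the samples is averaged together with the rest, which is the limitation the slide points to.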

Page 6: Extension to Subspace Differential Coexpression

• Definition of Subspace Differential Coexpression Pattern
– A set of k genes {g1, g2, …, gk}
– Fraction of samples in class A on which the k genes are coexpressed
– Fraction of samples in class B on which the k genes are coexpressed
– SDC, computed from these two fractions, as a measure of subspace differential coexpression

Details in [Fang, Kuang, Pandey, Steinbach, Myers and Kumar, PSB 2010]

Problem: given n genes, find all the subsets of genes s.t. SDC ≥ d

Given n genes, there are 2^n candidates of SDC pattern! How to effectively handle the combinatorial search space?

Similar motivation and challenge as biclustering, but here differential biclustering!
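The transcript does not preserve the SDC formula, so the following is only a sketch under stated assumptions: expression is taken to be binarized already (1 = the gene is "on" in a sample), a gene set counts as coexpressed on a sample when all of its genes are 1 there, and SDC is taken to be the absolute difference of the two class-wise fractions. The function names and toy matrices are hypothetical; the PSB 2010 paper gives the authors' actual definition.

```python
import numpy as np

def coexpressed_fraction(binary_expr, genes):
    """Fraction of samples (columns) on which all `genes` (rows) equal 1.

    Assumption: 'coexpressed on a sample' means every gene in the set is 1.
    """
    sub = np.asarray(binary_expr)[list(genes), :]
    return float(sub.all(axis=0).mean())

def sdc(binary_a, binary_b, genes):
    """Assumed SDC: |fraction in class A - fraction in class B|."""
    return abs(coexpressed_fraction(binary_a, genes)
               - coexpressed_fraction(binary_b, genes))

# Toy binarized data (hypothetical): 3 genes; 5 samples in A, 4 in B.
class_a = np.array([[1, 1, 1, 0, 0],
                    [1, 1, 1, 1, 0],
                    [1, 1, 0, 0, 1]])
class_b = np.array([[1, 0, 0, 1],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]])

sdc_triple = sdc(class_a, class_b, [0, 1, 2])   # 2/5 in A versus 0/4 in B
sdc_pair = sdc(class_a, class_b, [0, 1])        # 3/5 in A versus 1/4 in B
```

On the size of the search space: with n = 20 genes there are already 2^20 = 1,048,576 candidate subsets, so scoring every subset this way does not scale.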

Page 7: Direct Mining of Differential Patterns

[Fang, Pandey, Gupta, Steinbach and Kumar, TR 09-011, CS@UMN]

Refined SDC measure: “direct”

A measure M is antimonotonic if ∀ A, B: A ⊆ B ⇒ M(A) ≥ M(B)

Details in [Fang, Kuang, Pandey, Steinbach, Myers and Kumar, PSB 2010]

Page 8: An Association-analysis Approach

Systematic and efficient combinatorial search [Agrawal et al. 1994]

Advantages: 1) Systematic & direct 2) Completeness 3) Efficiency

[Figure: lattice of all subsets of the items A, B, C, D, E]

null
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE

Refined SDC measure

A measure M is antimonotonic if ∀ A, B: A ⊆ B ⇒ M(A) ≥ M(B)

[Figure annotation: once a candidate is disqualified, prune all of its supersets]
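The pruning idea on this slide can be sketched as a generic level-wise (Apriori-style) search. This is an illustration of the technique, not the authors' implementation: `levelwise_search`, the `support` measure and the toy samples are all hypothetical, and the only property relied on is that the measure is antimonotonic.

```python
from itertools import combinations

def levelwise_search(items, measure, threshold):
    """Return every itemset S (as a frozenset) with measure(S) >= threshold.

    Assumes `measure` is antimonotonic, so no superset of a disqualified
    set can qualify and such supersets are never evaluated.
    """
    qualified = []
    level = [frozenset([i]) for i in items
             if measure(frozenset([i])) >= threshold]
    while level:
        qualified.extend(level)
        current = set(level)
        size = len(level[0]) + 1
        # Join step: merge qualified sets that differ in exactly one item.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == size}
        # Prune step: every (size - 1)-subset must itself have qualified.
        level = [c for c in candidates
                 if all(c - {x} in current for x in c)
                 and measure(c) >= threshold]
    return qualified

# Toy data (hypothetical): each sample is the set of items present in it.
samples = [{"A", "B", "C"}, {"A", "B", "C"}, {"A", "B"}, {"D"}, {"A", "E"}]

def support(itemset):
    """Fraction of samples containing every item of `itemset` (antimonotonic)."""
    return sum(itemset <= s for s in samples) / len(samples)

patterns = levelwise_search("ABCDE", support, threshold=0.4)
```

In this toy run D and E are disqualified at the first level, so none of the subsets containing them (AD, ADE, ABCDE, and so on) is ever scored.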

Page 9: Validation

• Three lung cancer datasets
– [Bhattacharjee et al. 2001], [Stearman et al. 2005], [Su et al. 2007]
  • All are from Affymetrix microarrays (first two: HG-U95A; third: HG-U133A)
– Lung cancer samples & normal samples
• Combined dataset
– More samples
– Proper normalizations before combining (RMA, DWD, XPN)
– Lung cancer samples (102)
– Normal samples (67)

RMA [Irizarry et al., 2003], DWD [Benito et al., 2004], XPN [Shabalin et al., 2008]

Page 10: Statistical Significance

Phenotype permutation test (n = 1000)

[Figure: panels A, B, C]
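A phenotype permutation test can be sketched as follows. The slides give only the number of permutations (1000), so everything else here is an assumption: the statistic (difference in the fraction of samples on which a pattern is present), the add-one p-value estimate, the function names and the toy data.

```python
import numpy as np

def fraction_difference(present, labels):
    """|fraction of class-1 samples with the pattern - fraction of class-0|."""
    present = np.asarray(present, dtype=float)
    labels = np.asarray(labels)
    return abs(present[labels == 1].mean() - present[labels == 0].mean())

def permutation_p_value(present, labels, statistic=fraction_difference,
                        n_permutations=1000, seed=0):
    """Estimate a p-value by shuffling the class (phenotype) labels."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = statistic(present, labels)
    hits = sum(
        statistic(present, rng.permutation(labels)) >= observed
        for _ in range(n_permutations)
    )
    # Add-one estimate so the reported p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Toy data (hypothetical): pattern present on 12 of 20 cases, 0 of 20 controls.
present = [1] * 12 + [0] * 8 + [0] * 20
labels = [1] * 20 + [0] * 20
p_value = permutation_p_value(present, labels)
```

Shuffling the labels keeps the expression data fixed and breaks only its link to the phenotype, so the shuffled statistics approximate what the measure looks like when there is no real class difference.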

Page 11: Could Subspace DC patterns have been discovered in full-space?

[Figure: scatter plot; x-axis: full-space DC measures; y-axis: subspace DC measures]

– 88 statistically significant size-3 patterns (stars)
– Phenotype permutation based significance cutoff for the full-space measure
– Annotated regions: “Can also be found in full-space” and “Can NOT be found in full-space”

DC (Differential Coexpression)

Page 12: A 10-gene Subspace DC Pattern

[Figure: enriched Ingenuity subnetwork, from www.ingenuity.com]

[Figure annotations: ≈ 60%, ≈ 10%]

Enriched with the TNF-α/NFkB signaling pathway (6/10 overlap with the pathway, P-value: 1.4 × 10^-5)

Suggests that the dysregulation of the TNF-α/NFkB pathway may be related to lung cancer
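For readers unfamiliar with overlap-based enrichment, a hypergeometric tail probability is one common way to score an overlap such as 6 of 10 genes. The sketch below is generic: the slides do not say how the reported P-value was computed, and the pathway and background sizes used here are made-up placeholders, so the result is not expected to reproduce 1.4 × 10^-5.

```python
from math import comb

def hypergeom_enrichment_p(overlap, pattern_size, pathway_size, background_size):
    """P(X >= overlap) when `pattern_size` genes are drawn without replacement
    from `background_size` genes, `pathway_size` of which are in the pathway."""
    total = comb(background_size, pattern_size)
    upper = min(pattern_size, pathway_size)
    tail = sum(
        comb(pathway_size, k)
        * comb(background_size - pathway_size, pattern_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical sizes: a 100-gene pathway in a 10,000-gene background.
p_overlap = hypergeom_enrichment_p(overlap=6, pattern_size=10,
                                   pathway_size=100, background_size=10_000)
```

The smaller the pathway is relative to the background, the less likely a 6-of-10 overlap is by chance, which is why the two unknown sizes dominate the resulting value.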

Page 13: Biological Interpretations

• Specific interpretation
– Enriched cancer-related signaling pathways
  • TNF-α/NFkB
  • WNT
– Target gene sets of cancer-related microRNAs & TFs
  • microRNA: miR-101 ({PIK3C2B, TSC22D1} + AKAP12)
  • Transcription factor (TF): ATF2 ({ETV4, PTHLH} + CBX5)

miR-101 is shown to be down-regulated in cancer [Friedman et al. 2009]

Mutations of ATF2 are shown to be related to cancer [Woo et al. 2002]

Page 14: Summary & Future Directions

• Summary
– Proposed the problem definition & a systematic approach for subspace DC
– Subspace DC analysis can identify many statistically significant & biologically relevant patterns that would have been missed in full-space
• Potential biomedical utility
– Study the demographic and genetic difference within each class
– Phenotype classification with subspace DC patterns
  • Combine DE and subspace DC patterns

DE (Differential Expression); DC (Differential Coexpression)

Page 15: Acknowledgement

• Co-authors at Dept. of Computer Science, Univ. of Minnesota
– Rui Kuang
– Gaurav Pandey
– Michael Steinbach
– Chad Myers
– Vipin Kumar
• Groups: Data Mining for Biomedical Informatics Group; Comp. Bio. Group; Comp. Bio. & Func. Genomic Group
• Conference organizers
• NSF grants #CRI-0551551, #IIS-0308264, #ITR-0325949
• UMR-IBM-Mayo BICB Fellowship

Page 16: Thanks!

• Paper
– Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers and Vipin Kumar, Subspace Differential Coexpression Analysis: Problem Definition and a General Approach, Proceedings of the 15th Pacific Symposium on Biocomputing, 2010
• Source codes: http://vk.cs.umn.edu/SDC
• Questions
– Gang Fang: [email protected]

Thanks!