Posted on 21-Dec-2015
Subspace Differential Coexpression Analysis for the Discovery of Disease-related Dysregulations
Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers and Vipin Kumar
http://www-users.cs.umn.edu/~kumar/dmbio/
Department of Computer Science and Engineering
RECOMB Systems Biology 12/05/2009
Differential Expression (DE)
• Differential Expression (DE)
  – Traditional analysis targets changes in expression level
[Figure: expression level over samples in controls and cases]
[Golub et al., 1999], [Pan, 2002], [Cui and Churchill, 2003], etc.
[Figure: matrix of expression values, genes × samples in controls and cases; from [Kostka & Spang, 2005]]
Differential Coexpression (DC)
• Differential Coexpression (DC)
  – Targets changes in the coherence of expression
[Figure: expression over samples in controls and cases]
Question: Is this gene interesting, i.e., associated with the phenotype?
Answer: No, in terms of differential expression (DE).
However, what if there are two other genes …?
Yes! [Figure: expression over samples in controls and cases]
[Silva et al., 1995], [Li, 2002], [Kostka & Spang, 2005], [Rosemary et al., 2008], [Cho et al., 2009], etc.
Biological interpretations of DC: dysregulation of pathways, mutation of transcription factors, etc.
[Figure: matrix of expression values, genes × samples in controls and cases; from [Kostka & Spang, 2005]]
Differential Coexpression (DC)
• Existing work on differential coexpression
  – Pairs of genes with differential coexpression
    • [Silva et al., 1995], [Li, 2002], [Li et al., 2003], [Lai et al., 2004]
  – Clustering-based differential coexpression analysis
    • [Ihmels et al., 2005], [Watson, 2006]
  – Network-based analysis of differential coexpression
    • [Zhang and Horvath, 2005], [Choi et al., 2005], [Gargalovic et al., 2006], [Oldham et al., 2006], [Fuller et al., 2007], [Xu et al., 2008]
  – Beyond pairwise (size-k) differential coexpression
    • [Kostka and Spang, 2004], [Prieto et al., 2006]
  – Gene-pathway differential coexpression
    • [Rosemary et al., 2008]
  – Pathway-pathway differential coexpression
    • [Cho et al., 2009]
Existing DC work is “full-space”
• Full-space differential coexpression
• May have limitations due to the heterogeneity of
  – Causes of a disease (e.g., genetic differences)
  – Populations affected (e.g., demographic differences)
Motivation: such subspace patterns may be missed by full-space models
Full-space measures: e.g., correlation difference
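The slide names the correlation difference but not its exact form. As a minimal sketch, assuming it means the absolute difference between a gene pair's Pearson correlation in cases and in controls, the hypothetical helper `correlation_difference` below scores one pair using every sample of each class, which is what makes it "full-space".

```python
import numpy as np


def correlation_difference(x, y, is_case):
    """Full-space DC score for one gene pair: |r_cases - r_controls|.

    x, y    : expression values of two genes across all samples
    is_case : boolean array, True for case samples, False for controls
    Each Pearson correlation uses *every* sample of its class.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    r_cases = np.corrcoef(x[is_case], y[is_case])[0, 1]
    r_controls = np.corrcoef(x[~is_case], y[~is_case])[0, 1]
    return abs(r_cases - r_controls)


# Toy data: the pair rises together in controls but moves oppositely in cases.
x = [1, 2, 3, 4, 1, 2, 3, 4]
y = [1, 2, 3, 4, 4, 3, 2, 1]
is_case = [False] * 4 + [True] * 4
print(correlation_difference(x, y, is_case))  # close to 2.0, the maximum possible
```

If only a subset of the case samples lost the correlation, averaging over all cases would dilute the difference, which is the limitation the slide points at.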
Extension to Subspace Differential Coexpression
• Definition of a Subspace Differential Coexpression (SDC) Pattern
  – A set of k genes, $\{g_1, g_2, \ldots, g_k\}$
  – $\pi_A$: fraction of samples in class A on which the k genes are coexpressed
  – $\pi_B$: fraction of samples in class B on which the k genes are coexpressed
Details in [Fang, Kuang, Pandey, Steinbach, Myers and Kumar, PSB 2010]
$\mathrm{SDC} = |\pi_A - \pi_B|$ as a measure of subspace differential coexpression
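A minimal sketch of the measure, writing the two class-wise fractions of coexpressed samples as π_A and π_B and taking SDC as their absolute difference. The slides do not say how "coexpressed on a sample" is decided, so the rule below (the spread of the k genes' values on a sample is at most a threshold `eps`) is a hypothetical stand-in, as are the helper names and toy data.

```python
import numpy as np


def coexpressed_fraction(expr, genes, eps):
    """Fraction of samples (columns of expr) on which `genes` are coexpressed.

    HYPOTHETICAL rule for illustration only: the genes count as coexpressed on a
    sample when the spread (max - min) of their values on that sample is <= eps.
    expr : 2-D array, rows = genes, columns = samples of ONE class
    """
    sub = expr[list(genes), :]                  # k x n_samples slice
    spread = sub.max(axis=0) - sub.min(axis=0)  # per-sample range over the k genes
    return float(np.mean(spread <= eps))


def sdc(expr_a, expr_b, genes, eps):
    """SDC = |pi_A - pi_B| for one candidate gene set."""
    pi_a = coexpressed_fraction(expr_a, genes, eps)
    pi_b = coexpressed_fraction(expr_b, genes, eps)
    return abs(pi_a - pi_b)


# Class A: genes 0-2 agree on 3 of 4 samples; class B: on only 1 of 4.
expr_a = np.array([[0.0, 1.0, 2.0, 0.0],
                   [0.1, 1.1, 2.0, 3.0],
                   [0.0, 0.9, 2.1, -3.0]])
expr_b = np.array([[0.0, 0.0, 0.0, 0.0],
                   [0.1, 2.0, -2.0, 3.0],
                   [0.0, -2.0, 2.0, 1.0]])
print(sdc(expr_a, expr_b, [0, 1, 2], eps=0.5))  # 0.5  (= |0.75 - 0.25|)
```

A gene set then qualifies as a pattern when its score reaches the threshold d. Only a subset of samples in each class has to agree, which is the "subspace" part.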
Problem: given n genes, find all subsets of genes such that $\mathrm{SDC} \ge d$
Given n genes, there are $2^n$ candidate SDC patterns! How can this combinatorial search space be handled effectively?
Similar motivation and challenge as biclustering, but here it is differential biclustering!
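To make the $2^n$ count concrete, the sketch below enumerates every subset of five hypothetical genes A–E, level by level, matching the lattice drawn on the association-analysis slides (size 0 is the empty "null" node). `lattice_levels` is an illustrative helper, not part of the authors' code.

```python
from itertools import combinations


def lattice_levels(genes):
    """All subsets of `genes`, grouped by size (size 0 is the empty 'null' node)."""
    return [list(combinations(genes, k)) for k in range(len(genes) + 1)]


levels = lattice_levels("ABCDE")
sizes = [len(level) for level in levels]
print(sizes)       # [1, 5, 10, 10, 5, 1]
print(sum(sizes))  # 32, i.e. 2**5 candidates for only 5 genes
```

The count doubles with every added gene, so scoring every subset of a genome-scale gene list exhaustively is not feasible.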
Direct Mining of Differential Patterns
[Fang, Pandey, Gupta, Steinbach and Kumar, TR 09-011, CS@UMN]
Refined SDC measure: “direct”
A measure M is antimonotonic if $\forall A, B:\ A \subseteq B \Rightarrow M(A) \ge M(B)$
Details in [Fang, Kuang, Pandey, Steinbach, Myers and Kumar, PSB 2010]
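Why antimonotonicity permits pruning follows in one step from the definition, using the threshold d from the problem statement. If a gene set A fails the threshold, every superset B fails too, so B never needs to be scored:

```latex
M(A) < d \;\text{ and }\; A \subseteq B
\;\Longrightarrow\; M(B) \le M(A) < d
```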
Advantages: 1) Systematic & direct 2) Completeness 3) Efficiency
[Figure: itemset lattice over items A–E]
null
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE
An Association-analysis Approach
Systematic and efficient combinatorial search [Agrawal et al., 1994]
[Figure: the same itemset lattice over items A–E, illustrating pruning]
Refined SDC measure
A measure M is antimonotonic if $\forall A, B:\ A \subseteq B \Rightarrow M(A) \ge M(B)$
Once a candidate is disqualified, prune all of its supersets.
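A minimal sketch of a level-wise (Apriori-style) search with superset pruning, in the spirit of [Agrawal et al., 1994]. `levelwise_search` is a hypothetical helper. The refined SDC measure itself is not spelled out on these slides, so the usage example plugs in classic itemset support over toy transactions as a stand-in antimonotonic measure.

```python
from itertools import combinations


def levelwise_search(items, measure, d):
    """Apriori-style search for every itemset S with measure(S) >= d.

    Complete only if `measure` is antimonotonic (S subset of T => measure(S) >= measure(T)).
    A size-(k+1) candidate is scored only if all of its size-k subsets qualified;
    any candidate with a disqualified subset is pruned without being scored.
    """
    level = [frozenset([i]) for i in sorted(items) if measure(frozenset([i])) >= d]
    result = list(level)
    k = 1
    while level:
        qualified = set(level)
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in qualified for sub in combinations(union, k)
            ):
                candidates.add(union)
        level = [c for c in candidates if measure(c) >= d]
        result.extend(level)
        k += 1
    return result


# Illustrative antimonotonic measure: classic itemset support over toy "transactions".
transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C", "D"}]
scored = []  # records every itemset the search actually evaluates


def support(itemset):
    scored.append(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


found = levelwise_search("ABCD", support, d=0.3)
print(sorted("".join(sorted(s)) for s in found))
# ['A', 'AB', 'ABC', 'AC', 'B', 'BC', 'C']
print(any("D" in s and len(s) > 1 for s in scored))  # False: supersets of {D} were pruned
```

Because {D} fails the threshold, none of its supersets is ever generated or scored. This is where the efficiency comes from, and completeness holds only when the measure is truly antimonotonic.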
• Three lung cancer datasets
  – [Bhattacharjee et al., 2001], [Stearman et al., 2005], [Su et al., 2007]
• All are from Affymetrix microarrays (first two: HG-U95A; third: HG-U133A)
  – Lung cancer samples and normal samples
• Combined dataset
  – More samples
  – Proper normalization before combining (RMA, DWD, XPN)
  – Lung cancer samples (102)
  – Normal samples (67)
Validation
RMA [Irizarry et al., 2003], DWD [Benito et al., 2004], XPN [Shabalin et al., 2008]
Statistical Significance
Phenotype permutation test (n = 1000)
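A minimal sketch of a phenotype permutation test with n = 1000 permutations, assuming the usual scheme of shuffling the class labels and re-scoring the pattern each time. `permutation_pvalue`, `corr_diff`, the seed and the toy data are hypothetical, and the score function can be swapped for any DC measure.

```python
import numpy as np


def permutation_pvalue(score_fn, expr, is_case, n_perm=1000, seed=0):
    """Phenotype-permutation p-value for a pattern score (larger = stronger).

    The class labels are shuffled n_perm times and the pattern is re-scored each time.
    The (1 + count) / (1 + n_perm) form is a common convention that keeps p > 0.
    """
    rng = np.random.default_rng(seed)
    is_case = np.asarray(is_case, dtype=bool)
    observed = score_fn(expr, is_case)
    null = np.array([score_fn(expr, rng.permutation(is_case)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value


def corr_diff(expr, is_case):
    """Example score: |r_cases - r_controls| for the two genes (rows) of expr."""
    r_case = np.corrcoef(expr[0, is_case], expr[1, is_case])[0, 1]
    r_ctrl = np.corrcoef(expr[0, ~is_case], expr[1, ~is_case])[0, 1]
    return abs(r_case - r_ctrl)


# Toy pair: perfectly correlated in 10 controls, perfectly anti-correlated in 10 cases.
t = np.arange(10, dtype=float)
expr = np.vstack([np.concatenate([t, t]), np.concatenate([t, -t])])
is_case = np.array([False] * 10 + [True] * 10)
observed, p = permutation_pvalue(corr_diff, expr, is_case)
print(observed, p)
```

Shuffling labels rather than expression values keeps the gene-gene structure of each sample intact and breaks only its link to the phenotype.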
[Figure: panels A, B and C]
Could Subspace DC patterns have been discovered in full-space?
[Figure: full-space DC measures vs. subspace DC measures (DC: Differential Coexpression), with the phenotype-permutation-based significance cutoff for the full-space measure; regions labeled “Can also be found in full-space” and “Can NOT be found in full-space”]
88 statistically significant size-3 patterns (stars)
A 10-gene Subspace DC Pattern
www.ingenuity.com: enriched Ingenuity subnetwork
≈ 60% vs. ≈ 10%
Enriched with the TNF-α/NFkB signaling pathway (6/10 overlap with the pathway, P-value: $1.4 \times 10^{-5}$)
Suggests that dysregulation of the TNF-α/NFkB pathway may be related to lung cancer
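The slide reports the overlap and P-value from Ingenuity without naming the test, the pathway size, or the gene background. As a hedged illustration only, a one-sided hypergeometric tail (a common choice for overlap enrichment, assumed here) for 6 of 10 pattern genes can be computed as follows. The pathway and background sizes are hypothetical placeholders, so the output is not expected to reproduce the reported value.

```python
from math import comb


def hypergeom_tail(overlap, n_pattern, n_pathway, n_background):
    """P(X >= overlap): chance that a random set of n_pattern genes, drawn without
    replacement from n_background genes of which n_pathway are in the pathway,
    shares at least `overlap` genes with the pathway."""
    upper = min(n_pattern, n_pathway)
    hits = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_pattern - i)
        for i in range(overlap, upper + 1)
    )
    return hits / comb(n_background, n_pattern)


# 6 of the 10 pattern genes fall in the pathway; pathway and background sizes
# below are HYPOTHETICAL placeholders, not the values Ingenuity used.
print(hypergeom_tail(6, 10, n_pathway=200, n_background=20000))
```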
Biological Interpretations
• Specific interpretation
  – Enriched cancer-related signaling pathways
    • TNF-α/NFkB
    • WNT
  – Target gene sets of cancer-related microRNAs and TFs
    • microRNA: miR-101 ({PIK3C2B, TSC22D1} + AKAP12)
    • Transcription factor (TF): ATF2 ({ETV4, PTHLH} + CBX5)
miR-101 has been shown to be down-regulated in cancer [Friedman et al., 2009]
Mutations of ATF2 have been shown to be related to cancer [Woo et al., 2002]
Summary & Future Directions
• Summary
  – Proposed the problem definition and a systematic approach for subspace DC analysis
  – Subspace DC analysis can identify many statistically significant and biologically relevant patterns that would have been missed in full-space
• Potential biomedical utility
  – Study the demographic and genetic differences within each class
  – Phenotype classification with subspace DC patterns
    • Combine DE and subspace DC patterns
DE (Differential Expression); DC (Differential Coexpression)
Acknowledgement
• Co-authors at Dept. Computer Science, Univ. of Minnesota
• Conference organizers
• NSF grants #CRI-0551551, #IIS-0308264, #ITR-0325949
• UMR-IBM-Mayo BICB Fellowship
Rui Kuang
Gaurav Pandey
Michael Steinbach
Chad Myers
Vipin Kumar
Data Mining for Biomedical Informatics Group
Comp. Bio. Group
Comp. Bio. & Func. Genomic Group
• Paper
  – Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers and Vipin Kumar, “Subspace Differential Coexpression Analysis: Problem Definition and a General Approach”, Proceedings of the 15th Pacific Symposium on Biocomputing, 2010
• Source code: http://vk.cs.umn.edu/SDC
• Questions:
  – Gang Fang: [email protected]
Thanks!