Department of Biomedical Informatics

Mining Gene Co-expression Network for Cancer Biomarker Prediction

Kun Huang, Department of Biomedical Informatics

OSUCCC Biomedical Informatics Shared Resource

Outline

• Introduction
• Co-expression network for Breast cancer
  • Frequent cancer co-expression network
  • Tissue-tissue network between stroma and tumor mass
• Other applications
  • Chronic lymphocytic leukemia
  • Glioblastoma
• Discussion

• Correlation / co-expression
• Time-course data
• Bayesian network
• Boolean network
• …

Boolean Network

Sahoo et al. Genome Biology 2008 9:R157  

Gene Co-Expression

HMMR siRNA

Pearson Correlation Coefficient

• Ranges from 1 to -1.
• [Figure: example scatter plots for r = 1 and r = -1]
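As a small illustration of the coefficient on this slide, the sketch below computes the Pearson correlation between two expression profiles in plain Python. The function name `pearson` and the three toy gene profiles are hypothetical and only serve the example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical expression profiles of three genes across five samples.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.0, 4.0, 6.0, 8.0, 10.0]  # rises with gene_a
gene_c = [5.0, 4.0, 3.0, 2.0, 1.0]   # falls as gene_a rises

r_ab = pearson(gene_a, gene_b)
r_ac = pearson(gene_a, gene_c)
```

Here `r_ab` sits at the positive end of the range and `r_ac` at the negative end, matching the two extreme cases shown on the slide.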

Gene Co-Expression Network

• Expansion
  • Negative correlation
  • Multiple breast cancer datasets
  • More anchor genes
  • …
• Is there a way to find all highly correlated genes in multiple datasets?
• Do these genes form clusters?

Frequent Gene Co-expression Network Mining

• Genes that appear in tight networks across multiple disease datasets may indicate functionally related biological modules, and can therefore provide insights into disease cell physiology and new directions for research.

Frequent network mining

• CODENSE
  • Search for frequent coherent dense subgraphs across large numbers of massive graphs
  • Unsupervised bottom-up clustering on unweighted, undirected network

Data selection and correlation

• Selected 23 datasets from Gene Expression Omnibus (GEO):
  • Search term “metastatic cancer”
  • Contain both control and tumor, # samples > 8
  • Only primary tumor biopsy
• Correlation: |PCC| > 0.75 (really high similarity)
• For CODENSE:
  • Edge support: the edge appears in at least 4 datasets
  • Connectivity ratio r > 40% (r = L / [n(n-1)/2], for L edges among n nodes)
  • # of nodes > 20
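The thresholds on this slide can be sketched in a few lines. This is not CODENSE itself, only an illustration of the edge filter (|PCC| > 0.75), the edge-support count (at least 4 datasets) and the connectivity ratio r = L / [n(n-1)/2]. The function names, the input layout (one dict of pairwise PCC values per dataset) and the toy genes G1 to G3 are assumptions made for the example.

```python
from itertools import combinations

PCC_CUTOFF = 0.75   # |PCC| threshold from the slide
MIN_SUPPORT = 4     # an edge must appear in at least this many datasets

def frequent_edges(pcc_per_dataset, cutoff=PCC_CUTOFF, min_support=MIN_SUPPORT):
    """Return edges whose |PCC| exceeds `cutoff` in at least `min_support` datasets.

    `pcc_per_dataset` is a list of dicts, each mapping a gene pair (a, b) to its PCC.
    """
    support = {}
    for pcc in pcc_per_dataset:
        for (a, b), r in pcc.items():
            if abs(r) > cutoff:
                edge = frozenset((a, b))
                support[edge] = support.get(edge, 0) + 1
    return {edge for edge, count in support.items() if count >= min_support}

def connectivity_ratio(nodes, edges):
    """r = L / [n(n-1)/2], where L counts edges with both ends in `nodes`."""
    node_list = sorted(set(nodes))
    n = len(node_list)
    if n < 2:
        return 0.0
    present = sum(1 for pair in combinations(node_list, 2) if frozenset(pair) in edges)
    return present / (n * (n - 1) / 2)

# Hypothetical PCC values for three genes in five datasets.
datasets = [{("G1", "G2"): 0.9, ("G1", "G3"): 0.8, ("G2", "G3"): 0.2}] * 4
datasets.append({("G1", "G2"): -0.8, ("G1", "G3"): 0.1, ("G2", "G3"): 0.9})

edges = frequent_edges(datasets)
ratio = connectivity_ratio(["G1", "G2", "G3"], edges)
```

In this toy input the G1-G2 edge is supported by five datasets and G1-G3 by four, while G2-G3 passes the cutoff only once and is dropped, so two of the three possible edges remain.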

Results from CODENSE

• 44 networks (clusters) are identified
• # of nodes: 21 ~ 74 (average 44)
• Connectivity: 0.41 ~ 0.78

GO Enrichment Analysis on the Networks

• Networks with enriched GO terms associated with at least 1/3 of the genes
  • Immune response/system – 15
  • Protein translation (ribosome) – 5
  • Development – 4
  • Metabolism and energy (oxidative phosphorylation or monocarboxylic acid metabolism) – 3
  • Cell cycle – 2
  • Muscle contraction – 1
• 14 networks do NOT satisfy the above criterion
  • Potential new functions
  • New interactions

Use cluster 2 to predict survival outcome

• NKI-295 dataset
• Unsupervised clustering: k-means, k = 2, 100 random initializations
• Kaplan-Meier curve and log-rank test for survival analysis and comparison
• Test for different patient groups
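Once patients have been split into two groups (for example by the k-means step), the log-rank test compares their survival. The sketch below is a minimal two-group log-rank test in plain Python, not the code used for the slides. The function name, the input layout and the follow-up times are hypothetical, and the p-value uses the chi-square distribution with one degree of freedom.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    times_*  : follow-up time per patient
    events_* : 1 if the event (death) was observed, 0 if censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        at_risk_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        deaths_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        deaths_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n = at_risk_a + at_risk_b
        d = deaths_a + deaths_b
        observed_a += deaths_a
        expected_a += d * at_risk_a / n
        if n > 1:
            variance += at_risk_a * at_risk_b * d * (n - d) / (n * n * (n - 1))

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail probability, 1 d.o.f.
    return chi2, p_value

# Hypothetical follow-up times (all events observed) for two patient clusters.
good = [10, 12, 14, 16, 18, 20]
poor = [1, 2, 3, 4, 5, 6]
chi2_diff, p_diff = logrank_test(good, [1] * 6, poor, [1] * 6)
chi2_same, p_same = logrank_test([1, 2, 3], [1, 1, 1], [1, 2, 3], [1, 1, 1])
```

Two clearly separated groups give a small p-value, while two identical groups give a statistic of zero.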

Predict Survival Outcome

Predict Survival Outcome

Relation to BRCA1

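Finding New Gene Functions

| Test | siRNA | HDR (relative) | Centrosome (HeLa cell) | Centrosome (B.C. cell) |
|---|---|---|---|---|
| 1 | Control siRNA | 1 | 2% | 4% |
| 2 | BRCA1 | 0.16 | ND | 22% |
| 3 | BRCA2 | 0.02 | ND | ND |
| 4 | HMMR | 1.33 | 10% | 19% |
| 5 | KIAA0101 | 0.52 | 10% | 19% |
| 6 | ASPM | 1.0 | 2% | ND |
| 7 | NUSAP1 | 0.5 | 5% | ND |
| 8 | DLG7 | 0.5 | ND | ND |
| 9 | KIF14 | 0.33 | 20% | ND |
| 10 | KIF23 | ND | 27% | 15% |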

Finding New Gene Functions

KIAA0101

ER-Negative Breast Cancer

ER-Negative Breast Cancer

Tumor Microenvironment (TME)

Cell, Volume 100, Issue 1, 7 January 2000, Pages 57-70

Kalluri et al. Nature Reviews Cancer, published online 30 March 2006 | doi:10.1038/nrc1877

Bipartite Graph

[Figure: bipartite graph with stroma nodes on one side and tumor nodes on the other]

Network Density (r)

• For a bipartite network with M+N nodes (M nodes on one side and N nodes on the other) and K edges: $r = K/(MN)$.
• For a weighted bipartite network with M+N nodes and K edges with weights $W_i$: $r = \sum_{i=1}^{K} W_i/(MN)$.
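The two density definitions can be checked with a short sketch. The function name `bipartite_density`, the edge-dictionary layout and the toy stroma/tumor node names are hypothetical; the unweighted case is treated as the weighted case with every weight equal to 1.

```python
def bipartite_density(edge_weights, m, n):
    """Density of a bipartite network with m nodes on one side and n on the other.

    `edge_weights` maps (left_node, right_node) -> weight. With all weights
    equal to 1 this reduces to r = K / (M * N) for K edges.
    """
    return sum(edge_weights.values()) / (m * n)

# Toy network: 2 stroma nodes, 3 tumor nodes, 3 edges.
weighted = {("s1", "t1"): 0.9, ("s1", "t2"): 0.8, ("s2", "t3"): 0.7}
unweighted = {edge: 1 for edge in weighted}

r_weighted = bipartite_density(weighted, m=2, n=3)
r_unweighted = bipartite_density(unweighted, m=2, n=3)
```

With K = 3 edges and MN = 6 possible edges the unweighted density is 0.5, and the weighted density is the weight sum 2.4 divided by 6.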

Bipartite Quasi-clique Discovery Algorithm

• A greedy algorithm
• Original algorithm for quasi-clique finding is from Ou and Zhang (2007). A new multimembership clustering method. J. of Ind. and Man. Opt., 3(4): 619-624.
• Modified for the bipartite graph
• Four steps:
  1. Set the threshold on edge weight: w0 = γ · max(wi).
  2. Initialize a new search: pick the edge with the maximal weight (larger than w0) that has not been assigned to any network as the first edge of a new network.
  3. Grow: alternately add nodes to the network from both sides, choosing the node that contributes most to the network density, as long as its contribution to the density is higher than an adaptive threshold defined by two parameters λ and t. Stop when no new node can be added, then go to Step 2.
  4. Merge: iteratively merge networks with more than 50% overlap (with respect to the smaller one).
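The four steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation. The slide does not give the adaptive threshold, so the code assumes a node is accepted when its mean weight to the opposite side is at least α times the current density, with α = 1 - 1/(2λ(n + t)) for a network of n nodes. That form, the definition of a node's contribution, the node-count overlap measure, and all names (`find_bipartite_quasi_cliques`, `gamma`, `lam`, `t`, `overlap`) are assumptions. Weights are assumed non-negative.

```python
def find_bipartite_quasi_cliques(weights, gamma=0.7, lam=2.0, t=2.0, overlap=0.5):
    """Greedy sketch; `weights` maps (left_node, right_node) -> non-negative weight."""
    left_nodes = {u for u, _ in weights}
    right_nodes = {v for _, v in weights}
    w0 = gamma * max(weights.values())  # Step 1: threshold on edge weight

    def density(left, right):
        total = sum(weights.get((u, v), 0.0) for u in left for v in right)
        return total / (len(left) * len(right))

    def contribution(node, side, other):
        # Mean weight between a candidate node and the opposite side of the network.
        pairs = [(node, y) if side == 0 else (y, node) for y in other]
        return sum(weights.get(p, 0.0) for p in pairs) / len(pairs)

    networks = []
    for (u, v), weight in sorted(weights.items(), key=lambda item: -item[1]):
        if weight <= w0:
            break  # remaining edges are too weak to seed a network
        if any(u in left and v in right for left, right in networks):
            continue  # Step 2: this edge is already assigned to a network
        left, right = {u}, {v}
        side, failures = 0, 0
        while failures < 2:  # Step 3: stop once neither side accepts a node
            own, other, pool = (left, right, left_nodes) if side == 0 else (right, left, right_nodes)
            candidates = sorted(pool - own)
            best = max(candidates, key=lambda x: contribution(x, side, other), default=None)
            # ASSUMED adaptive threshold; the slide only says it depends on two parameters.
            alpha = 1 - 1 / (2 * lam * (len(left) + len(right) + t))
            if best is not None and contribution(best, side, other) >= alpha * density(left, right):
                own.add(best)
                failures = 0
            else:
                failures += 1
            side = 1 - side
        networks.append((left, right))

    merged = True
    while merged:  # Step 4: merge networks that overlap by more than `overlap`
        merged = False
        for i in range(len(networks)):
            for j in range(i + 1, len(networks)):
                (l1, r1), (l2, r2) = networks[i], networks[j]
                shared = len(l1 & l2) + len(r1 & r2)
                smaller = min(len(l1) + len(r1), len(l2) + len(r2))
                if shared > overlap * smaller:
                    networks[i] = (l1 | l2, r1 | r2)
                    del networks[j]
                    merged = True
                    break
            if merged:
                break
    return networks

# Toy input: a dense 2 x 2 block plus a few weak edges (hypothetical values).
toy = {
    ("S1", "T1"): 0.9, ("S1", "T2"): 0.85, ("S2", "T1"): 0.8, ("S2", "T2"): 0.9,
    ("S3", "T3"): 0.3, ("S1", "T3"): 0.1, ("S3", "T1"): 0.1,
}
found = find_bipartite_quasi_cliques(toy)
```

On the toy input the search is seeded at the S1-T1 edge, grows to include T2 and then S2, and rejects S3 and T3 because their contribution falls far below the current density.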

Workflow

1. Select a breast cancer dataset from GEO: GSE5847 contains 47 samples with separate microarray data for stroma and tumor, separated using laser capture microdissection.
2. Compute Pearson Correlation Coefficients (PCC) for every pair of genes between the stroma and the tumor.
3. Use the PCC values as the weights for the edges and set the three parameters γ (0.7), λ (2), and t (2) to run the bipartite quasi-clique finding algorithm.
4. Select the top 10 networks for further analysis.
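The PCC step of this workflow pairs every stroma gene with every tumor gene across the same patients. A minimal sketch of that step is shown below. The function names, the dict-of-lists input layout and the expression values are hypothetical, and the gene symbols are borrowed only as labels.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def stroma_tumor_weights(stroma_expr, tumor_expr):
    """Map every (stroma_gene, tumor_gene) pair to the PCC of their profiles.

    Both inputs map gene -> expression values over the same patients, in the
    same patient order, so each pair is correlated across matched samples.
    """
    return {
        (s_gene, t_gene): pearson(s_values, t_values)
        for s_gene, s_values in stroma_expr.items()
        for t_gene, t_values in tumor_expr.items()
    }

# Hypothetical expression values for four matched patients.
stroma = {"DCN": [1.0, 2.0, 3.0, 4.0]}
tumor = {"MMP2": [2.0, 4.0, 6.0, 8.0], "COL1A1": [4.0, 3.0, 2.0, 1.0]}
pcc_weights = stroma_tumor_weights(stroma, tumor)
```

The resulting dictionary has one entry per stroma-tumor gene pair and can serve as the edge-weight input of a bipartite network search.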

Results

• Stroma-tumor network

| Network | Stroma | Stroma GO BP (p-values) | Tumor | Tumor GO BP (p-values) | Common |
|---|---|---|---|---|---|
| 1 | 19 | M phase/cell cycle (1e-6) | 68 | Cell cycle/M phase (0) | 13 |
| 2 | 35 | Cell-cell signaling (8e-6); Neuropeptide signaling pathway (0.000686) | 49 | Cell-cell signaling (0.000549); Synaptic transmission (0.000084) | 32 |
| 3 | 33 | Immune response (0); Response to virus (0) | 36 | Immune response (0); Response to virus (0) | 31 |
| 4 | 29 | Immune response (0) | 29 | Immune response (0); Positive regulation of B cell proliferation (0.000536) | 18 |
| 5 | 23 | Cell-cell signaling (0.003158); Synaptic transmission (0.005969) | 28 | Secretion to cell (0.000218); Cell-cell signaling (0.000441) | 15 |
| 6 | 17 | M phase (0); Cell cycle (2e-6) | 31 | M phase (0); Mitosis (0) | 7 |
| 7 | 12 | Immune response (8e-6) | 27 | Follicle-stimulation hormone secretion (33e-6) | 10 |
| 8 | 13 | Extracellular space (1e-6) (CC) | 19 | Extracellular space (9.7e-5) (CC) | 7 |
| 9 | 7 | Wound healing (0.000114) | 25 | Amine metabolic process (3.5e-6) | 6 |
| 10 | 5 | | 24 | Extracellular region part (0.000267) (CC) | 4 |

Extracellular Matrix Network

• Tumor microenvironment

| Network | Stroma | Tumor |
|---|---|---|
| 8 | AMPH, DCN, ELN, FBN1, HTRA1, LRRC17, OMD, PDGFRL, SFRP4, SGCD, SPON1, TGFB3, ZFHX4 | AMPH, ANGPTL2, BNC2, COL1A1, DPT, ECM2, EHD2, FAT4, GLT8D2, GUCY1B3, HTRA1, KCNJ8, LRRC17, MMP2, OLFML1, PDGFRL, SFRP4, SPON1, ZFHX4 |

Outline

• Introduction
• Co-expression network for Breast cancer
  • Frequent cancer co-expression network
  • Tissue-tissue network between stroma and tumor mass
• Other applications
  • Chronic lymphocytic leukemia (CLL)
  • Glioblastoma
• Discussion

CLL Prognostic Biomarker

• CLL is the most common adult leukemia in the Western world. It is highly heterogeneous and can be indolent or progressive.
• Prognosis at an early stage is crucial for the survival of progressive patients, and it lets indolent patients avoid unnecessary adverse treatment.
• Biological prognostic markers:
  • Serum markers (TK, B2M, sCD23)
  • FISH cytogenetics
  • IgVH mutational status - determination is time consuming and expensive
  • CD38 expression - actually independent of IgVH mutational status
  • ZAP-70 expression - not 100% correlated with IgVH mutational status, only accurate when patients are in the progressive stage

Network 17

• 51 genes, including ZAP-70 and CD38
• r = 0.4142
• Including known ZAP-70 interacting genes - CD8A, CD3G, CD3D, CD247

Highly enriched Functions of Immune Response

Workflow of CLL Prognostic Biomarker Selection

[Flowchart; box labels as recovered from the slide]

• Compute gene exp level difference on IgVH mu+/- groups: genes with exp fold change > 1.5, p < 0.05
• Test the prediction accuracy of each gene on IgVH mutation status: cross validation
• Select a group of feature genes that can differentiate IgVH mu+/- groups: mRMR
• Identify potential prognostic biomarkers
• Further select prognostic biomarkers by testing on separate CLL dataset

Gene counts shown in the flowchart: 40, 10, 5, 11, 6, 40

Differential Expression of Genes between IgVH mutation +/-

| Gene | p-value (Unmutated vs Mutated) | Mean fold change (Unmutated vs Mutated) | p-value (Patients vs Normal) |
|---|---|---|---|
| SH2D1A | 1.3E-3 | 1.944 | 0.089 |
| IL2RB | 8.1E-5 | 1.821 | 4.8E-16 |
| KLRK1 | 4.9E-3 | 1.813 | 0.0079 |
| CD247 | 1.6E-4 | 1.807 | 7.1E-8 |
| GZMB | 3.1E-3 | 1.719 | 6.2E-11 |
| CD3G | 0.017 | 1.685 | 0.41 |
| CD3D | 1.4E-4 | 1.621 | 4.3E-16 |
| GZMK | 0.022 | 1.586 | 9.2E-11 |
| CD8A | 9.9E-5 | 1.576 | 3.5E-9 |
| NKG7 | 8.3E-4 | 1.560 | 1.3E-9 |
| ZAP70 | 7.9E-4 | -1.403 | 5.5E-12 |
| LAG3 | 0.023 | -1.598 | 0.028 |

Prediction of IgVH Mutational Status with Individual Genes

| Genes | Accuracy | Subcellular location |
|---|---|---|
| SH2D1A | 57.32% | cytoplasmic |
| IL2RB | 68.84% | membrane |
| KLRK1 | 63.67% | membrane |
| CD247 | 66.03% | membrane |
| GZMB | 57.13% | secreted |
| CD3G | 62.52% | membrane |
| CD3D | 64.27% | membrane |
| GZMK | 57.58% | secreted |
| CD8A | 68.31% | membrane |
| NKG7 | 64.94% | membrane |
| ZAP70 | 68.46% | cytoplasmic |
| LAG3 | 59.53% | membrane |
| ZAP70+IL2RB | 73.22% | - |
| ZAP70+IL2RB+CD8A | 74.62% | - |

• Two groups of patients (GDS1494): 49 IgVH mu-; 51 IgVH mu+
• Each gene / gene set was tested independently
• A linear classifier with 20% hold out and 100 repeats
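The evaluation protocol in the last bullet (20% hold-out, 100 repeats) can be sketched for a single gene as follows. The slides do not say which linear classifier was used, so this example assumes the simplest one-feature case, a threshold halfway between the two class means. The function name, the fixed random seed and the expression values are hypothetical.

```python
import random

def holdout_accuracy(values, labels, test_fraction=0.2, repeats=100, seed=0):
    """Mean accuracy of a one-feature linear classifier over random hold-out splits."""
    rng = random.Random(seed)
    indices = list(range(len(values)))
    n_test = max(1, round(test_fraction * len(values)))
    accuracies = []
    for _ in range(repeats):
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        class_0 = [values[i] for i in train if labels[i] == 0]
        class_1 = [values[i] for i in train if labels[i] == 1]
        if not class_0 or not class_1:
            continue  # skip splits where a class is missing from training
        mean_0 = sum(class_0) / len(class_0)
        mean_1 = sum(class_1) / len(class_1)
        threshold = (mean_0 + mean_1) / 2  # 1-D linear decision boundary (assumed)
        high_class = 1 if mean_1 > mean_0 else 0
        correct = 0
        for i in test:
            predicted = high_class if values[i] > threshold else 1 - high_class
            correct += predicted == labels[i]
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)

# Hypothetical expression values: label 0 = IgVH mutated, 1 = unmutated.
expression = [1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 0.7, 1.0, 1.1, 0.9,
              5.0, 5.2, 4.8, 5.1, 4.9, 5.3, 4.7, 5.0, 5.1, 4.9]
status = [0] * 10 + [1] * 10
accuracy = holdout_accuracy(expression, status)
```

The toy data are perfectly separable, so every split is classified correctly. Real single-gene profiles overlap between groups, which is why the accuracies in the table stay well below 100%.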

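Top Ten Genes Selected by mRMR

| Rank | Name | mRMR Score |
|---|---|---|
| 1 | IL2RB | 0.101 |
| 2 | LAG3 | 0.020 |
| 3 | RASGRP1 | 0.029 |
| 4 | CD8A | 0.021 |
| 5 | XCL1 | 0.011 |
| 6 | ZAP70 | 0.018 |
| 7 | CD79A | 0.001 |
| 8 | FMNL1 | 0.000 |
| 9 | KLRK1 | 0.000 |
| 10 | CST7 | 0.002 |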

Cross Check with Outcome Data

• LAG3: involved in T-cell-dependent B-cell activation, reported recently to be highly correlated to IgVH mutational status

• IL2RB: involved in endocytosis and transduction of mitogenic signal of IL2, expression on B-cells was linked to CLL

• CD8A and CD247: expression of CD8A on B-cells has been linked to CLL

• KLRK1: involved in immune surveillance exerted by T/B-cells

Using GSE10138

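Application to GBM

| Cluster | p-value | # Genes |
|---|---|---|
| A | 0.0010946 | 79 |
| B | 0.00054934 | 87 |
| C | 0.0016763 | 23 |
| D | 0.0063116 | 466 |
| E | 0.0057298 | 154 |
| F | 0.000957 | 79 |
| G | 0.0010599 | 29 |
| H | 0.0086392 | 303 |
| I | 0.00098224 | 39 |
| J | 0.0097023 | 21 |
| K | 0.0061901 | 97 |
| L | 0.000352 | 42 |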

| | ID | Name | P-value | Term in Query | Term in Genome |
|---|---|---|---|---|---|
| 1 | GO:0002376 | immune system process | 5.380E-37 | 119 | 1258 |
| 2 | GO:0006955 | immune response | 4.421E-35 | 95 | 832 |
| 3 | GO:0006952 | defense response | 1.172E-22 | 75 | 760 |
| 4 | GO:0002684 | positive regulation of immune system process | 2.644E-21 | 47 | 303 |
| 5 | GO:0002682 | regulation of immune system process | 1.374E-20 | 58 | 493 |
| 6 | GO:0050776 | regulation of immune response | 1.825E-17 | 41 | 277 |
| 7 | GO:0009611 | response to wounding | 4.149E-17 | 62 | 658 |
| 8 | GO:0050778 | positive regulation of immune response | 6.545E-16 | 33 | 188 |
| 9 | GO:0006954 | inflammatory response | 1.640E-15 | 47 | 414 |
| 10 | GO:0002252 | immune effector process | 4.530E-14 | 36 | 260 |

Functional Enrichment analysis using IPA for cluster D.

| | ID | Name | P-value | Term in Query | Term in Genome |
|---|---|---|---|---|---|
| 1 | GO:0008219 | cell death | 8.245E-3 | 28 | 1385 |
| 2 | GO:0016265 | death | 8.829E-3 | 28 | 1390 |
| 3 | GO:0010033 | response to organic substance | 1.299E-2 | 17 | 605 |
| 4 | GO:0012501 | programmed cell death | 1.735E-2 | 26 | 1279 |
| 5 | GO:0034097 | response to cytokine stimulus | 1.742E-2 | 7 | 93 |
| 6 | GO:0009628 | response to abiotic stimulus | 2.393E-2 | 14 | 443 |
| 7 | GO:0048545 | response to steroid hormone stimulus | 2.469E-2 | 10 | 225 |
| 8 | GO:0051093 | negative regulation of developmental process | 3.889E-2 | 18 | 728 |
| 9 | GO:0006915 | apoptosis | 4.244E-2 | 25 | 1265 |

GO Enrichment results using ToppGene for Cluster E (GO: Biological Processes)

Functional Enrichment analysis using IPA for cluster E. The x-axis shows the log (base 10) of p-values of the enriched terms using the Fisher’s exact tests.
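For readers unfamiliar with how such enrichment p-values arise, the sketch below computes a one-sided Fisher's exact test for over-representation, which is the upper tail of a hypergeometric distribution. It is an illustration only: the function name is hypothetical, the tools named on the slides may apply additional corrections, and the genome size they used is not stated, so the example uses small made-up counts.

```python
from math import comb

def enrichment_p_value(hits, query_size, term_size, genome_size):
    """One-sided Fisher's exact test: P(X >= hits) for a hypergeometric X.

    hits        : query genes annotated with the term ("Term in Query")
    query_size  : number of genes in the cluster
    term_size   : genes annotated with the term genome-wide ("Term in Genome")
    genome_size : total number of annotated genes (background)
    """
    upper = min(query_size, term_size)
    total = comb(genome_size, query_size)
    tail = sum(
        comb(term_size, k) * comb(genome_size - term_size, query_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Made-up counts: 10 genes in the background, 4 carry the term,
# and 2 of the 3 genes in the query cluster carry it.
p_small = enrichment_p_value(hits=2, query_size=3, term_size=4, genome_size=10)
p_all = enrichment_p_value(hits=0, query_size=3, term_size=4, genome_size=10)
```

Because the counts are exact integers, the same function works for cluster-sized inputs such as the "Term in Query" and "Term in Genome" columns above once a background size is chosen.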

Summary and Future Work

Gene co-expression networks provide rich information in predicting gene functions and disease mechanisms

Need to be integrated with other networks such as PPI

Summary and Future Work

• Ongoing work 1: More biological and clinical validation
  • Tissue microarray – at protein level
• Ongoing work 2: Multiple tissue network for TME
  • Microarray data for epithelial cells, fibroblast cells, endothelial cells, macrophages
  • Moving to RNA-seq
• Ongoing work 3: Biclique mining algorithm using frequent item set and graph summarization

Summary and Future Work

• Ongoing work 4: Integrating multiple networks – disease network, phenotype network

Barabasi A-L, Network medicine – from obesity to “Diseasome”, NEJM, 357(4): 404-407, 2007.