Page 1: BioRDF Breakout

BioRDF Breakout

Introduction – Kei Cheung

MAGE-TAB – Michael Miller

vOID – Jun Zhao (remote)

aTag – Matthias Samwald (remote)

Discussion – All

Page 2: BioRDF Breakout

BioRDF Breakout: Microarray Use Case

Kei Cheung, Ph.D.

Associate Professor

Yale Center for Medical Informatics

HCLS IG Face-to-Face Meeting, Santa Clara, California, November 2-3, 2009

Page 3: BioRDF Breakout

Introduction

Whole-genome expression profiling has created a revolution in the way we study disease and basic biology.

DNA microarrays allow scientists to quantify thousands of genomic features in a single experiment.

Since 1997, the number of publications based on analysis of gene expression microarray data has grown from 30 to over 5,000 per year.

Major public microarray data repositories have been created in different countries (e.g., NCBI GEO, EBI ArrayExpress, and CIBEX).

Page 4: BioRDF Breakout

Microarray Workflow

Page 5: BioRDF Breakout

An Example of differentially expressed genes

Page 6: BioRDF Breakout

Importance of Integrating Microarray Data

Due to the high cost and low reproducibility of many microarray experiments, it is not surprising to find a limited number of patient samples in each study.

Very few marker genes are identified in common among different studies involving patients with the same disease.

Merging data sets from multiple studies is both of great interest and a challenge: it increases the sample size, which may in turn increase the power of statistical inferences.

The integration of external information resources is essential in interpreting intrinsic patterns and relationships in large-scale gene expression data
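One simple way to picture merging data sets from multiple studies is to standardize each gene within its own study and then pool the samples for genes measured in every study. The sketch below is only an illustration under that assumption; the function names and toy values are hypothetical, and real cross-study integration usually needs platform mapping and batch-effect correction that are not shown here.

```python
from statistics import mean, pstdev

def zscore_per_gene(study):
    """Standardize each gene's values within one study (mean 0, sd 1)."""
    out = {}
    for gene, values in study.items():
        mu, sd = mean(values), pstdev(values)
        # A gene with no variation gets all zeros instead of a division by zero.
        out[gene] = [(v - mu) / sd if sd else 0.0 for v in values]
    return out

def merge_studies(*studies):
    """Pool samples across studies for the genes measured in all of them."""
    common = set.intersection(*(set(s) for s in studies))
    scaled = [zscore_per_gene(s) for s in studies]
    return {g: [v for s in scaled for v in s[g]] for g in sorted(common)}

# Hypothetical toy data: gene -> expression values, one value per sample.
study_a = {"GENE1": [1.0, 2.0, 3.0], "GENE2": [5.0, 5.5, 6.0]}
study_b = {"GENE1": [10.0, 30.0], "GENE3": [0.1, 0.2]}

merged = merge_studies(study_a, study_b)
print(merged)  # only GENE1 is shared, now with 3 + 2 = 5 pooled samples
```

Pooling raises the sample count for the shared gene from three (or two) to five, which is the sample-size gain the slide refers to.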

Page 7: BioRDF Breakout

Microarray Data Standards

MGED

MIAME

MAGE-ML

MAGE-TAB

Page 8: BioRDF Breakout

Some Examples

Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes (Jiang et al. 2004 BMC Bioinformatics)

Large-scale integration of cancer microarray data identifies a robust common cancer signature (Xu et al. 2007 BMC Bioinformatics)

What about neurosciences?

Page 9: BioRDF Breakout

Access to and Use of Microarray Data in Neuroscience

NIH Neuroscience Microarray Consortium

Public repositories such as GEO and ArrayExpress (including data generated from neuroscience microarray experiments)

Brain atlases (e.g., Allen Brain Atlas and GenSAT)

Page 10: BioRDF Breakout

Ontology-Based Integration

Microarray experiment 1

Microarray experiment 2

Neuron ontology

Brain region (e.g., entorhinal cortex, hippocampus, primary visual cortex)

Layer (e.g., Layer 2 of the entorhinal cortex)

Neuron (e.g., stellate island neuron, pyramidal neuron)

Relations: Part-of (twice), Input to
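The idea of linking experiments through an ontology can be sketched by following part-of links upward, so that experiments annotated at different levels of detail can still be retrieved together. This is a minimal sketch, assuming the part-of links run from neuron to layer to brain region; the edge table, the experiment annotations, and the function names are all hypothetical.

```python
# Hypothetical part-of edges: child -> parent (illustrative, acyclic).
PART_OF = {
    "stellate island neuron": "entorhinal cortex layer 2",
    "entorhinal cortex layer 2": "entorhinal cortex",
}

# Hypothetical annotations: experiment -> most specific anatomical term.
ANNOTATIONS = {
    "experiment 1": "stellate island neuron",
    "experiment 2": "entorhinal cortex",
}

def ancestors(term, part_of=PART_OF):
    """Return the term followed by everything it is transitively part of."""
    chain = [term]
    while chain[-1] in part_of:
        chain.append(part_of[chain[-1]])
    return chain

def experiments_for(region, annotations=ANNOTATIONS):
    """Experiments annotated with the region or with any part of it."""
    return sorted(e for e, t in annotations.items() if region in ancestors(t))

print(experiments_for("entorhinal cortex"))
```

Asking for the brain region returns both experiments even though one is annotated only at the neuron level.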

Page 11: BioRDF Breakout

Example Federated Queries

Retrieve a list of differentially expressed genes between different brain regions (e.g., hippocampus and entorhinal cortex) for normally aged human subjects.

Retrieve a list of differentially expressed genes for the same brain region of normal human subjects and AD patients.

Using these lists of genes one can issue (federated) queries to retrieve additional information about the genes for various types of analyses (e.g., GO term enrichment).
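As an illustration of the GO term enrichment step mentioned above, one common formulation is a one-sided hypergeometric test: how likely is it to see at least this many genes carrying a term in a list of this size drawn from the background? The sketch below assumes that formulation; the gene identifiers and set contents are hypothetical, and it ignores multiple-testing correction.

```python
from math import comb

def hypergeom_upper_tail(population, annotated, selected, hits):
    """P(X >= hits) when `selected` genes are drawn from `population`,
    of which `annotated` carry the term."""
    total = comb(population, selected)
    upper = min(annotated, selected)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(hits, upper + 1)
    )
    return favourable / total

def go_enrichment_p(gene_list, term_genes, background):
    """Enrichment p-value for one term, restricted to the background set."""
    background = set(background)
    gene_list = set(gene_list) & background
    term_genes = set(term_genes) & background
    hits = len(gene_list & term_genes)
    return hypergeom_upper_tail(
        len(background), len(term_genes), len(gene_list), hits
    )

# Hypothetical toy inputs: 20 background genes, 5 of them carry the term,
# and 2 of the 4 differentially expressed genes are among those 5.
background = {f"g{i}" for i in range(20)}
term_genes = {"g0", "g1", "g2", "g3", "g4"}
gene_list = {"g0", "g1", "g10", "g11"}

p_value = go_enrichment_p(gene_list, term_genes, background)
print(p_value)
```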

Page 12: BioRDF Breakout

Microarray Experiment Descriptions

E-GEOD-3296: Transcription profiling of primary mouse embryonic fibroblasts (MEFs) from C57Bl/6x129/Sv F2 e14.5 embryos that contain a deletion in the CH1 domain of three of four alleles of CBP and p300

The CH1 protein interaction domain of the transcriptional coactivators p300 and CBP is thought to interact with HIF-1alpha and this interaction is thought to be critical to the expression of HIF-1alpha target genes in response to hypoxia. Trichostatin A (TSA), an inhibitor of histone deacetylases, has been reported to repress the expression of HIF-1alpha target genes. To test the requirement of the CH1 domain and TSA for gene expression in response to dipyridyl (a hypoxia mimetic), primary mouse embryonic fibroblasts (MEFs) were generated from C57Bl/6x129/Sv F2 e14.5 embryos that contain a deletion in the CH1 domain of three of four alleles of CBP and p300. The remaining allele of p300 or CBP was a conditional knockout allele. Control MEFs with only a single conditional knockout allele of p300 or CBP were also generated. At passage 3, MEFs were infected with Cre Adenovirus and grown until they had expanded at least 100-fold. Subconfluent MEFs were treated with ethanol vehicle or 100 ng/ml TSA with 5% carbon dioxide at 37 C in a humid chamber for 30 min, followed by ethanol vehicle or 100 µM dipyridyl (DP) for an additional 3 hrs. Immediately after treatment, cells were lysed in Trizol for RNA extraction.

E-GEOD-3327: Transcription profiling of different regions of mouse brain to study adult mouse gene expression patterns in common strains

Adult mouse gene expression patterns in common strains. Experiment Overall Design: six mouse strains and seven brain regions were analyzed.

E-GEOD-358: Transcription profiling of rat whole brain samples from animals with repeated exposure to the anaesthetic isoflurane

12 Controls, 3 5-exposures, 3 10-exposures. Rats were exposed to 90 minutes of 1.0% isoflurane twice a day for a total of 5 or 10 exposures. Animals did not require intubation. All exposures and hybridizations were performed at the Univ. of Pennsylvania.

Page 13: BioRDF Breakout

Open Biomedical Annotator

Page 14: BioRDF Breakout

Some Results

Two microarray experiments (E-GEOD-4034, E-GEOD-4035) contain the following set of terms: fear, hippocampus, mouse.

These microarray experiments study the role of the hippocampus in fear, using the mouse as the model organism.
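The kind of result above, experiments that share a set of terms, can be pictured with a very naive term matcher over free-text descriptions. This is only a sketch and not how the Open Biomedical Annotator works internally: the term list is hypothetical, matching is plain whole-word lookup without synonyms or ontology identifiers, and the two texts are shortened titles from the experiment descriptions shown earlier.

```python
import re

# Shortened titles taken from the experiment descriptions shown earlier.
DESCRIPTIONS = {
    "E-GEOD-3327": "Transcription profiling of different regions of mouse "
                   "brain to study adult mouse gene expression patterns",
    "E-GEOD-358": "Transcription profiling of rat whole brain samples from "
                  "animals with repeated exposure to the anaesthetic isoflurane",
}

# Hypothetical term list; a real annotator would draw terms from ontologies.
TERMS = {"mouse", "rat", "brain", "hippocampus", "fear"}

def annotate(text, terms=TERMS):
    """Return the terms that occur as whole words in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return terms & words

def experiments_with(required, descriptions=DESCRIPTIONS):
    """Experiments whose description mentions every required term."""
    required = set(required)
    return sorted(
        acc for acc, text in descriptions.items() if required <= annotate(text)
    )

print(experiments_with({"mouse", "brain"}))
```

Whole-word matching keeps "rat" from matching inside longer words, but it still misses synonyms and spelling variants that an ontology-backed annotator would catch.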

Page 15: BioRDF Breakout

Analysis tools

Bioconductor

GenePattern

GeneSpring

Page 16: BioRDF Breakout

Intercommunity collaboration

HCLS (BioRDF)

MGED (ArrayExpress)

NIF (NeuroLex)

Ontology community (NCBO)

Page 17: BioRDF Breakout

Web of silos

CEL, GPR, etc.

Page 18: BioRDF Breakout

Semantic Web = Brilliant Web!

Page 19: BioRDF Breakout

The End

Page 20: BioRDF Breakout

Discussion

What is the RDF structure?

Extension of SPARQL to empower data analysis

Workflow and provenance

Visualization

How to integrate database and literature

Integration of other types of data

Inter-community collaboration

Translational use cases

Page 21: BioRDF Breakout

What should be the RDF structure?

Experiments

Samples

Experimental conditions/factors

Gene lists

Arrays/chips

Raw/processed data (e.g., CEL, GPR, gene matrix)
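To make the question concrete, the entities listed above could be related to one another as subject-predicate-object triples. The sketch below is one possible shape, not a proposed schema: every `ex:` name is a hypothetical placeholder, and the triples are held as plain Python tuples rather than in an RDF library.

```python
# Hypothetical vocabulary: the "ex:" names are placeholders, not an agreed schema.
TRIPLES = [
    ("ex:experiment1", "rdf:type", "ex:Experiment"),
    ("ex:experiment1", "ex:hasSample", "ex:sample1"),
    ("ex:experiment1", "ex:hasGeneList", "ex:geneList1"),
    ("ex:sample1", "ex:organismPart", "ex:hippocampus"),
    ("ex:sample1", "ex:hybridizedTo", "ex:array1"),
    ("ex:sample1", "ex:rawDataFile", "sample1.CEL"),
]

def match(pattern, triples=TRIPLES):
    """Yield triples matching (subject, predicate, object); None is a wildcard."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip(pattern, triple)):
            yield triple

# Which samples belong to experiment1, and what is recorded about each one?
for _, _, sample in match(("ex:experiment1", "ex:hasSample", None)):
    for _, predicate, value in match((sample, None, None)):
        print(sample, predicate, value)
```

Keeping samples, arrays, and data files as separate resources is a design choice made here for illustration; it lets a query start from any of them.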

Page 22: BioRDF Breakout

Extension of SPARQL

Hierarchical queries

Statistical analyses/tests

Enrichment analysis

Page 23: BioRDF Breakout

Workflow and provenance

Taverna

BioMoby

GenePattern

Page 24: BioRDF Breakout

Visualization

Cytoscape

TreeView

Page 25: BioRDF Breakout

How to integrate database and literature

Page 26: BioRDF Breakout

Inter-community Collaboration

NCBO

SWAN

Page 27: BioRDF Breakout

What other types of data can be integrated with microarray data?

Page 28: BioRDF Breakout

Translational use cases

