deepblue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets

Post on 15-Apr-2017

DeepBlue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets

Felipe Albrecht, Markus List

Max Planck Institute for Informatics

June 23, 2016

Problems with the epigenomic data deluge

Data access

Data analysis

Scalability

• More than a simple data archive

• Organizes the data with a defined vocabulary and ontologies (defined by IHEC)

• Full-text search

• Web interface

• Detailed documentation

• Data operations
– filter by region attributes (location, score)
– data intersection
– flank and extend regions
– aggregate/summarize

• Download only the relevant data

• Use your favorite language: R, Python, JavaScript, etc.
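The data operations listed above can be illustrated on a small in-memory region set. The sketch below is plain Python and is not DeepBlue's implementation or API: the `Region` record, the function names (`filter_by_score`, `intersect`, `flank`, `extend`, `summarize`), the 0-based half-open coordinates, and the sample values are all assumptions made for illustration.

```python
from collections import namedtuple
from statistics import mean

# Hypothetical region record; coordinates are assumed 0-based, half-open [start, end).
Region = namedtuple("Region", ["chrom", "start", "end", "score"])


def filter_by_score(regions, min_score):
    """Filter by a region attribute: keep regions with score >= min_score."""
    return [r for r in regions if r.score >= min_score]


def intersect(regions, others):
    """Keep regions that overlap at least one region in `others`."""
    return [
        r for r in regions
        if any(r.chrom == o.chrom and r.start < o.end and o.start < r.end
               for o in others)
    ]


def flank(region, length):
    """Return the window of `length` bases directly before the region (clipped at 0)."""
    return region._replace(start=max(0, region.start - length), end=region.start)


def extend(region, length):
    """Grow the region by `length` bases on both sides (start clipped at 0)."""
    return region._replace(start=max(0, region.start - length),
                           end=region.end + length)


def summarize(regions):
    """Aggregate the scores of a non-empty region list."""
    scores = [r.score for r in regions]
    return {"count": len(scores), "min": min(scores),
            "max": max(scores), "mean": mean(scores)}


# Made-up example data.
peaks = [
    Region("chr1", 100, 200, 5.0),
    Region("chr1", 500, 650, 12.0),
    Region("chr2", 50, 90, 8.0),
]
promoters = [Region("chr1", 150, 300, 0.0), Region("chr2", 500, 700, 0.0)]

strong_peaks = filter_by_score(peaks, 6.0)
promoter_peaks = intersect(peaks, promoters)
upstream = flank(peaks[1], 100)
widened = extend(peaks[0], 150)
summary = summarize(peaks)
```

Running such operations on the server side is what makes it possible to download only the relevant data: the client receives the filtered or summarized result rather than the full files.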

Signal, peak and methylation

– Histone marks
– DNA methylation (WGBS, RRBS)
– RNA-seq (mRNA, shRNA knockdown)
– Chromatin accessibility (DNase, NOMe)
– Transcription factor binding sites
– Gene annotation sets (GENCODE)

Module that periodically updates DeepBlue with new data

DeepBlue data

Data from the following Epigenome Mapping Consortia

• BLUEPRINT Epigenome

• DEEP (for DEEP members)

• ENCODE

• Roadmap Epigenomics

More than 56,000 experiments and 18 TB of data

• Besides the experiment name, all experiments have 5 mandatory metadata fields that are part of controlled vocabularies:
– Genome assembly
– BioSources (cell lines, cell types, tissues, organs), using the CL, EFO, and UBERON ontologies
– Epigenetic mark
– Technique
– Project

• It is possible to include key-value strings with extra information about the epigenomic experiment
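The metadata model described above can be sketched as a small record type. This is only an illustration, not DeepBlue's schema: the class `ExperimentMetadata`, the helper `invalid_fields`, the field names, and the tiny vocabularies are hypothetical, and the example values are made up.

```python
from dataclasses import dataclass, field

# The five mandatory fields named on the slide (identifier spellings are assumptions).
MANDATORY_FIELDS = ("genome", "biosource", "epigenetic_mark", "technique", "project")


@dataclass
class ExperimentMetadata:
    name: str
    genome: str
    biosource: str
    epigenetic_mark: str
    technique: str
    project: str
    # Free-form key-value strings with extra information about the experiment.
    extra_metadata: dict = field(default_factory=dict)


def invalid_fields(meta, vocabularies):
    """Return the mandatory fields whose value is not in its controlled vocabulary."""
    return [f for f in MANDATORY_FIELDS
            if getattr(meta, f) not in vocabularies.get(f, set())]


# Hypothetical, heavily truncated vocabularies for the example.
vocabularies = {
    "genome": {"hg19", "GRCh38"},
    "biosource": {"liver", "monocyte"},
    "epigenetic_mark": {"H3K27ac", "DNA Methylation"},
    "technique": {"ChIP-seq", "WGBS"},
    "project": {"BLUEPRINT Epigenome", "DEEP", "ENCODE", "Roadmap Epigenomics"},
}

experiment = ExperimentMetadata(
    name="example_experiment",
    genome="hg19",
    biosource="liver",
    epigenetic_mark="H3K27ac",
    technique="ChIP-seq",
    project="ENCODE",
    extra_metadata={"lab": "example lab"},
)

problems = invalid_fields(experiment, vocabularies)
```

Keeping the mandatory fields in controlled vocabularies is what lets experiments from different consortia be listed and compared with the same terms, while the key-value dictionary holds anything that does not fit those five fields.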

Experiments metadata

Web interface

http://deepblue.mpi-inf.mpg.de

Access, full-text search, quick overview, and download the data

Data Grid

July 10, 2016

http://deepblue.mpi-inf.mpg.de/R

• Intuitive access for R users

• Connects to other R/Bioconductor packages, such as GenomicRanges and Gviz, to facilitate downstream analysis

• Documentation (examples and vignette)

• Submitted to Bioconductor

Summarize and plot the average DNA methylation level across multiple files
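The summary step of this example can be sketched independently of any particular client. The slides present it with the R package; the Python below is only a language-neutral illustration on made-up numbers, with a hypothetical function name (`average_methylation`) and hypothetical file names. Retrieving the data and plotting are left out.

```python
from statistics import mean


def average_methylation(levels_by_file):
    """Map each file name to the mean of its methylation levels (fractions in 0..1).

    Files without any values are skipped rather than reported as 0.
    """
    return {name: mean(levels)
            for name, levels in levels_by_file.items() if levels}


# Made-up per-region methylation levels for three hypothetical files.
methylation_by_file = {
    "sample_A": [0.80, 0.90, 0.70],
    "sample_B": [0.20, 0.40],
    "sample_C": [],
}

averages = average_methylation(methylation_by_file)
for name, value in sorted(averages.items()):
    print(f"{name}: {value:.2f}")
```

The resulting per-file averages are the values one would pass to a plotting library, for example as a bar chart with one bar per file.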

More examples at

http://deepblue.mpi-inf.mpg.de/R

Acknowledgements

Thomas Lengauer
Christoph Bock

Joachim Büch
Georg Friedrich

Markus List
Peter Ebert
Fabian Müller

Obaro Odiete

Albrecht,F., List,M., Bock,C. and Lengauer,T. (2016) DeepBlue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets. Nucleic Acids Research, doi:10.1093/nar/gkw211
