DeepBlue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets

DeepBlue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets. Felipe Albrecht, Markus List, Max Planck Institute for Informatics. June 23, 2016


Page 1: DeepBlue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets

DeepBlue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets

Felipe Albrecht, Markus List

Max Planck Institute for Informatics

June 23, 2016

Page 2: DeepBlue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets

Problems with the epigenomic data deluge

Data access

Data analysis

Scalability

Page 3: DeepBlue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets

• More than a simple data archive

• Organizes the data with a defined vocabulary and ontologies (defined by IHEC)

• Full-text search

• Web interface

• Detailed documentation

• Data operations
  – filter by region attributes (location, score)
  – data intersection
  – flank and extend regions
  – aggregate/summarize

• Download only the relevant data

• Use your favorite language: R, Python, JavaScript, etc.
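The data operations listed above can be illustrated offline with a small Python sketch. This is not DeepBlue's implementation or its API: the `Region` class and the helper functions are hypothetical names introduced here, the coordinates are assumed to be 0-based and half-open, and the example data is made up. It only shows what filtering by score, intersecting, flanking, extending, and summarizing a region set mean.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Region:
    """A genomic region (hypothetical helper type, not a DeepBlue class)."""
    chrom: str
    start: int    # assumed 0-based, inclusive
    end: int      # assumed exclusive
    score: float


def filter_by_score(regions, min_score):
    """Keep regions whose score is at least min_score."""
    return [r for r in regions if r.score >= min_score]


def overlaps(a, b):
    """True if both regions share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intersection(regions, others):
    """Keep regions that overlap at least one region in others."""
    return [r for r in regions if any(overlaps(r, o) for o in others)]


def extend(region, length):
    """Grow a region by length bases on both sides, clamped at position 0."""
    return Region(region.chrom, max(0, region.start - length),
                  region.end + length, region.score)


def flank_upstream(region, length):
    """Return the window of length bases directly before the region."""
    return Region(region.chrom, max(0, region.start - length),
                  region.start, region.score)


def mean_score(regions):
    """Summarize a region set by its average score."""
    return mean(r.score for r in regions)


# Made-up example data.
peaks = [
    Region("chr1", 100, 200, 5.0),
    Region("chr1", 500, 600, 12.0),
    Region("chr2", 50, 150, 9.0),
]
promoters = [Region("chr1", 550, 700, 0.0)]

strong = filter_by_score(peaks, 8.0)            # drop low-scoring peaks
in_promoters = intersection(strong, promoters)  # keep peaks touching a promoter
```

On the server, operations of this kind run before download, which is why only the relevant regions need to be transferred.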

Page 4: DeepBlue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets

Signal, peak and methylation

– Histone marks
– DNA methylation (WGBS, RRBS)
– RNA-seq (mRNA, shRNA knockdown)
– Chromatin accessibility (DNase, NOMe)
– Transcription factor binding sites
– Gene annotation sets (GENCODE)

Module that periodically updates DeepBlue with new data

DeepBlue data

Data from the following Epigenome Mapping Consortia

• BLUEPRINT Epigenome
• DEEP (for DEEP members)
• ENCODE
• Roadmap Epigenomics

More than 56,000 experiments and 18 TB of data

Page 5: DeepBlue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets

• Besides the experiment name, all experiments have 5 mandatory metadata fields that are part of controlled vocabularies:

– Genome assembly
– BioSources (cell lines, cell types, tissues, organs), based on the CL, EFO, and UBERON ontologies
– Epigenetic mark
– Technique
– Project

• It is possible to include key-value strings with extra information about the epigenomic experiment

Experiments metadata
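The metadata model above (five mandatory controlled-vocabulary fields plus free key-value strings) can be sketched as a small Python record with a validation step. This is an illustration under assumptions, not DeepBlue's schema: the `Experiment` class, the `VOCAB` table, and the tiny vocabularies in it are hypothetical and far smaller than the real ones.

```python
from dataclasses import dataclass, field

# Hypothetical miniature controlled vocabularies, for illustration only.
VOCAB = {
    "genome": {"hg19", "GRCh38"},
    "biosource": {"liver", "K562"},
    "epigenetic_mark": {"H3K27ac", "DNA Methylation"},
    "technique": {"ChIP-seq", "WGBS"},
    "project": {"BLUEPRINT Epigenome", "ENCODE"},
}


@dataclass
class Experiment:
    """An experiment record: a name, five mandatory fields, optional extras."""
    name: str
    genome: str
    biosource: str
    epigenetic_mark: str
    technique: str
    project: str
    # Free-form key-value strings with extra information.
    extra_metadata: dict = field(default_factory=dict)

    def invalid_fields(self):
        """Return the mandatory fields whose value is not in the vocabulary."""
        return [name for name, allowed in VOCAB.items()
                if getattr(self, name) not in allowed]


ok = Experiment("exp_1", "hg19", "liver", "H3K27ac", "ChIP-seq", "ENCODE",
                extra_metadata={"lab": "example-lab"})
bad = Experiment("exp_2", "hg19", "liver", "H3K27ac", "chipseq", "ENCODE")
```

Rejecting values outside the vocabulary at insertion time is what makes later queries by mark, technique, or BioSource reliable.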

Page 6: DeepBlue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets

Web interface

http://deepblue.mpi-inf.mpg.de

Access the data, run full-text searches, get a quick overview, and download the results

Page 7: DeepBlue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets

Data Grid

Page 8: DeepBlue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets

July 10, 2016 8/19

http://deepblue.mpi-inf.mpg.de/R

• Intuitive access for R users

• Connects to other R/Bioconductor packages, such as GenomicRanges and Gviz, to facilitate downstream analysis

• Documentation (examples and vignette)

• Submitted to Bioconductor

Page 9: DeepBlue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets

Summarize and plot the average DNA methylation level across multiple files
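The slide shows this workflow in R; the code itself is not part of this transcript. As a rough stand-in, here is a minimal Python sketch of the summarization step only, assuming each file has already been reduced to a list of per-site methylation levels between 0 and 1. The function name `average_methylation` and the sample values are hypothetical, and plotting is left out.

```python
from statistics import mean


def average_methylation(files):
    """Map each file name to the mean of its per-site methylation levels.

    Files without any sites are skipped, since their mean is undefined.
    """
    return {name: mean(levels) for name, levels in files.items() if levels}


# Made-up per-site methylation levels for two files.
samples = {
    "sample_A": [0.5, 1.0, 0.75],
    "sample_B": [0.25, 0.25],
    "sample_C": [],
}

averages = average_methylation(samples)
# Unweighted mean of the per-file averages.
overall = mean(list(averages.values()))
```

The resulting per-file averages are the values one would pass to a bar plot or box plot.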

More examples at

http://deepblue.mpi-inf.mpg.de/R

Page 10: DeepBlue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets

Acknowledgements

Thomas Lengauer, Christoph Bock

Joachim Büch, Georg Friedrich

Markus List, Peter Ebert, Fabian Müller

Obaro Odiete

Albrecht,F., List,M., Bock,C. and Lengauer,T. (2016) DeepBlue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets. Nucleic Acids Research, doi:10.1093/nar/gkw211