DeepBlue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets
Felipe Albrecht, Markus List
Max Planck Institute for Informatics
June 23, 2016
Problems with the epigenomic data deluge
Data access
Data analysis
Scalability
• More than a simple data archive
• Organizes the data with controlled vocabularies and ontologies (as defined by IHEC)
• Full-text search
• Web interface
• Detailed documentation
• Data operations:
– filter by region attributes (location, score)
– data intersection
– flank and extend regions
– aggregate/summarize
• Download only the relevant data
• Use your favorite language: R, Python, JavaScript, etc.
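The region operations listed above run on the DeepBlue server. The Python sketch below only illustrates what each operation means on small in-memory region sets; it is not DeepBlue's implementation or API. The `Region` type, the function names, the 0-based half-open coordinates, the strand-unaware flanking, and the toy data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int          # 0-based, inclusive (assumed convention)
    end: int            # exclusive
    score: float = 0.0


def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def filter_regions(regions: List[Region], chrom: Optional[str] = None,
                   min_score: Optional[float] = None) -> List[Region]:
    """Keep regions matching an optional chromosome and an optional minimum score."""
    return [r for r in regions
            if (chrom is None or r.chrom == chrom)
            and (min_score is None or r.score >= min_score)]


def intersection(query: List[Region], reference: List[Region]) -> List[Region]:
    """Keep the query regions that overlap at least one reference region."""
    return [q for q in query if any(overlaps(q, r) for r in reference)]


def flank(regions: List[Region], length: int, upstream: bool = False) -> List[Region]:
    """Region of `length` bp right before the start (upstream) or right after the end.

    Strand is ignored in this sketch; upstream flanks are clipped at position 0.
    """
    out = []
    for r in regions:
        if upstream:
            out.append(Region(r.chrom, max(0, r.start - length), r.start, r.score))
        else:
            out.append(Region(r.chrom, r.end, r.end + length, r.score))
    return out


def extend(regions: List[Region], length: int) -> List[Region]:
    """Grow each region by `length` bp on both sides, clipped at position 0."""
    return [Region(r.chrom, max(0, r.start - length), r.end + length, r.score)
            for r in regions]


def aggregate(data: List[Region], ranges: List[Region]) -> List[Tuple[Region, Optional[float]]]:
    """Mean score of the data regions overlapping each range (None if nothing overlaps)."""
    result = []
    for rng in ranges:
        scores = [d.score for d in data if overlaps(d, rng)]
        result.append((rng, mean(scores) if scores else None))
    return result


# Toy example data, made up for illustration.
peaks = [Region("chr1", 100, 200, 5.0), Region("chr1", 500, 600, 1.0),
         Region("chr2", 50, 80, 9.0)]
promoters = [Region("chr1", 150, 300), Region("chr2", 0, 40)]

strong = filter_regions(peaks, min_score=2.0)
print(intersection(strong, promoters))               # only the chr1 peak overlaps
print(intersection(strong, extend(promoters, 20)))   # extension also catches the chr2 peak
print(aggregate(peaks, promoters))
```

The operations compose: the output of one is a region set that can be passed to the next, which is the same chaining idea the slide describes for server-side processing before download.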
Signal, peak and methylation
– Histone marks
– DNA methylation (WGBS, RRBS)
– RNA-seq (mRNA, shRNA knockdown)
– Chromatin accessibility (DNase, NOMe)
– Transcription factor binding sites
– Gene annotation sets (GENCODE)
Module that periodically updates DeepBlue with new data
DeepBlue data
Data from the following Epigenome Mapping Consortia
• BLUEPRINT Epigenome
• DEEP (for DEEP members)
• ENCODE
• Roadmap Epigenomics
More than 56,000 experiments and 18 TB of data
• Besides the experiment name, every experiment has five mandatory metadata fields drawn from controlled vocabularies:
– Genome assembly
– BioSource (cell lines, cell types, tissues, organs), based on the CL, EFO, and UBERON ontologies
– Epigenetic mark
– Technique
– Project
• Experiments can also carry free-form key-value strings with extra information about the experiment
Experiment metadata
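The metadata model above (a name, five mandatory fields checked against controlled vocabularies, plus optional key-value pairs) can be sketched as a small Python record. This is a hypothetical illustration, not DeepBlue's schema: the attribute names, the `invalid_fields` helper, and the toy vocabularies are assumptions; only the five field categories come from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# The five mandatory categories listed on the slide; the attribute names are hypothetical.
MANDATORY_FIELDS = ("genome", "biosource", "epigenetic_mark", "technique", "project")


@dataclass
class ExperimentMetadata:
    name: str
    genome: str            # genome assembly
    biosource: str         # cell line, cell type, tissue, or organ
    epigenetic_mark: str
    technique: str
    project: str
    extra: Dict[str, str] = field(default_factory=dict)  # optional free-form key-value pairs

    def invalid_fields(self, vocabularies: Dict[str, Set[str]]) -> List[str]:
        """Mandatory fields whose value is not a term of the matching controlled vocabulary."""
        return [f for f in MANDATORY_FIELDS
                if getattr(self, f) not in vocabularies.get(f, set())]


# Toy vocabularies for illustration only.
vocabularies = {
    "genome": {"hg19", "GRCh38"},
    "biosource": {"K562", "monocyte"},
    "epigenetic_mark": {"H3K4me3", "DNA Methylation"},
    "technique": {"ChIP-seq", "WGBS"},
    "project": {"ENCODE", "BLUEPRINT Epigenome"},
}

exp = ExperimentMetadata("example_experiment", "hg19", "K562", "H3K4me3", "ChIP-seq",
                         "ENCODE", extra={"antibody_lot": "unknown"})
print(exp.invalid_fields(vocabularies))  # an empty list means every mandatory term is known
```

Keeping the mandatory fields separate from the free-form `extra` dictionary mirrors the split on the slide: only the controlled terms need to be validated, so they can be used reliably for selecting experiments.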
Web interface: http://deepblue.mpi-inf.mpg.de
Access, full-text search, quick overview, and download of the data
Data Grid
July 10, 2016
http://deepblue.mpi-inf.mpg.de/R
• Intuitive access for R users
• Connects to other R/Bioconductor packages, such as GenomicRanges and Gviz, to facilitate downstream analysis
• Documentation (examples and vignette)
• Submitted to Bioconductor
Summarize and plot the average DNA methylation level across multiple files
More examples at http://deepblue.mpi-inf.mpg.de/R
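The slide's example of averaging DNA methylation across multiple files uses the R package. The Python sketch below only mimics the summarization step on in-memory data: it tiles a region into fixed-size bins, computes each file's mean methylation level per bin, and then averages those means across files. The record layout, bin size, sample names, and values are hypothetical; the sketch does not call DeepBlue or its R package and does no plotting.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

Site = Tuple[str, int, int, float]   # (chrom, start, end, methylation level in [0, 1]); assumed layout
Bin = Tuple[str, int, int]           # (chrom, start, end), half-open


def tile(chrom: str, start: int, end: int, size: int) -> List[Bin]:
    """Split [start, end) into consecutive bins of `size` bp (the last one may be shorter)."""
    return [(chrom, s, min(s + size, end)) for s in range(start, end, size)]


def bin_means(sites: List[Site], bins: List[Bin]) -> List[Optional[float]]:
    """Mean methylation level of the sites overlapping each bin (None if no site falls in it)."""
    means: List[Optional[float]] = []
    for chrom, b_start, b_end in bins:
        levels = [lvl for c, s, e, lvl in sites if c == chrom and s < b_end and b_start < e]
        means.append(mean(levels) if levels else None)
    return means


def summarize(files: Dict[str, List[Site]], bins: List[Bin]) -> Dict[str, List[Optional[float]]]:
    """Per-file, per-bin mean methylation."""
    return {name: bin_means(sites, bins) for name, sites in files.items()}


def average_across_files(per_file: Dict[str, List[Optional[float]]]) -> List[Optional[float]]:
    """Average the per-file bin means, skipping files with no data in a bin."""
    averages: List[Optional[float]] = []
    for column in zip(*per_file.values()):
        values = [v for v in column if v is not None]
        averages.append(mean(values) if values else None)
    return averages


bins = tile("chr1", 0, 300, 100)
files = {  # made-up CpG calls for two hypothetical samples
    "sample_A": [("chr1", 10, 11, 1.0), ("chr1", 50, 51, 0.5), ("chr1", 150, 151, 0.25)],
    "sample_B": [("chr1", 20, 21, 0.5), ("chr1", 160, 161, 0.25)],
}
per_file = summarize(files, bins)
print(per_file)
print(average_across_files(per_file))
```

Averaging the per-file means gives every file equal weight regardless of how many CpG sites it covers in a bin; pooling all sites before averaging would instead weight densely covered files more heavily.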
Acknowledgements
Thomas Lengauer
Christoph Bock
Joachim Büch
Georg Friedrich
Markus List
Peter Ebert
Fabian Müller
Obaro Odiete
Albrecht, F., List, M., Bock, C. and Lengauer, T. (2016) DeepBlue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets. Nucleic Acids Research, doi:10.1093/nar/gkw211