Data integration across omics landscapes

Bing Zhang, Ph.D.

Department of Biomedical Informatics

Vanderbilt University School of Medicine

[email protected]

Omics data integration

CNCP2012

DNA, mRNA, Protein

[Figure: elephant]

Informatics approaches to integrate genomic and proteomic data

Genomic data + Proteomic data: Novel biological insights
Genomic data: Improved proteomic data analysis

Data Type: Technology

TCGA (The Cancer Genome Atlas)
  CNV: arrayCGH, SNP Array
  LOH: SNP Array
  DNA Methylation: Methylation Array
  Exon expression: Array, RNA-Seq
  Junction expression: RNA-Seq
  Gene expression: Array, RNA-Seq
  Mutations: Exome Sequencing, RNA-Seq
  Sequence variants: Exome Sequencing, RNA-Seq

CPTAC (Clinical Proteomic Tumor Analysis Consortium)
  Protein expression: MS/MS
  Protein PTM: MS/MS, protein arrays

[Figure row-group labels: Genome, Transcriptome, EG, Proteome]

Informatics approaches to integrate genomic and proteomic data

Using genomic data to improve proteomic data analysis
  Project 1. customProDB: generating customized protein databases to enhance protein identification in shotgun proteomics
  Project 2. NetWalker: prioritizing candidate gene lists for targeted MRM analysis

Integrating genomic and proteomic data to gain novel biological insights
  Project 3. miRNA-mediated regulation: understanding post-transcriptional mechanisms regulating human gene expression
  Project 4. NetGestalt: viewing and correlating cancer omics data within a biological network context

customProDB: motivation

Database search

Commonly used database

Expressed proteins

Unexpressed proteins

Proteins with sequence variation

Increased sensitivity

Reduced ambiguity

Variant peptides

Customized protein database from RNA-Seq data

Wang et al., J Proteome Res, 2012
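
As a rough illustration of the idea behind a customized database, the Python sketch below keeps only reference proteins whose transcripts pass an expression cutoff and appends variant sequences carrying single amino acid substitutions. It is not the customProDB API (customProDB is an R package). The function names, the expression cutoff, the variant tuple format, the FASTA header style, and the toy sequences are all assumptions made for this example.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical variant record: (protein_id, 1-based position, ref_aa, alt_aa)
Variant = Tuple[str, int, str, str]


def apply_substitution(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return seq with the residue at 1-based pos changed from ref to alt."""
    if not 1 <= pos <= len(seq) or seq[pos - 1] != ref:
        raise ValueError(f"reference residue {ref}{pos} does not match sequence")
    return seq[:pos - 1] + alt + seq[pos:]


def build_custom_db(
    reference: Dict[str, str],
    expression: Dict[str, float],
    variants: Iterable[Variant],
    min_expression: float = 1.0,  # assumed cutoff, not taken from the slides
) -> List[Tuple[str, str]]:
    """Build (header, sequence) entries for a sample-specific database."""
    # 1. Drop proteins with no evidence of expression in the RNA-Seq data.
    entries = [
        (pid, seq)
        for pid, seq in reference.items()
        if expression.get(pid, 0.0) >= min_expression
    ]
    kept = {pid for pid, _ in entries}
    # 2. Add one variant entry per substitution in an expressed protein.
    for pid, pos, ref, alt in variants:
        if pid in kept:
            mutated = apply_substitution(reference[pid], pos, ref, alt)
            entries.append((f"{pid}_{ref}{pos}{alt}", mutated))
    return entries


def to_fasta(entries: Iterable[Tuple[str, str]]) -> str:
    """Format (header, sequence) entries as FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in entries)


# Toy inputs, made up for illustration only.
reference = {"P1": "MKTAYIAK", "P2": "MSLLTEVE", "P3": "MGDVEKGK"}
expression = {"P1": 12.5, "P2": 0.0, "P3": 3.2}
variants = [("P1", 3, "T", "A"), ("P2", 2, "S", "L")]

custom_db = build_custom_db(reference, expression, variants)
print(to_fasta(custom_db))
```

In this toy run the unexpressed protein P2 and its variant are left out, which shrinks the search space, while the expressed protein P1 gains a variant entry so that a variant peptide could be matched.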

customProDB: moving forward

R package
Compatible with both DNA and RNA sequencing data
Sample-specific database and consensus database
Application to the CPTAC project
Spectral library

Wang et al., manuscript in preparation

miRNA regulation: motivation

Inverse correlation with miRNA expression:
  mRNA expression: mRNA decay
  Protein/mRNA ratio: translation repression
  Protein expression: combined effect
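
The three readouts on this slide can be compared with a simple correlation analysis. The sketch below is a minimal illustration, not the study's pipeline: it assumes log-scale measurements (so the protein/mRNA ratio becomes a difference), uses Pearson correlation as a stand-in for whatever statistic the authors applied, and runs on made-up numbers for nine cell lines. The function names are hypothetical.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / sqrt(sxx * syy)


def mirna_target_correlations(
    mirna: Sequence[float], mrna: Sequence[float], protein: Sequence[float]
) -> Dict[str, float]:
    """Correlate miRNA expression with each readout for one miRNA-target pair."""
    # On a log scale the protein/mRNA ratio is a difference (assumption).
    ratio = [p - m for p, m in zip(protein, mrna)]
    return {
        "mRNA decay": pearson(mirna, mrna),
        "translational repression": pearson(mirna, ratio),
        "combined effect": pearson(mirna, protein),
    }


# Made-up log-scale values for nine cell lines, for illustration only.
mirna = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
mrna = [5.0, 5.2, 4.9, 5.1, 5.0, 4.8, 5.1, 5.0, 4.9]
protein = [m + 4.0 - 0.3 * x for m, x in zip(mrna, mirna)]

result = mirna_target_correlations(mirna, mrna, protein)
for mechanism, r in result.items():
    print(f"{mechanism}: r = {r:.2f}")
```

In this toy pair the mRNA level barely tracks the miRNA while the protein/mRNA ratio falls steadily, which is the pattern the slide associates with translational repression rather than mRNA decay.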

miRNA regulation: data preparation

9 colorectal cancer cell lines

Protein expression data: Current study

mRNA expression data: GSE10843

miRNA expression data: GSE10833

miRNA regulation: data analysis workflow

Liu et al., manuscript in preparation

miRNA regulation: mRNA decay or translational repression?

Early studies suggest a major role of translational repression
  Olsen et al., Dev Biol, 1999; Zeng et al., Molecular Cell, 2001
Recent large-scale studies suggest a predominant role of mRNA decay
  Baek et al., Nature, 2008; Selbach et al., Nature, 2008; Guo et al., Nature, 2010
Our study suggested equally important roles of mRNA decay and translational repression
  Translational repression was involved in 58% and played a major role in 30% of all predicted miRNA-target interactions
  Most miRNAs exert their effect through both mRNA decay and translational repression
  Sequence features known to drive site efficacy in mRNA decay were generally not applicable to translational repression

miR-138 prefers translational repression

NetGestalt: motivation

DNA: mutation, methylation

mRNA: expression, splicing

Protein: expression, modification

Phenotype

Network

NetGestalt: scalable network representation

Total number of modules (size >30): 92

Functional homogeneity: 63 (69%)

Spatial homogeneity: 55 (60%)

Dynamic homogeneity: 69 (75%)

Homogeneity of any type: 82 (89%)

[Figure: network module layout; axis ticks: 3, 2, 1, 0; axis label: Proteins]

NetGestalt: viewing and cross-correlating data

Viewing data as tracks
  Heat map (e.g. gene expression data)
  Bar chart (e.g. fold changes, p values)
  Binary track (e.g. significant genes, GO)
Comparing binary tracks
  Clickable Venn diagram
Enrichment analysis
  Network modules
  GO terms
  Pathways
Navigating at different scales
  Zoom
  Pan
  2D graph visualization

Shi et al., manuscript under revision
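
To make the comparison of binary tracks concrete, the sketch below computes Venn-style overlap counts for two gene sets and a one-sided hypergeometric p value for the enrichment of a binary track in a network module. The slides do not say which test NetGestalt uses, so the hypergeometric test is an assumption here, and the function names and gene identifiers are placeholders.

```python
from math import comb
from typing import Dict, Set


def venn_counts(a: Set[str], b: Set[str]) -> Dict[str, int]:
    """Sizes of the three regions of a two-set Venn diagram."""
    return {"only_a": len(a - b), "both": len(a & b), "only_b": len(b - a)}


def hypergeom_enrichment_p(
    track: Set[str], module: Set[str], universe: Set[str]
) -> float:
    """P(overlap >= observed) if the track genes were drawn at random."""
    track, module = track & universe, module & universe
    n_universe, n_module, n_track = len(universe), len(module), len(track)
    observed = len(track & module)
    total = comb(n_universe, n_track)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(n_module, i) * comb(n_universe - n_module, n_track - i)
        for i in range(observed, min(n_module, n_track) + 1)
    )
    return tail / total


# Placeholder gene identifiers, for illustration only.
universe = {f"g{i}" for i in range(1, 21)}
module = {"g1", "g2", "g3", "g4", "g5"}
track = {"g1", "g2", "g3", "g4", "g10"}

counts = venn_counts(track, module)
p = hypergeom_enrichment_p(track, module, universe)
print(counts, f"p = {p:.4g}")
```

The same function could be applied to GO terms or pathways by passing their gene sets in place of the module.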

Browsing data sources

Viewing data as tracks

Comparing tracks

Identifying modules

Annotating modules

Moving across scales

[Figure: NetGestalt track view. Track labels: Proteomics (Vandy, PNNL): Luminal B, Basal, -log(p) signed, Diff proteins; Microarray (TCGA): Luminal B, Basal, -log(p) signed, Diff genes; Ruler; Network modules]
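
The "-log(p) signed" tracks are not defined in the transcript. A common convention, assumed here, is to take -log10 of the p value from a two-group comparison and attach the sign of the effect, so that one bar-chart track shows both significance and direction. The sketch below follows that assumption; the gene names, p values, and the choice of "Luminal B minus Basal" as the positive direction are made up for illustration.

```python
from math import log10
from typing import Dict, Tuple


def signed_neg_log10_p(p_value: float, effect: float) -> float:
    """Return -log10(p) carrying the sign of the effect (0.0 if no effect)."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    magnitude = -log10(p_value)
    if effect > 0:
        return magnitude
    if effect < 0:
        return -magnitude
    return 0.0


# Made-up (p value, Luminal B minus Basal difference) pairs.
results: Dict[str, Tuple[float, float]] = {
    "GENE_A": (0.001, 1.5),
    "GENE_B": (0.0001, -2.0),
    "GENE_C": (0.5, 0.3),
}

# Bar-chart style track: one signed score per gene.
track = {gene: signed_neg_log10_p(p, diff) for gene, (p, diff) in results.items()}

# Binary track: genes passing an assumed significance cutoff.
significant = sorted(gene for gene, (p, _) in results.items() if p < 0.01)

print(track)
print(significant)
```

Thresholding the p value, as in the last step, is one way a "Diff proteins" or "Diff genes" binary track could be derived from the same test results.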

[Figure: NetGestalt track view with track comparison. Track labels: Proteomics (Vandy, PNNL): Luminal B, Basal, -log(p) signed, Diff proteins; Microarray (TCGA): Luminal B, Basal, -log(p) signed, Diff genes; Ruler; Network modules. Venn diagram percentages: 45%, 51%, 4%, 0%]

[Figure: NetGestalt track view. Track labels: Vandy, PNNL, -log(p) signed (two tracks); Microarray: Luminal B, Basal, -log(p) signed; Ruler; Network modules; Enriched modules]

[Figure: NetGestalt track view. Track labels: Vandy, PNNL, -log(p) signed (Vandy), -log(p) signed (PNNL); Microarray: Luminal B, Basal, -log(p) signed; Ruler; Network modules; Enriched modules; MRM targets; DNA damage response; Gene symbol]

[Figure: NetGestalt track view. Track labels: Vandy, PNNL, -log(p) signed (Vandy), -log(p) signed (PNNL); Microarray: Luminal B, Basal, -log(p) signed; Ruler; Network modules; Enriched modules; MRM targets; DNA damage response; Gene symbol]

[Figure: NetGestalt track view. Track labels: Proteomics: Luminal B, Basal, -log(p) signed; Microarray: Luminal B, Basal, -log(p) signed; Ruler; Network modules; Enriched modules; T cell activation]

Acknowledgement

Qi Liu
Jing Wang
Xiaojing Wang
Jing Zhu
Dan Liebler
Rob Slebos
Dave Tabb
Zhiao Shi

Funding: NIGMS R01GM088822, NCI U24CA159988, NCI P50CA095103