integrative omics analysis
Post on 02-Jan-2016
Integrative omics analysis
Qi Liu
Center for Quantitative Sciences
Vanderbilt University School of Medicine
qi.liu@vanderbilt.edu
Content
• Introduction
• Data sources
• Methods
• Tools
• Things to be aware of
Why?
http://jdr.sagepub.com/content/90/5/561
[Figure: workflow from patient samples through technologies and data analysis to integration and interpretation]
Technologies and data analysis
• Genomics (WGS, WES): small indels, point mutations, copy number variation, structural variation
• Transcriptomics (RNA-Seq): differential expression, gene fusion, alternative splicing, RNA editing
• Epigenomics (Bisulfite-Seq, ChIP-Seq): methylation, histone modification, transcription factor binding
Integration and interpretation
• Functional effect of mutation
• Network and pathway analysis
• Integrative analysis
• Further understanding of cancer and clinical applications
What? An analysis that combines at least two different types of omics data
Objectives
1. Understand relationships between different types of molecular data
2. Understand the phenotype
– Latent: disease subtype
– Observable: patient outcome
Data sources
TCGA: https://tcga-data.nci.nih.gov/tcga/
http://www.nature.com/ng/journal/v45/n10/full/ng.2764.html
Firehose: http://gdac.broadinstitute.org/
cBioPortal: http://www.cbioportal.org/public-portal/index.do
ICGC: https://icgc.org/
COSMIC
ENCODE: http://genome.ucsc.edu/ENCODE/
http://www.nature.com/news/encode-the-human-encyclopaedia-1.11312
http://genome-mirror.duhs.duke.edu/ENCODE/
FANTOM: http://fantom.gsc.riken.jp/5/
GTEx: http://www.gtexportal.org/home/
Methods
• Sequential or overlap analysis
• Clustering
• Correlation analysis
• Linear regression
• Network based analysis
• Bayesian
• …
Sequential or overlap analysis
• Confirmation or refinement of findings
– Each data type is analyzed independently to get a list of interesting entities
– The lists of interesting entities are then linked together
• Chin, K. et al. Genomic and transcriptional aberrations linked to breast cancer pathophysiologies. Cancer Cell 10, 529–541 (2006).
• Lando, M. et al. Gene dosage, expression, and ontology analysis identifies driver genes in the carcinogenesis and chemoradioresistance of cervical cancer. PLoS Genet. 5, e1000719 (2009).
• Beroukhim, R. et al. The landscape of somatic copy-number alteration across human cancers. Nature 463, 899–905 (2010).
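The overlap step described above can be sketched in a few lines of Python. This is only an illustration: the function name `overlap_candidates` and the two gene lists are made up for the example, and real analyses would take the lists from each platform's own statistical pipeline.

```python
def overlap_candidates(*gene_lists):
    """Return genes reported by every independent analysis, sorted by name."""
    sets = [set(genes) for genes in gene_lists]
    return sorted(set.intersection(*sets))


# Hypothetical per-platform results (illustrative gene lists, not real findings)
amplified = ["ERBB2", "MYC", "CCND1", "GRB7"]      # e.g. copy number gain
overexpressed = ["ERBB2", "GRB7", "ESR1"]          # e.g. differential expression

shared = overlap_candidates(amplified, overexpressed)
print(shared)  # ['ERBB2', 'GRB7']
```

Because each list is produced separately, the intersection can only confirm or narrow findings; it cannot detect signals that are weak in every single data type.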
Correlation analysis
Reveal the relationships between different molecular layers
– The strength of association can indicate regulation in trans.
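A minimal sketch of such a correlation screen, assuming matched samples across the two layers. The profile names (`miR-A`, `GENE1`, `GENE2`), the values, and the correlation cutoff are hypothetical; a real screen would use a significance test with multiple-testing control rather than a fixed cutoff on r.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def inverse_pairs(mirna_profiles, gene_profiles, cutoff=-0.8):
    """List (miRNA, gene, r) for pairs whose correlation is at or below cutoff."""
    hits = []
    for mir, mir_values in mirna_profiles.items():
        for gene, gene_values in gene_profiles.items():
            r = pearson(mir_values, gene_values)
            if r <= cutoff:
                hits.append((mir, gene, round(r, 3)))
    return hits


# Hypothetical log2 profiles measured in the same five samples
mirna_profiles = {"miR-A": [1.0, 2.0, 3.0, 4.0, 5.0]}
gene_profiles = {
    "GENE1": [5.0, 4.1, 3.2, 1.9, 1.0],   # falls as the miRNA rises
    "GENE2": [2.0, 2.5, 1.8, 2.6, 2.1],   # no clear trend
}
hits = inverse_pairs(mirna_profiles, gene_profiles)
print(hits)
```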
Integrative method
[Figure: workflow of the integrative miRNA analysis; recoverable labels follow]
• Input data: microRNA, mRNA, protein, and protein/mRNA ratio profiles; 79 miRNAs and 5144 genes (GEO accessions shown: GSE10843, GSE10833)
• Correlations and the effect each one estimates
– miRNA-mRNA correlation: mRNA decay
– miRNA-ratio correlation: translational repression
– miRNA-protein correlation: combined effect
• Also estimated: the relative contribution of translational repression
• Association of sequence features with estimated mRNA decay or translational repression
– Site type
– Site location
– Local AU context
– Additional 3' pairing
• Functional evidence: significant inverse correlation (p < 0.005), giving 7235 functional relationships
• Binding evidence: supported by TargetScan, miRanda, or MirTarget2
• Result: 580 microRNA-target interactions involving 60 miRNAs and 423 genes
• Outputs: sequence features on site efficacy; microRNA-target interactions
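The three correlations named in this workflow can be sketched as follows. This is an illustration under stated assumptions, not the published pipeline: all values are invented, data are assumed to be on a log2 scale so that the protein/mRNA ratio becomes a difference, and the helper names `corr` and `three_correlations` are hypothetical.

```python
import math


def corr(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def three_correlations(mirna, mrna_log2, protein_log2):
    """Correlate a miRNA with mRNA, protein/mRNA ratio, and protein levels."""
    # On the log2 scale the protein/mRNA ratio is a simple difference.
    ratio_log2 = [p - m for p, m in zip(protein_log2, mrna_log2)]
    return {
        "miRNA-mRNA": corr(mirna, mrna_log2),        # proxy for mRNA decay
        "miRNA-ratio": corr(mirna, ratio_log2),      # proxy for translational repression
        "miRNA-protein": corr(mirna, protein_log2),  # proxy for the combined effect
    }


# Hypothetical target: mRNA stays flat while protein drops as the miRNA rises
mirna = [1.0, 2.0, 3.0, 4.0, 5.0]
mrna_log2 = [8.0, 8.1, 7.9, 8.0, 8.05]
protein_log2 = [10.0, 9.4, 9.1, 8.5, 8.0]

result = three_correlations(mirna, mrna_log2, protein_log2)
for name, r in result.items():
    print(f"{name}: {r:.2f}")
```

In this toy case the miRNA-ratio and miRNA-protein correlations are strongly negative while the miRNA-mRNA correlation is close to zero, the pattern one would expect for a target regulated mainly at the translational level.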
Features on site efficacy for these two regulation types
• Site type
– mRNA decay: 8mer sites are efficient
– Translational repression: 8mer sites do not show significant efficacy
• Site location
– mRNA decay: 3' UTR > ORF > 5' UTR
– Translational repression: marginal significance in the ORF
• Local AU context
– An AU-rich context appears to favor both mRNA decay and translational repression
• Additional 3' pairing
– 3' pairing enhances mRNA decay but disfavors translational repression
miRNA-target interactions
• 60 miRNAs, 423 genes, 580 interactions, of which 332 (57.2%) were discovered by integrating proteomics data
[Figure: Venn diagrams of the interactions by functional evidence (miRNA-mRNA, miRNA-ratio, miRNA-protein) and by sequence-based prediction (TargetScan, miRanda, MirTarget2). Counts shown: 212, 147, 31, 295, 156, 0.]
miR-138 prefers translational repression
SW620 and SW480 (derived from the same patient)

                 SW620        SW480
source           lymph node   primary
metastasis       high         poor
miR-138 (log2)   3.06         6.39
Linear regression
• Estimate the strength of association between different data types
• Predict the outcome by modeling the combined effect of multiple types of data
Linear regression variants
• Linear regression (unpenalized)
• Ridge: L2 penalized
• Lasso: L1 penalized
• Elastic net: L1 + L2 penalized
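The relationship between these four variants can be sketched with a small coordinate-descent solver. This is a teaching sketch, not the glmnet implementation: it minimizes 0.5*sum((y - Xw)^2) + l1*sum(|w|) + 0.5*l2*sum(w^2), assumes centered predictors and outcome (no intercept), and uses invented data in which the two predictor columns stand for, say, copy number and methylation.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; the key step of the L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, l1=0.0, l2=0.0, sweeps=200):
    """Coordinate descent for penalized least squares.

    l1 = l2 = 0 gives ordinary least squares, l1 = 0 gives ridge,
    l2 = 0 gives lasso, and both positive gives the elastic net.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0
            for i in range(n):
                # Prediction from all features except j
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            z = sum(X[i][j] ** 2 for i in range(n))
            w[j] = soft_threshold(rho, l1) / (z + l2)
    return w


# Hypothetical centered data: outcome = 2.0 * feature 1 + 0.1 * feature 2
X = [[-1.0, 1.0], [-0.5, -1.0], [0.5, -1.0], [1.0, 1.0]]
y = [-1.9, -1.1, 0.9, 2.1]

ols = elastic_net(X, y)
ridge = elastic_net(X, y, l2=2.5)
lasso = elastic_net(X, y, l1=1.0)
print("OLS  ", [round(v, 3) for v in ols])
print("ridge", [round(v, 3) for v in ridge])
print("lasso", [round(v, 3) for v in lasso])
```

In this toy case ridge shrinks both coefficients without removing either, while lasso sets the weak second coefficient exactly to zero, which is why L1 penalties are attractive when many omics features are expected to be irrelevant.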
Clustering
Unsupervised clustering of omics data to find inherent structures
– Using common latent variables among all data types
Network based analysis
– Using inferred networks or known network interactions to guide analysis
Illustrative example of SNF steps
The advantage of the integrative procedure is that weak similarities (low-weight edges) disappear, helping to reduce the noise, and strong similarities (high-weight edges) present in one or more networks are added to the others. Additionally, low-weight edges supported by all networks are retained depending on how tightly connected their neighborhoods are across networks.
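The fusion idea can be sketched in plain Python. This is a simplified reading of similarity network fusion, not the SNFtool implementation: it assumes a Gaussian kernel with a fixed bandwidth `sigma`, skips the per-iteration normalization used in practice, and runs on invented data for four patients measured in two data types. All function names are hypothetical.

```python
import math


def affinity(rows, sigma=0.5):
    """Patient-by-patient Gaussian similarity for one data type."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [[math.exp(-sq_dist(a, b) / (2 * sigma ** 2)) for b in rows] for a in rows]


def full_kernel(w):
    """Row-normalized similarity: 1/2 on the diagonal, the rest sums to 1/2."""
    n = len(w)
    p = []
    for i in range(n):
        off = sum(w[i][k] for k in range(n) if k != i)
        p.append([0.5 if i == j else w[i][j] / (2 * off) for j in range(n)])
    return p


def local_kernel(w, k):
    """Keep only each patient's k most similar patients (self included)."""
    n = len(w)
    s = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbors = sorted(range(n), key=lambda j: -w[i][j])[:k]
        total = sum(w[i][j] for j in neighbors)
        for j in neighbors:
            s[i][j] = w[i][j] / total
    return s


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def fuse(views, k=2, iterations=10, sigma=0.5):
    """Fuse two or more data types into one patient similarity matrix."""
    ws = [affinity(v, sigma) for v in views]
    ps = [full_kernel(w) for w in ws]
    ss = [local_kernel(w, k) for w in ws]
    m, n = len(views), len(views[0])
    for _ in range(iterations):
        updated = []
        for v in range(m):
            others = [ps[u] for u in range(m) if u != v]
            mean_other = [[sum(p[i][j] for p in others) / len(others)
                           for j in range(n)] for i in range(n)]
            s_t = [list(col) for col in zip(*ss[v])]
            # Diffuse the other networks' similarities through this one's local structure
            updated.append(matmul(matmul(ss[v], mean_other), s_t))
        ps = updated
    return [[sum(p[i][j] for p in ps) / m for j in range(n)] for i in range(n)]


# Hypothetical data: patients 0-1 and 2-3 form two groups in both data types
expression = [[0.0], [0.1], [1.0], [1.1]]
methylation = [[0.0], [0.2], [2.0], [2.1]]
fused = fuse([expression, methylation])
for row in fused:
    print([round(v, 3) for v in row])
```

Because each update averages over all other networks, the same loop handles more than two data types without modification.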
Patient similarities for each data type compared to SNF fused similarity
Comparison of SNF with iCluster and concatenation
Methods
Extension to more than 2 data types
Tools
• Sequential or overlap analysis
• Clustering
– R packages iCluster, iClusterPlus
• Correlation based
• Linear regression
– http://cbio.mskcc.org/leslielab/RegulatorInference
– R package glmnet
• Network based
– R package SNFtool
• Bayesian
• …
Visualization: Circular map for omics data
Chen et al. Cell 2012, 148(6):1293-1307
Circos plot
– Circos: http://circos.ca/intro/genomic_data/
– RCircos: http://cran.r-project.org/web/packages/RCircos/index.html
– OmicCircos: http://www.bioconductor.org/packages/release/bioc/html/OmicCircos.html
IGV: http://www.broadinstitute.org/software/igv/home
NetGestalt: http://www.netgestalt.org/#2
Things to be aware of
• The importance
• The challenge in integrative analyses
– Dimensionality
• Integration attempts are best carried out using known biological knowledge
References
• Kristensen VN, et al. Principles and methods of integrative genomic analyses in cancer. Nat Rev Cancer. 2014;14(5):299-313.
• Wang B, et al. Similarity network fusion for aggregating data types on a genomic scale. Nat Methods. 2014;11(3):333-7.
• Yuan Y, et al. Assessing the clinical utility of cancer genomic and proteomic data across tumor types. Nat Biotechnol. 2014;32(7):644-52.
• Shen R, et al. Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics. 2009;25(22):2906-12.
• Liu Q, et al. Integrative omics analysis reveals the importance and scope of translational repression in microRNA-mediated regulation. Mol Cell Proteomics. 2013;12(7):1900-11.
• Setty M, et al. Inferring transcriptional and microRNA-mediated regulatory programs in glioblastoma. Mol Syst Biol. 2012;8:605.
• Lappalainen T, et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature. 2013;501:506-511.
• Jacobsen A, et al. Analysis of microRNA-target interactions across diverse cancer types. Nat Struct Mol Biol. 2013;20(11):1325-32.