Visual Exploration of Clinical and Genomic Data for Patient Stratification

Posted on 07-Dec-2014


DESCRIPTION

Talk presented at the Simons Foundation Biotech Symposium "Complex Data Visualization: Approach and Application" (12 September 2014) http://www.simonsfoundation.org/event/complex-data-visualization-approach-and-application/ In this talk I describe how we integrated a sophisticated computational framework directly into the StratomeX visualization technique to enable rapid exploration of tens of thousands of stratifications in cancer genomics data, creating a unique and powerful tool for the identification and characterization of tumor subtypes. The tool can handle a wide range of genomic and clinical data types for cohorts with hundreds of patients. StratomeX also provides direct access to comprehensive data sets generated by The Cancer Genome Atlas Firehose analysis pipeline. http://stratomex.caleydo.org

TRANSCRIPT

Visual Exploration of Clinical and Genomic Data for Patient Stratification

NILS GEHLENBORG

@nils_gehlenborg・http://www.gehlenborg.com

Broad Institute of MIT and Harvard

Cancer Program

Harvard Medical School Center for Biomedical Informatics

Alexander Lex Harvard School of Engineering and Applied Sciences, Cambridge, MA, USA

Marc Streit Johannes Kepler University, Linz, Austria

Christian Partl Graz University of Technology, Graz, Austria

Sam Gratzl Johannes Kepler University, Linz, Austria

Dieter Schmalstieg Graz University of Technology, Graz, Austria

Hanspeter Pfister Harvard School of Engineering and Applied Sciences, Cambridge, MA, USA

Peter J Park Harvard Medical School, Boston, MA, USA

Nils Gehlenborg Harvard Medical School, Boston, MA, USA & Broad Institute, Cambridge, MA

Special thanks to

Broad Institute TCGA Genome Data Analysis Center team, in particular Michael S Noble, Lynda Chin & Gaddy Getz

Team

Peter J Park NIH/NCI The Cancer Genome Atlas

Nils Gehlenborg NIH/NHGRI K99/R00 Pathway to Independence Award

Funding


TCGA: The Cancer Genome Atlas

20+ cancer types × 500 patients = 10,000+ patients

mRNA expression

microRNA expression

DNA methylation

protein expression

copy number variants

mutation calls

clinical parameters

Stratome

Anthony92931 / Wikimedia Commons

Correlation with clusters based on other data types?

Different outcomes?

Mutations or copy number variants associated with clusters?

Demographic differences?

How can we explore overlap of patient sets across stratifications?

How can we compare properties of patient sets within a stratification?

How can we discover “interesting” stratifications and pathways to consider?

How can we handle terabytes of clinical and genomic data in visualization tools?

Challenges

Problem 1

Comparing Patient Sets across Stratifications

[Figure: patients (vertical axis) grouped by stratifications (horizontal axis): mRNA clusters #1 to #4, copy number of gene X (del, normal, amp), and mutation status of gene Y (mut, normal)]

StratomeX (short for Stratome Explorer)

[Figure: StratomeX columns for mRNA clusters (#1 to #4), copy number (del, normal, amp), and mutation status (mut, normal)]

Select band

Select block

Compare clusterings: consensus NMF and hierarchical

Park columns

Compare clusterings: left cluster split

Compare clusterings: right cluster split

Compare clusterings: left cluster contained in right cluster
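In StratomeX, the band between a block in one column and a block in another encodes the patients the two groups share, which is what makes a split or a containment visible. A minimal Python sketch of that bookkeeping, assuming each stratification is given as a dict from patient ID to group label; `band_sizes` and `describe_left_group` are hypothetical helpers for illustration, not Caleydo API:

```python
from collections import Counter


def band_sizes(left, right):
    """Count the patients shared by each (left group, right group) pair.

    left, right: dicts mapping patient ID -> group label. Patients missing
    from either stratification are ignored.
    """
    shared = left.keys() & right.keys()
    return Counter((left[p], right[p]) for p in shared)


def describe_left_group(bands, group):
    """Describe how one left-hand group maps onto the right-hand column."""
    targets = sorted({r for (l, r), n in bands.items() if l == group and n > 0})
    if not targets:
        return "no shared patients"
    if len(targets) == 1:
        return "contained in " + targets[0]
    return "split across " + ", ".join(targets)


# Hypothetical toy data: four patients, two stratifications.
mrna = {"p1": "#1", "p2": "#1", "p3": "#2", "p4": "#2"}
copy_number = {"p1": "del", "p2": "del", "p3": "del", "p4": "amp"}
bands = band_sizes(mrna, copy_number)
print(describe_left_group(bands, "#1"))  # contained in del
print(describe_left_group(bands, "#2"))  # split across amp, del
```

Here "contained" means every patient of the left-hand group falls into a single right-hand group; a group whose patients land in several right-hand groups is reported as split.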

Problem 2

Comparing Patient Sets within Stratifications

Block Visualizations: Patient Properties

Numerical Data

Matrix

Vector

Matrix + (Pathway) Maps

Categorical Data

Scalar

Add KEGG glioma pathway and map mRNA transcript levels

Modify color mapping on the fly

View pathway detail (cluster 2)

Zoom into pathway detail (cluster 2): EGFR down-regulated

View pathway detail (cluster 3)

Zoom into pathway detail (cluster 3): EGFR up-regulated

Add copy number for EGFR


Add survival stratified by TP53 mutation status

View detail of Kaplan-Meier plot based on TP53
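The Kaplan-Meier plot compares survival between the TP53-mutated and non-mutated patient sets. A minimal sketch of the product-limit estimator, assuming per-patient follow-up times and an event flag (1 = death observed, 0 = censored); the helper names and toy inputs are hypothetical, and this is not the StratomeX implementation:

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    Returns a list of (time, survival) pairs, one per distinct time at which
    at least one death was observed. Patients censored at a time still count
    as at risk for the deaths at that same time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather all patients with the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def curves_by_group(times, events, groups):
    """One Kaplan-Meier curve per group label, e.g. 'mut' vs. 'normal'."""
    result = {}
    for label in set(groups):
        idx = [k for k, g in enumerate(groups) if g == label]
        result[label] = kaplan_meier([times[k] for k in idx],
                                     [events[k] for k in idx])
    return result
```

For example, `kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])` steps down to about 0.8, 0.6, and 0.3 at times 1, 2, and 3; the patient censored at time 4 adds no step.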


Knowledge-driven Exploration

Data-driven Exploration

Problem 3

Finding “Interesting” Stratifications and Pathways

Is there a mutation that overlaps with this mRNA cluster?

Is there a CNV that affects survival?

Is there a pathway that is enriched in this cluster?

Is there a mutually exclusive mutation?

Query → Retrieve → Visualize (stratifications, clinical parameters, pathways)

Guided Exploration

LineUp

S Gratzl, A Lex, N Gehlenborg, H Pfister and M Streit, “LineUp: Visual Analysis of Multi-Attribute Rankings”, IEEE Transactions on Visualization and Computer Graphics 19:2277-2286 (2013)

Main TCGA paper published in Nature in 2013

First goal here: Characterize mRNA clusters

Example: Clear Cell Renal Cell Carcinoma (KIRC)

View TCGA mRNA subtypes

Add MutSig q-values for mutations

Invert q-value mapping

Add filter to inverted q-value as cut-off
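The three steps above (add MutSig q-values, invert the mapping, filter with a cut-off) turn a "smaller is better" statistic into a bar length and drop non-significant genes. A minimal sketch, assuming a linear inversion `1 - q` and a hypothetical cut-off of 0.1; LineUp's actual mapping editor may map values differently, and the gene names are placeholders:

```python
def rank_by_inverted_q(q_values, max_q=0.1):
    """Rank genes by MutSig-style q-values, most significant first.

    q_values: dict gene -> q-value in [0, 1].
    max_q: hypothetical significance cut-off; genes above it are dropped.
    Each kept gene gets score 1 - q, so smaller q-values map to longer bars.
    """
    scored = {gene: 1.0 - q for gene, q in q_values.items() if q <= max_q}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical genes and q-values, for illustration only.
ranking = rank_by_inverted_q({"GENE_A": 0.001, "GENE_B": 0.05, "GENE_C": 0.4})
print([gene for gene, _ in ranking])  # ['GENE_A', 'GENE_B']
```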

Query mutated genes

Retrieve Stratifications

Sets with large overlap: Jaccard Index

Similar stratifications: Adjusted Rand Index

Survival: Log Rank Score (one vs rest)

Retrieve Pathways

Gene Set Enrichment Score: original or PAGE (one vs rest)

Queries
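The Adjusted Rand Index compares two stratifications as whole partitions of the same patients. It counts patient pairs grouped together in both, corrected for the agreement expected by chance: 1 means identical groupings (up to label names), and values near 0 mean chance-level agreement. A minimal pure-Python sketch of the standard formula (not the StratomeX code), assuming both labelings are equal-length lists in the same patient order:

```python
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand Index between two labelings of the same patients."""
    if len(a) != len(b):
        raise ValueError("labelings must cover the same patients")
    n = len(a)
    if n < 2:
        return 1.0
    # Pairs placed together in both labelings (contingency-table cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    together_a = sum(comb(c, 2) for c in Counter(a).values())
    together_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Only happens for identical trivial partitions: everyone in one
        # group in both labelings, or every patient alone in both.
        return 1.0
    return (together_both - expected) / (maximum - expected)
```

For instance, consensus NMF and hierarchical cluster labels that differ only in naming score 1.0, while `adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])` is -0.5, i.e. worse than chance.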

Query mutated genes

Result of Jaccard Index query: preview PTEN
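The Jaccard Index query scores how well a candidate patient set matches the selected one: |A ∩ B| / |A ∪ B|, which is 1 for identical sets and 0 for disjoint ones. A minimal sketch that ranks candidate groups (for example, the "mut" group of each gene's mutation stratification) against a selected cluster; the data layout, function names, and gene labels are assumptions for illustration:

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two patient sets (0.0 if both empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_jaccard(selected, candidates):
    """Rank candidate groups by overlap with the selected patient set.

    candidates: dict mapping a (stratification, group) key -> patient IDs.
    Returns (key, score) pairs, best match first.
    """
    scores = {key: jaccard(selected, members) for key, members in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical example: cluster m2 vs. the mutated groups of two genes.
m2 = {"p1", "p2", "p3"}
candidates = {
    ("GENE_A mutation", "mut"): {"p1", "p2"},
    ("GENE_B mutation", "mut"): {"p4"},
}
print(rank_by_jaccard(m2, candidates)[0][0])  # ('GENE_A mutation', 'mut')
```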


Query mutated genes with cluster m2

Result of Jaccard Index query: preview MTOR

Re-order columns

Add TCGA microRNA subtypes (direct insert mode)


Observe large overlap between m1 and mi3

Observe large overlap between m3 and mi2

Query for copy number variation matching m3

Query only tumor suppressor genes (Vogelstein et al.)


Score only deletions


View CDKN2A copy number status and m3 and mi2 overlap
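"Score only deletions" restricts the copy-number query to the deletion classes before computing the overlap with m3, and the tumor suppressor filter narrows the candidate genes. A minimal sketch, assuming the five copy-number classes are integer calls from -2 to 2 as in GISTIC thresholded output (negative = deletion) and that the gene list is supplied by the user; all names and values are hypothetical:

```python
def deleted_patients(calls):
    """Patients whose copy-number call is a deletion (assumed: call < 0)."""
    return {patient for patient, call in calls.items() if call < 0}


def score_deletions(selected, calls_by_gene, genes=None):
    """Rank genes by Jaccard overlap between their deleted patients and `selected`.

    calls_by_gene: dict gene -> {patient: call in -2..2}.
    genes: optional subset to query, e.g. a tumor suppressor gene list.
    """
    selected = set(selected)
    query = list(calls_by_gene) if genes is None else [g for g in genes if g in calls_by_gene]
    scores = {}
    for gene in query:
        deleted = deleted_patients(calls_by_gene[gene])
        union = deleted | selected
        scores[gene] = len(deleted & selected) / len(union) if union else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical calls for two genes and three patients.
calls = {
    "GENE_A": {"p1": -2, "p2": -1, "p3": 0},
    "GENE_B": {"p1": 0, "p2": 1, "p3": -1},
}
print(score_deletions({"p1", "p2"}, calls))  # [('GENE_A', 1.0), ('GENE_B', 0.0)]
```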

Add survival stratified by TCGA microRNA clusters

Find gene mutation that affects survival

Score only mutations


View BAP1 mutation status and survival stratified by BAP1

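"Find gene mutation that affects survival" ranks candidate stratifications by a log-rank test comparing one group (here, BAP1-mutated patients) against all remaining patients. A minimal sketch of the two-group log-rank statistic with its chi-square (1 degree of freedom) p-value; whether StratomeX ranks by the statistic or by the p-value is not stated in the talk, and the input layout is an assumption:

```python
import math


def logrank_one_vs_rest(times, events, in_group):
    """Two-group log-rank test: patients with in_group True vs. all others.

    times: follow-up time per patient; events: 1 = death observed, 0 = censored;
    in_group: True for patients in the group being tested.
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    data = sorted(zip(times, events, in_group))
    at_risk = len(data)
    at_risk_group = sum(1 for _, _, g in data if g)
    observed_minus_expected = 0.0
    variance = 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = deaths_group = removed = removed_group = 0
        # Collect all patients sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            _, event, grp = data[i]
            deaths += event
            removed += 1
            if grp:
                deaths_group += event
                removed_group += 1
            i += 1
        if deaths and at_risk > 1:
            share = at_risk_group / at_risk
            observed_minus_expected += deaths_group - deaths * share
            variance += deaths * share * (1 - share) * (at_risk - deaths) / (at_risk - 1)
        at_risk -= removed
        at_risk_group -= removed_group
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of chi-square with 1 df: P(Z^2 > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical toy cohort: mutated patients die early, the rest late.
times = [1, 2, 3, 10, 11, 12]
events = [1, 1, 1, 1, 1, 1]
mutated = [True, True, True, False, False, False]
chi2, p = logrank_one_vs_rest(times, events, mutated)
```

Because the test is symmetric in the two groups, swapping "mutated" and "rest" yields the same statistic.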

Query for enriched pathway in TCGA mRNA cluster m4

Preview KEGG ribosome pathway overexpression in m4

Confirm selection

Change color mapping

View ribosome pathway detail for TCGA mRNA cluster m4
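The pathway query scores each pathway by how differently its genes behave in the selected cluster (here m4) compared with the remaining patients, using either the original enrichment score or PAGE (parametric analysis of gene set enrichment). A minimal sketch of the PAGE variant: compute a one-vs-rest difference of mean log-scale expression per gene, then a gene-set z-score Z = (S_m - μ)·√m / σ, where S_m is the mean difference over the m genes in the set and μ, σ are the mean and standard deviation over all genes. The input layout and the use of the population standard deviation are assumptions:

```python
import math
from statistics import mean, pstdev


def one_vs_rest_differences(expression, in_group):
    """Per-gene mean(log expression in group) - mean(log expression elsewhere).

    expression: dict gene -> list of log-scale values, one per patient.
    in_group: list of booleans in the same patient order (both groups non-empty).
    """
    diffs = {}
    for gene, values in expression.items():
        inside = [v for v, g in zip(values, in_group) if g]
        outside = [v for v, g in zip(values, in_group) if not g]
        diffs[gene] = mean(inside) - mean(outside)
    return diffs


def page_z(gene_scores, gene_set):
    """PAGE z-score of a gene set, given per-gene scores for all genes."""
    values = list(gene_scores.values())
    members = [gene_scores[g] for g in gene_set if g in gene_scores]
    sigma = pstdev(values)
    if not members or sigma == 0:
        return 0.0
    mu = mean(values)
    return (mean(members) - mu) * math.sqrt(len(members)) / sigma
```

A large positive z-score means the pathway's genes are, on average, over-expressed in the selected cluster relative to all genes, which is the pattern previewed for the ribosome pathway in m4.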


Problem 4

Dealing with Terabytes of Cancer Genomics Data

TCGA Data Coordination Center

Broad Institute Genome Data Analysis Center

Standardized Data Sets

Standardized Analyses

Analysis Reports

MSKCC cBio Portal

TCGA Working Groups

StratomeX

...

Standardized Data Sets Standardized Analyses Analysis Reports

Data set versioning

Format normalization

Removal of redacted data

. . .

Mutation Analysis

Copy Number Analysis

Clustering

Correlations

Pathway Analysis

. . .

102

http://gdac.broadinstitute.org: individual downloads and report viewing

firehose_getbulk: bulk download


Data Matrices: mRNA (array & sequencing), microRNA (array & sequencing), methylation, reverse phase protein array, clinical parameters

Stratifications: clustering (CNMF & hierarchical), gene mutation status (binary), gene copy number status (5 class)

Data Matrices + Stratifications = one data package per tumor type

Data Package


Up to 24 data and result files from 18 Firehose archives; up to 500 MB (190 MB compressed)

Data Packages

Schroeder et al. Genome Medicine 2013, 5:9

How can we explore overlap of patient sets across stratifications?

How can we compare properties of patient sets within a stratification?

How can we discover “interesting” stratifications and pathways to consider?

How can we handle terabytes of clinical and genomic data in visualization tools?

Challenges

StratomeX is part of the Caleydo Visualization Framework

Implemented in Java, uses OpenGL and Eclipse Rich Client Platform

Binaries available for Linux, Windows, Mac OS X

Requires Java 1.7 (JRE, or JDK on Mac OS X)

Open source, licensed under the BSD license

Source code on GitHub

CALEYDO

StratomeX

http://stratomex.caleydo.org

http://www.github.com/caleydo

A Lex, M Streit, H-J Schulz, C Partl, D Schmalstieg, PJ Park, N Gehlenborg, “StratomeX: Visual Analysis of Large-Scale Heterogeneous Genomics Data for Cancer Subtype Characterization”, Computer Graphics Forum (EuroVis '12), 31:1175-1184 (2012)

M Streit, A Lex, S Gratzl, C Partl, D Schmalstieg, H Pfister, PJ Park, N Gehlenborg, “Guided Visual Exploration of Genomic Stratifications in Cancer”, Nature Methods 11:884–885 (2014)

CALEYDO

Plans

Where to go from here?

Domino

S Gratzl, N Gehlenborg, A Lex, H Pfister and M Streit, “Domino: Extracting, Comparing, and Manipulating Subsets across Multiple Tabular Datasets”, IEEE Transactions on Visualization and Computer Graphics (2014)

INTEGRATION

Horizontal Integration across Data Types: Biological Insight

Vertical Integration across Data Levels: Confirmation & Troubleshooting

Refinery Platform


Data repository based on ISA-Tab for reproducible research

Workflow execution in Galaxy

Integrated visualization tools with access to provenance

http://www.refinery-platform.org


[Figure: Query Wizard workflow, starting from "Open Query Wizard". Add stratification: manually, based on overlap with the displayed set (Jaccard Index query), based on similarity to the displayed stratification (Adjusted Rand Index query), or based on the log-rank test score (survival). Add pathway: manually, or based on differential expression in the displayed set (GSEA or PAGE query). Add other data: e.g., a clinical parameter selected in the LineUp view, stratified with the displayed stratification or displayed unstratified. Query results are selected in the LineUp view and added as an independent column, a dependent column, or an independent column attached to an existing one.]
