Visual Exploration of Clinical and Genomic Data for Patient Stratification

NILS GEHLENBORG (@nils_gehlenborg, http://www.gehlenborg.com)

Broad Institute of MIT and Harvard, Cancer Program; Harvard Medical School, Center for Biomedical Informatics

Upload: ngehlenborg

Post on 07-Dec-2014


DESCRIPTION

Talk presented at the Simons Foundation Biotech Symposium "Complex Data Visualization: Approach and Application" (12 September 2014) http://www.simonsfoundation.org/event/complex-data-visualization-approach-and-application/ In this talk I describe how we integrated a sophisticated computational framework directly into the StratomeX visualization technique to enable rapid exploration of tens of thousands of stratifications in cancer genomics data, creating a unique and powerful tool for the identification and characterization of tumor subtypes. The tool can handle a wide range of genomic and clinical data types for cohorts with hundreds of patients. StratomeX also provides direct access to comprehensive data sets generated by The Cancer Genome Atlas Firehose analysis pipeline. http://stratomex.caleydo.org

TRANSCRIPT

Page 1: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Visual Exploration of Clinical and Genomic Data for Patient Stratification

NILS GEHLENBORG

@nils_gehlenborg, http://www.gehlenborg.com

Broad Institute of MIT and Harvard, Cancer Program

Harvard Medical School, Center for Biomedical Informatics

Page 2: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Team

Alexander Lex, Harvard School of Engineering and Applied Sciences, Cambridge, MA, USA

Marc Streit, Johannes Kepler University, Linz, Austria

Christian Partl, Graz University of Technology, Graz, Austria

Sam Gratzl, Johannes Kepler University, Linz, Austria

Dieter Schmalstieg, Graz University of Technology, Graz, Austria

Hanspeter Pfister, Harvard School of Engineering and Applied Sciences, Cambridge, MA, USA

Peter J Park, Harvard Medical School, Boston, MA, USA

Nils Gehlenborg, Harvard Medical School, Boston, MA, USA & Broad Institute, Cambridge, MA

Special thanks to the Broad Institute TCGA Genome Data Analysis Center Team, in particular Michael S Noble, Lynda Chin & Gaddy Getz

Page 3: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Funding

Peter J Park: NIH/NCI The Cancer Genome Atlas

Nils Gehlenborg: NIH/NHGRI K99/R00 Pathway to Independence Award

Page 4: Visual Exploration of Clinical and Genomic Data for Patient Stratification

?

Page 5: Visual Exploration of Clinical and Genomic Data for Patient Stratification
Page 6: Visual Exploration of Clinical and Genomic Data for Patient Stratification
Page 7: Visual Exploration of Clinical and Genomic Data for Patient Stratification
Page 8: Visual Exploration of Clinical and Genomic Data for Patient Stratification
Page 9: Visual Exploration of Clinical and Genomic Data for Patient Stratification
Page 10: Visual Exploration of Clinical and Genomic Data for Patient Stratification

TCGA: The Cancer Genome Atlas

Page 11: Visual Exploration of Clinical and Genomic Data for Patient Stratification

20+ cancer types × 500 patients

Page 12: Visual Exploration of Clinical and Genomic Data for Patient Stratification

10,000+ patients

Page 13: Visual Exploration of Clinical and Genomic Data for Patient Stratification
Page 14: Visual Exploration of Clinical and Genomic Data for Patient Stratification

mRNA expression

microRNA expression

DNA methylation

protein expression

copy number variants

mutation calls

clinical parameters

Page 15: Visual Exploration of Clinical and Genomic Data for Patient Stratification
Page 16: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Stratome

Page 17: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Anthony92931 / Wikimedia Commons

Page 18: Visual Exploration of Clinical and Genomic Data for Patient Stratification
Page 19: Visual Exploration of Clinical and Genomic Data for Patient Stratification
Page 20: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Correlation with clusters based on other data types?

Different outcomes?

Mutations or copy number variants associated with clusters?

Demographic differences?

Page 21: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Challenges

How can we explore overlap of patient sets across stratifications?

How can we compare properties of patient sets within a stratification?

How can we discover “interesting” stratifications and pathways to consider?

How can we handle terabytes of clinical and genomic data in visualization tools?

Page 22: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Problem 1

Comparing Patient Sets across Stratifications

Page 23: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Patients

Stratifications

Page 24: Visual Exploration of Clinical and Genomic Data for Patient Stratification

[Figure: stratification columns mRNA (clusters #1 to #4), Copy Number of gene X (del, amp, normal), Mutation of gene Y (mut, normal)]

Page 28: Visual Exploration of Clinical and Genomic Data for Patient Stratification

StratomeX (short for Stratome Explorer)

Page 29: Visual Exploration of Clinical and Genomic Data for Patient Stratification

[Figure: stratification columns mRNA (clusters #1 to #4), Copy Number (del, amp, normal), Mutation (mut, normal)]

Page 30: Visual Exploration of Clinical and Genomic Data for Patient Stratification
Page 31: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Select band

Page 32: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Select block

Page 33: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Compare clusterings: consensus NMF and hierarchical

Page 34: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Park columns

Page 35: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Compare clusterings: left cluster split

Page 36: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Compare clusterings: right cluster split

Page 37: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Compare clusterings: left cluster contained in right cluster

Page 38: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Problem 2

Comparing Patient Sets within Stratifications

Page 39: Visual Exploration of Clinical and Genomic Data for Patient Stratification
Page 40: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Block Visualizations: Patient Properties

Numerical Data

Matrix

Vector

Matrix + (Pathway) Maps

Categorical Data

Scalar

Page 41: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Add KEGG glioma pathway and map mRNA transcript levels

Page 42: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Modify color mapping on the fly

Page 43: Visual Exploration of Clinical and Genomic Data for Patient Stratification

View pathway detail (cluster 2)

Page 44: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Zoom into pathway detail (cluster 2): EGFR down-regulated

Page 45: Visual Exploration of Clinical and Genomic Data for Patient Stratification
Page 46: Visual Exploration of Clinical and Genomic Data for Patient Stratification

View pathway detail (cluster 3)

Page 47: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Zoom into pathway detail (cluster 3): EGFR up-regulated

Page 48: Visual Exploration of Clinical and Genomic Data for Patient Stratification
Page 49: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Add copy number for EGFR

Page 50: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Add copy number for EGFR

Page 51: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Add survival stratified by TP53 mutation status

Page 52: Visual Exploration of Clinical and Genomic Data for Patient Stratification

View detail of Kaplan-Meier plot based on TP53
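
The survival curves shown in these slides are Kaplan-Meier estimates, drawn once per patient set. The following is a minimal Python sketch of that estimate, not the StratomeX implementation. The function names, the input layout (one follow-up time and one event flag per patient) and the group labels are assumptions made for illustration.

```python
from collections import defaultdict


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time with an observed death.

    durations: follow-up time per patient.
    events: True if death was observed, False if the patient was censored.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        leaving = sum(1 for d in durations if d == t)  # deaths plus censored
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def stratified_kaplan_meier(durations, events, groups):
    """One Kaplan-Meier curve per stratum, e.g. mutated vs. wild type."""
    by_group = defaultdict(lambda: ([], []))
    for d, e, g in zip(durations, events, groups):
        by_group[g][0].append(d)
        by_group[g][1].append(e)
    return {g: kaplan_meier(ds, es) for g, (ds, es) in by_group.items()}
```

Calling `stratified_kaplan_meier` with a hypothetical mutation status per patient (for example "mut" and "wt") yields one step curve per group, which is what a plot stratified by TP53 status compares.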

Page 53: Visual Exploration of Clinical and Genomic Data for Patient Stratification
Page 54: Visual Exploration of Clinical and Genomic Data for Patient Stratification

?

Page 55: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Knowledge-driven Exploration

Data-driven Exploration

Page 56: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Problem 3

Finding “Interesting” Stratifications and Pathways

Page 57: Visual Exploration of Clinical and Genomic Data for Patient Stratification
Page 58: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Is there a mutation that overlaps with this mRNA cluster?

Is there a CNV that affects survival?

Is there a pathway that is enriched in this cluster?

Is there a mutually exclusive mutation?

Query

Stratifications

Clinical Params

Pathways

Page 59: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Query

Retrieve

Visualize

Stratifications

Clinical Params

Pathways

Guided Exploration

Page 60: Visual Exploration of Clinical and Genomic Data for Patient Stratification
Page 61: Visual Exploration of Clinical and Genomic Data for Patient Stratification

LineUp

S Gratzl, A Lex, N Gehlenborg, H Pfister and M Streit, “LineUp: Visual Analysis of Multi-Attribute Rankings”, IEEE Transactions on Visualization and Computer Graphics 19:2277-2286 (2013)

Page 62: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Example: Clear Cell Renal Carcinoma (KIRC)

Main TCGA Paper published in Nature in 2013

First goal here: Characterize mRNA clusters

Page 63: Visual Exploration of Clinical and Genomic Data for Patient Stratification

View TCGA mRNA subtypes

Page 64: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Add MutSig q-values for mutations

Page 65: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Invert q-value mapping

Page 66: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Add filter to inverted q-value as cut-off
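
The last three steps (add q-values, invert the mapping, filter with a cut-off) can be sketched in a few lines of Python. This is only an illustration of the idea, not how LineUp implements it. The mapping `1 - q`, the default cut-off of 0.05 and the gene names in the usage note are assumptions.

```python
def inverted_q_score(q_value):
    """Map a q-value in [0, 1] so that more significant genes score higher."""
    if not 0.0 <= q_value <= 1.0:
        raise ValueError("q-value must lie in [0, 1]")
    return 1.0 - q_value


def filter_by_cutoff(q_values, max_q=0.05):
    """Keep genes whose inverted score reaches the cut-off, best first.

    q_values: dict mapping gene name to q-value.
    max_q: assumed significance threshold (hypothetical default).
    """
    min_score = inverted_q_score(max_q)
    scored = {gene: inverted_q_score(q) for gene, q in q_values.items()}
    kept = [(gene, score) for gene, score in scored.items() if score >= min_score]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

For example, `filter_by_cutoff({"GENE_A": 0.001, "GENE_B": 0.2, "GENE_C": 0.04})` keeps the hypothetical genes GENE_A and GENE_C, in that order, and drops GENE_B.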

Page 67: Visual Exploration of Clinical and Genomic Data for Patient Stratification
Page 68: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Query mutated genes

Page 69: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Queries

Retrieve Stratifications

Sets with large overlap: Jaccard Index

Similar stratifications: Adjusted Rand Index

Survival: Log Rank Score (one vs rest)

Retrieve Pathways

Gene Set Enrichment Score: original or PAGE (one vs rest)
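
Two of the scores named on this slide are simple enough to sketch directly. The Python below shows the standard definitions of the Jaccard Index (overlap of two patient sets) and the Adjusted Rand Index (similarity of two stratifications of the same patients). It is a minimal illustration, not the StratomeX code. The function names and the input layout (patient identifiers, and one cluster label per patient in the same order) are assumptions.

```python
from collections import Counter
from math import comb


def jaccard_index(set_a, set_b):
    """Overlap of two patient sets: |A and B| / |A or B|."""
    a, b = set(set_a), set(set_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adjusted_rand_index(labels_a, labels_b):
    """Similarity of two stratifications of the same patients (1.0 = identical)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both stratifications must cover the same patients")
    # Count patient pairs that share a cluster in both, in A only, and in B only.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total_pairs = comb(len(labels_a), 2)
    expected = pairs_a * pairs_b / total_pairs if total_pairs else 0.0
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster on both sides
    return (pairs_both - expected) / (max_index - expected)
```

A query of the first kind would compute `jaccard_index` between the displayed set and every candidate set and rank the candidates by score. The second kind would do the same with `adjusted_rand_index` over whole stratifications.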

Page 70: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Query mutated genes

Page 71: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Result of Jaccard Index query: preview PTEN

Page 72: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Query mutated genes

Page 73: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Query mutated genes

Page 74: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Query mutated genes with cluster m2

Page 75: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Result of Jaccard Index query: preview MTOR

Page 76: Visual Exploration of Clinical and Genomic Data for Patient Stratification
Page 77: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Re-order columns

Page 78: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Add TCGA microRNA subtypes (direct insert mode)

Page 79: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Add TCGA microRNA subtypes (direct insert mode)

Page 80: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Observe large overlap between m1 and mi3

Page 81: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Observe large overlap between m3 and mi2

Page 82: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Query for copy number variation matching m3

Page 83: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Query only tumor suppressor genes (Vogelstein et al.)

Page 84: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Query only tumor suppressor genes (Vogelstein et al.)

Page 85: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Score only deletions

Page 86: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Score only deletions

Page 87: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Score only deletions

Page 88: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Score only deletions

Page 89: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Score only deletions

Page 90: Visual Exploration of Clinical and Genomic Data for Patient Stratification

View CDKN2A copy number status and m3 and mi2 overlap

Page 91: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Add survival stratified by TCGA microRNA clusters

Page 92: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Find gene mutation that affects survival
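
A survival query of this kind scores each candidate patient set against the rest of the cohort with a log-rank test. The sketch below computes the standard two-group log-rank chi-square statistic in plain Python. It is an illustration under assumptions, not the StratomeX implementation: the function name and the input layout (follow-up time, event flag and set membership per patient) are hypothetical.

```python
def logrank_statistic(durations, events, in_group):
    """Chi-square log-rank statistic (1 degree of freedom), one set vs. the rest.

    durations: follow-up time per patient.
    events: True if death was observed, False if censored.
    in_group: True if the patient belongs to the set being scored.
    """
    observed_minus_expected = 0.0
    variance = 0.0
    death_times = sorted({d for d, e in zip(durations, events) if e})
    for t in death_times:
        n = sum(1 for d in durations if d >= t)  # all patients at risk
        n1 = sum(1 for d, g in zip(durations, in_group) if d >= t and g)
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        deaths1 = sum(
            1 for d, e, g in zip(durations, events, in_group) if d == t and e and g
        )
        observed_minus_expected += deaths1 - deaths * n1 / n
        if n > 1:
            variance += deaths * (n1 / n) * (1 - n1 / n) * (n - deaths) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0
```

Larger values indicate a stronger survival difference between the set and the rest, so candidate mutations can be ranked by this score.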

Page 93: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Score only mutations

Page 94: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Score only mutations

Page 95: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Score only mutations

Page 96: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Score only mutations

Page 97: Visual Exploration of Clinical and Genomic Data for Patient Stratification

View BAP1 mutation status and survival stratified by BAP1

Page 98: Visual Exploration of Clinical and Genomic Data for Patient Stratification

View BAP1 mutation status and survival stratified by BAP1

Page 99: Visual Exploration of Clinical and Genomic Data for Patient Stratification

View BAP1 mutation status and survival stratified by BAP1

Page 100: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Query for enriched pathway in TCGA mRNA cluster m4

Page 101: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Preview KEGG ribosome pathway overexpression in m4
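
For the PAGE-based variant of the pathway query, the score is a z-score that compares the mean expression change of the pathway genes with the mean over all genes. The following Python sketch assumes the usual formulation, z = (mean of set - mean of all) * sqrt(set size) / standard deviation of all. The function name, the use of the population standard deviation and the input layout (one fold change per gene, here taken to mean cluster vs. rest) are assumptions, not details confirmed by the slides.

```python
from math import sqrt
from statistics import mean, pstdev


def page_z_score(fold_changes, gene_set):
    """PAGE-style z-score for one gene set.

    fold_changes: dict mapping gene name to a (log) fold change, assumed here
        to compare one cluster against the remaining patients.
    gene_set: iterable of gene names in the pathway.
    """
    members = [fold_changes[g] for g in gene_set if g in fold_changes]
    if not members:
        raise ValueError("gene set has no measured genes")
    all_values = list(fold_changes.values())
    spread = pstdev(all_values)
    if spread == 0:
        return 0.0  # no variation across genes, nothing can be enriched
    return (mean(members) - mean(all_values)) * sqrt(len(members)) / spread
```

Ranking all pathways by this score for a selected cluster would put pathways with consistently higher expression in that cluster at the top.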

Page 102: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Confirm selection

Page 103: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Change color mapping

Page 104: Visual Exploration of Clinical and Genomic Data for Patient Stratification

View ribosome pathway detail for TCGA mRNA cluster m4

Page 105: Visual Exploration of Clinical and Genomic Data for Patient Stratification

?

Page 106: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Problem 4

Dealing with Terabytes of Cancer Genomics Data

Page 107: Visual Exploration of Clinical and Genomic Data for Patient Stratification

TCGA Data Coordination Center

Broad Institute Genome Data Analysis Center

Standardized Data Sets

Standardized Analyses

Analysis Reports

MSKCC cBio Portal

TCGA Working Groups

StratomeX

...

Page 108: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Standardized Data Sets: data set versioning, format normalization, removal of redacted data, ...

Standardized Analyses: mutation analysis, copy number analysis, clustering, correlations, pathway analysis, ...

Analysis Reports

Page 109: Visual Exploration of Clinical and Genomic Data for Patient Stratification

102

http://gdac.broadinstitute.org: individual downloads and view reports

firehose_get: bulk download

Standardized Data Sets Standardized Analyses Analysis Reports

Page 111: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Data Package (one per tumor type) = Data Matrices + Stratifications

Data Matrices: mRNA (array & sequencing), microRNA (array & sequencing), methylation, reverse phase protein array, clinical parameters

Stratifications: clustering (CNMF & hierarchical), gene mutation status (binary), gene copy number status (5 class)

Standardized Data Sets Standardized Analyses Analysis Reports

Page 112: Visual Exploration of Clinical and Genomic Data for Patient Stratification
Page 113: Visual Exploration of Clinical and Genomic Data for Patient Stratification
Page 114: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Data Packages

up to 24 data and result files from 18 Firehose archives

up to 500 MB (190 MB compressed)

Page 115: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Schroeder et al. Genome Medicine 2013, 5:9

Page 116: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Challenges

How can we explore overlap of patient sets across stratifications?

How can we compare properties of patient sets within a stratification?

How can we discover “interesting” stratifications and pathways to consider?

How can we handle terabytes of clinical and genomic data in visualization tools?

Page 117: Visual Exploration of Clinical and Genomic Data for Patient Stratification

StratomeX is part of the Caleydo Visualization Framework

Implemented in Java, uses OpenGL and Eclipse Rich Client Platform

Binaries available for Linux, Windows, Mac OS X

Requires Java 1.7 JRE or JDK (on Mac OS X)

Open source licensed under BSD license

Source code on GitHub

CALEYDO

Page 118: Visual Exploration of Clinical and Genomic Data for Patient Stratification

StratomeX

http://stratomex.caleydo.org

http://www.github.com/caleydo

A Lex, M Streit, H-J Schulz, C Partl, D Schmalstieg, PJ Park, N Gehlenborg, “StratomeX: Visual Analysis of Large-Scale Heterogeneous Genomics Data for Cancer Subtype Characterization”, Computer Graphics Forum (EuroVis '12), 31:1175-1184 (2012)

M Streit, A Lex, S Gratzl, C Partl, D Schmalstieg, H Pfister, PJ Park, N Gehlenborg, “Guided Visual Exploration of Genomic Stratifications in Cancer”, Nature Methods 11:884–885 (2014)

CALEYDO

Page 119: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Plans

Where to go from here?

Page 120: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Domino

S Gratzl, N Gehlenborg, A Lex, H Pfister and M Streit, “Domino: Extracting, Comparing, and Manipulating Subsets across Multiple Tabular Datasets”, IEEE Transactions on Visualization and Computer Graphics (2014)

Page 121: Visual Exploration of Clinical and Genomic Data for Patient Stratification
Page 122: Visual Exploration of Clinical and Genomic Data for Patient Stratification

INTEGRATION

Page 123: Visual Exploration of Clinical and Genomic Data for Patient Stratification

INTEGRATION

Page 124: Visual Exploration of Clinical and Genomic Data for Patient Stratification

INTEGRATION

Horizontal Integration across Data Types

Biological Insight

Page 125: Visual Exploration of Clinical and Genomic Data for Patient Stratification

INTEGRATION

Vertical Integration across Data Levels

Confirmation & Troubleshooting

Page 126: Visual Exploration of Clinical and Genomic Data for Patient Stratification
Page 127: Visual Exploration of Clinical and Genomic Data for Patient Stratification
Page 128: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Refinery Platform

Page 129: Visual Exploration of Clinical and Genomic Data for Patient Stratification

Refinery Platform

Data repository based on ISA-Tab for reproducible research

Workflow execution in Galaxy

Integrated visualization tools with access to provenance

http://www.refinery-platform.org

Page 130: Visual Exploration of Clinical and Genomic Data for Patient Stratification

StratomeX

http://stratomex.caleydo.org

http://www.github.com/caleydo

A Lex, M Streit, H-J Schulz, C Partl, D Schmalstieg, PJ Park, N Gehlenborg, “StratomeX: Visual Analysis of Large-Scale Heterogeneous Genomics Data for Cancer Subtype Characterization”, Computer Graphics Forum (EuroVis '12), 31:1175-1184 (2012)

M Streit, A Lex, S Gratzl, C Partl, D Schmalstieg, H Pfister, PJ Park, N Gehlenborg, “Guided Visual Exploration of Genomic Stratifications in Cancer”, Nature Methods 11:884–885 (2014)

CALEYDO

Page 131: Visual Exploration of Clinical and Genomic Data for Patient Stratification

[Figure: query wizard flowchart; node labels in extraction order]

Execute Logrank Test query

Select displayed set

Execute Jaccard Index query

Select displayed stratification

Execute Adjusted Rand Index query

Select pathway

Select displayed set

Execute GSEA query

Select displayed stratification

Select clinical param. in LineUp view

Select displayed stratification

Execute Logrank Test query

Execute PAGE query

Select stratification in LineUp view

Select pathway in LineUp view

Select clinical param. in LineUp view

Add stratification

Based on Logrank Test score (survival)

Based on similarity to displayed stratification

Based on overlap with displayed set

Add pathway

Stratify with displayed stratification

Find based on differential expression in displayed set

Add other data

Stratify with displayed stratification

Display unstratified

Add pathway

Based on Logrank Test score (survival)

Manually

Add other data

Add independent column

Add dependent column

Add independent column to existing one

Manually

Based on GSEA

Based on PAGE

Open Query Wizard