NetBioSIG2013-Talk Gang Su

A ‘Cool’ Heatmap: and its Applications in Flexible Multi-scale Molecular Network Exploration. Gang Su, PhD, Molecular Behavioral Neuroscience Institute, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor 48109, [email protected]. Network Biology SIG 2013, Friday, July 19th, Berlin, Germany

Uploaded by: alexander-pico

Posted on 10-May-2015


DESCRIPTION

Presentation for Network Biology SIG 2013 by Gang Su, University of Michigan, USA. “CoolMap Cytoscape App: Flexible Multi-scale Heatmap-Driven Molecular Network Exploration”

TRANSCRIPT

Page 1

A ‘Cool’ Heatmap: and its Applications in Flexible Multi-scale Molecular Network Exploration

Molecular Behavioral Neuroscience Institute, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor 48109, [email protected]

Gang Su, PhD

Network Biology SIG 2013, Friday, July 19th, Berlin, Germany

Page 2

Heatmap… What is it? ‘CoolMap… I am your father’

¤  One of the most popular ways of visualizing tabular data

¤  X: column, Y: row, value: color

¤  Trees for hierarchical clustering, or groups, are often drawn along the sides

¤  Great format for visual exploration and pattern discovery

¤  Used along with node-edge network views such as Cytoscape-clusterExplorer

¤  The paradigm remains largely unchanged

The American Statistician, 2009; PNAS, Dec. 8, 1998, Vol. 95, No. 25, 14863-14868

Czekanowski (1909); Brinton (1914); Loua (1873); Eisen (1998), 12k citations

Page 3

The Good, the Bad, and the Ugly… of the conventional heatmaps

¤  The Good

¤  Mapping numbers to colors makes it intuitive

¤  Clustering patterns become conspicuous and interpretable

¤  The Bad

¤  Increasingly difficult to visualize and explore big datasets

¤  Difficult to use for non-numeric data

¤  The Ugly

¤  Difficult to incorporate existing annotations such as pathways and ontologies

¤  Difficult to visualize high-level relationships such as overall pathway-to-pathway correlations

The “Figure 1” Phenomenon

Page 4

There are known knowns, and there are known unknowns.

PLoS Genet. 2008 Mar 14;4(3):e1000034; BMC Bioinformatics. 2011;12(Suppl 1)

How do we relate the unknown to the known, moving from observed patterns to existing knowledge interactively and intuitively?

Page 5

The $$$ Solution

There are only so many screens you can buy

Page 6

The CoolMap Solution: Nuts and Bolts

¤  Core concept: ‘Collapsible Heatmap’

¤  The tree nodes can be expanded/collapsed at any level:

¤  Think of it as a two-way multi-tree

¤  Collapsed data are represented using aggregation functions (mean, median, etc.)

¤  The aggregation enables the user to explore data at multiple levels:

¤  Identify potential signals from high-level aggregated views

¤  Expand nodes of interest while keeping the surrounding context

Using the mean to collapse four numeric cells

The two-way tree can be expanded and collapsed at multiple levels
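A minimal Python sketch of the collapse mechanism, assuming a dictionary-based base matrix and row/column trees given as parent-to-children maps. The names `leaves`, `view_cell` and `view_matrix`, and all data values, are hypothetical illustrations, not CoolMap's actual API.

```python
from statistics import mean

# Hypothetical example data (not from the talk): row tree = pathways -> genes,
# column tree = sample groups -> samples. Any name that is not a key is a leaf.
ROW_TREE = {"pathway_A": ["g1", "g2"], "pathway_B": ["g3"]}
COL_TREE = {"group_1": ["s1", "s2"], "group_2": ["s3"]}

# 'Base' matrix keyed by (leaf row, leaf column).
BASE = {
    ("g1", "s1"): 1.0, ("g1", "s2"): 2.0, ("g1", "s3"): 0.0,
    ("g2", "s1"): 3.0, ("g2", "s2"): 4.0, ("g2", "s3"): 1.0,
    ("g3", "s1"): 5.0, ("g3", "s2"): 6.0, ("g3", "s3"): 2.0,
}


def leaves(node, tree):
    """Return every leaf under `node` (a leaf is returned as itself)."""
    if node not in tree:
        return [node]
    result = []
    for child in tree[node]:
        result.extend(leaves(child, tree))
    return result


def view_cell(row_node, col_node, base, row_tree, col_tree, agg=mean):
    """Collapse all base cells under (row_node, col_node) into one view value."""
    values = [base[(r, c)]
              for r in leaves(row_node, row_tree)
              for c in leaves(col_node, col_tree)]
    return agg(values)


def view_matrix(row_nodes, col_nodes, base, row_tree, col_tree, agg=mean):
    """Build the view for the currently visible (expanded or collapsed) nodes."""
    return [[view_cell(r, c, base, row_tree, col_tree, agg) for c in col_nodes]
            for r in row_nodes]


if __name__ == "__main__":
    # Both axes collapsed: pathway_A x group_1 averages four base cells.
    print(view_matrix(["pathway_A", "pathway_B"], ["group_1", "group_2"],
                      BASE, ROW_TREE, COL_TREE))  # [[2.5, 0.5], [5.5, 2.0]]
    # Expand pathway_A into its genes while pathway_B stays collapsed.
    print(view_matrix(["g1", "g2", "pathway_B"], ["group_1", "group_2"],
                      BASE, ROW_TREE, COL_TREE))  # [[1.5, 0.0], [3.5, 1.0], [5.5, 2.0]]
```

Because `leaves` simply follows child lists, a leaf listed under two parents is picked up by both; passing `agg=statistics.median` changes the aggregation without touching the tree logic.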

Page 7

CoolMap: Core Design Concepts

¤  Extensible interfaces:

¤  A Loader that imports custom data objects into a ‘base’ matrix

¤  An aggregator that transforms a group of ‘base’ data objects into a ‘view’ data object

¤  A renderer that renders the ‘view’ data object to the designated region in the interactive view

Example:

¤  Gene expression values of all genes in pathway A and sample group B, aggregated using the median and rendered as color: [0.5, 1, 2.1, 3.2, 4.3] → [2.1]

¤  Nucleotide sequences belonging to the same transcription factor binding site, aggregated to a single letter using the IUPAC consensus code and rendered as text: [A, A, A, A, T] → [A] → A

¤  The ‘base’ matrix can use a variety of data structures, such as arrays, lists, sparse matrices or even remote services

¤  Flexible row/column ontological trees:

¤  Multiple-inheritance tree

¤  Genes or metabolites may be shared by multiple pathways or ontological terms, and may occur more than once.

¤  Trees from different sources

¤  Side by side comparison of different ontologies (GO, KEGG, Hierarchical Clustering)

¤  Trees may be used at any level

¤  Tree nodes at any level can be inserted into any place in the tree.
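The Aggregator/Renderer split can be sketched as two small Python protocols, reproducing the two examples above. The class names, the greyscale color mapping, and the consensus thresholds (a single base above 50% and more than twice the runner-up, else a two-base IUPAC code above 75%, else N) are assumptions for illustration; the talk does not specify CoolMap's actual rule. A Loader is omitted because the base cells here are plain Python lists.

```python
from collections import Counter
from statistics import median
from typing import Any, Protocol, Sequence


class Aggregator(Protocol):
    """Turns a group of 'base' cells into one 'view' value."""
    def aggregate(self, cells: Sequence[Any]) -> Any: ...


class Renderer(Protocol):
    """Turns a 'view' value into something drawable (here: a string)."""
    def render(self, value: Any) -> str: ...


class MedianAggregator:
    def aggregate(self, cells):
        return median(cells)


# Two-base IUPAC ambiguity codes (assumes only A/C/G/T appear in the cells).
IUPAC_PAIRS = {frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
               frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M"}


class ConsensusAggregator:
    """Assumed threshold rule: one base if it is >50% and more than twice the
    runner-up; else a two-base IUPAC code if the top two exceed 75%; else N."""
    def aggregate(self, cells):
        ranked = Counter(cells).most_common()
        n = len(cells)
        top_base, top = ranked[0]
        second_base, second = ranked[1] if len(ranked) > 1 else (None, 0)
        if top / n > 0.5 and top > 2 * second:
            return top_base
        if second_base is not None and (top + second) / n > 0.75:
            return IUPAC_PAIRS[frozenset((top_base, second_base))]
        return "N"


class TextRenderer:
    def render(self, value):
        return str(value)


class GreyscaleRenderer:
    """Hypothetical color renderer: linear map of [lo, hi] onto a grey hex code."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def render(self, value):
        t = min(max((value - self.lo) / (self.hi - self.lo), 0.0), 1.0)
        level = round(255 * t)
        return f"#{level:02x}{level:02x}{level:02x}"


def render_cell(cells, aggregator: Aggregator, renderer: Renderer) -> str:
    """base cells -> aggregator -> view value -> renderer -> drawn cell."""
    return renderer.render(aggregator.aggregate(cells))


if __name__ == "__main__":
    print(render_cell([0.5, 1, 2.1, 3.2, 4.3], MedianAggregator(), TextRenderer()))  # 2.1
    print(render_cell(list("AAAAT"), ConsensusAggregator(), TextRenderer()))  # A
```

Keeping aggregation and rendering behind separate interfaces means the same median value can be drawn as text or color, and the same renderer can display numeric or sequence aggregates.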

Page 8

Near-ready Releases

¤  CoolMap Core

¤  Core interfaces, data structures and utility functions for the base matrix, view matrix, ontology trees, renderers, interactive view panels, etc.

¤  CoolMap Application

¤  An application with auxiliary modules such as dynamic multiple-dataset synchronization, searchers, filters, sorters, data persistence, etc.

¤  Follows many best practices from Cytoscape

¤  CoolMap Cytoscape Prototype Plugin

¤  A Cytoscape plugin that enables two-way communication between Cytoscape and CoolMap

A classroom user study with a group of undergraduate students with introductory computer and bioinformatics backgrounds showed that:

¤  65% found it easy or not difficult to learn

¤  74% highly enjoyed or enjoyed the software

Page 9

Screenshot

Page 10

Case Study 1: Eisen Yeast Data

Eisen (1998)

Gene expression fold change of selected gene groups and experiment conditions

CoolMap makes it easier to interpret data from the higher concept levels

CoolMap

Page 11

Case Study 1: Eisen Yeast Data (cont’d)

CoolMap reveals more than meets the eye in conventional heatmaps

The peculiar outlier sample spo5 2: its fold change is reversed across many pathways, which is easier to identify in the aggregated view

Page 12

Case Study 1: Eisen Yeast Data (cont’d)

Using CoolMap’s multi-view linking functions to compare different ontology definitions. Left: GO:0006096 (glycolysis). Right: Eisen’s annotated glycolysis cluster.

Integrate existing knowledge with observed data for hypothesis generation

Page 13

Case Study 2: Diet Induced Differential Gene Expression

¤  Individuals fed saturated fatty acid (SFA) or monounsaturated fatty acid (MUFA) diets showed differential gene expression over an 8-week span

¤  The authors selected a list of immune-related genes and showed up-regulation of these genes

The American Journal of Clinical Nutrition 90, 1656-64 (2009)

CoolMap

Page 14

Case Study 2: Diet Induced Differential Gene Expression (cont’d)

Probe-level expression profiles can be maintained

Page 15

Case Study 2: Diet Induced Differential Gene Expression (cont’d)

Using ontology groups (including gender groups) leads to new discoveries: up-regulated gene groups, and gender-specific responses that appear as weaker patterns. Total of 25k probes.

Up-regulated clusters; Female-specific; Male-specific

Page 16

Case Study 3: Mother-Child Nutrition Data (Unpublished)

¤  The aggregated group view makes the data much easier to interpret at the concept level

¤  We can immediately identify that:

§  BCAA AcylCarnitines (0.45), Long Chain AcylCarnitines (0.34), PPARa methylation (0.52) and ESR methylation (0.32) are highly correlated between mother and child

Burant C. Unpublished data.
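The slide does not say how the group-level numbers are computed. One plausible reading, assumed here, is that each value is the mean of per-feature Pearson correlations between mother and child measurements across mother-child pairs. The sketch below uses hypothetical toy data and hypothetical function names.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-constant data)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def group_correlations(mother, child, groups):
    """mother/child map feature -> values, one per mother-child pair, in the same
    pair order. groups map a group name -> its features. Each group's value is
    the mean of its features' mother-vs-child correlations."""
    per_feature = {f: pearson(mother[f], child[f]) for f in mother}
    return {g: mean(per_feature[f] for f in feats) for g, feats in groups.items()}


# Hypothetical toy data: three features measured in four mother-child pairs.
MOTHER = {"f1": [1, 2, 3, 4], "f2": [1, 2, 3, 4], "f3": [1, 2, 3, 4]}
CHILD = {"f1": [2, 4, 6, 8], "f2": [4, 3, 2, 1], "f3": [1, 3, 2, 4]}
GROUPS = {"group_A": ["f1", "f3"], "group_B": ["f2"]}

if __name__ == "__main__":
    print(group_correlations(MOTHER, CHILD, GROUPS))
```

A group mean can hide a mix of high and low feature-level correlations, which is why the following slides expand groups one level down.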

Page 17

Case Study 3: Mother-Child Nutrition Data (Unpublished), PPARa: One Level Down

¤  Validation

¤  The boxplot overlay (left) and the expanded view (right) show that the high correlation (mean 0.52) is unlikely to result from error, outliers or noise

¤  Strong association of PPARa methylation levels in mother and child

¤  Hypothesis

¤  As PPARa regulates genes involved in cell proliferation, cell differentiation and inflammatory responses, the expression profiles of these genes may also be correlated in mother and child.

http://www.ncbi.nlm.nih.gov/gene/5465

Burant C. Unpublished data.

Page 18

Case Study 3: Mother-Child Nutrition Data (Unpublished), BCAA AcylCarnitines

¤  The mother-child correlation is lower (mean 0.45)

¤  Within the BCAA AcylCarnitines group, the child values have a larger variance than the mother's

¤  While C3 is highly correlated, C4 has low correlation

Page 19

Case Study 4: DNA Methylation, Missing Values and Ragged Data (Unpublished)

¤  Sparse or ragged matrix

¤  Normalized methylation data: every gene has a different number of methylation sites.

¤  Collapsing by cell line (Caski.1 and Caski.2 cell lines) reveals the aggregated (mean, etc.) normalized methylation value. Expansion by cell line reveals details for each methylation site.

Sartor M. Unpublished data.
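A minimal sketch of collapsing and expanding a ragged matrix, assuming each (cell line, gene) cell holds a variable-length list of per-site values with `None` for missing entries, and using the mean as the aggregator. Only the cell line labels come from the slide; gene names, values and function names are hypothetical.

```python
from statistics import mean

# Hypothetical ragged data: cell line -> gene -> normalized methylation value
# per site. Genes have different numbers of sites; None marks a missing value.
RAGGED = {
    "Caski.1": {"GENE_X": [0.25, 0.75, None], "GENE_Y": [1.0]},
    "Caski.2": {"GENE_X": [0.0, None, 0.5], "GENE_Y": [None]},
}


def mean_present(values):
    """Mean of the non-missing values, or None if every value is missing."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


def collapse_sites(data):
    """Collapsed view: one aggregated value per (cell line, gene)."""
    return {line: {gene: mean_present(sites) for gene, sites in genes.items()}
            for line, genes in data.items()}


def expand_gene(data, gene):
    """Expanded view: the per-site values of one gene in each cell line."""
    return {line: genes.get(gene, []) for line, genes in data.items()}


if __name__ == "__main__":
    print(collapse_sites(RAGGED))
    print(expand_gene(RAGGED, "GENE_X"))
```

Skipping missing values inside the aggregator keeps the ragged structure intact: no padding or imputation is needed for the collapsed view, and the expanded view still shows exactly which sites are missing.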

Page 20

Case Study 5: Continuous Glucose Monitoring (CGM)

Display glucose levels at:

•  a variety of time resolutions, from 5 min to 1 month

•  a variety of sample groups: age groups, gender

Link hypoglycemia events to blood sugar changes.
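Switching time resolution amounts to re-aggregating the same readings into wider bins. A sketch, assuming readings arrive as (timestamp, glucose) pairs and the mean is the aggregator; `bin_readings` and the synthetic data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def bin_readings(readings, width):
    """Average (timestamp, glucose) readings into fixed-width time bins.

    Bins start at midnight of the first reading's day, so widths that divide
    a day (5 min, 1 h, 1 day) line up with the clock.
    """
    first = min(t for t, _ in readings)
    origin = first.replace(hour=0, minute=0, second=0, microsecond=0)
    bins = defaultdict(list)
    for t, value in readings:
        bins[(t - origin) // width].append(value)  # timedelta // timedelta -> int
    return [(origin + k * width, mean(vs)) for k, vs in sorted(bins.items())]


if __name__ == "__main__":
    # Synthetic readings: one every 5 minutes for 2 hours.
    start = datetime(2013, 7, 19)
    readings = [(start + i * timedelta(minutes=5), 100 + i) for i in range(24)]
    for bin_start, avg in bin_readings(readings, timedelta(hours=1)):
        print(bin_start, avg)
```

Calendar months are not a fixed `timedelta`, so a month-level view would need to group by `(t.year, t.month)` instead of by a fixed width.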

Page 21

Case Study 6: Sequence Analysis Example

¤  Interactive consensus sequence exploration: CRP (cAMP receptor protein, also known as catabolite activator protein) binding sites, 49 sequences in dozens of promoters; ChIP-seq

¤  Extend CoolMap with a Loader, Aggregator and Renderer [and Annotator]

Full Sequence View

Sequence Logo

Consensus View

Consensus View with base percentage overlay

Consensus View with GC content overlay

Genome Res. 2004 June; 14(6): 1188-1190
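The base-percentage and GC-content overlays can be derived from per-position base counts of an alignment. A sketch on a hypothetical toy alignment (not the 49 CRP sites), assuming equal-length, pre-aligned sequences and a simple majority consensus, which is a swapped-in rule and not the IUPAC consensus code; the function names are hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def column_profiles(seqs):
    """Per-position base percentages for equal-length, pre-aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    profiles = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        profiles.append({b: 100.0 * counts[b] / len(seqs) for b in BASES})
    return profiles


def majority_consensus(profiles):
    """Most frequent base per position (ties go to the earlier letter in ACGT)."""
    return "".join(max(BASES, key=lambda b: p[b]) for p in profiles)


def gc_overlay(profiles):
    """GC percentage per position, e.g. to draw as an overlay on the consensus."""
    return [p["G"] + p["C"] for p in profiles]


# Hypothetical toy alignment.
SITES = ["TGTGA", "TGTGA", "TCTGA", "AGTCA"]

if __name__ == "__main__":
    profiles = column_profiles(SITES)
    print(majority_consensus(profiles))  # TGTGA
    print(gc_overlay(profiles))          # [0.0, 100.0, 0.0, 100.0, 0.0]
```

Each overlay is just another aggregation of the same column of base cells, so the consensus letter, base percentages and GC content can all be rendered over a single collapsed column.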

Page 22

Case Study 7: Network Analysis

¤  Link Cytoscape with CoolMap:

¤  Link network nodes with CoolMap views by ID, attribute names, etc.

¤  Relate patterns identified in an experiment to curated networks, as an alternative to JTreeView; create correlation matrices from Cytoscape numeric attributes

¤  Use pathways and ontologies to view sub-network-to-sub-network connectivity

¤  Cluster the network based on attributes, and compare unsupervised clustering vs. annotated pathways and ontologies

Need two monitors!

Page 23

Case Study 7: Network Analysis (cont’d)

Top left: the MAPK pathway in the ‘galFiltered.cys’ network from Cytoscape.

Bottom left: part of the same network, shown as an adjacency matrix arranged by pathways, with sum as the aggregator. Diagonal cells show the number of edges within each pathway; off-diagonal cells show the number of inter-pathway edges. A good ‘community’ clustering will have most of the green dots along the diagonal.

Right: the same view with the MAPK pathway expanded, showing dense intra-cluster connectivity.
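The bottom-left view can be sketched as summing 0/1 adjacency entries over each pathway-by-pathway block. The toy network, pathway membership and function names are hypothetical; halving diagonal blocks, which count each undirected intra-pathway edge twice, is an assumption about how intra-pathway edges are reported.

```python
from itertools import product


def adjacency(edges):
    """Symmetric adjacency of an undirected graph, stored as a set of ordered pairs."""
    adj = set()
    for u, v in edges:
        adj.add((u, v))
        adj.add((v, u))
    return adj


def block_sum(adj, rows, cols):
    """Sum aggregator over the adjacency sub-matrix rows x cols (entries are 0/1)."""
    return sum(1 for u, v in product(rows, cols) if (u, v) in adj)


def pathway_edge_matrix(edges, pathways):
    """Collapse the node-level adjacency matrix to pathway x pathway edge counts.

    Assumes no self-loops and disjoint pathways. A diagonal block is symmetric,
    so its sum counts every intra-pathway edge twice; it is halved here.
    """
    adj = adjacency(edges)
    counts = {}
    for p, q in product(pathways, pathways):
        total = block_sum(adj, pathways[p], pathways[q])
        counts[(p, q)] = total // 2 if p == q else total
    return counts


# Hypothetical toy network and pathway membership.
PATHWAYS = {"MAPK": ["a", "b", "c"], "Other": ["d", "e"]}
EDGES = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]

if __name__ == "__main__":
    for (p, q), n in pathway_edge_matrix(EDGES, PATHWAYS).items():
        print(p, q, n)
```

In this toy network the MAPK block holds 3 of the 5 edges while only one edge crosses pathways, the diagonal-heavy pattern the slide associates with a good community clustering.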

Page 24

Case Study 7: Network Analysis (cont’d)

Left: a correlation matrix can be created from the gal expression profiles, and pathways can then be used to arrange it into a condensed concept-level correlation view. Hierarchical clustering can be run at the concept level.

Right: the selected region contains nodes that are annotated with the KEGG pathway ‘Cell cycle’ and are close to each other in the network.
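A sketch of the left-hand workflow: build a gene-by-gene correlation matrix from expression profiles, then collapse it with pathway annotations into a pathway-by-pathway view. Using the mean as the aggregator and excluding self-correlations are assumptions; the profiles, pathway assignments and function names are hypothetical.

```python
from itertools import product
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation (assumes neither profile is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def correlation_matrix(profiles):
    """Gene x gene correlations from per-gene expression profiles."""
    return {(g, h): pearson(profiles[g], profiles[h])
            for g, h in product(profiles, profiles)}


def concept_view(corr, pathways):
    """Pathway x pathway mean correlation, skipping each gene's self-correlation.
    Returns None for a block that has no gene pairs left."""
    view = {}
    for p, q in product(pathways, pathways):
        vals = [corr[(g, h)] for g in pathways[p] for h in pathways[q] if g != h]
        view[(p, q)] = mean(vals) if vals else None
    return view


# Hypothetical toy profiles (three conditions per gene) and pathway annotations.
PROFILES = {"g1": [1, 2, 3], "g2": [2, 4, 6], "g3": [3, 2, 1]}
PATHWAYS = {"Cell cycle": ["g1", "g2"], "Other": ["g3"]}

if __name__ == "__main__":
    print(concept_view(correlation_matrix(PROFILES), PATHWAYS))
```

The condensed view is small enough to cluster directly, which is what running hierarchical clustering at the concept level amounts to.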

Page 25

Acknowledgement

Thank you!

Primary Advisor

Dr Fan Meng

Committee Mentors

Dr Brian D. Athey (Co-chair)

Dr Charles F. Burant and his lab

Dr Barbara Mirel

Dr Maureen Sartor

Testers

Usability testers and software testers, fellow Bioinformatics brethren.

Development

Please contact me if you are interested in development or testing:

[email protected]