netbiosig2013-talk gang su
DESCRIPTION
Presentation for Network Biology SIG 2013 by Gang Su, University of Michigan, USA. “CoolMap Cytoscape App: Flexible Multi-scale Heatmap-Driven Molecular Network Exploration”
TRANSCRIPT
A ‘Cool’ Heatmap: and its Applications in Flexible Multi-scale Molecular Network Exploration
Molecular Behavioral Neuroscience Institute, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor 48109 [email protected]
Gang Su, PhD
Network Biology SIG 2013, Friday July 19th, Berlin, Germany
Heatmap… What is it? ‘CoolMap.. I, am your father’
¤ One of the most popular ways of visualizing tabular data
¤ Columns on the X axis, rows on the Y axis, values mapped to color
¤ Trees for hierarchical clustering, or groups are often drawn along the sides
¤ Great format for visual exploration and pattern discovery
¤ Used along with node-edge network views such as Cytoscape-clusterExplorer
¤ The paradigm remains largely unchanged
The American Statistician, 2009; PNAS Dec. 8, 1998 Vol. 95 No. 25 14863-14868
Czekanowski (1909); Brinton (1914); Loua (1873); Eisen (1998), 12k citations
The Good, the Bad, and the Ugly… of the conventional heatmaps
¤ The Good
¤ Mapping numbers to colors makes it intuitive
¤ Clustering patterns become conspicuous and interpretable
¤ The Bad
¤ Increasingly difficult to visualize and explore big datasets
¤ Difficult to apply to non-numeric data
¤ The Ugly
¤ Difficult to incorporate existing annotations such as pathways and ontologies
¤ Difficult to visualize high-level relationships such as overall pathway to pathway correlations
The “Figure 1” Phenomenon
There are known knowns, and there are known unknowns.
PLoS Genet. 2008 Mar 14;4(3):e1000034; BMC Bioinformatics. 2011; 12(Suppl 1)
How do we relate the unknown to the known: From observed patterns to existing knowledge interactively and intuitively?
The $$$ Solution
There are only so many screens you can buy
The CoolMap Solution: Nuts and Bolts
¤ Core concept: ‘Collapsible Heatmap’
¤ The tree nodes can be expanded/collapsed at any level:
¤ Think about a two-way multi tree
¤ Collapsed data are represented using aggregation functions (mean, median, etc.)
¤ The aggregation enables the user to explore data at multiple levels:
¤ Identify potential signals from high level aggregated views
¤ Expand nodes of interest, while keeping the surrounding context
Using mean to collapse four numeric cells
The two way tree can be expanded and collapsed at multiple levels
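The collapse step above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CoolMap's actual code: the matrix values, the node names (`pathwayA`, `groupB`) and the helper functions `leaves`, `view_cell` and `view` are all hypothetical. Each visible cell is the aggregate of every base cell under its row node and column node, so collapsing or expanding a node only changes which nodes are listed as visible.

```python
from statistics import mean

# Toy base matrix: base[row][column]. All names and numbers are illustrative.
base = {
    "gene1": {"s1": 1.0, "s2": 2.0},
    "gene2": {"s1": 3.0, "s2": 4.0},
    "gene3": {"s1": 5.0, "s2": 7.0},
}

# Hypothetical row and column trees: each inner node lists its children.
row_tree = {"pathwayA": ["gene1", "gene2"]}
col_tree = {"groupB": ["s1", "s2"]}


def leaves(node, tree):
    """Return the base leaves under a node (a leaf is its own only leaf)."""
    if node not in tree:
        return [node]
    found = []
    for child in tree[node]:
        found.extend(leaves(child, tree))
    return found


def view_cell(row_node, col_node, aggregate=mean):
    """Aggregate every base cell covered by one (row node, column node) pair."""
    cells = [base[r][c]
             for r in leaves(row_node, row_tree)
             for c in leaves(col_node, col_tree)]
    return aggregate(cells)


def view(rows, cols, aggregate=mean):
    """Build the visible matrix for the currently shown row and column nodes."""
    return [[view_cell(r, c, aggregate) for c in cols] for r in rows]


# pathwayA collapsed: its four cells (1, 2, 3, 4) become one cell.
collapsed = view(["pathwayA", "gene3"], ["groupB"])
# pathwayA expanded again, columns still collapsed.
expanded_rows = view(["gene1", "gene2", "gene3"], ["groupB"])
```

Passing `median` or `max` as `aggregate` swaps the aggregation function without touching the tree logic.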
CoolMap: Core Design Concepts
¤ Extensible Interfaces:
¤ A loader that imports custom data objects into a ‘base’ matrix
¤ An aggregator that transforms a group of ‘base’ data objects into a ‘view’ data object
¤ A renderer that renders the ‘view’ data object to the designated region in the interactive view
Example:
¤ Gene expression values of all genes in pathway A, sample group B, aggregated using median, and rendered in color: [0.5, 1, 2.1, 3.2, 4.3] → [2.1]
¤ Nucleotide sequences belonging to the same transcription factor binding site, aggregated using IUPAC consensus code to a single letter, and rendered in text: [A,A,A,A,T] → [A] → A
¤ The ‘base’ matrix can use a variety of data structures, such as arrays, lists, sparse matrices or even remote services
¤ Flexible Row/Column Ontological Trees:
¤ Multiple-inheritance tree
¤ Genes or metabolites may be shared by multiple pathways or ontological terms, and may occur more than once.
¤ Trees from different sources
¤ Side by side comparison of different ontologies (GO, KEGG, Hierarchical Clustering)
¤ Trees may be used at any level
¤ Tree nodes at any level can be inserted into any place in the tree.
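The loader, aggregator and renderer roles described above could be laid out roughly as follows. This is a sketch under stated assumptions: the class names (`Loader`, `Aggregator`, `Renderer` and their subclasses) and method names are invented for illustration and are not CoolMap's real interfaces. The consensus aggregator also simply picks the most frequent base, which is a simplification of full IUPAC ambiguity coding.

```python
from abc import ABC, abstractmethod
from collections import Counter
from statistics import median


class Loader(ABC):
    @abstractmethod
    def load(self, source):
        """Turn a raw source into a list of 'base' data objects."""


class Aggregator(ABC):
    @abstractmethod
    def aggregate(self, cells):
        """Reduce a group of 'base' objects to one 'view' object."""


class Renderer(ABC):
    @abstractmethod
    def render(self, value):
        """Turn one 'view' object into something drawable."""


class CsvNumberLoader(Loader):
    def load(self, source):
        return [float(token) for token in source.split(",")]


class MedianAggregator(Aggregator):
    def aggregate(self, cells):
        return median(cells)


class MajorityBaseAggregator(Aggregator):
    def aggregate(self, cells):
        # Simplification: most frequent base, not full IUPAC ambiguity codes.
        return Counter(cells).most_common(1)[0][0]


class GrayRenderer(Renderer):
    def __init__(self, low, high):
        self.low, self.high = low, high

    def render(self, value):
        # Clamp to [0, 1], then map to a gray level encoded as a hex color.
        scaled = (value - self.low) / (self.high - self.low)
        level = int(255 * min(1.0, max(0.0, scaled)))
        return "#{0:02x}{0:02x}{0:02x}".format(level)


class TextRenderer(Renderer):
    def render(self, value):
        return str(value)


# Numeric example from the slide: five expression values collapse to a median.
expression = CsvNumberLoader().load("0.5,1,2.1,3.2,4.3")
expression_view = MedianAggregator().aggregate(expression)
expression_cell = GrayRenderer(0.0, 4.0).render(expression_view)

# Sequence example from the slide: five bases collapse to one letter.
site_view = MajorityBaseAggregator().aggregate(["A", "A", "A", "A", "T"])
site_cell = TextRenderer().render(site_view)
```

Because each role only sees the previous role's output, a new data type needs a new loader, aggregator and renderer but no change to the view logic.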
Near-ready Releases
¤ CoolMap Core
¤ Core interfaces, data structures and utility functions for base matrix, view matrix, ontology trees, renderers, interactive view panels, etc.
¤ CoolMap Application
¤ An application with auxiliary modules such as dynamic multiple dataset synchronization, searcher, filters, sorters, data persistence, etc.
¤ Followed many best practices from Cytoscape
¤ CoolMap Cytoscape Prototype Plugin
¤ A Cytoscape plugin that enables two-way communication between Cytoscape and CoolMap
Our classroom user study of a group of undergraduate students with introductory computer and bioinformatics backgrounds shows:
65% found it easy or not difficult to learn
74% highly enjoyed or enjoyed the software
Screenshot
Case Study 1: Eisen Yeast Data
Eisen (1998)
Gene expression fold change of selected gene groups and experiment conditions
CoolMap makes it easier to interpret data from the higher concept levels
CoolMap
Case Study 1: Eisen Yeast Data (cont’d)
CoolMap reveals more than meets the eye from conventional heatmaps
¤ The peculiar outlier sample spo5 2
¤ Fold change reversed across many pathways
¤ Easier to identify in the aggregated view
Case Study 1: Eisen Yeast Data (cont’d)
Using CoolMap’s multi-view link functions to compare different ontology definitions
Left: GO:0006096 (Glycolysis)
Right: Eisen’s annotated Glycolysis cluster
Integrate existing knowledge with observed data for hypothesis generation
Case Study 2: Diet Induced Differential Gene Expression
¤ Individuals fed saturated fatty acid (SFA) and monounsaturated fatty acid (MUFA) diets show differential gene expression over an 8-week span
¤ The authors picked a list of immune-related genes and showed that these genes were up-regulated
The American Journal of Clinical Nutrition 90, 1656-64 (2009)
CoolMap
Probe level expression profiles can be maintained
Case Study 2: Diet Induced Differential Gene Expression (cont’d)
Grouping by ontology terms and by gender leads to new discoveries: up-regulated gene groups and gender-specific responses (weaker patterns). Total of 25k probes.
Case Study 2: Diet Induced Differential Gene Expression (cont’d)
Up-regulated clusters; Female-specific; Male-specific
Case Study 3: Mother-Child Nutrition Data (Unpublished)
¤ The aggregated group view makes it much easier to interpret at concept level
¤ We can immediately identify that:
¤ BCAA AcylCarnitines (0.45), Long Chain AcylCarnitines (0.34), PPARa methylation (0.52) and ESR methylation (0.32) are highly correlated between mother and child
Burant C. Unpublished data
Case Study 3: Mother-Child Nutrition Data (Unpublished) PPARa: One Level Down
¤ Validation
¤ Boxplot overlay (left) and expanded view (right) show that the high correlation (mean 0.52) is unlikely to be a result of error, outliers or noise
¤ Strong association of PPARa methylation levels in mother and child.
¤ Hypothesis
¤ As PPARa regulates genes involved in cell proliferation, cell differentiation and inflammation responses, the expression profiles of these genes may also be correlated in mother and child.
http://www.ncbi.nlm.nih.gov/gene/5465
Burant C. Unpublished data
Case Study 3: Mother-Child Nutrition Data (Unpublished) BCAA AcylCarnitines
¤ The mother-child correlation is lower (mean 0.45)
¤ The BCAA AcylCarnitines intra-child group has a larger variance than the mother group
¤ While C3 is highly correlated, C4 has low correlation
Case Study 4: DNA Methylation: Missing Values and Ragged Data (Unpublished)
¤ Sparse or ragged matrix
¤ Normalized methylation data: every gene has a different number of methylation sites.
¤ Collapsing by cell line (Caski.1 and Caski.2 cell lines) reveals the aggregated (mean, etc.) normalized methylation value. Expansion by cell line reveals details for each methylation site.
Sartor M. Unpublished data
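A ragged matrix with missing values can still be collapsed if the aggregation function skips the gaps. The sketch below assumes a simple nested-dictionary layout and uses synthetic numbers, not the unpublished methylation data; `geneX`, `geneY` and the `collapse` helper are hypothetical names.

```python
from statistics import mean

# Synthetic ragged data: gene -> cell line -> per-site values.
# Genes have different numbers of sites, and None marks a missing value.
sites = {
    "geneX": {"Caski.1": [0.25, 0.75, None], "Caski.2": [0.625]},
    "geneY": {"Caski.1": [0.875], "Caski.2": [None, None]},
}


def collapse(values, aggregate=mean):
    """Aggregate the values that are present; return None if none are."""
    present = [v for v in values if v is not None]
    return aggregate(present) if present else None


# Collapsed view: one aggregated value per gene and cell line.
collapsed = {
    gene: {line: collapse(values) for line, values in lines.items()}
    for gene, lines in sites.items()
}
```

A cell whose sites are all missing stays empty (`None`) in the collapsed view rather than being drawn as zero, which keeps missing data visually distinct from low methylation.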
Case Study 5: Continuous Glucose Monitoring (CGM)
Display glucose level at:
• a variety of time resolutions: from 5 min to 1 month
• and sample groups: age groups, gender
Link hypoglycemia events to blood sugar changes.
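Viewing the same readings at several time resolutions amounts to re-bucketing them by time window before aggregating. The following is a minimal sketch with synthetic readings; the mg/dL unit, the `aggregate_by_window` helper and the data are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean

# Synthetic readings: (minutes since start, glucose value, assumed mg/dL).
readings = [(0, 100), (5, 110), (10, 120), (60, 70), (65, 60), (125, 90)]


def aggregate_by_window(samples, window_minutes, aggregate=mean):
    """Group samples into fixed windows and aggregate each window."""
    buckets = defaultdict(list)
    for minute, value in samples:
        buckets[minute // window_minutes].append(value)
    return {
        index * window_minutes: aggregate(values)
        for index, values in sorted(buckets.items())
    }


# Hourly view using the mean of each window.
hourly_mean = aggregate_by_window(readings, 60)
# Hourly view using the minimum, so that brief lows are not averaged away.
hourly_min = aggregate_by_window(readings, 60, aggregate=min)
```

The choice of aggregator matters here: a mean smooths out short excursions, while `min` keeps the lowest reading of each window visible at coarse resolutions.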
Case Study 6: Sequence Analysis Example
¤ Interactive consensus sequence exploration: CRP (cAMP receptor protein, also known as Catabolite Activator Protein) binding site, 49 sequences in dozens of promoters | ChIP-seq
¤ Extend CoolMap: Loader, Aggregator, Renderer [Annotator]
Full Sequence View
Sequence Logo
Consensus View
Consensus View with base percentage overlay
Consensus View with GC content overlay
Genome Res. 2004 June; 14(6): 1188-1190
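The consensus, base-percentage and GC-content views can all be derived column by column from the aligned sequences. The sketch below uses four short synthetic sequences rather than the 49 CRP sites, and its consensus is a plain majority vote, which is an assumption and simpler than full IUPAC ambiguity coding; the helper names are hypothetical.

```python
from collections import Counter

# Synthetic aligned sequences of equal length (not the CRP binding sites).
sequences = ["ATGCA", "ATGCT", "ACGCA", "ATGGA"]


def column_summary(column):
    """Return the most frequent base in a column and its fraction."""
    base, count = Counter(column).most_common(1)[0]
    return base, count / len(column)


def gc_content(column):
    """Return the fraction of G or C bases in a column."""
    return sum(1 for base in column if base in "GC") / len(column)


# Transpose the alignment so that each tuple is one alignment column.
columns = list(zip(*sequences))

consensus = "".join(column_summary(column)[0] for column in columns)
percentages = [column_summary(column)[1] for column in columns]
gc = [gc_content(column) for column in columns]
```

The same loop collapses rows instead of columns if the sequences are grouped by promoter, which is how a single consensus letter would stand in for a whole group of sites.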
Case Study 7: Network Analysis
¤ Link Cytoscape with CoolMap:
¤ Network nodes link to CoolMap views by ID, attribute names, etc.
¤ Relate patterns identified in an experiment to curated networks (an alternative to JTreeView); create correlation matrices from Cytoscape numeric attributes
¤ Use pathways and ontologies to view sub-network to sub-network connectivity
¤ Cluster the network based on attributes, and compare unsupervised clustering vs. annotated pathways and ontologies.
Need two monitors!
Case Study 7: Network Analysis (cont’d)
Top Left: MAPK pathway in the ‘galFiltered.cys’ network from Cytoscape
Bottom Left: Part of the same network shown as an adjacency matrix arranged by pathways, with sum as the aggregator. Each cell shows the number of edges within each pathway, as well as the number of inter-pathway edges. A good ‘community’ clustering will have most of the green dots along the diagonal
Right: The same view with the MAPK pathway expanded, showing dense intra-cluster connectivity
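Collapsing an adjacency matrix by pathway with sum as the aggregator is equivalent to counting edges per pathway pair. The sketch below assumes an undirected toy network in which every node belongs to exactly one pathway; the node names, pathway labels and `pathway_edge_counts` helper are hypothetical, and each edge is counted once rather than twice as it would be in a symmetric matrix.

```python
# Hypothetical undirected edges and single-pathway membership per node.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
pathway_of = {"a": "P1", "b": "P1", "c": "P1", "d": "P2", "e": "P2"}


def pathway_edge_counts(edges, pathway_of):
    """Count edges for each unordered pair of pathways."""
    counts = {}
    for u, v in edges:
        key = tuple(sorted((pathway_of[u], pathway_of[v])))
        counts[key] = counts.get(key, 0) + 1
    return counts


counts = pathway_edge_counts(edges, pathway_of)

# Share of edges that fall on the diagonal (both ends in the same pathway).
intra = sum(n for (p, q), n in counts.items() if p == q)
diagonal_share = intra / len(edges)
```

A high `diagonal_share` corresponds to the slide's observation that a good community clustering puts most edges along the diagonal.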
Case Study 7: Network Analysis (cont’d)
Left: A correlation matrix can be created from the gal expression profiles, and pathways can then be used to arrange it into a condensed concept correlation view. Hierarchical clustering can be run at the concept level.
Right: The selected region contains nodes that are annotated with the KEGG pathway ‘Cell cycle’ and are close to each other in the network
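The condensed concept correlation view can be approximated by averaging gene-to-gene correlations over each pair of pathways. This sketch uses synthetic profiles and hypothetical gene and pathway names; using the mean of Pearson correlations as the aggregator is an assumption, and other aggregators would fit the same structure.

```python
from math import sqrt
from statistics import mean

# Synthetic expression profiles and pathway membership (illustrative only).
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.0, 2.0],
}
pathways = {"P1": ["g1", "g2"], "P2": ["g3", "g4"]}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def concept_correlation(p, q):
    """Mean correlation over all distinct gene pairs from two pathways."""
    return mean(
        pearson(profiles[a], profiles[b])
        for a in pathways[p]
        for b in pathways[q]
        if a != b  # skip each gene's trivial correlation with itself
    )


within_p1 = concept_correlation("P1", "P1")
between = concept_correlation("P1", "P2")
```

In this toy data the genes within a pathway move together and the two pathways move in opposite directions, so the condensed view shows a strongly positive diagonal cell and a strongly negative off-diagonal cell.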
Acknowledgement
Thank you!
Primary Advisor
Dr Fan Meng
Committee Mentors
Dr Brian D. Athey (Co-chair)
Dr Charles F. Burant and his lab
Dr Barbara Mirel
Dr Maureen Sartor
Testers
Usability testers and software testers, fellow Bioinformatics brethren.
Development
Please contact me if you are interested in development or testing: