Introduction to Biological Network Analysis and Visualization with Cytoscape
Keiichiro Ono, Cytoscape Core Developer Team
UC San Diego, Trey Ideker Lab / National Resource for Network Biology
5/10/2016 The Scripps Research Institute
Lecture 1: Basics
Keiichiro Ono
Cytoscape Core Developer since 2005 @UCSD Trey Ideker Lab
Area of Interest: Biological Data Integration & Visualization
Agenda
• Lecture 1 (Today): Introduction to Biological Network Analysis and Visualization
  • What are the benefits of biological network analysis and visualization?
  • Introduction to Cytoscape
  • Preview of Lecture 2: cyREST
• Lecture 2: Reproducible Analysis & Visualization
  • Introduction to Jupyter Notebook
  • Create reproducible network visualization workflows with Python
All documents, data, and code are available here:
https://github.com/idekerlab/tsri-lecture
Why Network Analysis?
Networks?
[Figure: example network of human genes (e.g., EP300, PPARG, HTT, PRNP, SMARCA4)]
Protein-Protein Interactions
Human-Curated Pathways
KEGG Pathway (TCA Cycle) visualized by Cytoscape KGMLReader
Interactomes
Human Interactome data from BioGRID visualized with Cytoscape
Social Networks
Network extracted from Panama Papers data set
Node 1 - Edge Type - Node 2
Protein 1 - Y2H - Protein 2
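The Node 1 - Edge Type - Node 2 pattern is essentially how Cytoscape's SIF (Simple Interaction Format) lists interactions: a source, an interaction type, then one or more targets per line. As a minimal sketch, assuming tab- or whitespace-separated lines and hypothetical proteins P1 to P4, such lines can be read into edge triples and an adjacency map with plain Python (an illustration, not Cytoscape's own reader):

```python
from collections import defaultdict


def parse_sif(text):
    """Parse SIF-style lines into (source, interaction, target) triples.

    Each line is: source, interaction type, one or more targets. Fields are
    tab-separated if the line contains a tab, otherwise whitespace-separated.
    Lines with fewer than three fields (blank lines, isolated nodes) are
    skipped in this sketch.
    """
    edges = []
    for line in text.splitlines():
        fields = line.split("\t") if "\t" in line else line.split()
        fields = [field.strip() for field in fields if field.strip()]
        if len(fields) < 3:
            continue
        source, interaction, *targets = fields
        for target in targets:
            edges.append((source, interaction, target))
    return edges


def adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighboring nodes."""
    adj = defaultdict(set)
    for source, _interaction, target in edges:
        adj[source].add(target)
        adj[target].add(source)
    return dict(adj)


# Hypothetical proteins P1..P4; the interaction type names a detection method.
example = "P1\tY2H\tP2\nP1\tY2H\tP3\nP2\tAP-MS\tP3\tP4\n"
edges = parse_sif(example)
print(edges)
print(sorted(adjacency(edges)["P2"]))
```

Keeping the interaction type on every triple means edges can later be styled by detection method (for example, Y2H vs. AP-MS).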
Networks vs Pathways
- Networks
  - Collection of binary interactions
  - Large scale
  - Generated from omics data
- Pathways
  - Human-curated / detailed
  - Small scale
  - Constructed from literature
Networks / Pathways = Graphs
Benefits of Network Analysis
- You have a list of N genes from your screen
- Now you want to know:
  - Relationships among those genes
  - Their functions
  - etc.
Screening 1
PPARG, TCF7L2, RETN, IRS1, HNF1A, HNF4A, KCNJ11, GCK, LIPC, PTPN1, ABCC8, ENPP1, HNF1B
[Figure: interaction network built around the screening gene list; nodes are labeled with a mix of gene symbols and database identifiers (ENSG…, EBI-…, CHEBI:…)]
Benefits of Network Analysis
- You can see the relationships among a group of biological entities
- Find drug targets
- Find overrepresented functions AND their connections
Gene List to Network to Biological Insight
- How?
  - You need to search, integrate, and visualize multiple data sources
  - This is what you will learn in this lecture
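A first step of "search and integrate" is often just filtering a large list of pairwise interactions down to the ones that involve your genes. The helper `query_network` below is hypothetical, and the gene names G1 to G6 are made up; in practice the pairs would be exported from a public interaction database.

```python
def query_network(genes, interactions, include_neighbors=False):
    """Return the interactions relevant to a gene list.

    genes: iterable of gene identifiers from a screen.
    interactions: iterable of (gene_a, gene_b) pairs.
    include_neighbors: if False, keep only edges with both ends in the list;
        if True, also keep edges with exactly one end in the list.
    """
    query = set(genes)
    result = []
    for a, b in interactions:
        hits = (a in query) + (b in query)  # how many endpoints are queried genes
        if hits == 2 or (include_neighbors and hits == 1):
            result.append((a, b))
    return result


# Hypothetical interaction pairs and screening hits.
interactions = [("G1", "G2"), ("G2", "G5"), ("G3", "G4"), ("G5", "G6")]
screen_hits = ["G1", "G2", "G3"]

print(query_network(screen_hits, interactions))
print(query_network(screen_hits, interactions, include_neighbors=True))
```

With `include_neighbors=True` the result also pulls in direct interactors that were not on the original list, which is one way a short gene list grows into a larger network.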
What is Cytoscape?
An Open Source Platform for Biological Network Data Integration, Analysis and Visualization
Cytoscape
Cytoscape 3.4.0 (Latest Release)
Cytoscape
- Open source (LGPL)
- Free for both commercial and academic use
- Developed and maintained by universities, companies, and research institutions:
  - UC San Diego
  - University of Toronto
  - UC San Francisco
  - ISB
  - And collaborators worldwide
Cytoscape
- De facto standard software in the biological network research community
- Large user and developer community
- Expandable by apps
  - This is why Cytoscape is a platform, not a simple desktop application
C. elegans Interactome from the BioGRID Database
Do biological networks tell us anything by themselves?
- Often not: a large network is just a big hairball…
[Figure: Module 1 and Module 2 highlighted within the network]
In other words…
We need a tool to extract meaningful biological modules
Basic Use Case
- Networks
- Public interaction databases
- List of genes
- Other data
Network Data Analysis
- Analysis
  - Graph analysis: NetworkX, igraph, Cytoscape (with 3rd-party apps such as NetworkAnalyzer)
  - Python: IPython, Pandas, NumPy, SciPy
  - Excel
- Visualization
  - Desktop: Gephi, Cytoscape, matplotlib
  - Web: Cytoscape.js, sigma.js, d3, NVD3, d3.chart, Google Charts
- Data Storage
  - Graph: Neo4j, GraphX
  - Document: MongoDB
  - Relational: MySQL
Three Basic Steps for Data Visualization with Cytoscape
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
  <!-- Created by igraph -->
  <key id="degree" for="node" attr.name="degree" attr.type="double"/>
  <key id="betweenness" for="node" attr.name="betweenness" attr.type="double"/>
  <graph id="G" edgedefault="directed">
    <node id="n0"><data key="degree">79</data><data key="betweenness">0</data></node>
    <node id="n1"><data key="degree">9</data><data key="betweenness">167</data></node>
    <node id="n2"><data key="degree">18</data><data key="betweenness">75</data></node>
    <node id="n3"><data key="degree">8</data><data key="betweenness">12</data></node>
    <node id="n4"><data key="degree">26</data><data key="betweenness">210</data></node>
    <node id="n5"><data key="degree">29</data><data key="betweenness">320</data></node>
    <!-- ... -->
  </graph>
</graphml>
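Files like this GraphML export can also be inspected outside Cytoscape. A minimal sketch using only Python's standard `xml.etree.ElementTree`, assuming node attributes are declared with `<key>` elements as in the snippet above: it reads per-node values into a dictionary and writes them as a tab-delimited node table (the function names are illustrative, and the sample keeps only two nodes).

```python
import xml.etree.ElementTree as ET

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def read_node_attributes(graphml_text):
    """Return {node_id: {attr_name: value}} from a GraphML string.

    <key> declarations translate key ids into attribute names; values whose
    attr.type is "double" or "float" are converted to Python floats.
    """
    root = ET.fromstring(graphml_text)
    keys = {}
    for key in root.findall("g:key", NS):
        if key.get("for") in ("node", "all"):
            keys[key.get("id")] = (key.get("attr.name"), key.get("attr.type"))
    nodes = {}
    for node in root.findall("g:graph/g:node", NS):
        attrs = {}
        for data in node.findall("g:data", NS):
            name, typ = keys.get(data.get("key"), (data.get("key"), "string"))
            text = data.text or ""
            attrs[name] = float(text) if typ in ("double", "float") else text
        nodes[node.get("id")] = attrs
    return nodes


def to_node_table(nodes):
    """Format node attributes as tab-delimited text with a 'name' key column."""
    columns = sorted({name for attrs in nodes.values() for name in attrs})
    lines = ["\t".join(["name"] + columns)]
    for node_id, attrs in nodes.items():
        lines.append("\t".join([node_id] + [str(attrs.get(c, "")) for c in columns]))
    return "\n".join(lines)


SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="degree" for="node" attr.name="degree" attr.type="double"/>
  <key id="betweenness" for="node" attr.name="betweenness" attr.type="double"/>
  <graph id="G" edgedefault="directed">
    <node id="n0"><data key="degree">79</data><data key="betweenness">0</data></node>
    <node id="n1"><data key="degree">9</data><data key="betweenness">167</data></node>
  </graph>
</graphml>"""

print(to_node_table(read_node_attributes(SAMPLE)))
```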
Data Integration
Analysis
Visualization
Drawing Biological Networks vs. Drawing Tools
- With drawing tools, you need to specify the color of each node, the width of each edge, the shape of each node, etc.
There is one huge difference between Cytoscape and Illustrator…
In Cytoscape, Your Data Controls the View
Creating Visualizations in Cytoscape
Name Type
BRCA1 gene
MAP2K1 gene
C05981 compound
• Mapping from Type to Node Shape
• Mapping from Type to Node Color
[Figure: nodes C05981, BRCA1, and MAP2K1 drawn using these mappings]
Creating mappings from data points to Visual Properties
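Conceptually, a discrete mapping is a lookup table from attribute values to visual property values, with a default for anything not listed. The sketch below assumes the Name/Type table above; the shape names mimic Cytoscape's node shape names, but the specific shapes and colors, and the helper `apply_discrete_style`, are hypothetical choices for illustration.

```python
# Node table from the slide: one row per node, keyed by "name".
NODE_TABLE = [
    {"name": "BRCA1", "type": "gene"},
    {"name": "MAP2K1", "type": "gene"},
    {"name": "C05981", "type": "compound"},
]

# Hypothetical style: defaults plus discrete mappings from "type".
DEFAULTS = {"shape": "ELLIPSE", "color": "#CCCCCC"}
DISCRETE = {
    "shape": {"gene": "ROUND_RECTANGLE", "compound": "DIAMOND"},
    "color": {"gene": "#1F77B4", "compound": "#FF7F0E"},
}


def apply_discrete_style(rows, column, mappings, defaults):
    """Compute visual properties for each row from one data column.

    Each visual property starts from its default and is overridden only
    when the row's value appears in that property's mapping table.
    """
    styled = {}
    for row in rows:
        value = row.get(column)
        props = dict(defaults)
        for prop, table in mappings.items():
            if value in table:
                props[prop] = table[value]
        styled[row["name"]] = props
    return styled


for name, props in apply_discrete_style(NODE_TABLE, "type", DISCRETE, DEFAULTS).items():
    print(name, props)
```

Because the style is derived from the table, changing a node's `type` value changes its appearance without touching the drawing.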
Network Data
Annotated Networks
Attributes
Analyzed Data
Apps
Cytoscape Apps
- Extension programs that add new features to Cytoscape
- Formerly called plugins
- Large app developer/user community
  - This is why Cytoscape is so successful in the life science community!
Example Apps
ClueGO Creates and visualizes a functionally grouped network of terms/pathways
ReactomeFIPlugIn Explore Reactome pathways and search for diseases related pathways and network patterns using the Reactome functional interaction network
KEGGScape KEGG Pathway Importer for Cytoscape
clusterMaker2 Multi-algorithm clustering app for Cytoscape
cyREST (Now Part of The Core!) RESTful API for Cytoscape
APPS.CYTOSCAPE.ORG
Overview of App Ecosystem
A travel guide to Cytoscape plugins Rintaro Saito, Michael E Smoot, Keiichiro Ono, Johannes Ruscheinski, Peng-Liang Wang, Samad Lotia, Alexander R Pico, Gary D Bader, Trey Ideker (2012) Nature Methods 9 (11) p. 1069-1076
Tips for Learning New Tools
Choose the Right Tool
- Data preparation, analysis, visualization
Data Visualization Tools
http://selection.datavisualization.ch/
Tools
• In some cases, you can get exactly the same result using different tools
  • Example: data preparation (cleansing)
• But if you choose the right tool, you can do it 100x faster than with others
  • Example: reformatting complex data sets in Excel vs. with a Python script
• Some recommendations:
  • R/Bioconductor, Python/Pandas, Git/GitHub/Gist
Learning Tools = Saving Your Time
Hands-on:
Introduction to Network Visualization with Cytoscape
Data Visualization
- Goal: help others understand your data
- Emphasize what you want to say
- Use the color, shape, and size of objects effectively!
- Excellent resource for data visualization:
  - Tamara Munzner’s website: http://www.cs.ubc.ca/~tmm/
Today’s Goal
Story:
I want to show gene expression changes over time as a network diagram
[Figure: yeast network with nodes labeled by systematic ORF names (e.g., YPL201C, YML007W)]
What Is a Great Visualization…?
Design is complicated, because humans are complicated. Design is a process to avoid bad designs.
Mike Bostock (New York Times visualization team; creator of D3.js)
It is hard to generalize the design process, but we can avoid pitfalls by following some basic rules.
Every pixel should carry information.
Edward Tufte
Avoid Data Overload
• Mapping too many attributes makes your visualization awful!
• It is hard to see the overall trend of your data sets if too many channels are used in one image
“Great Artists Steal…”
[Figure: example yeast network (nodes such as GAL4, GAL80, GCN4, HAP1, PHO4)]
- Map gene expression values to color
- Avoid using more colors in other components (edges/labels)
- If necessary, map other data to non-overlapping visual properties (e.g., edge score to edge width)
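Mapping an edge score to width is a continuous mapping: scores are rescaled linearly into a width range. A minimal sketch, assuming scores are plain numbers keyed by (source, target) pairs with made-up node names, and a hypothetical width range of 1 to 8; values outside the data range are clamped.

```python
def linear_map(value, data_min, data_max, out_min, out_max):
    """Linearly map value from [data_min, data_max] to [out_min, out_max], clamped."""
    if data_max == data_min:
        return out_min  # all scores equal: nothing to spread out
    t = (value - data_min) / (data_max - data_min)
    t = max(0.0, min(1.0, t))
    return out_min + t * (out_max - out_min)


def edge_widths(scores, min_width=1.0, max_width=8.0):
    """Map each edge's score onto a width between min_width and max_width."""
    lo, hi = min(scores.values()), max(scores.values())
    return {edge: linear_map(s, lo, hi, min_width, max_width) for edge, s in scores.items()}


# Hypothetical confidence scores for three edges.
scores = {("A", "B"): 0.0, ("B", "C"): 0.5, ("A", "C"): 1.0}
print(edge_widths(scores))
```

Width carries the score on its own channel, so node color stays free for the expression values.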
Part 1: Session File and Basic Navigation
Cytoscape 3.4 Desktop
- Toolbar
- Network Panel
- Bird’s Eye View
- Table Browser
- Network Views
- Local Column
- Table Tabs
- List Data (values in [ ])
- Shared Column
Session File
- Snapshot of your workspace:
  - Networks
  - Tables
  - Visual Styles
  - System Properties
Open a Session
- Click the folder icon
- Or, File → Open
Exercise 1: Loading a session
Navigation
- Pan: drag
- Zoom
  - In: mouse wheel up
  - Out: mouse wheel down
- Selection: Shift + drag
- Fit to window
  - Selected region
  - Entire network
First Neighbors of Nodes
- Ctrl+6
Create New Subnetwork from Selection
- Ctrl+N
- Grid View
- Detached View
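The two selection operations above can be imitated on a plain edge list: expand a selection by its first neighbors, then keep every edge whose two endpoints are both selected (an induced subnetwork). This is a hypothetical sketch of the idea, not how Cytoscape implements these commands; node names A to E are made up.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency map: node -> set of neighbors."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def expand_first_neighbors(adj, selected):
    """Return the selected nodes plus every node directly connected to them."""
    result = set(selected)
    for node in selected:
        result |= adj.get(node, set())
    return result


def induced_subnetwork(edges, nodes):
    """Keep only the edges whose endpoints are both in `nodes`."""
    return [(a, b) for a, b in edges if a in nodes and b in nodes]


# Hypothetical chain network A-B-C-D-E.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
selection = expand_first_neighbors(build_adjacency(edges), {"B"})
print(sorted(selection))
print(induced_subnetwork(edges, selection))
```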
Part 2: Data Import
Network Data Formats
- SIF
- GML
- XGMML
- GraphML
- BioPAX
- PSI-MI
- SBML
- KGML (KEGG)
- Excel
- Text tables (CSV, tab-delimited)
Example annotations for BRCA1:
- NCBI Gene ID: 672
- Chromosome: 17
- GO terms: DNA repair, cell cycle, DNA binding
- Ensembl ID: ENSG00000012048
Data Tables for Cytoscape
- Examples:
  - Numeric
    - Gene expression profiles
    - Network statistics calculated in other applications, such as R
    - Confidence scores for edges
  - Text (or categorical)
    - GO annotations for genes
    - List of genes related to disease X
    - Targets of FDA-approved drugs
    - Genes on KEGG pathway Y
    - Clusters / groups / communities calculated in external programs
    - …
Your Data Sets
- Anything saved as a table can be loaded into Cytoscape:
  - Excel
  - Tab-delimited text
  - CSV
- As long as a proper mapping key is available, Cytoscape can map the data onto your networks.
Mapping Key in the Network
Mapping Key in the Table
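Mapping a table onto a network amounts to a left join on the key column: each node keeps its row, and data columns are attached where the key matches. A minimal sketch with Python's `csv` module, assuming tab-delimited text, a node table keyed by `name`, and a hypothetical data table keyed by `gene` (the expression values are made up):

```python
import csv
import io


def load_table(text, delimiter="\t"):
    """Read delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def merge_by_key(node_rows, data_rows, node_key, data_key):
    """Left join: attach data columns to nodes whose node_key matches data_key."""
    index = {row[data_key]: row for row in data_rows}
    merged = []
    for row in node_rows:
        new_row = dict(row)
        for column, value in index.get(row[node_key], {}).items():
            if column != data_key:  # do not duplicate the key column
                new_row[column] = value
        merged.append(new_row)
    return merged


nodes = load_table("name\nBRCA1\nMAP2K1\nC05981\n")
data = load_table("gene\texpression\nBRCA1\t1.5\nMAP2K1\t-0.8\n")
for row in merge_by_key(nodes, data, "name", "gene"):
    print(row)
```

Nodes without a matching key (here the compound C05981) simply get no value, which mirrors why the mapping key must use the same identifier type in both tables.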
Exercise 2: Loading network and tables
Part 3: Visualization
Layouts
Force-Directed + Edge Bundling
Stacked-Node Layout + Default Edge Bend
Circular + Edge Bend
Automatic Layout
- Choose a proper algorithm:
  - Tree-like data: hierarchical layout
  - Scale-free network: force-directed layout
  - Circular process: circular layout
- Tweak parameters if necessary
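To make the idea of a layout algorithm concrete, here is the simplest possible circular layout: nodes are placed at equal angles on a circle in the order given. This is a hypothetical sketch, not Cytoscape's own circular layout implementation; `radius` and `center` are illustrative parameters.

```python
import math


def circular_layout(nodes, radius=100.0, center=(0.0, 0.0)):
    """Place nodes evenly on a circle, in the given order, starting at angle 0."""
    n = len(nodes)
    cx, cy = center
    positions = {}
    for i, node in enumerate(nodes):
        angle = 2 * math.pi * i / n
        positions[node] = (cx + radius * math.cos(angle), cy + radius * math.sin(angle))
    return positions


for node, (x, y) in circular_layout(["A", "B", "C", "D"], radius=1.0).items():
    print(node, round(x, 3), round(y, 3))
```

The node order decides which nodes end up next to each other, so ordering nodes along a process (for example, steps of a cycle) is what makes this layout readable for circular processes.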
Manual Layout
- Tweak the result of an automatic layout:
  - Scale
  - Align
  - Rotate
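Scaling and rotating a layout are simple coordinate transforms around the layout's center, and aligning sets one coordinate to a common value. A minimal sketch on a dictionary of (x, y) positions; the function names are hypothetical, and angles are counterclockwise with y pointing up (on a screen where y points down, a positive angle appears clockwise).

```python
import math


def centroid(positions):
    """Mean x and y of all node positions."""
    xs = [x for x, _ in positions.values()]
    ys = [y for _, y in positions.values()]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def scale(positions, factor):
    """Spread nodes out (factor > 1) or pull them in (factor < 1) around the centroid."""
    cx, cy = centroid(positions)
    return {n: (cx + (x - cx) * factor, cy + (y - cy) * factor) for n, (x, y) in positions.items()}


def rotate(positions, degrees):
    """Rotate all nodes around the centroid by the given angle."""
    cx, cy = centroid(positions)
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return {
        n: (cx + (x - cx) * c - (y - cy) * s, cy + (x - cx) * s + (y - cy) * c)
        for n, (x, y) in positions.items()
    }


def align_left(positions):
    """Move every node to the smallest x coordinate, keeping y."""
    min_x = min(x for x, _ in positions.values())
    return {n: (min_x, y) for n, (_, y) in positions.items()}


layout = {"A": (0.0, 0.0), "B": (2.0, 0.0)}
print(scale(layout, 2.0))
print(rotate(layout, 90))
```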
Exercise 3: Apply layouts
Visual Style
- Collection of mappings from Attributes to Visual Properties
Visual Styles
- Defaults + mappings, e.g.:
  - Expression values to node color
  - Gene function to node shape
  - Interaction detection method to edge line type
  - Confidence score to edge width
Core Idea: Data Controls The View
• Photoshop / Illustrator
  • You control the pixels and objects on the display
• Data visualization tools (including Cytoscape)
  • Data points are mapped to visual properties
    • Color
    • Size
Expression Values To Node Colors
Discrete Mapping Editor
Continuous Mapping Editor
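A continuous mapping for expression values typically interpolates between colors at chosen data points. The sketch below assumes a hypothetical diverging gradient: blue at -2, white at 0, and red at +2, with values outside that range clamped. Cytoscape's Continuous Mapping Editor lets you choose such points interactively; this only illustrates the arithmetic, and the helper names are illustrative.

```python
def hex_to_rgb(color):
    """'#RRGGBB' -> (r, g, b) integers."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """(r, g, b) integers -> '#RRGGBB'."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


def interpolate(c1, c2, t):
    """Blend two hex colors; t = 0 gives c1, t = 1 gives c2."""
    a, b = hex_to_rgb(c1), hex_to_rgb(c2)
    return rgb_to_hex(tuple(round(x + (y - x) * t) for x, y in zip(a, b)))


def expression_to_color(value, vmin=-2.0, vmax=2.0, low="#0000FF", mid="#FFFFFF", high="#FF0000"):
    """Diverging mapping: vmin -> low, 0 -> mid, vmax -> high (requires vmin < 0 < vmax)."""
    value = max(vmin, min(vmax, value))  # clamp out-of-range values
    if value < 0:
        return interpolate(mid, low, value / vmin)
    return interpolate(mid, high, value / vmax)


for v in (-3.0, -1.0, 0.0, 1.0, 2.0):
    print(v, expression_to_color(v))
```

Anchoring white at 0 keeps unchanged genes visually quiet, so only up- and down-regulated genes stand out.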
Exercise 4: Create New Visual Style
Preview of Lecture 2: Reproducible Workflow with Jupyter Notebook and cyREST
Cline, Melissa S., et al. "Integration of biological networks and gene expression data using Cytoscape." Nature Protocols 2.10 (2007): 2366-2382.
Results
Sharing Results
😐
Sharing Results and Process
😃
Point & Click Operation is Easy, but not Reproducible…
Goal: Reproducible Science
Tools You Need
- GitHub: for source code sharing
- IPython (Jupyter) Notebook: your electronic lab notebook
- cyREST: RESTful API module for Cytoscape
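As a preview of cyREST, the sketch below queries Cytoscape from Python using only the standard library. It assumes Cytoscape is running locally with cyREST on its default port 1234 and that the `/v1/networks` endpoint returns the SUIDs of loaded networks; check the API documentation of your installed version. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: cyREST's default local address and API version prefix.
BASE_URL = "http://localhost:1234/v1"


def endpoint(*parts):
    """Build a cyREST URL, e.g. endpoint("networks") -> BASE_URL + "/networks"."""
    return "/".join([BASE_URL] + [str(p) for p in parts])


def get_json(url, timeout=5.0):
    """GET a URL and decode the JSON body; return None if it cannot be reached or parsed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):  # URLError and timeouts are OSError subclasses
        return None


def list_network_suids():
    """SUIDs of the networks currently loaded in Cytoscape, or None if it is not running."""
    return get_json(endpoint("networks"))


print(endpoint("networks"))
```

Calling `list_network_suids()` from a notebook while Cytoscape is open is the kind of scripted step that replaces point-and-click operations in a reproducible workflow.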
Further Readings
• My presentation slides
• http://www.slideshare.net/keiono
• This deck will be uploaded today
Further Readings 1: Introduction to Network Biology
- Shoemaker BA, Panchenko AR (2007) Deciphering Protein–Protein Interactions. Part I. Experimental Techniques and Databases. PLoS Comput Biol 3(3): e42. doi:10.1371/journal.pcbi.0030042
- Shoemaker BA, Panchenko AR (2007) Deciphering Protein–Protein Interactions. Part II. Computational Methods to Predict Protein and Domain Interaction Partners. PLoS Comput Biol 3(4): e43. doi:10.1371/journal.pcbi.0030043
Further Readings 2: Overview of Cytoscape Apps (Plugins)
- Saito R, Smoot ME, Ono K, Ruscheinski J, Wang PL, Lotia S, Pico AR, Bader GD, Ideker T (2012) A travel guide to Cytoscape plugins. Nature Methods 9(11): 1069-1076.
- Sample protocol (based on Cytoscape 2.x):
  - Cline MS, et al. (2007) Integration of biological networks and gene expression data using Cytoscape. Nature Protocols 2(10): 2366-2382.
Further Readings 3
- Cytoscape Tutorial Booklet: Analysis and Visualization of Biological Networks with Cytoscape
- http://www.rbvi.ucsf.edu/Outreach/Workshops/ISMBTutorial.pdf
2016 Keiichiro Ono