Computational Biology
Networks and Pathways
Lecture Slides Week 11
Data is Interconnected
What is a Graph
Complexity
A network is a collection of interactions
Pathways are a subset of networks
All pathways are networks of interactions
Not all networks are pathways
Young et al.: Transcriptional Regulatory Networks in Saccharomyces cerevisiae; Science 2002
A network is a collection of interactions
Pathways are a subset of networks
All pathways are networks of interactions; however, not all networks are pathways!
A pathway is a biological network that corresponds to a specific physiological process or phenotype
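The subset relation between pathways and networks can be made concrete with a small sketch. It assumes a network is stored as a set of unordered interaction pairs; the entity names A to F are hypothetical placeholders, not curated data.

```python
# Toy interaction network: each interaction is an unordered pair of entities.
# All entity names are hypothetical placeholders.
network = {
    frozenset(("A", "B")),
    frozenset(("B", "C")),
    frozenset(("C", "D")),
    frozenset(("A", "E")),
    frozenset(("E", "F")),
}

# A pathway is modeled here as the interactions tied to one specific process.
pathway = {
    frozenset(("A", "B")),
    frozenset(("B", "C")),
    frozenset(("C", "D")),
}


def entities(interactions):
    """Return every entity that takes part in at least one interaction."""
    return set().union(*interactions) if interactions else set()


# Every pathway interaction is also a network interaction ...
print(pathway <= network)
# ... but the network contains interactions that are outside the pathway.
print(network <= pathway)
# Entities that appear in the network only.
print(sorted(entities(network) - entities(pathway)))
```

Storing interactions as sets keeps the "pathway is a subset of a network" statement a one-line check; a real resource would attach evidence and direction to each interaction.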
Biological pathways
Biological components interacting with each other over time to bring about a single biological effect
Pathways can be broken down into sub-pathways
Some common pathways: signal transduction, metabolic pathways, gene regulatory pathways
Entities in one pathway can be found in others
3 types of interactions that can be mapped into pathways
protein (enzyme) – metabolite (ligand): metabolic pathways
protein – protein: cell signaling pathways, protein complexes
protein – gene: genetic networks
KEGG http://www.genome.jp/kegg/
BioCyc http://www.biocyc.org/
Reactome http://www.reactome.org/
GenMAPP http://www.genmapp.org/
BioCarta http://www.biocarta.com/
TransPATH http://www.biobase-international.com/pages/index.php?id=transpathdatabases
Pathguide – the pathway resource list http://www.pathguide.org/
Available resources
Network Topology (PPI)
Network analysis and visualization tools
Databases for analysis
Text mining algorithms (e.g., natural language processing (NLP)) technologies
Expert human curation
Ingenuity Pathway Analysis http://www.ingenuity.com/products/pathways_analysis.html
PathwayStudio http://www.ariadnegenomics.com/products/pathway-studio/
PathwayArchitect http://www.selectscience.net
Cytoscape http://www.cytoscape.org/
Biological Networks http://biologicalnetworks.net/
GeneGO http://www.genego.com/
Nanduri et al. (unpublished)
GO term enrichment
Nanduri et al. (unpublished)
End Theory I
5 min mindmapping
10 min break
Practice I
Cytoscape
Download and install Cytoscape
Add the Reactome app
Initialize the Reactome app
Inspect some metabolic pathways
End Practice I
15 min break
Theory II
Pathways vs. networks
Gene networks
• Clusters of genes (or gene products) with evidence of co-expression
• Connections usually represent degrees of co-expression
• In-depth knowledge of process is not necessary
• Networks are non-predictive
Biochemical pathways
• Series of chained chemical reactions
• Connections represent describable (and quantifiable) relations between molecules, proteins, lipids, etc.
• Enzymatic process is elucidated
• Changes via perturbation are predictable downstream
Pathways vs. networks
                 | Gene networks                                               | Biochemical pathways
Curation         | Relatively easy: automated and manual                       | Difficult: mostly manual
Nodes            | Genes or gene products                                      | Any general molecule
Edges            | Levels of co-expression/influence or a qualitative relation | Representation of possibly quantifiable mechanisms between compounds
Fidelity         | Low – usually very little detail                            | High – specific processes
Predictive power | Relatively low                                              | Relatively high
Pathway and network granularity
[Figure: level of detail (axis) vs. effort to curate (axis). Labeled items: general interaction networks, qualitative networks, probabilistic networks, curated reaction pathways, mathematical simulation models]
Introduction to pathways and networks
Examples of pathways and networks
Review of pathway databases and tools
Representing pathways and networks
Methods of inferring pathways and networks
Pathway and cellular simulations
Yeast gene interaction network
Tong, et al., Science 303, 808 (2004)
Characteristics of the yeast gene network
Some genes (e.g. regulatory factors) act as ‘hubs’ in a network and have many interactions
Degree of connectivity follows a power law
Hubs may make interesting anti-cancer targets
Clusters of genes with known function suggest function for hypothetical genes in same cluster
Network characteristics can be used to predict protein-protein interactions
Path between two genes tends to be short (average ~3.3 hops)
Tong, et al., Science 303, 808 (2004)
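The network characteristics listed above (hubs, connectivity, short paths) can be computed on any edge list. The sketch below is an illustration only: the gene names g1 to g7 and their edges are made up, and the hub cutoff of 3 interactions is an arbitrary assumption, not a value from Tong et al.

```python
from collections import deque

# Hypothetical undirected gene interaction edges (placeholder gene names).
edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g1", "g5"),
         ("g5", "g6"), ("g6", "g7")]


def build_adjacency(edge_list):
    """Map each gene to the set of genes it interacts with."""
    adj = {}
    for a, b in edge_list:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def hubs(adj, min_degree):
    """Genes with at least min_degree interactions."""
    return sorted(n for n, nbrs in adj.items() if len(nbrs) >= min_degree)


def bfs_distances(adj, source):
    """Hop counts from source to every reachable gene (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def average_path_length(adj):
    """Mean hop count over all ordered pairs of connected, distinct genes."""
    total, pairs = 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


adjacency = build_adjacency(edges)
degree = {gene: len(nbrs) for gene, nbrs in adjacency.items()}
print(degree)
print(hubs(adjacency, min_degree=3))
print(average_path_length(adjacency))
```

On a real interaction network, a histogram of the `degree` values plotted on log-log axes is the usual first check for power-law behaviour.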
E. coli metabolic pathway
Karp, et al., Science 293, 2040 (2001)
glycolysis
Pathways: E. coli metabolic map
Encompasses >791 chemical compounds in >744 noted biochemical reactions
Pathway was compiled via literature information extraction and extensive manual curation
System allows users to indicate evidence for pathway annotations
Curation is done collaboratively with numerous experts outside of EcoCyc
Karp, et al., Science 293, 2040 (2001)
Pathways in bioinformatics
Most resources for pathways focus on metabolic pathways (signaling and regulatory gaining prominence)
Pathways as a very specific subtype of networks
Like networks, can be made in computable (symbolic) form
Specificities in chemical reactions are more predictive
Pathways can chain together, forming larger pathways
Karp, et al., Science 293, 2040 (2001)
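One possible reading of "computable (symbolic) form" and of pathways chaining together is sketched below. The `Reaction` and `Pathway` classes and the `chain` helper are hypothetical names invented for this illustration, not the data model of EcoCyc or any other database, and each pathway is lumped into a single step for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    name: str
    substrates: frozenset
    products: frozenset


@dataclass(frozen=True)
class Pathway:
    name: str
    reactions: tuple

    def _consumed(self):
        return set().union(*(r.substrates for r in self.reactions))

    def _produced(self):
        return set().union(*(r.products for r in self.reactions))

    def inputs(self):
        """Compounds consumed but never produced inside the pathway."""
        return self._consumed() - self._produced()

    def outputs(self):
        """Compounds produced but never consumed inside the pathway."""
        return self._produced() - self._consumed()


def chain(first, second):
    """Join two pathways when an output of the first is an input of the second."""
    if not (first.outputs() & second.inputs()):
        raise ValueError("pathways share no connecting compound")
    return Pathway(f"{first.name}+{second.name}",
                   first.reactions + second.reactions)


# Heavily simplified, lumped steps used only as an illustration.
glycolysis = Pathway("glycolysis", (
    Reaction("lumped glycolysis", frozenset({"glucose"}), frozenset({"pyruvate"})),
))
pyruvate_oxidation = Pathway("pyruvate oxidation", (
    Reaction("lumped PDH step", frozenset({"pyruvate"}), frozenset({"acetyl-CoA"})),
))

combined = chain(glycolysis, pyruvate_oxidation)
print(combined.name, combined.inputs(), combined.outputs())
```

Because inputs and outputs are derived from the reactions, the shared intermediate (pyruvate) disappears from the boundary of the combined pathway automatically.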
Pathway repositories
BioCyc/MetaCyc
Kyoto Encyclopedia of Genes and Genomes (KEGG) PATHWAY DB
BioCarta
BioModels database
BioCyc database http://www.biocyc.org
Pathway/genome database (PGDB) for organisms with completely sequenced genomes
409 full genomes and pathways deposited
Species-specific pathways are inferred from MetaCyc
Query/navigation/pathway creation support through the Pathway Tools software suite
MetaCyc database http://www.metacyc.org
Non-redundant reference database for metabolic pathways, reactions, enzymes and compounds
Curation through experimental verification and manual literature review
>1200 pathways from 1600+ species (mostly plants and microorganisms)
Glycolysis pathway in MetaCyc
KEGG PATHWAY database http://www.kegg.com
Consolidated set of databases that cover genomics (GENES), chemical compounds (LIGAND) and reaction networks (PATHWAY)
Broad focus on metabolism, signal transduction, disease, etc.
Species-specific views available (but networks are static across all organisms)
Glycolysis pathway in KEGG
Global Pathway Map
BioCarta database http://www.biocarta.com
Corporate-owned, publicly-curated pathway database
Series of interactive, “cartoon” pathway maps
Predominantly human and mouse pathways
Contains 120,000 gene entries and 355 pathways
Glycolysis pathway in BioCarta
BioModels database http://www.biomodels.net
Database for published, quantitative models of biochemical processes
All models/pathways curated manually, compliant with MIRIAM
Models can be output in SBML format for quantitative modeling
86 curated models, 40 models pending curation
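Since SBML is XML, the species and reactions of a model can be listed with a generic XML parser. The sketch below is a rough illustration under stated assumptions: the embedded document is a hand-written toy in SBML Level 2 Version 4 style (not a BioModels entry and not validated against the specification), and real work would more likely use a dedicated SBML library.

```python
import xml.etree.ElementTree as ET

# Namespace used by SBML Level 2 Version 4 documents.
SBML_NS = "http://www.sbml.org/sbml/level2/version4"

# Hand-written toy document; all ids are hypothetical.
toy_sbml = f"""<sbml xmlns="{SBML_NS}" level="2" version="4">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="cell"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="S" compartment="cell" initialConcentration="10"/>
      <species id="P" compartment="cell" initialConcentration="0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="conversion">
        <listOfReactants><speciesReference species="S"/></listOfReactants>
        <listOfProducts><speciesReference species="P"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

ns = {"sbml": SBML_NS}
root = ET.fromstring(toy_sbml)

# Collect species ids in document order.
species_ids = [s.get("id") for s in root.findall(".//sbml:species", ns)]

# Map each reaction id to its (reactants, products) species lists.
reactions = {}
for rxn in root.findall(".//sbml:reaction", ns):
    reactants = [ref.get("species") for ref in
                 rxn.findall("sbml:listOfReactants/sbml:speciesReference", ns)]
    products = [ref.get("species") for ref in
                rxn.findall("sbml:listOfProducts/sbml:speciesReference", ns)]
    reactions[rxn.get("id")] = (reactants, products)

print(species_ids)
print(reactions)
```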
Glycolysis pathways in BioModels
Comparison of pathway databases
                  | MetaCyc/BioCyc              | KEGG PATHWAY                   | BioCarta                 | BioModels
Curation          | Manual and automated        | Automated                      | Manual                   | Manual
Size              | ~621+ pathways              | ~289 reference pathways        | ~355 pathways            | ~126 models
Nomenclature      | EC, GO                      | EC, KO                         | None                     | GO
Organism coverage | ~500 species                | Various                        | Primarily human and mouse | ~475 species
Visuals           | Species-specific custom     | Reference and species-specific | Animated, cartoonish     | Non-standardized
Primary usage     | PGDB, computational biology | PGDB, pathway comparisons      | Human pathways, disease  | Simulations, modeling
Inferring pathways and networks
Experimental methods
• Microarray co-expression
• Quantitative trait locus (QTL) mapping
• Isotope-coded affinity tagging (ICAT)
• Yeast two-hybrid assay
• Green fluorescent protein (GFP) tagging
Computational methods
• Database-driven protein-protein interactions
• Expression clustering techniques
• Literature mining for specified interactions
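As a rough illustration of how co-expression data can be turned into a gene network, the sketch below links two genes when the Pearson correlation of their expression profiles is strong. The expression values are made up, and the 0.9 cutoff is an arbitrary assumption; published methods typically add normalization and significance testing.

```python
from itertools import combinations
from math import sqrt

# Made-up expression profiles, one value per condition; not real data.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 1.0, 3.0],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_edges(profiles, threshold):
    """Return (gene1, gene2, r) for every pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


for edge in coexpression_edges(expression, threshold=0.9):
    print(edge)
```

Using the absolute value keeps strongly anti-correlated pairs as edges too; the sign of `r` then distinguishes the two kinds of relation.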
Cellular simulations
Study the effect perturbation has on a pathway (and thus the organism)
Generally require extensive detail on the pathway or reactions of interest (flux equations, metabolite concentration, etc.)
Cellular pathway simulations must manage both temporal and spatial complexity
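A minimal sketch of a perturbation study follows, assuming a hypothetical two-step pathway S → I → P with first-order rate laws and simple forward-Euler integration. The rate constants, the starting concentration and the step size are arbitrary illustration values; real simulations need measured kinetics and a proper ODE solver.

```python
def simulate(k1, k2, s0=10.0, dt=0.01, steps=300):
    """Integrate S -> I -> P with first-order kinetics (forward Euler).

    k1 and k2 are hypothetical rate constants for the two steps.
    Returns the final concentrations (s, i, p).
    """
    s, i, p = s0, 0.0, 0.0
    for _ in range(steps):
        v1 = k1 * s          # flux of step 1 (S -> I)
        v2 = k2 * i          # flux of step 2 (I -> P)
        s -= v1 * dt
        i += (v1 - v2) * dt
        p += v2 * dt
    return s, i, p


# Baseline pathway versus a perturbation that halves the first rate constant
# (for example, partial inhibition of the first enzyme).
baseline = simulate(k1=1.0, k2=0.5)
perturbed = simulate(k1=0.5, k2=0.5)

print("baseline  S, I, P:", baseline)
print("perturbed S, I, P:", perturbed)
print("change in product:", perturbed[2] - baseline[2])
```

The downstream effect of the perturbation is read off as the difference in product concentration at the end of the run; total mass (S + I + P) stays constant, which is a quick sanity check on the integration.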
[Figure: scales of simulation. Spatial dimension axis: 0.1 nm, 10 nm, 1 µm, 1 mm, 1 cm, 1 m. Temporal intervals axis: picosec., nanosec., microsec., millisec., sec., min., yr. Regions: quantum mechanics, molecular dynamics, cellular processes, systems physiology, organs and organisms]
Adapted from Kelly, H., http://www.fas.org/resource/05242004121456.pdf, via Neal, Yngve 2006 VHS, UW MEBI 591
Simulation methods and techniques
Biological process  | Phenomena                            | Computation scheme
Metabolism          | Enzymatic reaction                   | Differential-algebraic equations, flux-based analysis
Signal transduction | Binding                              | Differential-algebraic equations, stochastic algorithms, diffusion-reaction
Gene expression     | Binding, polymerization, degradation | Object-oriented modeling, differential-algebraic equations, stochastic algorithms, boolean networks
DNA replication     | Binding, polymerization              | Object-oriented modeling, differential-algebraic equations
Membrane transport  | Osmotic pressure, membrane potential | Differential-algebraic equations, electrophysiology
Adapted from Tomita 2001
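Of the computation schemes listed for gene expression, the Boolean network is the simplest to sketch. The example below assumes a hypothetical three-gene loop (A activates B, B activates C, C represses A) with synchronous updates; the genes and rules are invented for illustration and are not taken from Tomita 2001.

```python
def step(state):
    """Apply one synchronous update of the hypothetical three-gene loop."""
    return {
        "A": not state["C"],   # C represses A
        "B": state["A"],       # A activates B
        "C": state["B"],       # B activates C
    }


def trajectory(initial, n_steps):
    """Return the list of states visited, starting with the initial state."""
    states = [initial]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states


start = {"A": True, "B": False, "C": False}
for t, state in enumerate(trajectory(start, 6)):
    # Print each state as a compact ON/OFF pattern, e.g. "t=0 A=1 B=0 C=0".
    pattern = " ".join(f"{gene}={int(on)}" for gene, on in state.items())
    print(f"t={t} {pattern}")
```

Each gene is either ON or OFF, so no kinetic parameters are needed; the price is that timing and concentrations are lost, which is why the table pairs Boolean networks with more detailed schemes.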
Research in simulation and modeling
Virtual Cell (National Resource for Cell Analysis and Modeling)
MCell (the Salk Institute)
Gepasi (Virginia Tech)
E-CELL (Institute for Advanced Biosciences, Keio University)
Karyote/CellX (Indiana University)
End Theory II
5 min mindmapping
10 min break
Term Project
Max 3000 words
Focus on results and their discussion
Make sure to incorporate all the little hints we gave
Incorporate runtime for the new dataset as another performance measure
Practice
Perform the steps as described here:
http://wiki.cytoscape.org/GettingStarted