Copyright GeneGo 2000-2007
Andrej Bugrim
GeneGo, Inc.
Protein scoring based on significance in biological networks
Two problems of systems biology
• How to reconstruct condition-specific networks in a biologically robust way
• How to utilize reconstructed networks in day-to-day laboratory practice
Still need to answer questions centered on individual genes/proteins:
– Which genes are most important for a condition/disease?
– What are the best drug targets?
– What are the most robust biomarkers?
Sources of the problems
• Biological networks are highly interconnected due to the presence of hubs. Hubs almost always provide “shortest path” connectivity
• Multiple paths can be generated to connect a pair of nodes, with no way to discriminate between alternative hypotheses
• Resulting networks are often large and biologically intractable. It is hard to understand the roles of individual nodes
Some earlier solutions
• Use “canonical pathways” as basis for reconstruction
– Limited to known pathways
• Penalize hubs when reconstructing networks
– Does not discriminate between individual hubs
Our solution
Find nodes that are significant in providing connectivity in a condition-specific dataset
Finding topologically significant nodes
[Figure: example network with nodes A, B and C. Legend: differentially expressed genes; non-differentially expressed genes]
B is topologically significant: 4 out of 6 nodes regulated by B are differentially expressed, more than a random share = significant
C is not topologically significant: only 1 out of 6 nodes regulated by C is differentially expressed, which could be due to a random event = not significant
In reality, the algorithm also considers nodes beyond first-degree neighbors
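The B versus C comparison can be put in numbers with a hypergeometric tail test. The sketch below is illustrative only: the background of 1000 genes with 100 differentially expressed is an assumption (the slide gives no totals), `hypergeom_tail` is a hypothetical helper, and only first-degree neighbors are considered, whereas the actual algorithm looks further.

```python
from math import comb


def hypergeom_tail(population, marked, draws, observed):
    """P(X >= observed) when `draws` items are sampled without replacement
    from `population` items, `marked` of which carry the property."""
    upper = min(draws, marked)
    favourable = sum(
        comb(marked, k) * comb(population - marked, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / comb(population, draws)


# ASSUMPTION (not from the slides): 1000 genes in the background,
# 100 of them differentially expressed.
POPULATION, DIFF_EXPRESSED = 1000, 100

# B regulates 6 nodes, 4 of them differentially expressed.
p_B = hypergeom_tail(POPULATION, DIFF_EXPRESSED, draws=6, observed=4)
# C regulates 6 nodes, 1 of them differentially expressed.
p_C = hypergeom_tail(POPULATION, DIFF_EXPRESSED, draws=6, observed=1)

print(f"node B: p = {p_B:.2e}")
print(f"node C: p = {p_C:.2f}")
```

Under these assumed totals, 4 of 6 is far less likely by chance than 1 of 6, which is the intuition the slide conveys.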
Why is JAK1 significant in this dataset?
Regulation via JAK1
JAK1 provides essential network conduit between PLAUR and many differentially expressed targets of STAT1
Topological significance helps to find important links in pathways that do not come up on HT screens
Feedback loops
Node scoring algorithm
1. Let K be a set of experimentally derived nodes of interest (e.g. nodes representing differentially expressed genes). K is a subset of the global network N; below, K and N also denote the sizes of these sets.
2. Calculate the shortest path network S by building directed paths from each node in K to the other nodes in K, wherever possible. S is a subset of N and may contain nodes in addition to K. Also some nodes from K may become part of S.
3. Consider a node i ∈ S and one of the nodes of the experimental set, j ∈ K.
4. Calculate the shortest path networks between j and every other node in the global network (N − 1 pairs) and count how many of them contain i. This number is N_ij ≤ N − 1.
5. Calculate the shortest path networks between j and all other nodes in the experimental set and count how many of them contain node i. This number is K_ij ≤ K − 1.
6. The probability that node i would be present K_ij times or more in the shortest path networks of j by chance follows a hypergeometric distribution:
   $$p_{ij} = \sum_{k=K_{ij}}^{\min(K-1,\,N_{ij})} \frac{\binom{N_{ij}}{k}\binom{N-1-N_{ij}}{K-1-k}}{\binom{N-1}{K-1}}$$
7. Repeat the procedure for all nodes j in the set K, calculating K p-values for node i (p_ij), each showing the relevance of node i to an individual member of K. Because we want to identify the nodes that are statistically significant to at least one member of the experimental set, we define the p-value associated with node i as the minimum of the p_ij values.
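Steps 1 to 7 can be sketched in a few lines of Python. This is a minimal illustration, not GeneGo's implementation: the toy graph, the node names and the helpers `bfs_distances`, `hypergeom_tail` and `node_p_values` are hypothetical. It assumes an unweighted directed graph, treats node i as part of the shortest path network from j to t whenever d(j,i) + d(i,t) = d(j,t) (so t = i counts), and reads step 6 as a hypergeometric tail with population N − 1, N_ij marked items, K − 1 draws and at least K_ij observed.

```python
from collections import deque
from math import comb


def bfs_distances(graph, source):
    """Directed shortest-path lengths (edge counts) from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def hypergeom_tail(population, marked, draws, observed):
    """P(X >= observed) for sampling without replacement."""
    upper = min(draws, marked)
    favourable = sum(
        comb(marked, k) * comb(population - marked, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / comb(population, draws)


def node_p_values(graph, experimental):
    """Return {node i: min over j in K of p_ij} for nodes found in S."""
    experimental = set(experimental)
    nodes = set(graph) | {t for ts in graph.values() for t in ts} | experimental
    dist = {n: bfs_distances(graph, n) for n in nodes}

    def on_path(i, j, t):
        # i lies on some shortest directed path j -> t (t == i counts).
        d_j, d_i = dist[j], dist[i]
        return i in d_j and t in d_j and t in d_i and d_j[i] + d_i[t] == d_j[t]

    n_total, k_total = len(nodes), len(experimental)
    scores = {}
    for i in nodes:
        p_ij = []
        for j in experimental - {i}:
            n_ij = sum(on_path(i, j, t) for t in nodes if t != j)         # step 4
            k_ij = sum(on_path(i, j, t) for t in experimental if t != j)  # step 5
            if k_ij:  # i takes part in j's shortest path network within K
                p_ij.append(hypergeom_tail(n_total - 1, n_ij, k_total - 1, k_ij))
        if p_ij:
            scores[i] = min(p_ij)                                         # step 7
    return scores


# Toy network (hypothetical): "hub" reaches most nodes, "x" links a to b and c.
graph = {
    "a": ["x", "hub"],
    "x": ["b", "c"],
    "hub": ["c", "d", "e", "f", "g", "h"],
}
scores = node_p_values(graph, experimental={"a", "b", "c"})
for node, p in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{node}: p = {p:.3f}")
```

In this toy graph the hub lies on a shortest path to only one experimental node while reaching most of the network, so it scores worse than `x`, which connects two experimental nodes and little else.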
Algorithm validation: PSORIASIS
• Psoriasis is recognized as the most common T cell-mediated inflammatory disease in humans.
• Genetic linkage to as many as six distinct disease loci has been established but the molecular etiology and genetics remain unknown.
• To begin to identify psoriasis disease-related genes and construct in vivo pathways of the implicated processes, genome-wide expression screens of psoriasis patients need to be undertaken
• The disease-related gene map may provide new insights into the pathogenesis of psoriasis
Data
• 4 samples from 4 psoriasis patients were taken at 2 different times
– At the time of developed psoriatic lesion (P)
– And at the time of its complete healing (N)
– The samples were taken from the same exact spot on the same patient, which eliminates a great deal of experimental bias and uncertainty.
• Affymetrix Human U95A microarray technology was then utilized to evaluate the expression data
• Only the genes differentially expressed between the lesion sample (P) and the normal sample (N) were then used for comprehensive analysis with the new algorithm and in MetaCore 4.0
Algorithm validation
• As the “experimental set” we use 266 differentially expressed genes identified in the paper
• The shortest path network connecting these genes is built using the global network of protein interactions from MetaCore™. The statistical significance of each node in this network is calculated as described above
• To evaluate whether the nodes deemed significant by our method are indeed likely to be disease-related, we perform an automated search of PubMed abstracts for co-occurrence of the corresponding gene name and the word “psoriasis” for every gene in the shortest path network. Different statistical measures are plotted as a function of the node’s p-value
• Functional analysis of high-scoring genes is performed in MetaCore™
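The literature check amounts to tabulating, for each p-value cutoff, what fraction of the selected genes have a PubMed co-occurrence hit. A minimal sketch of that tabulation, assuming the node p-values and the set of genes with hits are already in hand; the gene labels and numbers are made up for illustration and `hit_fraction_by_cutoff` is a hypothetical helper.

```python
def hit_fraction_by_cutoff(p_values, literature_hits, cutoffs):
    """For each cutoff, return (cutoff, genes selected, fraction with a hit)."""
    rows = []
    for cutoff in cutoffs:
        selected = [gene for gene, p in p_values.items() if p <= cutoff]
        if selected:
            fraction = sum(g in literature_hits for g in selected) / len(selected)
        else:
            fraction = None  # nothing passes this cutoff
        rows.append((cutoff, len(selected), fraction))
    return rows


# MADE-UP example data: node p-values and genes co-mentioned with "psoriasis".
p_values = {"g1": 0.001, "g2": 0.004, "g3": 0.02, "g4": 0.03,
            "g5": 0.2, "g6": 0.5, "g7": 0.8, "g8": 0.9}
literature_hits = {"g1", "g2", "g4", "g7"}

rows = hit_fraction_by_cutoff(p_values, literature_hits, cutoffs=[0.01, 0.05, 1.0])
for cutoff, count, fraction in rows:
    print(f"p <= {cutoff}: {count} genes, hit fraction {fraction}")
```

If the scoring is informative, the hit fraction should rise as the cutoff tightens, as it does in this invented example.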
Fraction of genes related to “psoriasis” scales with significance
High-scoring nodes have higher fraction of psoriasis hits
Enrichment with psoriasis hits among differential genes
No correlation with node degree
Functional analysis: GeneGo processes
Map | Map Folders | Cell process | p-Value | Genes
IFN gamma signaling pathway | Cell signaling/Immune response; Function groups/Cyto/chemokines | cytokine and chemokine mediated signaling pathway, immune response | 1.88E-26 | 32 / 63
Prolactin receptor signaling | Function groups/Growth factors; Function groups/Hormones | intracellular receptor-mediated signaling pathway, response to hormone stimulus | 4.57E-24 | 30 / 62
Regulation of G1/S transition (part 2) | Cell signaling/Cell cycle control | cell cycle | 3.30E-22 | 22 / 33
Chemokines and adhesion | Cell signaling/Cell adhesion; Function groups/Cyto/chemokines | cytokine and chemokine mediated signaling pathway, cell adhesion | 6.46E-22 | 45 / 174
EGF signaling pathway | Cell signaling/Growth and differentiation/Epidermal cell differentiation; Function groups/Growth factors | intracellular receptor-mediated signaling pathway, response to extracellular stimulus | 4.89E-21 | 28 / 64
PDGF signaling via STATs and NF-kB | Cell signaling/Growth and differentiation/Growth and differentiation (common pathways); Function groups/Growth factors | intracellular receptor-mediated signaling pathway, response to extracellular stimulus | 5.18E-21 | 23 / 40
IGF-RI signaling | Cell signaling/Growth and differentiation/Growth and differentiation (common pathways); Function groups/Growth factors | intracellular receptor-mediated signaling pathway, response to extracellular stimulus | 1.63E-20 | 29 / 72
AKT signaling | Function groups/Kinases | protein kinase cascade | 5.96E-19 | 25 / 57
TGF, WNT and cytoskeletal remodeling | Cell signaling/Cell adhesion | cell adhesion | 6.15E-19 | 45 / 204
Functional analysis: IFN-gamma map
VEGF – key pathway identified!
Simonetti O, Lucarini G, Goteri G, Zizzi A, Biagini G, Lo Muzio L, Offidani A. VEGF is likely a key factor in the link between inflammation and angiogenesis in psoriasis: results of an immunohistochemical study. Int J Immunopathol Pharmacol. 2006 October-December;19(4):751-760
Glucocorticoid – another key pathway
Conclusions from algorithm validation
• High-scored nodes are significantly enriched in disease-related genes
• Important disease-related pathways are identified
• Important drug targets are highly scored
Integration of genomic and proteomic sets
• LNCaP prostate cell lines
– Treated with androgen
– Untreated (control)
• Data:
– Proteomic data: ~70 proteins exclusively present in treated cells
– Gene expression profiling of androgen-treated cells
• Analysis
– Topological analysis of androgen-specific protein network
– Correlation between topologically significant nodes and gene expression
– Functional analysis in MetaCore™
– Network analysis in MetaCore™
Revealing regulation of LNCaP cell response to androgen
[Figure panels: by differentially expressed genes; by androgen-specific proteins; by topologically significant nodes]
Topologically significant nodes reveal regulation
Gene expression and proteomic data reveal target pathways
Correlation between expression and significance
Among topologically significant genes the fraction of differentially expressed genes is high
[Plot: x-axis, P-value related to topological significance; y-axis, P-value related to differential expression]
Androgen receptor signaling
1 – Differentially expressed gene
2 – Androgen-specific protein
3 – Topologically significant node
Regulation of lipid metabolism
Differentially expressed genes identified by microarray and confirmed by proteomic screen
Topologically significant nodes revealed by the new algorithm
Fatty acid metabolism: target pathway
Role of PBEF
Possible regulation of PBEF by AR
PBEF occurs in both the expression and proteomic datasets; it is possibly activated by the androgen receptor via HIF1 or HNF4
Possible feedback from Insulin and IGF-1R back to AR
Conclusions
• The presented method assigns priority to nodes in biological networks built on condition-specific datasets
• The method predominantly selects genes with high relevance to the condition of interest
• The method could be used for cross-validation of different data types, identification of novel drug targets and validation of existing targets
Putting it all together: network activity inference
– Identifying causal relations between putative input and output signals
– Tracking effects of molecular perturbation through activation/inhibition cascades
[Figure legend: experimental data: start cascade; experimental data: terminate cascade; inferred activity; experimental data; predicted input; predicted target; scoring intermediary nodes]
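One simple way to track a perturbation through an activation/inhibition cascade is to multiply edge signs along the path from the input. The sketch below is an assumption about how such tracking could look, not the method used in the talk: edge signs are +1 (activation) or -1 (inhibition), each node takes its state from the first shortest path that reaches it, and the node names and `propagate_activity` are hypothetical.

```python
from collections import deque


def propagate_activity(signed_edges, source, source_state=+1):
    """Infer node states (+1 up, -1 down) downstream of a perturbed source.

    signed_edges maps a node to a list of (target, sign) pairs, where sign is
    +1 for activation and -1 for inhibition. ASSUMPTION: a node keeps the
    state assigned along the first (shortest) path that reaches it.
    """
    state = {source: source_state}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for target, sign in signed_edges.get(node, ()):
            if target not in state:
                state[target] = state[node] * sign
                queue.append(target)
    return state


# Hypothetical cascade: input activates a kinase, the kinase activates a
# repressor, and the repressor inhibits the target.
edges = {
    "input": [("kinase", +1)],
    "kinase": [("repressor", +1)],
    "repressor": [("target", -1)],
}
inferred = propagate_activity(edges, "input")
for node, activity in inferred.items():
    print(node, "up" if activity > 0 else "down")
```

Comparing such inferred states with measured changes at the end of the cascade is one way intermediary nodes could be scored; conflicting paths would need a more careful treatment than this sketch offers.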
“Druggable” network modules
Acknowledgements
GeneGo: Zoltan Dezso, Yuri Nikolsky, Tatiana Nikolskaya
University of Michigan: Adaikkalam Vellaichamy, Saravana M Dhanasekaran, Arun Sreekumar, Arul Chinnaiyan, Gilbert Omenn