An update on ongoing projects within Biorange SP3.2.2.1
Biorange Project Meeting
Leiden, September 15
Tim Hulsen
Biorange SP3.2.2
CoPub
Knowledge integration
ArrayExpress db
User
Xref db
Gene annotation through applications: PhyloPat, BioVenn, OrthoPath, CoPub
Overview
• PhyloPat
  • Published in BMC Bioinformatics (2006)
  • Update submitted to Nucleic Acids Res. Database issue
• BioVenn
  • Revised version submitted to BMC Genomics
• Orthologous networks & OrthoPath
  • Manuscript in preparation
• CoPub (Taverna workflows)
  • Published in Nucleic Acids Res. Web Server issue (2008)
PhyloPat - Introduction
• Phylogenetic patterns show the presence or absence of certain genes in a set of full genomes derived from different species
• PhyloPat allows the complete Ensembl gene database to be queried using phylogenetic patterns
• Published in September 2006; the new version adds:
  • Ensembl v50
  • Support for HGNC and EntrezGene IDs
  • FASTA-format sequences of the members of a phylogenetic lineage
  • Gene neighborhood view
• http://www.cmbi.ru.nl/phylopat
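A phylogenetic pattern query can be pictured as matching one presence/absence string per orthologous group. The sketch below is only an illustration: the species order, the group IDs, the pattern strings and the use of `*` as a wildcard are assumptions made for this example, not PhyloPat's actual schema or query syntax.

```python
import re

# Hypothetical species order; each pattern position corresponds to one species.
SPECIES = ["C. elegans", "D. melanogaster", "D. rerio", "M. musculus", "H. sapiens"]

# Hypothetical orthologous groups: '1' = gene present, '0' = gene absent.
PATTERNS = {
    "GROUP_A": "11111",
    "GROUP_B": "00111",
    "GROUP_C": "00011",
    "GROUP_D": "10001",
}


def query(pattern, patterns=PATTERNS):
    """Return the IDs of groups whose presence/absence string matches `pattern`.

    In `pattern`, '1' requires presence, '0' requires absence and '*'
    accepts either state for that species.
    """
    if len(pattern) != len(SPECIES):
        raise ValueError("pattern needs one character per species")
    if set(pattern) - set("01*"):
        raise ValueError("pattern may contain only '0', '1' and '*'")
    regex = re.compile(pattern.replace("*", "[01]"))
    return sorted(gid for gid, p in patterns.items() if regex.fullmatch(p))


# Groups absent from both invertebrates and present in mouse and human.
vertebrate_only = query("00*11")
```

Here `vertebrate_only` holds the two toy groups that lack a worm and a fly member; the zebrafish position is left open by the wildcard.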
PhyloPat: Update to Ensembl v50
• 39 species, including model organisms such as C. elegans, D. melanogaster, D. rerio, G. gallus, M. musculus, R. norvegicus, C. familiaris, M. mulatta, and human
• In total 814,936 genes
• In total 244,114 orthologous groups, created by clustering the orthologous gene pairs predicted by Ensembl
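Pairwise ortholog predictions can be clustered into groups in several ways. The sketch below assumes single-linkage clustering (connected components of the pair graph), implemented with union-find. The gene IDs are invented, and the slide does not state whether PhyloPat uses exactly this procedure.

```python
from collections import defaultdict


def cluster_pairs(pairs):
    """Group genes into connected components of the orthologous-pair graph."""
    parent = {}

    def find(gene):
        # Find the representative of the gene's component, with path halving.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical ortholog pairs (human, mouse and rat gene IDs are invented).
pairs = [("hs_1", "mm_1"), ("mm_1", "rn_1"), ("hs_2", "mm_2")]
groups = cluster_pairs(pairs)
```

With single linkage, `hs_1` and `rn_1` end up in the same group even though no direct pair links them, because both are paired with `mm_1`.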
PhyloPat: Support of HGNC and EntrezGene IDs
• HGNC-Ensembl mapping for 29 species
• EntrezGene-Ensembl mapping for 18 species
Choose from four types of IDs
PhyloPat: FASTA-format sequences
“L”: Longest peptide sequences from this orthologous group (only the longest peptide per gene)
“A”: All peptide sequences from this orthologous group (all peptides per gene)
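The difference between the "L" and "A" options can be sketched as follows. The function name, the input layout and all gene and peptide IDs are hypothetical; this is not PhyloPat's code, only an illustration of the two selection rules.

```python
def to_fasta(group, mode="L"):
    """Format the peptides of one orthologous group as FASTA text.

    `group` maps a gene ID to a dict of {peptide ID: sequence}.
    mode "L" keeps only the longest peptide per gene; mode "A" keeps all.
    """
    if mode not in ("L", "A"):
        raise ValueError("mode must be 'L' or 'A'")
    records = []
    for gene_id in sorted(group):
        peptides = group[gene_id]
        if mode == "L":
            # max() returns the first peptide ID with the greatest length.
            chosen = [max(peptides, key=lambda pid: len(peptides[pid]))]
        else:
            chosen = sorted(peptides)
        for pid in chosen:
            records.append(f">{pid} gene={gene_id}\n{peptides[pid]}")
    return "\n".join(records) + "\n" if records else ""


# Invented gene and peptide IDs with toy sequences.
group = {
    "GENE1": {"PEP1a": "MKT", "PEP1b": "MKTAYIAK"},
    "GENE2": {"PEP2a": "MSLQ"},
}
longest_only = to_fasta(group, "L")
all_peptides = to_fasta(group, "A")
```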
PhyloPat: Gene neighborhood view
• The ‘Gene neighborhood view’ shows all genes from all species in a certain phylogenetic lineage, and all genes in their proximity on the genome (10 genes to both sides)
• Neighbouring genes are color-coded according to the orthologous groups they belong to
• Gene neighborhood gives information about functional relationships (genes involved in similar processes are often clustered together)
• Can be used to find the ‘true’ ortholog from a set of genes, by using not only phylogenetic information but also genomic context
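The "10 genes to both sides" window can be sketched as a slice over the genes of one chromosome in genomic order. The function name and the gene IDs are invented for illustration; the truncation at chromosome ends is an assumption.

```python
def neighborhood(ordered_genes, gene, flank=10):
    """Return up to `flank` genes on each side of `gene`, plus the gene itself.

    `ordered_genes` lists the genes of one chromosome in genomic order.
    The window is truncated at the chromosome ends.
    """
    index = ordered_genes.index(gene)  # raises ValueError if gene is absent
    start = max(0, index - flank)
    return ordered_genes[start:index + flank + 1]


# Invented chromosome with 30 genes, g0 .. g29.
chromosome = [f"g{i}" for i in range(30)]
window = neighborhood(chromosome, "g15")
edge_window = neighborhood(chromosome, "g2")
```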
PhyloPat: Gene neighborhood view
Each cell:
- Ensembl Gene ID
- PhyloPat ID
- HGNC Symbol
ERN1 and ERN2 can be distinguished by looking at gene context
BioVenn
• Web application to see the overlap between different lists of biological identifiers, using area-proportional Venn diagrams
• Support for a wide range of IDs, which are recognized and linked to the corresponding database: Affymetrix, COG, Ensembl, EntrezGene, Gene Ontology, InterPro, IPI, KEGG Pathway, KOG, PhyloPat and RefSeq
• Optional mapping of Affymetrix and EntrezGene to Ensembl
• Output in SVG (with drag-and-drop functionality) or PNG
• http://www.cmbi.ru.nl/biovenn/
BioVenn
Embedded / standalone, SVG / PNG
ID mapping
Absolute numbers / percentages
BioVenn
• Lists for all 13 sets (X total, X only, XY total overlap, XY only overlap, XYZ overlap, etc.)
• If type of ID (e.g. Affymetrix, Ensembl) is recognized, output is linked to the corresponding database
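The 13 lists for three input sets X, Y and Z follow directly from set operations. The sketch below is not BioVenn's implementation; the list names extend the slide's naming to Y and Z, and the input IDs are invented.

```python
def venn_lists(x, y, z):
    """Return the 13 ID lists of a three-set comparison as sorted lists."""
    x, y, z = set(x), set(y), set(z)
    sets = {
        "X total": x,
        "Y total": y,
        "Z total": z,
        "X only": x - y - z,
        "Y only": y - x - z,
        "Z only": z - x - y,
        "XY total overlap": x & y,
        "XZ total overlap": x & z,
        "YZ total overlap": y & z,
        "XY only overlap": (x & y) - z,
        "XZ only overlap": (x & z) - y,
        "YZ only overlap": (y & z) - x,
        "XYZ overlap": x & y & z,
    }
    return {name: sorted(ids) for name, ids in sets.items()}


# Invented identifiers.
lists = venn_lists(["a", "b", "c", "d"], ["b", "c", "e"], ["c", "d", "f"])
```

The sizes of the seven disjoint regions (the three "only" lists, the three "only overlap" lists and the XYZ overlap) are the quantities an area-proportional diagram would have to represent.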
Assessing orthologous biology in groups of genes: Application to GC induced insulin resistance
Biorange meeting 2008-03-11:
Goal: Gain better insight into the conservation of genes involved in glucocorticoid-induced insulin resistance (GC-induced IR) between human, mouse and rat.
Use CoPub to build literature networks, map orthology
Validation needed
Network
Cytochrome P450s
Lipid transport
Adipocyte differentiation
Jak/Stat/IL6
Insulin signaling
Fatty acid oxidation/catabolism
Misc: amino acid metabolism, MAPK signaling, osteoblast
Dexamethasone & insulin
Validation approach
Get all genes from a KEGG pathway
Select random 10% of these genes
Create Gene Network using these genes (CoPub)
Compare with original KEGG pathway
Repeat with varying thresholds
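The validation steps above can be sketched as follows. `build_network` is a hypothetical callable standing in for the CoPub network step, and the toy builder and gene IDs are invented. The rate definitions are the standard ones (TP rate = recovered pathway genes / pathway genes not used as seeds, FP rate = predicted non-pathway genes / all non-pathway genes, positive predictive value = TP / (TP + FP)); the slides do not define them, so they may differ from how the reported numbers were calculated.

```python
import random


def validate(pathway_genes, all_genes, build_network, fraction=0.10, seed=0):
    """Seed a network with a random subset of a pathway and score the result.

    `build_network` is a hypothetical callable standing in for the CoPub
    network step: it takes a set of seed genes and returns a set of genes.
    """
    pathway = set(pathway_genes)
    universe = set(all_genes)
    rng = random.Random(seed)
    n_seeds = max(1, round(fraction * len(pathway)))
    seeds = set(rng.sample(sorted(pathway), n_seeds))

    # Score only genes that were not given to the network builder.
    predicted = set(build_network(seeds)) - seeds
    positives = pathway - seeds
    negatives = universe - pathway

    tp = len(predicted & positives)
    fp = len(predicted & negatives)
    return {
        "tp_rate": tp / len(positives) if positives else 0.0,
        "fp_rate": fp / len(negatives) if negatives else 0.0,
        "ppv": tp / (tp + fp) if tp + fp else 0.0,
    }


# Toy data: a 20-gene "pathway" inside a 100-gene universe.
universe = [f"g{i}" for i in range(100)]
pathway = universe[:20]


def toy_network(seeds):
    """Stand-in builder: returns the seeds plus a fixed set of genes."""
    return set(seeds) | {"g0", "g1", "g2", "g3", "g50"}


scores = validate(pathway, universe, toy_network)
```

Repeating the run with varying thresholds would amount to passing a differently configured `build_network` for each setting and comparing the resulting scores.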
Results
| Pathway | ID | # genes in pathway | FP_rate | TP_rate | Pos.Pred.Val | Percentage Manual Pos |
|---|---|---|---|---|---|---|
| Hematopoietic cell lineage | hsa04640 | 88 | 0.04 | 0.69 | 0.11 | 0.53 |
| Jak-STAT signaling pathway | hsa04630 | 153 | 0.02 | 0.54 | 0.20 | 0.33 |
| Cytokine-cytokine receptor interaction | hsa04060 | 256 | 0.03 | 0.51 | 0.24 | 0.53 |
| Toll-like receptor signaling pathway | hsa04620 | 90 | 0.02 | 0.50 | 0.17 | 0.40 |
| Metabolism of xenobiotics by cytochrome P450 | hsa00980 | 70 | 0.00 | 0.50 | 0.43 | 0.32 |
| Melanoma | hsa05218 | 71 | 0.02 | 0.49 | 0.11 | 0.20 |
| Renal cell carcinoma | hsa05211 | 69 | 0.04 | 0.49 | 0.06 | 0.40 |
| VEGF signaling pathway | hsa04370 | 70 | 0.04 | 0.49 | 0.06 | 0.20 |
| GnRH signaling pathway | hsa04912 | 97 | 0.03 | 0.48 | 0.09 | 0.13 |
| Endometrial cancer | hsa05213 | 52 | 0.02 | 0.48 | 0.07 | 0.47 |

Average TP = 0.24
Average FP = 0.01
Application to all human genes
Create a network for each gene with R scaled = 30, literature count = 5
6,181 networks with size > 2
Calculate the average conservation for each network, based on the conservation of all the genes in the network in 4 species (P. tro., M. mus., R. nor., C. fam.)
Get all genes in the 100 least conserved networks (211 genes) and all genes in the 100 most conserved networks (309 genes)
Calculate GO enrichment for each gene set
Compare

Non-conserved

| Term | PValue |
|---|---|
| GO:0051704~multi-organism process | 1.23E-07 |
| GO:0006952~defense response | 1.48E-07 |
| GO:0009615~response to virus | 4.47E-06 |
| GO:0051707~response to other organism | 2.65E-05 |
| GO:0007586~digestion | 7.07E-05 |
| GO:0050896~response to stimulus | 1.04E-04 |
| GO:0009607~response to biotic stimulus | 4.16E-04 |
| GO:0006955~immune response | 6.66E-04 |
| GO:0030101~natural killer cell activation | 0.003385226 |
| GO:0007565~female pregnancy | 0.003668999 |

Conserved

| Term | PValue |
|---|---|
| GO:0048856~anatomical structure development | 5.80E-05 |
| GO:0007399~nervous system development | 7.35E-05 |
| GO:0005977~glycogen metabolic process | 9.18E-05 |
| GO:0006073~glucan metabolic process | 1.02E-04 |
| GO:0048731~system development | 1.12E-04 |
| GO:0032502~developmental process | 1.16E-04 |
| GO:0007275~multicellular organismal development | 2.67E-04 |
| GO:0044262~cellular carbohydrate metabolic process | 2.90E-04 |
| GO:0032501~multicellular organismal process | 4.49E-04 |
| GO:0006813~potassium ion transport | 5.58E-04 |
| GO:0048513~organ development | 6.85E-04 |
| GO:0044264~cellular polysaccharide metabolic process | 7.09E-04 |
| GO:0005976~polysaccharide metabolic process | 7.94E-04 |
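The ranking step of this analysis can be sketched as below. The slides do not define how conservation is scored, so the sketch assumes that a gene's conservation is the fraction of the four species in which it has an ortholog. The function name, network IDs and gene IDs are invented, and the GO enrichment step is left out.

```python
def rank_networks(networks, orthologs, species, top_n=100):
    """Rank networks by the mean conservation of their member genes.

    `networks` maps a network ID to a set of genes; `orthologs` maps a gene
    to the set of species in which it has an ortholog. A gene's conservation
    is taken as the fraction of `species` with an ortholog (an assumption).
    """
    species = set(species)

    def conservation(gene):
        return len(orthologs.get(gene, set()) & species) / len(species)

    def genes_in(net_ids):
        return set().union(*(networks[i] for i in net_ids))

    scores = {
        net_id: sum(conservation(g) for g in genes) / len(genes)
        for net_id, genes in networks.items()
        if len(genes) > 2  # keep only networks with size > 2
    }
    ranked = sorted(scores, key=lambda net_id: (scores[net_id], net_id))
    return genes_in(ranked[:top_n]), genes_in(ranked[-top_n:]), scores


# Toy input; all IDs are invented.
species = ["P.tro", "M.mus", "R.nor", "C.fam"]
orthologs = {
    "a": set(species), "b": set(species), "c": set(species),
    "d": {"P.tro"}, "e": set(), "f": {"P.tro"}, "g": {"P.tro", "M.mus"},
}
networks = {"n1": {"a", "b", "c"}, "n2": {"d", "e", "f"}, "n3": {"a", "g"}}
least_genes, most_genes, scores = rank_networks(networks, orthologs, species, top_n=1)
```

The two returned gene sets correspond to the inputs of the two GO enrichment calculations on the slide.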
OrthoPath
• OrthoPath is a gene-centric search tool for literature networks and their orthologs
• Three input methods:
  • Single gene search: get the literature network for a given gene. OrthoPath creates a network of genes that are connected to this single gene.
  • Keyword search: get the literature network based on a certain keyword. OrthoPath looks for genes that are connected to the keyword, and creates a network from all these genes.
  • Multi gene search: get the literature network for a set of genes. OrthoPath creates a network from only the genes entered by the user.
• http://ws2.grid.sara.nl/cgi-bin/orthopath/op.pl
OrthoPath
Search with a single gene
Search with a keyword
Search with a list of genes
Output in HTML, SVG, Cytoscape or Ingenuity format
Set the minimum strength of a co-citation between two keywords
Set the minimum number of abstracts in which a co-citation between the 2 genes is found
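The two thresholds can be pictured as filters on the links of a co-citation graph. The sketch below is not the OrthoPath or CoPub API: the `Edge` record, the function names and all strengths and abstract counts are invented, and the defaults simply reuse the values from the "Application to all human genes" slide (R scaled = 30, literature count = 5), assuming the strength setting corresponds to the R scaled score.

```python
from collections import namedtuple

# One co-citation link between two genes (all values below are invented).
Edge = namedtuple("Edge", "gene_a gene_b strength abstracts")

EDGES = [
    Edge("INS", "IRS1", 45.0, 120),
    Edge("INS", "LEP", 35.0, 4),
    Edge("INS", "CYP3A4", 20.0, 15),
    Edge("IRS1", "LEP", 32.0, 8),
]


def filter_edges(edges, min_strength=30.0, min_abstracts=5):
    """Keep links that meet both the strength and the abstract-count minimum."""
    return [e for e in edges
            if e.strength >= min_strength and e.abstracts >= min_abstracts]


def single_gene_network(edges, gene, **thresholds):
    """Return the genes directly linked to `gene` after filtering."""
    neighbours = set()
    for e in filter_edges(edges, **thresholds):
        if e.gene_a == gene:
            neighbours.add(e.gene_b)
        elif e.gene_b == gene:
            neighbours.add(e.gene_a)
    return sorted(neighbours)


default_net = single_gene_network(EDGES, "INS")
relaxed_net = single_gene_network(EDGES, "INS", min_strength=15.0, min_abstracts=1)
```

Lowering either threshold can only add links, so the relaxed network always contains the default one.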
OrthoPath
Each node in the network:
- EntrezGene information: ID, symbol, description
- number of neighbours
- number of orthologs from human (for all five species)
CoPub Taverna Workflows
• Taverna: free software tool for designing and executing workflows
• Workflow files for CoPub have been developed:
(1) Search gene
(2) Get literature neighbours
CoPub Taverna Workflows
(3) Get a list of categories
(4) Get the complete network
Acknowledgements
• Wynand Alkema
• Wilco Fleuren
• Raoul Frijters
• Peter Groenen