Gene Ontology Network Enrichment Analysis
From the UC Davis Proteomics 2014 Summer Workshop (www.proteomics.ucdavis.edu) by Dmitry Grapov, PhD
Download all material for the tutorial
https://sourceforge.net/projects/teachingdemos/files/
Choose 2014 UC Davis Proteomics Workshop, or use the full URL below
https://sourceforge.net/projects/teachingdemos/files/2014%20UC%20Davis%20Proteomics%20Workshop/Summer%202014%20Proteomics%20Workshop.zip/download
• decrease • increase
Enrichment or Overrepresentation Analysis
Use functional analysis to identify whether the changed variables are enriched (increased compared to random chance) for a biological pathway, domain, or ontological category.
Biochemical Pathway Biochemical Ontology
Major Tasks
Using the proteins listed in the Excel workbook ‘proteomic data for analysis.xlsx’, worksheet ‘protein IDs’:
1. Conduct Gene Ontology (GO) Enrichment Analysis using DAVID Bioinformatics Resources http://david.abcc.ncifcrf.gov/home.jsp
2. Investigate enriched terms using Quick GO http://www.ebi.ac.uk/QuickGO/
3. Summarize and visualize the results using REVIGO http://revigo.irb.hr/
4. Create and modify GO network using Cytoscape http://www.cytoscape.org/
Protein IDs
Common protein identifier: UniProt/SwissProt accession (default in Scaffold) http://www.uniprot.org/
Use BioMart to translate to other database IDs, e.g. gene symbols
http://www.biomart.org/
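Before uploading a list, it can help to confirm that the identifiers really look like UniProt accessions, since gene symbols or other IDs mixed into the list may not be recognized under the chosen ID type. The following is a minimal sketch, assuming the accession pattern published by UniProt; the helper name `is_uniprot_accession` and the example IDs are illustrative and not part of the workshop material. Isoform suffixes such as "-2" are not handled.

```r
# Hypothetical helper (not from the workshop files): TRUE for strings that
# match the UniProt accession pattern, FALSE otherwise.
is_uniprot_accession <- function(ids) {
  pattern <- "^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
  grepl(pattern, trimws(ids))
}

# Made-up example input: three accession-style strings and one gene symbol
ids <- c("P12345", "Q9Y6K9", "A0A022YWF9", "TP53")
ok <- is_uniprot_accession(ids)

# IDs that would need translation or correction before upload
ids[!ok]
```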
DAVID Bioinformatics Resources
http://david.abcc.ncifcrf.gov/home.jsp
1. Upload list
2. Choose ID type
3. Select list type
4. Submit
Check the organism and make sure all IDs were recognized
List of biochemical databases tested for enrichment
1. Choose GO
http://david.abcc.ncifcrf.gov/helps/functional_annotation.html#E3
1. Overview BP: Biological process
2. Select
1. Overview most enriched term
Quick GO http://www.ebi.ac.uk/QuickGO/
1. View children (lower hierarchy subsets) of this term
DAVID Bioinformatics Resources/Quick GO
1. Can you identify any enriched children of this term in our DAVID output?
2. Download results
Overview and Format Results in Excel
1. Save results
2. Open in MS Excel
Overview Results
Modified Fisher’s Exact Test p-value
optionally: Check in R
x<-data.frame(user=c(13,47),genome=c(690,13528))
fisher.test(x) # p-value = 5.41e-06
(13/47) / (690/13528) # fold enrichment, about 5.4
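The check above passes the list and genome totals directly to `fisher.test`. A more explicit alternative is to build a 2x2 table of term membership (in term / not in term) against list and background, and then test one-sided for enrichment. The sketch below uses the counts from the ratio above (13 of 47 in the list, 690 of 13528 in the genome). The "modified" variant subtracts one from the list hits, which is how the EASE score is usually described; exactly how DAVID builds its table is an assumption here, and `enrichment_table` is a hypothetical helper.

```r
# Counts taken from the check above
list.hits  <- 13     # list proteins annotated with the term
list.total <- 47     # proteins in the uploaded list
pop.hits   <- 690    # background proteins annotated with the term
pop.total  <- 13528  # proteins in the background

# Hypothetical helper: 2x2 table of term membership by list/background.
# 'penalty' is subtracted from the list hits (1 gives an EASE-like test).
enrichment_table <- function(penalty = 0) {
  matrix(c(list.hits - penalty, list.total - list.hits,
           pop.hits,            pop.total - pop.hits),
         nrow = 2,
         dimnames = list(c("in term", "not in term"),
                         c("list", "background")))
}

# One-sided tests: is the term overrepresented in the list?
p.fisher <- fisher.test(enrichment_table(0), alternative = "greater")$p.value
p.ease   <- fisher.test(enrichment_table(1), alternative = "greater")$p.value
c(fisher = p.fisher, ease = p.ease)
```

The penalized p-value is the more conservative of the two, which matters most for terms supported by only a few proteins.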
Alternative to Fisher Exact Test:
Hypergeometric Test
How to calculate statistics to determine enrichment?
hit.num = 51 # number of significantly changed pathway variables
set.num = 1455 # number of variables in pathway
full = 3358 # all possible variables in organism
q.size = 72 # number of significantly changed variables
phyper(hit.num-1, set.num, full-set.num, q.size, lower.tail=FALSE)
enrichment p-value = 1.717553e-06
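The hypergeometric tail probability and a one-sided Fisher's exact test are the same calculation on the same counts. The sketch below repeats the numbers from the example above and arranges them as a 2x2 table to show that both routes agree; the table layout is one possible arrangement chosen for illustration.

```r
hit.num <- 51    # significantly changed variables in the pathway
set.num <- 1455  # variables in the pathway
full    <- 3358  # all possible variables in the organism
q.size  <- 72    # significantly changed variables

# P(X >= hit.num) from the hypergeometric distribution
p.hyper <- phyper(hit.num - 1, set.num, full - set.num, q.size,
                  lower.tail = FALSE)

# Same counts as a 2x2 table:
# rows = in / not in pathway, columns = changed / unchanged
tab <- matrix(c(hit.num,           q.size - hit.num,
                set.num - hit.num, full - set.num - (q.size - hit.num)),
              nrow = 2)
p.fisher <- fisher.test(tab, alternative = "greater")$p.value

# TRUE when the two p-values agree within numerical tolerance
all.equal(p.hyper, p.fisher)
```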
Visualization Options
Challenges:
• Removal of redundant information
• Visualizing term relationships (term-term, term-protein)
Use REVIGO to filter redundant terms
http://revigo.irb.hr/
Prepare input (term, p-value)
1. Upload to REVIGO
Supek F, Bošnjak M, Škunca N, Šmuc T. "REVIGO summarizes and visualizes long lists of Gene Ontology terms" PLoS ONE 2011. doi:10.1371/journal.pone.0021800
2. Run
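The (term, p-value) input mentioned above can also be prepared in R instead of Excel. This is a minimal sketch, assuming the DAVID export has a `Term` column of the form "GO:0006412~translation" and a `PValue` column; the rows below are made-up placeholders rather than workshop results, and the output file name is arbitrary.

```r
# Made-up rows in the assumed layout of a DAVID chart export
david <- data.frame(
  Term   = c("GO:0006412~translation", "GO:0006457~protein folding",
             "GO:0006096~glycolysis"),
  PValue = c(1.2e-08, 3.4e-05, 0.012),
  stringsAsFactors = FALSE
)

# Keep only the GO accession in front of the "~"
revigo.input <- data.frame(
  term    = sub("~.*$", "", david$Term),
  p.value = david$PValue,
  stringsAsFactors = FALSE
)

# Tab-separated text without header or quotes, ready to paste into REVIGO
out.file <- file.path(tempdir(), "revigo_input.txt")
write.table(revigo.input, file = out.file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```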
REVIGO: overview scatterplot
Position defined on similarity (MDS)
REVIGO: overview table
Cluster leaders prioritized based on enrichment p-value
REVIGO: network
• Edges: 3% of the strongest GO term pairwise similarities
• Node size: generality of term (small = specific)
• Node color: p-value
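To make the edge rule concrete, the sketch below keeps only the strongest 3% of pairwise similarities from a similarity matrix. It does not reproduce REVIGO's own similarity measure or its exact thresholding; the matrix is random, made-up data, and the column names `source`, `target` and `similarity` are chosen only for illustration.

```r
set.seed(1)  # made-up similarities, for illustration only
terms <- paste0("term", 1:20)
sim <- matrix(runif(400), 20, 20, dimnames = list(terms, terms))
sim <- (sim + t(sim)) / 2  # make the matrix symmetric

# One row per unordered term pair (upper triangle, no self-pairs)
idx <- which(upper.tri(sim), arr.ind = TRUE)
edges <- data.frame(source     = terms[idx[, "row"]],
                    target     = terms[idx[, "col"]],
                    similarity = sim[idx],
                    stringsAsFactors = FALSE)

# Keep the pairs at or above the 97th percentile of similarity
cutoff <- quantile(edges$similarity, 0.97)
strong.edges <- edges[edges$similarity >= cutoff, ]
nrow(strong.edges)
```

Thresholding on a percentile rather than a fixed similarity value keeps the network sparse regardless of how similar the terms are overall.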
Download network
Cytoscape
1. Open Cytoscape
Import REVIGO network into Cytoscape
Cytoscape: set layout and defaults
1. Set layout
3. Set network defaults
Cytoscape: map data to network properties
1. Set edge width and color
2. Set node labels, size and color
Cytoscape: overview network components
Download edge information
3. View in Excel
Download node information
3. View in Excel
Bonus: Modify Edge and Node Attributes to show term to protein connections
See files ‘test edge.xlsx’ and ‘test node.xlsx’ for examples of upload formats
See detailed instructions at http://www.slideshare.net/dgrapov/demonstration-of-network-mapping
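As a rough illustration of term-to-protein connections, the sketch below expands a comma-separated protein list per GO term into an edge table and derives a matching node table. It does not reproduce the layout of the example files; the column names (`source`, `target`, `interaction`, `id`, `type`), the `Genes` column, and the term-protein pairings are all assumptions made up for this example.

```r
# Made-up enrichment rows; "Genes" mimics a comma-separated ID list
david <- data.frame(
  Term  = c("GO:0006412~translation", "GO:0006457~protein folding"),
  Genes = c("P12345, Q9Y6K9", "Q9Y6K9, A0A022YWF9, P12345"),
  stringsAsFactors = FALSE
)

# Split each ID list and repeat the term once per protein
proteins <- strsplit(david$Genes, ",\\s*")
edges <- data.frame(
  source      = rep(sub("~.*$", "", david$Term), lengths(proteins)),
  target      = unlist(proteins),
  interaction = "term-protein",
  stringsAsFactors = FALSE
)

# Node table with a type column that can be mapped to node shape or color
nodes <- data.frame(
  id = unique(c(edges$source, edges$target)),
  stringsAsFactors = FALSE
)
nodes$type <- ifelse(grepl("^GO:", nodes$id), "term", "protein")
```

Both tables can be written out with `write.table` or `write.csv` and imported into Cytoscape as edge and node tables.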
See more Statistical and Multivariate Analysis Examples at http://imdevsoftware.wordpress.com/tutorials/
Questions?
This research was supported in part by NIH 1 U24 DK097154