NuGO
the European Nutrigenomics Organisation
Gene Ontology (GO) analysis
Chris Evelo and Lars Eijssen, Maastricht University
AmiGO browser: http://www.godatabase.org/cgi-bin/go.cgi
GO consortium: http://www.geneontology.org
The Gene Ontology (GO) project gives a consistent description of gene products based on information from different databases.
Gene Ontology (GO) levels (I)
Gene product annotation with GO terms
Cellular Component: nucleus; chromosome; DNA topoisomerase complex
Biological Process: DNA replication; DNA topological change; DNA ligation; DNA repair
Molecular Function: chromatin binding; DNA topoisomerase activity; DNA-dependent ATPase activity
Human DNA topoisomerase IIA (P11388)
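The annotation above can be pictured as a small data structure: one gene product, with term names grouped under the three GO ontologies. The sketch below is only an illustration; the dictionary layout and the helper name `terms_in` are assumptions, the split of the cellular component terms is read off the slide text, and GO identifiers are left out because the slide does not list them.

```python
# GO annotation of one gene product, grouped by the three GO ontologies.
# Term names come from the slide; the layout itself is an assumed illustration.
annotation = {
    "gene_product": "Human DNA topoisomerase IIA",
    "accession": "P11388",
    "terms": {
        "cellular_component": [
            "nucleus",
            "chromosome",
            "DNA topoisomerase complex",
        ],
        "biological_process": [
            "DNA replication",
            "DNA topological change",
            "DNA ligation",
            "DNA repair",
        ],
        "molecular_function": [
            "chromatin binding",
            "DNA topoisomerase activity",
            "DNA-dependent ATPase activity",
        ],
    },
}


def terms_in(record, ontology):
    """Return the term names a gene product carries in one ontology."""
    return list(record["terms"].get(ontology, []))


process_terms = terms_in(annotation, "biological_process")
```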
Gene Ontology (GO) levels (II)
Gene Ontology analysis tools
All freely available from the internet:
Onto-Express
GOToolbox
MAPPFinder (GenMAPP)
GOstat
GO-Elite
GeneMerge
GOSurfer
DAVID/EASE
FatiGO
Metacore (not free, NuGO has licences)
GO analysis versus pathway analysis
Biological pathways contain more information; GO classes are just sets of genes that share an annotation
Pathways are generally more curated
GO classes are, however, organised in a tree; biological pathways are (in practice) not
GO classes also cover the space of biological processes more uniformly; pathway analysis depends heavily on the pathways that have been contributed/added
GO also covers cellular localisation and biochemical function
NuGO GenePattern modules
NuGO Quality Control Analysis – Quality control; Bioconductor packages affy, affyPLM, simpleaffy
NuGO Expression File Creator – Normalisation (rma, gcrma, MAS 5.0, dChip); Bioconductor packages rma, gcrma
Limma Analysis – Differential gene expression; Bioconductor package limma
TopGO Analysis – Gene Ontology based functional analysis; Bioconductor package topGO
Get Result for GO – Functional analysis; Bioconductor R script, to filter data for genes associated with one particular GO identifier
Slide from: Caroline Reiff, RRI, Aberdeen
If you use these modules for your publication, please cite: De Groot, P.J., Reiff, C., Mayer, C., Mueller, M. NuGO contributions to GenePattern. Genes & Nutrition 2008; 3:143-156
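The filtering step behind "Get Result for GO" can be sketched in a few lines: keep only those result rows whose probe is annotated with one chosen GO identifier. This is not the module's R script; the function name `rows_for_go`, the probe IDs, the values and the probe-to-GO mapping are all hypothetical, chosen only to show the idea.

```python
def rows_for_go(rows, probe_to_go, go_id):
    """Keep only result rows whose probe ID is annotated with go_id."""
    return [row for row in rows if go_id in probe_to_go.get(row["ID"], set())]


# Hypothetical result rows (not taken from any real experiment).
results = [
    {"ID": "probe_1", "adj.P.Val": 0.0001, "logFC": -3.3},
    {"ID": "probe_2", "adj.P.Val": 0.2, "logFC": 0.1},
    {"ID": "probe_3", "adj.P.Val": 0.003, "logFC": -1.1},
]

# Hypothetical probe-to-GO mapping; a real one would come from the chip annotation.
probe_to_go = {
    "probe_1": {"GO:0006631"},
    "probe_2": {"GO:0006955"},
    "probe_3": {"GO:0006631", "GO:0006118"},
}

fatty_acid_rows = rows_for_go(results, probe_to_go, "GO:0006631")
```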
Slide from: Caroline Reiff, RRI, Aberdeen
TopGO analysis
Runs topGO (Bioconductor)
– Gene enrichment analysis tool, which integrates the knowledge about the relationship between GO terms (BP, MF, CC) for the calculation of statistical significance (Alexa et al., 2006).
2 test statistics
– Fisher's exact test (define threshold, e.g. FDR<0.05)
– Kolmogorov-Smirnov (KS) test (looks at distribution of P values)
3 GO scoring algorithms (classic, elim, weight)
– classic scores each node independently
– elim scores nodes bottom up, scores parent nodes after elimination of genes present in significant child nodes
– weight scores nodes bottom up, assigns weights to genes based on P values obtained for each node
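For a single GO term, the Fisher's exact test mentioned above amounts to asking how likely it is to see at least the observed number of significant genes among the term's annotated genes. The sketch below is a plain-Python illustration of that one-sided test (the hypergeometric upper tail), not the topGO implementation; the function name and the example numbers (7 of 10 annotated genes significant, 20 significant genes in a universe of 100) are assumptions chosen for illustration.

```python
from math import comb


def fisher_enrichment_p(sig_in_term, annotated, sig_total, universe):
    """One-sided Fisher's exact test for over-representation.

    sig_in_term: significant genes annotated to the term
    annotated:   all genes annotated to the term
    sig_total:   all significant genes
    universe:    all genes tested
    Returns P(X >= sig_in_term) under the hypergeometric distribution.
    """
    max_k = min(annotated, sig_total)
    total = comb(universe, sig_total)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, sig_total - k)
        for k in range(sig_in_term, max_k + 1)
    )
    return tail / total


# Illustrative numbers only: 7 of 10 annotated genes are significant,
# while 20 of the 100 genes in the universe are significant overall.
p_value = fisher_enrichment_p(7, 10, 20, 100)
is_enriched = p_value < 0.05
```

With a P value threshold or a multiple-testing correction (such as the FDR cut-off named above) applied on top, the same calculation is repeated for every GO term.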
Scoring the tree (I)
Classic:
[Figure: GO tree with node values 2/20; 2/20 (20/100); 5/10 (7/30); 7/10; 3/25 (11/50); 1/15]
Legend: the first value is for this node only; the value in parentheses is for this node plus its subtree. The values in parentheses are used to score (because the genes in fact belong to that term as well).
Suppose all the bold values are significant. The classic algorithm would return all these processes!
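The "node plus subtree" values can be reproduced by summing counts up the tree. The sketch below assumes a tree layout reconstructed from the arithmetic of the slide values (a root with two children, one leaf under the first child and two under the second) and assumes each gene is annotated directly to only one node, so counts simply add; the node names are hypothetical.

```python
# (significant genes, annotated genes) for genes annotated directly to each node.
# The layout is an assumption inferred from the slide's numbers.
own_counts = {
    "root": (2, 20),
    "A": (5, 10),
    "B": (3, 25),
    "A1": (2, 20),
    "B1": (7, 10),
    "B2": (1, 15),
}
children = {"root": ["A", "B"], "A": ["A1"], "B": ["B1", "B2"]}


def subtree_counts(node):
    """Return (significant, annotated) for a node plus all of its descendants."""
    significant, annotated = own_counts[node]
    for child in children.get(node, []):
        child_sig, child_ann = subtree_counts(child)
        significant += child_sig
        annotated += child_ann
    return significant, annotated


# The classic algorithm scores every node on these totals, independently of the others.
classic_counts = {node: subtree_counts(node) for node in own_counts}
```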
Scoring the tree (II)
However, it would be better to only return the best term in every branch
– Best could mean: the most specific significant one
– This can be achieved by removing genes that are present in significant child leaves from the parent's score
Elim does this:
[Figure: the same GO tree with node values 2/20; 2/20 (20/100); 5/10 (7/30); 7/10; 3/25 (11/50); 1/15; and an additional value (4/40)]
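A minimal sketch of the elimination idea follows: nodes are scored bottom up, and once a node is called significant its genes are withheld from all of its ancestors. It is not the topGO code. The tree layout and node names are assumptions reconstructed from the slide's numbers, genes are synthetic, and the significance rule (at least half of the remaining genes significant) is a hypothetical stand-in for a real statistical test.

```python
def make_genes(prefix, n_sig, n_total):
    """Synthetic genes for one node: gene id -> True if significant."""
    return {f"{prefix}_{i}": i < n_sig for i in range(n_total)}


# Genes annotated directly to each node (assumed layout, synthetic gene ids).
direct = {
    "root": make_genes("root", 2, 20),
    "A": make_genes("A", 5, 10),
    "B": make_genes("B", 3, 25),
    "A1": make_genes("A1", 2, 20),
    "B1": make_genes("B1", 7, 10),
    "B2": make_genes("B2", 1, 15),
}
children = {"root": ["A", "B"], "A": ["A1"], "B": ["B1", "B2"]}


def toy_rule(sig, ann):
    """Hypothetical stand-in for a significance test."""
    return ann > 0 and sig / ann >= 0.5


def elim_scores(node, is_significant, scores):
    """Score node after its descendants; return (subtree genes, eliminated genes)."""
    genes = dict(direct[node])
    eliminated = set()
    for child in children.get(node, []):
        child_genes, child_eliminated = elim_scores(child, is_significant, scores)
        genes.update(child_genes)
        eliminated |= child_eliminated
    remaining = {g: s for g, s in genes.items() if g not in eliminated}
    sig, ann = sum(remaining.values()), len(remaining)
    scores[node] = (sig, ann)
    if is_significant(sig, ann):
        eliminated |= set(remaining)  # ancestors no longer see these genes
    return genes, eliminated


scores = {}
elim_scores("root", toy_rule, scores)
```

Under these assumptions only the 7/10 leaf is called significant, so its parent drops from 11/50 to 4/40, matching the extra value on the slide.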
Scoring the tree (III)
Another option to score branches would be to compute the significance of each leaf just as the classic algorithm does
Thereafter, for every branch the most significant leaf is the one that is reported back
GO_Elite
Compatible with GenMAPP MAPPFinder
Smart algorithm
Done in Python
– Fast
– Runs on Windows and Linux (incl NBX)
Still under development
Collaborative development
GO_Elite
Searches relationships in a hierarchical nature
Identifies the most significant scoring GO term: the one with a higher score than all sibling terms
For sibling terms, if one sibling branch scores higher than the parent and another branch does not, the highest scoring term from the latter sibling branch is also selected for the GO-Elite output, but the parent term is not
TopGO Analysis (GenePattern)
Implements Bioconductor package topGO
Input: Limma results table (renamed to contain characters only) or table obtained from other analysis containing the following 3 columns:
topGO Analysis tests performed in GenePattern:
GenePattern topgo | topgo statistics | topgo algorithm
classic Fisher | Fisher's exact test | classic
classic KS | KS test | classic
elim Fisher | Fisher's exact test | elim
elim KS | KS test | elim
Weight Fisher | Fisher's exact test | weight

ID | Gene Symbol | Unigene | Gene Description | FC | logFC | AveExpr | t | P.Value | adj.P.Val | B
1415670_at | Copg | Mm.258785 | coatomer protein complex, subunit gamma | -1.26127 | -0.33488 | 9.754401 | -3.07062 | 0.007335 | 0.075293 | -2.60979
1415671_at | Atp6v0d1 | Mm.17708 | ATPase, H+ transporting, lysosomal V0 subunit D1 | 1.125095 | 0.170046 | 10.58127 | 2.210002 | 0.042067 | 0.155286 | -4.21964
1415672_at | Golga7 | Mm.196269, Mm.327543 | golgi autoantigen, golgin subfamily a, 7 | -1.04305 | -0.06081 | 10.53922 | -0.75083 | 0.463684 | 0.587357 | -6.12935
1415673_at | Psph | Mm.271784 | phosphoserine phosphatase | -1.08924 | -0.12332 | 7.884192 | -0.67409 | 0.509903 | 0.628664 | -6.18414
Slide from: Caroline Reiff, RRI, Aberdeen
Load limma table
Enter threshold(P value or FDR)
Enter cdf name within quotes and with .db extension
Slide from: Caroline Reiff, RRI, Aberdeen
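The threshold entered in this step decides which genes count as significant for the Fisher-based tests. A rough sketch of that selection is shown below; it is not the GenePattern module itself. The helper name `significant_ids` is an assumption, and the embedded table keeps only four columns of the example Limma rows shown earlier in this transcript.

```python
import csv
import io

# Tab-delimited excerpt of a Limma results table (four example rows, four columns).
LIMMA_TABLE = (
    "ID\tGene Symbol\tP.Value\tadj.P.Val\n"
    "1415670_at\tCopg\t0.007335\t0.075293\n"
    "1415671_at\tAtp6v0d1\t0.042067\t0.155286\n"
    "1415672_at\tGolga7\t0.463684\t0.587357\n"
    "1415673_at\tPsph\t0.509903\t0.628664\n"
)


def significant_ids(table_text, column, threshold):
    """Return the probe IDs whose value in `column` is below `threshold`."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return [row["ID"] for row in reader if float(row[column]) < threshold]


by_p_value = significant_ids(LIMMA_TABLE, "P.Value", 0.05)
by_fdr = significant_ids(LIMMA_TABLE, "adj.P.Val", 0.05)
```

Choosing the raw P value or the adjusted one (FDR) as the column changes the gene list: in this excerpt two probes pass at P < 0.05 and none pass at FDR < 0.05.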
Example results table for elim Fisher test
(top 15 GO biological processes)
GO.ID Term Annotated Significant Expected elim
GO:0006955 immune response 410 158 88.47 2.80E-12
GO:0006118 electron transport 420 134 90.63 3.80E-07
GO:0006631 fatty acid metabolic process 161 66 34.74 1.30E-06
GO:0006260 DNA replication 155 64 33.45 4.30E-06
GO:0019886 antigen processing and presentation of e... 14 11 3.02 8.80E-06
GO:0007067 mitosis 181 64 39.06 1.30E-05
GO:0042552 myelination 27 15 5.83 0.00012
GO:0051301 cell division 263 81 56.75 0.00027
GO:0016064 immunoglobulin mediated immune response 69 28 14.89 0.00027
GO:0019217 regulation of fatty acid metabolic proce... 11 8 2.37 0.00041
GO:0008380 RNA splicing 205 65 44.23 0.00044
GO:0000074 regulation of progression through cell c... 395 113 85.23 0.00051
GO:0045576 mast cell activation 14 9 3.02 0.00069
GO:0006695 cholesterol biosynthetic process 25 13 5.39 0.00079
GO:0006364 rRNA processing 50 21 10.79 0.00091
Slide from: Caroline Reiff, RRI, Aberdeen
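One way to read the results table is to compare the Significant and Expected columns. In the rows shown, Expected divided by Annotated is nearly constant (about 0.216), which suggests that Expected is the annotated count scaled by the overall fraction of significant genes; that reading is inferred from the numbers, not stated on the slide. The sketch below computes both ratios for the first three rows; the function names are assumptions.

```python
# (GO.ID, Term, Annotated, Significant, Expected) copied from the first table rows.
rows = [
    ("GO:0006955", "immune response", 410, 158, 88.47),
    ("GO:0006118", "electron transport", 420, 134, 90.63),
    ("GO:0006631", "fatty acid metabolic process", 161, 66, 34.74),
]


def fold_enrichment(significant, expected):
    """How many times more significant genes were found than expected."""
    return significant / expected


def background_fraction(annotated, expected):
    """Implied overall fraction of significant genes (an inferred quantity)."""
    return expected / annotated


enrichment = {go_id: fold_enrichment(sig, exp) for go_id, _, _, sig, exp in rows}
fractions = [background_fraction(ann, exp) for _, _, ann, _, exp in rows]
```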
TopGO analysis output
A GO graph for each of the 5 tests
(squares = 15 most significant GO IDs)
Slide from: Caroline Reiff, RRI, Aberdeen
Load Limma table
Enter Chip name
Click run
Slide from: Caroline Reiff, RRI, Aberdeen
Get Results For GO
GeneName | GeneSymbol | ID | P.Value | adj.P.Val | logFC
carboxylesterase 3 | Ces3 | 1449081_at | 2.69E-06 | 0.0001273 | -3.311896
peroxisome proliferator activated receptor alpha | Ppara | 1449051_at | 7.44E-06 | 0.000256 | -2.224587
peroxisome proliferator activated receptor alpha | Ppara | 1439675_at | 2.70E-05 | 0.0006372 | -2.17869
4-aminobutyrate aminotransferase | Abat | 1433855_at | 2.63E-05 | 0.000626 | -2.161838
glycerol-3-phosphate acyltransferase, mitochondrial | Gpam | 1419499_at | 1.02E-08 | 2.51E-06 | -2.152756
hydroxyprostaglandin dehydrogenase 15 (NAD) | Hpgd | 1419906_at | 0.0004878 | 0.0054019 | -1.745005
enoyl-Coenzyme A, hydratase/3-hydroxyacyl Coenzyme A dehydrogenase | Ehhadh | 1448382_at | 4.05E-06 | 0.0001673 | -1.556166
cDNA sequence BC018371 | BC018371 | 1425355_at | 0.0036213 | 0.0252179 | -1.499929
acyl-CoA synthetase long-chain family member 3 | Acsl3 | 1428386_at | 0.0003235 | 0.0040049 | -1.483404
acyl-CoA synthetase medium-chain family member 3 | Acsm3 | 1425559_a_at | 0.0014181 | 0.0121822 | -1.355912
phytanoyl-CoA hydroxylase | Phyh | 1460194_at | 5.42E-06 | 0.0002036 | -1.341853
lysophospholipase 3 | Lypla3 | 1423704_at | 7.31E-06 | 0.0002522 | -1.335713
solute carrier family 27 (fatty acid transporter), member 4 | Slc27a4 | 1424441_at | 3.90E-06 | 0.000164 | -1.260517
acetyl-Coenzyme A acyltransferase 2 (mitochondrial 3-oxoacyl-Coenzyme A thiolase) | Acaa2 | 1428145_at | 9.34E-07 | 6.11E-05 | -1.161749
fatty acid desaturase 3 | Fads3 | 1435910_at | 0.0002635 | 0.0034303 | -1.139258
caveolin, caveolae protein 1 | Cav1 | 1449145_a_at | 0.0004282 | 0.0049293 | -1.124646
solute carrier family 27 (fatty acid transporter), member 2 | Slc27a2 | 1416316_at | 1.55E-05 | 0.0004362 | -1.12027
acetyl-Coenzyme A acyltransferase 1A | Acaa1a | 1416946_a_at | 6.93E-07 | 5.03E-05 | -1.102933
acyl-Coenzyme A oxidase 1, palmitoyl | Acox1 | 1416409_at | 9.29E-05 | 0.0015677 | -1.074135
crystallin, lambda 1 | Cryl1 | 1430681_at | 0.0004409 | 0.0050152 | -1.042197
Highly significant FDR plus strong down-regulation
Slide from: Caroline Reiff, RRI, Aberdeen
Example: results FA metabolism
Load GCT, CLS and CHIP files
Click run
Wait for the result
Slide from: Caroline Reiff, RRI, Aberdeen
GenePattern also has a GSEA module
NAME | GENE_SYMBOL | GENE_TITLE | SCORE
1600029D21RIK | null | null | -4.82235
H2-DMA | null | null | -4.2664
PSMB8 | PSMB8 | proteasome (prosome, macropain) subunit, beta type, 8 (large multifunctional peptidase 7) | -4.26033
H2-AB1 | null | null | -4.06198
B3GALT5 | B3GALT5 | UDP-Gal:betaGlcNAc beta 1,3-galactosyltransferase, polypeptide 5 | -4.04865
0610037M15RIK | null | null | -4.03495
IIGP2 | null | null | -3.98382
LY6A | null | null | -3.97938
PSMB9 | PSMB9 | proteasome (prosome, macropain) subunit, beta type, 9 (large multifunctional peptidase 2) | -3.88622
H2-EB1 | null | null | -3.69099
EIF4E3 | EIF4E3 | eukaryotic translation initiation factor 4E member 3 | -3.68941
TLR1 | TLR1 | toll-like receptor 1 | -3.64236
CD74 | CD74 | CD74 molecule, major histocompatibility complex, class II invariant chain | -3.61186
FUT2 | FUT2 | fucosyltransferase 2 (secretor status included) | -3.58954
GDA | GDA | guanine deaminase | -3.57173
Enrichment in phenotype: C (6 samples)
– 64 / 162 gene sets are upregulated in phenotype C
– 1 gene sets are significant at FDR < 25%
– 1 gene sets are significantly enriched at nominal pvalue < 1%
– 5 gene sets are significantly enriched at nominal pvalue < 5%
– Snapshot of enrichment results
– Detailed enrichment results in html format
– Detailed enrichment results in excel format (tab delimited text)
– Guide to interpret results
Enrichment in phenotype: ILko (6 samples)
– 98 / 162 gene sets are upregulated in phenotype ILko
– 0 gene sets are significantly enriched at FDR < 25%
– 3 gene sets are significantly enriched at nominal pvalue < 1%
– 13 gene sets are significantly enriched at nominal pvalue < 5%
– Snapshot of enrichment results
– Detailed enrichment results in html format
– Detailed enrichment results in excel format (tab delimited text)
– Guide to interpret results
Slide from: Caroline Reiff, RRI, Aberdeen
Example results of GSEA
What to use for gene sets
You can use whatever you like… (meaning one GSEA is not the same as another even if it uses the same statistics)
Gene sets from WikiPathways pathways
PathVisio now also has GSEA
Metabolite sets (you could do MSEA…)
Could even use GO classes
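Whatever the source of the gene set (a pathway, a metabolite set or a GO class), the core of a GSEA-style score is a running sum over the ranked gene list. The sketch below implements only the unweighted, Kolmogorov-Smirnov-like version of that running sum, which is a simplification and not the weighted statistic or the permutation testing of the GenePattern GSEA module; the gene names and the two gene sets are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    Walk down the ranked list, stepping up for each gene in the set and
    down for each gene outside it; return the largest deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, extreme = 0.0, 0.0
    for hit in hits:
        running += 1.0 / n_hit if hit else -1.0 / n_miss
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


# Hypothetical ranked list (most up-regulated first) and two hypothetical gene sets.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
top_set = {"g1", "g2", "g3"}      # e.g. the genes of one GO class
bottom_set = {"g6", "g7", "g8"}   # e.g. the genes of one pathway

es_top = enrichment_score(ranked, top_set)
es_bottom = enrichment_score(ranked, bottom_set)
```

A set concentrated at the top of the list scores close to +1 and one concentrated at the bottom close to -1, whichever kind of gene set it is.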
Acknowledgements
Caroline Reiff
Philip De Groot
Sarah Wieland
Kenneth Strouts
Claus Mayer
Tony Travis
NuGO
Nathan Salomonis
Stan Gaj
Lars Eijssen