
NuGO

the European Nutrigenomics Organisation

Gene Ontology (GO) analysis

Chris Evelo and Lars Eijssen, Maastricht University

AmiGO browser: http://www.godatabase.org/cgi-bin/go.cgi

GO consortium: http://www.geneontology.org

The Gene Ontology (GO) project gives a consistent description of gene products based on information from different databases.

Gene Ontology (GO) levels (I)

Gene product annotation with GO terms

Example: Human DNA topoisomerase IIA (P11388)

Cellular Component
– nucleus
– chromosome
– DNA topoisomerase complex

Biological Process
– DNA replication
– DNA topological change
– DNA ligation
– DNA repair

Molecular Function
– chromatin binding
– DNA topoisomerase activity
– DNA-dependent ATPase activity
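The annotation above can be pictured as a small record with one list of terms per ontology. The sketch below is only an illustration: the term names come from the slide, while the dictionary layout and the helper name `count_terms` are assumptions, not a GO file format.

```python
# Illustrative record for one gene product; the layout is an assumption,
# the term names are the ones shown on the slide.
annotation = {
    "gene_product": "Human DNA topoisomerase IIA",
    "accession": "P11388",
    "cellular_component": ["nucleus", "chromosome", "DNA topoisomerase complex"],
    "biological_process": ["DNA replication", "DNA topological change",
                           "DNA ligation", "DNA repair"],
    "molecular_function": ["chromatin binding", "DNA topoisomerase activity",
                           "DNA-dependent ATPase activity"],
}

ONTOLOGIES = ("cellular_component", "biological_process", "molecular_function")

def count_terms(record):
    """Return the number of GO terms per ontology for one gene product."""
    return {ontology: len(record[ontology]) for ontology in ONTOLOGIES}

for ontology in ONTOLOGIES:
    print(ontology, "->", "; ".join(annotation[ontology]))
```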

Gene Ontology (GO) levels (II)

Gene Ontology analysis tools

Onto-Express, GOToolbox, MAPPFinder (GenMAPP), GOstat, GO-Elite, GeneMerge, GOSurfer, DAVID/EASE, FatiGO (all freely available from the internet)

Metacore (not free, NuGO has licences)

GO analysis versus pathway analysis

Biological pathways contain more information; GO classes are just sets of genes that share an annotation

Pathways are generally more curated

GO classes are, however, organised in a tree; biological pathways are (in practice) not

GO classes also cover the space of biological processes more uniformly; pathway analysis depends heavily on the pathways that have been contributed/added

GO also covers cellular localisation and biochemical function

NuGO GenePattern modules

NuGO Quality Control Analysis
– Quality control; Bioconductor packages affy, affyPLM, simpleaffy

NuGO Expression File Creator
– Normalisation (rma, gcrma, MAS 5.0, dChip); Bioconductor packages rma, gcrma

Limma Analysis
– Differential gene expression; Bioconductor package limma

TopGO Analysis
– Gene Ontology based functional analysis; Bioconductor package topGO

Get Result for GO
– Functional analysis; Bioconductor R script, to filter data for genes associated with one particular GO identifier

Slide from: Caroline Reiff, RRI, Aberdeen

If you use these modules for your publication, please cite: De Groot, P.J., Reiff, C., Mayer, C., Mueller, M. NuGO contributions to GenePattern. Genes Nutrition 2008; 3:143-156

TopGO analysis

Runs topGO (Bioconductor)
– Gene enrichment analysis tool, which integrates the knowledge about the relationship between GO terms (BP, MF, CC) for the calculation of statistical significance (Alexa et al., 2006).

2 test statistics
– Fisher's exact test (define a threshold, e.g. FDR<0.05)
– Kolmogorov-Smirnov (KS) test (looks at the distribution of P values)

3 GO scoring algorithms (classic, elim, weight)
– classic scores each node independently
– elim scores nodes bottom up, scores parent nodes after elimination of genes present in a significant child node
– weight scores nodes bottom up, assigns weights to genes based on P values obtained for each node
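To make the Fisher statistic concrete, here is a minimal sketch of a one-sided enrichment test for a single GO term, written with the Python standard library only. It is not topGO code; the function name `fisher_enrichment_p` and the example counts (7 significant genes among 10 annotated, 20 significant genes among 100 in total) are assumptions chosen for illustration.

```python
from math import comb

def fisher_enrichment_p(sig_in_term, annotated, sig_total, gene_total):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= sig_in_term), where X is hypergeometric: the number of
    significant genes among `annotated` genes drawn from `gene_total` genes,
    of which `sig_total` are significant.
    """
    upper = min(annotated, sig_total)
    denominator = comb(gene_total, sig_total)
    tail = sum(
        comb(annotated, k) * comb(gene_total - annotated, sig_total - k)
        for k in range(sig_in_term, upper + 1)
    )
    return tail / denominator

# Hypothetical term: 10 genes annotated, 7 of them on the significant list.
p_value = fisher_enrichment_p(sig_in_term=7, annotated=10, sig_total=20, gene_total=100)
print(f"p = {p_value:.3g}")
```

The KS test takes a different route: it needs no threshold, because it compares the whole distribution of gene scores inside the term with the rest.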

Scoring the tree (I)

Classic:

2/20

2/20 (20/100)

5/10 (7/30)

7/10

3/25 (11/50)

1/15

Legend: the first value is for this node alone; the value in parentheses is for this node plus its subtree. The subtree values are used to score (because the genes in fact belong to that term as well)!

Suppose all the bold values are significant. The classic algorithm would return all these processes!
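The subtree values can be reproduced by adding each node's own counts to those of its descendants. The sketch below assumes a tree shape that is consistent with the numbers on the slide (the original figure layout is not available here) and assumes that no gene is annotated to more than one node; the node names are made up.

```python
# Assumed tree shape; node names are hypothetical labels.
CHILDREN = {
    "root": ["A", "B"],
    "A": ["A1"],
    "B": ["B1", "B2"],
    "A1": [], "B1": [], "B2": [],
}

# (significant, annotated) counts for each node on its own, taken from the slide.
OWN = {
    "root": (2, 20),
    "A": (5, 10), "A1": (2, 20),
    "B": (3, 25), "B1": (7, 10), "B2": (1, 15),
}

def subtree_counts(node):
    """Return (significant, annotated) for the node plus all its descendants."""
    significant, annotated = OWN[node]
    for child in CHILDREN[node]:
        child_sig, child_ann = subtree_counts(child)
        significant += child_sig
        annotated += child_ann
    return significant, annotated

for node in CHILDREN:
    sig, ann = subtree_counts(node)
    print(f"{node}: own {OWN[node][0]}/{OWN[node][1]}, with subtree {sig}/{ann}")
```

The classic algorithm then tests every node on its subtree counts, independently of all other nodes.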

Scoring the tree (II)

However, it would be better to only return the best term in every branch
– Best could mean: the most specific significant one
– This can be achieved by removing genes that are present in significant child nodes from the parent's score

Elim does this:

2/20

2/20 (20/100)

5/10 (7/30)

7/10

3/25 (11/50)

1/15

(4/40)
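A rough sketch of the elim idea follows: nodes are visited bottom up, and once a node is significant its genes are removed from all of its ancestors before they are tested. This is an illustration, not the topGO implementation. The tree shape, the synthetic gene names, the cut-off of 0.01 and the choice to keep the totals (20 significant out of 100 genes) fixed are all assumptions.

```python
from math import comb

# Assumed tree and per-node (significant, annotated) counts, consistent with the slide.
CHILDREN = {"root": ["A", "B"], "A": ["A1"], "B": ["B1", "B2"],
            "A1": [], "B1": [], "B2": []}
OWN = {"root": (2, 20), "A": (5, 10), "A1": (2, 20),
       "B": (3, 25), "B1": (7, 10), "B2": (1, 15)}

# Synthetic gene identifiers: the first `sig` genes of each node are "significant".
ANNOTATED = {n: {f"{n}_g{i}" for i in range(ann)} for n, (sig, ann) in OWN.items()}
SIGNIFICANT = {f"{n}_g{i}" for n, (sig, ann) in OWN.items() for i in range(sig)}
GENE_TOTAL = sum(ann for _, ann in OWN.values())
SIG_TOTAL = len(SIGNIFICANT)

def fisher_p(k, annotated, sig_total, gene_total):
    """One-sided hypergeometric tail P(X >= k)."""
    tail = sum(comb(annotated, i) * comb(gene_total - annotated, sig_total - i)
               for i in range(k, min(annotated, sig_total) + 1))
    return tail / comb(gene_total, sig_total)

def elim(node, alpha, results):
    """Post-order walk. Returns (all genes in subtree, genes eliminated so far)."""
    genes = set(ANNOTATED[node])
    removed = set()
    for child in CHILDREN[node]:
        child_genes, child_removed = elim(child, alpha, results)
        genes |= child_genes
        removed |= child_removed
    kept = genes - removed                 # drop genes of significant descendants
    sig = len(kept & SIGNIFICANT)
    p = fisher_p(sig, len(kept), SIG_TOTAL, GENE_TOTAL)
    results[node] = (sig, len(kept), p)
    if p < alpha:                          # a significant node claims its genes
        removed |= kept
    return genes, removed

results = {}
elim("root", 0.01, results)
for node, (sig, ann, p) in results.items():
    print(f"{node}: {sig}/{ann}  p={p:.3g}")
```

With these assumptions only the 7/10 node is significant, so its parent is scored on 4/40 instead of 11/50, which matches the value shown on the slide.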

Scoring the tree (III)

Another option to score branches would be to compute the significance of each leaf just as the classic algorithm does

Afterwards, for every branch, the most significant leaf is the one that is reported back

GO-Elite

Compatible with GenMAPP MAPPFinder

Smart algorithm

Done in Python
– Fast
– Runs on Windows and Linux (incl. NBX)

Still under development

Collaborative development

GO-Elite

Searches relationships in a hierarchical nature

Identifies most significant scoring GO term:

with higher score than all sibling terms

For sibling terms, if one sibling branch scores higher than the parent and another branch does not, the highest scoring term from the latter sibling branch is also selected for the GO-Elite output, but the parent term is not

TopGO Analysis (GenePattern)

implements Bioconductor package topGO

Input: Limma results table (renamed to contain characters only) or table obtained from other analysis containing the following 3 columns:

topGO Analysis tests performed in GenePattern:

GenePattern topGO | topGO statistics | topGO algorithm
classic Fisher | Fisher's exact test | classic
classic KS | KS test | classic
elim Fisher | Fisher's exact test | elim
elim KS | KS test | elim
Weight Fisher | Fisher's exact test | weight

ID | Gene Symbol | Unigene | Gene Description | FC | logFC | AveExpr | t | P.Value | adj.P.Val | B
1415670_at | Copg | Mm.258785 | coatomer protein complex, subunit gamma | -1.26127 | -0.33488 | 9.754401 | -3.07062 | 0.007335 | 0.075293 | -2.60979
1415671_at | Atp6v0d1 | Mm.17708 | ATPase, H+ transporting, lysosomal V0 subunit D1 | 1.125095 | 0.170046 | 10.58127 | 2.210002 | 0.042067 | 0.155286 | -4.21964
1415672_at | Golga7 | Mm.196269, Mm.327543 | golgi autoantigen, golgin subfamily a, 7 | -1.04305 | -0.06081 | 10.53922 | -0.75083 | 0.463684 | 0.587357 | -6.12935
1415673_at | Psph | Mm.271784 | phosphoserine phosphatase | -1.08924 | -0.12332 | 7.884192 | -0.67409 | 0.509903 | 0.628664 | -6.18414



Load limma table

Enter threshold(P value or FDR)

Enter cdf name within quotes and with .db extension
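The threshold entered here decides which probes count as significant for the Fisher-based tests. The sketch below shows that selection step in isolation, on four rows from the example limma table reduced to the columns needed. It is not the GenePattern module; the function name `select_probes` and the tab-separated layout are assumptions.

```python
import csv
import io

# Four rows from the example limma table, reduced to the columns used here.
LIMMA_TSV = (
    "ID\tGene Symbol\tP.Value\tadj.P.Val\n"
    "1415670_at\tCopg\t0.007335\t0.075293\n"
    "1415671_at\tAtp6v0d1\t0.042067\t0.155286\n"
    "1415672_at\tGolga7\t0.463684\t0.587357\n"
    "1415673_at\tPsph\t0.509903\t0.628664\n"
)

def select_probes(tsv_text, column, threshold):
    """Return the probe IDs whose value in `column` is below `threshold`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["ID"] for row in reader if float(row[column]) < threshold]

by_p_value = select_probes(LIMMA_TSV, "P.Value", 0.05)
by_fdr = select_probes(LIMMA_TSV, "adj.P.Val", 0.05)
print("P value < 0.05:", by_p_value)
print("FDR < 0.05:", by_fdr)
```

On these four rows the choice of column matters: two probes pass on the raw P value, none pass on the adjusted one.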


Example results table for elim Fisher test (top 15 GO biological processes)

GO.ID Term Annotated Significant Expected elim

GO:0006955 immune response 410 158 88.47 2.80E-12

GO:0006118 electron transport 420 134 90.63 3.80E-07

GO:0006631 fatty acid metabolic process 161 66 34.74 1.30E-06

GO:0006260 DNA replication 155 64 33.45 4.30E-06

GO:0019886 antigen processing and presentation of e... 14 11 3.02 8.80E-06

GO:0007067 mitosis 181 64 39.06 1.30E-05

GO:0042552 myelination 27 15 5.83 0.00012

GO:0051301 cell division 263 81 56.75 0.00027

GO:0016064 immunoglobulin mediated immune response 69 28 14.89 0.00027

GO:0019217 regulation of fatty acid metabolic proce... 11 8 2.37 0.00041

GO:0008380 RNA splicing 205 65 44.23 0.00044

GO:0000074 regulation of progression through cell c... 395 113 85.23 0.00051

GO:0045576 mast cell activation 14 9 3.02 0.00069

GO:0006695 cholesterol biosynthetic process 25 13 5.39 0.00079

GO:0006364 rRNA processing 50 21 10.79 0.00091
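One way to read the Expected column, stated here as an interpretation rather than something the slide says, is Annotated multiplied by the overall fraction of significant genes. The sketch below checks that reading against a few rows of the table: Expected divided by Annotated should then be about the same in every row, and Significant divided by Expected gives a simple fold enrichment.

```python
# (GO.ID, Annotated, Significant, Expected) copied from rows of the table above.
ROWS = [
    ("GO:0006955", 410, 158, 88.47),
    ("GO:0006118", 420, 134, 90.63),
    ("GO:0006631", 161, 66, 34.74),
    ("GO:0006260", 155, 64, 33.45),
    ("GO:0007067", 181, 64, 39.06),
    ("GO:0006364", 50, 21, 10.79),
]

# Fraction of significant genes implied by each row.
ratios = [expected / annotated for _, annotated, _, expected in ROWS]
# Observed over expected number of significant genes per term.
fold_enrichment = {go_id: significant / expected
                   for go_id, _, significant, expected in ROWS}

for (go_id, annotated, significant, expected), ratio in zip(ROWS, ratios):
    print(f"{go_id}: Expected/Annotated = {ratio:.4f}, "
          f"fold enrichment = {fold_enrichment[go_id]:.2f}")
```

The ratio comes out close to 0.216 for each of these rows, which is consistent with a single background fraction of significant genes.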


TopGO analysis output

A GO graph for each of the 5 tests (squares = 15 most significant GO IDs)



Load Limma table

Enter Chip name

Click run


Get Results For GO

GeneName | GeneSymbol | ID | P.Value | adj.P.Val | logFC
carboxylesterase 3 | Ces3 | 1449081_at | 2.69E-06 | 0.0001273 | -3.311896
peroxisome proliferator activated receptor alpha | Ppara | 1449051_at | 7.44E-06 | 0.000256 | -2.224587
peroxisome proliferator activated receptor alpha | Ppara | 1439675_at | 2.70E-05 | 0.0006372 | -2.17869
4-aminobutyrate aminotransferase | Abat | 1433855_at | 2.63E-05 | 0.000626 | -2.161838
glycerol-3-phosphate acyltransferase, mitochondrial | Gpam | 1419499_at | 1.02E-08 | 2.51E-06 | -2.152756
hydroxyprostaglandin dehydrogenase 15 (NAD) | Hpgd | 1419906_at | 0.0004878 | 0.0054019 | -1.745005
enoyl-Coenzyme A, hydratase/3-hydroxyacyl Coenzyme A dehydrogenase | Ehhadh | 1448382_at | 4.05E-06 | 0.0001673 | -1.556166
cDNA sequence BC018371 | BC018371 | 1425355_at | 0.0036213 | 0.0252179 | -1.499929
acyl-CoA synthetase long-chain family member 3 | Acsl3 | 1428386_at | 0.0003235 | 0.0040049 | -1.483404
acyl-CoA synthetase medium-chain family member 3 | Acsm3 | 1425559_a_at | 0.0014181 | 0.0121822 | -1.355912
phytanoyl-CoA hydroxylase | Phyh | 1460194_at | 5.42E-06 | 0.0002036 | -1.341853
lysophospholipase 3 | Lypla3 | 1423704_at | 7.31E-06 | 0.0002522 | -1.335713
solute carrier family 27 (fatty acid transporter), member 4 | Slc27a4 | 1424441_at | 3.90E-06 | 0.000164 | -1.260517
acetyl-Coenzyme A acyltransferase 2 (mitochondrial 3-oxoacyl-Coenzyme A thiolase) | Acaa2 | 1428145_at | 9.34E-07 | 6.11E-05 | -1.161749
fatty acid desaturase 3 | Fads3 | 1435910_at | 0.0002635 | 0.0034303 | -1.139258
caveolin, caveolae protein 1 | Cav1 | 1449145_a_at | 0.0004282 | 0.0049293 | -1.124646
solute carrier family 27 (fatty acid transporter), member 2 | Slc27a2 | 1416316_at | 1.55E-05 | 0.0004362 | -1.12027
acetyl-Coenzyme A acyltransferase 1A | Acaa1a | 1416946_a_at | 6.93E-07 | 5.03E-05 | -1.102933
acyl-Coenzyme A oxidase 1, palmitoyl | Acox1 | 1416409_at | 9.29E-05 | 0.0015677 | -1.074135
crystallin, lambda 1 | Cryl1 | 1430681_at | 0.0004409 | 0.0050152 | -1.042197

Highly significant FDR plus strong down-regulation
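The filtering step behind a table like this can be sketched as a lookup: keep the result rows whose gene is annotated to the chosen GO identifier, then sort by logFC. The sketch below is not the module's R script. The gene-to-GO mapping `GO_TO_GENES` is a hypothetical stand-in for a real annotation package, and only a handful of rows from the slides are used.

```python
# (GeneSymbol, ID, adj.P.Val, logFC) rows taken from the tables on the slides.
LIMMA_ROWS = [
    ("Acox1", "1416409_at", 0.0015677, -1.074135),
    ("Copg", "1415670_at", 0.075293, -0.33488),
    ("Ppara", "1449051_at", 0.000256, -2.224587),
    ("Psph", "1415673_at", 0.628664, -0.12332),
    ("Gpam", "1419499_at", 2.51e-06, -2.152756),
]

# Hypothetical annotation lookup; a real run would take this from the chip annotation.
GO_TO_GENES = {"GO:0006631": {"Ppara", "Gpam", "Acox1"}}

def results_for_go(rows, go_id, go_to_genes):
    """Return rows for genes annotated to `go_id`, strongest down-regulation first."""
    members = go_to_genes.get(go_id, set())
    selected = [row for row in rows if row[0] in members]
    return sorted(selected, key=lambda row: row[3])

fa_rows = results_for_go(LIMMA_ROWS, "GO:0006631", GO_TO_GENES)
for symbol, probe, fdr, log_fc in fa_rows:
    print(f"{symbol}\t{probe}\t{fdr:.3g}\t{log_fc:.3f}")
```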


Example: results FA metabolism

Load GCT, CLS and CHIP files

Click run

Wait for the result


GenePattern also has a GSEA module

NAME | GENE_SYMBOL | GENE_TITLE | SCORE
1600029D21RIK | null | null | -4.82235
H2-DMA | null | null | -4.2664
PSMB8 | PSMB8 | proteasome (prosome, macropain) subunit, beta type, 8 (large multifunctional peptidase 7) | -4.26033
H2-AB1 | null | null | -4.06198
B3GALT5 | B3GALT5 | UDP-Gal:betaGlcNAc beta 1,3-galactosyltransferase, polypeptide 5 | -4.04865
0610037M15RIK | null | null | -4.03495
IIGP2 | null | null | -3.98382
LY6A | null | null | -3.97938
PSMB9 | PSMB9 | proteasome (prosome, macropain) subunit, beta type, 9 (large multifunctional peptidase 2) | -3.88622
H2-EB1 | null | null | -3.69099
EIF4E3 | EIF4E3 | eukaryotic translation initiation factor 4E member 3 | -3.68941
TLR1 | TLR1 | toll-like receptor 1 | -3.64236
CD74 | CD74 | CD74 molecule, major histocompatibility complex, class II invariant chain | -3.61186
FUT2 | FUT2 | fucosyltransferase 2 (secretor status included) | -3.58954
GDA | GDA | guanine deaminase | -3.57173

Enrichment in phenotype: C (6 samples)
– 64 / 162 gene sets are upregulated in phenotype C
– 1 gene sets are significant at FDR < 25%
– 1 gene sets are significantly enriched at nominal pvalue < 1%
– 5 gene sets are significantly enriched at nominal pvalue < 5%
– Snapshot of enrichment results
– Detailed enrichment results in html format
– Detailed enrichment results in excel format (tab delimited text)
– Guide to interpret results

Enrichment in phenotype: ILko (6 samples)
– 98 / 162 gene sets are upregulated in phenotype ILko
– 0 gene sets are significantly enriched at FDR < 25%
– 3 gene sets are significantly enriched at nominal pvalue < 1%
– 13 gene sets are significantly enriched at nominal pvalue < 5%
– Snapshot of enrichment results
– Detailed enrichment results in html format
– Detailed enrichment results in excel format (tab delimited text)
– Guide to interpret results


Example results of GSEA

http://www.broad.mit.edu/gsea/doc/GSEAUserGuideFrame.html?_Interpreting_GSEA_Results
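For readers new to GSEA, the core of the method is a running sum over the ranked gene list. The sketch below is a simplified, unweighted version (every hit counts equally), which is not necessarily what the GenePattern module computes by default. The ranked genes are the first six from the example list; the gene set membership is made up.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    Walk down the ranked list, step up for genes in the set and down for
    genes outside it; return the largest deviation from zero (signed).
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_misses = len(ranked_genes) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_misses
        if abs(running) > abs(best):
            best = running
    return best

# First six genes of the ranked list above; the set itself is hypothetical.
ranked = ["1600029D21RIK", "H2-DMA", "PSMB8", "H2-AB1", "B3GALT5", "0610037M15RIK"]
hypothetical_set = {"H2-DMA", "PSMB8", "H2-AB1"}
score = enrichment_score(ranked, hypothetical_set)
print(f"enrichment score = {score:.3f}")
```

Significance is then usually judged by recomputing the score on permuted data, which this sketch leaves out.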

What to use for gene sets

You can use whatever you like… (meaning one GSEA is not the same as another even if it uses the same statistics)

Gene sets from WikiPathways pathways

PathVisio now also has GSEA

Metabolite sets (you could do MSEA…)

Could even use GO classes
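As an illustration of the last point, a GO class can be written out as a gene set in the tab-delimited GMT layout that GSEA reads (set name, description, then the genes). The sketch below assumes that layout; the membership listed for the GO class is a hypothetical subset, not a real annotation export.

```python
def to_gmt(gene_sets):
    """Format gene sets as GMT text: name, description, then genes, tab separated."""
    lines = []
    for name, (description, genes) in gene_sets.items():
        lines.append("\t".join([name, description] + sorted(genes)))
    return "\n".join(lines) + "\n"

# Hypothetical membership for one GO class, for illustration only.
go_gene_sets = {
    "GO:0006631": ("fatty acid metabolic process", {"Ppara", "Acox1", "Gpam"}),
}

gmt_text = to_gmt(go_gene_sets)
print(gmt_text, end="")
```

Sorting the genes only keeps the output reproducible; the order of genes within a set carries no meaning.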

Acknowledgements

Caroline Reiff, Philip De Groot

Sarah Wieland, Kenneth Strouts

Claus Mayer, Tony Travis

NuGO

Nathan Salomonis, Stan Gaj

Lars Eijssen
