introduction to the gene ontology and go annotation resources

EBI is an Outstation of the European Molecular Biology Laboratory. Introduction to the Gene Ontology and GO annotation resources Rachael Huntley UniProtKB-GOA Curator EBI


Post on 11-Jan-2016



Page 1: Introduction to the Gene Ontology and GO annotation resources

EBI is an Outstation of the European Molecular Biology Laboratory.

Introduction to the Gene Ontology and GO annotation resources

Rachael Huntley

UniProtKB-GOA Curator

EBI

Page 2: Introduction to the Gene Ontology and GO annotation resources

Talk Overview

• Intro to GO and GO terms

• Annotating to GO

• Accessing GO annotations

• Practical use of GO

• GO analysis tools

• Precautions

Page 3: Introduction to the Gene Ontology and GO annotation resources


Reasons for the Gene Ontology

www.geneontology.org

• Inconsistency in naming of biological concepts

• Large datasets need to be interpreted quickly

• Increasing amounts of biological data available

Page 4: Introduction to the Gene Ontology and GO annotation resources

Increasing amounts of biological data available

• Expansion of sequence information

• A search on ‘DNA repair’ returns over 60,000 results

Page 5: Introduction to the Gene Ontology and GO annotation resources


Reasons for the Gene Ontology

• Inconsistency in naming of biological concepts

• Large datasets need to be interpreted quickly

• Increasing amounts of biological data available

Page 6: Introduction to the Gene Ontology and GO annotation resources

Large datasets need to be interpreted quickly

• Need to organise the data, analyse it and share it in a timely manner to benefit other researchers

• Inconsistencies in analyses make cross-species or cross-database comparison difficult

Page 7: Introduction to the Gene Ontology and GO annotation resources


Reasons for the Gene Ontology

• Inconsistency in naming of biological concepts

• Large datasets need to be interpreted quickly

• Increasing amounts of biological data available

Page 8: Introduction to the Gene Ontology and GO annotation resources

Inconsistency in naming of biological concepts

English is not a very precise language

• Same name for different concepts

• Different names for the same concept

An example …

“Tactition”, “tactile sense” and “taction” all name the same concept, which GO records as one term:

Sensory perception of touch ; GO:0050975
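The naming example above can be sketched as a small lookup, assuming a hand-built synonym table. The three synonyms and the GO identifier come from the slide; the dictionary and the `normalise` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical lookup table: the synonyms and the GO ID come from the slide,
# the dictionary itself is only an illustration.
SYNONYM_TO_GO = {
    "tactition": "GO:0050975",
    "tactile sense": "GO:0050975",
    "taction": "GO:0050975",
}
GO_NAMES = {"GO:0050975": "sensory perception of touch"}


def normalise(label):
    """Return (GO ID, term name) for a free-text label, or None if unknown."""
    go_id = SYNONYM_TO_GO.get(label.strip().lower())
    if go_id is None:
        return None
    return go_id, GO_NAMES[go_id]


for label in ["Tactition", "tactile sense", "Taction"]:
    print(label, "->", normalise(label))
```

Three different names resolve to one identifier, which is what makes the term usable in a computation.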

Page 9: Introduction to the Gene Ontology and GO annotation resources

The Gene Ontology

• A way to capture biological knowledge in a written and computable form

• A set of concepts and their relationships to each other, arranged as a hierarchy

[Figure: GO hierarchy shown in QuickGO (www.ebi.ac.uk/QuickGO), running from less specific concepts to more specific concepts]

Page 10: Introduction to the Gene Ontology and GO annotation resources

The Concepts in GO

1. Molecular Function

An elemental activity or task or job

• protein kinase activity

• insulin receptor activity

2. Biological Process

A commonly recognised series of events

• cell division

3. Cellular Component

Where a gene product is located

• mitochondrion

• mitochondrial matrix

• mitochondrial inner membrane

Page 11: Introduction to the Gene Ontology and GO annotation resources

Anatomy of a GO term

• Unique identifier

• Term name

• Definition

• Synonyms

• Cross-references

Page 12: Introduction to the Gene Ontology and GO annotation resources

Ontology structure

• Directed acyclic graph

- Terms can have more than one parent

• Terms are linked by relationships

- is_a

- part_of

- regulates

- positively regulates

- negatively regulates

www.ebi.ac.uk/QuickGO
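A minimal sketch of the graph structure described on this slide, assuming a toy ontology. The term names and edges below are hypothetical placeholders rather than real GO terms; only the relationship names `is_a` and `part_of` come from the slide.

```python
# Toy graph for illustration only; the term names and edges are hypothetical
# and are not copied from the real ontology.
# Each child maps to a list of (relationship, parent) pairs.
EDGES = {
    "term_D": [("is_a", "term_B"), ("part_of", "term_C")],  # two parents
    "term_B": [("is_a", "term_A")],
    "term_C": [("is_a", "term_A")],
    "term_A": [],  # root of the toy graph
}


def ancestors(term, edges):
    """Collect every term reachable by following parent links upwards."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in edges.get(current, []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


print(sorted(ancestors("term_D", EDGES)))
```

Because a term may have several parents, the traversal keeps a set of visited terms so that a shared ancestor is reported only once.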

Page 13: Introduction to the Gene Ontology and GO annotation resources

Searching for GO terms

http://amigo.geneontology.org/cgi-bin/amigo/go.cgi

http://www.ebi.ac.uk/QuickGO

… there are more browsers available on the GO Consortium Tools page:

http://www.geneontology.org/GO.tools.shtml

The latest OBO format Gene Ontology file can be downloaded from:

http://www.geneontology.org/ontology/gene_ontology.obo
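The OBO format file mentioned on this slide is plain text made of `[Term]` stanzas with `tag: value` lines. The sketch below reads such stanzas, assuming only that basic layout. The sample stanza is abbreviated and illustrative (the synonym scopes in particular are assumed), and `parse_obo_terms` is a hypothetical helper, not part of any GO tool.

```python
from collections import defaultdict


def parse_obo_terms(lines):
    """Yield one dict per [Term] stanza, mapping each tag to a list of values."""
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):  # a new stanza begins
            if current is not None:
                yield dict(current)
            # Only [Term] stanzas are collected; others (e.g. [Typedef]) are skipped.
            current = defaultdict(list) if line == "[Term]" else None
        elif current is not None and ":" in line:
            tag, _, value = line.partition(":")
            current[tag.strip()].append(value.strip())
    if current is not None:
        yield dict(current)


# Abbreviated, illustrative sample; not a verbatim copy of the real file.
SAMPLE = """\
format-version: 1.2

[Term]
id: GO:0050975
name: sensory perception of touch
namespace: biological_process
synonym: "tactition" EXACT []
synonym: "taction" EXACT []

[Typedef]
id: part_of
name: part of
"""

terms = list(parse_obo_terms(SAMPLE.splitlines()))
for term in terms:
    print(term["id"][0], term["name"][0], len(term.get("synonym", [])))
```

To read a downloaded file, the same function can be given an open file handle instead of `SAMPLE.splitlines()`.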

Page 14: Introduction to the Gene Ontology and GO annotation resources

The EBI's QuickGO browser

http://www.ebi.ac.uk/QuickGO

• Search GO terms or proteins

Page 15: Introduction to the Gene Ontology and GO annotation resources


GO Annotation

Page 16: Introduction to the Gene Ontology and GO annotation resources


Reactome

http://www.geneontology.org

Page 17: Introduction to the Gene Ontology and GO annotation resources

Aims of the GO project

• Compile the ontologies

- currently over 34,000 terms

- constantly increasing and improving

• Annotate gene products using ontology terms

- around 30 groups provide annotations

• Provide a public resource of data and tools

- regular releases of annotations

- tools for browsing/querying annotations and editing the ontology

Page 18: Introduction to the Gene Ontology and GO annotation resources


Gene Ontology Annotation (UniProtKB-GOA) database at the EBI

• Largest open-source contributor of annotations to GO

• Member of the GO Consortium since 2001

• Provides annotation for more than 360,000 species

• UniProtKB-GOA’s priority is to annotate the human proteome

• UniProtKB-GOA is responsible for human, chicken and bovine annotations in the GO Consortium

Page 19: Introduction to the Gene Ontology and GO annotation resources

A GO annotation is …

…a statement that a gene product:

1. has a particular molecular function, or is involved in a particular biological process, or is located within a certain cellular component

2. as determined by a particular method

3. as described in a particular reference

Accession | Name | GO ID | GO term name | Reference | Evidence code
P00505 | GOT2 | GO:0004069 | aspartate transaminase activity | PMID:2731362 | IDA
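The example annotation on this slide can be written as a small record, sketched here under the assumption that one annotation is simply the six fields shown. The `Annotation` class and `describe` function are hypothetical names; the field values are taken from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    accession: str  # protein accession
    name: str       # gene product name
    go_id: str      # GO term identifier
    go_term: str    # GO term name
    reference: str  # where the evidence is described
    evidence: str   # evidence code, i.e. the method used


example = Annotation(
    accession="P00505",
    name="GOT2",
    go_id="GO:0004069",
    go_term="aspartate transaminase activity",
    reference="PMID:2731362",
    evidence="IDA",
)


def describe(annotation):
    """Spell the annotation out as the statement it represents."""
    return (
        f"{annotation.name} ({annotation.accession}) has "
        f"{annotation.go_term} [{annotation.go_id}], "
        f"evidence {annotation.evidence}, reference {annotation.reference}"
    )


print(describe(example))
```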

Page 20: Introduction to the Gene Ontology and GO annotation resources

GOA makes annotations using two methods:

Electronic

• Quick way of producing large numbers of annotations

• Annotations use less-specific GO terms

Manual

• Time-consuming process producing lower numbers of annotations

• Annotations are very detailed and accurate

Page 21: Introduction to the Gene Ontology and GO annotation resources

Evidence Codes

Some examples…

Manual

• Inferred from Direct Assay (IDA) – e.g. enzyme assay

• Inferred from Physical Interaction (IPI) – e.g. yeast 2-hybrid

Electronic

• Inferred from Electronic Annotation (IEA) – e.g. UniProtKB keyword mapping

See the full list on the GO Consortium website: http://www.geneontology.org/GO.evidence.shtml
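A short sketch of how the evidence code separates manual from electronic annotations, assuming only the three codes listed on this slide. The first row reuses the P00505 example from this talk; the other rows, including their accessions, are hypothetical, and `split_by_method` is an illustrative helper.

```python
# Only the three codes named on the slide; the full list is on the GO website.
EVIDENCE_CODES = {
    "IDA": "Inferred from Direct Assay",
    "IPI": "Inferred from Physical Interaction",
    "IEA": "Inferred from Electronic Annotation",
}
ELECTRONIC_CODES = {"IEA"}


def split_by_method(annotations):
    """Split (accession, go_id, evidence) tuples into manual and electronic lists."""
    manual, electronic = [], []
    for annotation in annotations:
        code = annotation[2]
        if code not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {code}")
        if code in ELECTRONIC_CODES:
            electronic.append(annotation)
        else:
            manual.append(annotation)
    return manual, electronic


annotations = [
    ("P00505", "GO:0004069", "IDA"),     # example annotation from this talk
    ("PROTEIN_X", "GO:0007155", "IEA"),  # hypothetical row
    ("PROTEIN_Y", "GO:0005856", "IPI"),  # hypothetical row
]

manual, electronic = split_by_method(annotations)
print(len(manual), "manual,", len(electronic), "electronic")
```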

Page 22: Introduction to the Gene Ontology and GO annotation resources

Electronic annotation by UniProtKB-GOA

1. Mapping of external concepts to GO terms

e.g. InterPro2GO, Swiss-Prot Keyword2GO, Enzyme Commission2GO

2. Automatic transfer of annotations to orthologs

Ensembl Compara

e.g. Human, with orthologs in Macaque, Mouse, Dog, Cow, Guinea Pig, Chimpanzee, Rat and Chicken

Annotations are high-quality and have an explanation of the method (GO_REF)

Page 23: Introduction to the Gene Ontology and GO annotation resources

Mapping of concepts from UniProtKB files

GO:0004715 ; non-membrane spanning protein tyrosine kinase activity

GO:0005856 ; cytoskeleton

GO:0007155 ; cell adhesion
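The mapping step can be sketched as a table lookup, assuming each external concept maps to at most one GO term. The GO identifiers and names are those shown on the slide; the concept labels, the accession `PROTEIN_X` and the function name are hypothetical, since the slide does not give the source concepts.

```python
# GO IDs and names are from the slide; the concept labels on the left are
# hypothetical placeholders for keywords or other external concepts.
CONCEPT2GO = {
    "example concept: tyrosine kinase": (
        "GO:0004715",
        "non-membrane spanning protein tyrosine kinase activity",
    ),
    "example concept: cytoskeleton": ("GO:0005856", "cytoskeleton"),
    "example concept: cell adhesion": ("GO:0007155", "cell adhesion"),
}


def electronic_annotations(accession, concepts, mapping=CONCEPT2GO):
    """Turn the concepts attached to one entry into IEA annotations."""
    annotations = []
    for concept in concepts:
        if concept in mapping:  # concepts without a mapping are skipped
            go_id, go_name = mapping[concept]
            annotations.append((accession, go_id, go_name, "IEA"))
    return annotations


entry_concepts = ["example concept: cytoskeleton", "example concept: unmapped"]
for row in electronic_annotations("PROTEIN_X", entry_concepts):
    print(row)
```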

Page 24: Introduction to the Gene Ontology and GO annotation resources

Electronic annotation by UniProtKB-GOA

1. Mapping of external concepts to GO terms

e.g. InterPro2GO, Swiss-Prot Keyword2GO, Enzyme Commission2GO

2. Automatic transfer of annotations to orthologs

Ensembl Compara

e.g. Human, with orthologs in Macaque, Mouse, Dog, Cow, Guinea Pig, Chimpanzee, Rat, Chicken ...and more

Annotations are high-quality and have an explanation of the method (GO_REF)

Page 25: Introduction to the Gene Ontology and GO annotation resources

Manual annotation by UniProtKB-GOA

High-quality, specific annotations made using:

• Peer-reviewed papers

• A range of evidence codes to categorise the types of evidence found in a paper

http://www.ebi.ac.uk/GOA

Page 26: Introduction to the Gene Ontology and GO annotation resources

Finding annotations in a paper

PubMed ID: 12374299

In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…

• Function: protein serine/threonine kinase activity GO:0004674 (from “serine/threonine kinase activity”)

• Component: integral to plasma membrane GO:0005887 (from “integral membrane protein”)

• Process: response to wounding GO:0009611 (from “wound response”)

…for B. napus PERK1 protein (Q9ARH1)

Page 27: Introduction to the Gene Ontology and GO annotation resources

Additional information

• Qualifiers

Modify the interpretation of an annotation

- NOT (protein is not associated with the GO term)

- colocalizes_with (protein associates with complex but is not a bona fide member)

- contributes_to (describes action of a complex of proteins)

• 'With' column

Can include further information on the method being referenced

e.g. the protein accession of an interacting protein

Page 28: Introduction to the Gene Ontology and GO annotation resources

UniProtKB-GOA integrates manual annotations from external groups

• Human Protein Atlas, e.g. GO:0005739 mitochondrion

also:

• LifeDB (subcellular locations)

• Reactome (pathways)

Page 29: Introduction to the Gene Ontology and GO annotation resources

Status of UniProtKB-GOA Annotation

Aug 2011 Statistics

Evidence Source | Annotations | Proteins | UniProt Coverage
Manual annotations* | 910,536 | 159,630 | 0.9%
Electronic annotations | 98,037,844 | 11,261,382 | 66%

* Includes manual annotations integrated from external model organism and multi-species databases

Page 30: Introduction to the Gene Ontology and GO annotation resources

How to access and use GO annotation data

Page 31: Introduction to the Gene Ontology and GO annotation resources

Where can you find annotations?

• UniProtKB

• Ensembl

• Entrez Gene

Page 32: Introduction to the Gene Ontology and GO annotation resources

Gene Association Files

17-column files containing all information for each annotation

• GO Consortium website

• UniProtKB-GOA website
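A minimal sketch of reading one line of a 17-column gene association file, assuming the tab-separated GAF 2.0 column order and `!` comment lines. The sample line reuses the P00505 annotation from earlier in the talk; its object name, date and assigned-by values are illustrative, and `parse_gaf_line` is a hypothetical helper.

```python
from collections import namedtuple

# Column order as defined for GAF 2.0 (17 tab-separated fields).
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type", "taxon",
    "date", "assigned_by", "annotation_extension", "gene_product_form_id",
]
GafRecord = namedtuple("GafRecord", GAF_COLUMNS)


def parse_gaf_line(line):
    """Return a GafRecord, or None for comment lines starting with '!'."""
    if line.startswith("!"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GAF_COLUMNS):
        raise ValueError(f"expected 17 columns, got {len(fields)}")
    return GafRecord(*fields)


# Illustrative line: core values from the talk, the rest assumed.
sample_line = "\t".join([
    "UniProtKB", "P00505", "GOT2", "", "GO:0004069", "PMID:2731362",
    "IDA", "", "F", "Aspartate aminotransferase, mitochondrial", "",
    "protein", "taxon:9606", "20110801", "UniProtKB", "", "",
]) + "\n"

record = parse_gaf_line(sample_line)
print(record.db_object_id, record.go_id, record.evidence_code, record.aspect)
```

Empty columns are kept as empty strings, so the field positions stay aligned with the column names.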

Page 33: Introduction to the Gene Ontology and GO annotation resources


GO browsers

Page 34: Introduction to the Gene Ontology and GO annotation resources

The EBI's QuickGO browser

http://www.ebi.ac.uk/QuickGO

• Search GO terms or proteins

• Find sets of GO annotations

Page 35: Introduction to the Gene Ontology and GO annotation resources

How scientists use the GO

Some examples…

• Access gene product functional information

• Analyse high-throughput genomic or proteomic datasets

• Validation of experimental techniques

• Get a broad overview of a proteome

• Obtain functional information for novel gene products

Page 36: Introduction to the Gene Ontology and GO annotation resources

Analysis of high-throughput genomic datasets

Microarray data analysis

[Figure: gene tree (Pearson clustering) of all genes (14010), comparing attacked and control samples over time. Clusters are labelled with GO terms: puparial adhesion, molting cycle, hemocyanin; defense response, immune response, response to stimulus, Toll regulated genes, JAK-STAT regulated genes; immune response, Toll regulated genes; amino acid catabolism, lipid metabolism; peptidase activity, protein catabolism, immune response]

Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI.

Page 37: Introduction to the Gene Ontology and GO annotation resources

Analysis of high-throughput proteomic datasets

Characterisation of proteins interacting with ribosomal protein S19

(Orrù et al., Molecular and Cellular Proteomics 2007)

Page 38: Introduction to the Gene Ontology and GO annotation resources


Validation of experimental techniques

(Cao et al., Journal of Proteome Research 2006)

Rat liver plasma membrane isolation

Page 39: Introduction to the Gene Ontology and GO annotation resources


Obtain functional information for novel gene products 

MPYVSQSQHIDRVRGAIEGRLPAPGNSSRLVSSWQRSYEQYRLDPGSVIGPRVLTSSELR DVQGKEEAFLRASGQCLARLHDMIRMADYCVMLTDAHGVTIDYRIDRDRRGDFKHAGLYI GSCWSEREEGTCGIASVLTDLAPITVHKTDHFRAAFTTLTCSASPIFAPTGELIGVLDAS AVQSPDNRDSQRLVFQLVRQSAALIEDGYFLNQTAQHWMIFGHASRNFVEAQPEVLIAFD ECGNIAASNRKAQECIAGLNGPRHVDEIFDTSAVHLHDVARTDTIMPLRLRATGAVLYAR IRAPLKRVSRSACAVSPSHSGQGTHDAHNDTNLDAISRFLHSRDSRIARNAEVALRIAGK HLPILILGETGVGKEVFAQALHASGARRAKPFVAVNCGAIPDSLIESELFGYAPGAFTGA RSRGARGKIAQAHGGTLFLDEIGDMPLNLQTRLLRVLAEGEVLPLGGDAPVRVDIDVICA THRDLARMVEEGTFREDLYYRLSGATLHMPPLRERADILDVVHAVFDEEAQSAGHVLTLD GRLAERLARFSWPGNIRQLRNVLRYACAVCDSTRVELRHVSPDVAALLAPDEAALRPALA LENDERARIVDALTRHHWRPNAAAEALGM

www.ebi.ac.uk/InterProScan

InterProScan

Page 40: Introduction to the Gene Ontology and GO annotation resources


Annotating novel sequences

• Can use BLAST queries to find similar sequences with GO annotation which can be transferred to the new sequence

• Two tools currently available;

AmiGO BLAST (from GO Consortium) http://amigo.geneontology.org/cgi-bin/amigo/blast.cgi

– searches the GO Consortium database

BLAST2GO (from Babelomics) http://www.blast2go.org/

– searches the NCBI database

Page 41: Introduction to the Gene Ontology and GO annotation resources


AmiGO BLAST

amigo.geneontology.org/cgi-bin/amigo/blast.cgi

Exportin-T from Pongo abelii (Sumatran orangutan)

Page 42: Introduction to the Gene Ontology and GO annotation resources


Analysis of large datasets using GO

Page 43: Introduction to the Gene Ontology and GO annotation resources

Using the GO to provide a functional overview for a large dataset

• Many GO analysis tools use GO slims to give a broad overview of the dataset

• GO slims are cut-down versions of the GO and contain a subset of the terms in the whole GO

• GO slims usually contain less-specialised GO terms

Page 44: Introduction to the Gene Ontology and GO annotation resources

Slimming the GO using the ‘true path rule’

Many gene products are associated with a large number of descriptive, leaf GO nodes:

Page 45: Introduction to the Gene Ontology and GO annotation resources

Slimming the GO using the ‘true path rule’

…however annotations can be mapped up to a smaller set of parent GO terms:
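The map-up step can be sketched as follows, assuming a toy ontology in which every term lists its parents. All term and protein names are hypothetical placeholders; the idea is only that an annotation to a specific term also holds for each of its ancestors, so it can be reported at whichever ancestors belong to the slim.

```python
# Toy ontology for illustration; all names are hypothetical.
PARENTS = {
    "leaf_1": ["mid_1"],
    "leaf_2": ["mid_1", "mid_2"],
    "mid_1": ["slim_A"],
    "mid_2": ["slim_B"],
    "slim_A": ["root"],
    "slim_B": ["root"],
    "root": [],
}
SLIM = {"slim_A", "slim_B"}


def ancestors_and_self(term, parents):
    """Return the term together with every term above it in the graph."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def map_to_slim(annotations, parents, slim):
    """Map each protein's annotated terms up to the slim terms above them."""
    mapped = {}
    for protein, terms in annotations.items():
        hits = set()
        for term in terms:
            hits |= ancestors_and_self(term, parents) & slim
        mapped[protein] = hits
    return mapped


annotations = {"protein_1": ["leaf_1"], "protein_2": ["leaf_2"]}
slimmed = map_to_slim(annotations, PARENTS, SLIM)
for protein in sorted(slimmed):
    print(protein, sorted(slimmed[protein]))
```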

Page 46: Introduction to the Gene Ontology and GO annotation resources

GO slims

Custom slims are available for download:

http://www.geneontology.org/GO.slims.shtml

or you can make your own using:

• AmiGO's GO slimmer: http://amigo.geneontology.org/cgi-bin/amigo/slimmer

• QuickGO: http://www.ebi.ac.uk/QuickGO

Page 47: Introduction to the Gene Ontology and GO annotation resources

The EBI's QuickGO browser

www.ebi.ac.uk/QuickGO

• Search GO terms or proteins

• Find sets of GO annotations

• Map-up annotations with GO slims

Page 48: Introduction to the Gene Ontology and GO annotation resources


GO analysis tools

Page 49: Introduction to the Gene Ontology and GO annotation resources


Numerous Third Party Tools

www.geneontology.org/GO.tools.shtml

Page 50: Introduction to the Gene Ontology and GO annotation resources

Term enrichment

• Most popular type of GO analysis

• Determines which GO terms are more often associated with a specified list of genes/proteins compared with a control list or rest of genome

• Many tools available to do this analysis

• User must decide which is best for their analysis
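One common way to test enrichment is a one-sided hypergeometric test, sketched below. This is an assumption about how such tools typically work, not the method of any particular tool named in this talk; the function name and the example counts are hypothetical, and multiple-testing correction is left out.

```python
from math import comb


def enrichment_p_value(study_hits, study_size, population_hits, population_size):
    """Probability of at least `study_hits` annotated genes in the study list.

    Hypergeometric upper tail: the study list is treated as a random draw of
    `study_size` genes from a population of `population_size` genes, of which
    `population_hits` carry the GO term.
    """
    upper = min(study_size, population_hits)
    total = comb(population_size, study_size)
    favourable = sum(
        comb(population_hits, i)
        * comb(population_size - population_hits, study_size - i)
        for i in range(study_hits, upper + 1)
    )
    return favourable / total


# Hypothetical counts: 5 of 20 study genes carry the term,
# against 50 of 1000 genes in the reference set.
p = enrichment_p_value(5, 20, 50, 1000)
print(f"p = {p:.3g}")
```

A small p-value suggests that the term is seen in the list more often than chance would explain. The choice of reference set changes the answer, which is one reason the tools can disagree.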

Page 51: Introduction to the Gene Ontology and GO annotation resources

Precautions when using GO annotations for analysis

• Recommended that ‘NOT’ annotations are removed before analysis

- only ~5000 out of ~100 million annotations are ‘NOT’

- can confuse the analysis

• The Gene Ontology is always changing and GO annotations are continually being created

- always use a current version of both

- if publishing your analyses please report the versions/dates you used

http://www.geneontology.org/GO.cite.shtml
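Removing ‘NOT’ annotations before an analysis can be sketched as a simple filter, assuming each annotation carries a qualifier string in which several qualifiers are separated by `|` (as in gene association files). The records and the function name are hypothetical.

```python
def drop_not_annotations(records):
    """Keep only the records whose qualifier list does not include NOT."""
    kept = []
    for record in records:
        qualifier = record["qualifier"]
        qualifiers = qualifier.split("|") if qualifier else []
        if "NOT" not in qualifiers:
            kept.append(record)
    return kept


# Hypothetical records; only the qualifier field matters here.
records = [
    {"protein": "PROTEIN_X", "go_id": "GO:0007155", "qualifier": ""},
    {"protein": "PROTEIN_Y", "go_id": "GO:0005856", "qualifier": "NOT"},
    {"protein": "PROTEIN_Z", "go_id": "GO:0004715", "qualifier": "NOT|contributes_to"},
    {"protein": "PROTEIN_W", "go_id": "GO:0005856", "qualifier": "colocalizes_with"},
]

filtered = drop_not_annotations(records)
print([record["protein"] for record in filtered])
```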

Page 52: Introduction to the Gene Ontology and GO annotation resources

Precautions when using GO annotations for analysis

• Unannotated is not unknown

- where there is no evidence in the literature for a process, function or location, the gene product is annotated to the appropriate ontology’s root node with an ‘ND’ evidence code (no biological data), thereby distinguishing between unannotated and unknown

• Pay attention to under-represented GO terms

- a strong under-representation of a pathway may mean that normal functioning of that pathway is necessary for the given condition

Page 53: Introduction to the Gene Ontology and GO annotation resources

The UniProtKB-GOA group

Curators: Emily Dimmer, Rachael Huntley, Yasmin Alam-Faruque

Software developer: Tony Sawford

Team leaders: Rolf Apweiler, Claire O’Donovan

Email: [email protected]

http://www.ebi.ac.uk/GOA

Page 54: Introduction to the Gene Ontology and GO annotation resources

Acknowledgements

GO Consortium

Members of:

• UniProtKB

• InterPro

• IntAct

• HAMAP

Funding

• National Human Genome Research Institute (NHGRI)

• Kidney Research UK

• EMBL