Gene Ontology, John Pinney (j.pinney@imperial.ac.uk)


Uploaded by druce on 24-Feb-2016.



TRANSCRIPT

Page 1: Gene Ontology John Pinney j.pinney@imperial.ac.uk

Gene Ontology

John Pinney

j.pinney@imperial.ac.uk

Page 2

Gene annotation

Goal: transfer knowledge about the function of gene products from model organisms to other genomes.

Page 3

Gene annotation

Problem: keyword systems differ between research communities.

Page 4

Gene annotation

Solution: controlled vocabulary

Page 5

Ontology

structured controlled vocabulary

Page 6

Ontology: a collection of terms, their definitions, and the logical relationships between them

Page 7

Gene Ontology (GO): a collection of terms, their definitions, and the logical relationships between them, describing gene products

Page 8

nucleus

“A membrane-bounded organelle of eukaryotic cells in which chromosomes are housed and replicated. In most cells, the nucleus contains all of the cell's chromosomes except the organellar chromosomes, and is the site of RNA synthesis and processing. In some species, or in specialized cell types, RNA metabolism or DNA replication may be absent.”

GO:0005634
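Each term on this slide bundles three things: a short name, a definition, and a stable identifier. A minimal Python sketch of such a record, assuming the `GO:` plus seven digits identifier format seen here; the class `GOTerm` is hypothetical and not part of any GO library.

```python
import re
from dataclasses import dataclass

# Assumed ID format: "GO:" followed by exactly seven digits.
GO_ID_PATTERN = re.compile(r"GO:[0-9]{7}")


@dataclass(frozen=True)
class GOTerm:
    """Hypothetical record for one term: stable ID, name and definition."""
    go_id: str
    name: str
    definition: str

    def __post_init__(self) -> None:
        # Reject malformed identifiers as soon as a term is created.
        if GO_ID_PATTERN.fullmatch(self.go_id) is None:
            raise ValueError(f"not a GO identifier: {self.go_id!r}")


nucleus = GOTerm(
    go_id="GO:0005634",
    name="nucleus",
    # First sentence of the definition on the slide only.
    definition=("A membrane-bounded organelle of eukaryotic cells in which "
                "chromosomes are housed and replicated."),
)
print(nucleus.go_id, nucleus.name)
```

Keeping the identifier separate from the name is the point of the design: names and definitions can be revised while annotations keep pointing at the same ID.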

Page 9

[Diagram: “part of” relationships linking the terms nucleus, cell, nuclear membrane, nucleoplasm and nucleolus]

Page 10

[Diagram: “is a” relationships linking the terms nucleus, intracellular membrane-bounded organelle, pronucleus, intracellular organelle and membrane-bounded organelle]

Page 11

A term may have more than one parent term and more than one child term.

=> The gene ontology is not a tree.
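This claim can be checked mechanically: store the relationships as (child, parent) pairs and look for any term with two or more parents. A minimal Python sketch; the edge list is an assumed reading of the “is a” diagram on the previous slide, restricted to its five terms, and the function names are hypothetical.

```python
from collections import defaultdict

# "is a" edges as (child, parent) pairs: a simplified excerpt, not full GO.
IS_A = [
    ("pronucleus", "nucleus"),
    ("nucleus", "intracellular membrane-bounded organelle"),
    ("intracellular membrane-bounded organelle", "intracellular organelle"),
    ("intracellular membrane-bounded organelle", "membrane-bounded organelle"),
]


def parents_by_term(edges):
    """Map each child term to the set of its direct parent terms."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    return parents


def multi_parent_terms(edges):
    """Terms with more than one parent; one such term rules out a tree."""
    return sorted(t for t, ps in parents_by_term(edges).items() if len(ps) > 1)


print(multi_parent_terms(IS_A))  # ['intracellular membrane-bounded organelle']
```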

Page 12

The gene ontology has a structure known as a Directed Acyclic Graph (DAG):

directed: relationships are not symmetrical

acyclic: there are no directed loops

graph: mathematical term for a network
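The "acyclic" property can be tested with a topological sort. A minimal Python sketch using Kahn's algorithm, which repeatedly removes nodes that no remaining edge points to; if some nodes can never be removed, the graph contains a directed loop. The “part of” edges are an assumed, simplified reading of the diagram two slides back, and `is_acyclic` is a hypothetical helper name.

```python
from collections import defaultdict, deque


def is_acyclic(edges):
    """Kahn's algorithm: True if the directed graph has no directed loop.

    edges are (child, parent) pairs; direction matters because the
    relationships are not symmetrical.
    """
    out_edges = defaultdict(list)
    in_degree = defaultdict(int)
    nodes = set()
    for child, parent in edges:
        out_edges[child].append(parent)
        in_degree[parent] += 1
        nodes.update((child, parent))

    # Start from the nodes that no edge points to.
    queue = deque(n for n in nodes if in_degree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in out_edges[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                queue.append(nxt)
    # Unvisited nodes lie on a directed loop or are reachable from one.
    return visited == len(nodes)


part_of = [
    ("nucleolus", "nucleus"),
    ("nucleoplasm", "nucleus"),
    ("nuclear membrane", "nucleus"),
    ("nucleus", "cell"),
]
print(is_acyclic(part_of))                            # True
print(is_acyclic(part_of + [("cell", "nucleolus")]))  # False
```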

Page 13

GO is actually made up of 3 different ontologies:

cellular component
molecular function
biological process

Page 14

cellular component

“The part of a cell or its extracellular environment in which a gene product is located. A gene product may be located in one or more parts of a cell.”

Page 15

cellular component examples:

cohesin core heterodimer
extracellular region
laminin-1 complex
replication fork
transcription factor complex

Page 16

molecular function

“Elemental activities, such as catalysis or binding, describing the actions of a gene product at the molecular level. A given gene product may exhibit one or more molecular functions.”

Page 17

molecular function examples:

transcription factor binding
enzyme activator activity
3'-nucleotidase activity
metallopeptidase activity
hexokinase activity

Page 18

biological process

“Those processes specifically pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms. A process is a collection of molecular events with a defined beginning and end.”

Page 19

biological process examples:

para-aminobenzoic acid biosynthetic process
protein localization
establishment of blood-nerve barrier
circadian rhythm
posterior midgut development

Page 20

geneontology.org

Page 21

geneontology.org: search and browse the ontologies

Page 22

geneontology.org: search and browse the ontologies

Page 23

geneontology.org: download ontologies
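One long-standing download format is the plain-text OBO format, in which each term is a `[Term]` stanza of `tag: value` lines. A minimal Python sketch that keeps only the tags needed to rebuild the graph (`id`, `name`, `namespace`, `is_a`, and `relationship: part_of`), assuming well-formed input; `parse_obo` is a hypothetical name, and the sample is an abbreviated, illustrative pair of stanzas rather than a verbatim excerpt of a released file. For real work an established OBO parsing library is the safer choice.

```python
def parse_obo(text):
    """Parse [Term] stanzas of OBO-format text into a dict keyed by GO ID.

    Keeps id, name, namespace, is_a and "relationship: part_of" only;
    other stanza types (e.g. [Typedef]) and tags are skipped.
    """
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {"is_a": [], "part_of": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue
        tag, value = (s.strip() for s in line.split(":", 1))
        value = value.split(" ! ", 1)[0]  # drop trailing "! name" comments
        if tag == "id":
            current["id"] = value
            terms[value] = current
        elif tag in ("name", "namespace"):
            current[tag] = value
        elif tag == "is_a":
            current["is_a"].append(value)
        elif tag == "relationship":
            rel, _, target = value.partition(" ")
            if rel == "part_of":
                current["part_of"].append(target)
    return terms


SAMPLE = """format-version: 1.2

[Term]
id: GO:0005634
name: nucleus
namespace: cellular_component
is_a: GO:0043231 ! intracellular membrane-bounded organelle

[Term]
id: GO:0005654
name: nucleoplasm
namespace: cellular_component
relationship: part_of GO:0005634 ! nucleus

[Typedef]
id: part_of
name: part of
"""
terms = parse_obo(SAMPLE)
print(terms["GO:0005654"]["part_of"])  # ['GO:0005634']
```

The `namespace` tag is where the three separate ontologies show up in the file: each term belongs to exactly one of `cellular_component`, `molecular_function` or `biological_process`.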

Page 24

geneontology.org: download mappings from other databases

enzyme functions (EC, KEGG, MetaCyc)

protein domains (Pfam, SMART, PRINTS,…)

other controlled vocabularies of functions (E. coli functions, MIPS FunCat)
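Each mapping pairs an identifier from another scheme with a GO term. A minimal Python sketch for reading one line, assuming the layout of GO's "external2go" mapping files as best understood here (`<DB>:<ID> [name] > GO:<term name> ; GO:<ID>`, with comment lines starting with `!`); check the header of the file you actually download. The function name is hypothetical and the sample line is illustrative.

```python
def parse_mapping_line(line):
    """Split one line of an external2go-style mapping file.

    Assumed layout: "<DB>:<ID> [name] > GO:<term name> ; GO:<ID>".
    Returns (external_id, go_name, go_id), or None for comments/blank lines.
    """
    line = line.strip()
    if not line or line.startswith("!"):
        return None
    left, right = line.split(" > ", 1)
    external_id = left.split()[0]  # e.g. "EC:1.1.1.1"; any name is dropped
    go_part, go_id = right.rsplit(" ; ", 1)
    go_name = go_part[len("GO:"):] if go_part.startswith("GO:") else go_part
    return external_id, go_name.strip(), go_id.strip()


for sample in ["! comment lines start with an exclamation mark",
               "EC:1.1.1.1 > GO:alcohol dehydrogenase (NAD+) activity ; GO:0004022"]:
    print(parse_mapping_line(sample))
```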

Page 25

geneontology.org: download annotations for various genomes

Page 26

geneontology.org: download annotations for various genomes

NCBI_NP   NP_354299.2   lolD   GO:0043190   ISS   "ABC transporter, nucleotide binding/ATPase protein (lipoprotein)"   taxon:176299   20070612   PAMGO_GAT

Fields labelled: database, gene product ID, gene symbol, GO term ID, evidence code

Page 27

evidence codes

Evidence codes allow curators to indicate the type of evidence for each gene-term annotation.

experimental, e.g. IMP Inferred from Mutant Phenotype, IDA Inferred from Direct Assay

computational, e.g. ISS Inferred from Sequence or structural Similarity, IGC Inferred from Genomic Context

author statement, e.g. TAS Traceable Author Statement
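A common use of evidence codes is to restrict an analysis to annotations backed by a particular kind of evidence. A minimal Python sketch, assuming annotations have already been reduced to (gene, term, evidence code) triples; the category table covers only the example codes on this slide, not the full set of GO evidence codes, and the genes and terms are hypothetical.

```python
# Categories as grouped on the slide; only the slide's example codes.
EVIDENCE_CLASS = {
    "IMP": "experimental",
    "IDA": "experimental",
    "ISS": "computational",
    "IGC": "computational",
    "TAS": "author statement",
}


def filter_by_evidence(annotations, allowed):
    """Keep (gene, term, code) triples whose code falls in an allowed class.

    Codes missing from EVIDENCE_CLASS map to None and are never kept.
    """
    return [a for a in annotations if EVIDENCE_CLASS.get(a[2]) in allowed]


annotations = [  # hypothetical genes and terms
    ("gene1", "term1", "IDA"),
    ("gene2", "term2", "ISS"),
    ("gene3", "term3", "TAS"),
    ("gene4", "term4", "XYZ"),  # unknown code
]
print(filter_by_evidence(annotations, {"experimental"}))
# [('gene1', 'term1', 'IDA')]
```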

Page 28

geneontology.org: download annotations for various genomes

NCBI_NP   NP_354299.2   lolD   GO:0043190   ISS   "ABC transporter, nucleotide binding/ATPase protein (lipoprotein)"   taxon:176299   20070612   PAMGO_GAT

Fields labelled: database, gene product ID, gene symbol, GO term ID, evidence code, description, organism (taxon) ID, date, annotation project ID
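The fields labelled on this slide are a subset of the columns of the GO Annotation File (GAF) format, a tab-separated format whose comment lines start with `!`. A minimal Python sketch, assuming the 15-column GAF 1.0 column order as best understood here (check the format specification for the version you download). The example rebuilds the slide's annotation with the columns the slide does not show (qualifier, reference, with/from, aspect, synonym, object type) left empty, so it is not a complete, valid GAF record; `parse_gaf_line` is a hypothetical name.

```python
# Assumed 15-column GAF 1.0 order; GAF 2.x appends two more columns.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by",
]


def parse_gaf_line(line):
    """Turn one tab-separated annotation line into a dict (None for comments)."""
    if line.startswith("!") or not line.strip():
        return None
    fields = line.rstrip("\n").split("\t")
    # zip stops at the shorter list, so extra GAF 2.x columns are ignored.
    return dict(zip(GAF_COLUMNS, fields))


# The slide's values; every column the slide omits is left empty.
values = {
    "db": "NCBI_NP", "db_object_id": "NP_354299.2", "db_object_symbol": "lolD",
    "go_id": "GO:0043190", "evidence_code": "ISS",
    "db_object_name": "ABC transporter, nucleotide binding/ATPase protein (lipoprotein)",
    "taxon": "taxon:176299", "date": "20070612", "assigned_by": "PAMGO_GAT",
}
line = "\t".join(values.get(c, "") for c in GAF_COLUMNS)
record = parse_gaf_line(line)
print(record["go_id"], record["evidence_code"], record["taxon"])
# GO:0043190 ISS taxon:176299
```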

Page 29

geneontology.org: repository of analysis tools that use GO

search, edit and browse ontologies / annotations
software libraries
statistical analysis
text mining
protein interactions
enrichment analysis

Page 30

Enrichment analysis

Page 31

Sources of a gene set: significant expression change in a microarray experiment; cluster from a protein interaction network; some other experiment / analysis

The gene set is compared with the whole genome (annotated).

Which GO terms occur significantly more often than expected in this gene set?

Tools: BiNGO, GOstat, ArrayTrack
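One common way to answer the question on this slide is a one-sided hypergeometric test (equivalently, a one-sided Fisher's exact test) for each GO term, with the annotated genome as the background. A minimal Python sketch for a single term; the counts in the example are made up, the function name is hypothetical, and the sketch does not claim to reproduce the exact statistics used by BiNGO, GOstat or ArrayTrack.

```python
from math import comb


def enrichment_p_value(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k) for one GO term.

    N: annotated genes in the genome (the background)
    K: background genes annotated to the term
    n: genes in the gene set of interest
    k: genes in the gene set annotated to the term
    """
    total = comb(N, n)
    # math.comb returns 0 when asked for more items than exist.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


# Made-up counts: 3 of 4 set genes carry a term held by 5 of 20 genes.
p = enrichment_p_value(N=20, K=5, n=4, k=3)
print(round(p, 4))  # 0.032
```

Two assumptions sit outside the sketch: the counts K and k should already include genes annotated to descendant terms (an annotation to a child term implies its ancestors), and because many terms are tested at once, the resulting p-values normally need a multiple-testing correction.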

Page 32

Advantages of GO

A single set of terms describes the function of gene products from all organisms.
The DAG structure provides a logical framework to represent knowledge at whatever level of detail is available.
GO is continually revised to reflect the state of current knowledge.
The strength of relationships between terms can be quantified (semantic similarity).
Many statistical analysis tools are available.
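Semantic similarity turns positions in the DAG into a number for how closely two terms are related. Many measures exist; below is a minimal Python sketch of one of the simplest, the Jaccard overlap of the two terms' ancestor sets (sometimes called simUI), using simplified “is a” links from the earlier slide as an assumed toy graph. Information-content measures such as Resnik's are a common alternative but also need annotation frequencies. Function names are hypothetical.

```python
def ancestors(term, parents):
    """All terms reachable by following parent links upward, including term."""
    seen = {term}
    stack = [term]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def sim_ui(a, b, parents):
    """Shared ancestors divided by all ancestors of either term (Jaccard)."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


# Simplified "is a" links (child -> parents); not the full ontology.
PARENTS = {
    "pronucleus": ["nucleus"],
    "nucleus": ["intracellular membrane-bounded organelle"],
    "intracellular membrane-bounded organelle": [
        "intracellular organelle", "membrane-bounded organelle"],
}
print(sim_ui("pronucleus", "nucleus", PARENTS))  # 0.8
```

Because the graph is a DAG rather than a tree, the traversal uses a `seen` set so that a term reachable along two paths is counted once.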

Page 33

Limitations of GO

GO is limited in scope: it does not cover

processes that are not normal functions of gene products (e.g. oncogenesis)
sequence attributes (e.g. introns/exons)
protein structures or interactions
evolution
gene expression

Page 34

Summary (1)

The gene ontology (GO) is a structured, controlled vocabulary to describe the function of gene products.

Terms in GO have logical relationships (“is a”, “part of”) with one another. Together these form a structure called a Directed Acyclic Graph (DAG).

GO is formed of 3 separate ontologies describing different aspects of gene function: cellular component, molecular function and biological process.

Page 35

Summary (2)

geneontology.org is the central resource for downloading ontology, annotation and mapping files.

Evidence codes are used in annotations to show the experimental, computational or literature support for each function.

Page 36

Summary (3)

Many software tools are available to support GO analysis of experimental data, including enrichment analysis by:

ArrayTrack (microarray expression data)
BiNGO (protein interaction clusters)
GOstat (any data in the form of gene sets)