GO and OBO: an introduction
Jane Lomax, EMBL-EBI
• What is the Gene Ontology?
• What is OBO?
• OBO-Edit demo & practical
Gene Ontology
• Built for a very specific purpose: “annotation of genes and proteins in genomic and protein databases”
• Applicable to all species
Evolution of GO
• Original GO created in 2000
• Three databases involved:
  – FlyBase (Drosophila)
  – MGI (Mouse)
  – SGD (S. cerevisiae)
• Used immediately
Evolution of GO
• Later databases:
  – TAIR (Arabidopsis)
  – TIGR (microbes including prokaryotes)
  – SWISS-PROT (several thousand species inc. human)
  – PSU (P. falciparum)
• Recent additions:
  – ZFIN (zebrafish)
  – PAMGO (plant pathogens)
Evolution of GO
• GO development traditionally annotation-driven
  – development directed by use
• Terms added as new species annotated
• Terms added on an as-needed basis
Evolution of GO
• Developed by an international consortium of biologists and computer scientists
  – members from individual databases
  – central office at EBI
• Development involves collaboration with domain experts from different biological fields
  – also formal ontologists
Evolution of GO
• Resulted in ‘organic’ structure, little formality
• Ontological formality added subsequently
  – philosophical and logical
Growth of GO
[Figure: GO term history 2001 - 2007. Number of terms (axis from 0 to 30000) against date (Jan-01 to Jan-07), broken down into defined terms, undefined terms and obsolete terms.]
How does GO work?
What information might we want to capture about a gene product?
• What does the gene product do?
• Where and when does it act?
• Why does it perform these activities?
GO structure
• GO terms are divided into three parts:
  – cellular component
  – molecular function
  – biological process
Cellular Component
• where a gene product acts
• Enzyme complexes in the component ontology refer to places, not activities.
Molecular Function
• activities or “jobs” of a gene product
• Examples:
  – glucose-6-phosphate isomerase activity
  – insulin binding
  – insulin receptor activity
  – drug transporter activity
Molecular Function
• A gene product may have several functions; a function term refers to a single reaction or activity, not a gene product.
• Sets of functions make up a biological process.
Biological Process
• a commonly recognized series of events
• Examples:
  – cell division
  – transcription
  – regulation of gluconeogenesis
  – limb development
  – courtship behavior
Ontology Structure
• Terms are linked by two relationships:
  – is-a
  – part-of
Ontology Structure
[Figure: example ontology graph with the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, linked by is-a and part-of relationships.]
Ontology Structure
• Ontologies are structured as a hierarchical directed acyclic graph (DAG)
• Terms can have more than one parent and zero, one or more children
Ontology Structure
[Figure: example graph of cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane. Caption: Directed Acyclic Graph (DAG), multiple parentage allowed.]
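The DAG idea can be sketched in a few lines of Python. This is an illustration only: the term names come from the slide's figure, but the figure's edges do not survive in this transcript, so the is_a and part_of links below are assumptions, and the names `PARENTS`, `ancestors` and `is_acyclic` are made up for this sketch.

```python
# Each term maps to a list of (relationship, parent) pairs.
# The edges are ASSUMED for illustration; they are not taken from the slide.
PARENTS = {
    "cell": [],
    "membrane": [("part_of", "cell")],
    "chloroplast": [("part_of", "cell")],
    "mitochondrial membrane": [("is_a", "membrane")],
    # Two parents: this is what a DAG allows and a simple tree does not.
    "chloroplast membrane": [("is_a", "membrane"), ("part_of", "chloroplast")],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward from `term`."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_acyclic(parents=PARENTS):
    """Return True if no term is its own ancestor."""
    return all(term not in ancestors(term, parents) for term in parents)


print(sorted(ancestors("chloroplast membrane")))
print(is_acyclic())
```

Storing parents rather than children keeps the "more than one parent" case explicit: a term with two entries in its list simply has two routes up to the root.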
Open Biomedical Ontologies (OBO)
• GO is a member of OBO
• An umbrella project for grouping different ontologies in the biological/medical field
  – a repository for ontologies with a defined set of standards
• Available from a single source: http://obo.sourceforge.net/
Why do we need OBO?
• GO covers a small area of biology:
  – molecular function of a protein
  – biological function of a protein
  – cellular location of a protein
Why do we need OBO?
• Lots of other aspects that also need to be captured, e.g.:
  – phenotype
  – anatomy
  – genomic
  – taxonomy
Why do we need OBO?
• Many groups develop their own ontologies
  – e.g. plant ontology, anatomies for specific organisms
• No standardisation of ontologies with respect to:
  – format
  – scope
  – relationships
• No way of knowing whether such ontologies already exist
• No mechanism of distribution for other groups
Why do we need OBO?
• Creating ontologies takes a lot of work
  – Makes sense to reuse existing ontologies where possible
• Improves data integration where a small set of ontologies is used
• Allows ontologies to be made available from a single place
Why do we need OBO?
• Ultimate aim: a complete set of integrated ontologies completely covering the biomedical domain
OBO requirements
To be part of OBO, ontologies must:
• Be open: usable by all without any constraint
OBO requirements: open
• Ontologies can be used by anyone without any constraints, except:
  – original authors must be acknowledged
  – ontologies cannot be edited and then released under the same name
OBO requirements
To be part of OBO, ontologies must:
• Be open: usable by all without any constraint
• Be in a common shared syntax
OBO requirements: syntax
• Usually the OBO format, the same as the primary GO format
  – and adaptations of the OBO format
• OWL (Web Ontology Language) format is also accepted
• Allows the same tools to be applied, facilitating shared software implementations
Anatomy of an OBO term
id: GO:0006094
name: gluconeogenesis
namespace: process
def: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol. [http://cancerweb.ncl.ac.uk/omd/index.html]
exact_synonym: glucose biosynthesis
xref_analog: MetaCyc:GLUCONEO-PWY
is_a: GO:0006006
is_a: GO:0006092
Field labels:
• id: unique ID
• name: term name
• namespace: ontology
• def: definition
• exact_synonym: synonym
• xref_analog: database ref
• is_a: parentage
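Because each line of the term is a tag followed by a value, a stanza like the one on the slide can be read with very little code. The following is a minimal sketch, not a full OBO parser: it assumes one `tag: value` pair per line, as shown on the slide, and the names `STANZA` and `parse_stanza` are invented for this example.

```python
from collections import defaultdict

# The stanza as shown on the slide (definition omitted for brevity).
STANZA = """\
id: GO:0006094
name: gluconeogenesis
namespace: process
exact_synonym: glucose biosynthesis
xref_analog: MetaCyc:GLUCONEO-PWY
is_a: GO:0006006
is_a: GO:0006092
"""


def parse_stanza(text):
    """Parse 'tag: value' lines into a dict mapping each tag to a list of values."""
    fields = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first ': ' only, so values such as 'GO:0006006' stay intact.
        tag, sep, value = line.partition(": ")
        if not sep:
            continue  # skip anything that is not a tag-value line
        fields[tag].append(value)
    return dict(fields)


term = parse_stanza(STANZA)
print(term["id"][0], term["name"][0])
print(term["is_a"])
```

Values are kept in lists because a tag can repeat: this term has two `is_a` lines, one for each parent.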
OBO requirements
To be part of OBO, ontologies must:
• Be open: usable by all without any constraint
• Be in a common shared syntax
• Not overlap with other ontologies in OBO
OBO requirements: overlapping
• Ontologies can (and should) overlap partially, but large overlap should be avoided
• Idea is that terms from different ontologies can be combined to form new terms
• Striving for accepted standards rather than competition
OBO requirements
To be part of OBO, ontologies must:
• Be open: usable by all without any constraint
• Be in a common shared syntax
• Not overlap with other ontologies in OBO
• Share a unique identifier space
OBO requirements: id space
• So, for example, the GO id space is “GO”:
  – No other OBO ontology could use this id space
• Prevents problems where multiple ontologies are used together
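The id space rule is easy to check mechanically. The sketch below assumes identifiers of the form `PREFIX:local`, as in `GO:0006094`; the ontology names, the `XX:` and `GO:9999999` identifiers and both function names are hypothetical and exist only to show a clash being detected.

```python
def id_space(term_id):
    """Return the prefix before the first colon, e.g. 'GO' for 'GO:0006094'."""
    prefix, sep, local = term_id.partition(":")
    if not sep or not prefix or not local:
        raise ValueError(f"not a prefixed identifier: {term_id!r}")
    return prefix


def find_id_space_clashes(ontologies):
    """Map each id space used by more than one ontology to the ontologies using it.

    `ontologies` maps an ontology name to an iterable of its term ids.
    """
    owners = {}
    for name, term_ids in ontologies.items():
        for term_id in term_ids:
            owners.setdefault(id_space(term_id), set()).add(name)
    return {space: sorted(names) for space, names in owners.items() if len(names) > 1}


# Hypothetical data: 'rogue_ontology' wrongly reuses the GO id space.
ontologies = {
    "gene_ontology": ["GO:0006094", "GO:0006006"],
    "other_ontology": ["XX:0000001"],
    "rogue_ontology": ["GO:9999999"],
}
clashes = find_id_space_clashes(ontologies)
print(clashes)
```

With a unique prefix per ontology, a term id alone is enough to tell which ontology it came from when several are loaded together.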
OBO requirements
To be part of OBO, ontologies must:
• Be open: usable by all without any constraint
• Be in a common shared syntax
• Not overlap with other ontologies in OBO
• Share a unique identifier space
• Include text definitions of their terms
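The last requirement, text definitions, can also be checked automatically. This is a small sketch under the assumption that terms are held as dictionaries with a `def` entry; the function name `undefined_terms` and the `EX:0000001` term are hypothetical.

```python
def undefined_terms(terms):
    """Return the ids of terms whose 'def' field is missing or blank."""
    return [t["id"] for t in terms if not t.get("def", "").strip()]


terms = [
    {
        "id": "GO:0006094",
        "name": "gluconeogenesis",
        "def": "The formation of glucose from noncarbohydrate precursors, "
               "such as pyruvate, amino acids and glycerol.",
    },
    # Hypothetical term with no definition, to show what gets reported.
    {"id": "EX:0000001", "name": "example term without a definition"},
]

print(undefined_terms(terms))
```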
OBO requirements
• In addition, OBO includes an ontology of relationships
  – all ontologies should use these definitions of relationships
• For example:
  – part_of
  – develops_from
  – regulates
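One practical consequence of a shared set of relationships is that an ontology can be checked for relation names that fall outside it. The sketch below is illustrative: the allowed set contains only the relationships named in these slides and is not the complete OBO relationship ontology, and the `located_near` link and the function name are hypothetical.

```python
# Relationships named in the slides; NOT the full OBO relationship ontology.
SHARED_RELATIONS = {"is_a", "part_of", "develops_from", "regulates"}


def unknown_relations(links, allowed=SHARED_RELATIONS):
    """Return the sorted relation names used in `links` that are not in `allowed`.

    Each link is a (child, relation, parent) triple.
    """
    return sorted({relation for _child, relation, _parent in links
                   if relation not in allowed})


links = [
    ("chloroplast membrane", "part_of", "chloroplast"),
    ("chloroplast membrane", "is_a", "membrane"),
    # Hypothetical, non-shared relation that the check should flag.
    ("term A", "located_near", "term B"),
]

print(unknown_relations(links))
```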
What’s available
• demo: http://obo.sourceforge.net/
Editing ontologies
• GO is edited using OBO-Edit
  – stand-alone Java application
  – available for all platforms
  – browse, create or edit any ontology in OBO format
OBO-Edit demo
• Browsing ontologies
  – loading ontologies (including loading multiple ontologies)
  – graph viewer
  – reasoner/single relationship views
  – searching/filtering/rendering
  – help
• Creating/editing ontologies
  – creating a new ontology
  – adding terms
  – copying/moving/deleting terms
  – adding definitions, dbxrefs etc.
  – verification plugin
  – saving ontologies