gene ontology overview and perspective lung development ontology workshop
Post on 30-Dec-2015
225 Views
Preview:
TRANSCRIPT
2
what kinds of things exist?
what are the relationships between these things?
eye
_part of
sclera
_is a
sense organ
developsfrom
Optic placode
A biological ontology is:
A (machine) interpretable representation of some aspect of biological reality
http://www.macula.org/anatomy/eyeframe.html
3
Gene Ontology (GO) Consortium
aa
www.geneontology.org Formed to develop a shared language adequate for the annotation of molecular characteristics across organisms; a common language to
share knowledge.
Seeks to achieve a mutual understanding of the definition and meaning of any word used; thus we are able to support cross-database queries.
Members agree to contribute gene product annotations and associated sequences to GO database; thus facilitating data analysis and semantic interoperability.
5
Molecular Function = elemental activity/task
the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity
Biological Process = biological goal or objective broad biological goals, such as mitosis or purine metabolism, that
are accomplished by ordered assemblies of molecular functions
Cellular Component = location or complex subcellular structures, locations, and macromolecular complexes;
examples include nucleus, telomere, and RNA polymerase II holoenzyme
GO represents three biological domains
7
Build and maintain logically rigorous and biologically accurate ontologies
Comprehensively annotate reference genomes
Support genome annotation projects for all organisms
Freely provide ontologies, annotations and tools to the research community
The Gene Ontology (GO)
1. Build and maintain logically rigorous and biologically accurate ontologies
2. Comprehensively annotate reference genomes
3. Support genome annotation projects for all organisms
4. Freely provide ontologies, annotations and tools to the research community
8
The GO is still developing daily both in ontological structures and in domain knowledge
Ontology development workshops focus on specific domains needing revision and bring together ontology developers and domain experts
Currently running ~2 workshops / year1. Metabolism and cell cycle (Aug, 2004)2. Immunology and defense response (Nov05, Apr06)3. Early CNS development (June, 2006)4. Peripheral nervous system development (Feb, 2007)5. Blood Pressure Regulation (June, 2007)6. Muscle Development (July, 2007)
Building the ontologies
9
725 new terms related to immunology
127 new terms added to cell type ontology
Building the ontology: Immune System Process
Red part_of
Blue is_aAlex Diehl
10
P05147
PMID: 2976880
GO:0047519IDA
P05147 GO:0047519 IDA PMID:2976880
GO Term
Reference
Evidence
Annotating Gene Products using GO
Gene Product
11
Annotations are assertions
There is evidence that this gene product can be best classified using this term
The source of the evidence and other information is included
There is agreement on the meaning of the term
12
Annotations are the connections between genomic information and the GO.Experiments provide the data that enables us to annotate gene products with terms from the ontologies.
Annotations for App: amyloid beta (A4) precursor protein
Annotations are assertions
13
NO Direct ExperimentInferred from evidence
Direct Experiment in organism
We use evidence codes to describe the basis of the annotation
IDA: Inferred from direct assay IPI: Inferred from physical interaction IMP: Inferred from mutant phenotype IGI: Inferred from genetic interaction IEP: Inferred from expression pattern IEA: Inferred from electronic annotation ISS: Inferred from sequence or structural
similarity TAS: Traceable author statement NAS: Non-traceable author statement IC: Inferred by curator RCA: Reviewed Computational Analysis ND: no data available
14
GO Annotation Stats:
I
GO Annotations
Total manual GO annotations - 388,633
Total proteins with manual annotations – 80,402
Contributing Groups (including MGI): - 19
Total Pub Med References – 346,002
Total number predicted annotations – 17,029,553
Total number taxa – 129,318
Total number distinct proteins – 2,971,374
April 24, 2007
15Now we can query across all annotations based on shared biological activity.
Annotations of gene products to GO are genome specific
17
GO enables genomic data analysis
Microarrays allow biologists to record changes in gene function across entire genomes
Result: Vast amounts of gene expression data desperately needing cataloging and tagging
Many data analysis tools use GO graph structure to statistically evaluate clusters of co-expressed genes based on shared functional annotations 680 pub (of 1517) on GO list 46 microarray tools contributed
19
GO is wildly successful
FIGURE 3. Representative cell-type-specific genes and corresponding molecular functions.
Nature: January 2007
20
Comprehensively annotate Reference Genomes
Saccharomyces cerevisiae Schizosaccharomyces
pombe Arabidopsis thaliana
Human Mouse Fly Rat Chicken Zebrafish Worm Dicty E.coli
21
Priority genes: those implicated in human diseases
Determine orthologs/homologs in reference genomes
For these genes, comprehensively curate biomedical literature
Reference Genome Annotation Project
Mary Dolan
22
Shared annotation focus = Coordinated attention to ontology structure
Orthology/homology set across primary model organisms
Reference ID mappings including associations of sequences, gene/proteins, and human diseases
Ultimately, transparent access to comprehensive information about genes among the primary data providers
Reference Genome Development Projects
23
1. Verifying and maintaining domain representations in the ontology that reflect best knowledge of the real world.
- Depends on the involvement of biologists (domain experts)- Difficult to automate- Must accommodate continuing changes in what we think we
understand about biological systems
2. Providing comprehensive annotations, where experimental evidence is available, for all genes
- Dependant on the quality of annotations from experimental literature
- Combines manual curation by highly-trained scientists supplemented by computational inference prediction annotations
- Comprehensiveness may depend on changes in biomedical publishing
Ongoing Challenges for the GO Consortium
24
acknowledgements
MGI
Carol BultJanan EppigJim KadinJoel RichardsonMartin Ringwald
Lois Maltais TBK Reddy
Monica McAndrews-Hill
Nancy Butler
GO
Michael Ashburner (Cambridge) J. Michael Cherry (Stanford) Suzanna Lewis (LBNL)
Rex Chisholm (NWU) David Hill (Jackson Lab) Midori Harris (EBI) Chris Mungall (LBNL) Jane Lomax (EBI) Eurie Hong (Stanford) Jen Clark (EBI)
GO @ MGI
Alex Diehl Mary Dolan Harold Drabkin David Hill
Li Ni Dmitry Sitnikov
25
Gene Ontology www.geneontology.org
MGI projects are supported by NIH [NHGRI, NICH, and NCI].
Bar Harbor, Maine, USA
Mouse Genome Informaticswww.informatics.jax.org
GO Consortium is supported by NIH-NHGRI and by the European Union RTD Programme
PRO is supported by NIGMS
Corpora is supported by NLM
top related