

Abstract

The Cell Ontology (CL) is a candidate OBO Foundry¹ ontology for the representation of in vivo cell types. As part of our work in redeveloping the CL, we detail here how cross-product terms were generated for over 340 hematopoietic CL terms, using their recently revised free-text definitions² as a basis. The cross-product terms use classes from other OBO Foundry ontologies to describe the protein complexes found on the surface of a cell type, the biological processes executed by a cell type, and the phenotypic characteristics associated with a cell type. As biological ontologies such as the CL grow in complexity, computable logical definitions are needed to ensure that terms are mutually independent. We employed the rule-based reasoner³ in OBO-Edit (reasoning time = 15 seconds) to use the logic inherent in cross-product terms to find cycle over errors and violations of disjoint_from statements. Findings were confirmed with the HermiT reasoner⁴ in Protégé-4 (reasoning time = 8 seconds). Both reasoners were able to infer new relationships within the CL (Tr1 cell [CL:0000901] is_a regulatory T cell [CL:0000815]) and between the CL and the contributing ontologies (multinuclear osteoclast [CL:0000779] part_of bone [UBERON:0001474]). This work demonstrates the utility of cross-product terms in the development of the CL and in its interoperability with other biomedical ontologies.

Conclusions

We have generated over 340 logical definitions for hematopoietic cell types in the CL. Logically defining these terms increases the accuracy of ontology design by curators and allows new relationships between cell type terms to be inferred. Biological processes, protein expression, anatomical location, and phenotypic qualities are all integrated into cell type definitions. This work provides a model for the redevelopment of the whole of the CL. As we redevelop more branches of the CL using logical definitions, we believe inferred relationships between branches of the CL will lead to unexpected associations between disparate areas of biology.

Terrence F. Meehan¹, Anna Maria Masci², Lindsay G. Cowell², Judith A. Blake¹, Christopher J. Mungall³, Alexander D. Diehl¹

¹ Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME, USA
² Dept. of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, NC, USA
³ Lawrence Berkeley National Laboratory, Berkeley, CA, USA

References

1. Smith B et al. (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol. 25(11), 1251-5.
2. Diehl AD et al. (2010) Hematopoietic cell types: Prototype for a revised cell ontology. J Biomed Inform. In press, available online.
3. The Gene Ontology Consortium (2010) The Gene Ontology in 2010: extensions and refinements. Nucleic Acids Research 38, D331-335.
4. Motik B et al. (2009) Hypertableau reasoning for description logics. J. Artif. Int. Res. 36, 165-228.

Figure 1 image obtained from Uppsala University, http://tiny.cc/tswkr

Figure 2. Ovals indicate the OBO Foundry ontologies whose terms are used in cross product formation with the CL. Arrows indicate the relationships used with a given ontology.

For further information

Please contact [email protected] or [email protected]. The most recent version of the Cell Ontology file “cell.edit.obo” can be obtained at: http://obo.cvs.sourceforge.net/viewvc/obo/obo/ontology/anatomy/cell_type

Logically defining hematopoietic cell types in the Cell Ontology by cross-product extensions

Logical definitions link CL to multiple ontologies

Numerous terms are used in a logical definition

Figure 3. The “hematopoietic stem cell” (HSC) is defined as a hematopoietic cell that does not express 12 lineage-specific protein markers. Two species-specific HSC subtypes are further defined by the presence or absence of additional protein markers. Blue ovals = CL terms, red ovals = PRO terms.
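The kind of absence-based definition described in Figure 3 can be illustrated with a minimal Python sketch. The poster does not list the 12 lineage-specific markers, so the marker names below are placeholders, and the function name `is_hsc_candidate` is likewise hypothetical. This is only an illustration of testing a marker profile against a "does not express" definition, not the method used in the CL.

```python
# Placeholder marker names: the poster states that 12 lineage-specific markers
# are absent but does not name them, so these identifiers are hypothetical.
LINEAGE_MARKERS = [f"lineage_marker_{i}" for i in range(1, 13)]


def is_hsc_candidate(is_hematopoietic, expressed_markers):
    """Return True for a hematopoietic cell expressing none of the lineage markers."""
    expressed_lineage = set(LINEAGE_MARKERS) & set(expressed_markers)
    return is_hematopoietic and not expressed_lineage


# A hematopoietic cell expressing only an unrelated protein satisfies the definition.
lineage_negative = is_hsc_candidate(True, {"some_other_protein"})
# Expressing any one of the lineage markers rules the cell out.
lineage_positive = is_hsc_candidate(True, {"lineage_marker_3"})
print(lineage_negative, lineage_positive)
```

The species-specific subtypes mentioned in the caption would add further required or excluded markers on top of this test.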

Figure 4. Biological entities associated with bone, taken from 4 OBO Foundry ontologies, are depicted in a screenshot from OBO-Edit. Inferred relationships are depicted as dashed arrows. I = is_a, C = capable_of, P = part_of, Q = has_quality. Ontologies are color-coded: Uberon (grey), GO biological process (green), PATO (yellow) and CL (blue).

New relationships automatically inferred

Wrong!

Figure 5. The gamma-delta T cell type is inferred to be a subtype of alpha-beta T cell, a violation of the disjointness relationship asserted between the two cell types (not shown). The inferred is_a relationship results from an overly general logical definition for alpha-beta T cell.
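How an overly general definition can produce the error in Figure 5 can be sketched with a toy subsumption check in Python. The logical definitions below are hypothetical stand-ins (the poster does not show the actual definitions), and the subset rule is a deliberate simplification of what OBO-Edit's rule-based reasoner or HermiT do.

```python
from itertools import permutations

# Hypothetical logical definitions: name -> (genus, set of (relation, filler)).
# The alpha-beta definition is intentionally too general for illustration.
DEFINITIONS = {
    "alpha-beta T cell": ("T cell", {("has_part", "T cell receptor complex")}),
    "gamma-delta T cell": (
        "T cell",
        {
            ("has_part", "T cell receptor complex"),
            ("has_part", "gamma-delta T cell receptor complex"),
        },
    ),
}

# Asserted disjointness between the two cell types.
DISJOINT = {frozenset({"alpha-beta T cell", "gamma-delta T cell"})}


def infer_is_a(definitions):
    """Infer (child, parent) pairs: same genus, and the parent's differentia
    are a strict subset of the child's differentia."""
    inferred = set()
    for child, parent in permutations(definitions, 2):
        child_genus, child_diff = definitions[child]
        parent_genus, parent_diff = definitions[parent]
        if child_genus == parent_genus and parent_diff < child_diff:
            inferred.add((child, parent))
    return inferred


def disjointness_violations(inferred, disjoint):
    """Return inferred is_a pairs that link two classes asserted to be disjoint."""
    return sorted(pair for pair in inferred if frozenset(pair) in disjoint)


inferred = infer_is_a(DEFINITIONS)
violations = disjointness_violations(inferred, DISJOINT)
print(violations)
```

In this sketch, tightening the alpha-beta definition with a differentium that the gamma-delta definition lacks removes the inferred is_a link, which mirrors how a curator would repair the definition.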

Mistakes in manual curation are discovered

Generate hypotheses

Figure 6. Mature NK T cell is inferred to be a type of mucosal invariant T cell. From this inference, we hypothesized that the two cell types may have similar biological functions. Subsequent literature review found this hypothesis is currently being tested by a small group of researchers. C = capable_of.

Computable logical definitions, also known as cross-products, are generated by the use of genus-differentia criteria where a term is defined by how its is_a parent term (the genus) relates to terms in orthogonal ontologies (the differentia). A term can have more than one logical definition, and inherits logical definitions from its is_a parent.

granulocyte (CL:0000094): myeloid leukocyte has_part stored secretory granule (GO:0042106)
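The genus-differentia structure of the granulocyte example, and the inheritance of logical definitions along is_a, can be sketched in Python as follows. The class names `LogicalDefinition` and `Term` are hypothetical, and the neutrophil term is added only to illustrate inheritance; it is not taken from the poster.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LogicalDefinition:
    """A genus plus one or more (relation, filler) differentia."""
    genus: str
    differentia: frozenset


@dataclass
class Term:
    name: str
    is_a: "Term | None" = None
    definitions: list = field(default_factory=list)

    def all_definitions(self):
        """Own logical definitions followed by those inherited via is_a."""
        inherited = self.is_a.all_definitions() if self.is_a else []
        return self.definitions + inherited


# granulocyte (CL:0000094): myeloid leukocyte has_part
# stored secretory granule (GO:0042106)
granulocyte_def = LogicalDefinition(
    genus="myeloid leukocyte",
    differentia=frozenset({("has_part", "stored secretory granule (GO:0042106)")}),
)

myeloid_leukocyte = Term("myeloid leukocyte")
granulocyte = Term("granulocyte", is_a=myeloid_leukocyte,
                   definitions=[granulocyte_def])
# Illustrative child term with no logical definition of its own.
neutrophil = Term("neutrophil", is_a=granulocyte)

for definition in neutrophil.all_definitions():
    print(definition.genus, sorted(definition.differentia))
```

A term with several logical definitions would simply carry more than one entry in `definitions`.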

Dashed arrows = inferred relationships

Figure 1. A human granulocyte.

What are the advantages of logically defining terms in the Cell Ontology (CL)?

What is a logical definition?

Great for flow cytometry