weaving and untangling the go


Upload: midori

Post on 14-Jan-2016


Page 1: Weaving and untangling the GO

Weaving and untangling the GO

• is_a completeness ~9 slides

• granularity & BP ~3 slides

• Linking MF to BP ~15 slides

• Sensu ~13 slides

– linguistic qualifiers vs relations

• Linking GO to other ontologies ~40 slides

– GO+Cell

Page 2: Weaving and untangling the GO

Tangled DAGs and complexity

• paths increasing
• GO process in general has multiple axes of classification
  – qualifier -ve +ve
  – anatomy
    • structural
    • spatial
  – chemical
    • structural
    • functional

Page 3: Weaving and untangling the GO

is_a completeness

Page 4: Weaving and untangling the GO

GO and is_a completeness

• Why?
• What’s wrong with every term having at least one is_a or part_of parent?
  – this is the way we’ve always done things

Page 5: Weaving and untangling the GO

Ontologies should be complete

• No errors of omission
• is_a completeness is the ontologically correct thing to do
  – every entity type is a subtype of some other thing
• Accurate ontologies = accurate queries
  – currently a query for “find all kinds of development” does not return “ovarian follicle development”
    • this is wrong
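
The failed query can be reproduced on a toy graph. The sketch below is only an illustration: the edges, the intermediate term "organ development" and the `descendants` helper are assumptions made up for this example, not the real GO structure. A query that follows is_a links alone misses any term whose only parent link is part_of.

```python
from collections import defaultdict

# Hypothetical toy edges (child, relation, parent); not taken from the real GO.
EDGES = [
    ("organ development", "is_a", "development"),
    ("ovarian follicle development", "part_of", "organ development"),
]

def descendants(root, edges, relations):
    """Return every term reachable downwards from root via the given relations."""
    children = defaultdict(set)
    for child, rel, parent in edges:
        if rel in relations:
            children[parent].add(child)
    found, stack = set(), [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

# "find all kinds of development" follows is_a only, so the orphan is missed.
kinds_of_development = descendants("development", EDGES, {"is_a"})
```

Adding an is_a parent for the orphan term (here, hypothetically, "organ development") is enough for the same query to return it.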

Page 6: Weaving and untangling the GO

missing is_as hinder common tool use

• We should play nicely with the others in the playground
• Most (non-GOC) tools expect is_a completeness
  – GO looks funny when viewed in other tools
    • the standard is to show only is_a relations in default tree view
  – missing is_as break reasoners

Page 7: Weaving and untangling the GO

Filling is_a gaps brings practical benefits

• Easier for tools to find inconsistencies in GO

• We can start to untangle displays

Page 8: Weaving and untangling the GO

Example: current displays mix relations

• it’s a mess

Page 9: Weaving and untangling the GO
Page 10: Weaving and untangling the GO

untangling is_a and part_of

• difficult if is_a hierarchy is incomplete
  – is_a orphans show up at root node in pure is_a display
• not everything must have an asserted part_of parent
  – can infer from is_a parents
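
Both points on this slide can be sketched in a few lines. The example assumes a toy edge list (the terms and links are hypothetical, chosen only for illustration) and the usual inheritance rule that if X is_a Y and Y part_of Z, then X part_of Z. The function names are invented for this sketch.

```python
# Hypothetical toy fragment of a component ontology: (child, relation, parent).
EDGES = [
    ("cell", "is_a", "cellular_component"),
    ("organelle", "is_a", "cellular_component"),
    ("organelle", "part_of", "cell"),
    ("nucleus", "is_a", "organelle"),
    ("nucleolus", "part_of", "nucleus"),  # no is_a parent: an is_a orphan
]

def is_a_orphans(edges, root):
    """Terms (other than the root) that have no asserted is_a parent."""
    terms = {t for child, _, parent in edges for t in (child, parent)}
    with_is_a = {child for child, rel, _ in edges if rel == "is_a"}
    return terms - with_is_a - {root}

def inferred_part_of(edges):
    """part_of links inherited from is_a ancestors and not already asserted."""
    is_a, part_of = {}, {}
    for child, rel, parent in edges:
        {"is_a": is_a, "part_of": part_of}[rel].setdefault(child, set()).add(parent)
    inferred = set()
    for term in is_a:
        seen, stack = set(), list(is_a[term])
        while stack:  # walk all transitive is_a ancestors of term
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            stack.extend(is_a.get(ancestor, ()))
            for whole in part_of.get(ancestor, ()):
                if whole not in part_of.get(term, ()):
                    inferred.add((term, whole))
    return inferred
```

On this toy data, `is_a_orphans(EDGES, "cellular_component")` reports only "nucleolus", and `inferred_part_of(EDGES)` yields the unasserted link nucleus part_of cell.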

Page 11: Weaving and untangling the GO
Page 12: Weaving and untangling the GO

The new complete cellular component

• Current CC:
  – 277 is_a orphans / 1688 terms
  – avg is-a-paths-to-root 1.4
  – avg mixed-paths-to-root 6.97
• Jane’s fixed CC:
  – 0 is_a orphans
  – avg is-a-paths-to-root 3.36
  – avg mixed-paths-to-root 38.6
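
The slide does not say how the paths-to-root averages were computed. One plausible reading, sketched here under that assumption, is to count the distinct upward paths from each term to the root and average over terms. The toy parent maps and the `paths_to_root` helper are hypothetical.

```python
from functools import lru_cache

def paths_to_root(parents, root):
    """Map every term to its number of distinct upward paths to root."""
    @lru_cache(maxsize=None)
    def count(term):
        if term == root:
            return 1
        # A term with no parents (an orphan) contributes zero paths.
        return sum(count(parent) for parent in parents.get(term, ()))
    return {term: count(term) for term in parents}

# Hypothetical toy data: parents via is_a only, and via is_a plus part_of.
IS_A = {
    "membrane": ["cellular_component"],
    "organelle": ["cellular_component"],
    "organelle membrane": ["membrane"],
}
MIXED = dict(IS_A, **{"organelle membrane": ["membrane", "organelle"]})

def average(counts):
    return sum(counts.values()) / len(counts)

avg_is_a = average(paths_to_root(IS_A, "cellular_component"))
avg_mixed = average(paths_to_root(MIXED, "cellular_component"))
```

Even in this tiny example the mixed average is higher than the is_a average, which is the same direction as the figures on the slide.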

Page 13: Weaving and untangling the GO

Granularity and the organisation of GO:BP

Page 14: Weaving and untangling the GO

Fixing the upper levels of BP

• The upper portion of any ontology is very important for organisation
• Design decisions percolate down
• Many users exploring GO top-down see this first
• Diamonds are particularly bad in the upper level
  – significantly increases tangledness

Page 15: Weaving and untangling the GO

[Diagram of the upper levels of BP. Nodes: biological process; cellular process; physiological process; organismal physiological process; cellular physiological process; others.]

Page 16: Weaving and untangling the GO

[Diagram of the upper levels of BP, annotated with text definitions. Nodes: biological process; cellular process; physiological process; organismal physiological process; cellular physiological process.]

biological process: A phenomenon marked by changes that lead to a particular result, mediated by one or more gene products

cellular process: Processes that are carried out at the cellular level, but are not necessarily restricted to a single cell. For example, cell communication occurs among more than one cell, but occurs at the cellular level

physiological process: Those processes specifically pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms

organismal physiological process: The processes pertinent to the function of an organism above the cellular level; includes the integrated processes of tissues and organs

cellular physiological process: The processes pertinent to the integrated function of a cell

Page 17: Weaving and untangling the GO

Consider… (long term view)

• Making top division by granularity of the process itself
  – biological process
    • molecular level process?
    • cellular level process
    • (multi-cellular) level process
• These types are disjoint
• But what about physiological process?
  – this is not disjoint from the granularity of the process itself

Page 18: Weaving and untangling the GO

Relations between GO ontologies

Page 19: Weaving and untangling the GO

Outline

• We focus on MF & BP
• biological example from David
• the types and relations in reality
  – maintaining the ALL-SOME definition of relations
• how should this be implemented in the GO?
  – what links should be manifested
  – retain some level of redundancy, or eliminate it?

Page 20: Weaving and untangling the GO

[Pathway diagram: histidine catabolism and its component activities]

GO:0006548 Histidine catabolism
GO:0004397 Histidine ammonia lyase activity
GO:0016153 Urocanate hydratase activity
GO:0050480 imidazolonepropionase activity
GO:0030409 Glutamate-Formimidoyl transferase
GO:0050415 Formimidoyl-Glutamase activity
GO:0050129 N-formylglutamate deformylase activity
GO:0050416 Formimidoylglutamate deiminase activity
GO:0019557 Histidine catabolism to glutamate and formate
GO:0019556 Histidine catabolism to glutamate and formamide
GO:???????? Histidine catabolism to glutamate and formiminotetrahydrofolate

Overbeek, et al. The Subsystems Approach to Genome Annotation and its Use in the Project to Annotate 1000 Genomes. NAR 2005, 33(17):5691-5702

Page 21: Weaving and untangling the GO

Ontological Representation

• I will try and be clear when I am talking about
  – types in reality
  – types we wish to manifest as terms in the GO (or in other ontologies)
    • all GO terms should be types
    • not all types need to have terms created - we limit for practical reasons

Page 22: Weaving and untangling the GO

What are the relations in reality?

• Between types in the same ontology, different levels of granularity
  – part_of
• Between functions and processes (at the same level of granularity)
  – functioning_of
• Between component and function
  – has_function
• Between process and component
  – located_in

Page 23: Weaving and untangling the GO

What are the instances and relations in reality?

[Diagram with a function column and a process column. Nodes: some gene product instance; some molecular function instance; some molecular functionING instance; some multistep process instance. Relation labels: has_function; functioning_of; part_of.]

Page 24: Weaving and untangling the GO

What are the types and type-level relations in reality?

[Diagram with a function column and a process column. Nodes: some type of gene product; some type of molecular function; some type of molecular functionING; some type of multistep process. Relation labels: has_function; functioning_of; part (direction?).]

Page 25: Weaving and untangling the GO

types example

[Diagram with function/process columns and a coarse-to-fine axis. Nodes: histidine ammonia lyase function; histidine ammonia lyase reaction; histidine catabolism. Relation labels: functioning_of; part?]

issues:
 -- ALL-SOME structure

Page 26: Weaving and untangling the GO

What are the types and relations in reality?

[Diagram with function/process columns and a coarse-to-fine axis. Nodes: Formimidoylglutamate deiminase function; Formimidoylglutamate deiminase reaction; histidine catabolism to glutamate and formate. Relation labels: functioning_of; has_part?]

issues:
 -- ALL-SOME structure

Page 27: Weaving and untangling the GO

We want to capture these real relationships between biological types

• Between granular levels
• Between orthogonal ontologies

• But first we must be clear on the definitions of these types, and which types should be manifested as GO terms

Page 28: Weaving and untangling the GO

Can we just manifest this in the GO?

[Diagram with function/process columns and a coarse-to-fine axis. Nodes: some type of molecular function; some type of molecular functionING; some type of multistep process. Relation labels: functioning_of; has_part(?).]

issues:
 -- not all function terms have a functionING corresponding term
 -- even if they do, redundancy is generally to be avoided

Page 29: Weaving and untangling the GO

We already have some redundancy

• function & process redundancy
  • iron transport (BP)
  • iron transporter (MF)
• function & component redundancy
  • voltage-gated ion channel function
  • voltage-gated ion channel complex
• If we retain this redundancy, these relations can be trivially added
• But we don’t always have this redundancy
  – not all functions have a corresponding functioning term

Page 30: Weaving and untangling the GO

Manifest shortcut relationships

[Diagram with function/process columns and a coarse-to-fine axis. Nodes: some type of molecular function; some type of molecular functionING; some type of process. Relation labels: functioning_of; has_part(?).]

• one relation standing for two

Page 31: Weaving and untangling the GO

most functionings are implicit

[Diagram with function/process columns and a coarse-to-fine axis. Nodes: histidine ammonia lyase function; histidine ammonia lyase REACTION; histidine catabolism. Relation labels: functioning_of; has_part(?).]

• current paradigm

Page 32: Weaving and untangling the GO

When do we manifest functions and processes?

• Need consistent stable policy
• Nothing in function ontology should have activity suffix
  – even though to a biochemist activity==potential, this is still confusing
• Beyond this, do we retain current policy
  – some redundancy
• Or take a more extreme approach
  – eliminate redundancy
  – eliminate current ‘activity’ MF terms and manifest corresponding reaction terms in BP (Amelia)

Page 33: Weaving and untangling the GO

‘purist process’ approach

[Diagram with a function column and a process column. Nodes: some type of gene product; histidine ammonia lyase function; histidine ammonia lyase reaction; histidine catabolism. Relation labels: has_function; functioning_of; part.]

Page 34: Weaving and untangling the GO

When is it safe to eliminate redundancy?

• Does functioning always imply function?
  – iron transport does not imply iron transporter
  – but we could still extend annotation to allow for specification of functioning-as-function
• Reactions and other ‘single-step’ processes involving no helper
  – function and corresponding functioning imply one another

• Redundancy between function and component should be retained

• Any obsoletion obviously causes disruption

Page 35: Weaving and untangling the GO

Difficult functionings

• Structural constituents
• functioning happens at lower level of granularity than is covered by GO

• these will not be linked to process - for now

Page 36: Weaving and untangling the GO

Implementation

• Still need to curate the actual links
  – trivial links can be computed automatically
• Can proceed independently of resolving ontological issues
  – most likely retain current policy re: manifesting terms
  – need to maintain 3 kinds of links
    • granular (part, same ontology)
    • functioning_of (function and functioning)
    • ‘diagonal’
  – ALL-SOME definition

Page 37: Weaving and untangling the GO

Sensu

Page 38: Weaving and untangling the GO

Sensu - outline

• Original use
  – A linguistic qualifier
  – denote differing community usage of a terminological entity (a term)
• Perverted use
  – A type qualifier
  – Used for when the part_of structure is specific to an organism type
• The fix
  – provide separate mechanisms for each

Page 39: Weaving and untangling the GO

Terms vs kinds

• The term ‘term’ is confusing
  – Term (sensu GO)
  – Term (sensu normal usage)
    • strings, tokens
• GO is not a terminology
• A GO ID identifies a type of entity
  – a kind of entity
  – a universal (as opposed to instance)
  – more specific than a class
  – but not a concept

Page 40: Weaving and untangling the GO

Sensu - original usage

• Sometimes the same string refers to different types
  – nucleus (sensu particle physicist)
  – nucleus (sensu astrophysicist)
  – nucleus (sensu biologist)
• Canonical GO example:
  – bud
    • no longer relevant, terms obsoleted
  – trichome

Page 41: Weaving and untangling the GO

Linguistic qualifiers are about language, not biological reality

• No ontological requirement for linguistically related terms to be ontologically related
  – current GO docs are not correct
• trichome, sensu plant community
  – should not state that there is some biological relation between an instance of a trichome and the plant community

Page 42: Weaving and untangling the GO

The original usage has been conflated

• Organism type specificity is a genuine challenge for the GO
  – ‘contextual’ part_ofs
  – e.g. X part_of Y in species Z
• Sensu has been wrongly recruited to fix this
  – standard pattern:
    • X, sensu Z part_of Y
    • X, sensu Z is_a Z
• Two problems
  – conflation of meaning of sensu
  – conflation results in lack of precision
    • “as in, but not restricted to taxon” not rigorous enough

Page 43: Weaving and untangling the GO

Two problems, two solutions

• Retain sensu as a linguistic qualifier only
  – re-interpret as: sensu S community
  – no requirement for taxon IDs
  – no ontology structure requirements
• Introduce a new relation for genuine organism-type specific terms
  – in_organism
  – standard inference rules can be used
    • e.g.
      – X in_organism X’, Y in_organism Y’, X is_a Y <=> X’ is_a Y’

Page 44: Weaving and untangling the GO

Contextual synonyms

[Term]
name: trichome (sensu insecta)
synonym: EXACT “hair” []
synonym: EXACT “trichome” [] {context=insecta}
def: “a polarized cellular extension that covers much of the insect epidermis”

[Term]
name: trichome (sensu plant)
synonym: EXACT “trichome” [] {context=plant}
def: “An outgrowth from the epidermis. Trichomes vary in size and complexity and include hairs, scales, and other structures and may be glandular. In Arabidopsis, patterning of trichome development is not random but does not appear to be lineage-based like stomata”

Page 45: Weaving and untangling the GO

Advantages

• Lexical qualifiers dealt with using lexical oboedit tags
• No need to be as specific as a taxon
  – only as specific as is needed to decontextualise
• No false reasoning is done over synonyms
  – cellular component types and cell types should not be siblings
• Big user-friendliness win?
  – Displays customised for particular users may choose to display contextual exact synonyms in place of the wordier sensu name
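
The customised display idea can be sketched as a simple lookup. This is a minimal illustration, assuming each term carries its exact synonyms as (text, context) pairs as in the trichome stanzas. The `display_name` function and the data layout are hypothetical, not an oboedit or AmiGO API.

```python
# Exact synonyms as (text, context) pairs; context is None when unqualified.
TERMS = {
    "trichome (sensu insecta)": [("hair", None), ("trichome", "insecta")],
    "trichome (sensu plant)": [("trichome", "plant")],
}

def display_name(name, terms, community=None):
    """Prefer an exact synonym tagged with the user's community, else the full name."""
    for text, context in terms[name]:
        if community is not None and context == community:
            return text
    return name

insect_view = display_name("trichome (sensu insecta)", TERMS, community="insecta")
default_view = display_name("trichome (sensu insecta)", TERMS)
```

A user from the insect community sees the short name "trichome", while a display with no community set falls back to the full sensu name.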

Page 46: Weaving and untangling the GO

in_organism

• Standard ALL-SOME definition:
• Type level definition:
  – P in_organism O
    • for all instances p of P, there exists some organism o of type O, and some time t, such that p in_organism o at time t

• More specific relation than located_in in OBO relations ontology

• Standard logical rules can be applied
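
The ALL-SOME reading of in_organism can be checked mechanically against instance-level data. The sketch below assumes a toy record of (instance, organism, time) facts. All instance names and the `all_some_in_organism` helper are hypothetical.

```python
def all_some_in_organism(instances_of_p, facts, organism_types, organism_type):
    """True if every instance p is in_organism some organism of the given type.

    facts: iterable of (p, o, t) triples meaning "p in_organism o at time t".
    organism_types: maps each organism instance to the set of its types.
    """
    facts = list(facts)
    return all(
        any(fp == p and organism_type in organism_types.get(o, ())
            for fp, o, _t in facts)
        for p in instances_of_p
    )

# Hypothetical instance data.
ORGANISM_TYPES = {"synechocystis_1": {"cyanobacteria"}, "arabidopsis_1": {"plant"}}
FACTS = [
    ("thylakoid_a", "synechocystis_1", 0),
    ("thylakoid_b", "arabidopsis_1", 0),
]

holds = all_some_in_organism(["thylakoid_a"], FACTS, ORGANISM_TYPES, "cyanobacteria")
fails = all_some_in_organism(["thylakoid_a", "thylakoid_b"], FACTS, ORGANISM_TYPES, "cyanobacteria")
```

Note that the check is vacuously true for a type with no recorded instances, which is the usual behaviour of a universally quantified definition.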

Page 47: Weaving and untangling the GO

[Diagram. Nodes: photosystem I; photosystem I, in cyanobacteria; thylakoid; thylakoid, in cyanobacteria; cyanobacteria. Relation labels: is_a; in_organism; part_of.]

Page 48: Weaving and untangling the GO

Open question

• Sometimes the relation between two types is largely lexical
  – eg trichome
• Sometimes it isn’t so clear
• Can we have both a relation to a taxon, and a contextual synonym?
• Is ‘eye’ an exact contextual synonym for ‘compound eye’ for the arthropod community?

Page 49: Weaving and untangling the GO

Practical considerations

• Use NCBI Taxonomy as our organism ontology

• xref or relationship tags?
  – xrefs are more lightweight
  – relationship tags are more accurate
  – relationship tags would be ‘dangling’ unless organism ontology is loaded

• See next section…

Page 50: Weaving and untangling the GO

Composite terms in GO

- finally…

Page 51: Weaving and untangling the GO

Composite terms - outline

• The problems inherent in composite terms and diamonds - brief review

• Actively managing composite terms in GO
  – big change: parseable logical definitions
• Implementation plan
• Progress so far: logical definitions referring to cell types
• Pre vs post composition
  – composite terms in ontologies and annotations

Page 52: Weaving and untangling the GO
Page 53: Weaving and untangling the GO

biosynthesis is_a metabolism

Page 54: Weaving and untangling the GO

cysteine is_a serine family amino acid is_a amino acid is_a amine

Page 55: Weaving and untangling the GO

[Diagram labels: cysteine; serine family amino acid; amino acid; serine. Relation label: is_a.]

Page 56: Weaving and untangling the GO

Composed terms currently cause problems

  – No link to external ontology term
  – Redundancy
  – Inconsistency
  – Extra work
  – Annotation bottleneck
  – Tangled DAGs and confusing displays
    • we have no way to disentangle
• Solution so far:
  – fix errors based on results of term name parsing (Obol)
    • reactive, not proactive

Page 57: Weaving and untangling the GO

Solution: actively manage composed terms

• Composed terms should now/soon be generated using oboedit plugin
  – building block terms are recorded in ontology along with composite term
• Correct DAG structure can be inferred from external ontologies
  – placement & consistency checking automated
  – additional work can be automated
    • synonyms, text definitions

Page 58: Weaving and untangling the GO

How will composite terms be recorded by oboedit?

• How do we record a definition for a composite term?
  – using a logical definition (computational essence)
• A logical definition consists of:
  – a generic term (aka genus)
  – relationships to other terms which serve to discriminate this specific term from other is_a children of the generic term (aka differentiae)
• Can be written in natural language as:
  – A <generic term> which <discriminating characteristics>

Page 59: Weaving and untangling the GO

Example of composite term record

• cysteine biosynthesis
  – generic term:
    • biosynthesis
  – discriminating characteristics:
    • outputs cysteine
  – a biosynthesis process which outputs cysteine

id: GO:0019344 ! cysteine biosynthesis
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
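
A record like this is straightforward to read by machine. The sketch below is a simplified reader written only for illustration: it handles just the `id` and `intersection_of` tags with `!` comments as they appear on the slide, and it is not a full OBO parser. The function names are assumptions.

```python
STANZA = """\
id: GO:0019344 ! cysteine biosynthesis
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
"""

def parse_logical_definition(stanza):
    """Split a stanza into id, genus and (relation, target, label) differentiae."""
    term = {"id": None, "name": None, "genus": None, "differentiae": []}
    for line in stanza.splitlines():
        if not line.strip():
            continue
        body, _, comment = line.partition("!")
        tag, _, value = body.partition(":")
        parts, label = value.split(), comment.strip()
        if tag.strip() == "id":
            term["id"], term["name"] = parts[0], label
        elif tag.strip() == "intersection_of" and len(parts) == 1:
            term["genus"] = (parts[0], label)  # bare ID: the generic term
        elif tag.strip() == "intersection_of" and len(parts) == 2:
            term["differentiae"].append((parts[0], parts[1], label))
    return term

def describe(term):
    """Render as: A <generic term> which <discriminating characteristics>."""
    genus_label = term["genus"][1] or term["genus"][0]
    clauses = " and ".join(
        f"{rel} {label or target}" for rel, target, label in term["differentiae"]
    )
    return f"A {genus_label} which {clauses}"

sentence = describe(parse_logical_definition(STANZA))
```

For the stanza above, `sentence` is "A biosynthesis which outputs cysteine", matching the natural-language template for logical definitions.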

Page 60: Weaving and untangling the GO

Now we have the ability to untangle

• Process axis view (primary is_as, via generic term):
  – biological_process
    • metabolism
      – biosynthesis
        » cysteine biosynthesis
• Process participant axis view:
  – amine
    • amino acid
      – serine family amino acid
        » cysteine
• Combined view
  – (same as current tangled diamond lattice)

Page 61: Weaving and untangling the GO

Recording the relationship is important

• Why not just a simple cross-product?
  – e.g. biosynthesis x cysteine
• Relationships are important for reasoning and querying
  – Consider:
    • cysteine biosynthesis from serine
    • mRNA export from nucleus during heat stress
• Without the relations, the logical definition is not specific enough
  – the essence is not captured

Page 62: Weaving and untangling the GO

Multiple discriminating characteristics are allowed

• Cysteine biosynthesis from serine
  – Generic term:
    • biosynthesis
  – Discriminating characteristics:
    • output cysteine
    • input serine

intersection_of: GO:0009058
intersection_of: outputs CHEBI:15356
intersection_of: input CHEBI:17822

Page 63: Weaving and untangling the GO

Composite terms can be nested

• regulation of cysteine biosynthesis

intersection_of: GO:0050789 ! regulation of biological process
intersection_of: regulates GO:0019344 ! cysteine biosynthesis

id: GO:0019344 ! cysteine biosynthesis
intersection_of: GO:0009058
intersection_of: outputs CHEBI:15356

Page 64: Weaving and untangling the GO

Composite terms can optionally be manufactured in bulk

• Generic term: {metabolism, biosynthesis}
• Differentia: has_output {serine, cysteine, …}
• With caution…
  – Sparse vs dense matrices
  – not all combinations are types
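
Bulk manufacture amounts to a filtered cross-product. The sketch below assumes a caller-supplied predicate that says which combinations correspond to real biological types. The `manufacture` function, the naming pattern and the example filter are all hypothetical.

```python
from itertools import product

GENERA = ["metabolism", "biosynthesis"]
OUTPUTS = ["serine", "cysteine"]

def manufacture(genera, outputs, is_real_type):
    """Build candidate composite terms, keeping only approved combinations."""
    return [
        {"name": f"{chemical} {genus}",
         "genus": genus,
         "differentia": ("has_output", chemical)}
        for genus, chemical in product(genera, outputs)
        if is_real_type(genus, chemical)
    ]

# Dense matrix: accept every combination.
dense = manufacture(GENERA, OUTPUTS, lambda genus, chemical: True)

# Sparse matrix: an arbitrary example filter standing in for curator judgement.
sparse = manufacture(GENERA, OUTPUTS, lambda genus, chemical: genus == "biosynthesis")
```

Keeping the filter explicit is one way to respect the caution on the slide: the cross-product only proposes candidates, and something else has to decide which of them are types.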

Page 65: Weaving and untangling the GO

On the importance of necessary and sufficient conditions

• Why intersection_of?
• Why not just make normal links in the GO DAG?
  – normal relationships are for necessary conditions only
  – we want both necessary and sufficient conditions
    • captures the essence of the term

Page 66: Weaving and untangling the GO

Normal DAG links only capture necessary conditions, not essence

[Diagram. Nodes: macrophage activation; immune cell activation; inflammatory response. Relation label: part_of.]

text def: A change in morphology and behavior of a macrophage resulting from exposure to a cytokine, chemokine, cellular ligand, pathogen, or soluble factor

Page 67: Weaving and untangling the GO

Normal DAG links only capture necessary conditions, not essence

[Diagram. Nodes: macrophage activation; immune cell activation; inflammatory response; macrophage. Relation labels: is_a; part_of; activates.]

Page 68: Weaving and untangling the GO

essence captured by genus-differentia

[Diagram. Nodes: macrophage activation; immune cell activation; inflammatory response. Relation labels: is_a; part_of.]

id: GO:macrophage_activation
intersection_of: GO:cell_activation
intersection_of: activates CL:macrophage

Page 69: Weaving and untangling the GO

essence captured by genus-differentia

[Diagram. Nodes: macrophage activation; immune cell activation; inflammatory response. Relation labels: is_a; part_of.]

id: GO:macrophage_activation
intersection_of: GO:cell_activation
intersection_of: activates CL:macrophage

text def: A change in morphology and behavior of a macrophage resulting from exposure to a cytokine, chemokine, cellular ligand, pathogen, or soluble factor

Page 70: Weaving and untangling the GO

essence captured by genus-differentia

[Diagram. Nodes: macrophage activation; immune cell activation; inflammatory response; cell activation (genus); macrophage. Relation labels: is_a; part_of; activates.]

Page 71: Weaving and untangling the GO

The power of reason

• with genus-differentia definitions that are computationally parseable, we can do a lot more consistency checking

Page 72: Weaving and untangling the GO

Pre- vs post- composition

• It makes sense to pre-compose terms and maintain them as part of GO

• Annotations can post-compose terms if they choose to do so
  – MGI, DictyBase are doing this already
    • results remain local to MOD
  – AmiGO-NG will allow querying of these
• The two approaches are complementary and compatible
  – proviso: if done properly

Page 73: Weaving and untangling the GO

SO already contains composite terms

• A silenced gene is a gene which has the quality of being silenced

Page 74: Weaving and untangling the GO

Plan: outline

• We want all new composite terms to be created using appropriate oboedit plugin
  – logical definitions automatically recorded
  – term management automated
• Changes:
  – editors must now be ‘OBO-aware’
  – annotators and end-users can remain unaware of changes if they choose to do so
    • but using the logical defs can bring benefits

• But first we need to find logical definitions for all the existing composite terms

Page 75: Weaving and untangling the GO

Where we were at, 2005

• Lots of terms to be retrofitted
  – Where to start?
• Previous strategy:
  – Obol guesses logical def for each term
  – Obol uses logical def to reason
    • errors of omission
    • inconsistencies
  – Batch reports to curators

Page 76: Weaving and untangling the GO

[Workflow diagram. Labels: go.obo; oboedit; GO editor; cell.obo; OBO editor; cjm; obol config; obol; name parser; go+ldefs; reasoner; go ‘fixed’; obol report.]

Page 77: Weaving and untangling the GO

[Workflow diagram. Labels: go.obo; oboedit; GO editor; cell.obo; OBO editor; cjm; obol config; obol; name parser; Ego.obo; reasoner; go ‘fixed’; obol report.]

Obol produces genus-differentia logical definitions

Page 78: Weaving and untangling the GO

Limitations of this approach

• Good as proof-of-principle
• But..
  – only the end results are evaluated
  – Obol makes the identical mistakes in guessing logical definitions each iteration
  – we want to evaluate and preserve the logical definitions that are generated by Obol

Page 79: Weaving and untangling the GO

What we’ve been doing since then

• Focused on OBO Cell ontology
• Used Obol to infer logical defs
• Manually curate logical defs
• Feed back results to improve Obol
• Iterate and refine
• Use oboedit reasoner to check consistency between GO & CellO
• Next: incorporate into curation process

Page 80: Weaving and untangling the GO

[Workflow diagram. Labels: go.obo; oboedit; GO editor; cell.obo; OBO editor; cjm; obol config; obol; name parser; ego-cell.obo.]

Page 81: Weaving and untangling the GO

Results so far

• Test set of 337 logical definitions curated
  – only a fraction of the composite terms in GO
• Relations not finalised
• Composite terms involving CellO present some interesting challenges
• …but first, here’s a demo

Page 82: Weaving and untangling the GO

Open issues: what relations do we use?

• We are concerned for now with relations between processes and cells

  – neuroblast activation & neuroblast
  – T cell differentiation & T cell
  – T cell homeostasis & T cell
  – cell homeostasis & homeostasis
  – sperm incapacitation & sperm
  – sperm motility & sperm

Page 83: Weaving and untangling the GO

OBO Relations ontology

• OBO Relations ontology has
  – has_participant
    • sub-relations:
      – has_agent (active participant)
      – has_patient (inactive participant)
        » (not in obo-rel yet)
  – between a process and a continuant
  – follows standard ALL-SOME structure

Page 84: Weaving and untangling the GO

has_participant

• P has_participant C if and only if: given any process p that instantiates P there is some continuant c, and some time t, such that: c instantiates C at t and c participates in p at t

• has_participant is a primitive instance-level relation between a process, a continuant, and a time at which the continuant participates in some way in the process. The relation obtains, for example, when this particular process of oxygen exchange across this particular alveolar membrane has_participant this particular sample of hemoglobin at this particular time

Page 85: Weaving and untangling the GO

Is this the appropriate relation?

neuroblast activation has_participant neuroblast
T cell differentiation has_participant T cell
T cell homeostasis has_participant T cell
cell homeostasis has_participant homeostasis
sperm incapacitation has_participant sperm
sperm motility has_participant sperm

these are all correct…
…but are they too general?

Page 86: Weaving and untangling the GO

more specific kinds of participation

• has_agent (has_active_participant)
  – As for has_participant, but with the additional condition that the component instance is causally active in the relevant process
• has_patient (has_inactive_participant)
  – Yes, this is a daft name
  – The component instance is acted upon
• (not yet in OBO REL)

Page 87: Weaving and untangling the GO

Cell differentiation

• T cell differentiation
  – A cell differentiation instance in which a cell acquires_features_of T cell
• problem:
  – not a simple relation between the process (T cell differentiation) and the cell (T cell)
    • 3-place relation: process, instance, type

Page 88: Weaving and untangling the GO

Cell differentiation, attempt 2

• T cell differentiation has_output T cell
  – Compare to:
    • cysteine biosynthesis has_output cysteine
• We should distinguish between participation relations in which the continuant relations are
  – transformation_of
  – derives_from
• e.g. something made (biosynthesis) vs something transformed (differentiation)

Page 89: Weaving and untangling the GO

Cell differentiation, attempt 3

• T cell differentiation has_transformed_output_participant T cell
  – …not exactly catchy…

Page 90: Weaving and untangling the GO

has_primary_participant

• T cell differentiation has_primary_participant T cell
  – aka has_theme
• ontologically a good relation?
• Meaning partly resides in the process term
• Can be migrated to other relations later

Page 91: Weaving and untangling the GO

To decompose or not to decompose

• We could have a logical definition for sperm incapacitation
  – genus: incapacitation
  – differentia: has_participant sperm
• Requires creating a new term
  – incapacitation
• Not used in any other logical def
• Logical def does not capture full essence
  – this term is a little more complex
    • involves at least three continuants

• Instead just use a relationship to capture necessary conditions only

Page 92: Weaving and untangling the GO

‘Anonymous’ terms

• border follicle cell delamination
  – The splitting off of border cells from the anterior epithelium
    • genus: delamination
      – no such term
    • we can create as ‘anonymous’ term
      – exists only in order to make logical definitions
    • ..or we can just create a normal term

Page 93: Weaving and untangling the GO

Implementation

• We have 337 logical definitions (nearly) ready

• When can we merge them into the GO?

Page 94: Weaving and untangling the GO

adding logical defs to the GO

• Will this cause disruption to users?
• gene_ontology.obo file exactly the same as before, but will have
  – fewer inconsistencies!
  – new intersection_of tags
    • specified in obo v1.2
    • can easily be ignored by parsers
    • oboedit users must either:
      – load cell.obo, relationship.obo at same time as go.obo
      – OR select “allow dangling terms”
    • may still confuse some users
  – ‘anonymous’ terms

Page 95: Weaving and untangling the GO

[Workflow diagram. Labels: GO editor; CellO editor; oboedit; cvs (three repositories); gene_ontology_edit.obo; cell.obo; rel.obo; filter; gene_ontology.obo; normal downstream stuff (website, amigo, users) unaffected; power users & advanced applications.]

Page 96: Weaving and untangling the GO

Applications may want to take advantage of enhanced GO

• enhanced GO isn’t just to help curation
• queries possible with ego:
  – find genes associated with blood cells
    • annotations to microglial cell activation
  – differentiation of any microglial precursor
    • annotations to monocyte differentiation

Page 97: Weaving and untangling the GO

Post-composition

• This approach is highly compatible with post-composition

• We should extend the annotation format to allow denoting more specific classes
  – e.g.
    • cholesterol transport in liver
  – advanced applications can query this
  – standard applications suffer no loss
  – extended annotations can be used to help seed new terms in the ontology
• This is already being done (MGI, Dicty)
  – we just want to capture this in an interoperable way

Page 98: Weaving and untangling the GO

Post-composition in gene association files

• New column in file format

Gene Product   Term ID                                 …   Slots
AABC1          GO:0030301 (cholesterol transport)          OBOREL:located_in[MA:liver]
AABC2          GO:0048663 (neuron fate development)        OBOREL:has_primary_participant[FBbt:Y_neuron]
AABC3          GO:000003
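
A slot value of the form relation[filler] is easy to split. The sketch below assumes exactly the single-slot syntax shown in the table. The regular expression, the function names and the returned dictionary layout are illustrative assumptions, not part of any gene association file specification.

```python
import re

# Assumed syntax, taken from the examples above: RELATION[FILLER]
SLOT_PATTERN = re.compile(r"^(?P<relation>[\w:]+)\[(?P<filler>[^\]]+)\]$")

def parse_slot(text):
    """Split 'OBOREL:located_in[MA:liver]' into (relation, filler)."""
    match = SLOT_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a slot: {text!r}")
    return match.group("relation"), match.group("filler")

def annotation(gene_product, term_id, slot="", use_slots=True):
    """Build an annotation record; tools may ignore the slot entirely."""
    refinements = [parse_slot(slot)] if use_slots and slot.strip() else []
    return {"gene_product": gene_product, "term": term_id, "refinements": refinements}

refined = annotation("AABC1", "GO:0030301", "OBOREL:located_in[MA:liver]")
plain = annotation("AABC1", "GO:0030301", "OBOREL:located_in[MA:liver]", use_slots=False)
```

With `use_slots=False` the record falls back to the pre-composed term alone, so a tool that does not understand the new column still gets a valid, if less specific, annotation.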

Page 99: Weaving and untangling the GO

Important note on post-composition

• This is not an either-or situation
• We will retain pre-composed terms
  – terms will continue to be created for real biological types
• Annotation post-composition can be used to further refine existing pre-composed terms
  – if the post-composed term is later created in the GO, the annotation can be automatically migrated
• Tools can ignore post-composition for small loss in specificity
  – defaults to the current paradigm

Page 100: Weaving and untangling the GO

Avoiding diamonds

• Surely larval locomotory behavior involves a diamond?

• yes, but we can disentangle the two axes of classification

Page 101: Weaving and untangling the GO

Solution

• Curator asserts:

id: GO:larval_locomotory_behavior
intersection_of: GO:locomotory_behavior
intersection_of: occurs_in FBbt:larval_stage

• Oboedit infers diamond:

id: GO:larval_locomotory_behavior
intersection_of: GO:locomotory_behavior
intersection_of: occurs_in FBbt:larval_stage
is_a: GO:locomotory_behavior ! genus
is_a: GO:larval_behavior ! inferred
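
The inference of the second is_a parent can be sketched as a small subsumption check over logical definitions. This is not how the oboedit reasoner is implemented. It is a minimal illustration that assumes larval behavior is itself defined as a behavior that occurs_in the larval stage, and that locomotory behavior is_a behavior; neither assumption is stated on the slide.

```python
# Asserted is_a links (assumption: locomotory behavior is_a behavior).
IS_A = {"GO:locomotory_behavior": {"GO:behavior"}}

# Logical definitions: term -> (genus, {(relation, filler), ...}).
# The definition of GO:larval_behavior is assumed for this example.
LDEFS = {
    "GO:larval_locomotory_behavior": (
        "GO:locomotory_behavior", {("occurs_in", "FBbt:larval_stage")}),
    "GO:larval_behavior": (
        "GO:behavior", {("occurs_in", "FBbt:larval_stage")}),
}

def is_subtype(child, parent, is_a):
    """Reflexive, transitive is_a test over the asserted links."""
    if child == parent:
        return True
    return any(is_subtype(p, parent, is_a) for p in is_a.get(child, ()))

def inferred_parents(term, ldefs, is_a):
    """Defined terms whose genus and every differentia are satisfied by term."""
    genus, differentiae = ldefs[term]
    parents = set()
    for other, (other_genus, other_diffs) in ldefs.items():
        if other == term:
            continue
        genus_ok = is_subtype(genus, other_genus, is_a)
        diffs_ok = all(
            any(rel == o_rel and is_subtype(filler, o_filler, is_a)
                for rel, filler in differentiae)
            for o_rel, o_filler in other_diffs
        )
        if genus_ok and diffs_ok:
            parents.add(other)
    return parents

new_links = inferred_parents("GO:larval_locomotory_behavior", LDEFS, IS_A)
```

Under these assumptions `new_links` contains only GO:larval_behavior, which is the inferred side of the diamond. The genus side comes directly from the curator's assertion.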

Page 102: Weaving and untangling the GO

Next Steps

• Tidy up cell logical definitions
• integrate them into curation process
• Look at composite terms within GO
  – larval locomotory behaviour
  – regulation
• Chemicals
• Anatomical entities

Page 103: Weaving and untangling the GO
Page 104: Weaving and untangling the GO