PATO: An Ontology of Phenotypic Qualities
George Gkoutos, University of Cambridge
Phenotype Information
Literature: qualitative descriptions
Experimental data: qualitative descriptions, quantitative descriptions
Various representation methodologies
Complex phenotype data
Need for: “A platform for facilitating mutual understanding and interoperability of phenotype information across species and domains of knowledge amongst people and machines”
Representation of Phenotypic data
Organism attributes
T – Species
G – Genotype
I – Strain
S – Genotypic Sex
A – Alleles at named loci
E – Environmental/handling condition
D – Age/stage of development
Assay: means of making observations
Phenotypic Character: any feature of the organism that is observed or 'assayed'.
Assay Controlled Vocabulary
• Abnormality
• Relative_to
• Ranges of values
• Allows the schema to be dynamic
• Definition of qualities and their relations
• Explicit differences (between laboratories)
• Allows labs around the world to “plug in” their assays to the schema
Assay
Phenotypic Character
Phenotypic Character
Phenotypic Character
Phenotypic character representation methodologies
Pre-composition
Examples:
– MGI Mouse genotype-phenotype annotation (Mammalian Phenotype)
– Gramene trait annotation (Plant trait ontology)
– etc.
Pre-composition often follows the compositional structure occasionally adopted by GO terms.
Positive/negative regulation of mitosis = positive/negative + regulation of mitosis (GO:0045839)
Increased/decreased angiogenesis = increased/decreased + angiogenesis (GO:0001525)
Advantages: easy for annotation; control; complex phenotypic information
Disadvantages: lack of rigidity; ontology management; expansion; quantitative data
Methodologies (cont.) post-composition
The post-composition methodology describes a phenotype by naming the particular affected entity (the bearer), which could be an anatomical structure, a biological process, a particular function, etc., together with the qualities that this entity possesses, which can be described in either qualitative or quantitative terms.
Advantages: ontology management; rigidity; expansion; quantitative data; advanced queries
Disadvantages: complex phenotypic information; more difficult for annotation; need for constraints to ensure meaningful annotations
Phenotype And Trait Ontology (PATO)
• An ontology of phenotypic qualities, which can be shared across different species and domains of knowledge.
• Qualities are the basic entities that we can perceive and/or measure:– colors, sizes, masses, lengths etc.
• Qualities inhere in entities: every entity comes with certain qualities, which exist as long as the entity exists.
• Qualities belong in a finite set of quality types (i.e. color, size etc) and inhere in specific individuals. No two individuals can have the same quality, and each quality is specifically constantly dependent on the entity it inheres in.
Phenotypic Character
PATO: species independent
Core ontologies (e.g. anatomy, behaviour, pathology)
EQ phenotype description: Entity (E) + Quality (Q)
Simple phenotype descriptions
mouse body weight = mouse anatomy: body + PATO: weight
eye colour = Drosophila anatomy: eye + PATO: colour
glucose concentration = ChEBI: glucose + PATO: concentration
Phenotypic Character entity + quality
increased size hepatocellular carcinoma
hepatocellular carcinoma (MPATH:357) has_quality increased size (PATO:0000586)
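The entity + quality pattern above can be sketched as a small data structure. This is a minimal illustration, not the data model used by the PATO project: the class names `Term` and `EQ` are hypothetical, and only the identifiers shown on the slide are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """An ontology term: identifier plus human-readable label."""
    id: str
    label: str


@dataclass(frozen=True)
class EQ:
    """A post-composed phenotypic character: entity + quality."""
    entity: Term
    quality: Term

    def __str__(self) -> str:
        return (
            f"{self.entity.label} ({self.entity.id}) has_quality "
            f"{self.quality.label} ({self.quality.id})"
        )


# The example from the slide: a hepatocellular carcinoma of increased size.
carcinoma = EQ(
    entity=Term("MPATH:357", "hepatocellular carcinoma"),
    quality=Term("PATO:0000586", "increased size"),
)
print(carcinoma)
```

Keeping entity and quality as separate terms is what lets the same PATO quality be reused with entities from any anatomy, process or chemical ontology.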
Entity Quality
Evidence Qualifier
relationship
Units
Environment
Genetic
Phenotype annotation model
Source
Attribution
Who makes the assertion
Properties
When, what organization
Assertion
Assertion
E = eye disc (FBbt:00001768)
Q = condensed (PATO:0001485)
Source: PMID:8431945
M. Ashburner
influences
Date: 10/26/2007
Organization: FlyBase
Version: 1
eya1appears
Evidence: light microscopy
Annotation: Phenotypes in literature
Quantitative Data
• PATO – part of a representation of qualitative phenotypic information
• More often than not it is important to record quantitative information that results from a specific measurement of a quality
• Measurements involve units (Phenotypic Character + Unit)
The tail of my mouse is 2.1 cm
PATO & measurements
UO: an ontology of units
• UO’s top-level division is between primary base units of a particular measure and units that are derived from base units
• Mapping between the various scalar qualities (such as weight, height, concentration etc.) and the corresponding units used to measure those qualities
• UO includes 264 terms, all of which are defined
• Email list (http://sourceforge.net/mailarchive/forum.php?forum_id=50613)
Mapping PATO to the UO
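As a rough sketch of what a quality-to-unit mapping buys, the snippet below records a measurement (phenotypic character + unit) and rejects units that do not fit the quality. The table `UNITS_FOR_QUALITY` and the class `Measurement` are hypothetical; the actual PATO-to-UO mapping is not reproduced in these slides, and plain labels are used instead of ontology identifiers.

```python
from dataclasses import dataclass

# Hypothetical quality-to-unit mapping; illustrative entries only,
# not the real PATO-to-UO mapping.
UNITS_FOR_QUALITY = {
    "length": {"m", "cm", "mm"},
    "weight": {"kg", "g", "mg"},
    "concentration": {"mol/l", "g/l"},
}


@dataclass(frozen=True)
class Measurement:
    """A phenotypic character (entity + quality) with a value and a unit."""
    entity: str
    quality: str
    value: float
    unit: str

    def __post_init__(self) -> None:
        # Reject units that the mapping does not allow for this quality.
        allowed = UNITS_FOR_QUALITY.get(self.quality, set())
        if self.unit not in allowed:
            raise ValueError(
                f"unit {self.unit!r} cannot measure quality {self.quality!r}"
            )

    def describe(self) -> str:
        return f"{self.entity} {self.quality} = {self.value} {self.unit}"


# "The tail of my mouse is 2.1 cm"
tail = Measurement(entity="tail", quality="length", value=2.1, unit="cm")
print(tail.describe())  # tail length = 2.1 cm
```

A measurement such as `Measurement("tail", "length", 2.1, "kg")` raises `ValueError`, which is the kind of constraint a quality-to-unit mapping makes checkable.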
Linking quantitative data to qualitative descriptions
Measurement qualitative description Assay
range normality necessary & sufficient conditions
EQ descriptor high level annotation marking phenodeviance (e.g. MP)
Multiple phenotypic characters to describe complex phenotypes
SHH-/+ SHH-/-
shh-/+ shh-/-
Phenotype (character) = entity + quality
P1 = eye + hypoteloric
P2 = midface + hypoplastic
P3 = kidney + hypertrophied
PATO (qualities): hypoteloric, hypoplastic, hypertrophied
ZFIN (entities): eye, midface, kidney
Phenotype (phenotypic profile) = P1 + P2 + P3 = holoprosencephaly
Assays for complex phenotype data & quantitative data
Assay
Phenotypic Character
Phenotypic Character
Phenotypic Character
• necessary
• necessary & sufficient
• phenodeviance
Linking qualitative descriptions across species
Decomposition of precomposed phenotype ontologies by providing logical definitions based on PATO
Link annotations across different knowledge domains and species
Link phenotypic descriptions of human diseases to animal models
Reconciling pre and post composed annotations
Retrospective PATO definitions of pre-coordinated terms in phenotype ontology
Precomposed ontologies: Mammalian Phenotype, Plant trait, Worm phenotype, etc.
OMIM
EQ definitions
Aristotelian definitions (genus-differentia)
A <Q> *which* inheres_in an <E>
[Term]
id: MP:0001262
name: decreased body weight
namespace: mammalian_phenotype_xp
synonym: low body weight
synonym: reduced body weight
def: "lower than normal average weight" []
is_a: MP:0001259 ! abnormal body weight
intersection_of: PATO:0000583 ! decreased weight
intersection_of: inheres_in MA:0002405 ! adult mouse
Query                      # of records
"large bone"               713
"enlarged bone"            136
"big bones"                16
"huge bones"               4
"massive bones"            28
"hyperplastic bones"       8
"hyperplastic bone"        34
"bone hyperplasia"         122
"increased bone growth"    543
Phenotypic information captured differently within the same domain (OMIM)
Phenotypic information captured differently across different domains
MP:0001265 – decreased body size MP:0001255 – decreased body height
WBPhenotype0000229 – small
OMIM %210710 – short stature
[Term]
id: MP:0001265 ! decreased body size
intersection_of: PATO:0000587 ! decreased size
intersection_of: inheres_in MA:0002405 ! adult mouse

[Term]
id: MP:0001255 ! decreased body height
intersection_of: PATO:0000569 ! decreased height
intersection_of: inheres_in MA:0002405 ! adult mouse

[Term]
id: WBPhenotype0000229 ! small
intersection_of: PATO:0000587 ! decreased size
intersection_of: OBO_REL:inheres_in WBls:0000041 ! Adult

[Term]
id: OMIM:xxxxxxx ! short stature
intersection_of: PATO:0000587 ! decreased size
intersection_of: OBO_REL:inheres_in FMA:20394 ! Body

[Term]
id: OMIM:xxxxxxx ! short stature
intersection_of: PATO:0000569 ! decreased height
intersection_of: OBO_REL:inheres_in FMA:20394 ! Body
Logical definitions allow for cross species – domain links
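A minimal sketch of how such logical definitions create cross-species links: pre-composed terms from different ontologies that share a PATO quality end up in the same group. The tuples below copy identifiers from the stanzas on the preceding slide; the function name `group_by_quality` is hypothetical, and the OMIM entries are left out because their identifiers are elided in the source.

```python
from collections import defaultdict

# (term id, label, PATO quality id, bearer entity id), taken from the slide.
DEFINITIONS = [
    ("MP:0001265", "decreased body size", "PATO:0000587", "MA:0002405"),
    ("MP:0001255", "decreased body height", "PATO:0000569", "MA:0002405"),
    ("WBPhenotype0000229", "small", "PATO:0000587", "WBls:0000041"),
]


def group_by_quality(definitions):
    """Group pre-composed term ids by the PATO quality in their definition."""
    groups = defaultdict(list)
    for term_id, _label, quality_id, _entity_id in definitions:
        groups[quality_id].append(term_id)
    return dict(groups)


links = group_by_quality(DEFINITIONS)
# Mouse and worm terms meet at "decreased size" (PATO:0000587).
print(links["PATO:0000587"])
```

Grouping on the quality alone is the simplest possible link; a fuller treatment would also relate the bearer entities (adult mouse, adult worm, human body) through an anatomy bridge.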
Suzie Lewis....
Experimental Design
• Annotate 11 human disease genes, and their homologs
• Develop search algorithm that utilizes the ontologies for comparison
• Test search algorithm by asking, “given a set of phenotypic descriptions (EQ stmts), can we find…”
– alleles of the same gene
– homologs in different organisms
– members of a pathway (same organism)
– members of a pathway (other organisms)
Strategy for Annotation
• Leverage OMIM gene and related disease records
• Use FMA, CL, GO, EHDAA, CHEBI, PATO ontologies
• Annotate 5 (in parallel) to check for curator consistency
• Annotate fly & fish orthologs (FB, ZFA)
• Import mouse ortholog data (MA, MP)
Testing the methodology
Annotated 11 gene-linked human diseases described in OMIM, and their homologs in zebrafish and fruitfly:
Gene      Disease
ATP2A1    Brody Myopathy
EPB41     Elliptocytosis
EXT2      Multiple Exostoses
EYA1      BOR syndrome
FECH      Protoporphyria
PAX2      Renal-Coloboma Syndrome
SHH       Holoprosencephaly
SOX9      Campomelic Dysplasia
SOX10     Peripheral Demyelinating Neuropathy
TNNT2     Familial Hypertrophic Cardiomyopathy
TTN       Muscular Dystrophy
An OMIM Record
Annotation Results
Gene          # genotypes    Phenotype statements (total)    Average/allele
ATP2A1        5              16                              3
EPB41         4              18                              4
EXT2          5              35                              7
EYA1*         16             335                             19
FECH          14             37                              3
PAX2*         24             183                             8
SHH           19             207                             9
SOX9*         13             321                             23
SOX10*        15             192                             12
TNNT2         10             36                              4
TTN           21             63                              3
Total (11)    146            1443
Ontology-based similarity scoring
Measure IC of any node:
Compute ‘similarity’ by finding IC ratios between any genotypes, genes, classes, etc.
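The formula for IC did not survive on this slide. A common definition, assumed here rather than taken from the talk, is IC(c) = -log2 p(c), where p(c) is the fraction of annotated items that are annotated to class c or one of its descendants. A minimal sketch under that assumption (the function name is hypothetical):

```python
import math


def information_content(n_annotated_to_class: int, n_total: int) -> float:
    """IC of a class, assuming IC(c) = -log2(p(c)) with p(c) = n_c / n_total.

    n_annotated_to_class counts items annotated to the class or any of its
    descendants; n_total counts all annotated items.
    """
    if not 0 < n_annotated_to_class <= n_total:
        raise ValueError("need 0 < n_annotated_to_class <= n_total")
    # log2(n_total / n_c) equals -log2(n_c / n_total).
    return math.log2(n_total / n_annotated_to_class)


# A class covering 1 of 8 items is more informative than one covering all 8.
print(information_content(1, 8))  # 3.0
print(information_content(8, 8))  # 0.0
```

Under this definition, rarely used classes carry high IC and very general classes carry almost none, so matches on specific phenotypes weigh more in a similarity score.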
Ontology-based Search Algorithm
Given a query node q, we try to find hits h1, h2,... that are of the same type as q, and are similar to q in terms of their annotation profile, A(q).
First step: create an annotation profile for the thing to be searched (i.e., a gene)
The annotation profile is the set of classes used to annotate that entity, and their ancestors
Comparing annotation profiles using same similarity IC metric
c ∈ A(q) iff link(r, q, c)
link(influences, sox9, curvature-of-tibia) → link(influences, sox9, morphology-of-bone)
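The profile construction can be sketched as follows: take the classes used to annotate an entity, add all their ancestors, then compare two profiles. The toy hierarchy in `PARENTS` is hypothetical apart from the curvature-of-tibia to morphology-of-bone step shown on the slide, and the overlap score used here (shared classes divided by all classes) is an assumption standing in for the talk's sim ratio, whose exact definition is not given.

```python
# Hypothetical toy hierarchy: class -> direct parent classes.
PARENTS = {
    "curvature-of-tibia": {"morphology-of-bone"},
    "length-of-femur": {"morphology-of-bone"},
}


def ancestors(cls, parents):
    """Return every class reachable from cls by following parent links."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def annotation_profile(direct_annotations, parents):
    """A(q): the classes used to annotate q, plus all of their ancestors."""
    profile = set(direct_annotations)
    for cls in direct_annotations:
        profile |= ancestors(cls, parents)
    return profile


def sim_ratio(profile_a, profile_b):
    """Assumed overlap score: shared classes over all classes in either."""
    union = profile_a | profile_b
    return len(profile_a & profile_b) / len(union) if union else 0.0


query = annotation_profile({"curvature-of-tibia"}, PARENTS)
hit = annotation_profile({"length-of-femur"}, PARENTS)
print(sorted(query))
print(sim_ratio(query, hit))
```

Because ancestors are included, two entities annotated with different specific classes still overlap at the shared ancestor, which is what makes the comparison tolerant of annotation granularity.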
Yes, we can find alleles of same gene
Columns 3 to 5 describe allelic phenotype profiles; columns 6 and 7 describe phenotype statements.
Gene          # genotypes    # alleles >0 sim ratio    Average sim ratio    Average IC ratio    Total    Average/allele
ATP2A1        5              5                         0.8                  0.799               16       3
EPB41         4              4                         0.315                0.422               18       4
EXT2          5              5                         1                    1                   35       7
EYA1*         16             16                        0.226                0.229               335      19
FECH          14             14                        0.365                0.364               37       3
PAX2*         24             24                        0.068                0.063               183      8
SHH           19             19                        0.457                0.414               207      9
SOX9*         13             13                        0.207                0.197               321      23
SOX10*        15             13                        0.038                0.031               192      12
TNNT2         10             10                        0.517                0.505               36       4
TTN           21             19                        0.106                0.1                 63       3
Total (11)    146            142                                                                1443
UBERON: an anatomical linking ontology
Each organism has its own anatomical ontology
To connect annotations across species, need a way to link the anatomies
Wanted an ontology that incorporated both functional homology and anatomical similarity
Created an ontology linking anatomies from ZFA, FMA, XAO, MA, MIAA, WBbt, FBbt
UBERON connects phenotype entities from separate anatomy ontologies
Homologs are found by similarity search
Gene      simIC human/mouse    simIC human/zebrafish
ATP2A1    0.047                0.177
EPB41     0.328                0.141
EXT2      0.067                0.050
EYA1      0.264                0.495
FECH      0.430                0.101
PAX2      0.157                0.375
SHH       0.091                0.253
SOX9      0.226                0.383
SOX10     0.380                0.443
TNNT2     0.000                0.118
TTN       0.248                0.567
shha is phenotypically similar to homologous pathway members
zebrafish shh pathway    mouse homologs            human homologs
shha                     Shh                       SHH
smo                      Smo                       -
disp1                    Disp1                     -
prdm1a                   Prdm1                     -
hdac1                    -                         HDAC4
scube2                   -                         -
wnt11                    Wnt1, 7b, 3a, 9b, 10b     WNT6
gli1,2a                  Gli2, Gli3                GLI2
bmp2b                    Bmp4                      -
ndr1,2                   -                         NDRG1
hhip                     Hhip                      -
ptc1,ptc2                Ptch1,2                   -
-                        Rab23                     -
-                        Gas1                      -
-                        Nck1                      -
-                        Zic2                      -
notch1a                  Notch1,2                  -
-                        Gsk3b                     -
Potential candidates also found
Gene      Similarity    Characterization
dharma    0.483         Paired type homeodomain protein that has dorsal organizer inducing activity and is regulated by wnt signaling.
tbx16     0.401         T-box transcription factor that regulates mesenchyme to epithelial transition and LR patterning.
plod3     0.387         Lysyl hydroxylase and glycosyltransferase important for axonal growth cone migration.
ntl       0.382         T-box transcription factor important for notochord and mesoderm development.
kny       0.374         Glypican component of the wnt/PCP pathway.
tll1      0.372         Metalloprotease that can cleave Chordin and increase Bmp activity.
copa      0.372         Coatomer vesicular coat complex important for maintenance of the Golgi and ER transport. Important for notochord differentiation.
sfpq      0.369         RNA splicing factor required for cell survival and neuronal development.
lama1     0.369         Basement membrane protein important for eye and body axis development.
lamc1     0.367         Basement membrane protein important for eye development.
atp7a     0.365         Copper transporting ATPase.
atp2a1    0.363         Sarcoplasmic reticulum transmembrane ATPase that mediates calcium re-uptake.
flh       0.358         Homeobox gene important for notochord and epiphysis development. Anterior/posterior expression determined by wnt activity.
wnt5b     0.327         Extracellular cysteine rich glycoprotein required for convergent extension movements during posterior segmentation.
Results thus far
• Annotate 11 human disease genes, and their homologs
• Develop search algorithm that utilizes the ontologies for comparison
• Test search algorithm by asking, “given a set of phenotypic descriptions (EQ stmts), can we find…”
– alleles of the same gene
– homologs in different organisms
– members of a pathway (same organism)
– members of a pathway (other organisms)
Conclusions
• Ontologies help
• Promising new directions for ontology-based phenotype annotation
• Promising ways for identifying novel pathway members, generating hypotheses to test at the bench
Acknowledgements
NCBO-Berkeley
• Christopher Mungall
• Nicole Washington
• Mark Gibson
• Rob Bruggner
U of Oregon
• Monte Westerfield
• Melissa Haendel
Cambridge
• Michael Ashburner
• George Gkoutos (PATO)
• David Osumi-Sutherland
National Institutes of Health