PATO An Ontology of Phenotypic Qualities George Gkoutos University of Cambridge

Upload: kristian-harrell

Post on 18-Dec-2015


Page 1: PATO An Ontology of Phenotypic Qualities George Gkoutos University of Cambridge

PATO: An Ontology of Phenotypic Qualities

George Gkoutos, University of Cambridge

Page 2

Phenotype Information

• Literature
  – Qualitative descriptions
• Experimental data
  – Qualitative descriptions
  – Quantitative descriptions
• Various representation methodologies
• Complex phenotype data

Need for: “A platform for facilitating mutual understanding and interoperability of phenotype information across species and domains of knowledge amongst people and machines” …

Page 3

Representation of Phenotypic data

Organism attributes
• T – Species
• G – Genotype
• I – Strain
• S – Genotypic Sex
• A – Alleles at named loci
• E – Environmental/handling condition
• D – Age/stage of development

Assay: means of making observations

Phenotypic Character: any feature of the organism that is observed or 'assayed'.

Page 4

Assay Controlled Vocabulary

• Abnormality
• Relative_to
• Ranges of values
• Allows the schema to be dynamic

• Definition of qualities and their relations
• Explicit differences (between laboratories)
• Allows labs around the world to “plug in” their assays to the schema

[Diagram: one Assay linked to several Phenotypic Characters]

Page 5

Phenotypic character representation methodologies

Pre-composition
– Examples:
  – MGI Mouse genotype-phenotype annotation (Mammalian Phenotype)
  – Gramene trait annotation (Plant trait ontology)
  – etc.

Pre-composition often follows the compositional structure occasionally adopted by GO terms.

Positive/negative regulation of mitosis = positive/negative + regulation of mitosis (GO:0045839)

Increased/decreased angiogenesis = increased/decreased + angiogenesis (GO:0001525)

Advantages
• Easy for annotation
• Control
• Complex phenotypic information

Disadvantages
• Lack of rigidity
• Ontology management
• Expansion
• Quantitative data

Page 6

Methodologies (cont.): post-composition

The post-composition methodology describes a phenotype by naming the affected entity (the bearer), which could be an anatomical structure, a biological process, a particular function, etc., and the qualities that this entity possesses, which can be described in either qualitative or quantitative terms.

Advantages
• Ontology management
• Rigidity
• Expansion
• Quantitative data
• Advanced queries

Disadvantages
• Complex phenotypic information
• More difficult for annotation
• Need for constraints to ensure meaningful annotations

Page 7

Phenotype And Trait Ontology (PATO)

• An ontology of phenotypic qualities, which can be shared across different species and domains of knowledge.

• Qualities are the basic entities that we can perceive and/or measure:
  – colors, sizes, masses, lengths etc.

• Qualities inhere in entities: every entity comes with certain qualities, which exist as long as the entity exists.

• Qualities belong to a finite set of quality types (e.g. color, size etc.) and inhere in specific individuals. No two individuals can have the same quality, and each quality is specifically constantly dependent on the entity it inheres in.

Page 8

[Diagram: a Phenotypic Character is an EQ Phenotype Description. The Entity (E) comes from core ontologies (e.g. anatomy, behaviour, pathology); the Quality (Q) comes from PATO, which is species independent.]

Page 9

Simple phenotype descriptions

Phenotypic Character = entity + quality

mouse body weight = mouse anatomy: body + PATO: weight

eye colour = Drosophila anatomy: eye + PATO: colour

glucose concentration = ChEBI: glucose + PATO: concentration

Example: increased size hepatocellular carcinoma

hepatocellular carcinoma (MPATH:357) has_quality increased size (PATO:0000586)
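
The entity + quality pattern can be sketched as a small record type. This is a minimal Python illustration, not part of PATO or of any annotation tool: the class name `EQ` and its `render` method are hypothetical, and the identifiers are the ones quoted on this slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EQ:
    """One phenotypic character: an entity term plus a PATO quality term."""
    entity_id: str
    entity_label: str
    quality_id: str
    quality_label: str

    def render(self) -> str:
        # Mirrors the "<entity> has_quality <quality>" wording used on the slide.
        return (f"{self.entity_label} ({self.entity_id}) has_quality "
                f"{self.quality_label} ({self.quality_id})")


carcinoma = EQ("MPATH:357", "hepatocellular carcinoma",
               "PATO:0000586", "increased size")
print(carcinoma.render())
```

Keeping the entity and the quality as separate fields is what makes post-composed annotations queryable by either part.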

Page 10

Phenotype annotation model

[Diagram: an Assertion connects Genetic and Environment context, through a relationship, to an Entity and a Quality, with Units, Evidence and a Qualifier. Each Assertion also carries a Source, an Attribution (who makes the assertion) and Properties (when, what organization).]

Page 11

Annotation: Phenotypes in literature

Assertion: eya1 influences E = eye disc (FBbt:00001768), which appears Q = condensed (PATO:0001485)

Evidence: light microscopy

Source: PMID:8431945

Attribution: M. Ashburner

Date: 10/26/2007
Organization: FlyBase
Version: 1

Page 12

Quantitative Data

• PATO – part of a representation of qualitative phenotypic information

• More often than not it is important to record quantitative information that results from a specific measurement of a quality

• Measurements involve units (Phenotypic Character + Unit)

The tail of my mouse is 2.1 cm
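
A quantitative observation of this kind pairs a phenotypic character with a value and a unit. The sketch below is one hypothetical way to hold such a record in Python; the `Measurement` class, the `in_unit` helper and the small conversion table are assumptions made for illustration and are not taken from PATO or UO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Phenotypic character (entity + quality) plus a measured value and unit."""
    entity: str
    quality: str
    value: float
    unit: str


# Illustrative conversion factors to metres for a few length units.
TO_METRES = {"m": 1.0, "cm": 0.01, "mm": 0.001}


def in_unit(m: Measurement, target: str) -> float:
    """Convert a length measurement to another unit from TO_METRES."""
    return m.value * TO_METRES[m.unit] / TO_METRES[target]


tail = Measurement(entity="tail", quality="length", value=2.1, unit="cm")
print(round(in_unit(tail, "mm"), 6))
```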

Page 13

PATO & measurements

• UO – an ontology of units
• UO’s top-level division is between primary base units of a particular measure and units that are derived from base units
• Mapping between the various scalar qualities (such as weight, height, concentration etc.) and the corresponding units used to measure those qualities
• UO includes 264 terms, all of which are defined
• Email list (http://sourceforge.net/mailarchive/forum.php?forum_id=50613)

Page 14

Mapping PATO to the UO

Page 15

Linking quantitative data to qualitative descriptions

Measurement qualitative description Assay

range normality necessary & sufficient conditions

EQ descriptor high level annotation marking phenodeviance (e.g. MP)
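
One way to read this slide is that an assay supplies a normal range, and a measurement outside that range is given a qualitative label. The function below is a hedged sketch of that idea only; the name `qualitative_call` and the example range are hypothetical, not values from the talk.

```python
def qualitative_call(value: float, normal_low: float, normal_high: float) -> str:
    """Label a measurement relative to an assay-defined normal range."""
    if normal_low > normal_high:
        raise ValueError("normal_low must not exceed normal_high")
    if value < normal_low:
        return "decreased"
    if value > normal_high:
        return "increased"
    return "normal"


# Hypothetical normal range for a body-weight assay, in grams (assumption).
NORMAL_WEIGHT_G = (20.0, 30.0)

print(qualitative_call(17.5, *NORMAL_WEIGHT_G))
```

The returned label could then be paired with an entity to form an EQ description such as body + decreased weight.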

Page 16

Multiple phenotypic characters to describe complex phenotypes

SHH-/+ SHH-/-

shh-/+ shh-/-

Page 21

Phenotype (character) = entity + quality

P1 = eye + hypoteloric
P2 = midface + hypoplastic
P3 = kidney + hypertrophied

Entities (ZFIN): eye, midface, kidney

Qualities (PATO): hypoteloric, hypoplastic, hypertrophied

Page 22

Phenotype (character) = entity + quality

P1 = eye + hypoteloric
P2 = midface + hypoplastic
P3 = kidney + hypertrophied

Phenotype (phenotypic profile) = P1 + P2 + P3 = holoprosencephaly

Page 23

Assays for complex phenotype data & quantitative data

[Diagram: one Assay linked to several Phenotypic Characters]

• necessary
• necessary & sufficient
• phenodeviance

Page 24

Linking qualitative descriptions across species

Decomposition of precomposed phenotype ontologies by providing logical definitions based on PATO

Link annotations across different knowledge domains and species

Link phenotypic descriptions of human diseases to animal models

Page 25

Reconciling pre- and post-composed annotations

Retrospective PATO definitions of pre-coordinated terms in phenotype ontologies

Precomposed Ontologies
• Mammalian Phenotype
• Plant trait
• Worm phenotype
• etc.

OMIM

Page 26

EQ definitions

Aristotelian definitions (genus-differentia)

A <Q> *which* inheres_in an <E>

[Term]
id: MP:0001262
name: decreased body weight
namespace: mammalian_phenotype_xp
synonym: low body weight
synonym: reduced body weight
def: "lower than normal average weight" []
is_a: MP:0001259 ! abnormal body weight
intersection_of: PATO:0000583 ! decreased weight
intersection_of: inheres_in MA:0002405 ! adult mouse

Page 27
Page 28

Query                      # of records
"large bone"               713
"enlarged bone"            136
"big bones"                16
"huge bones"               4
"massive bones"            28
"hyperplastic bones"       8
"hyperplastic bone"        34
"bone hyperplasia"         122
"increased bone growth"    543

Phenotypic information captured differently within the same domain (OMIM)

Page 29

Phenotypic information captured differently across different domains

MP:0001265 – decreased body size MP:0001255 – decreased body height

WBPhenotype0000229 – small

OMIM %210710 – short stature

Page 30

[Term]
id: MP:0001265 ! decreased body size
intersection_of: PATO:0000587 ! decreased size
intersection_of: inheres_in MA:0002405 ! adult mouse

[Term]
id: MP:0001255 ! decreased body height
intersection_of: PATO:0000569 ! decreased height
intersection_of: inheres_in MA:0002405 ! adult mouse

[Term]
id: WBPhenotype0000229 ! small
intersection_of: PATO:0000587 ! decreased size
intersection_of: OBO_REL:inheres_in WBls:0000041 ! Adult

[Term]
id: OMIM:xxxxxxx ! short stature
intersection_of: PATO:0000587 ! decreased size
intersection_of: OBO_REL:inheres_in FMA:20394 ! Body

[Term]
id: OMIM:xxxxxxx ! short stature
intersection_of: PATO:0000569 ! decreased height
intersection_of: OBO_REL:inheres_in FMA:20394 ! Body

Logical definitions allow for cross-species and cross-domain links
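
To illustrate how a shared PATO genus links terms from different ontologies, the sketch below groups term IDs by their quality. It is a minimal example under stated assumptions: the tuple layout and the function `group_by_quality` are hypothetical, the rows are copied from the stanzas on this slide, and the OMIM stanzas are left out because their IDs are not given.

```python
from collections import defaultdict

# (term id, label, PATO genus, bearer) taken from the stanzas on this slide.
DEFINITIONS = [
    ("MP:0001265", "decreased body size", "PATO:0000587", "MA:0002405"),
    ("MP:0001255", "decreased body height", "PATO:0000569", "MA:0002405"),
    ("WBPhenotype0000229", "small", "PATO:0000587", "WBls:0000041"),
]


def group_by_quality(definitions):
    """Map each PATO quality ID to the pre-composed terms defined with it."""
    groups = defaultdict(list)
    for term_id, _label, quality, _bearer in definitions:
        groups[quality].append(term_id)
    return dict(groups)


for quality, terms in group_by_quality(DEFINITIONS).items():
    print(quality, terms)
```

Here the mouse term and the worm term meet at PATO:0000587 (decreased size), even though their source ontologies share no identifiers.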

Page 31

Suzie Lewis....

Page 32

Experimental Design

• Annotate 11 human disease genes, and their homologs
• Develop search algorithm that utilizes the ontologies for comparison
• Test search algorithm by asking, “given a set of phenotypic descriptions (EQ stmts), can we find…”
  – alleles of the same gene
  – homologs in different organisms
  – members of a pathway (same organism)
  – members of a pathway (other organisms)

Page 33

Strategy for Annotation

• Leverage OMIM gene and related disease records
• Use FMA, CL, GO, EHDAA, CHEBI, PATO ontologies
• Annotate 5 (in parallel) to check for curator consistency
• Annotate fly & fish orthologs (FB, ZFA)
• Import mouse ortholog data (MA, MP)

Page 34

Testing the methodology

Annotated 11 gene-linked human diseases described in OMIM, and their homologs in zebrafish and fruitfly:

Gene      Disease
ATP2A1    Brody Myopathy
EPB41     Elliptocytosis
EXT2      Multiple Exostoses
EYA1      BOR syndrome
FECH      Protoporphyria
PAX2      Renal-Coloboma Syndrome
SHH       Holoprosencephaly
SOX9      Campomelic Dysplasia
SOX10     Peripheral Demyelinating Neuropathy
TNNT2     Familial Hypertrophic Cardiomyopathy
TTN       Muscular Dystrophy

Page 35

An OMIM Record

Page 36

Annotation Results

                             phenotype statements
Gene          # genotypes    total    average/allele
ATP2A1        5              16       3
EPB41         4              18       4
EXT2          5              35       7
EYA1*         16             335      19
FECH          14             37       3
PAX2*         24             183      8
SHH           19             207      9
SOX9*         13             321      23
SOX10*        15             192      12
TNNT2         10             36       4
TTN           21             63       3
Total (11)    146            1443

Page 38

Ontology-based similarity scoring

Measure IC of any node:

Compute ‘similarity’ by finding IC ratios between any genotypes, genes, classes, etc.
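
The formula for IC shown on this slide did not survive in the transcript. A common definition, assumed here and not confirmed by the source, is IC(n) = -log2 p(n), where p(n) is the fraction of annotated items whose annotations fall at node n or below it. The function name and the counts in the example are hypothetical.

```python
import math


def information_content(n_annotated: int, n_total: int) -> float:
    """IC of an ontology node under the assumed definition -log2(p).

    n_annotated: items annotated to the node or any of its descendants.
    n_total:     all annotated items in the corpus.
    """
    if not 0 < n_annotated <= n_total:
        raise ValueError("need 0 < n_annotated <= n_total")
    return -math.log2(n_annotated / n_total)


# A class used by 1 of 8 items is more informative than one used by 4 of 8.
print(information_content(1, 8), information_content(4, 8))
```

Under this definition the root of the ontology, which covers every item, has an IC of zero, and rarely used classes score highest.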

Page 39

Ontology-based Search Algorithm

Given a query node q, we try to find hits h1, h2, ... that are of the same type as q, and are similar to q in terms of their annotation profile, A(q).

First step: create an annotation profile for the thing to be searched (e.g., a gene)

The annotation profile is the set of classes used to annotate that entity, and their ancestors

Comparing annotation profiles using the same IC similarity metric

c ∈ A(q) iff link(r, q, c)

link(influences, sox9, curvature-of-tibia) → link(influences, sox9, morphology-of-bone)
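
The annotation profile described above, direct classes plus all of their ancestors, can be sketched as a graph closure. This is an illustrative Python sketch: the `PARENTS` table is a hypothetical fragment (only the curvature-of-tibia to morphology-of-bone step comes from the slide; the `morphology` root is invented), and `annotation_profile` is not the authors' implementation.

```python
# Hypothetical parent links between phenotype classes.
PARENTS = {
    "curvature-of-tibia": ["morphology-of-bone"],
    "morphology-of-bone": ["morphology"],
    "morphology": [],
}


def annotation_profile(direct_classes, parents):
    """Return the direct annotation classes together with all their ancestors."""
    profile = set()
    stack = list(direct_classes)
    while stack:
        cls = stack.pop()
        if cls not in profile:
            profile.add(cls)
            stack.extend(parents.get(cls, []))
    return profile


sox9_profile = annotation_profile({"curvature-of-tibia"}, PARENTS)
print(sorted(sox9_profile))
```

Because ancestors are included, two genes annotated with different specific classes can still overlap at a more general class such as morphology-of-bone.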

Page 40

Yes, we can find alleles of same gene

                             allelic phenotype profiles                                         phenotype statements
Gene          # genotypes    # alleles >0 sim ratio    average sim ratio    average IC ratio    total    average/allele
ATP2A1        5              5                         0.8                  0.799               16       3
EPB41         4              4                         0.315                0.422               18       4
EXT2          5              5                         1                    1                   35       7
EYA1*         16             16                        0.226                0.229               335      19
FECH          14             14                        0.365                0.364               37       3
PAX2*         24             24                        0.068                0.063               183      8
SHH           19             19                        0.457                0.414               207      9
SOX9*         13             13                        0.207                0.197               321      23
SOX10*        15             13                        0.038                0.031               192      12
TNNT2         10             10                        0.517                0.505               36       4
TTN           21             19                        0.106                0.1                 63       3
Total (11)    146            142                                                                1443

Page 42

UBERON: an anatomical linking ontology

Each organism has its own anatomical ontology

To connect annotations across species, need a way to link the anatomies

Wanted an ontology that incorporated both functional homology and anatomical similarity

Created an ontology linking anatomies from ZFA, FMA, XAO, MA, MIAA, WBbt, FBbt
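
The linking idea can be sketched as a lookup from species-specific anatomy terms to a shared class. Everything in this example is a placeholder chosen for illustration: the `LINKS` table, the use of labels instead of real term IDs, and the function `same_structure` are assumptions, not UBERON's actual content or API.

```python
# Hypothetical mapping from (anatomy ontology, term label) to a linking class.
LINKS = {
    ("ZFA", "eye"): "eye",
    ("MA", "eye"): "eye",
    ("FMA", "Eye"): "eye",
    ("ZFA", "kidney"): "kidney",
    ("MA", "kidney"): "kidney",
}


def same_structure(a, b, links=LINKS):
    """True when both species-specific terms map to the same linking class."""
    linked_a, linked_b = links.get(a), links.get(b)
    return linked_a is not None and linked_a == linked_b


print(same_structure(("ZFA", "eye"), ("MA", "eye")))
print(same_structure(("ZFA", "eye"), ("MA", "kidney")))
```

With such a table, an EQ annotation on a zebrafish structure and one on a mouse structure can be compared as if they named the same entity.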

Page 43

UBERON connects phenotype entities from separate anatomy ontologies

Page 44

Homologs are found by similarity search

Gene      simIC human/mouse    simIC human/zebrafish
ATP2A1    0.047                0.177
EPB41     0.328                0.141
EXT2      0.067                0.050
EYA1      0.264                0.495
FECH      0.430                0.101
PAX2      0.157                0.375
SHH       0.091                0.253
SOX9      0.226                0.383
SOX10     0.380                0.443
TNNT2     0.000                0.118
TTN       0.248                0.567

Page 46

shha is phenotypically similar to homologous pathway members

zebrafish shh pathway    mouse homologs            human homologs
shha                     Shh                       SHH
smo                      Smo                       -
disp1                    Disp1                     -
prdm1a                   Prdm1                     -
hdac1                    -                         HDAC4
scube2                   -                         -
wnt11                    Wnt1, 7b, 3a, 9b, 10b     WNT6
gli1,2a                  Gli2, Gli3                GLI2
bmp2b                    Bmp4                      -
ndr1,2                   -                         NDRG1
hhip                     Hhip                      -
ptc1,ptc2                Ptch1,2                   -
-                        Rab23                     -
-                        Gas1                      -
-                        Nck1                      -
-                        Zic2                      -
notch1a                  Notch1,2                  -
-                        Gsk3b                     -

Page 47
Page 48

Potential candidates also found

Gene      Similarity    Characterization
dharma    0.483         Paired type homeodomain protein that has dorsal organizer inducing activity and is regulated by wnt signaling.
tbx16     0.401         T-box transcription factor that regulates mesenchyme to epithelial transition and LR patterning.
plod3     0.387         Lysyl hydroxylase and glycosyltransferase important for axonal growth cone migration.
ntl       0.382         T-box transcription factor important for notochord and mesoderm development.
kny       0.374         Glypican component of the wnt/PCP pathway.
tll1      0.372         Metalloprotease that can cleave Chordin and increase Bmp activity.
copa      0.372         Coatomer vesicular coat complex important for maintenance of the Golgi and ER transport. Important for notochord differentiation.
sfpq      0.369         RNA splicing factor required for cell survival and neuronal development.
lama1     0.369         Basement membrane protein important for eye and body axis development.
lamc1     0.367         Basement membrane protein important for eye development.
atp7a     0.365         Copper transporting ATPase.
atp2a1    0.363         Sarcoplasmic reticulum transmembrane ATPase that mediates calcium re-uptake.
flh       0.358         Homeobox gene important for notochord and epiphysis development. Anterior/posterior expression determined by wnt activity.
wnt5b     0.327         Extracellular cysteine rich glycoprotein required for convergent extension movements during posterior segmentation.

Page 49

Results thus far

• Annotate 11 human disease genes, and their homologs
• Develop search algorithm that utilizes the ontologies for comparison
• Test search algorithm by asking, “given a set of phenotypic descriptions (EQ stmts), can we find…”
  – alleles of the same gene
  – homologs in different organisms
  – members of a pathway (same organism)
  – members of a pathway (other organisms)

Page 50

Conclusions

• Ontologies help
• Promising new directions for ontology-based phenotype annotation
• Promising ways for identifying novel pathway members, generating hypotheses to test at the bench

Page 51

Acknowledgements

NCBO-Berkeley
• Christopher Mungall
• Nicole Washington
• Mark Gibson
• Rob Bruggner

U of Oregon
• Monte Westerfield
• Melissa Haendel

Cambridge
• Michael Ashburner
• George Gkoutos (PATO)
• David Osumi-Sutherland

National Institutes of Health