use of semantic phenotyping to aid disease diagnosis

61
Use of semantic phenotyping to aid disease diagnosis Melissa Haendel July 10th, 2014

Upload: mhaendel

Post on 22-Apr-2015

216 views

Category:

Science


1 download

DESCRIPTION

Presented at Harvard CBMI to the Undiagnosed Disease Network 7.10.14

TRANSCRIPT

Page 1: Use of semantic phenotyping to aid disease diagnosis

Use of semantic phenotyping to aid disease diagnosis

Melissa HaendelJuly 10th, 2014

Page 2: Use of semantic phenotyping to aid disease diagnosis

Outline

Semantic Diagnosis of known diseases

Semantic similarity across species

Combining Exome analysis with cross-species semantic phenotyping

How much phenotyping is enough?

Page 3: Use of semantic phenotyping to aid disease diagnosis

The undiagnosed patient

Known disorders not recognized during prior evaluations?

Atypical presentation of known disorders?

Combinations of several disorders? Novel, unreported disorder?

Page 4: Use of semantic phenotyping to aid disease diagnosis

OMIM Query # Records

“large bone” 785

“enlarged bone” 156

“big bone” 16

“huge bones” 4

“massive bones” 28

“hyperplastic bones” 12

“hyperplastic bone” 40

“bone hyperplasia” 134

“increased bone growth” 612

Searching for phenotypes usingtext alone is insufficient

Page 5: Use of semantic phenotyping to aid disease diagnosis

The Challenge: Interpretation of Disease Candidates

?

What’s in the box? How are

candidates identified?

How do they compare?

Prioritized Candidates, Models, functional validation

M1

M2

M3

M4

...

Phenotypes

P1

P2

P3

Genotype info

G1

G2

G3

G4

Pathogenicity, frequency, protein interactions, gene expression, gene networks, epigenomics, metabolomics….

Page 6: Use of semantic phenotyping to aid disease diagnosis

What is an ontology?A set of logically defined, inter-related terms used to annotate data

Use of common or logically related terms across databases enables integration

Relationships between terms allow annotations to be grouped in scientifically meaningful ways

Reasoning software enables computation of inferred knowledge

Groups of annotations can be compared using semantic similarity algorithms

Page 7: Use of semantic phenotyping to aid disease diagnosis

Human Phenotype Ontology10,158 terms used to annotate:• Patients• Disorders• Genotypes• Genes• Sequence variantsIn human

Köhler et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 2014 Jan 1;42(1):D966-74.

Page 8: Use of semantic phenotyping to aid disease diagnosis

A human phenotype example

Abnormality of the eye

Vitreous hemorrhage

Abnormal eye morphology

Abnormality of the cardiovascular system

Abnormal eye physiology

Hemorrhage of the eye

Internal hemorrhage

Abnormality of the globe

Abnormality of blood circulation

Page 9: Use of semantic phenotyping to aid disease diagnosis

➔Phenotype annotations are unevenly distributed across different anatomical systems

Survey of Annotations in Disease Corpus

7,401 diseases99,045 annotations

Page 10: Use of semantic phenotyping to aid disease diagnosis

exome analysis

Recessive, De novo filters

Remove off-target, common variants, and variants not in known disease causing genes

Zemojtelet al., manuscript in presshttp://compbio.charite.de/PhenIX/

Target panel of 2,742 known Mendelian disease genes

Compare phenotype profiles using data from:HGMD, Clinvar, OMIM, Orphanet

Page 11: Use of semantic phenotyping to aid disease diagnosis

PhenIX performance testing

Simulated datasets for a given disease and inheritance model created by spiking DAG panel generated VCF file with mutations from HGMD

Page 12: Use of semantic phenotyping to aid disease diagnosis

PhenIX helped diagnose 11/38 patients

global developmental delay (HP:0001263)delayed speech and language development (HP:0000750)motor delay (HP:0001270)proportionate short stature (HP:0003508)microcephaly (HP:0000252)feeding difficulties (HP:0011968)congenital megaloureter (HP:0008676)cone-shaped epiphysis of the phalanges of the hand (HP:0010230)sacral dimple (HP:0000960)hyperpigmentated/hypopigmentated macules (HP:0007441)hypertelorism (HP:0000316)abnormality of the midface (HP:0000309)flat nose (HP:0000457)thick lower lip vermilion (HP:0000179)thick upper lip vermilion (HP:0000215)full cheeks (HP:0000293)short neck (HP:0000470)

Page 13: Use of semantic phenotyping to aid disease diagnosis

What to do when we can’t diagnose with a known disease?

Page 14: Use of semantic phenotyping to aid disease diagnosis

Outline

Semantic Diagnosis of known diseases

Semantic similarity across species

Combining Exome analysis with cross-species semantic phenotyping

How much phenotyping is enough?

Page 15: Use of semantic phenotyping to aid disease diagnosis

B6.Cg-Alms1foz/fox/J

increased weight,adipose tissue volume,

glucose homeostasis altered

ALSM1(NM_015120.4)[c.10775delC] + [-]

GENOTYPE

PHENOTYPE

obesity,diabetes mellitus, insulin resistance

increased food intake, hyperglycemia,

insulin resistance

kcnj11c14/c14; insrt143/+(AB)

Models recapitulate various phenotypic aspects of disease

?

Page 16: Use of semantic phenotyping to aid disease diagnosis

How much phenotype data?• Human genes have poor phenotype coverage

GWAS+

ClinVar +

OMIM

Page 17: Use of semantic phenotyping to aid disease diagnosis

How much phenotype data?• Human genes have poor phenotype coverage

• What else can we leverage?

GWAS+

ClinVar +

OMIM

Page 18: Use of semantic phenotyping to aid disease diagnosis

How much phenotype data?• Human genes have poor phenotype coverage

• What else can we leverage? …animal models

Orthology via PANTHER v9

Page 19: Use of semantic phenotyping to aid disease diagnosis

How much phenotype data?• Combined, human and model phenotypes can be

linked to >75% human genes.

Orthology via PANTHER v9

Page 20: Use of semantic phenotyping to aid disease diagnosis

Monarch phenotype data

Also in the system: Rat; IMPC; GO annotations; Coriell cell lines; OMIA; MPD; Yeast; CTD; GWAS; Panther, Homologene orthologs; BioGrid interactions; Drugbank; AutDB; Allen Brain …157 sources to date

Coming soon: Animal QTLs for pig, cattle, chicken, sheep, trout, dog, horse

Species Data source Genes Genotypes Variants Phenotype annotations

Diseases

mouse MGI 13,433 59,087 34,895 271,621  

fish ZFIN 7,612 25,588 17,244 81,406  

fly Flybase 27,951 91,096 108,348 267,900  

worm Wormbase 23,379 15,796 10,944 543,874  

human HPOA       112,602 7,401

human OMIM 2,970     4,437 3,651

human ClinVar 3,215   100,523 445,241 4,056

human KEGG 2,509     3,927 1,159

human ORPHANET 3,113     5,690 3,064

human CTD 7,414     23,320 4,912

Page 21: Use of semantic phenotyping to aid disease diagnosis

Survey of Annotations Disease/Model Corpus

Data from MGI, ZFIN, & HPO, reasoned over with cross-species phenotype ontology https://code.google.com/p/phenotype-ontologies/

➔Models have a different phenotype distribution

Page 22: Use of semantic phenotyping to aid disease diagnosis

Multiple ways to compare disease to models

Asserted models Inferred by orthology Inferred by gene enrichment Inferred by phenotypic similarity

Page 23: Use of semantic phenotyping to aid disease diagnosis

Models based on phenotypic similarity

Washington, N. L., Haendel, M. A., Mungall, C. J., Ashburner, M., Westerfield, M., & Lewis, S. E. (2009). Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation. PLoS Biol, 7(11). doi:10.1371/journal.pbio.1000247

Page 24: Use of semantic phenotyping to aid disease diagnosis

Problem: Clinical and model phenotypes are described differently

Page 25: Use of semantic phenotyping to aid disease diagnosis

lung

lung

lobular organ

parenchymatous organ

solid organ

pleural sac

thoracic cavity organ

thoracic cavity

abnormal lung morphology

abnormal respiratory system morphology

Mammalian Phenotype

Mouse Anatomy

FMA

abnormal pulmonary acinus morphology

abnormal pulmonary alveolus morphology

lungalveolus

organ system

respiratory system

Lower respiratory

tract

alveolar sac

pulmonary acinus

organ system

respiratory system

Human development

lung

lung bud

respiratory primordium

pharyngeal region

Another Problem: Data silos

develops_frompart_of

is_a (SubClassOf)

surrounded_by

Page 26: Use of semantic phenotyping to aid disease diagnosis

Solution: bridging semantics

Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5

anatomical structure

endoderm of forgut

lung bud

lung

respiration organ

organ

foregut

alveolus

alveolus of lung

organ part

FMA:lung

MA:lung

endoderm

GO: respiratory gaseous exchange

MA:lung alveolus

FMA: pulmonary

alveolus

is_a (taxon equivalent)

develops_frompart_of

is_a (SubClassOf)

capable_of

NCBITaxon: Mammalia

EHDAA:lung bud

only_in_taxon

pulmonary acinus

alveolar sac

lung primordium

swim bladder

respiratory primordium

NCBITaxon:Actinopterygii

Haendel, M. A. et al. (2014). Unification of multi-species vertebrate anatomy ontologies for comparative biology in Uberon. Journal of Biomedical Semantics 2014, 5:21. doi:10.1186/2041-1480-5-21

Page 27: Use of semantic phenotyping to aid disease diagnosis

Modular phenotype description

Entity (Anatomy, Spatial, Gene Ontology)BSPO: anterior region part_of ZFA:head

ZFA:heartZFA:ventral mandibular arch

GO:swim bladder inflation

Quality (PATO)Small sizeEdematousThickArrested

Page 28: Use of semantic phenotyping to aid disease diagnosis

Mammalian Phenotype Ontology

Smith et al. (2005). The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information. Genome Biol, 6(1). doi:10.1186/gb-2004-6-1-r7

10,097 terms used to annotate and query:• Genotypes• Alleles• GenesIn mice

Page 29: Use of semantic phenotyping to aid disease diagnosis

Phenotype representation requires more than “phenotype ontologies”

glucose metabolism (GO:00060

06)

Gene/protein function

data

glucose(CHEBI:17

234)

Metabolomics,

toxicogenomics

data

Disease & phenotyp

e data

type II diabetes mellitus

(DOID:9352)

pyruvate(CHEBI:15

361)

Disease Gene Ontology Chemical

pancreatic beta cell

(CL:0000169)

transcriptomic data

Cell

Page 30: Use of semantic phenotyping to aid disease diagnosis

Uberpheno – building a cross-species semantic framework

Köhler et al. (2014) Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research F1000Research 2014, 2:30

Page 31: Use of semantic phenotyping to aid disease diagnosis

Uberpheno construction

Page 32: Use of semantic phenotyping to aid disease diagnosis

Uberpheno construction

Page 33: Use of semantic phenotyping to aid disease diagnosis

Uberpheno construction

Page 34: Use of semantic phenotyping to aid disease diagnosis

Uberpheno construction

Page 35: Use of semantic phenotyping to aid disease diagnosis

OWLsim: Phenotype similarity across patients or organisms

https://code.google.com/p/owltools/wiki/OwlSim

Page 36: Use of semantic phenotyping to aid disease diagnosis

Visualizing phenotypic similarity

➔Each model recapitulates some of the disease phenotypes

Holoprosencephaly I (unknown gene, mapped to 21q22.3)compared to most similar mouse models

Page 37: Use of semantic phenotyping to aid disease diagnosis

Models of disease based on phenotypic similarity

Holoprosencephaly I (unknown gene, mapped to 21q22.3)compared to most similar mouse models

➔The ontologies enable comparison across species

Page 38: Use of semantic phenotyping to aid disease diagnosis

Outline

Semantic Diagnosis of known diseases

Semantic similarity across species

Combining Exome analysis with cross-species semantic phenotyping

How much phenotyping is enough?

Page 40: Use of semantic phenotyping to aid disease diagnosis

Exomiser results for the Undiagnosed Disease Program 11 previously diagnosed families Exomiser 2.0 identified the causative variants with a rank of at least 7/408 potential variants

23 families without identified disorders

We have now prioritized variants in STIM1, ATP13A2, PANK2, and CSF1R in 5 different families (2 STIM1 families)

Page 41: Use of semantic phenotyping to aid disease diagnosis

Exomiser performance on solved UDP cases

Exo Variant Exo Pheno Exo Exo no Mendelian Exo Novel0

1

2

3

4

5

6

7

8

9

10

11

top10top 5top candidate

Page 42: Use of semantic phenotyping to aid disease diagnosis

UDP_2731 candidatesChromosome Position Reference Allele Variant Allele GENE Phenotype score Variant Score Exomiser ScorechrX 19554576T C SH3KBP1 0.5051473 0.995576 0.7503617chr2 179658310T C TTN 0.64627105 0.79311335 0.71969223chr2 179632598C T TTN 0.64627105 0.79311335 0.71969223chr2 179567340G A TTN 0.64627105 0.79311335 0.71969223chr2 179553542G T TTN 0.64627105 0.79311335 0.71969223chr2 179549131C T TTN 0.64627105 0.79311335 0.71969223chr18 67836115G T RTTN 0.7629328 0.25979215 0.51136243chr18 67721492G C RTTN 0.7629328 0.25979215 0.51136243chr18 67673764T C RTTN 0.7629328 0.25979215 0.51136243

chrX 140993905-

GCTCCTTCTCCTCCACTTTATTGAGTATTTTCCAGAGTTCCCCTGAGAGAAGTCAGAGAACTTCTGAGGGTTTTGCACAGTCTCCTCTCCAGATTCCTGTGAGCT MAGEC1 0.5416666 0.85 0.6958333

chr6 30858858G A DDR1 0.37619072 1 0.68809533

chr3 129308149AGCCTCCCACCCCCACCCCCTCCCCACATCCCCAACCATACCTACCTTGAGA - PLXND1 0.34432834 0.95 0.64716417

chr5 37245866G A C5orf42 0.7855199 0.5 0.6427599chr5 37169169T C C5orf42 0.7855199 0.5 0.6427599chr6 42946264G A PEX6 0.7187602 0.5 0.6093801chr6 42931861G A PEX6 0.7187602 0.5 0.6093801chrX 53113897G C TSPYL2 0.59999996 0.4906897 0.5453448chr13 75911097T C TBC1D4 0.23643239 0.7895149 0.51297367chr13 75900510G A TBC1D4 0.23643239 0.7895149 0.51297367chr13 75861174- A TBC1D4 0.23643239 0.7895149 0.51297367chr18 67836115G T RTTN 0.7629328 0.25979215 0.51136243chr18 67721492G C RTTN 0.7629328 0.25979215 0.51136243chr18 67673764T C RTTN 0.7629328 0.25979215 0.51136243

Page 43: Use of semantic phenotyping to aid disease diagnosis

UDP_2731

Page 44: Use of semantic phenotyping to aid disease diagnosis

What if there aren’t any similar diseases or models?

UDP_1166

➔ Exomiser can utilize phenotypic similarity via the interactome

Page 45: Use of semantic phenotyping to aid disease diagnosis

Outline

Semantic Diagnosis of known diseases

Semantic similarity across species

Combining Exome analysis with cross-species semantic phenotyping

How much phenotyping is enough?

Page 46: Use of semantic phenotyping to aid disease diagnosis

How does the clinician know they’ve provided enough phenotyping?

How many annotations…? How many different categories? How many within each?

Page 47: Use of semantic phenotyping to aid disease diagnosis

Method Create a variety of “derived” diseases that

are less-specific

Assess the change in similarity between the derived disease and it’s parent.

Ask questions: Is the derived disease still considered

similar to the original disease? …or more similar to a different disease? Is it distinguishable beyond random?

Page 48: Use of semantic phenotyping to aid disease diagnosis

Image credit: Viljoen and Beighton, J Med Genet. 1992

Example: Schwartz-jampel Syndrome, Type I

Rare disease

Caused by Hspg2 mutation, a proteoglycan

~100 phenotype annotations

Page 49: Use of semantic phenotyping to aid disease diagnosis

Example: Schwartz-jampel Syndrome, Type Ito test influence of a single phenotypic category

Page 50: Use of semantic phenotyping to aid disease diagnosis

Schwartz-jampel Syndrome derivationsto test influence of a single phenotypic category

Page 51: Use of semantic phenotyping to aid disease diagnosis

Schwartz-jampel Syndrome derivations

Page 52: Use of semantic phenotyping to aid disease diagnosis

Example: Schwartz-jampel Syndrome, Type I

*

**

➔ When averaged over all diseases, the absence of a single phenotypic category has far less impact when there’s more breadth in annotations

Page 53: Use of semantic phenotyping to aid disease diagnosis

How much phenotyping is enough?

• How many annotations…? • How many different categories?• How many within each?

As much as is

reasonable

Page 54: Use of semantic phenotyping to aid disease diagnosis

Annotation Sufficiency Score• Measurement of breadth and depth of an

phenotype profile

• Uses human disease, mouse and fish* gene phenotype profiles to seed the individual phenotype scores

• Custom queries available via REST services• http://monarchinitiative.org/page/services

*soon to add more species

Page 55: Use of semantic phenotyping to aid disease diagnosis

Annotation Sufficiency Score

http://www.phenotips.orghttp://www.monarchinitiative.org

Page 56: Use of semantic phenotyping to aid disease diagnosis

Conclusions Semantic representation of patient

phenotypes can aid disease diagnosis

There exists a lot of phenotype data in model organisms that is complementary to known human data

Ontological integration and use of cross-species inferencing can aid prioritization of variants

The entire cross-species corpus can be utilized to support quality assurance processes for phenotype data capture

Page 57: Use of semantic phenotyping to aid disease diagnosis

NIH-UDPWilliam BoneMurat SincanDavid AdamsAmanda LinksDavid DraperJoie DavisNeal BoerkoelCyndi TifftBill Gahl

OHSUNicole VasileskyMatt BrushBryan LarawayShahim Essaid

Lawrence BerkeleyNicole WashingtonSuzanna LewisChris Mungall

UCSDAmarnath GuptaJeff GretheAnita BandrowskiMaryann Martone

U of PittChuck BoromeoJeremy EspinoBecky BoesHarry Hochheiser

AcknowledgmentsSanger

Anika OehlrichJules JacobsonDamian Smedley

TorontoMarta GirdeaSergiu DumitriuHeather TrangMike Brudno

JAXCynthia Smith

CharitéSebastian KohlerSandra DoelkenSebastian BauerPeter Robinson

Funding:NIH Office of Director: 1R24OD011883NIH-UDP: HHSN268201300036C

Page 58: Use of semantic phenotyping to aid disease diagnosis
Page 59: Use of semantic phenotyping to aid disease diagnosis

Candidate gene prioritization

Page 60: Use of semantic phenotyping to aid disease diagnosis

Survey of Annotations in Disease Corpus*

➔Most diseases impact >1 system

Page 61: Use of semantic phenotyping to aid disease diagnosis

PhenoViz: Integrate all human, mouse, and fish data to understand CNVs

Desktop application for differential diagnostics in CNVs

Explain manifestations of CNV diseases based on genes contained in CNV

E.g., Supravalcular aortic stenosis in Williams syndrome can be explained by haploinsufficiency for elastin Double the number of explanations using model data

Doelken, Köhler, et al. (2013) Dis Model Mech 6:358-72