ncbo haendel talk 2013

Post on 24-May-2015


DESCRIPTION

Part of the NCBO seminar series http://www.bioontology.org/webinar-series

TRANSCRIPT

Removing roadblocks: leveraging ontologies for data aggregation and computation

NCBO Seminar series
March 6th, 2013

Melissa Haendel
On behalf of very many team members

Topics for today

• The Research Symbiosis
• Some Integration Projects Leveraging Ontologies
  – A more complete research profile – integrating research resources and person information
  – Improving query across multiple biospecimen repositories
  – Identifying disease candidates by leveraging cross-species anatomy and phenotype queries

The Research Symbiosis

[Diagram labels: Get funding; Do Experiments; Publish papers; Contribute to Databases; Share Resources/Data; Consult Databases]

The Web

We’ve all been here before:

Ontologies can help us do better.

OMIM Query                 # of records
"large bone"               1032
"enlarged bone"            207
"big bones"                22
"huge bones"               4
"massive bones"            39
"hyperplastic bones"       12
"hyperplastic bone"        44
"bone hyperplasia"         173
"increased bone growth"    836

Why not just map to ontology terms?

Class A                              Class B                      Mapped?  Useful?
FMA: extensor retinaculum of wrist   MouseAnatomy: retina         Yes      No
Vivo: legal decision                 Cognitive Atlas: decision    Yes      No
PlantOntology: Pith                  MouseAnatomy: medulla        Yes      No
TaxRank: domain                      NCI: protein domain          Yes      No
ZfishAnat: hypophysis                MouseAnatomy: pituitary      No       Yes
FMA: tibia                           FlyAnatomy: tibia            Yes      No
FMA: colon                           GAZ: Colón, Panama           Yes      No
Quality: male                        Chebi: maleate 2(-)          Yes      No

Mapping requires manual work to perform and maintain; string matching for mapping can lead to spurious results; semantics of mappings and provenance are not always clear
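To make the string-matching problem concrete, here is a minimal sketch of a naive label matcher run over some of the label pairs from the table. The matching rule (accent-insensitive substring test) and the function names are assumptions chosen for illustration; they do not describe how any particular mapping tool works.

```python
import unicodedata

# Label pairs taken from the mapping table; the matcher itself is hypothetical.
PAIRS = [
    ("extensor retinaculum of wrist", "retina"),
    ("legal decision", "decision"),
    ("domain", "protein domain"),
    ("hypophysis", "pituitary"),
    ("tibia", "tibia"),
    ("colon", "Colón, Panama"),
    ("male", "maleate 2(-)"),
]


def normalize(label):
    """Lowercase a label and strip accents so 'Colón' compares equal to 'colon'."""
    decomposed = unicodedata.normalize("NFKD", label.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_match(label_a, label_b):
    """Declare a mapping whenever one normalized label contains the other."""
    a, b = normalize(label_a), normalize(label_b)
    return a in b or b in a


results = [naive_match(a, b) for a, b in PAIRS]

for (a, b), mapped in zip(PAIRS, results):
    print(f"{a!r} <-> {b!r}: mapped={mapped}")
```

Every pair except hypophysis/pituitary is reported as mapped, which is the reverse of what a curator would want: the one useful correspondence is missed and the spurious ones are accepted.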


CTSAconnect: A Linked Open Data approach to represent clinical and research expertise, activities, and resources

CTSA 10-001: 100928SB23
PROJECT #: 00921-0001

Research generates many resources that are rarely shared or published:

About eagle-i: inventories “invisible” resources

Ontology-system for collecting and querying research resources

eagle-i.net

About VIVO
• Primarily focused on people, activities, and outcomes typically associated with research networking
• Eager to represent more diverse components of expertise across domains, e.g., exhibits, performances, specifics about research
• Had worked with core facilities at Cornell to represent labs, equipment, and services
• Started collaborating with eagle-i to go further with research resources

At the intersection of VIVO and eagle-i

www.ctsaconnect.org
CTSAconnect: Reveal Connections. Realize Potential.

And then was born the “CTSAconnect” project

Ok, so it is perhaps not a very informative name for an effort to consolidate researcher, research activities, and research resource representation, but what else are we going to call it?

ARG! The Agents, Resources, and Grants ontology

ISF Content and modularization

• eagle-i: Research resources
• VIVO: Person profiling
• ShareCenter: Discussions, requests, share documents

ISF: Contact, Organizations, Affiliations, Services, Events, Clinical, Expertise, Reagents, Organisms, Credentials

ISF Modularization

Constraints
• Different ontology modeling principles
• Active ongoing development of eagle-i and VIVO applications
• Investments in existing RDF datasets and the need for stable targets

Benefits
• Flexibility in what modules to populate at a given site
• Extensibility as needs and feedback influence future evolution

Annotation view shows approved or pending-approval status. Module view shows pending axiom changes per module and can save the changes with a log comment and generate the spreadsheet summary.

Protégé refactoring plugin

ISF Merging

Relating ICD9 to MeSH in support of clinical expertise

Clinical expertise data visualization

Building translational teams

We want to assemble teams of scientists to examine, for example, specific drugs released for repurposing

Hard to identify and connect complementary basic and clinical expertise across disciplines

Bringing together clinical expertise and basic science expertise

Representation of a clinician's expertise extracted from ICD-9 codes

Basic researcher with similar expertise based on MeSH terms

Resources: a resource related to autoimmune disease

Relating researchers across disciplines
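One way to picture this matching is sketched below: a clinician's ICD-9 codes are translated to MeSH headings and compared with the MeSH headings attached to researchers. The lookup table, the researcher names, and the use of Jaccard overlap as the score are all assumptions made for this example; the slides do not state how CTSAconnect computes similarity.

```python
from collections import Counter

# Illustrative ICD-9 -> MeSH lookup; a real mapping would be curated and far larger.
ICD9_TO_MESH = {
    "714.0": "Arthritis, Rheumatoid",
    "710.0": "Lupus Erythematosus, Systemic",
    "340": "Multiple Sclerosis",
}


def clinician_profile(icd9_codes):
    """Count how often each MeSH heading is implied by a clinician's ICD-9 codes."""
    return Counter(ICD9_TO_MESH[c] for c in icd9_codes if c in ICD9_TO_MESH)


def jaccard(terms_a, terms_b):
    """Overlap of two term collections: |A and B| / |A or B|."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_researchers(icd9_codes, researchers):
    """Return (score, name) pairs, best match first."""
    profile = clinician_profile(icd9_codes)
    scored = [(jaccard(profile, terms), name) for name, terms in researchers.items()]
    return sorted(scored, reverse=True)


# Hypothetical researchers with MeSH headings drawn from their publications.
researchers = {
    "researcher_a": {"Arthritis, Rheumatoid", "Autoimmunity"},
    "researcher_b": {"Zebrafish"},
}
ranking = rank_researchers(["714.0", "714.0", "710.0"], researchers)
print(ranking)
```

The counts in the clinician profile are kept so that a weighted score could replace the plain set overlap later; the sketch only uses the set of headings.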


OHSU’s Biolibrary and Search Engine

• Data aggregated from two repositories:
  – Department of Pathology repository (600K)
  – Knight Cancer Institute repository (16K)
• A web-based search engine over de-identified data
• Our group is applying semantic informatics to improve
  – Data format and quality
  – Data integration across the two repositories
  – Search capabilities

Funded by Medical Research Foundation of Oregon

Opportunities for improving the Biolibrary data

• Limited anatomical data
  – Cancer registry table has 300+ anatomical entities
  – Pathology table only 86
  – 99% of pathology reports (600K) have no anatomical codes
  – No anatomical relationships
  – Coded sites are not as specific as descriptions in the pathology reports

Current Search Interface

Two separate search interfaces

Multiple forms

Biolibrary Text Search

Syntactic free text search

Coded Syntactic Search

Search through anatomy and histology lists

Extracting ontology concepts

• Pathology reports were the main focus
  – Main source of data in the current system
  – Contain richer information
• NLP tools were used to identify concepts
• Existing ontology resources were used to add semantics
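As a rough illustration of concept identification, the sketch below looks up phrases from a small lexicon in report text. The lexicon, its labels, and the whole-word regular-expression matching are assumptions for this example; the talk does not say which NLP tools were used, and real tools handle much more (inflection, abbreviations, context).

```python
import re

# Hypothetical lexicon: surface phrase -> concept label (placeholder labels, not real IDs).
LEXICON = {
    "pancreas": "anatomy:pancreas",
    "bile duct": "anatomy:bile duct",
    "chronic pancreatitis": "disease:chronic pancreatitis",
}


def extract_concepts(text, lexicon=LEXICON):
    """Return (offset, phrase, concept) for every whole-word lexicon hit, in text order."""
    lowered = text.lower()
    found = []
    for phrase, concept in lexicon.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, lowered):
            found.append((match.start(), phrase, concept))
    return sorted(found)


report = "Bile duct tissue with chronic inflammation. Chronic pancreatitis."
hits = extract_concepts(report)
for offset, phrase, concept in hits:
    print(offset, phrase, concept)
```

Whole-word matching keeps "pancreas" from firing inside "pancreatitis", but it still cannot tell whether a finding is asserted, negated, or historical.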

Developing a Biospecimen ontology

[Diagram: ontologies used by the Biospecimen ontology]
• Phenotypes (PATO)
• Information Ontology (IAO)
• HPO, SNOMED, NCI Thesaurus, ICDO/ICD9, GO, CHEBI, Cell
• Anatomy (FMA, Uberon)
• Medicine (OGMS)

[Diagram labels: Classes, Types, Vocabulary (Pathology Catalog, Pathology Inventory, Pathology Report); Data, Instances (Instance #123, Instance #456); relations: Instantiates, Classifies as, Uses]

Structured data vs. pathology report (about 7K cases)

Available structured data from one case:

However, pathology report also includes:
• Low grade pancreatic intraepithelial neoplasia
• Extensive perineural invasion
• Acute and chronic cholecystitis
• Bile duct tissue with chronic inflammation
• Chronic pancreatitis
• Acute gastric serositis

Adding Logical Relationships

• About 400 anatomical entities were mapped to the Foundational Model of Anatomy
• An additional 300 to SNOMED
• Used the is_a and part_of relations
• Re-represented this in a semantic and computable format
• Allows for semantic queries
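The kind of semantic query this enables can be sketched as a walk over is_a and part_of edges. The edges and specimen records below are illustrative assumptions, not content copied from FMA or SNOMED, and the helper names are hypothetical.

```python
# Illustrative anatomy edges: term -> list of (relation, parent).
EDGES = {
    "head of pancreas": [("part_of", "pancreas")],
    "pancreas": [("part_of", "digestive system")],
    "common bile duct": [("is_a", "bile duct")],
    "bile duct": [("part_of", "biliary tree")],
}


def ancestors(term, edges=EDGES):
    """Collect every term reachable by following is_a and part_of edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in edges.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def specimens_in(region, specimens, edges=EDGES):
    """Return IDs of specimens whose coded site is the region or lies within it."""
    return sorted(
        specimen_id
        for specimen_id, site in specimens.items()
        if site == region or region in ancestors(site, edges)
    )


# Hypothetical specimen records: ID -> coded anatomical site.
specimens = {"S1": "head of pancreas", "S2": "common bile duct", "S3": "pancreas"}
print(specimens_in("pancreas", specimens))
print(specimens_in("biliary tree", specimens))
```

A syntactic search for "pancreas" would miss nothing here by luck of spelling, but a search for "biliary tree" finds S2 only because the relations are followed.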

Considerations
• Concept mapping helps with document retrieval
• Does not necessarily imply a fact
  – Negation
  – Differential diagnosis
  – Past case history
• Researchers will likely need aggregated facts from multiple sources to support real research queries
• Information extraction options are being explored as part of this work
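The gap between a concept mention and a fact can be shown with a deliberately crude context check. The cue lists, the five-word window, and the category names are assumptions invented for this sketch; they are not the information extraction approach under exploration in the project.

```python
import re

# Hypothetical cue phrases; real clinical NLP uses much richer rule sets or models.
NEGATION_CUES = ("no evidence of", "negative for", "without", "no")
HISTORY_CUES = ("history of", "status post")
DIFFERENTIAL_CUES = ("versus", "cannot rule out", "differential")


def has_cue(window, cues):
    """True if any cue occurs as a whole phrase inside the window."""
    return any(re.search(r"\b" + re.escape(cue) + r"\b", window) for cue in cues)


def classify_mention(sentence, concept):
    """Label a concept mention using the five words that precede it."""
    lowered = sentence.lower()
    index = lowered.find(concept.lower())
    if index < 0:
        return "not mentioned"
    window = " ".join(lowered[:index].split()[-5:])
    if has_cue(window, NEGATION_CUES):
        return "negated"
    if has_cue(window, HISTORY_CUES):
        return "historical"
    if has_cue(window, DIFFERENTIAL_CUES):
        return "differential"
    return "asserted"


for text in (
    "No evidence of chronic pancreatitis.",
    "History of chronic pancreatitis.",
    "Findings favor adenoma versus chronic pancreatitis.",
    "Chronic pancreatitis is present.",
):
    print(classify_mention(text, "chronic pancreatitis"), "<-", text)
```

All four sentences would be retrieved by a concept search for chronic pancreatitis, yet only the last one states it as a current finding.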


[Figure: Vertebrata, Ascidians, Arthropoda, Annelida, Mollusca, and Echinodermata, with structures labeled tetrapod limbs, ampullae, tube feet, and parapodia]

We want to understand gene function across taxa

Databasing phenotypes is hard

• Free text descriptions
• Clinical note
• Models
• Atlases
• Images
• Controlled terms
• Multiple file formats
• Measurements
• …

Sequence data: ATTCGGATTACCGTATTA… (genes, regulatory elements, …)

Databases proliferate


Ontologies as a tool for unification

[Diagram labels: Disease-phenotype databases; Disease phenotype ontology; Expression data; Gene function data; Cell and tissue ontology; GO; annotations; ontologies]

Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25(1). doi:10.1038/75556

Yet problems remain

• Incomplete data
• Not connected
• Missing & incorrect annotations
• Multiple overlapping ontologies
• Annotations miss the important biology

Ontologies built for one species will not work for others

http://fme.biostr.washington.edu:8080/FME/index.html

http://ccm.ucdavis.edu/bcancercd/22/mouse_figure.html

Uberon: a multi-species anatomy ontology

• Contents:
  – Over 8,000 classes (terms)
  – Multiple relationships, including subclass, part-of and develops-from
• Scope: metazoa (animals)
  – Current focus is chordates
  – Federated approach for other taxa
• Uberon classes are generic / species neutral
  – ‘mammary gland’: you can use this class for any mammal!
  – ‘lung’: you can use this class for any vertebrate (that has lungs)

Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5 http://genomebiology.com/2012/13/1/R5
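A species-neutral class is useful mainly as a join key. The sketch below groups annotations made with species-specific anatomy terms under a shared Uberon class. The bridge table, the record names, and the term strings (written as prefix plus label, not real identifiers) are placeholders assumed for this example.

```python
from collections import defaultdict

# Placeholder bridge: species-specific term -> species-neutral Uberon term.
BRIDGE = {
    "MA:cerebellum": "UBERON:cerebellum",
    "FMA:cerebellum": "UBERON:cerebellum",
    "ZFA:cerebellum": "UBERON:cerebellum",
    "MA:lung": "UBERON:lung",
    "FMA:lung": "UBERON:lung",
}


def group_by_uberon(annotations, bridge=BRIDGE):
    """Group record IDs by the Uberon class of their anatomy term.

    Terms without a bridge entry are kept under their original name.
    """
    grouped = defaultdict(list)
    for record_id, species_term in annotations:
        grouped[bridge.get(species_term, species_term)].append(record_id)
    return dict(grouped)


# Hypothetical expression records annotated in three species-specific ontologies.
annotations = [
    ("mouse_gene_1", "MA:cerebellum"),
    ("human_gene_1", "FMA:cerebellum"),
    ("fish_gene_1", "ZFA:cerebellum"),
    ("mouse_gene_2", "MA:lung"),
]
grouped = group_by_uberon(annotations)
print(grouped)
```

A query for the Uberon cerebellum class then returns mouse, human, and zebrafish records together, without any pairwise mapping between the three source ontologies.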

Bridging anatomy ontologies

[Figure: Uberon bridging ZFA, MA, FMA, EHDAA2, and EMAPA, with SNOMED, NCIt, GO, and CL also shown]

Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5.

Uberon enables queries across granularity

[Figure: the MA (mouse) and FMA (human) terms for cerebellum, cerebellar vermis (vermis of cerebellum), and posterior lobe of cerebellum, shown alongside the UBERON classes and linked to CL: Purkinje cell and to GO/NIF subcellular terms (axon, dendrite). Edges are labeled i and p.]

Niknejad, A., Comte, A., Parmentier, G., Roux, J., Frederic, B., & Robinson-Rechavi, M. (2011). vHOG, a multi-species vertebrate ontology of homologous organs groups. Ecology, 1-5.

http://bgee.unil.ch

Evo-devo applications

Dahdul, et al. 2010. Evolutionary characters, phenotypes and ontologies: curating data from the systematic biology literature. PLoS ONE 5(5):e10708. doi:10.1371/journal.pone.0010708

The Monarch Initiative
The model systems research network

We are under construction

Goals are to:
• Aggregate model systems genotype and phenotype information
• Integrate with network, genomic, and functional data
• Leverage ontologies for phenotype similarity matching
• Build knowledge exploration tools for end users
• Build services for other applications

Funded by NIH # 1R24OD011883-01

Can we search by phenotype alone?

Washington NL, Haendel MA, Mungall CJ, Ashburner M, et al. (2009) Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation. PLoS Biol 7(11): e1000247. doi:10.1371/journal.pbio.1000247 http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1000247
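Searching by phenotype alone depends on scoring how alike two phenotype profiles are even when they share no identical term. A minimal sketch, assuming a toy phenotype hierarchy and Jaccard overlap of ancestor closures as the score, is given below. This is a stand-in for illustration; it is not the OWLSim algorithm and the term names are hypothetical.

```python
# Toy phenotype hierarchy: term -> parent terms.
PARENTS = {
    "enlarged femur": ["enlarged bone"],
    "enlarged tibia": ["enlarged bone"],
    "enlarged bone": ["abnormal bone morphology"],
    "small eye": ["abnormal eye morphology"],
    "abnormal bone morphology": ["abnormal phenotype"],
    "abnormal eye morphology": ["abnormal phenotype"],
}


def closure(terms, parents=PARENTS):
    """Return the given terms together with all of their ancestors."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, []))
    return seen


def similarity(profile_a, profile_b):
    """Jaccard overlap of the two profiles' ancestor closures (0.0 to 1.0)."""
    a, b = closure(profile_a), closure(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


disease = {"enlarged femur"}
model_close = {"enlarged tibia"}
model_far = {"small eye"}
print(similarity(disease, model_close))
print(similarity(disease, model_far))
```

The two bone phenotypes score higher than the bone and eye pair because they share the more specific ancestor "enlarged bone", which is the intuition behind ranking animal models against a human disease profile.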

Integrating phenotypes using ontologies

But different organisms record genotypes differently

Phenotypes can be attached to full or partial genotypes, alleles, or variants

Model systems phenotype and genotype data

Pulling it together

[Diagram labels: NIF DISCO (Extensible Web resource DISCOvery, registration and interoperation framework); Data ingest; Ontology annotation; OWLSIM; ONTOQUEST; MONARCH tools and services; Enabling phenotype-based knowledge discovery tools]

These integration projects…well, integrate

[Diagram labels: CTSAconnect (Reveal Connections. Realize Potential.); OHSU Biolibrary; people; research resources; clinical encounters; phenotypes; biospecimens; genes; variations]

Conclusions

Ontologies have provided us the capability to integrate a variety of biomedical data, at different levels of granularity, from different applications, and across domains

Describing biology works best with multiple connected ontologies

• We need smart data, not just big data
• We need better tools to integrate multiple ontologies
• We need better tools to make use of smarter data structures (e.g. reasoning costs)

Monarch Initiative

CTSAconnect

Biospecimen Ontology

OHSU: Melissa Haendel, Carlo Torniai, Nicole Vasilevsky, Chris Kelleher, Shahim Essaid

Cornell University: Dean Krafft, Jon Corson-Rikert, Brian Lowe

University of Florida: Mike Conlon, Chris Barnes, Nicholas Rejack

OHSU: Melissa Haendel, Shahim Essaid, Carlo Torniai

OHSU: Melissa Haendel, Carlo Torniai, Shahim Essaid, Nicole Vasilevsky, Scott Hoffman, Matt Brush

LBNL: Chris Mungall, Suzi Lewis, Nicole Washington

UCSD/NIF: Maryann Martone, Anita Bandrowski, Jeff Grethe, Amarnath Gupta

Stony Brook University: Moises Eisenberg, Erich Bremer, Janos Hajagos

Harvard University: Daniela Bourges, Sophia Cheng

University at Buffalo: Barry Smith, Dagobert Soergel

Zaloni: Will Corbett, Ranjit Das, Ben Sharma

University of Pittsburgh: Harry Hochheiser, Chuck Borromeo
