
Copyright © Daniel Rubin Stanford University [email protected] 1

The National Center for Biomedical Ontology

Daniel Rubin, MD, MS
Stanford Medical Informatics

Stanford – Berkeley – Mayo – Victoria – Buffalo – UCSF – Oregon – Cambridge

http://www.bioontology.org

Copyright © Daniel L. Rubin 2006

The biomedical data explosion

● Explosion in online biomedical data
  - Genomics (genetic sequences, SNPs)
  - Gene expression microarrays
  - Proteomics (mass spectrometry, protein arrays)
  - Tissue arrays, ICH

● Need for people & machines to make sense of massive data sets


Biomedical researchers use ontologies

● Controlled vocabulary for science

● Representation of biomedical knowledge, shared by humans and computers

● Terms for annotating experimental data

● Knowledge source for biomedical applications
  - Decision support
  - Natural language processing
  - Data integration

Ontologies are popping up everywhere


Ontology development is fragmented

● Many different groups/consortia create ontologies—efforts are uncoordinated

● Many different ontologies, overlapping content and variable quality

● Ontologies are not interoperable

● Data integration efforts are laborious

● Barriers to accessing and effectively using numerous existing ontologies


● Consortium of informaticians, biologists, clinicians, and ontologists, funded by the NIH Roadmap

● Ontology research and services
  - Ontology access, alignment, and management
  - Ontology-based annotation of large data sets
  - Enhance quality of ontology development
  - Collaboration with diverse biomedical projects


● Stanford: Tools for ontology search, alignment, versioning, and peer review

● Lawrence Berkeley Labs: Tools to use ontologies for data annotation

● Mayo Clinic: Tools for access to large controlled terminologies

● Univ. of Victoria: Tools for ontology visualization

● Univ. at Buffalo: Dissemination of best practices for ontology engineering

● Univ. of Cambridge, Univ. of Oregon, UCSF: Driving biomedical projects

[Figure: National Center for Biomedical Ontology overview. Labels: Biomedical Theory; Experimental Data; Capture and index experimental results; Relate experimental data to results from other sources; Information Integration; Visualization and Analysis; Biomedical Ontologies, Open Biomedical Ontologies (OBO); Annotations on Experimental Data, Open Biomedical Data (OBD); BioPortal.]


Core technologies for BioPortal

● Protégé:
  - Ontology visualization
  - Ontology alignment and version diff

● LexGrid:
  - Defines common information model for terminology content
  - Access to controlled terminologies in many different formats
  - Ontology content indexing and search

● Tiered Web app/services architecture


BioPortal Architecture: Unifying ontologies and annotations

Biomedical ontology challenges

● Find ontologies or terms of interest

● Visualize and navigate ontologies and annotated data

● Support distributed, collaborative ontology development

● Enable community-based evaluation of ontology quality

● Use ontologies to annotate data, and use annotations to make discoveries


LexGrid: Ontology indexing and search

● Terminological services
  - Search for ontology terms
  - Use homophone, exact and partial search
  - Map free-text to ontologies

● Ontology indexes and services
  - Lucene index on ontology terms, definitions, and synonyms
  - Global identifiers for terms
  - Ontology version information


LexGrid

Dynamic auto-completion of closest matching term
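The exact search, partial search, and auto-completion behaviors named on the LexGrid slides can be illustrated with a minimal in-memory sketch. This is not LexGrid's API or its Lucene index: the `Term` class, the `EX:` identifiers, and the three sample terms are assumptions made up for illustration, and homophone (sounds-like) matching is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """One ontology term. The identifier scheme is hypothetical."""
    term_id: str
    label: str
    synonyms: tuple = ()

    def names(self):
        # The label and every synonym are searchable, lower-cased.
        return [self.label.lower()] + [s.lower() for s in self.synonyms]


def exact_search(terms, query):
    """Terms whose label or a synonym equals the query (case-insensitive)."""
    q = query.strip().lower()
    return [t for t in terms if q in t.names()]


def partial_search(terms, query):
    """Terms whose label or a synonym contains the query as a substring."""
    q = query.strip().lower()
    return [t for t in terms if any(q in name for name in t.names())]


def autocomplete(terms, prefix, limit=5):
    """Names starting with the prefix, shortest (closest) match first."""
    p = prefix.strip().lower()
    hits = {name for t in terms for name in t.names() if name.startswith(p)}
    return sorted(hits, key=lambda name: (len(name), name))[:limit]


# Illustrative sample data, not taken from any real ontology release.
TERMS = [
    Term("EX:0001", "prostate ductal adenocarcinoma",
         ("ductal adenocarcinoma of prostate",)),
    Term("EX:0002", "prostate gland", ("prostate",)),
    Term("EX:0003", "holoprosencephaly"),
]
```

Calling `autocomplete(TERMS, "pro")` ranks the shortest matching name first, which is one simple reading of "closest matching term"; a production index would more likely use tokenized, scored queries.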

Mapping databases and text to ontologies

Prostate Ductal Adenocarcinoma

Work by Nigam Shah


Degree of Interest modeling

● Creates a user profile to identify relevant information

● Developed by monitoring the user activities (e.g. navigation actions, editing and annotations)

● Permits model-based highlighting or filtering of “interesting” entities in the ontology

● Based on Degree of Interest Trees (Stuart Card) and Mylar (Mik Kersten)
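The bullets above can be sketched as a small scoring model: each user action raises the interest score of the entity it touches, older interest decays, and a threshold drives highlighting or filtering. This is only an illustration under stated assumptions; the event weights, the decay factor, and the class name are hypothetical and are not taken from Mylar or DIaMOND.

```python
from collections import defaultdict

# Hypothetical weight per user action; the slides give no actual values.
EVENT_WEIGHTS = {"navigate": 1.0, "annotate": 2.0, "edit": 3.0}


class InterestModel:
    """Tracks a degree-of-interest score for each ontology entity."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.scores = defaultdict(float)

    def record(self, entity, action):
        # Age every existing score, then reward the entity just touched.
        for name in self.scores:
            self.scores[name] *= self.decay
        self.scores[entity] += EVENT_WEIGHTS[action]

    def highlighted(self, threshold=1.0):
        """Entities interesting enough to highlight."""
        return {e for e, s in self.scores.items() if s >= threshold}

    def filtered(self, entities, threshold=1.0):
        """Keep only interesting entities, preserving the input order."""
        keep = self.highlighted(threshold)
        return [e for e in entities if e in keep]


model = InterestModel(decay=0.5)
model.record("Heart", "navigate")
model.record("Lung", "edit")
model.record("Heart", "navigate")
```

Highlighting and filtering share the same scores here; the difference is only whether low-interest entities are de-emphasized or hidden.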


DIaMOND

● Degree of Interest Modeling for Ontology Navigation and Development

● Integrates Mylar degree of interest model (DOI) for Eclipse with Protégé

● Uses the DOI to provide adaptive visualizations of the ontology

Work by Tricia d’Entremont

[Figure: three ontology views labeled Without DOI, DOI Highlighting, and DOI Filtering]


Pictorial-guided ontology navigation

● Users are often interested in the ontology subset pertinent to
  - Biological scale (organ/tissue/cell/molecule)
  - Image regions (locations, components)

● Strategy: browse ontology views driven by the biological scale of the image

● Accomplished by annotating multi-scale images using ontologies to describe their contents

● Also enables image retrieval driven by ontology


Work by Nigam Shah


Navigating by different image scale


Visualizing ontology-annotated clinical trial data

Different clinical trials vary in treatments, inclusion, methodology, etc.

Work by Maleh Hernandez

Visualizing differences in Nevirapine trials


Challenges for community ontology development

● Need to communicate ways to improve & evolve ontologies
  - Missing attributes (e.g., definitions)
  - Class too broad, should be split or deleted
  - Class should be moved, renamed

● Current approach: email lists, F2F meetings
  - Ontology feedback is disconnected from the ontology
  - Cannot determine what parts of ontologies are stable, contentious, or evolving


Example email communications from fugo-discuss

● I'd like to propose a few relationships between some higher level classes:

study executes study_design
study_design has_factor owl:Thing

● What is the definition of biomaterial?

● Should biomaterial be a subclass of FuGO_54 study_object?

Ontology “marginal notes”

● Structured annotations on ontologies and their contents

● Capture community feedback on ontologies

● Localized to parts of ontology to which they apply

● Make explicit the types of ontology evolutionary changes
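A marginal note, as described in these bullets, can be pictured as a small structured record attached to one ontology entity and tagged with an explicit change type. The sketch below is an assumption about what such a record might hold, not the schema used by the NCBO tools; the `ChangeType` values follow the kinds of change listed earlier in the talk, the note texts paraphrase the fugo-discuss examples, and the author names are placeholders.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class ChangeType(Enum):
    """Explicit kinds of proposed ontology change (illustrative list)."""
    ADD_DEFINITION = "add definition"
    ADD_RELATIONSHIP = "add relationship"
    SPLIT_CLASS = "split class"
    DELETE_CLASS = "delete class"
    MOVE_CLASS = "move class"
    RENAME_CLASS = "rename class"


@dataclass(frozen=True)
class MarginalNote:
    target: str              # the class or property the note is attached to
    change_type: ChangeType
    author: str              # placeholder reviewer names below
    text: str


def notes_for(notes, target):
    """All notes localized to one part of the ontology."""
    return [n for n in notes if n.target == target]


def activity_by_target(notes):
    """Note counts per entity, a rough signal of what is contentious."""
    return Counter(n.target for n in notes)


NOTES = [
    MarginalNote("study", ChangeType.ADD_RELATIONSHIP, "reviewer_a",
                 "Propose: study executes study_design"),
    MarginalNote("biomaterial", ChangeType.ADD_DEFINITION, "reviewer_b",
                 "What is the definition of biomaterial?"),
    MarginalNote("biomaterial", ChangeType.MOVE_CLASS, "reviewer_c",
                 "Should biomaterial be a subclass of study_object?"),
]
```

Because every note names its target, feedback stays connected to the ontology, and counting notes per target gives a first approximation of which parts are stable and which are under discussion.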


Work by Ravi Tiruvury and Kaustubh Supekar


Ontology metadata and peer review

● Variable ontology quality; no venue for community rating of ontologies

● Building a peer review platform for ontologies based on “Web of Trust”

● Providing tools to enable community to evaluate and improve ontology quality

Peer review of ontologies

Metadata Ontology

Web-based tools to enter ontology metadata, post reviews, and rate reviewers

Work by Kaustubh Supekar
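One simple way to combine posted reviews with reviewer ratings is a trust-weighted average, where a review counts for more when its author is more trusted. The slides do not say how the "Web of Trust" platform computes scores, so the function below is only an assumed illustration; the trust values, the default trust, and the reviewer names are hypothetical.

```python
def weighted_rating(reviews, reviewer_trust, default_trust=0.5):
    """Trust-weighted mean rating for one ontology.

    reviews: iterable of (reviewer, rating) pairs.
    reviewer_trust: mapping of reviewer to a trust weight in [0, 1].
    Returns None when there is nothing to average.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for reviewer, rating in reviews:
        # Unknown reviewers fall back to a neutral, assumed default trust.
        weight = reviewer_trust.get(reviewer, default_trust)
        weighted_sum += weight * rating
        total_weight += weight
    if total_weight == 0:
        return None
    return weighted_sum / total_weight


# Hypothetical reviewers and ratings on a 1 to 5 scale.
REVIEWS = [("alice", 5), ("bob", 1)]
TRUST = {"alice": 0.75, "bob": 0.25}
```

With these sample values the trusted reviewer dominates the score, whereas a plain average would weigh both reviews equally.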


Preliminary results

● Two unrelated biomedical knowledge sources (ZFIN, OMIM)

● Each annotated using ontologies to describe phenotypes

● Search for similar phenotype annotations to discover disease genes

● Example: holoprosencephaly genes


[Figure: panels labeled SHH-/+, SHH-/-, shh-/+, and shh-/-, comparing zebrafish and human]

The SHH gene was known to be associated with human holoprosencephaly.

Are any other genes associated?

Encoding disease phenotypes

Phenotype (clinical sign) = entity + quality

P1 = eye + hypoteloric
P2 = midface + hypoplastic
P3 = kidney + hypertrophied

Syndrome (disease) = P1 + P2 + P3 = holoprosencephaly
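The encoding on this slide maps directly onto a small data structure: a phenotype is an (entity, quality) pair, and a syndrome is a named set of phenotypes. The sketch below uses the three phenotypes from the slide; the class and field names are assumptions chosen for illustration, and in practice the entity and quality would be ontology term identifiers rather than plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenotype:
    """A clinical sign: an anatomical entity plus a quality."""
    entity: str
    quality: str


@dataclass(frozen=True)
class Syndrome:
    """A disease described as a set of phenotypes."""
    name: str
    phenotypes: frozenset


P1 = Phenotype("eye", "hypoteloric")
P2 = Phenotype("midface", "hypoplastic")
P3 = Phenotype("kidney", "hypertrophied")

HOLOPROSENCEPHALY = Syndrome("holoprosencephaly", frozenset({P1, P2, P3}))
```

Frozen dataclasses are hashable, so phenotypes can be placed in sets and compared across data sources by simple set operations.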


Finding human disease genes

[Figure: human holoprosencephaly, zebrafish shh, and zebrafish oep linked by similar phenotypes; gene homology?]

1. Search ontology-annotated data for genes with similar phenotypes

2. Orthologs in human may cause disease?
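Step 1 can be sketched as ranking genes by how much their phenotype annotations overlap with a query disease profile. The example below uses Jaccard similarity over exact (entity, quality) pairs, which is an assumed simplification rather than the method used in the talk; a real search would likely also exploit the ontology hierarchy. The gene names and their annotations are invented placeholders, not ZFIN or OMIM data.

```python
def jaccard(a, b):
    """Overlap of two annotation sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_genes(query, gene_annotations, min_score=0.0):
    """Genes scoring above min_score, most similar first (ties by name)."""
    scored = [(gene, jaccard(query, phenotypes))
              for gene, phenotypes in gene_annotations.items()]
    hits = [(gene, score) for gene, score in scored if score > min_score]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Query profile: the three phenotypes used to encode holoprosencephaly.
QUERY = {
    ("eye", "hypoteloric"),
    ("midface", "hypoplastic"),
    ("kidney", "hypertrophied"),
}

# Invented placeholder annotations for hypothetical genes.
ANNOTATIONS = {
    "gene_a": {("eye", "hypoteloric"), ("midface", "hypoplastic")},
    "gene_b": {("kidney", "hypertrophied"), ("fin", "shortened")},
    "gene_c": {("heart", "enlarged")},
}

RANKED = rank_genes(QUERY, ANNOTATIONS)
```

Genes near the top of the ranking are the candidates whose human orthologs would then be examined in step 2.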

[Figure: National Center for Biomedical Ontology overview. Labels: Biomedical Theory; Experimental Data; Capture and index experimental results; Relate experimental data to results from other sources; Information Integration; Visualization and Analysis; Biomedical Ontologies, Open Biomedical Ontologies (OBO); Annotations on Experimental Data, Open Biomedical Data (OBD); BioPortal.]


Acknowledgements

● National Center for Biomedical Ontology
  - Executive Team: Mark Musen, Suzanna Lewis, Daniel Rubin, Sima Misra
  - cBiO staff: Natasha Noy, Tim Redmond, Lynn Murphy, Archana Verbakam, Chris Mungall, John Day-Richter, Mark Gibson, ShengQiang Shu, Nicole Washington, Harold Solbrig, Deepak Sharma, James Buntrock, Tom Johnson, Chris Callendar
  - Collaborators: Michael Ashburner, Monte Westerfield, Ida Sim, Chris Chute, Barry Smith, Peggy Storey, Richard Olshen, Werner Ceusters, Deborah McGuinness
  - Students & post-docs: Kaustubh Supekar, Nigam Shah, Fabian Neuhaus, Tricia d'Entremont, Maria-Elena Hernandez, Sean Falconer, Ravi Tiruvury

● Funded through NIH Roadmap for Medical Research grant U54 HG004028
  - Program officer: Peter Good (NIGMS)
  - Lead Science Officer: Carol Bean (NCRR)

Contact information
Center: [email protected]

Thank you.