biomedical informatics; cg chute - ?· biomedical informatics; cg chute © mayo clinic college of...

Download Biomedical Informatics; CG Chute - ?· Biomedical Informatics; CG Chute © Mayo Clinic College of Medicine…

Post on 01-Sep-2018

212 views

Category:

Documents

0 download

Embed Size (px)

TRANSCRIPT

  • Biomedical Informatics; CG Chute

    Mayo Clinic College of Medicine 201 1

    Integration of Knowledge and Data from Science to Clinical Practice

    Christopher G. Chute, MD DrPHBloomberg Distinguished Professor of Health InformaticsProfessor of Medicine, Public Health, and NursingChief Health Research Information OfficerDeputy Director, Institute for Clinical and Translational ResearchJohns Hopkins University, Baltimore, MD, USA

    Advancing Research in the Digital AgeBaltimore, 27 February 2018

    2007 Mayo2

    The Historical Center of theHealth Data Universe

    Clinical Data

    Billable Diagnoses

    Billable Diagnoses

    2007 Mayo3

    Copernican Healthcare

    Billable Diagnoses

    Clinical Data(Niklas Koppernigk)

    Clinical Guidelines

    Scientific LiteratureMedical Literature

    Clinical Data

    Origins of Big ScienceAstronomy

    4

    Sloan Digital Sky Survey III DR9

    Total area of imaging

    31,637 square degrees

    Image field size 1361x2048 pixels

    Number fields 938,046 (excluding supernovae runs)Catalog objects 1,231,051,050Number of unique, primary sources Total 469,053,874Stars 260,562,744Galaxies 208,478,448Unknown 12,682 5

    Images Spectra Object catalog Metadata

    Rutherford Table-top Experiment

    6

  • Biomedical Informatics; CG Chute

    Mayo Clinic College of Medicine 201 2

    That Higgs Boson

    7

    600 institutions10,000 scientists800 trillion collisions200PB of data =21017 bytes of data

    Boarding on an astronomical number in its own right! $13.25B USD

    Dimensionality of Higgs Big Data

    Mass/Energy Direction Charge

    8

    Medicine is more complicated than that

    Dimensionality of Big Data Broad

    Small amounts of data; Huge number observations National Claims data

    Deep Large amounts of data; Few observation NGS Complete Genome

    Rich Broad and Deep Clinical Phenotyping data (EMRs)

    Labs, Vitals, Exam, Waveform, Images, Omics, Social, environmental, diet,

    9 10

    BiocomputingUnimaginable potential, next 60 years

    Moores had a long reign 1013 fold increase in computing power / 60 yrs World-class Supercomputers of 1990

    Game platforms and cell phones of 2015 (GPUs) Extensions to networks, memory, & storage

    110 baud to 100Gb backbone (1010 / 60 yrs)Vacuum tubes to 1Tb RAM (1012 / 60 yrs)Paper-tape to Pb drives (1015 / 60 yrs)Cloudy computing (1010 / 10 yrs)

    Synergistic Summary 1060 / 60 yrs

    11

    Biomedical ScienceAn Information Intensive Enterprise

    Biomedical discovery dependent upon: Web of data, information, and knowledge Public repositories of omics data

    Genes, proteins, small molecules, pathways Associations, annotations, dynamic pathways

    Clinical data is becoming plentiful - EHRs Privacy and confidentiality limit public access De-identification remains asymptotic ... enabled by computational capacity enhanced with linkage and connections

    Some Fashionable Biology DatabasesThis Week

    12

  • Biomedical Informatics; CG Chute

    Mayo Clinic College of Medicine 201 3

    Some Fashionable Clinical DatabasesThis Week

    13

    0

    1

    2

    3

    4

    5

    6

    7

    8

    9

    10

    Estim

    ated

    Act

    ivity

    Biology Medicine

    The Continuum Of Biomedical InformaticsBioinformatics meets Medical Informatics

    14

    The Chasm of Semantic Despair

    15

    From Practice-based Evidenceto Evidence-based Practice

    ClinicalDatabases

    Registries et al.

    ClinicalGuidelines

    ExpertSystems

    Data Inference

    KnowledgeManagement

    Decisionsupport

    StandardsComparability and Consistency

    Terminologies & Data Models

    Foundations for Learning Health System

    PatientEncounters

    MedicalKnowledge

    NCATS Translator Program

    16June, 2016

    Chris ChuteClinical terminologies,

    frameworks & text mining

    Melissa HaendelDevelopmental bio, model organisms &

    ontologies

    Chris MungallSemantic engineering &

    similarity algorithms

    Peter RobinsonClinical ontologies &

    diagnostic algorithms,Medical genetics

    Hongfang LiuText mining, pathway

    modeling, data normalization

    Ben GoodCrowdsourcing,

    community curation, semantic web

    Andrew SuData integration, data

    services

    Shannon McWeeney

    cancer imaging, omics, drugs, flow cytometry,

    machine learning

    Maureen HoatlinFanconi Anemia, rare

    disease, biochemistry, basic research

    Julie McMurryProject Management,

    User Experience, Public Health

    David KoellerClinician of inborn

    errors of metabolism, Undiagnosed

    diseases

    Casey OverbyKnowledge-based

    methods & evaluation of precision medicine

    applications

    Guoqian JiangClinical data standards

    & Interoperability, semantic modeling &

    validation

  • Biomedical Informatics; CG Chute

    Mayo Clinic College of Medicine 201 4

    11

    22

    33

    The Monarch InitiativeMelissa Haendel, Chris Mungall

    Integrate, align, and re-distribute cross-species gene, genotype, variant, disease, and phenotype data

    Provide a portal for exploration of phenotype-based similarity Facilitate identification of animal models of human disease

    through phenotypic similarity Enable quantitative comparison of cross-species phenotypes Develop embeddable widgets for data exploration Influence genotype and phenotype reporting standards Improve ontologies to better curate genotype-phenotype

    data make all the data count monarchinitiative.org

    21Image from Washington et al, 2009 PMC2774506

    Computational Comparison of Animal Model Phenotypes with Human Disease

    Comparisons computationally established using ontological characterizations

    Robinson et al, 2014.

    PMC3912424

    22

    Leadership in existing projects to be leveraged in the translator

    Team Experienced in Open Source Efforts and Public Data

    Dissemination / deployment

    Extraction of knowledge from diverse sources

    Data & knowledge integration,

    algorithms & tools

    Contribution in other projects

    1

    2

    3

    Leadership in other projects

    OBO Foundry

    Extraction of knowledge from diverse sources

    My*

    Tools to integrate vocabularies Tools to identify equivalencies:

    Identifier and synonym alignment Conceptual alignment based upon logic

    determinations and prior probabilities Text mining for clinical concepts and

    pathway fragments

    TRANSMED KNOWLEDGE GRAPH

    kBOOM

  • Biomedical Informatics; CG Chute

    Mayo Clinic College of Medicine 201 5

    GRAPH INFERENCE OWL-based reasoning over large graphs BELpathway chains of causation Probabilistic inference across disease-phenotype-

    gene Bayesian Ontology Query Algorithm (Boqa)

    GRAPH QUERY Query related entities within/across species,

    sources Query similar sets of entities (OWLsim) Sets of phenotypes Expression patterns Pathway modules

    TRANSMED KNOWLEDGE GRAPH

    Data & knowledge integration, algorithms & tools

    Example queryWhat pathways are uniquely targeted by stem cell therapy pre-conditioning drugs that are

    well- vs poorly- tolerated by FA patients?

    Query Path with Semantic TypesDrug - [molecularly_controls] > Protein - [encoded_by] >Gene [member_of] > Pathway|| ||

    drugbank:DB01073 - [molecularly_controls] > Uniprot:P09884 - [encoded_by] > HGNC:9173 [member_of] > WikiPath:WP2446(Fludarabine) (POLA1 protein) (POLA1 gene) (Rb Pathway)

    Data Types and Sources:1. Drug-Protein Interactions

    a. from DrugBank via BioThings APIb. from DGIdb, via DGIdb API

    2. Protein-Gene Associationsa. from Ensembl via Ensembl API or BioLink APIb. from Uniprot via Wikidata API

    3. Gene-Pathway Membershipa. from Wikipathway via Wikidata APIb. from Reactome via BioLink API

    (Query implemented in OrangeQ2.4_Drug_Gene_Pathway Jupyter notebook)

    Pharmaceutical Product

    RxNorm CUIEMEA ID

    Chemical CompoundRxNorm CUI, CAS, Pubchem, InChIKey, Drugbank, UNII, ChEMBL, ChEBI, NDF-RT

    active ingredient

    has active ingredient

    DiseaseOMIM, MeSH, ICD-9, DOID, ICD-10, UMLS, Orphanet genetic association

    Drug used for treatment

    GeneEnsembl, Entrez, Refseq, HGNC, OrthologStrand, ChromosomeGenomic start, end, taxon

    ProteinEnsembl, uniprot, Refseqfunction, cell component, biological process (GO term)Has part (IPR domain)Subclass of (IPR family), taxon

    encodes

    encoded by

    Physically interacts withAs (agonist, inhibitor, modulator..

    )

    Variant

    Chromosome, genomic start/end, protein, CIViC variant ID

    negative therapeutic predictorpositive therapeutic predictor

    medical condition treated

    negative prognostic predictorpositive prognostic predictorpositive diagnostic predictornegative diagnostic predictor

    Medical condition treatedmedicine marketing autho.

    Medical condition treated

    RED indicates new type or predicate (since Sept) ~150k items added - Garbanzo API (knowledge beacon impl)

    Biological PathwayWikiPathways IDHas part

    895,065 110,306 535,409 46,3568694

    1,081

    31

    Total counts New

    216

    1,165

    1,259

    250

    2506 157,340

    adaptedfromRussellandNorvig,2009Credit: NCATS Translator Team Ultraviolet, Broad Institute

    TheKnowledgeMapisasetofactionstheagentcanperform

    Credit: NCATS Translator Team Ultraviolet, Broad Institute