from stars to patients: lessons from space …...edm forum edm forum community issue briefs and...
TRANSCRIPT
EDM ForumEDM Forum Community
Issue Briefs and Reports Learn
2015
From Stars to Patients: Lessons from Space Scienceand Astrophysics for Health Care InformaticsS. George DjorgovskiCalifornia Institute of Technology
A. A. MahabalCaltech
D. J. CrichtonJPL
Basit ChaudhryTuple Health
Follow this and additional works at: http://repository.edm-forum.org/edm_briefs
Part of the Health Information Technology Commons
This Conceptual Model is brought to you for free and open access by the Learn at EDM Forum Community. It has been accepted for inclusion in IssueBriefs and Reports by an authorized administrator of EDM Forum Community.
Recommended CitationDjorgovski, S. George; Mahabal, A. A.; Crichton, D. J.; and Chaudhry, Basit, "From Stars to Patients: Lessons from Space Science andAstrophysics for Health Care Informatics" (2015). Issue Briefs and Reports. Paper 19.http://repository.edm-forum.org/edm_briefs/19
From Stars to Pa+ents: Lessons from Space Science and Astrophysics
for the Health Care Informa+cs S.G. Djorgovski, A.A. Mahabal (Caltech), D.J. Crichton (JPL), B. Chaudhry (TupleHealth)
This work was supported in part by a grant from the AcademyHealth Electronic Data Methods Forum, by the Center for Data-‐Driven Discovery at Caltech, and by the Center for Data Science and technology at JPL
How Did the VO Succeed? • All data collected in a digital form • Computer-‐ and data-‐savvy community • Some standard formats in place • Large data collecLons in funded, agency mandated archives
• Established culture of data sharing • Community iniLaLve driven by the needs of an exponenLal data growth
• Federal agency support/funding • Data have no commercial value or privacy issues
A Recommended Path Forward: • Promote mulLdisciplinary collaboraLons between communiLes of pracLce that have developed large scale open data environments (e.g., astronomy, space science, physics) and HCI
• Organize knowledge transfer acLviLes through which best pracLces and lessons learned can be shared between research communiLes to help accelerate progress in HCI
• Work with funding agencies to facilitate mulLdisciplinary collaboraLon around large scale data analysis and infrastructure development
• Facilitate development of training programs in HCI that draw on experLse from other domains
The Key Outstanding Challenges: • The HC community does not yet have a well developed data culture and
technical skills; acquiring them will take some Lme and resources • The data and metadata must be understood, relevant, repeatable, with
standard formats, properly curated, with interoperability protocols • Diverse stakeholders may have colliding interests, yet find common goals • Adequate, but not sLfling privacy protecLon mechanisms are needed
IT Driven ExponenLal Data Growth
MoLvated Community
Solu+on(s) Including: • Data access / standards • Legal and privacy issues • Interoperable data grid • Data analyLcs tools • Trained user community
InvestigatorDatabases
Common Data Elements
Data Infrastructure (Data Storage, Computation, Apache OODT, Apache Hadoop, Apache SOLR)
KnowledgePortal
BioinformaticsClients
and Tools
Study/ProtocolDatabases
LaboratoryData Processing
& StorageRepository
InternalOmics
Database
SemanticRelations
hips ExternalData andServices
ExternalSystems(Other
Databases, Repositories, Services,etc)
PublishDescription
(RDF)
Laboratory/Investigators/NCI Programs
PublicScience Data
Repository
Search, Present/Visualize, Distribute
Access, Process, Retrieve
IngestData
Data and Services Access and Distributon
Topology Decisions Distribution (data, computation) Data Accessibility Network Capacity Computational Capacity Analysis Choreography/Workflow
Data Science Architectural
Tradeoffs
Methodology
Decisions
Software and
Hardware Decisions
Use Cases, Scenarios
Data Science Analytics Framework Data Management Capabilities Storage (e.g., Cloud) Visualization Algorithms/Data Processing Server Resources (HPC, etc) Data Movement Technologies
Output Uncertainty, performance, cost, and computing stack based on a set of capacities
Data Collections Data Products/Objects/Files Metadata Products Data Formats, Data Size
Methods Data Reduction Feature Extraction Classification Detection Fusion
Data Big Data Analytics
Capability
Methodology
Decisions
HPC = High Performance Computing
Instrument Operations Science
Data Processing
Data Distribution
Bioinforma)cs Tools
Instrument Public Biorepository
External Science
Community
Bioinformatics Community
Laboratory Biorepository
Analysis Team
Publish Data Sets
• Scalable Computa+onal Biology Infrastructures (cloud, HPC, etc)
• Local algorithms processing
• On-demand algorithms • Data fusion methods • Machine learning techniques
Results
http://oodt.appache.org
LabCAS -‐ Laboratory Catalog and Archive Service to Integrate the processing and data management capabiliLes to support the automated capture of the data directly from the labs into the knowledge environment.
eCAS -‐ EDRN Catalog and Archive Service to capture both structured and unstructured data using EDRN CDEs to catalog the metadata. Provenance informaLon is also captured (approx. 40 data set and analysis informaLon).
Biomarker Database-‐ to capture and share EDRN biomarker annotaLons (approx. 900 encompass 11 organ and Lssue sites. EDRN researchers have published over 230 papers on these biomarkers, with many more papers in preparaLon).
EDRN Virtual Specimen Repository – access to EDRN distributed specimen repository (approx. 200,000) .
EDRN Protocols – to share scienLfic protocols (approximately 200).
VO provides and federates data and metadata from distributed archives, develops and provides data services, standards, and data analysis and exploraLon services
The mo+va+on and the challenge: Big Data are revoluLonizing nearly every aspect of the modern society. One area where this can have a profound posiLve societal impact is the field of Health Care InformaLcs (HCI), which faces many challenges. The key idea here is: can we use some of the experience and soluLons from the fields that have successfully adapted to the Big Data era, namely astronomy and space science, to help accelerate the progress of HCI?
The Astronomy community’s grassroots response to the challenges and opportuniLes of Big Data was The Virtual Observatory (VO) Concept:
A complete, dynamical, distributed, open research environment for the new astronomy with massive and complex data sets!
Data Sources
Today, VO is the global data grid of astronomy, and is regarded as one of the success stories of the virtual scienLfic organizaLons and cyberinfrastructure
An Approach to HCI It is a complex field with many consLtuencies and goals:
… requiring mulLple soluLons
We surveyed the HCI literature: • Published studies cover: BioinformaLcs; NeuroinformaLcs; Clinical InformaLcs; Public Health InformaLcs; TranslaLonal BioinformaLcs
• Tools/analyLcs used include: Fuzzy trees, Random Forests, Area Under the Curve, SensiLvity, Specificity, Social Networks and Crowdsourcing
• The most frequently cited problems include:
o Lack of interoperability, standards
o Especially across data types: paLents, disease, treatments, healthcare management
• There is great diversity in plagorms/systems used, a likely hindrance to reproducibility: CareWeb, Caradigm Amalga, CER Hub, RedX, GLORE etc.
We surveyed the publicly available data sets: • Most data are not publicly exposed due to: Proprietary nature; Monetary interests; InformaLon blocking; Privacy issues
• Typical available data sets cover molecular, Lssue, paLent, populaLon studies – but not in a connected fashion (diversity of formats and access mechanisms)
• Typical sizes of the publicly available data range from kB (highly derived data products) to GB-‐TB or even PB (raw instrument data)
• Most data sets do not have enough: metadata, standards, provenance
• Privacy protecLon limits hinder the aiempts at reproducibility and possibiliLes of connecLng mulLple datasets easily
• There is growing emphasis on sustainability, rewards systems, usability
Health Care Data Domains/Stakeholders
Research Biomedial, Clinical
HC Delivery Clinical, PaLents
Management
HC Providers: Hospitals, HMOs, …
Commercial: Insurance, Pharma, …
Government: Local, State, Federal
An Example of a Successful Methodology Transfer From Space Science to Medicine:
Designing a scalable architecture:
Using a state-‐of-‐the-‐art informaLcs infrastructure developed at JPL and leveraging successful efforts to build similar open source systems in space science open-‐source Object Oriented Data Technology (OODT) to design and an effecLve naLonal integrated Knowledge System for the NaLonal Cancer InsLtute’s Early DetecLon Research Network (EDRN).
Conclusions and Recommenda+ons:
The EDRN Public Portal: A resource for collaboraLve science: hip://cancer.gov/edrn
As the example above shows, data methodology transfer from astronomy and space science into HCI is clearly possible, at least within a given domain. However, the greater complexity and sociological issues will likely delay progress.