Foundations for Discovery Informatics
Philip E. Bourne, Ph.D.
Associate Director for Data Science
What I am going to describe is not a body of research, but an approach
we are taking to (hopefully) facilitate discovery informatics…
I am here to get your feedback
http://saltypeppergames.com/why-your-feedback-matters/
Motivation: My students report the same problems I experienced as a
graduate student many years ago
• What software exists for a task?
• How good is that software?
• Where is the data?
• How accessible is that data?
Answers to These Questions Today Are Thwarted By…
• Quality is judged by trial and error or by limited word of mouth
• Data and software outside of recognized and funded resources atrophy very quickly
• The connection between research objects is frequently non-existent
In summary (from a funder's perspective):
The informatics has advanced, but the discovery is still in the dark ages
I am going to step you through what we are putting in place to address
discovery and more
It is embodied in the notion of a Commons and a series of funding
initiatives
http://pebourne.wordpress.com/2014/10/07/the-commons/
http://www.economist.com/node/11848182
Definitions
• Research Object (RO)
– Any discrete component of the research lifecycle: grants, software, data, narrative, etc.
• Research Object Identifier (ROI)
– A unique, community-agreed identifier for a research object
• Commons
– A long range objective
– A shared space where research objects reside, which is hardware agnostic
• Big Data to Knowledge (BD2K)
– An extramural research program
– FY15 ~$80M
Further Motivation for the Commons
• Increasing the productivity of scientists
• A need to share research objects
• Making research objects FAIR:
– Findable
– Accessible
– Interoperable
– Reusable
• The need to take computing to the data
What The Commons Is and Is Not
• The Commons is not:
– A replacement for well characterized data resources/databases
– An IT investment or system owned or operated by NIH
• The Commons is:
– Exploiting emergent cloud computing capabilities
– Utilizing existing extramural research infrastructure
– Supported by resources and tools from BD2K, e.g. the Data Discovery Index (DDI)
– An agile experiment that might lead to a long term effort
The Commons Begins with a Set of “Commons-compliant” Resources
• Public cloud providers, national labs, HPC facilities have expressed interest in participating
• What it means to be compliant will change over time; initially it will mean little more than an agreement to share research objects with appropriate access controls
The Next Step in Compliance will be the Assignment of ROIs
• An ROI uniquely defines and resolves the research object
• DOIs are one option for more persistent research objects and are already fully embraced by the scholarly community
• Another handle system might be more appropriate for large numbers of transient ROIs
• Subject of a joint US-EU meeting in a week
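To make "uniquely defines and resolves" concrete, here is a minimal sketch of what ROI assignment and resolution could look like. Everything in it is an assumption for illustration: the `ResearchObject` and `RoiRegistry` names, the `roi:demo/...` identifier format, and the in-memory dictionary are hypothetical and do not describe a DOI, Handle, or Commons API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ResearchObject:
    """Hypothetical record for one research object."""
    roi: str          # illustrative identifier, e.g. "roi:demo/0001"
    kind: str         # "data", "software", "narrative", "grant", ...
    location: str     # where the object currently resides
    persistent: bool  # True for long-lived objects, False for transient ones


class RoiRegistry:
    """Toy in-memory registry: each ROI maps to exactly one research object."""

    def __init__(self) -> None:
        self._objects: Dict[str, ResearchObject] = {}

    def register(self, obj: ResearchObject) -> None:
        # Uniqueness: refuse to assign the same ROI twice.
        if obj.roi in self._objects:
            raise ValueError(f"ROI already assigned: {obj.roi}")
        self._objects[obj.roi] = obj

    def resolve(self, roi: str) -> Optional[str]:
        # Resolution: return the current location, or None if the ROI is unknown.
        obj = self._objects.get(roi)
        return obj.location if obj is not None else None
```

The `persistent` flag only marks the distinction drawn above between longer-lived objects and large numbers of transient ones; which identifier scheme serves each case is the open question.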
What is the incentive to participate?
The business model (a model of supply and demand) is perhaps the
most critical part of this initiative and I will come to that
Various BD2K Initiatives will Enable the Commons …
• The DDICC will create an index of discoverable research objects, starting with the Commons.
• Hopefully that index will:
– Resolve the location of ROs
– Contain an increasing amount of metadata about different types of RO
– Contain usage statistics pertaining to that RO
– Offer crowd sourced commentary on that RO
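The four hoped-for properties of the index can be sketched as a single record type plus a lookup. This is an illustrative toy, not the DDI design: `IndexEntry`, `DiscoveryIndex`, the metadata keys, and the choice to rank search results by access count are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexEntry:
    """Hypothetical index record covering the four properties listed above."""
    roi: str                                                 # identifier of the research object
    location: str                                            # where the object can be found
    metadata: Dict[str, str] = field(default_factory=dict)   # descriptive metadata
    access_count: int = 0                                    # simple usage statistic
    comments: List[str] = field(default_factory=list)        # crowd sourced commentary


class DiscoveryIndex:
    """Toy in-memory index keyed by ROI."""

    def __init__(self) -> None:
        self._entries: Dict[str, IndexEntry] = {}

    def add(self, entry: IndexEntry) -> None:
        self._entries[entry.roi] = entry

    def locate(self, roi: str) -> str:
        # Resolving a location also counts as one use of the object.
        entry = self._entries[roi]
        entry.access_count += 1
        return entry.location

    def comment(self, roi: str, text: str) -> None:
        self._entries[roi].comments.append(text)

    def search(self, **criteria: str) -> List[IndexEntry]:
        # Keep entries whose metadata matches every criterion, most used first.
        hits = [e for e in self._entries.values()
                if all(e.metadata.get(k) == v for k, v in criteria.items())]
        return sorted(hits, key=lambda e: -e.access_count)
```

A call such as `index.search(kind="software", task="pathway extraction")` corresponds to the student's question "What software exists for a task?", with usage counts and comments as rough hints about quality.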
Various BD2K Initiatives will Enable the Commons
• A software coordination center (tentative)
• National Standards Information Framework (NSIR) – FY15
• BD2K Consortium (12 centers +)
• Software development efforts
– Compression
– Visualization
– Wrangling
– Provenance
• Training
Incentives: The Commons as a Credit Model
• Only pay for compute used
• Drives competition - Better value for money?
• Drives scientific research into the Commons?
• Facilitates public-private partnership?
* These are open questions to be tested
[Figure: Commons Credit Model. Entities shown: NIH; Business and Philanthropic Groups; Commons Consortium; Digital Object Steward; The Commons; Cloud Provider A ... Cloud Provider N. Flow labels: Provides Funding; Distributes "Commons Credits"; Makes Digital Objects Available Using Commons Credits; Bills Partnership.]
Commons Credit Model
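The credit idea (credits are distributed, spent at any participating provider, and only compute actually used is charged) can be sketched in a few lines. This is a toy under stated assumptions: `Provider`, `CreditAccount`, `cheapest_provider`, integer credits, and a flat price per core-hour are all hypothetical and say nothing about how Commons Credits would actually be priced or billed.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Provider:
    """Hypothetical Commons-compliant compute provider."""
    name: str
    credits_per_core_hour: int  # assumed flat price, in whole credits


class CreditAccount:
    """Toy account holding credits distributed to one steward or investigator."""

    def __init__(self, owner: str, credits: int) -> None:
        self.owner = owner
        self.balance = credits

    def run_job(self, provider: Provider, core_hours: int) -> int:
        # Only pay for compute used: the charge is proportional to core-hours.
        cost = provider.credits_per_core_hour * core_hours
        if cost > self.balance:
            raise ValueError("insufficient credits")
        self.balance -= cost
        return cost


def cheapest_provider(providers: Iterable[Provider]) -> Provider:
    # Competition: the account holder is free to pick the lowest price.
    return min(providers, key=lambda p: p.credits_per_core_hour)
```

Because the account holder, not the funder, chooses where to spend, providers compete on price; whether that yields better value for money is one of the open questions listed above.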
The Commons: Status
• Standing up pilots:
– With BD2K Centers e.g. BEACON
– With reference data sets
– With intramural researchers
Evaluation Questions (next 18 months)
• Is sharing enhanced?
• Do researchers feel they benefit?
• Is there a sense that the Commons improves productivity?
• Is it easier to find research objects via the Commons than otherwise?
• Is there evidence of a more sustainable environment?
Dream Outcome - General
• In 5 years biomedical research productivity increases 1%
• More research can be sustained than would otherwise be the case
• Disparate Commons efforts begin to merge
Dream Outcome – Discovery Informatics
• Hoifung
– Papers are examples of research objects for which shared APIs exist in the Commons
– A variety of pathway extraction tools can be selected from the DDI and consensus views established, which are also accessible in the Commons
Dream Outcome – Discovery Informatics
• Bruce, Peter
– Crowd sourcing of curation
– Iterative model building
• Larry
– Better utilization of existing data resources
– Supports the notion of cloud laboratories
Acknowledgements
Deliverable: Commons, BD2K, Efficiency, Partnerships, Training
Example Features:
• Cloud – Data & Compute; Search; Security; Reproducibility Standards; App Store
• Coordinate; Hands-on; Syllabus; MOOCs
• Community; Centers; Training Grants; Catalogs; Standards; Analysis
• Data Resource Support; Metrics; Best Practices; Evaluation; Portfolio Analysis
• ICs; Researchers; Federal Agencies; International Partners; Computer Scientists
Vivien Bonazzi
George Komatsoulis (NCBI)
Mark Guyer
Michelle Dunn
Jennie Larkin
Leigh Finnegan
Beth Russell
NIH… Turning Discovery Into Health