foundations for discovery informatics

Foundations for Discovery Informatics Philip E. Bourne, Ph.D. Associate Director for Data Science [email protected]

Upload: philip-bourne

Post on 12-Jul-2015


TRANSCRIPT

Page 1: Foundations for Discovery Informatics

Foundations for Discovery Informatics

Philip E. Bourne, Ph.D.

Associate Director for Data Science

[email protected]

Page 2: Foundations for Discovery Informatics

What I am going to describe is not a body of research, but an approach we are taking to (hopefully) facilitate discovery informatics…

I am here to get your feedback

http://saltypeppergames.com/why-your-feedback-matters/

Page 3: Foundations for Discovery Informatics

Motivation: My students report the same problems I experienced as a graduate student many years ago

• What software exists for a task?

• How good is that software?

• Where is the data?

• How accessible is that data?

Page 4: Foundations for Discovery Informatics

Answers to These Questions Today Are Thwarted By…

• Quality is assessed by trial and error or by limited word of mouth

• Data and software outside of recognized and funded resources atrophy very quickly

• The connection between research objects is frequently non-existent

Page 5: Foundations for Discovery Informatics

In summary (from a funder’s perspective):

The informatics has advanced, but the discovery is still in the dark ages

Page 6: Foundations for Discovery Informatics

I am going to step you through what we are putting in place to address discovery and more

It is embodied in the notion of a Commons and a series of funding initiatives

http://pebourne.wordpress.com/2014/10/07/the-commons/

http://www.economist.com/node/11848182

Page 7: Foundations for Discovery Informatics

Definitions

• Research Object (RO)
– Any discrete component of the research lifecycle: grants, software, data, narrative, etc.

• Research Object Identifier (ROI)
– A unique, community-agreed identifier for a research object

• Commons
– A long-range objective
– A shared space where research objects reside, which is hardware agnostic

• Big Data to Knowledge (BD2K)
– An extramural research program
– FY15 ~$80M

Page 8: Foundations for Discovery Informatics

Further Motivation for the Commons

• Increasing the productivity of scientists

• A need to share research objects

• Making research objects FAIR:
– Findable
– Accessible
– Interoperable
– Reusable

• The need to take computing to the data

Page 9: Foundations for Discovery Informatics

What The Commons Is and Is Not

• The Commons is not:
– A replacement for well characterized data resources/databases
– An IT investment or system owned or operated by NIH

• The Commons is:
– Exploiting emergent cloud computing capabilities
– Utilizing existing extramural research infrastructure
– Supported by resources and tools from BD2K, e.g. the Data Discovery Index (DDI)
– An agile experiment that might lead to a long term effort

Page 10: Foundations for Discovery Informatics

The Commons Begins with a Set of “Commons-compliant” Resources

• Public cloud providers, national labs, HPC facilities have expressed interest in participating

• What it means to be compliant will change over time; initially it will mean little more than an agreement to share research objects with appropriate access controls

Page 11: Foundations for Discovery Informatics

The Next Step in Compliance will be the Assignment of ROIs

• An ROI uniquely defines and resolves the research object

• DOIs are one option for more persistent research objects and are already fully embraced by the scholarly community

• Another handle system might be more appropriate for large numbers of transient ROIs

• Subject of a joint US-EU meeting in a week
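As an illustration of what "uniquely defines and resolves" could mean in practice, here is a minimal sketch. It is not the Commons design: `ResearchObject`, `RoiRegistry`, and the `roi:` identifier format are hypothetical names invented for this example, and the in-memory dictionary stands in for whatever resolution service is eventually agreed. The only real-world detail used is that a DOI can be resolved through the public `https://doi.org/` proxy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResearchObject:
    roi: str       # hypothetical identifier string, format not specified in the talk
    kind: str      # e.g. "data", "software", "narrative", "grant"
    location: str  # where the object currently resides


class RoiRegistry:
    """In-memory stand-in for an identifier resolution service (assumption)."""

    def __init__(self):
        self._objects = {}

    def register(self, obj):
        # Uniqueness: an ROI may be assigned to one research object only.
        if obj.roi in self._objects:
            raise ValueError(f"ROI already assigned: {obj.roi}")
        self._objects[obj.roi] = obj

    def resolve(self, roi):
        # Resolution: map the identifier back to the object and its location.
        return self._objects[roi]


def doi_resolver_url(doi):
    """Build the resolver URL for a DOI via the public doi.org proxy."""
    return f"https://doi.org/{doi}"
```

A registry like this separates the identifier from the location, so an object can move between providers without its ROI changing; only the registry entry would need updating.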

Page 12: Foundations for Discovery Informatics

What is the incentive to participate?

The business model (a model of supply and demand) is perhaps the most critical part of this initiative and I will come to that

Page 13: Foundations for Discovery Informatics

Various BD2K Initiatives will Enable the Commons …

• The DDICC will create an index of discoverable research objects, starting with the Commons.

• Hopefully that index will:
– Resolve the location of ROs
– Contain an increasing amount of metadata about different types of RO
– Contain usage statistics pertaining to that RO
– Offer crowd sourced commentary on that RO
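The four hoped-for capabilities of the index can be sketched as a small data structure. This is an illustration under stated assumptions, not the DDI schema: `IndexEntry`, `DiscoveryIndex`, and every field name are hypothetical, and counting lookups is only one possible reading of "usage statistics".

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexEntry:
    roi: str                                                 # identifier of the research object
    location: str                                            # where the object can be found
    metadata: Dict[str, str] = field(default_factory=dict)   # descriptive metadata
    access_count: int = 0                                    # simple usage statistic
    comments: List[str] = field(default_factory=list)        # crowd sourced commentary


class DiscoveryIndex:
    """Hypothetical index of discoverable research objects."""

    def __init__(self):
        self._entries = {}

    def add(self, entry):
        self._entries[entry.roi] = entry

    def locate(self, roi):
        # Resolve the location and record the lookup as usage.
        entry = self._entries[roi]
        entry.access_count += 1
        return entry.location

    def comment(self, roi, text):
        self._entries[roi].comments.append(text)

    def search(self, key, value):
        # Return the ROIs whose metadata matches key == value.
        return [roi for roi, entry in self._entries.items()
                if entry.metadata.get(key) == value]
```

Keeping usage counts and commentary next to the metadata is what would let such an index speak to the earlier question of how good a piece of software or data is, not just where it lives.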

Page 14: Foundations for Discovery Informatics

Various BD2K Initiatives will Enable the Commons

• A software coordination center (tentative)

• National Standards Information Framework (NSIR) – FY15

• BD2K Consortium (12 centers +)

• Software development efforts
– Compression
– Visualization
– Wrangling
– Provenance

• Training

Page 15: Foundations for Discovery Informatics

Incentives: The Commons as a Credit Model

• Only pay for compute used

• Drives competition - Better value for money?

• Drives scientific research into the Commons?

• Facilitates public-private partnership?

* These are open questions to be tested

Page 16: Foundations for Discovery Informatics

[Figure: Commons Credit Model. Entities shown: NIH; Business and Philanthropic Groups; Commons Consortium; Digital Object Steward; The Commons; Cloud Provider A … Cloud Provider N. Arrow labels: Provides Funding; Distributes “Commons Credits”; Makes Digital Objects Available Using Commons Credits; Bills Partnership.]
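The credit idea (credits are distributed, providers bill for the compute actually used, and providers compete on value) can be sketched as a toy ledger. Everything here is an assumption made for illustration: `CreditAccount`, `cheapest_provider`, the integer credit units, and the per-hour rates do not come from the talk, which leaves the details of the model as open questions.

```python
class CreditAccount:
    """Hypothetical account holding credits granted to a researcher or steward."""

    def __init__(self, holder):
        self.holder = holder
        self.balance = 0
        self.ledger = []  # (provider, hours, cost) for each charge

    def grant(self, amount):
        # The funder distributes credits to the account.
        if amount <= 0:
            raise ValueError("grant must be positive")
        self.balance += amount

    def charge(self, provider, hours, rate):
        # Pay only for compute used: cost = hours used * provider rate.
        cost = hours * rate
        if cost > self.balance:
            raise ValueError("insufficient credits")
        self.balance -= cost
        self.ledger.append((provider, hours, cost))
        return cost


def cheapest_provider(rates):
    """Pick the provider with the lowest rate (a stand-in for competition)."""
    return min(rates, key=rates.get)
```

For example, with rates of 12 and 10 credits per hour for two hypothetical providers, `cheapest_provider` selects the second, and a 3-hour job charged there draws 30 credits from the account.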

Page 17: Foundations for Discovery Informatics

The Commons: Status

• Standing up pilots:
– With BD2K Centers, e.g. BEACON
– With reference data sets
– With intramural researchers

Page 18: Foundations for Discovery Informatics

Evaluation Questions (next 18 months)

• Is sharing enhanced?

• Do researchers feel they benefit?

• Is there a sense that the Commons improves productivity?

• Is it easier to find research objects via the Commons than otherwise?

• Is there evidence of a more sustainable environment?

Page 19: Foundations for Discovery Informatics

Dream Outcome - General

• In 5 years biomedical research productivity increases 1%

• More research can be sustained than would otherwise be the case

• Disparate Commons efforts begin to merge

Page 20: Foundations for Discovery Informatics

Dream Outcome – Discovery Informatics

• Hoifung
– Papers are examples of research objects for which shared APIs exist in the Commons
– A variety of pathway extraction tools can be selected from the DDI and consensus views established, which are also accessible in the Commons

Page 21: Foundations for Discovery Informatics

Dream Outcome – Discovery Informatics

• Bruce, Peter
– Crowd sourcing of curation
– Iterative model building

• Larry
– Better utilization of existing data resources
– Supports the notion of cloud laboratories

Page 22: Foundations for Discovery Informatics

Acknowledgements

[Table residue; groupings kept in the order they appear on the slide]

Headings: Commons, BD2K, Efficiency, Partnerships, Training

Row labels: Deliverable, Example Features

• Cloud – Data & Compute • Search • Security • Reproducibility Standards • App Store

• Coordinate • Hands-on • Syllabus • MOOCs

• Community • Centers • Training Grants • Catalogs • Standards • Analysis

• Data Resource Support • Metrics • Best Practices • Evaluation • Portfolio Analysis

• IC’s • Researchers • Federal Agencies • International Partners • Computer Scientists

Vivien Bonazzi, George Komatsoulis (NCBI), Mark Guyer, Michelle Dunn, Jennie Larkin, Leigh Finnegan, Beth Russell

Page 23: Foundations for Discovery Informatics

NIH… Turning Discovery Into Health

[email protected]