Second Open Economics Workshop - Thoughts from the Biosciences

Thoughts from the Biomedical Sciences. Philip E. Bourne, UCSD, [email protected]. Second Open Economics Workshop, June 11, 2013

Upload: philip-bourne

Post on 06-May-2015


DESCRIPTION

A brief description of best practices and what needs to be considered regarding open data. A view from the biomedical sciences.

TRANSCRIPT

Page 1: Second Open Economics Workshop - Thoughts from the Biosciences

Thoughts from the Biomedical Sciences

Philip E. Bourne, UCSD

[email protected]

Second Open Economics Workshop, June 11, 2013

Page 2: Second Open Economics Workshop - Thoughts from the Biosciences

My Perspective is Drawn from Being:

A data producer and a data user*

An overseer of data curation efforts

A database provider (PDB & IEDB)

Suspicious of workshop reports, data standards bodies …

A supporter of data publication

An open access journal founder

Opinionated

Page 3: Second Open Economics Workshop - Thoughts from the Biosciences

The Big Picture

The Good News:

– NLM, Entrez: a great job

– Open data/software/papers have spawned science and jobs

– Success stories: ENCODE, PDB

– D2K?

The Bad News:

– We have resources, but now they are perceived as silos

– Lack of reproducibility revealed

– Sustainability is unsolved

– Failures: caBIG, DataNet

– D2K?

Page 4: Second Open Economics Workshop - Thoughts from the Biosciences

The Big Picture – What is the Way Forward?

Driven by scientific outcomes, not "build it and they will come"

Community, community, which means:

– A simple vision that many stakeholders can buy into

– Transparency

– Shared ownership

– A code of conduct

– A reward system for individuals and teams

– Strategic policies, e.g. open access, data sharing plans

– Use resources as drivers – funding bodies, societies, institutions have a role here

– Building trust through quality data/software

Page 5: Second Open Economics Workshop - Thoughts from the Biosciences

Worldwide Protein Data Bank

www.wwpdb.org

Personal Experiences to Support My Big Picture View


Page 6: Second Open Economics Workshop - Thoughts from the Biosciences

It's All About Trust

PDB

Trust in the data is perhaps our biggest achievement

Page 7: Second Open Economics Workshop - Thoughts from the Biosciences

It's All About Trust

Trust is like compound interest

Comes from listening

Comes from engaging the community in every aspect of the process

Comes from data consistency and level of annotation

Comes from responsiveness

Comes from the quality of the delivery service

Page 8: Second Open Economics Workshop - Thoughts from the Biosciences

Data Quality Begets Trust

About 25% of our budget has been spent on data remediation

Support for versioning, hence the copy of record

Our ontology/data model has been a critical component of our workflow and data accuracy

Until recently, the same data model was too complex to facilitate wide adoption by others who use our data


Page 9: Second Open Economics Workshop - Thoughts from the Biosciences

http://collections.plos.org/ploscompbiol/biocurators.php

It's All About People

Curators are the Unsung Heroes

• They really should do more to promote themselves

• Institutions must do more to respect their efforts


Page 10: Second Open Economics Workshop - Thoughts from the Biosciences

It's All About People

The Users

Constantly striving to have the user distinguish raw from derived data

All data are not created equal but the user thinks so


Page 11: Second Open Economics Workshop - Thoughts from the Biosciences

It's All About People

The Global Personalities

Page 12: Second Open Economics Workshop - Thoughts from the Biosciences

It's NOT All About Institutions

As far as I am aware, no data standards body has directly influenced anything we have done in 15 years of running the PDB

The structural biology community created a very successful data sharing plan long before funding bodies did


Page 13: Second Open Economics Workshop - Thoughts from the Biosciences

It is About Openness

There are no restrictions on the usage of the data beyond attribution

The PDB runs exclusively on open source software

We maintain and contribute to the BioJava repository

We need to be transparent about data usage


Page 14: Second Open Economics Workshop - Thoughts from the Biosciences

Worldwide Protein Data Bank

www.wwpdb.org

So What Needs to Change re Data?


Page 15: Second Open Economics Workshop - Thoughts from the Biosciences

The Notion That All Data Are Created Equal Must End

We need to understand how data are used

Sustainability is not more money from the funding agencies; it's about business models

Reductionism is not a dirty word – Reference Data!

We need to do more with the long tail

"On the Future of Genomic Data", Science, 11 February 2011, vol. 331, no. 6018, pp. 728-729

Page 16: Second Open Economics Workshop - Thoughts from the Biosciences

Institutions That Generate Data Must Play a Greater Role

We need institutional data sharing plans

We need data scientists to be better recognized by institutions. It's not all about papers, and this implies new metrics

Page 17: Second Open Economics Workshop - Thoughts from the Biosciences

Acknowledgements

www.force11.org

– Tim Clark

– Ivan Herman

– Paul Groth

– Ed Hovy

– Maryann Martone

– Cameron Neylon

– David Shotton

– Anita de Waard

www.plos.org

Beyond the PDF

Many others

Funding Agencies: NSF, NIGMS, DOE, NLM, NCI, NCRR, NIBIB, NINDS, NIDDK

Page 18: Second Open Economics Workshop - Thoughts from the Biosciences

The {Lack of} Distinction Between Data and Knowledge Needs to be Better Appreciated

• The PDB paper has been cited 14,000 times

• No one has ever read it

• Some PDB datasets have 1,000s of downloads

• These data are not associated with publications