
Community challenges in biological data science: an optimistically cautionary tale

Genomics

Assemblathon, CAMI, Sequence Squeeze

Structural Biology

CASP, CAPRI, Fold.it

Systems biology

sbv-IMPROVER

Function

CAFA

Text mining

BioCREATIVE

CACAO

Data providers

Programmers

Steering committee

Assessors

Assemble the Teams

DREAM

Prepare Challenge

Computational scientists enjoy developing new methods, and the community encourages them to do so. However, it is often hard to know which method to choose: which method is best? Moreover, what does “best” mean?

To help choose an appropriate method for a particular task, scientists often form community-based challenges for the unbiased evaluation of methods in a given field. These challenges help evaluate existing and novel methods, while helping to coalesce a community and leading to new ideas and collaborations.

• Use more than one metric
• Avoid redundant analyses
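The advice to use more than one metric can be illustrated with a small sketch. It assumes a hypothetical challenge in which each team submits a set of predicted labels that is compared against a held-out truth set; the team names, the data, and the helper functions `precision_recall_f1` and `rank_by` are invented for illustration and do not come from any challenge listed here.

```python
from typing import Dict, List, Set


def precision_recall_f1(predicted: Set[str], truth: Set[str]) -> Dict[str, float]:
    """Score one submission against the truth set with three metrics."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def rank_by(scores: Dict[str, Dict[str, float]], metric: str) -> List[str]:
    """Return team names ordered from best to worst on a single metric."""
    return sorted(scores, key=lambda team: scores[team][metric], reverse=True)


# Hypothetical truth set and submissions (illustrative data only).
truth = {"A", "B", "C", "D"}
submissions = {
    "team_cautious": {"A"},  # one prediction, and it is correct
    "team_greedy": {"A", "B", "C", "D", "E", "F", "G", "H"},  # covers everything, half wrong
}

scores = {team: precision_recall_f1(pred, truth) for team, pred in submissions.items()}

for metric in ("precision", "recall", "f1"):
    print(metric, rank_by(scores, metric))
```

In this toy case the ranking flips depending on the metric: the cautious team leads on precision, while the greedy team leads on recall and F1. Reporting several metrics side by side makes that trade-off visible to participants and readers.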

Identify social media, mailing lists, & other communication venues of your community

Publish a flagship paper & specialized “satellite” papers to maximize impact & credit

Agree on co-authorship & credits before challenge

Have a data sharing plan

Data & software sharing policy should be accepted by all

Lots of work for everyone: understand commitment

Advertise challenge

Run challenge

Analyze & score

Maintain database, website & code

Hold a conference

Code & website

Set challenge rules

Identifying a community

Choosing a clear question

Start by...

The Challenge

Publish

Seek funding

Nurture the Challenge and your Community

Methods may overfit to win at challenge metrics rather than real-life problems

Challenges may discourage risk-taking by rewarding “surefire” incremental additions to existing methods over novel development

Methods improve due to challenges

Communities form, expand, and become more cohesive

Classroom educational value (CACAO, CAFA)

Citizen science value (Fold.it)

Data

Create incentives

Challenge | Goal | Notes
CASP | Predicting protein structure from sequence | Long running. Improved and quantified structure prediction methods
CAPRI | Protein-protein interaction |
Fold.it | Protein structure energy minima prediction game | Improved protein structure prediction via gamification
Assemblathon | Genomic DNA sequence assembly | A better understanding of assembly metrics and species-specific considerations
CAMI | Metagenomic DNA assembly and analysis |
sbv-IMPROVER | Systems biology and precision medicine |
CAMDA | Large-scale systems biology problems |
Sequence Squeeze | Sequence data compression | Finding ways to efficiently store large volumes of sequence data
DREAM | Framework for many systems biology challenges | Provides a framework and easy setup for many challenges
CACAO | Improve function annotations in UniProt | Competition between student teams teaches biocuration to college students
CAGI | Predicting phenotypes from genetic variants | Strong wetlab engagement
CAFA | Predicting protein function | Improve function prediction methods
BioCreative | Evaluating text mining and information extraction systems |

Funders may be tempted to judge methods primarily by challenge performance

Iddo Friedberg, Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA, USA

Some Community Challenges in Life Science

Further Reading

Saez-Rodriguez J, Costello JC, Friend SH et al. Crowdsourcing biomedical research: leveraging communities as innovation engines (2016) Nature Reviews Genetics 17, 470–486

Friedberg I, Wass MN, Mooney SD and Radivojac P. Ten Simple Rules for a Community Computational Challenge (2015) PLoS Computational Biology 11(4): e1004150

CAGI

CAMDA
