2014 bosc-keynote

Upload: ctitusbrown

Post on 10-Sep-2014

Category:

Science


TRANSCRIPT

Page 1: 2014 bosc-keynote

Monday, July 11th, 2039

It’s hard to make predictions – especially about the future.

-- attributed to Niels Bohr

Page 2: 2014 bosc-keynote

A History of “Bioinformatics”

C. Titus Brown

Page 3: 2014 bosc-keynote

Invited to reminisce!

…and perhaps inform the BRAIN2050 initiative.

Note for the young: “bioinformatics” and “systems biology” are now simply “biology”.

Page 4: 2014 bosc-keynote

The 20-teens and onwards

1. Too Much Data: The Datapocalypse

2. Great results, seen once: the reproducibility crisis.

3. Mind the gap: computation in biology.

Page 5: 2014 bosc-keynote

1. The Datapocalypse

Page 6: 2014 bosc-keynote

Too… much… data…

Between –omics, automated sensor data, and data sharing, biology grew into a data-intensive science.

Volume, velocity, variety: the general problem.

But also!

Biology was optimized for hypothesis-driven investigation, not data exploration!

Long arguments over “which is better”, with the people who controlled the funding => winning.

Page 7: 2014 bosc-keynote

HTC, not HPC

For lots of data, High Throughput Computing was needed – but compute was cheap, not throughput!

Figure from bbc.co.uk

Page 8: 2014 bosc-keynote

2. The reproducibility crisis

Trials

Failed

Page 9: 2014 bosc-keynote

The reproducibility crisis - why??

It was a well-known fact in biotech that the majority of published experiments were largely lab-specific.

Neither career incentives nor funding were there! (In fact, quite the contrary…)

This slowly started to change later in the decade, as the public caught on…

Page 10: 2014 bosc-keynote

Shift in “publication” recognition

Hard to believe now, but back then, people were rewarded for the first (claimed) “observation” of an effect.

Two-lab rule was only instated as best practice in the early 2020s, once reviewers started rejecting papers unaccompanied by a replication report.

Funding shift followed, of course.

Page 11: 2014 bosc-keynote

3. Computing & data in biology

Of the sciences, biology had always been the weakest in terms of computing education.

This became a complete disaster once the data tsunami hit – labs generated data sets they couldn’t analyze, graduate students planned experiments that relied on computing they couldn’t do.

Photo from Wikipedia

Page 12: 2014 bosc-keynote

The “easy to use” tools fiasco

Immense investment in late ‘teens in tools that were “easy to use” – push-button data analysis, etc.

This worked well outside of research; however, it turns out you can’t place most data analysis in a black box.

“Easy to use” tools embodied so many assumptions that most results were simply invalid.

Page 13: 2014 bosc-keynote

=> Bioinformatics “sweatshops”

Cadre of students and low-paid employees devoted to “service bioinformatics”

No career path, no significant authorship…

…but necessary for big labs to make progress!

Page 14: 2014 bosc-keynote

Things came to a head…

www.sanantonio-urbanliving.com

Page 15: 2014 bosc-keynote

The tipping point

The well-trained students left for the data science industry;

More and more papers were being written by people who didn’t understand the computing…

…and an increasing number of them were being rejected…

…until the supply of reviewers ran out…

Page 16: 2014 bosc-keynote

And then… California.

Map from Wikimedia

Page 17: 2014 bosc-keynote

Bioinformaticians, revolt!

Bioinformatics reviewers essentially unionized and laid down three rules:

1. All of the data and source code must be provided for any paper.

2. Full methods sections and references are included in the primary paper review.

3. No unpublished methods can be used in data analysis.

In the end, the only people that complained were companies like MS Elsevier, because preprints.

Page 18: 2014 bosc-keynote

Replication “parties”

A community of practice emerged around replication!

Page 19: 2014 bosc-keynote

Part of a larger renaissance for biology!

Starting in ~2020,

1. Biomedical enterprise rediscovers basic biology;

2. Rise and triumph of open science;

3. A transition to networked science;

4. Massive investment in people.

Page 20: 2014 bosc-keynote

1. Rediscovering basic biology

Page 21: 2014 bosc-keynote

The biomedical community backs away from translational medicine.

Several veterinary and agricultural animals proved to be better model organisms for human disease than mouse;

Ecology and evolution provided valuable theoretical and empirical observations for understanding human genetics.

Microbial interactions between environment and human proved to be important as well; built environment, disease reservoirs, etc.

Cheap sequencing enabled a vast array of studies.

Page 22: 2014 bosc-keynote

2. Open science triumphs!

The computational community knew this by 2016, but it took a few years for the rest of biology…

A curious story!

1. Biotech pressured congresspeople into decreasing funding for experiments, since analysis was usually wrong and raw data was never available;

2. Funding crunch, more generally, tightened the screws further;

3. Hypothesis-driven labs couldn’t compete…

Page 23: 2014 bosc-keynote

…hypothesis-driven lab science joined with discovery.

Eventually, funders mandated data availability;

Labs that made use of available data had a dramatic edge in hypothesis-driven experimentation;

Data-driven modeling and model-driven data interpretation blossomed!

Image from emory.edu

Page 24: 2014 bosc-keynote

3. A transition to networked science

Page 25: 2014 bosc-keynote

Universities collapsed!

So all the senior professors and administrators retired…

Massive brain drain…

… enabled a massive increase in creativity in the research enterprise!

Collaboration tools, data sharing, distributed team science…

Page 26: 2014 bosc-keynote

“Walled garden” model

Pioneered by Sage Bionetworks in the ~2010s

Data collection done by small consortia;

Data made available to all, but publication in step.

Model is of course obsolete nowadays, but was quite effective back then.

Page 27: 2014 bosc-keynote

4. Massive investment in people

The NIH finally invested heavily in training.

Among other things:

Data Carpentry

Model Carpentry

(We won! Yay!)

Page 28: 2014 bosc-keynote

There are still problems, of course!

What do most genes do? Functional annotations are still poor. Some approaches:

Biogeochemistry

Synthetic biology

Career paths for experimental biologists are very uncertain.

“Glam data”

Cancer is cured, but many complex diseases – especially neurodegenerative ones – remain poorly understood.

Page 29: 2014 bosc-keynote

BRAIN2050

Ambitious 10-year proposal to “understand the brain” by 2050.

Focus on neurodegenerative diseases, regeneration, and a mechanistic understanding of intelligence.

What mistakes can they avoid, with the benefit of hindsight?

Page 30: 2014 bosc-keynote

Correlation is not causation

You’d think we’d have learned this by now!?

Original MIND project 25 years ago failed for this reason. (“Record ALL the neurons”)

Image from Wikipedia

Page 31: 2014 bosc-keynote

(Computational) modeling is critical

Can we develop models that embody hypotheses that we can then “test” against the data?

Holistic multidisciplinary research.

(Brain community has always been better off here…)

Page 32: 2014 bosc-keynote

Focus less on reproducibility

A strict requirement for independent replication is strangling us!

Completely independent replication is a strong requirement; understandable, given disasters of the past, but also slow.

Can we compromise?

Page 33: 2014 bosc-keynote

“Replication debt”

Can we borrow the idea of “technical debt” from software engineering?

Semi-independent replication after initial exploratory phase, followed by articulation of protocols and independent replication.

Image from blog.crisp.se

Page 34: 2014 bosc-keynote

“Replication debt”

Semi-independent replication after initial exploratory phase, followed by articulation of protocols and independent replication.

Public acknowledgement of debt is important.

Image from blog.crisp.se

Page 35: 2014 bosc-keynote

Invest in infrastructure for collaboration and sharing

Data sharing is a given

But existing tools still merely support rather than drive science with data sharing!

Push for collaborative process from the outset.

Page 36: 2014 bosc-keynote

Can we help drive collaboration with technology?

See e.g. pebourne.wordpress.com/2014/01/04/universities-as-big-data/

Page 37: 2014 bosc-keynote

Tool up! But evaluate, compare, understand.

Having a robust and competitive software ecosystem is important for innovation and creativity.

Available, open, reusable, remixable: all critical!

Benchmarks are not always useful; understanding always is.

Page 38: 2014 bosc-keynote

Build commercial software only when basics are understood

Page 39: 2014 bosc-keynote

Invest in training as first-class research citizen!

The high school students of yesterday are the research scientists of tomorrow.

Page 40: 2014 bosc-keynote

It’s the network, dummies.

Single molecule full genome sequences did not provide understanding.

Reductionist studies of gene function did not provide understanding.

Neither will high resolution ensemble neuronal sampling.

Our main obstacle in understanding aging has been that it seems to be systemic, just like neurodegeneration.

Page 41: 2014 bosc-keynote

Concluding thoughts (I)

Many things the BRAIN2050 field can do to invest in its own future and accelerate progress!

Bitter lessons learned from decades of mistakes in other fields; maybe we can do better?

Page 42: 2014 bosc-keynote

Page 43: 2014 bosc-keynote

All right…

Future talk over

I thought I’d use this as a foil to highlight issues that I think are important for the future.

But:

Page 44: 2014 bosc-keynote

We have to get used to the idea that radical change keeps happening ... even after 1997.

First published by Broadway Books on May 5, 1997. Via Erich Schwarz

Page 45: 2014 bosc-keynote

We have to get used to the idea that radical change keeps happening ... even after 1997.

"Among the pessimists, molecular biologist Gunther Stent suggests that science is reaching a point of incremental, diminishing returns as it comes up against the limits of knowledge..." --review by Publishers Weekly

First published by Broadway Books on May 5, 1997. Via Erich Schwarz

Page 46: 2014 bosc-keynote

Robert Heinlein's four curves of predicted human progress (described in 1950)

Ref.: Heinlein, R.A. (1950), "Where To?".

"The solid curve ... represents many things -- use of power, speed of transport, numbers of scientific and technical workers, advance in communication, average miles traveled per person per year, advances in mathematics ... Call it the curve of human achievement."

Via Erich Schwarz

Page 47: 2014 bosc-keynote

Robert Heinlein's four curves of predicted human progress (described in 1950)

"Despite everything, there is a stubborn 'common sense' tendency to project it along dotted line number (1) like the patent office official of a hundred years back who quit his job 'because everything had already been invented'."

Ref.: Heinlein, R.A. (1950), "Where To?". Via Erich Schwarz

Page 48: 2014 bosc-keynote

Robert Heinlein's four curves of predicted human progress (described in 1950)

"Even those who don't expect a slowing up at once tend to expect us to reach a point of diminishing returns -- dotted line number (2)."

Ref.: Heinlein, R.A. (1950), "Where To?". Via Erich Schwarz

Page 49: 2014 bosc-keynote

Robert Heinlein's four curves of predicted human progress (described in 1950)

"Very daring minds are willing to predict that we will continue our present rate of progress -- dotted line number (3) -- a tangent."

Ref.: Heinlein, R.A. (1950), "Where To?". Via Erich Schwarz

Page 50: 2014 bosc-keynote

Robert Heinlein's four curves of predicted human progress (described in 1950)

Ref.: Heinlein, R.A. (1950), "Where To?".

"But the proper way to project the curve is dotted line number (4), because there is no reason, mathematical, scientific, or historical, to expect that curve to flatten out... The correct projection ... is for the curve to go on up indefinitely with increasing steepness..."

Via Erich Schwarz

Page 51: 2014 bosc-keynote

Conclusion -- I certainly don’t know where we’re headed; no one else does either.

We must invest in people and process; we must help figure out what the right process is and then provide career incentives for people to do things that way.

This community should be leading the way:

Bioinformatics Open Source Conference

(Reminder: we will win.)

Page 52: 2014 bosc-keynote

But: economics matter

50-million mark note. Weimar Germany, 1923.

Page 53: 2014 bosc-keynote

Economics matter

Ref.: U.S. Government Accountability Office, Citizen's Guide of 2010.

Page 54: 2014 bosc-keynote

Prospects for U.S. public funding of science

Ref.: U.S. Government Accountability Office, Citizen's Guide of 2010.

Page 55: 2014 bosc-keynote

Public support for science matters!

Data sharing, openness => maximizing return.

Must figure out how to align career and funding incentives.

We are currently doing a horrible job of this…

…I’m looking forward to Phil Bourne’s talk :)

Page 56: 2014 bosc-keynote

Thanks!

Discussions with Phil Bourne (NIH), Erich Schwarz (Caltech & Cornell), Katherine Mejia-Guerra (OSU) and Jeffrey Campbell (OSU).

All of this will be (is already?) posted online.

“The next 10 years of quant bio” by Mike Schatz

…with apologies to Gary Bernhardt (Birth & Death of JavaScript – go watch it!)