
“Got a nail? I got a hammer!” Lessons for Data science from the “dawn” of big science

Ben Keller, Data Science Dojo

14 January 2015

@vinegarbin | bjkeller.github.io | linkedin.com/in/bjkeller

Creative Commons Attribution-ShareAlike 4.0 International License

Upload: benjamin-keller

Post on 14-Jul-2015



Some context

Almost 10 years ago, the NIH program “National Centers for Biomedical Computing” started.

Goal: use computing to answer the questions of driving projects in biomedicine

“Big science” [though maybe not the “dawn”]

A different perspective

Questions center on storytelling:

“What molecular activity links this genetic change to the symptoms of type 2 diabetes?”

Overriding goal: build software to find answers

Proof-of-concept analysis to drive development

"I got a hammer – you got a nail?"

A computational scientist, trained in algorithms, thinks of a problem as:

Given: a set of genes G, a covariance matrix M over expression of genes in G

Find: a family of gene sets {G_i}, subsets of G, such that...

...and will build tools that solve it.

But biologists think of their problems in different ways.

What do you see?

      p1       p2       …
s1    42.211   9.3211   …
s2    2.192    8.9942   …
⋮     ⋮        ⋮        ⋱

We see what we recognize

The van carrying my geology class stops next to a rock feature that looks like this:

[Photo of the rock feature]

John, napping in the back seat, wakes up briefly and looks out the window.

What did he see?

John will tell you he saw a chevron fold formed by opposing pressure on the rock layers.

Everyone else saw that water flowing along a crack had formed a V-shaped channel in steeply sloping layers. (Sorry, John. You’re still wrong.)

So, what do you see?

      p1       p2       …
s1    42.211   9.3211   …
s2    2.192    8.9942   …
⋮     ⋮        ⋮        ⋱

"tabular data"


You might see

- columns on which to do regression

- a matrix on which to do matrix factorization

- a graph connecting subjects to features

- variables on which to measure mutual information

- variables on which to do Bayesian inference

You might see a proxy problem that you already know how to solve.
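To make the “proxy problem” point concrete, here is a hypothetical Python sketch that takes the two rows visible on the slide and re-frames the same table three ways: as per-feature columns (the regression view), as a subjects-by-features matrix (the factorization view), and as subject-to-feature edges (the graph view). It is an illustration only, not the analysis from the talk; the helper names and the edge threshold are assumptions made up for the example.

```python
# Hypothetical sketch: one small table, three "proxy problem" framings.
# Values are the two rows shown on the slide; the helper names and the
# graph threshold are illustrative assumptions, not part of the talk.

TABLE = {
    "s1": {"p1": 42.211, "p2": 9.3211},
    "s2": {"p1": 2.192, "p2": 8.9942},
}


def as_columns(table):
    """Regression view: each feature becomes a variable (a column of values)."""
    features = sorted({f for row in table.values() for f in row})
    return {f: [table[s][f] for s in sorted(table)] for f in features}


def as_matrix(table):
    """Matrix-factorization view: subjects x features as a list of lists."""
    subjects = sorted(table)
    features = sorted({f for row in table.values() for f in row})
    return [[table[s][f] for f in features] for s in subjects]


def as_bipartite_edges(table, threshold):
    """Graph view: link a subject to a feature when its value exceeds a
    (hypothetical) threshold; the cutoff itself is a new abstraction."""
    return sorted(
        (s, f)
        for s, row in table.items()
        for f, v in row.items()
        if v > threshold
    )


if __name__ == "__main__":
    print(as_columns(TABLE))
    print(as_matrix(TABLE))
    print(as_bipartite_edges(TABLE, threshold=9.0))
```

Each function is a faithful transformation of the same numbers, yet each one quietly brings in an abstraction (a fitted variable, latent factors, a cutoff for “connected”) that the data owner may never have had in mind.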

We solve what we see

A scientist had results linking genomic regions to each other in subjects with bipolar disorder.

Asking: what is common?


A graph had previously been used to represent what was common in recommender systems.


We get answers, but they are hard to interpret biologically.

[Figure: result labels include the genes CDKN2A/B, PPARG, HHEX, and TCF7L2 and the terms "mortality", "g1", and "repression"]

Cognitive engineering tells us that we have to manage the relationships between how we think of a problem and how it is represented by tools, and between the way we think of tasks and what the tools allow.

Lesson: See the data and problem as the expert “owner” sees them

Read as:

- leave the data as it is, and avoid exposing abstractions not already in the original problem

- provide an expert-understandable explanation of the analysis, and, if you can’t, rethink whether the approach is useful

“Need hot water for your bath? Here’s a bucket, a pot and a stove. The well is outside!”

The stories we are looking for are complex, with interrelated data and interrelated chains of analysis.

[Figure: diagram with elements A, B, C, D]

We often have to translate data between tools, and change perspective.

Lesson: Any analysis is part of a larger question

Read as:

- reduce the cognitive load of interpreting between analysis steps

- understand how different steps relate, and try to help the expert understand the flow of analysis

Experts reason in complex ways: they use different modes of reasoning, and may switch between them at any time.

Corollary: Appeal to cognitive science

Read as:

- use studies already done to understand how scientists/experts do their work

- work with a cognitive science expert to develop an understanding of domain experts

"Oh, that's easy! Just use a hammer!"

A complex problem

- involves uncertainty

- draws on incomplete and diverse sources of information

- may be affected by several factors and be driven by competing objectives

(Mirel, Interaction design for complex problem solving, 2004)

A systems biology problem is a complex problem because

- it is uncertain what an actual solution is

- it involves diverse, incomplete, and possibly irrelevant information

- it is based on incomplete observations, affected by technology/methodology

- it has conflicting objectives of predicting, remediating, and understanding disease

Lesson: Analysis is a complex problem

Corollary: Embrace the uncertainty

Read as:

- expect not to know what the expert needs, and for them not to know what they need either

- be agile: build the analysis in conversation with the expert to push understanding

Corollary: Data will be “special”

Read as:

- understand where your data is coming from, what it represents, and how the data owner sees it

- understand sources of error/noise

Corollary: Objectives drive the question

Read as:

- understand the objectives for the analysis

- be clear to the data owner about which objectives are being met

Corollary: The question will change

Read as: once the analysis gives an answer, the expert may recognize that it was the wrong question, or may come up with another one.

Ultimate Lesson: it's the people

Thanks to Barbara Mirel (U. Michigan)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.