“Got a nail? I got a hammer!” Lessons for data science from the “dawn” of big science
Ben Keller, Data Science Dojo
14 January 2015
@vinegarbin | bjkeller.github.io | linkedin.com/in/bjkeller
Creative Commons Attribution-ShareAlike 4.0 International License
Some context
Almost 10 years ago, the NIH program “National Centers for Biomedical Computing” started
Goal: use computing to answer the questions posed by driving projects in biomedicine
“Big science” [though maybe not the “dawn”]
A different perspective
Questions center on storytelling
“what molecular activity links this genetic change to the symptoms of type 2 diabetes?”
Overriding goal: build software to find answers
Proof-of-concept analyses drive development
A computational scientist, trained in algorithms, thinks of a problem as
Given: a set of genes G and a covariance matrix M over the expression of the genes in G
Find: a family of gene sets {Gi}, subsets of G, such that...
...and will build tools that solve it
We see what we recognize
The van carrying my geology class stops next to a rock feature that looks like:
John, napping in the back seat, wakes up briefly and looks out the window.
What did he see?
John will tell you he saw a chevron fold formed by opposing pressure on the rock layers
Everyone else saw that water flowing along a crack had formed a V-shaped channel in steeply sloping layers. (Sorry, John. You’re still wrong.)
In the same data, you might see
- columns on which to do regression
- a matrix on which to do matrix factorization
- a graph connecting subjects to features
- variables on which to measure mutual information
- variables on which to do Bayesian inference
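To make the list above concrete, here is a minimal Python sketch (an illustration, not from the talk) showing that one small table can be read as columns for regression, as a matrix to factor, or as a subject-to-feature graph. The data, the subject and feature names, and the "edge wherever the value is nonzero" rule are all hypothetical; mutual information and Bayesian inference are left out to keep it short.

```python
import numpy as np

# Hypothetical toy data: rows are subjects, columns are features.
# Values are made up for illustration only.
data = np.array([
    [1.0, 0.0, 2.0],
    [2.0, 1.0, 5.0],
    [3.0, 0.0, 6.0],
    [4.0, 1.0, 9.0],
])
subjects = ["s1", "s2", "s3", "s4"]   # hypothetical names
features = ["f1", "f2", "f3"]         # hypothetical names

# View 1: columns for regression.
# Predict f3 from f1 and f2 (plus an intercept) by ordinary least squares.
X = np.column_stack([np.ones(len(data)), data[:, 0], data[:, 1]])
y = data[:, 2]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# View 2: a matrix to factor.
# The thin SVD decomposes the table into U, singular values s, and Vt.
U, s, Vt = np.linalg.svd(data, full_matrices=False)

# View 3: a bipartite graph connecting subjects to features.
# Hypothetical rule: add an edge wherever the value is nonzero.
edges = [
    (subjects[i], features[j])
    for i in range(data.shape[0])
    for j in range(data.shape[1])
    if data[i, j] != 0
]

print("regression coefficients:", np.round(coef, 3))
print("singular values:", np.round(s, 3))
print("edges:", edges)
```

Each view runs and produces an answer. None of them tells you which view matches how the data's owner thinks about the problem.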
We solve what we see
A scientist had results linking genomic regions to each other in subjects with bipolar disorder
Asking: what is common?
We had previously used a graph to represent what was common in recommender systems
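The talk does not say how the graph was built. One plausible reading, sketched here in Python under stated assumptions, borrows the co-occurrence idea from recommender systems: count how many subjects share each link between genomic regions and keep the links seen in at least `min_subjects` subjects. The input format, the region names, the function `common_link_graph`, and the threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical input: for each subject, the set of links between
# genomic regions observed in that subject. Region names are placeholders.
links_by_subject = {
    "subject_1": {("r1", "r2"), ("r2", "r3")},
    "subject_2": {("r1", "r2"), ("r3", "r4")},
    "subject_3": {("r2", "r1"), ("r2", "r3")},
}

def common_link_graph(links_by_subject, min_subjects=2):
    """Hypothetical helper: return {edge: count} for links seen in
    at least min_subjects subjects.

    Links are treated as undirected, so each is stored as a sorted tuple.
    """
    counts = Counter()
    for links in links_by_subject.values():
        # Normalize direction and count each link at most once per subject.
        normalized = {tuple(sorted(link)) for link in links}
        counts.update(normalized)
    return {edge: n for edge, n in counts.items() if n >= min_subjects}

graph = common_link_graph(links_by_subject, min_subjects=2)
print(sorted(graph.items()))  # [(('r1', 'r2'), 3), (('r2', 'r3'), 2)]
```

The output is a weighted graph over regions: a valid answer to "what is common?", but one stated in the tool's terms rather than the biologist's.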
We get answers, but they are hard to interpret biologically
[Figure: result graph with labels including CDKN2A/B, PPARG, HHEX, TCF7L2, “mortality”, “g1”, and “repression”]
Cognitive engineering tells us that we have to manage the relationships between
- how we think of the problem and its tasks
- how the problem is represented by tools, and what those tools allow
Lesson: See the data and problem as the expert “owner” sees them
Read as:
- leave the data as it is, and avoid exposing abstractions not already in the original problem
- provide an expert-understandable explanation of the analysis and, if you can’t, rethink whether the approach is useful
The stories we are looking for are complex, with interrelated data and interrelated chains of analysis
We often have to translate data between tools and change perspective
Lesson: Any analysis is part of a larger question
Read as:
- reduce the cognitive load of interpreting results between analysis steps
- understand how the different steps relate, and try to help the expert understand the flow of analysis
Corollary: Appeal to cognitive science
Read as:
- use existing studies to understand how scientists and other experts do their work
- work with a cognitive science expert to develop an understanding of the domain experts
A complex problem
- involves uncertainty
- draws on incomplete and diverse sources of information
- may be affected by several factors and be driven by competing objectives
(Mirel, Interaction design for complex problem solving, 2004)
Ours is a complex problem because
- it is uncertain what counts as an actual solution
- it involves diverse, incomplete, and possibly irrelevant information
- it is based on incomplete observations, affected by technology and methodology
- it has conflicting objectives: predicting, remediating, and understanding disease
Corollary: Embrace the uncertainty
Read as:
- expect not to know what the expert needs, and expect them not to know what they need either
- be agile: build the analysis in conversation with the expert to push understanding
Corollary: Data will be “special”
Read as:
- understand where your data is coming from, what it represents, and how the data owner sees it
- understand sources of error and noise
Corollary: Objectives drive the question
Read as:
- understand the objectives for the analysis
- be clear with the data owner about which objectives are being met
Corollary: The question will change
Read as: once the analysis gives an answer, the expert may recognize that it was the wrong question, or may come up with another one
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0
International License.