data history / data science @ nyt
data science @ The New York Times
[email protected]@nytimes.com@chrishwiggins
references: bit.ly/icerm
“data science” jobs, jobs, jobs
data science: mindset & toolset
drew conway, 2010
modern history: 2009
“data science” blogs, blogs, blogs
The first time I heard "data science" was in 2007 while reading a proposal that my adviser had passed along, outlining an academic program similar to what we think of as data science.
“data science” ancient history: 2001
data science context
home schooled
PhD in topology
“By the end of late 1945, I was a statistician rather than a topologist”
invented: “bit”
invented: “software”
invented: “FFT”
“the progenitor of data science.” - @mshron
“The Future of Data Analysis,” 1962, John W. Tukey
introduces: “Exploratory data analysis”
Tukey 1965, via John Chambers
TUKEY BEGAT S WHICH BEGAT R
Tukey 1972
? 1972
Jerome H. Friedman
Tukey 1975
In 1975, while at Princeton, Tufte was asked to teach a statistics course to a group of journalists who were visiting the school to study economics. He developed a set of readings and lectures on statistical graphics, which he further developed in joint seminars he subsequently taught with renowned statistician John Tukey (a pioneer in the field of information design). These course materials became the foundation for his first book on information design, The Visual Display of Quantitative Information.
TUKEY BEGAT VDQI
Tukey 1977
TUKEY BEGAT EDA
fast forward -> 2001
“The primary agents for change should be university departments themselves.”
data science @ The New York Times
histories
1. in academia -> Bell: as heretical statistics (see also Breiman)
2. in industry: as job description
historical rant: bit.ly/data-rant
biology: 1892 vs. 1995
biology changed for good.
genetics: 1837 vs. 2012
ML toolset; data science mindset
arxiv.org/abs/1105.5821 ; github.com/rajanil/mkboost
data science: mindset & toolset
1851
news: 20th century
church state
church
news: 21st century
church state
engineering
1851 1996
newspapering: 1851 vs. 1996
example:
millions of views per hour, 2015
data science: the web
- is your “online presence”
- is a microscope
- is an experimental tool
- is an optimization tool
newspapering: 1851 vs. 1996 vs. 2008
“a startup is a temporary organization in search of a repeatable and scalable business model” —Steve Blank
every publisher is now a startup
learnings
- supervised learning
- unsupervised learning
- reinforcement learning
cf. modelingsocialdata.org
stats.stackexchange.com
from “are you a bayesian or a frequentist” —michael jordan
$$L = \sum_{i=1}^{N} \varphi\big(y_i f(x_i;\beta)\big) + \lambda \lVert \beta \rVert$$
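A minimal sketch of how the regularized loss on this slide could be evaluated: a per-example loss of the margin (label times model output), summed over the data, plus a penalty proportional to the norm of the parameters. The slide does not fix the loss function, the model, or the norm, so the choices here are assumptions for illustration only: a logistic loss, a linear model, and the Euclidean norm. The function names are hypothetical.

```python
import math


def logistic_phi(margin):
    """Assumed per-example loss: phi(m) = log(1 + exp(-m)), computed stably."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def linear_f(x, beta):
    """Assumed model: f(x; beta) is the inner product of beta and x."""
    return sum(b * xj for b, xj in zip(beta, x))


def regularized_loss(xs, ys, beta, lam, phi=logistic_phi, f=linear_f):
    """Sum of phi(y_i * f(x_i; beta)) over examples, plus lam * ||beta||_2."""
    data_term = sum(phi(y * f(x, beta)) for x, y in zip(xs, ys))
    penalty = lam * math.sqrt(sum(b * b for b in beta))  # Euclidean norm (assumed)
    return data_term + penalty


# Toy data: labels are +1 or -1, as the margin form y * f(x) requires.
xs = [[1.0, 2.0], [3.0, 4.0]]
ys = [1, -1]
print(regularized_loss(xs, ys, beta=[0.0, 0.0], lam=0.5))
```

Swapping `phi` for a hinge or exponential loss, or the penalty for an L1 norm, gives other familiar members of the same family without changing the structure.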
supervised learning, e.g.,
“the funnel”
cf. modelingsocialdata.org
interpretable supervised learning
super cool stuff
cf. modelingsocialdata.org
arxiv.org/abs/q-bio/0701021
optimization & learning, e.g.,
“How The New York Times Works,” Popular Mechanics, 2015
optimization & prediction, e.g.,
(some models)
(some moneys)
recommendation as supervised learning
recommendation as predictive modeling
bit.ly/AlexCTM
unsupervised learning, e.g.,
cf. daeilkim.com ; import bnpy
modeling your audience
bit.ly/Hughes-Kim-Sudderth-AISTATS15
(optimization, ultimately)
also allows recommendation as inference
prescriptive modeling, e.g.,
[diagram: Reporting, Learning, Test, Optimizing, Explore; labeled by unsupervised, supervised, and reinforcement learning]
common requirements in data science:
1. people
2. ideas
3. things
cf. USAF
things: what does DS team deliver?
- build data prototypes
- build APIs
- impact roadmaps
- build data prototypes
cf. daeilkim.com
- in puppet, w/python2.7
- collaboration w/pers. team
- build APIs
- impact roadmaps
flickr/McJex
data science: ideas
data skills
- data engineering
- data science
- data visualization
- data product
- data multiliteracies
- data embeds
cf. “data scientists at work”, ch 1
data science: people
- new mindset > new toolset
summary: pay attention to:
1. people
2. ideas
3. things
cf. USAF
thanks to the data science team!
data science @ The New York Times
[email protected]@nytimes.com@chrishwiggins