From Data to Deployment: Full Stack Data Science
TRANSCRIPT
Ben Link, Data Scientist
Indeed is the #1 external
source of hire
64% of US job searchers search
on Indeed each month
80.2M unique US visitors per month
16M jobs
50+ countries
28 languages
200M unique visitors
[Chart: unique visitors (millions), 2009 to 2015, growing toward 200M]
We help people get jobs.
Data Science @ Indeed
Applicant Quality
[Diagram: the Application Model scores whether a Resume / Job Seeker is a good fit for a Job / Employer]
What does a data scientist do at Indeed?
[Diagram: the data science workflow]
Hypothesis Formulation
Gather Data
Explore Data
Label Data
Analyze Labels
Generate Features
Analyze Features
Prototype Models
Evaluate Model
Choose Final Parameters
Label Hold-out Data
Model Review
Deploy Model
A/B Test Model
Monitor Model
Repeat
Full-stack data scientists:
1. Prevent handoff mistakes
2. Can contribute on any team
3. Have big picture in mind
1. Prevent handoff mistakes
[Diagrams: the prototype stack (a raw data DB feeding feature extraction and model building in IPython) versus the production stack (web infrastructure calling a data service that returns JSON data backed by NoSQL, a new service hosting the model, and feature extraction reimplemented in Java)]
2. Contribute on any team
Drive logging of data
Drive product decisions
using external data
Get first data science solution
into production quickly
Iterate on existing solutions
Recognize deployment costs during
feature / model development
3. Think Big
Focus on the right problem
Be aware of the big picture
Practical Data Science
Job Description Classifiers
Predicting (min) years of experience
from a job description
Simple features for first models
{ 'regex:5+': 1, 'tfidf:expert': 1.75, 'tfidf:advanced': 0.93, 'tfidfBigram:5 years': 2.25 }
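As a rough sketch (not Indeed's actual code), features like these could be produced by a small extractor that combines a regex check with a few precomputed term weights; the names extract_simple_features, TERM_WEIGHTS, and BIGRAM_WEIGHTS below are hypothetical:

import re

# Hypothetical hand-tuned weights standing in for real TF-IDF values.
TERM_WEIGHTS = {'expert': 1.75, 'advanced': 0.93}
BIGRAM_WEIGHTS = {'5 years': 2.25}

def extract_simple_features(job_description):
    """Build a sparse feature dict like {'regex:5+': 1, 'tfidf:expert': 1.75, ...}."""
    text = job_description.lower()
    features = {}
    if re.search(r'\b5\+', text):  # e.g. "5+ years of experience"
        features['regex:5+'] = 1
    for term, weight in TERM_WEIGHTS.items():
        if term in text:
            features['tfidf:' + term] = weight
    for bigram, weight in BIGRAM_WEIGHTS.items():
        if bigram in text:
            features['tfidfBigram:' + bigram] = weight
    return features

print(extract_simple_features(
    "Expert developer with 5+ years (ideally 5 years or more) of advanced Java."))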
Label data before, during, and after you build a model
Extract features in one place
Reuse your model building code
Release softly and log everything
Validate and review every model
Monitor after deploying
Retrain when needed
Label data before, during, and after you build a model
The best way to understand your
problem is to label your own data
The fastest way to get labels for your
data is to label your own data
The easiest way to know your labels are
consistent is to label your own data
Labeling encourages
feature development
Labeling creates a human
performance benchmark
Labeling throughout gives you
indications of shifting data
Is the job part-time, full-time, or both?
Sometimes you don't need much data.
You only need to do better than a simple heuristic.
[Learning curve: training score and cross-validation score vs. number of training samples (0 to 7000)]
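A learning curve like this can be generated with scikit-learn's learning_curve helper; the sketch below uses synthetic data and a stand-in classifier, since the talk does not show the actual training setup:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

# Stand-in data; in practice this would be the labeled job descriptions.
X, y = make_classification(n_samples=7000, n_features=50, random_state=0)

sizes, train_scores, cv_scores = learning_curve(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 8))

plt.plot(sizes, train_scores.mean(axis=1), label='Training score')
plt.plot(sizes, cv_scores.mean(axis=1), label='Cross-validation score')
plt.xlabel('Training Samples')
plt.ylabel('Score')
plt.title('Learning Curve')
plt.legend()
plt.show()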
Now train others to label
Or use experts
Check their consistency
Can build next generation model quickly
Always flag weird data
Extract features in one place
[Diagram: Feature Extraction → Features → Model Builder → Model → Model Predictor → Predictions]
Prevents feature inconsistency
between train / serve time
Allows faster feature iteration
Encourages feature extraction reuse
Deploy feature extraction services
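A minimal sketch of the idea, assuming a scikit-learn pipeline and an illustrative JobDescriptionFeatureExtractor class (not Indeed's actual extractor): the same extract() method featurizes both the training data and each serving request.

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline

class JobDescriptionFeatureExtractor:
    """Single source of truth for features, shared by the model builder and the predictor."""
    def extract(self, job_description):
        text = job_description.lower()
        words = text.split() or ['']
        return {
            'jobDescriptionCharacterLength': float(len(job_description)),
            'averageWordLength': sum(len(w) for w in words) / len(words),
            'count:experience': float(text.count('experience')),
        }

extractor = JobDescriptionFeatureExtractor()

# Train time: featurize labeled examples with the shared extractor.
docs = ['5+ years of experience required', 'entry level, no experience needed']
labels = [5, 0]
model = make_pipeline(DictVectorizer(), RandomForestClassifier(n_estimators=10, random_state=0))
model.fit([extractor.extract(d) for d in docs], labels)

# Serve time: the exact same extractor feeds the deployed model.
print(model.predict([extractor.extract('expert with 6 years of experience')]))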
[Diagram: a job description passes through the Feature Extractor on its way to the Model Builder, producing a feature map such as:]
{
  "tfidf:experience": 0.007,
  "bigramTfidf:5 years": 0.049,
  "bigramTfidf:experience in": 0.006,
  "tfidf:expert": 0.026,
  "averageWordLength": 5.506,
  "tfidf:2": 0.017,
  "tfidf:5": 0.029,
  "tfidf:years": 0.017,
  ...
}
Reuse your model building code
Model Builder (Features → Model):
● feature sampling
● feature scaling
● feature selection
● test/train splits
● cross validation
● generate plots
● email results
● export model
input_file=job_description_years_exp.gz
output_dir=output/job_description_years_exp_model_builds
model_name=JobExperience
model_version=1.2
model_type=RandomForestClassifier
model_params=[{`n_estimators`:[100, 125, 150], `max_depth`:[3, 4, 5, 6]}]
downsampling_ratio=1.75
use_feature_selection=True
feature_selection_variance_retained=0.9
plot_learning_curve=True
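Those properties suggest a config-driven builder that wraps a cross-validated grid search; the sketch below is an assumption about how such a builder might look, using GridSearchCV with the parameter grid from the file and synthetic stand-in data:

import json
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Values that would normally be read from the properties file above.
config = {
    'model_type': RandomForestClassifier,
    'model_params': {'n_estimators': [100, 125, 150], 'max_depth': [3, 4, 5, 6]},
    'test_size': 0.25,
}

# Stand-in data; the real builder would load input_file and run feature extraction.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=config['test_size'], random_state=0)

# Cross-validated grid search over the configured parameter grid.
search = GridSearchCV(config['model_type'](random_state=0), config['model_params'], cv=5)
search.fit(X_train, y_train)

print('best params:', json.dumps(search.best_params_))
print('held-out score:', search.best_estimator_.score(X_test, y_test))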
[ROC curve: true positive rate vs. false positive rate]
Feature Name         Feature Importance
experience           0.27
5 years              0.19
experience in        0.17
expert               0.16
averageWordLength    0.11
years                0.08
...                  ...
Class        Precision  Recall  F1-Score  Support
1.0          0.92       0.90    0.91      353
2.0          0.87       0.92    0.90      310
5.0          0.90       0.86    0.88      213
avg / total  0.90       0.90    0.90      876
Output your models into
a standard format
Deploy quickly
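The talk does not name the standard format; one possible approach, shown here as an assumption, is to persist the fitted estimator with joblib alongside a small JSON metadata file so the predictor can load it without the training code:

import json
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in for the model builder's output.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Persist the estimator and its metadata side by side.
joblib.dump(model, 'job_experience_model_1.2.joblib')
with open('job_experience_model_1.2.json', 'w') as f:
    json.dump({'model_name': 'JobExperience', 'model_version': '1.2',
               'model_type': 'RandomForestClassifier'}, f)

# Serve time: the predictor loads the artifact without importing training code.
loaded = joblib.load('job_experience_model_1.2.joblib')
print(loaded.predict(X[:1]))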
[Diagram: Feature Extraction → Model Predictor (loading the exported Model) → Predictions]
Putting it all together
[Diagram: Feature Extraction → Features → Model Builder → Model → Model Predictor → Predictions]
Release softly and log everything
[Screenshot: Proctor A/B test "viewjobeval_en_US" (JUDY-419: Proctor test for viewjob evaluation), with control and test1 buckets; test1 allocated 50%]
Log everything
Example feature log entry (truncated):
uid=1b0un002j1jfi8mp&type=judyQoaEvalFeatures&appdcname=aus&appinstance=judy&tk=1b0un002d1jfid0o&locale=en_US&f.jdTfidf%3A794=0.07931499364678474&f.candidateResumeRead=0.0&f.jobApplicantDistance=25000.0&f.yearsOfWorkExperience=0.0&f.numMonthsExperience=134.0&f.jobDescriptionCharacterLength=501.0&f.tfidfResumeJobDescriptionSimilarity=0.020420184609032756&...
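A minimal sketch of how such a line could be produced, assuming features are serialized as f.<name>=<value> pairs with urllib's urlencode (the encoding scheme is inferred from the example above, not confirmed by the talk):

from urllib.parse import urlencode

def feature_log_line(uid, event_type, features):
    """Serialize request metadata plus every extracted feature as 'f.<name>=<value>'."""
    fields = {'uid': uid, 'type': event_type}
    fields.update({'f.' + name: value for name, value in features.items()})
    return urlencode(fields)

features = {'jdTfidf:794': 0.0793, 'jobApplicantDistance': 25000.0, 'numMonthsExperience': 134.0}
print(feature_log_line('1b0un002j1jfi8mp', 'judyQoaEvalFeatures', features))
# e.g. uid=1b0un002j1jfi8mp&type=judyQoaEvalFeatures&f.jdTfidf%3A794=0.0793&...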
Reuse logs for future models
Logs give us insight
into changing data
Logs allow us to see
what went wrong
Validate and review every model
Quantitative Validation

Training Set
class        precision  recall  f1-score  support
0.0          1.00       1.00    1.00      448
1.0          0.99       1.00    1.00      663
2.0          1.00       0.98    0.99      269
avg / total  1.00       1.00    1.00      1380

[ 2015-12-15 21:42:27,537 INFO ] [indeed.model_builder]

Test Set
class        precision  recall  f1-score  support
0.0          0.85       0.90    0.87      146
1.0          0.92       0.96    0.94      226
2.0          0.91       0.70    0.79      88
[ROC curve: true positive rate vs. false positive rate]
Qualitative Validation
Review your Models
Another perspective
Transparency and Reproducibility
Awareness
1. Context
2. Data
3. Response variable
4. Features
5. Model selection and performance
6. Transparency and recommendations
Context
What should this model enable us to do
(highlighting, filtering, sorting, etc.)?
What products / interfaces / workflows
will initially use this model?
Data
What queries and filters were used?
From what time range did your data originate?
Did you sample your dataset?
Response variable
How was the response variable
labeled or collected?
What do the model outputs (predictions) represent,
and how should they be scaled or thresholded?
Features
How were your features generated?
Which features were most important?
Model selection and performance
Performance reports on train / test sets
Overall CV search strategy and scoring function
Other performance tests
(e.g. newer hold out sets, stress testing)
Expected model performance
Transparency and recommendations
Properties files for Model Builder
Link to branch of Model Builder code
Examples of Model Predictions
Possible directions for future improvements
A couple sentences on why you think the
model is ready for production
Monitor after deploying
Features and data are hard dependencies
Need a post-deploy plan
Use log data to check for feature changes
[Histogram: bucket counts of the tfidf:`excel` feature values across two samples]

Test name   ttest_ind   ks_2samp   mannwhit   levene     ranksums
p-value     3.79e-09    0.00021    8.41e-05   3.79e-09   0.00017
Check prediction class distributions
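Both checks can be run directly against logged values; the sketch below applies SciPy's two-sample tests (the same ones in the table above) to synthetic stand-in arrays for a single feature:

import numpy as np
from scipy import stats

# Synthetic stand-ins for one feature's logged values at train time vs. after deploy.
rng = np.random.default_rng(0)
train_values = rng.normal(loc=0.30, scale=0.10, size=2000)
deployed_values = rng.normal(loc=0.27, scale=0.12, size=2000)

tests = {
    'ttest_ind': stats.ttest_ind(train_values, deployed_values),
    'ks_2samp': stats.ks_2samp(train_values, deployed_values),
    'mannwhit': stats.mannwhitneyu(train_values, deployed_values),
    'levene': stats.levene(train_values, deployed_values),
    'ranksums': stats.ranksums(train_values, deployed_values),
}
for name, result in tests.items():
    print(f'{name}: p-value = {result.pvalue:.3g}')  # small p-values flag a shifted feature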
Retrain when needed
Every model should be validated; retraining is time expensive
Use feature monitoring to
determine feature stability
Choose less sensitive features
Avoid counts
Full stack data scientists
Full stack data science organizations
More Indeed Engineering
Careers: indeed.jobs
Engineering Blog & Talks: indeed.tech (@IndeedEng)
Open Source: opensource.indeedeng.io
Questions?
Label data before, during, and after you build a model
Extract features in one place
Reuse your model building code
Release softly and log everything
Validate and review every model
Monitor after deploying
Retrain when needed