using deep learning and nlp to predict performance from resumes
TRANSCRIPT
Using Deep Learning To Predict Performance From Resumes
Ben Taylor, Chief Data Scientist
INTRODUCTIONS
Ben Taylor @bentaylordata
Background Personal
• Sequoia Capital
• Largest Video Interviewing Platform
• Forbes #10 most promising companies
• Global: 189 countries
NATURAL LANGUAGE PROCESSING (NLP)
GRIT MOTIVATION ENGAGEMENT PERFORMANCE
1 55 80 95%
0 75 10 22%
0 50 20 57%
1 20 90 91%
0 40 60 11%
Basic Tutorial On How To Build A Numeric Feature Model
BUILDING A MODEL
ESSAY GRIT MOTIVATION ENGAGEMENT PERFORMANCE
I want to work here 1 55 80 95%
I have great teamwork 0 75 10 22%
Synergy 0 50 20 57%
I have so much grit 1 20 90 91%
They fired that individual 0 40 60 11%
Now what?!?
BUILDING A MODEL
ESSAY PERFORMANCE
I want to work here 95%
I have great teamwork 22%
Synergy 57%
I have so much grit 91%
They fired that individual 11%
There are really two different options: mapping or tokenizing
BUILDING A MODEL
Map: Bad=0, Good=1, Better=2, Best=3
Tokenize: Female=1, Male=1
Female Male
1 0
0 1
I want to work here have great PERF.
1 1 1 1 1 0 0 95%
1 0 0 0 0 1 1 22%
0 0 0 0 0 0 0 57%
1 0 0 0 0 1 0 91%
0 0 0 0 0 0 0 11%
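The two options can be sketched in plain Python. This is a minimal illustration, not the speaker's code: the names `ordinal_map` and `one_hot` are hypothetical, and no particular library is assumed.

```python
# Option 1: map ordered categories onto integers (the order carries meaning).
ordinal_map = {"Bad": 0, "Good": 1, "Better": 2, "Best": 3}
ratings = ["Good", "Best", "Bad"]
mapped = [ordinal_map[r] for r in ratings]


# Option 2: tokenize unordered categories into one 0/1 column per value.
def one_hot(values):
    """Return the sorted column names and one 0/1 row per input value."""
    columns = sorted(set(values))
    rows = [[1 if v == c else 0 for c in columns] for v in values]
    return columns, rows


columns, rows = one_hot(["Female", "Male"])
print(mapped)   # integer codes for the ordered ratings
print(columns)  # one column per category
print(rows)     # each row has a single 1
```

Mapping suits categories with a natural order; tokenizing avoids implying an order where none exists.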
Tokenize the text into unique word columns
BUILDING A MODEL
ESSAY PERFORMANCE
I want to work here 95%
I have great teamwork 22%
Synergy 57%
I have so much grit 91%
They fired that individual 11%
I want to work here have great PERF.
1 1 1 1 1 0 0 95%
1 0 0 0 0 1 1 22%
0 0 0 0 0 0 0 57%
1 0 0 0 0 1 0 91%
0 0 0 0 0 0 0 11%
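A minimal sketch of how the essay column could be tokenized into unique word columns, assuming lowercase whitespace splitting. The helper names `build_vocabulary` and `bag_of_words` are hypothetical, not from the talk.

```python
essays = [
    "I want to work here",
    "I have great teamwork",
    "Synergy",
    "I have so much grit",
    "They fired that individual",
]


def build_vocabulary(texts):
    """Give each unique lowercase word a column index, in order of first appearance."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def bag_of_words(texts, vocab):
    """Return one 0/1 row per text; word order inside the text is discarded."""
    rows = []
    for text in texts:
        row = [0] * len(vocab)
        for word in text.lower().split():
            if word in vocab:
                row[vocab[word]] = 1
        rows.append(row)
    return rows


vocab = build_vocabulary(essays)
X = bag_of_words(essays, vocab)
for row in X:
    print(row[:7])  # first seven columns: i, want, to, work, here, have, great
```

Shuffling the words of an essay produces the same row, which is the limitation of this representation.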
Bag of words modeling, sequence and ordering is lost
BUILDING A MODEL
I want Want to to go work here PERF.
1 1 1 1 1 95%
1 0 0 0 0 22%
0 0 0 0 0 57%
1 0 0 0 0 91%
0 0 0 0 0 11%
Band-Aid: Concept of n-grams
BUILDING A MODEL
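A minimal sketch of n-gram extraction, assuming lowercase whitespace tokens. The function name `ngrams` is hypothetical.

```python
def ngrams(text, n=2):
    """Return every run of n consecutive words in the text, joined by spaces."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


bigrams = ngrams("I want to work here", n=2)
print(bigrams)
```

Each bigram becomes its own column, so some local word order survives, at the cost of many more columns.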
SENTIMENT EXAMPLE (multiclass)
We need a labeled dataset; sometimes getting one with labels is the biggest challenge of all.
SENTIMENT DATASET, 1.5M TWEETS
label text
neg @Christian_Rocha i miss u!!!!!
pos @llanitos there's still some St Werburghs hone...
pos @Ashley96 it's me
neg @Phillykidd we use to be like bestfriends
neg Just got back from Manchester. I went to the T...
pos @LauraDark thnks x el rt
neg "Ughh it's so hot & the singing lady is st...
neg @hnprashanth @dkris I was out to my native for...
pos Girls night with the bests Wish you were here J!
neg Just watched @paulkehler rock the crap out of ...
pos i got the gurl! i got the ride! now im just on...
pos @ninthspace how is the table building going?
pos by d way guyz I must log out na see u again to...
neg @dreday11 its only 20 mins...
Sentiment140 (cs.stanford.edu) :( :)
Before we can process this, we need to do the proper formatting to get it ready
SENTIMENT DATASET - FORMATTING
text
@Christian_Rocha i miss u!!!!!
@llanitos there's still some St Werburghs hone...
@Ashley96 it's me
@Phillykidd we use to be like bestfriends
Just got back from Manchester. I went to the T...
@LauraDark thnks x el rt
"Ughh it's so hot & the singing lady is st...
@hnprashanth @dkris I was out to my native for...
Girls night with the bests Wish you were here J!
Just watched @paulkehler rock the crap out of ...
i got the gurl! i got the ride! now im just on...
@ninthspace how is the table building going?
by d way guyz I must log out na see u again to...
@dreday11 its only 20 mins...
Python list
Now we can go all the way to model training and prediction
SENTIMENT DATASET – UNIGRAM
y = [0, 1, 0, 1, 1]
text_data = [['this is a tweet'], ['sounds good'], ['not really']]
I want to work here have great
1 1 1 1 1 0 0
1 0 0 0 0 1 1
0 0 0 0 0 0 0
1 0 0 0 0 1 0
0 0 0 0 0 0 0
Now we can go all the way to model training and prediction
SENTIMENT DATASET – BIGRAM
I want Want to to go work here
1 1 1 1 1
1 0 0 0 0
0 0 0 0 0
1 0 0 0 0
0 0 0 0 0
text_data = [['this is a tweet'], ['sounds good'], ['not really']]
y = [0, 1, 0, 1, 1]
BUILDING A MODEL
Convert labels to integers
SENTIMENT DATASET - FORMATTING
Python int array
label
neg pos pos neg neg pos neg neg pos neg pos pos pos neg
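A minimal sketch of the label conversion, assuming neg maps to 0 and pos to 1. The talk does not state which integer goes with which label, so `label_to_int` is an assumption.

```python
labels = ["neg", "pos", "pos", "neg", "neg", "pos", "neg",
          "neg", "pos", "neg", "pos", "pos", "pos", "neg"]

# Assumed encoding: the slides only say the labels become integers.
label_to_int = {"neg": 0, "pos": 1}
y = [label_to_int[label] for label in labels]
print(y)
```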
Convert labels to integers
SENTIMENT DATASET - FORMATTING
model.fit(X, y)
X = [[4, 0, 0, 0, 0, 7, 0, 0, 1], [0, 0, 0, 0, 9, 0, 0, 0, 2]]
Now we can go all the way to model training and prediction
SENTIMENT DATASET – BUILD A MODEL
y = [0, 1, 0, 1, 1]
X = [[4, 0, 0, 0, 0, 7, 0, 0, 1], [0, 0, 0, 0, 9, 0, 0, 0, 2]]
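The talk does not say which model sits behind the fit call. As a stand-in, here is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing, written in plain Python. The class name `TinyNaiveBayes` and the toy counts are hypothetical and exist only to show the fit/predict shape.

```python
import math


class TinyNaiveBayes:
    """Multinomial naive Bayes over count features, with add-one smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            rows = [row for row, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # Per-column word counts for this class, plus one for smoothing.
            totals = [sum(col) + 1 for col in zip(*rows)]
            denom = sum(totals)
            self.log_likelihood[c] = [math.log(t / denom) for t in totals]
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            scores = {
                c: self.log_prior[c]
                + sum(x * w for x, w in zip(row, self.log_likelihood[c]))
                for c in self.classes
            }
            predictions.append(max(scores, key=scores.get))
        return predictions


# Toy data (made up): columns are counts of four hypothetical words.
X = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 2, 1], [0, 0, 1, 3]]
y = [1, 1, 0, 0]
model = TinyNaiveBayes().fit(X, y)
y_pred = model.predict([[1, 1, 0, 0], [0, 0, 1, 1]])
print(y_pred)
```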
PERFORMANCE?
DON’T CHEAT!
PROPER MODEL VALIDATION
We need to hold out data we can test against; this is called your validation set
SENTIMENT DATASET – VALIDATION
Train on 20%, test on 80%
SENTIMENT DATASET – VALIDATION
20% 80%
Best score yet
SENTIMENT DATASET – VALIDATION
60% 40%
Best score yet
SENTIMENT DATASET – VALIDATION
70% 30%
Best score yet
SENTIMENT DATASET – VALIDATION
80% 20%
Best score yet
SENTIMENT DATASET – VALIDATION
99% 1%
Perfect scores
SENTIMENT DATASET – VALIDATION
99.9999% 2
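A minimal sketch of a hold-out split, assuming a shuffled index split in plain Python. The function name `train_test_split` and its parameters are hypothetical here, and the toy data is made up.

```python
import random


def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle the row indices, then hold out a fraction of the rows for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return (pick(X, train_idx), pick(X, test_idx),
            pick(y, train_idx), pick(y, test_idx))


# Toy data: ten rows, label is the parity of the single feature.
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_fraction=0.2)
print(len(X_train), len(X_test))
```

Shrinking the test set to a handful of points makes the score noisy: with two test points the only possible accuracies are 0%, 50% and 100%.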
Predict Every Point, k-folding
Folds = 9  Fold = 1  Fold = 2 …  Y_pred
SENTIMENT DATASET – Validation
10 folds
SENTIMENT DATASET – Validation
100 folds
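A minimal sketch of k-folding so that every point is predicted exactly once by a model that never saw it. The names `k_fold_predict` and `majority_class` are hypothetical, the placeholder model just predicts the most common training label, and the data is assumed to be already shuffled.

```python
def k_fold_predict(X, y, fit_predict, k=10):
    """Predict every point once, each time from a model trained on the other folds."""
    n = len(X)
    y_pred = [None] * n
    for fold in range(k):
        test_idx = list(range(fold, n, k))  # every k-th point belongs to this fold
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        if not test_idx or not train_idx:
            continue
        preds = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        for i, p in zip(test_idx, preds):
            y_pred[i] = p
    return y_pred


def majority_class(X_train, y_train, X_test):
    """Placeholder model: always predict the most common training label."""
    most_common = max(set(y_train), key=y_train.count)
    return [most_common] * len(X_test)


# Toy data (made up): 14 positive and 6 negative labels.
X = [[i] for i in range(20)]
y = [1] * 14 + [0] * 6
y_pred = k_fold_predict(X, y, majority_class, k=10)
accuracy = sum(p == t for p, t in zip(y_pred, y)) / len(y)
print(accuracy)
```

More folds mean more training data per model and more models to train; the score is still computed over every point.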
BIGRAM BOOST
acc: 0.8015  r: 0.2061  AUROC: 0.8738
acc: 0.7809  r: 0.1238  AUROC: 0.8554
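For reference, a minimal sketch of how two of the reported metrics, accuracy and AUROC, can be computed. The slides do not define r, so it is left out. The function names and the toy scores are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy scores (made up) from some classifier, thresholded at 0.5 for accuracy.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(y_true, y_pred), auroc(y_true, scores))
```

Accuracy depends on the chosen threshold; AUROC only depends on how the scores rank positives against negatives.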
Feature Creation
Model Selection
Feature Reduction
BETTER MODELS
Was: acc: 0.8015  r: 0.2061  AUROC: 0.8739
Now: (10x average) acc: 0.8208  r: 0.2832  AUROC: 0.8939
EMAIL CLASSIFICATION (multiclass)
EMAIL MULTICLASS DATASET (20 classes)
alt.atheism
comp.graphics
comp.os.ms-windows.misc
comp.sys.ibm.pc.hardware
comp.sys.mac.hardware
comp.windows.x
misc.forsale
rec.autos
rec.motorcycles
rec.sport.baseball
rec.sport.hockey
sci.crypt
sci.electronics
sci.med
sci.space
soc.religion.christian
talk.politics.guns
talk.politics.mideast
talk.politics.misc
talk.religion.misc
EMAIL MULTICLASS DATASET (20 classes)
From: [email protected] (where's my thing)
Subject: WHAT car is this!?
Nntp-Posting-Host: rac3.wam.umd.edu
Organization: University of Maryland, College Park
Lines: 15
MSG: I was wondering if anyone out there could enlighten me on this car I saw\nthe other day. It was a 2-door sports car, looked to be from the late 60s/\nearly 70s. It was called a Bricklin. The doors were really small. In addition,\nthe front bumper was separate from the rest of the body. This is \nall I know. If anyone can tellme a model name, engine specs, years\nof production, where this car is made, history, or whatever info you\nhave on this funky looking car, please e-mail.\n\nThanks,\n- IL\n ---- brought to you by your neighborhood Lerxst ----\n\n\n\n\n"
rec.autos
EMAIL MULTICLASS DATASET (20 classes)
From: [email protected] (Guy Kuo)
Subject: SI Clock Poll - Final Call
Summary: Final call for SI clock reports
Keywords: SI, acceleration, clock, upgrade
Article-I.D.: shelley.1qvfo9INNc3s
Organization: University of Washington
Lines: 11
NNTP-Posting-Host: carson.u.washington.edu
MSG: A fair number of brave souls who upgraded their SI clock oscillator have\nshared their experiences for this poll. Please send a brief message detailing\nyour experiences with the procedure. Top speed attained, CPU rated speed,\nadd on cards and adapters, heat sinks, hour of usage per day, floppy disk\nfunctionality with 800 and 1.4 m floppies are especially requested.\n\nI will be summarizing in the next two days, so please add to the network\nknowledge base if you have done the clock upgrade and haven't answered this\npoll. Thanks.\n\nGuy Kuo <[email protected]>\n"
comp.sys.mac.hardware
EMAIL MULTICLASS DATASET (20 classes)
From: jgreen@amber (Joe Green)
Subject: Re: Weitek P9000?
Organization: Harris Computer Systems Division
Lines: 14
Distribution: world
NNTP-Posting-Host: amber.ssd.csd.harris.com
X-Newsreader: TIN [version 1.1 PL9]
MSG: Robert J.C. Kyanko ([email protected]) wrote:\n> [email protected] <[email protected]>:\n> > Anyone know about the Weitek P9000 graphics chip?\n> As far as the low-level stuff goes, it looks pretty nice. It\'s got this\n> quadrilateral fill command that requires just the four points.\n\nDo you have Weitek\'s address/phone number? I\'d like to get some information\nabout this chip.\n\n--\nJoe Green\t\t\t\tHarris Corporation\[email protected]\t\t\tComputer Systems Division\n"The only thing that really scares me is a person with no sense of humor."\n\t\t\t\t\t\t-- Jonathan Winters\n’
comp.graphics
EMAIL MULTICLASS DATASET (20 classes)
RESUME MODELING
(binary)
Upload Your Resume
Now painstakingly fill out this form containing all of the exact same information
Document modeling review
UNSTRUCTURED
STRUCTURED
MUNGED
Resume Extension
Resume format consolidation
GPA Inclusion (18%)
GPA Replacement
Mimicking the human recruiter
Feature Hunt
ONE FEATURE AT A TIME
INCREMENTAL GAINS
DEEP LEARNING
ENGINEERS AND MANUAL FEATURES ARE EXPENSIVE, USING DEEP LEARNING TO AUTOMATE
AUTOMATIC FEATURE GENERATION
Structured
I want Want to to go work here PERF.
1 1 1 1 1 95%
1 0 0 0 0 22%
0 0 0 0 0 57%
1 0 0 0 0 91%
0 0 0 0 0 11%
Unstructured
ESSAY
I want to work here
I have great teamwork
Synergy
I have so much grit
They fired that individual
ENGINEERS AND MANUAL FEATURES ARE EXPENSIVE, USING DEEP LEARNING TO AUTOMATE
AUTOMATIC FEATURE GENERATION
ESSAY
I want to work here
I have great teamwork
Synergy
I have so much grit
They fired that individual
ESSAY
3 2 1 4 5
3 7 67 345
54
3 7 99 10234
78 203 501 14
1 2 3 4 5
0 0 0 1 0
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
RAW TEXT
WORD SEQUENCE ENCODING
LSTM
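A minimal sketch of the word sequence encoding step that turns raw text into fixed-length integer sequences of the kind a sequence model such as an LSTM consumes. The helper names are hypothetical, ids are assigned in order of first appearance (so they differ from the ids on the slide), and left-padding with 0 is an assumption.

```python
essays = [
    "I want to work here",
    "I have great teamwork",
    "Synergy",
    "I have so much grit",
    "They fired that individual",
]


def build_word_index(texts):
    """Give each unique word an integer id starting at 1; 0 is reserved for padding."""
    index = {}
    for text in texts:
        for word in text.lower().split():
            index.setdefault(word, len(index) + 1)
    return index


def encode(texts, index, max_len):
    """Turn each text into a fixed-length list of word ids, left-padded with 0."""
    sequences = []
    for text in texts:
        ids = [index[w] for w in text.lower().split() if w in index][:max_len]
        sequences.append([0] * (max_len - len(ids)) + ids)
    return sequences


def one_hot(word_id, vocab_size):
    """One 0/1 vector per word id; position 0 is the padding slot."""
    return [1 if i == word_id else 0 for i in range(vocab_size + 1)]


index = build_word_index(essays)
sequences = encode(essays, index, max_len=5)
for seq in sequences:
    print(seq)
```

Unlike the bag-of-words table, these sequences keep word order, which is what lets a sequence model learn its own features from the text.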
AUTOMATIC FEATURE GENERATION
BEGIN SCRATCHING AT LAYOUT
AUTOMATIC FEATURE GENERATION (LAYOUT)
CNN: bit.ly/pacon
INTERVIEW MODELING
WOULD YOU EVER HIRE FROM JUST A RESUME?
INTERVIEW MODELING
SOFT/TECHNICAL COMPETENCIES
Resume can overstate and understate
Audio Video Text
QUESTIONS