Using Deep Learning and NLP to Predict Performance from Resumes

Using Deep Learning To Predict Performance From Resumes Ben Taylor, Chief Data Scientist

Upload: benjamin-taylor

Post on 16-Apr-2017


Category:

Data & Analytics



TRANSCRIPT

Page 1: Using Deep Learning And NLP To Predict Performance From Resumes

Using Deep Learning To Predict Performance From Resumes

Ben Taylor, Chief Data Scientist

Page 2: Using Deep Learning And NLP To Predict Performance From Resumes

INTRODUCTIONS

Page 3: Using Deep Learning And NLP To Predict Performance From Resumes

Ben Taylor @bentaylordata

Background Personal

Page 4: Using Deep Learning And NLP To Predict Performance From Resumes

• Sequoia Capital

• Largest Video Interviewing Platform

• Forbes #10 most promising companies

• Global: 189 countries

Page 5: Using Deep Learning And NLP To Predict Performance From Resumes

NATURAL LANGUAGE PROCESSING (NLP)

Page 6: Using Deep Learning And NLP To Predict Performance From Resumes

GRIT  MOTIVATION  ENGAGEMENT  PERFORMANCE
1     55          80          95%
0     75          10          22%
0     50          20          57%
1     20          90          91%
0     40          60          11%

Basic tutorial on how to build a numeric feature model

BUILDING A MODEL
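When every column is numeric, the rows can go straight into a model. The slides do not name the model used, so the sketch below swaps in a 1-nearest-neighbor lookup purely as a stand-in; the `predict` helper and the new candidate's values are hypothetical.

```python
import math

# Rows from the slide: (grit, motivation, engagement) -> performance
X = [(1, 55, 80), (0, 75, 10), (0, 50, 20), (1, 20, 90), (0, 40, 60)]
y = [0.95, 0.22, 0.57, 0.91, 0.11]

def predict(candidate):
    """Return the performance of the closest training row (1-NN stand-in)."""
    distances = [math.dist(candidate, row) for row in X]
    nearest = distances.index(min(distances))
    return y[nearest]

# Hypothetical new candidate, not from the slides
print(predict((1, 50, 85)))
```

The point is only that numeric features need no preprocessing before fitting; free text, covered next, does.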

Page 7: Using Deep Learning And NLP To Predict Performance From Resumes

ESSAY GRIT MOTIVATION ENGAGEMENT PERFORMANCE

I want to work here 1 55 80 95%

I have great teamwork 0 75 10 22%

Synergy 0 50 20 57%

I have so much grit 1 20 90 91%

They fired that individual 0 40 60 11%

Now what?!?

BUILDING A MODEL

Page 8: Using Deep Learning And NLP To Predict Performance From Resumes

ESSAY PERFORMANCE

I want to work here 95%

I have great teamwork 22%

Synergy 57%

I have so much grit 91%

They fired that individual 11%

There are really two different options, mapping or tokenizing

BUILDING A MODEL

Map: Bad=0, Good=1, Better=2, Best=3

Tokenize: Female=1, Male=1

Female  Male
1       0
0       1
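The two options can be sketched side by side. Mapping suits categories with a natural order; tokenizing gives each category its own 0/1 column. The helper names `map_ordinal` and `tokenize_categories` are illustrative, not from the talk.

```python
# Mapping: ordered categories become increasing integers (values from the slide)
RATING_MAP = {"Bad": 0, "Good": 1, "Better": 2, "Best": 3}

def map_ordinal(values):
    return [RATING_MAP[v] for v in values]

def tokenize_categories(values):
    """One 0/1 column per unique category, columns in sorted order."""
    columns = sorted(set(values))
    rows = [[1 if v == c else 0 for c in columns] for v in values]
    return columns, rows

mapped = map_ordinal(["Good", "Best", "Bad"])
columns, rows = tokenize_categories(["Female", "Male"])
print(mapped)
print(columns, rows)
```

Mapping "Female" and "Male" to 0 and 1 in a single column would imply an order that does not exist, which is why unordered categories get separate columns.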

Page 9: Using Deep Learning And NLP To Predict Performance From Resumes

I  want  to  work  here  have  great  PERF.
1  1     1   1     1     0     0      95%
1  0     0   0     0     1     1      22%
0  0     0   0     0     0     0      57%
1  0     0   0     0     1     0      91%
0  0     0   0     0     0     0      11%

Tokenize the text into unique word columns

BUILDING A MODEL

ESSAY PERFORMANCE

I want to work here 95%

I have great teamwork 22%

Synergy 57%

I have so much grit 91%

They fired that individual 11%
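A minimal sketch of that tokenization, using the seven word columns shown on the slide. A real vocabulary would include every word in the corpus; the `bag_of_words` helper is illustrative.

```python
essays = [
    "I want to work here",
    "I have great teamwork",
    "Synergy",
    "I have so much grit",
    "They fired that individual",
]

# Column order copied from the slide; a full vocabulary would cover every word
vocabulary = ["I", "want", "to", "work", "here", "have", "great"]

def bag_of_words(text, vocab):
    """Binary presence vector: 1 if the vocabulary word occurs in the text."""
    words = set(text.split())
    return [1 if term in words else 0 for term in vocab]

X = [bag_of_words(essay, vocabulary) for essay in essays]
for row in X:
    print(row)
```

With this truncated vocabulary, "Synergy" and "They fired that individual" both become all-zero rows, so the model cannot tell them apart.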

Page 10: Using Deep Learning And NLP To Predict Performance From Resumes

I  want  to  work  here  have  great  PERF.
1  1     1   1     1     0     0      95%
1  0     0   0     0     1     1      22%
0  0     0   0     0     0     0      57%
1  0     0   0     0     1     0      91%
0  0     0   0     0     0     0      11%

Bag of words modeling, sequence and ordering is lost

BUILDING A MODEL

Page 11: Using Deep Learning And NLP To Predict Performance From Resumes

Bag of words modeling, sequence and ordering is lost

BUILDING A MODEL

Page 12: Using Deep Learning And NLP To Predict Performance From Resumes

I want  Want to  to go  work here  PERF.

1 1 1 1 1 95%
1 0 0 0 0 22%
0 0 0 0 0 57%
1 0 0 0 0 91%
0 0 0 0 0 11%

Band-Aid: Concept of n-grams

BUILDING A MODEL
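To see why n-grams patch the ordering problem, the sketch below compares an essay with a scrambled copy: their unigrams are identical, but they share no bigrams. The `ngrams` helper and the scrambled sentence are illustrative.

```python
def ngrams(text, n):
    """Return the n-word windows of a whitespace-tokenized, lowercased text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

a = "I want to work here"
b = "here I work to want"   # hypothetical scrambled essay

bigrams_a = ngrams(a, 2)
same_unigrams = sorted(ngrams(a, 1)) == sorted(ngrams(b, 1))
shared_bigrams = set(ngrams(a, 2)) & set(ngrams(b, 2))

print(bigrams_a)
print(same_unigrams, shared_bigrams)
```

Bigrams only capture adjacent pairs, so longer-range order is still lost, hence "Band-Aid".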

Page 13: Using Deep Learning And NLP To Predict Performance From Resumes

SENTIMENT EXAMPLE (multiclass)

Page 14: Using Deep Learning And NLP To Predict Performance From Resumes

We need a labeled dataset; sometimes getting one with labels is the biggest challenge of all.

SENTIMENT DATASET, 1.5M TWEETS

label  text
neg    @Christian_Rocha i miss u!!!!!
pos    @llanitos there's still some St Werburghs hone...
pos    @Ashley96 it's me
neg    @Phillykidd we use to be like bestfriends
neg    Just got back from Manchester. I went to the T...
pos    @LauraDark thnks x el rt
neg    "Ughh it's so hot & the singing lady is st...
neg    @hnprashanth @dkris I was out to my native for...
pos    Girls night with the bests Wish you were here J!
neg    Just watched @paulkehler rock the crap out of ...
pos    i got the gurl! i got the ride! now im just on...
pos    @ninthspace how is the table building going?
pos    by d way guyz I must log out na see u again to...
neg    @dreday11 its only 20 mins...

Sentiment140 cs.stanford.edu :( :)

Page 15: Using Deep Learning And NLP To Predict Performance From Resumes

Before we can process this, we need to do the proper formatting to get it ready

SENTIMENT DATASET - FORMATTING

text
@Christian_Rocha i miss u!!!!!
@llanitos there's still some St Werburghs hone...
@Ashley96 it's me
@Phillykidd we use to be like bestfriends
Just got back from Manchester. I went to the T...
@LauraDark thnks x el rt
"Ughh it's so hot & the singing lady is st...
@hnprashanth @dkris I was out to my native for...
Girls night with the bests Wish you were here J!
Just watched @paulkehler rock the crap out of ...
i got the gurl! i got the ride! now im just on...
@ninthspace how is the table building going?
by d way guyz I must log out na see u again to...
@dreday11 its only 20 mins...

Python list
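A minimal sketch of that formatting step, assuming each row arrives as a "label text" string as displayed on the slide. The actual file layout of the dataset may differ, and `raw_rows` holds only three of the slide's rows.

```python
# Three rows copied from the slide, in an assumed "label text" layout
raw_rows = [
    "neg @Christian_Rocha i miss u!!!!!",
    "pos @Ashley96 it's me",
    "neg @dreday11 its only 20 mins...",
]

labels, text_data = [], []
for row in raw_rows:
    label, text = row.split(" ", 1)   # the first token is the label
    labels.append(label)
    text_data.append(text)

print(labels)
print(text_data)
```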

Page 16: Using Deep Learning And NLP To Predict Performance From Resumes

Now we can go all the way to model training and prediction

SENTIMENT DATASET – UNIGRAM

y: [0, 1, 0, 1, 1]

text_data: [['this is a tweet'], ['sounds good'], ['not really']]

I  want  to  work  here  have  great
1  1     1   1     1     0     0
1  0     0   0     0     1     1
0  0     0   0     0     0     0
1  0     0   0     0     1     0
0  0     0   0     0     0     0

Page 17: Using Deep Learning And NLP To Predict Performance From Resumes

Now we can go all the way to model training and prediction

SENTIMENT DATASET – BIGRAM

I want  Want to  to go  work here

1 1 1 1 1
1 0 0 0 0
0 0 0 0 0
1 0 0 0 0
0 0 0 0 0

text_data: [['this is a tweet'], ['sounds good'], ['not really']]

y: [0, 1, 0, 1, 1]

Page 18: Using Deep Learning And NLP To Predict Performance From Resumes

BUILDING A MODEL

Page 19: Using Deep Learning And NLP To Predict Performance From Resumes

Convert labels to integers

SENTIMENT DATASET - FORMATTING

Python int array

label: neg, pos, pos, neg, neg, pos, neg, neg, pos, neg, pos, pos, pos, neg
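The conversion is a one-line lookup. The sketch assumes neg maps to 0 and pos to 1; the slides do not state which integer goes with which label.

```python
# Labels in the order shown on the slide
labels = ["neg", "pos", "pos", "neg", "neg", "pos", "neg",
          "neg", "pos", "neg", "pos", "pos", "pos", "neg"]

LABEL_TO_INT = {"neg": 0, "pos": 1}   # assumed encoding
y = [LABEL_TO_INT[label] for label in labels]
print(y)
```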

Page 20: Using Deep Learning And NLP To Predict Performance From Resumes

Convert labels to integers

SENTIMENT DATASET - FORMATTING

model.fit(X,Y)

X: [4,0,0,0,0,7,0,0,1] [0,0,0,0,9,0,0,0,2]
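The `model.fit(X,Y)` call resembles the common fit/predict estimator convention, but the slides do not say which model sits behind it. As a sketch only, here is a small multinomial naive Bayes classifier (a swapped-in choice, not the talk's model) that accepts count vectors like the ones above. The class name and the toy data are hypothetical.

```python
import math

class TinyMultinomialNB:
    """Multinomial naive Bayes with add-one smoothing over count vectors."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            totals = [sum(col) + 1 for col in zip(*rows)]   # add-one smoothing
            denom = sum(totals)
            self.log_likelihood[c] = [math.log(t / denom) for t in totals]
        return self

    def predict(self, X):
        def score(x, c):
            return self.log_prior[c] + sum(
                count * ll for count, ll in zip(x, self.log_likelihood[c]))
        return [max(self.classes, key=lambda c: score(x, c)) for x in X]

# Hypothetical word-count rows (columns = vocabulary words) and 0/1 labels
X = [[2, 0, 1], [3, 0, 0], [0, 2, 1], [0, 3, 0]]
Y = [1, 1, 0, 0]

model = TinyMultinomialNB().fit(X, Y)
predictions = model.predict([[1, 0, 0], [0, 1, 0]])
print(predictions)
```

Any classifier with the same two methods could be dropped in here; the input format (one count row per document, one integer per label) is the part that carries over.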

Page 21: Using Deep Learning And NLP To Predict Performance From Resumes

Now we can go all the way to model training and prediction

SENTIMENT DATASET – BUILD A MODEL

y: [0, 1, 0, 1, 1]

X: [4,0,0,0,0,7,0,0,1] [0,0,0,0,9,0,0,0,2]

PERFORMANCE?

Page 22: Using Deep Learning And NLP To Predict Performance From Resumes

DON’T CHEAT!

Page 23: Using Deep Learning And NLP To Predict Performance From Resumes

PROPER MODEL VALIDATION

Page 24: Using Deep Learning And NLP To Predict Performance From Resumes

We need to hold out data we can test against; this is called your validation set

SENTIMENT DATASET – VALIDATION

Page 25: Using Deep Learning And NLP To Predict Performance From Resumes

Train on 20%, test on 80%

SENTIMENT DATASET – VALIDATION

20% 80%
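A holdout split like this one can be sketched in a few lines. The `holdout_split` helper, its `train_fraction` parameter, and the toy data are illustrative; the fixed seed only makes the shuffle repeatable.

```python
import random

def holdout_split(X, y, train_fraction, seed=0):
    """Shuffle row indices, then cut them into train and held-out parts."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * train_fraction)
    train, test = indices[:cut], indices[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

# Hypothetical toy data: ten rows, alternating labels
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]

X_train, y_train, X_test, y_test = holdout_split(X, y, train_fraction=0.2)
print(len(X_train), len(X_test))
```

Raising `train_fraction` gives the model more to learn from but leaves fewer held-out rows, so the reported score rests on less and less evidence.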

Page 26: Using Deep Learning And NLP To Predict Performance From Resumes

Best score yet

SENTIMENT DATASET – VALIDATION

60% 40%

Page 27: Using Deep Learning And NLP To Predict Performance From Resumes

Best score yet

SENTIMENT DATASET – VALIDATION

70% 30%

Page 28: Using Deep Learning And NLP To Predict Performance From Resumes

Best score yet

SENTIMENT DATASET – VALIDATION

80% 20%

Page 29: Using Deep Learning And NLP To Predict Performance From Resumes

Best score yet

SENTIMENT DATASET – VALIDATION

99% 1%

Page 30: Using Deep Learning And NLP To Predict Performance From Resumes

Perfect scores

SENTIMENT DATASET – VALIDATION

99.9999% 2

Page 31: Using Deep Learning And NLP To Predict Performance From Resumes

Predict Every Point, k-folding

Folds = 9
Fold = 1
Fold = 2
…
Y_pred
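K-folding yields a prediction for every row, each made by a model that never saw that row during training. A minimal sketch follows; the `k_fold_predict` helper and the majority-class stand-in model are hypothetical, and real data should be shuffled before the folds are assigned.

```python
def k_fold_predict(X, y, k, fit_predict):
    """Predict every point exactly once, from a model trained on the other folds."""
    n = len(X)
    y_pred = [None] * n
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        for i, p in zip(test_idx, preds):
            y_pred[i] = p
    return y_pred

def majority_baseline(X_train, y_train, X_test):
    """Stand-in model: always predict the most common training label."""
    majority = max(set(y_train), key=y_train.count)
    return [majority] * len(X_test)

# Hypothetical toy data
X = [[i] for i in range(9)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

y_pred = k_fold_predict(X, y, k=3, fit_predict=majority_baseline)
print(y_pred)
```

The resulting `y_pred` can be scored against `y` as a whole, which uses all the data for validation without ever testing on a training row.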

Page 32: Using Deep Learning And NLP To Predict Performance From Resumes

SENTIMENT DATASET – Validation

10 folds

Page 33: Using Deep Learning And NLP To Predict Performance From Resumes

SENTIMENT DATASET – Validation

100 folds

Page 34: Using Deep Learning And NLP To Predict Performance From Resumes

BIGRAM BOOST

acc: 0.8015, r: 0.2061, AUROC: 0.8738

acc: 0.7809, r: 0.1238, AUROC: 0.8554
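Two of the reported metrics can be computed as in the sketch below. Accuracy is the share of correct hard predictions; AUROC is the probability that a randomly chosen positive gets a higher score than a randomly chosen negative. The slides do not define `r`, so it is left out. The toy labels and scores are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def auroc(y_true, scores):
    """Share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and model scores
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred), auroc(y_true, scores))
```

The pairwise AUROC loop is quadratic in the number of rows, which is fine for a sketch but too slow for 1.5M tweets; a rank-based formula is the usual replacement.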

Page 35: Using Deep Learning And NLP To Predict Performance From Resumes

Feature Creation

Model Selection

Feature Reduction

Page 36: Using Deep Learning And NLP To Predict Performance From Resumes

BETTER MODELS

acc: 0.8208, r: 0.2832, AUROC: 0.8939

acc: 0.8015, r: 0.2061, AUROC: 0.8739

Was:

Now: (10x average)

Page 37: Using Deep Learning And NLP To Predict Performance From Resumes

EMAIL CLASSIFICATION (multiclass)

Page 38: Using Deep Learning And NLP To Predict Performance From Resumes

EMAIL MULTICLASS DATASET (20 classes)

alt.atheism
comp.graphics
comp.os.ms-windows.misc
comp.sys.ibm.pc.hardware
comp.sys.mac.hardware
comp.windows.x
misc.forsale
rec.autos
rec.motorcycles
rec.sport.baseball
rec.sport.hockey
sci.crypt
sci.electronics
sci.med
sci.space
soc.religion.christian
talk.politics.guns
talk.politics.mideast
talk.politics.misc
talk.religion.misc

Page 39: Using Deep Learning And NLP To Predict Performance From Resumes

EMAIL MULTICLASS DATASET (20 classes)

From: [email protected] (where's my thing)
Subject: WHAT car is this!?
Nntp-Posting-Host: rac3.wam.umd.edu
Organization: University of Maryland, College Park
Lines: 15
MSG: I was wondering if anyone out there could enlighten me on this car I saw\nthe other day. It was a 2-door sports car, looked to be from the late 60s/\nearly 70s. It was called a Bricklin. The doors were really small. In addition,\nthe front bumper was separate from the rest of the body. This is \nall I know. If anyone can tellme a model name, engine specs, years\nof production, where this car is made, history, or whatever info you\nhave on this funky looking car, please e-mail.\n\nThanks,\n- IL\n ---- brought to you by your neighborhood Lerxst ----\n\n\n\n\n"

rec.autos

Page 40: Using Deep Learning And NLP To Predict Performance From Resumes

EMAIL MULTICLASS DATASET (20 classes)

From: [email protected] (Guy Kuo)
Subject: SI Clock Poll - Final Call
Summary: Final call for SI clock reports
Keywords: SI,acceleration,clock,upgrade
Article-I.D.: shelley.1qvfo9INNc3s
Organization: University of Washington
Lines: 11
NNTP-Posting-Host: carson.u.washington.edu
MSG: A fair number of brave souls who upgraded their SI clock oscillator have\nshared their experiences for this poll. Please send a brief message detailing\nyour experiences with the procedure. Top speed attained, CPU rated speed,\nadd on cards and adapters, heat sinks, hour of usage per day, floppy disk\nfunctionality with 800 and 1.4 m floppies are especially requested.\n\nI will be summarizing in the next two days, so please add to the network\nknowledge base if you have done the clock upgrade and haven't answered this\npoll. Thanks.\n\nGuy Kuo <[email protected]>\n"

comp.sys.mac.hardware

Page 41: Using Deep Learning And NLP To Predict Performance From Resumes

EMAIL MULTICLASS DATASET (20 classes)

From: jgreen@amber (Joe Green)
Subject: Re: Weitek P9000?
Organization: Harris Computer Systems Division
Lines: 14
Distribution: world
NNTP-Posting-Host: amber.ssd.csd.harris.com
X-Newsreader: TIN [version 1.1 PL9]
MSG: Robert J.C. Kyanko ([email protected]) wrote:\n> [email protected] <[email protected]>:\n> > Anyone know about the Weitek P9000 graphics chip?\n> As far as the low-level stuff goes, it looks pretty nice. It\'s got this\n> quadrilateral fill command that requires just the four points.\n\nDo you have Weitek\'s address/phone number? I\'d like to get some information\nabout this chip.\n\n--\nJoe Green\t\t\t\tHarris Corporation\[email protected]\t\t\tComputer Systems Division\n"The only thing that really scares me is a person with no sense of humor."\n\t\t\t\t\t\t-- Jonathan Winters\n’

comp.graphics

Page 42: Using Deep Learning And NLP To Predict Performance From Resumes

EMAIL MULTICLASS DATASET (20 classes)

Page 43: Using Deep Learning And NLP To Predict Performance From Resumes

RESUME MODELING

(binary)

Page 44: Using Deep Learning And NLP To Predict Performance From Resumes

Upload Your Resume

Now painstakingly fill out this form containing all of the exact same information

Page 45: Using Deep Learning And NLP To Predict Performance From Resumes

Document modeling review

UNSTRUCTURED

STRUCTURED

MUNGED

Page 46: Using Deep Learning And NLP To Predict Performance From Resumes

Resume Extension

Page 47: Using Deep Learning And NLP To Predict Performance From Resumes

Resume format consolidation

Page 48: Using Deep Learning And NLP To Predict Performance From Resumes

GPA Inclusion (18%)

Page 49: Using Deep Learning And NLP To Predict Performance From Resumes

GPA Replacement

Page 50: Using Deep Learning And NLP To Predict Performance From Resumes

Mimicking the human recruiter

Feature Hunt

ONE FEATURE AT A TIME

INCREMENTAL GAINS

Page 51: Using Deep Learning And NLP To Predict Performance From Resumes

DEEP LEARNING

Page 52: Using Deep Learning And NLP To Predict Performance From Resumes

Unstructured

ENGINEERS AND MANUAL FEATURES ARE EXPENSIVE, USING DEEP LEARNING TO AUTOMATE

AUTOMATIC FEATURE GENERATION

Structured

I want  Want to  to go  work here  PERF.

1 1 1 1 1 95%
1 0 0 0 0 22%
0 0 0 0 0 57%
1 0 0 0 0 91%
0 0 0 0 0 11%

ESSAY

I want to work here

I have great teamwork

Synergy

I have so much grit

They fired that individual

Page 53: Using Deep Learning And NLP To Predict Performance From Resumes

ENGINEERS AND MANUAL FEATURES ARE EXPENSIVE, USING DEEP LEARNING TO AUTOMATE

AUTOMATIC FEATURE GENERATION

ESSAY

I want to work here

I have great teamwork

Synergy

I have so much grit

They fired that individual

ESSAY

3 2 1 4 5

3 7 67 345

54

3 7 99 10234

78 203 501 14

1 2 3 4 5
0 0 0 1 0
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0

LSTM

RAW TEXT  WORD SEQUENCE

ENCODING
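The raw text to word sequence step can be sketched as follows: each distinct word gets an integer id and each essay becomes a fixed-length list of ids, which is the kind of input an embedding or LSTM layer in a deep learning framework typically expects. The helpers `build_index` and `encode` are illustrative, and the ids will not match the slide's, which appear to come from a larger vocabulary. The LSTM itself is not shown.

```python
def build_index(texts):
    """Give each distinct word an integer id starting at 1 (0 is kept for padding)."""
    index = {}
    for text in texts:
        for word in text.lower().split():
            index.setdefault(word, len(index) + 1)
    return index

def encode(text, index, max_len):
    """Map words to ids, truncate to max_len, then left-pad with zeros."""
    ids = [index[w] for w in text.lower().split() if w in index][:max_len]
    return [0] * (max_len - len(ids)) + ids

essays = [
    "I want to work here",
    "I have great teamwork",
    "Synergy",
    "I have so much grit",
    "They fired that individual",
]

index = build_index(essays)
sequences = [encode(essay, index, max_len=5) for essay in essays]
for seq in sequences:
    print(seq)
```

Unlike the bag-of-words rows, these sequences keep word order, so the network can learn its own features from it instead of relying on hand-built n-grams.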

Page 54: Using Deep Learning And NLP To Predict Performance From Resumes

AUTOMATIC FEATURE GENERATION

Page 55: Using Deep Learning And NLP To Predict Performance From Resumes

AUTOMATIC FEATURE GENERATION

Page 56: Using Deep Learning And NLP To Predict Performance From Resumes

AUTOMATIC FEATURE GENERATION

Page 57: Using Deep Learning And NLP To Predict Performance From Resumes

BEGIN SCRATCHING AT LAYOUT

AUTOMATIC FEATURE GENERATION (LAYOUT)

CNN: bit.ly/pacon

Page 58: Using Deep Learning And NLP To Predict Performance From Resumes

INTERVIEW MODELING

Page 59: Using Deep Learning And NLP To Predict Performance From Resumes


WOULD YOU EVER HIRE FROM JUST A RESUME?

INTERVIEW MODELING

SOFT/TECHNICAL COMPETENCIES

Resume can overstate and understate

Page 60: Using Deep Learning And NLP To Predict Performance From Resumes

Audio  Video  Text

Page 61: Using Deep Learning And NLP To Predict Performance From Resumes

QUESTIONS