Using Deep Learning and NLP to Predict Performance from Resumes

Post on 16-Apr-2017

Category:

Data & Analytics

Using Deep Learning To Predict Performance From Resumes

Ben Taylor, Chief Data Scientist

INTRODUCTIONS

Ben Taylor @bentaylordata

Background Personal

• Sequoia Capital

• Largest Video Interviewing Platform

• Forbes #10 most promising companies

• Global: 189 countries

NATURAL LANGUAGE PROCESSING (NLP)

GRIT  MOTIVATION  ENGAGEMENT  PERFORMANCE
1     55          80          95%
0     75          10          22%
0     50          20          57%
1     20          90          91%
0     40          60          11%

Basic tutorial on how to build a numeric feature model
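A minimal sketch of a numeric feature model on the five rows from the slide. The deck does not say which model was used, so the nearest-neighbor rule here is only a stand-in, and `predict_nearest` is a hypothetical helper name.

```python
rows = [
    # grit, motivation, engagement, performance
    (1, 55, 80, 0.95),
    (0, 75, 10, 0.22),
    (0, 50, 20, 0.57),
    (1, 20, 90, 0.91),
    (0, 40, 60, 0.11),
]
X = [list(r[:3]) for r in rows]   # numeric features
y = [r[3] for r in rows]          # target: performance

def predict_nearest(x, X, y):
    """Return the target of the closest training row (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in X]
    return y[distances.index(min(distances))]

# A made-up candidate whose features sit closest to the first row.
print(predict_nearest([1, 50, 85], X, y))  # 0.95
```

Any regression or classification model could take the place of the nearest-neighbor rule; the point is only that numeric columns can be fed to a model directly.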

BUILDING A MODEL

ESSAY                       GRIT  MOTIVATION  ENGAGEMENT  PERFORMANCE
I want to work here         1     55          80          95%
I have great teamwork       0     75          10          22%
Synergy                     0     50          20          57%
I have so much grit         1     20          90          91%
They fired that individual  0     40          60          11%

Now what?!?

BUILDING A MODEL

ESSAY                       PERFORMANCE
I want to work here         95%
I have great teamwork       22%
Synergy                     57%
I have so much grit         91%
They fired that individual  11%

There are really two different options: mapping or tokenizing

BUILDING A MODEL

Map: Bad=0, Good=1, Better=2, Best=3

Tokenize: Female=1, Male=1

Female  Male
1       0
0       1
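The two options can be sketched in plain Python. This is an illustration only, not the speaker's code: `map_ordinal` and `one_hot` are hypothetical helper names, and the category values are the ones shown on the slide.

```python
# Mapping: ordered categories collapse into a single integer column.
ORDINAL = {"Bad": 0, "Good": 1, "Better": 2, "Best": 3}

def map_ordinal(values, mapping=ORDINAL):
    """Replace each ordered category with its integer rank."""
    return [mapping[v] for v in values]

def one_hot(values):
    """Tokenize unordered categories: one 0/1 column per unique value."""
    columns = sorted(set(values))
    rows = [[1 if v == c else 0 for c in columns] for v in values]
    return columns, rows

ratings = map_ordinal(["Good", "Best", "Bad"])
columns, rows = one_hot(["Female", "Male"])
print(ratings)        # [1, 3, 0]
print(columns, rows)  # ['Female', 'Male'] [[1, 0], [0, 1]]
```

Mapping suits categories with a natural order; tokenizing avoids implying an order where there is none.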

I  want  to  work  here  have  great  PERF.
1  1     1   1     1     0     0      95%
1  0     0   0     0     1     1      22%
0  0     0   0     0     0     0      57%
1  0     0   0     0     1     0      91%
0  0     0   0     0     0     0      11%

Tokenize the text into unique word columns
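A minimal sketch of tokenizing text into word columns, using the five essays from the slides. The function name `bag_of_words` is hypothetical, and the vocabulary is limited to the seven columns the slide shows; a full vocabulary would have one column per unique word.

```python
essays = [
    "I want to work here",
    "I have great teamwork",
    "Synergy",
    "I have so much grit",
    "They fired that individual",
]

def bag_of_words(texts, vocabulary):
    """Return one 0/1 row per text, one column per vocabulary word."""
    rows = []
    for text in texts:
        words = set(text.lower().split())
        rows.append([1 if w in words else 0 for w in vocabulary])
    return rows

# Only the seven columns shown on the slide.
vocabulary = ["i", "want", "to", "work", "here", "have", "great"]
X = bag_of_words(essays, vocabulary)
for row in X:
    print(row)
```

The printed rows match the 0/1 table on the slide.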

Bag of words modeling, sequence and ordering is lost

BUILDING A MODEL

I want  Want to  to go  work here  PERF.
1 1 1 1 1  95%
1 0 0 0 0  22%
0 0 0 0 0  57%
1 0 0 0 0  91%
0 0 0 0 0  11%

Band-Aid: Concept of n-grams
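A short sketch of why n-grams act as a band-aid for lost ordering. The helper `ngrams` is a hypothetical name, and sentence `b` is a made-up reordering of the first essay: both sentences share the same single words, but their word pairs differ.

```python
def ngrams(text, n):
    """Return every run of n consecutive words, in order, joined by spaces."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

a = "I want to work here"
b = "here I want to work"   # same words, different order (made up)

print(ngrams(a, 2))  # ['i want', 'want to', 'to work', 'work here']

same_unigrams = sorted(ngrams(a, 1)) == sorted(ngrams(b, 1))
same_bigrams = sorted(ngrams(a, 2)) == sorted(ngrams(b, 2))
print(same_unigrams, same_bigrams)  # True False
```

Bigrams recover only local ordering; anything further apart than n words is still lost.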

BUILDING A MODEL

SENTIMENT EXAMPLE (multiclass)

We need a labeled dataset; sometimes getting one with labels is the biggest challenge of all.

SENTIMENT DATASET, 1.5M TWEETS

label  text
neg    @Christian_Rocha i miss u!!!!!
pos    @llanitos there's still some St Werburghs hone...
pos    @Ashley96 it's me
neg    @Phillykidd we use to be like bestfriends
neg    Just got back from Manchester. I went to the T...
pos    @LauraDark thnks x el rt
neg    "Ughh it's so hot & the singing lady is st...
neg    @hnprashanth @dkris I was out to my native for...
pos    Girls night with the bests Wish you were here J!
neg    Just watched @paulkehler rock the crap out of ...
pos    i got the gurl! i got the ride! now im just on...
pos    @ninthspace how is the table building going?
pos    by d way guyz I must log out na see u again to...
neg    @dreday11 its only 20 mins...

Sentiment140, cs.stanford.edu  :( :)

Before we can process this, we need to do the proper formatting to get it ready

SENTIMENT DATASET - FORMATTING

text
@Christian_Rocha i miss u!!!!!
@llanitos there's still some St Werburghs hone...
@Ashley96 it's me
@Phillykidd we use to be like bestfriends
Just got back from Manchester. I went to the T...
@LauraDark thnks x el rt
"Ughh it's so hot & the singing lady is st...
@hnprashanth @dkris I was out to my native for...
Girls night with the bests Wish you were here J!
Just watched @paulkehler rock the crap out of ...
i got the gurl! i got the ride! now im just on...
@ninthspace how is the table building going?
by d way guyz I must log out na see u again to...
@dreday11 its only 20 mins...

Python list

Now we can go all the way to model training and prediction

SENTIMENT DATASET – UNIGRAM

y: [0, 1, 0, 1, 1]

text_data: [['this is a tweet'], ['sounds good'], ['not really']]

I  want  to  work  here  have  great
1  1     1   1     1     0     0
1  0     0   0     0     1     1
0  0     0   0     0     0     0
1  0     0   0     0     1     0
0  0     0   0     0     0     0

Now we can go all the way to model training and prediction

SENTIMENT DATASET – BIGRAM

I want  Want to  to go  work here
1 1 1 1 1
1 0 0 0 0
0 0 0 0 0
1 0 0 0 0
0 0 0 0 0

text_data: [['this is a tweet'], ['sounds good'], ['not really']]

y: [0, 1, 0, 1, 1]

BUILDING A MODEL

Convert labels to integers

SENTIMENT DATASET - FORMATTING

Python int array

label: neg pos pos neg neg pos neg neg pos neg pos pos pos neg
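Converting the label column to integers can be as small as a dictionary lookup. The encoding neg=0, pos=1 is an assumption; the slide does not show which integer each label receives.

```python
labels = ["neg", "pos", "pos", "neg", "neg", "pos", "neg",
          "neg", "pos", "neg", "pos", "pos", "pos", "neg"]

LABEL_TO_INT = {"neg": 0, "pos": 1}  # assumed encoding

y = [LABEL_TO_INT[label] for label in labels]
print(y)  # [0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0]
```

An unknown label raises `KeyError`, which is usually preferable to silently mapping it to some default class.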

Convert labels to integers

SENTIMENT DATASET - FORMATTING

model.fit(X,Y)

X: [4,0,0,0,0,7,0,0,1], [0,0,0,0,9,0,0,0,2]

Now we can go all the way to model training and prediction

SENTIMENT DATASET – BUILD A MODEL

y: [0, 1, 0, 1, 1]

X: [4,0,0,0,0,7,0,0,1], [0,0,0,0,9,0,0,0,2]
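The slides show only the call `model.fit(X,Y)` and do not name the model. As a hedged illustration of what sits behind such a call, here is a tiny multinomial naive Bayes classifier in plain Python. The class `TinyNaiveBayes` and the toy count vectors are made up for this sketch and are not taken from the talk.

```python
import math

class TinyNaiveBayes:
    """Multinomial naive Bayes over count vectors, with add-one smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_features = len(X[0])
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # Total count of each feature within the class, plus one.
            counts = [sum(r[j] for r in rows) + 1 for j in range(n_features)]
            total = sum(counts)
            self.log_likelihood[c] = [math.log(k / total) for k in counts]
        return self

    def predict(self, X):
        predictions = []
        for x in X:
            scores = {
                c: self.log_prior[c]
                + sum(v * w for v, w in zip(x, self.log_likelihood[c]))
                for c in self.classes
            }
            predictions.append(max(scores, key=scores.get))
        return predictions

# Toy count vectors (made up): each column is a word count.
X = [[2, 0, 1], [3, 0, 0], [0, 2, 1], [0, 3, 0]]
y = [0, 0, 1, 1]
model = TinyNaiveBayes().fit(X, y)
print(model.predict([[1, 0, 0], [0, 1, 0]]))  # [0, 1]
```

The same `fit` and `predict` shape is what most modeling libraries expose, so a different classifier can be swapped in without changing the surrounding code.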

PERFORMANCE?

DON’T CHEAT!

PROPER MODEL VALIDATION

We need to hold out data we can test against; this is called your validation set
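A minimal sketch of holding out a validation set. The helper name `train_test_split`, the fixed seed, and the toy data are assumptions made for illustration.

```python
import random

def train_test_split(X, y, train_fraction, seed=0):
    """Shuffle row indices once, then cut them into train and validation parts."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * train_fraction)
    train, test = indices[:cut], indices[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

# Toy data: ten rows with alternating labels.
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, y_train, X_val, y_val = train_test_split(X, y, train_fraction=0.8)
print(len(X_train), len(X_val))  # 8 2
```

The model is fit on the training part only; the held-out rows are touched once, for scoring.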

SENTIMENT DATASET – VALIDATION

Train on 20%, test on 80%

SENTIMENT DATASET – VALIDATION

20% 80%

Best score yet

SENTIMENT DATASET – VALIDATION

60% 40%

Best score yet

SENTIMENT DATASET – VALIDATION

70% 30%

Best score yet

SENTIMENT DATASET – VALIDATION

80% 20%

Best score yet

SENTIMENT DATASET – VALIDATION

99% 1%

Perfect scores

SENTIMENT DATASET – VALIDATION

99.9999% 2

Predict every point, k-folding

Folds = 9: Fold = 1, Fold = 2, … Y_pred
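K-folding predicts every point exactly once, each time with a model that never saw that point. A minimal sketch follows; `kfold_predict` and the placeholder `majority_class` model are hypothetical names, and folds are assigned by interleaving row positions rather than by shuffling.

```python
def kfold_predict(X, y, n_folds, fit_predict):
    """Return one out-of-fold prediction per row.

    fit_predict(X_train, y_train, X_test) must return a list of predictions.
    """
    y_pred = [None] * len(X)
    for fold in range(n_folds):
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        predictions = fit_predict([X[i] for i in train_idx],
                                  [y[i] for i in train_idx],
                                  [X[i] for i in test_idx])
        for i, p in zip(test_idx, predictions):
            y_pred[i] = p
    return y_pred

def majority_class(X_train, y_train, X_test):
    """Placeholder model: always predict the most common training label."""
    most_common = max(set(y_train), key=y_train.count)
    return [most_common] * len(X_test)

# Toy data: nine rows, three folds.
X = [[i] for i in range(9)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = kfold_predict(X, y, n_folds=3, fit_predict=majority_class)
print(y_pred)  # [0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Scoring `y_pred` against `y` then uses every row as validation data, which is why more folds give a steadier estimate at the cost of more training runs.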

SENTIMENT DATASET – Validation

10 folds

SENTIMENT DATASET – Validation

100 folds

BIGRAM BOOST

acc: 0.8015, r: 0.2061, AUROC: 0.8738

acc: 0.7809, r: 0.1238, AUROC: 0.8554
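Two of the three scores reported here, accuracy and AUROC, can be computed in a few lines. This is a sketch on made-up labels and scores, not a reproduction of the numbers on the slide; the 0.5 threshold is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]                    # made-up model outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed 0.5 threshold

print(accuracy(y_true, y_pred))  # 0.75
print(auroc(y_true, scores))     # 0.75
```

Accuracy depends on the chosen threshold, while AUROC only looks at how the scores rank positives against negatives.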

Feature Creation

Model Selection

Feature Reduction

BETTER MODELS

acc: 0.8208, r: 0.2832, AUROC: 0.8939

acc: 0.8015, r: 0.2061, AUROC: 0.8739

Was:

Now: (10x average)

EMAIL CLASSIFICATION (multiclass)

EMAIL MULTICLASS DATASET (20 classes)

alt.atheism
comp.graphics
comp.os.ms-windows.misc
comp.sys.ibm.pc.hardware
comp.sys.mac.hardware
comp.windows.x
misc.forsale
rec.autos
rec.motorcycles
rec.sport.baseball
rec.sport.hockey
sci.crypt
sci.electronics
sci.med
sci.space
soc.religion.christian
talk.politics.guns
talk.politics.mideast
talk.politics.misc
talk.religion.misc

EMAIL MULTICLASS DATASET (20 classes)

From: lerxst@wam.umd.edu (where's my thing)
Subject: WHAT car is this!?
Nntp-Posting-Host: rac3.wam.umd.edu
Organization: University of Maryland, College Park
Lines: 15
MSG: I was wondering if anyone out there could enlighten me on this car I saw\nthe other day. It was a 2-door sports car, looked to be from the late 60s/\nearly 70s. It was called a Bricklin. The doors were really small. In addition,\nthe front bumper was separate from the rest of the body. This is \nall I know. If anyone can tellme a model name, engine specs, years\nof production, where this car is made, history, or whatever info you\nhave on this funky looking car, please e-mail.\n\nThanks,\n- IL\n ---- brought to you by your neighborhood Lerxst ----\n\n\n\n\n"

rec.autos

EMAIL MULTICLASS DATASET (20 classes)

From: guykuo@carson.u.washington.edu (Guy Kuo)
Subject: SI Clock Poll - Final Call
Summary: Final call for SI clock reports
Keywords: SI,acceleration,clock,upgrade
Article-I.D.: shelley.1qvfo9INNc3s
Organization: University of Washington
Lines: 11
NNTP-Posting-Host: carson.u.washington.edu
MSG: A fair number of brave souls who upgraded their SI clock oscillator have\nshared their experiences for this poll. Please send a brief message detailing\nyour experiences with the procedure. Top speed attained, CPU rated speed,\nadd on cards and adapters, heat sinks, hour of usage per day, floppy disk\nfunctionality with 800 and 1.4 m floppies are especially requested.\n\nI will be summarizing in the next two days, so please add to the network\nknowledge base if you have done the clock upgrade and haven't answered this\npoll. Thanks.\n\nGuy Kuo <guykuo@u.washington.edu>\n"

comp.sys.mac.hardware

EMAIL MULTICLASS DATASET (20 classes)

From: jgreen@amber (Joe Green)
Subject: Re: Weitek P9000?
Organization: Harris Computer Systems Division
Lines: 14
Distribution: world
NNTP-Posting-Host: amber.ssd.csd.harris.com
X-Newsreader: TIN [version 1.1 PL9]
MSG: Robert J.C. Kyanko (rob@rjck.UUCP) wrote:\n> abraxis@iastate.edu writes in article <abraxis.734340159@class1.iastate.edu>:\n> > Anyone know about the Weitek P9000 graphics chip?\n> As far as the low-level stuff goes, it looks pretty nice. It\'s got this\n> quadrilateral fill command that requires just the four points.\n\nDo you have Weitek\'s address/phone number? I\'d like to get some information\nabout this chip.\n\n--\nJoe Green\t\t\t\tHarris Corporation\njgreen@csd.harris.com\t\t\tComputer Systems Division\n"The only thing that really scares me is a person with no sense of humor."\n\t\t\t\t\t\t-- Jonathan Winters\n'

comp.graphics

RESUME MODELING (binary)

Upload Your Resume

Now painstakingly fill out this form containing all of the exact same information

Document modeling review

UNSTRUCTURED

STRUCTURED

MUNGED

Resume Extension

Resume format consolidation

GPA Inclusion (18%)

GPA Replacement

Mimicking the human recruiter

Feature Hunt

ONE FEATURE AT A TIME

INCREMENTAL GAINS

DEEP LEARNING

ENGINEERS AND MANUAL FEATURES ARE EXPENSIVE, USING DEEP LEARNING TO AUTOMATE

AUTOMATIC FEATURE GENERATION

Structured

I want  Want to  to go  work here  PERF.
1 1 1 1 1  95%
1 0 0 0 0  22%
0 0 0 0 0  57%
1 0 0 0 0  91%
0 0 0 0 0  11%

Unstructured

ESSAY

I want to work here
I have great teamwork
Synergy
I have so much grit
They fired that individual

ENGINEERS AND MANUAL FEATURES ARE EXPENSIVE, USING DEEP LEARNING TO AUTOMATE

AUTOMATIC FEATURE GENERATION

ESSAY

I want to work here
I have great teamwork
Synergy
I have so much grit
They fired that individual

ESSAY

3 2 1 4 5
3 7 67 345
54
3 7 99 10234
78 203 501 14

1 2 3 4 5
0 0 0 1 0
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0

LSTM

RAW TEXT  WORD SEQUENCE  ENCODING
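The raw text to word sequence step can be sketched as follows. This is an illustration under assumptions: ids are handed out in order of first appearance, so they differ from the ids on the slide, and left-padding to a fixed length is one common way to prepare sequences for an LSTM, not something the slide specifies. The helper names are hypothetical.

```python
def build_index(texts):
    """Give each new word the next integer id, starting at 1 (0 is kept for padding)."""
    index = {}
    for text in texts:
        for word in text.lower().split():
            index.setdefault(word, len(index) + 1)
    return index

def encode(text, index):
    """Turn a text into its sequence of word ids, preserving order."""
    return [index[w] for w in text.lower().split()]

def pad(sequence, length):
    """Left-pad with zeros (or truncate from the left) to a fixed length."""
    return ([0] * length + sequence)[-length:]

essays = [
    "I want to work here",
    "I have great teamwork",
    "Synergy",
    "I have so much grit",
    "They fired that individual",
]
index = build_index(essays)
sequences = [encode(e, index) for e in essays]
padded = [pad(s, 5) for s in sequences]
print(sequences[1])  # [1, 6, 7, 8]
print(padded[2])     # [0, 0, 0, 0, 9]
```

Unlike the bag-of-words columns, these sequences keep word order, which is what a sequence model consumes.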

AUTOMATIC FEATURE GENERATION

BEGIN SCRATCHING AT LAYOUT

AUTOMATIC FEATURE GENERATION (LAYOUT)

CNN: bit.ly/pacon

INTERVIEW MODELING

WOULD YOU EVER HIRE FROM JUST A RESUME?

INTERVIEW MODELING

SOFT/TECHNICAL COMPETENCIES

Resume can overstate and understate

Audio  Video  Text

QUESTIONS
