building a machine learning app with aws lambda

23
BUILDING A MACHINE LEARNING APPLICATION WITH AWS LAMBDA Ludi Rehak [email protected] Silicon Valley Big Data Science Meetup March 17, 2016 (+ help from Tom and Prithvi)

Upload: srisatish-ambati

Post on 08-Feb-2017

2.108 views

Category:

Software


0 download

TRANSCRIPT

Page 1: Building a Machine Learning App with AWS Lambda

BUILDINGAMACHINELEARNINGAPPLICATIONWITHAWSLAMBDA

Ludi [email protected]

SiliconValleyBigDataScienceMeetupMarch17,2016

(+helpfromTomandPrithvi)

Page 2: Building a Machine Learning App with AWS Lambda

BUILDINGA MACHINE LEARNINGAPPLICATIONWITHAWSLAMBDA

Q: WhatisAWSLambda?A: AWSLambda isacomputeservicethatrunscode–aLambdafunction- on-demand.Itsimplifiestheprocessofrunningcodeinthecloudbymanagingcomputeresourcesautomatically.

OffloadsDevOps tasksrelatedtoVMs:• Serverandoperatingsystemmaintenance• Capacityprovisioning• Scaling• Codemonitoringandlogging• Securitypatches

Page 3: Building a Machine Learning App with AWS Lambda

MAJORSTEPS

Step1:IdentifyproblemtosolveStep2: TrainmodelondataStep3: ExportthemodelasaPOJOStep4:WritecodeforLambdahandlerStep5: Builddeploymentpackage(.zipfile)and

uploadtoLambdaStep6: MapAPIendpointtoLambdafunctionStep7:Embedendpointinapplication

Page 4: Building a Machine Learning App with AWS Lambda

ACONCRETE USECASE: DOMAINNAMECLASS IFICATION

Maliciousdomains• Carryoutmaliciousactivity- botnets,phishing,malwarehosting,etc

• Namesaregeneratedbyalgorithmstodefeatsecuritysystems

Goal:Classifydomainsaslegitimatevs.malicious

Legitimate Malicioush2o zyxgifnjobqhzptuodmzov

zen-cart c3p4j7zdxexg1f2tuzk117wyzn

fedoraforum batdtrbtrikw

Page 5: Building a Machine Learning App with AWS Lambda

FEATURES

• Stringlength• ShannonEntropy

oMeasureofuncertaintyinarandomvariable

• NumberofsubstringsthatareEnglishwords• Proportionofvowels

Page 6: Building a Machine Learning App with AWS Lambda

DATA

• Domainsandwhethertheyaremaliciouso http://datadrivensecurity.info/blog/data/2014/10/legit-dga_domains.csv.zip

o 133,927 rows• Englishwords

o https://raw.githubusercontent.com/dwyl/english-words/master/words.txt

o 354,985rows

Page 7: Building a Machine Learning App with AWS Lambda

MODELINFORMATION

MaliciousDomainModel

Algorithm: GLMModelfamily: BinomialRegularization: RidgeThreshold(maxF1): 0.4935

Class 0 1 Error

0 15889 315 FPR0.0194

1 346 10043 FNR0.0333

Confusion matrix on validation data

Actual

Predicted

Page 8: Building a Machine Learning App with AWS Lambda

WORKFLOWFORTHISAPP

Inputdomainname

GetPredictions

MaliciousDomain?

Visitwebpage

Malicious Legitimate

Yes No

Page 9: Building a Machine Learning App with AWS Lambda

APPARCHITECTUREDIAGRAM

RESTendpoint

JavaScriptApp

Lambda

JythonFeatureMunging

LambdaFunctionHandler

H2OModelPOJO

Prediction

HTTPS POST

domain name

JSONwith

prediction

Page 10: Building a Machine Learning App with AWS Lambda

LAMBDAFUNCTIONHANDLER

publicstaticResponseClass myHandler(RequestClassrequest,Contextcontext)throwsPyException {

PyModule module=newPyModule();

//Predictioncodeisinpymodule.pydouble[]predictions=module.predict(request.domain);returnnewResponseClass(predictions);}

RESTendpoint

JythonFeatureMunging

LambdaFunctionHandler

H2OModelPOJO

Prediction

Page 11: Building a Machine Learning App with AWS Lambda

JYTHONFEATUREMUNGING

def predict(domain):domain=domain.split('.')[0]row=RowData()functions=[len,entropy,p_vowels,num_valid_substrings]eval_features =[f(domain)forfinfunctions]names=NamesHolder_MaliciousDomainModel().VALUESbeta=MaliciousDomainModel().BETA().VALUESfeature_coef_product =[beta[len(beta)- 1]]fori inrange(len(names)):row.put(names[i],float(eval_features[i]))feature_coef_product.append(eval_features[i]*beta[i])

#predictionmodel=EasyPredictModelWrapper(MaliciousDomainModel())p=model.predictBinomial(row)

RESTendpoint

JythonFeatureMunging

LambdaFunctionHandler

H2OModelPOJO

Prediction

Page 12: Building a Machine Learning App with AWS Lambda

H2OMODEL POJO

• staticfinalclassBETA_0implementsjava.io.Serializable {staticfinalvoidfill(double[]sa){sa[0]=1.49207826021648;sa[1]=2.8502716978560194;sa[2]=-8.839804567200542;sa[3]=-0.7977065034624655;sa[4]=-14.94132841574946;}}

RESTendpoint

JythonFeatureMunging

LambdaFunctionHandler

H2OModelPOJO

Prediction

Page 13: Building a Machine Learning App with AWS Lambda

HANDS-ONDEMONSTRATION

STEP1:Build$git clonehttps://github.com/h2oai/app-malicious-domains$cdapp-consumer-loan$gradle wrapper$./gradlew build

STEP2:CreateLambdafunctionandsetAPIendpointSeeinstructionsandscreenshotsinREADME.md

STEP3:Usetheappinawebbrowser$./gradlew jettyRunWar –xgenerateModelhttp://localhost:8080

Page 14: Building a Machine Learning App with AWS Lambda

TROUBLESHOOTING

• CommonPy errorso AnotherH2Oisalreadyrunning

• Py scriptcan’tfindthedatainh2o.import_file()• CommonJavaerrors

o Javanotinstalledatall• Also,mustinstallaJDK(JavaDevelopmentKit)sothattheJavacompileris

available(JREisnotsufficient)o Notconnectedtotheinternet

• Gradle needstofetchsomedependenciesfromtheinternet• CommonLambdaerrors

o Errorinuploading.zipfile• Checkifthefunctionalreadyexistsand,ifnot,tryagain.Forslowerinternet

connections,tryuploading.zipfilewithS3link.o TimeouterrorwhentestingLambdafunction

• GotoadvancedsettingsandincreaseTimeoutvalueo GatewayTimeout(504error)

• ThisisLambda’scoldstartbehavior.Keeptrying,eventuallyLambdakicksin

Page 15: Building a Machine Learning App with AWS Lambda

CAVEATS

• Statelesso Canaccessstateful databycallingotherwebservices,suchasAmazonS3orAmazonDynamoDB.

• Coldstartbehavioro containersareinstantiatedandreusedafterthefirstrequestandstayactiveforawindowoftime(10-20minutes)

o “thelongerIleaveitbetweeninvocations,thelongerthefunctiontakestowarmup”

• APIGatewaytimeoutof10secso Canrequestlongertimeout

Page 16: Building a Machine Learning App with AWS Lambda

CONFIGURINGLAMBDAFUNCTIONS

• Memoryo AllocatesproportionalCPUpower,networkbandwidth,anddiskI/O

o Easysingle-dialsolutiono Logshowshowmuchmemorywasusedfortuningandcostsavings

• Timeout

Page 17: Building a Machine Learning App with AWS Lambda

LAMBDARESOURCEL IMITS

Resource DefaultLimit

Memory 512MB

Numberof filedescriptors 1,024

Numberofprocessesandthreads(combined total)

1,024

Maximumexecutiondurationperrequest 300seconds

Invoke requestbodypayloadsize 6MB

Invoke responsebodypayloadsize 6MB

Concurrentexecutionsperregion 100

Item DefaultLimit

Lambdafunction deploymentpackagesize(.zip/.jarfile)

50MB

Sizeofcode/dependencies thatyoucanzipintoadeploymentpackage(uncompressed zip/jarsize)

250MB

Page 18: Building a Machine Learning App with AWS Lambda

LAMBDAPRICING

• Lambdao Requests

• First1millionpermontharefree• $0.20per1millionrequeststhereafter

o Duration• First400,000GB-secondsofcomputetimepermontharefree• $0.00001667foreveryGB-second thereafter

• APIGatewayo $3.50permillionAPIcallsreceivedplusdatatransfercosts

• EstimateforMaliciousDomainApplication:• Lambda:$0.37/hourwith10threadsafterfree-tier• APIGateway:$0.71/hour• Total:~$1/hr

Page 19: Building a Machine Learning App with AWS Lambda

LAMBDAPERFORMANCE

Memory(MB) Threads Loops Samples Median

(ms)Min(ms)

Max(ms)

%Error

Throughput(calls/sec)

512 1 10000 10000 102 85 2137 0 8.4

512 10 1000 10000 102 85 30330 0.18 44

512 100 100 10000 149 85 30307 0.43 168

Page 20: Building a Machine Learning App with AWS Lambda

LAMBDASCALING

• Automaticallyscalestosupporttherateofincomingrequests

• “Nolimittothenumberofrequestsyourcodecanhandle”

• StartsasmanyinstancesofLambdafunctionasneeded

Page 21: Building a Machine Learning App with AWS Lambda

RELATEDEXAMPLES

• H2OGeneratedModelPOJOinaJavaServletcontainero Github:h2oai/app-consumer-loan

• H2OGeneratedModelPOJOinaStormbolto GitHub:h2oai/h2o-world-2015-trainingo tutorials/streaming/storm

• H2OGeneratedModelPOJOinSparkStreamingo GitHub:h2oai/sparkling-watero examples/src/main/scala/org/apache/spark/examples/h2o/CraigslistJobTitlesStreamingApp.scala

Page 22: Building a Machine Learning App with AWS Lambda

RESOURCESONTHEWEB

• Slideso GitHub h2oai/h2o-tutorials/tree/master/tutorials/aws-lambda-app

• Sourcecodeo GitHub h2oai/app-malicious-domains

• LateststableH2OforPythonreleaseo http://h2o.ai/download/h2o/python

• GeneratedPOJOmodelJavadoco http://h2o-release.s3.amazonaws.com/h2o/rel-turan/3/docs-

website/h2o-genmodel/javadoc/index.html

• AWSLambdao http://docs.aws.amazon.com/lambda/latest/dg/welcome.html

Page 23: Building a Machine Learning App with AWS Lambda

Q&A

• Thanksforattending!

• Sendfollowupquestionsto:

Ludi [email protected]