Copyright © 2016 Splunk Inc.
Dr. Adam Oliner, Director of Engineering, Data Science, Splunk
Using the Splunk Machine Learning Toolkit to Create Your Own Custom Models
Manish Sainani, Principal Product Manager, Splunk
Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
Who are we?
Dr. Adam Oliner
– Director of Engineering, Data Science & Machine Learning
– Splunker for 2 years
– Embarrassingly overeducated
Manish Sainani
– Principal Product Manager, Machine Learning
– Splunker for 2 years
– First ML hire at Splunk!
What are we doing here?
Overview of Machine Learning
The Assistants: Guided Machine Learning
– Prepare
– Fit
– Validate
– Deploy
Examples
– DIY Anomaly Detector
– Customer Applications
Overview of ML at Splunk
Core Platform Search
Packaged Premium Solutions
Custom ML
Platform for Operational Intelligence
Splunk Machine Learning Toolkit
Assistants: Guide model building, testing, & deploying for common objectives
Showcases: Interactive examples for typical IT, security, business, IoT use cases
Algorithms: 25+ standard algorithms available prepackaged with the toolkit
SPL ML Commands: New commands to fit, test and operationalize models
Python for Scientific Computing Library: 300+ open source algorithms available for use
Build custom analytics for any use case
Extends Splunk platform functions and provides a guided modeling environment
What’s New since our 0.9 Beta Release (last year’s .conf)?
• New name and abbreviation ;-)
• No event limits (removal of 50K limit on fitting models)
• Configurable resource caps via mlspl.conf
• Search head clustering support
• Distributed/streaming apply
• Scheduled fit
• New algorithms (next slide)
– Feature engineering and selection
– Stochastic gradient descent (e.g.)
– ARIMA
• Multi-algorithm support across Assistants
• Scatterplot matrix viz
• Alerting
• Tooltips
• In-app tours
• Cluster Numeric Events assistant
• Videos, videos, videos for each assistant across IT, Security, IoT and Business Analytics
• ML-SPL Cheat Sheet
Machine Learning
A process for generalizing from examples
Examples
– A, B, … → # (regression)
– A, B, … → a (classification)
– X_past → X_future (forecasting)
– like with like (clustering)
– |X_predicted - X_actual| >> 0 (anomaly detection)
Machine Learning Process
Collect Data
Explore/Visualize
Model
Evaluate
Clean/Transform
Publish/Deploy
Machine Learning Process with Splunk
Collect Data
Explore/Visualize
Model
Evaluate
Clean/Transform
Publish/Deploy
props.conf, transforms.conf, Data models
Add-ons from Splunkbase, etc.
Pivot, Table UI, SPL
ML Toolkit
Alerts, Dashboards, Reports
Domain Expertise (IT, Security, …)
Data Science Expertise
Splunk Expertise
Custom Machine Learning – Success Formula
Identify use cases
Drive decisions
Set business/ops priorities
SPL
Data prep
Statistics/math background
Algorithm selection
Model building
Splunk ML Toolkit facilitates and simplifies via examples & guidance
Operational success
Guided ML with the Assistants
Guides you through various analytics
– Prepare, fit, validate, and deploy
Automatically generates all the relevant SPL
The Assistants
1. Predict Numeric Fields
2. Predict Categorical Fields
3. Detect Numeric Outliers
4. Detect Categorical Outliers
5. Forecast Time Series
6. Cluster Numeric Events
Predict Numeric Fields
Algorithms
– LinearRegression
  • … including Lasso, Ridge, and ElasticNet
– KernelRidge
– DecisionTreeRegressor
– RandomForestRegressor
– SGDRegressor
Validation
– Four visualizations of prediction error
– R² and RMSE
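The two validation statistics named on this slide can be computed by hand. The sketch below is plain Python rather than anything taken from the toolkit; the function names and the sample values are assumptions made for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: the typical size of a prediction error."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted values, for illustration only.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.0]

print(f"RMSE = {rmse(actual, predicted):.4f}")
print(f"R^2  = {r_squared(actual, predicted):.4f}")
```

RMSE is in the same units as the predicted field, while R² is unitless, so the two are usually read together.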
Predict Categorical Fields
Algorithms
– LogisticRegression
– DecisionTreeClassifier
– RandomForestClassifier
– SGDClassifier
– SVM
– Naïve Bayes
  • BernoulliNB and GaussianNB
Validation
– Precision, recall, accuracy, F1
– Confusion matrix
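As a rough illustration of how these validation numbers relate to each other, the sketch below derives all four from the same counts that make up a confusion matrix. It is plain Python, not toolkit code; the labels "fail" and "ok" and the function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))


def binary_metrics(actual, predicted, positive):
    """Precision, recall, accuracy and F1 for one class treated as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Hypothetical labels, for illustration only.
actual = ["fail", "ok", "fail", "ok", "ok", "fail"]
predicted = ["fail", "ok", "ok", "ok", "fail", "fail"]

matrix = confusion_matrix(actual, predicted)
metrics = binary_metrics(actual, predicted, positive="fail")
print(dict(matrix))
print(metrics)
```

Accuracy alone can look good when one class dominates, which is why precision and recall are reported alongside it.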
Detect Numeric Outliers
Methods
– Standard deviation
– Median absolute deviation
– Interquartile range
Validation:
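The three methods listed for this Assistant can each be sketched in a few lines of standard-library Python. This is not the SPL the Assistant generates; the function names, the threshold multipliers, and the sample data are all assumptions chosen for illustration.

```python
import statistics


def stdev_outliers(values, k=2.0):
    """Flag values more than k standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > k * sd]


def mad_outliers(values, k=3.0):
    """Flag values more than k median absolute deviations from the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [v for v in values if abs(v - med) > k * mad]


def iqr_outliers(values, k=1.5):
    """Flag values more than k interquartile ranges outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


# Hypothetical measurements with one obvious spike.
values = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]

print(stdev_outliers(values))
print(mad_outliers(values))
print(iqr_outliers(values))
```

The mean and standard deviation are themselves pulled toward a large spike, so the standard deviation method can miss outliers that the median-based methods catch; that is one reason to offer all three.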
Cluster Numeric Events
Algorithms
– KMeans
– DBSCAN
– Birch
– SpectralClustering
Validation
– Scatterplot Matrix viz
Splunk!
Leading platform for collecting, cleaning, and transforming data
Interactive Field Extractor
Data models
Hundreds of add-ons from Splunkbase
transforms.conf
props.conf
etc.
Feature Engineering
TFIDF (term-frequency x inverse document-frequency)
– Transform free-form text into numeric attributes
StandardScaler (i.e. normalization)
FieldSelector (i.e. choose k best features for regression/classification)
PCA and KernelPCA
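To make the TFIDF bullet concrete, here is a minimal sketch of turning free-form text into numeric attributes. It uses the textbook form tf × log(N / df) in plain Python; the toolkit's own TFIDF implementation may tokenize, smooth, or normalize differently, and the sample messages are hypothetical.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return (vocabulary, rows): one numeric row per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vocabulary = sorted(doc_freq)
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on empty text
        rows.append([
            (counts[term] / total) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, rows


# Hypothetical log messages, for illustration only.
messages = ["error disk full", "error login failed", "login ok"]

vocabulary, rows = tfidf(messages)
print(vocabulary)
for row in rows:
    print([round(weight, 3) for weight in row])
```

Terms that appear in few messages get larger weights than terms that appear in most of them, which is what makes the resulting columns useful as model features.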
Fit: What’s New
No event limits
Configurable resource caps (mlspl.conf)
Search head clustering support
Scheduled fit
New algorithms
Validate/Apply: What’s New
Configurable resource caps
Search head clustering support
Distributed/streaming apply
Scatterplot matrix viz
Let’s Build an Anomaly Detector!
We’ll use two Assistants
– Predict Numeric Fields
– Detect Numeric Outliers
Show automatically-generated intermediate SPL
You Built an Anomaly Detector!
You built a predictive model of AC Power
When the prediction error from this model is an outlier compared to past errors, you generate an alert
This predictive model automatically retrains itself on a schedule you control
You didn’t have to type any SPL
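The logic of this detector can be sketched outside Splunk in a few lines of Python. This is not the SPL that the Assistants generate. It assumes a single hypothetical predictor (outside temperature), a least-squares line as the predictive model, training residuals standing in for "past errors", and a three-standard-deviation rule as the outlier test; scheduled retraining is not shown.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def is_outlier(error, past_errors, k=3.0):
    """True when error lies more than k standard deviations from past errors."""
    mean = statistics.fmean(past_errors)
    sd = statistics.pstdev(past_errors)
    return abs(error - mean) > k * sd


# Hypothetical training data: temperature readings and AC power draw.
temps = [20, 22, 24, 26, 28, 30]
ac_power = [2.0, 2.5, 2.9, 3.6, 4.0, 4.5]

# Step 1 (Predict Numeric Fields): fit the predictive model.
slope, intercept = fit_line(temps, ac_power)

# Step 2: record how wrong the model has been in the past.
past_errors = [p - (slope * t + intercept) for t, p in zip(temps, ac_power)]


def should_alert(temp, observed_power):
    """Step 3 (Detect Numeric Outliers): compare the new error to past errors."""
    error = observed_power - (slope * temp + intercept)
    return is_outlier(error, past_errors)


print(should_alert(25, 3.3))  # reading close to what the model expects
print(should_alert(25, 6.0))  # reading far above what the model expects
```

Alerting on the prediction error rather than on the raw value means a high reading on a hot day is not flagged, while the same reading on a cool day is.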
Machine Learning Customer Success
Network Optimization
Detect & Prevent Equipment Failure
Security / Fraud Prevention
Prioritize Website Issues and Predict Root Cause
Predict Gaming Outages
Fraud Prevention
Machine Learning Consulting Services
Analytics App built on ML Toolkit
Optimizing operations and business results
Prevent Cell Tower Failure
Optimize Repair Operations
Entertainment Company
Machine Learning Toolkit Customer Use Cases
Speeding website problem resolution by automatically ranking actions for support engineers
Reducing customer service disruption with early identification of difficult-to-detect network incidents
Minimizing cell tower degradation and downtime with improved issue detection sensitivity
Improving uptime and lowering costs by predicting/preventing cell tower failures and optimizing repair truck rolls
Predicting and averting potential gaming outage conditions with finer-grained detection
Ensuring mobile device security by detecting anomalies in ID authentication
Preventing fraud by identifying malicious accounts and suspicious activities
Entertainment Company
Detect Network Outliers
Reduced downtime + increased service availability = better customer satisfaction
ML Use Case
• Monitor noise rise for 20,000+ cell towers to increase service and device availability, reduce MTTR
Technical overview
• A customized solution deployed in production based on outlier detection.
• Leverage previous month data and voting algorithms
“The ability to model complex systems and alert on deviations is where IT and security operations are headed … Splunk Machine Learning has given us a head start...”
Reliable website updates
Proactive website monitoring leads to reduced downtime
“Splunk ML helps us rapidly improve end-user experience by ranking issue severity which helps us determine root causes faster thus reducing MTTR and improving SLA”
ML Use Case
• Very frequent code and config updates (1000+ daily) can cause site issues
• Find errors in server pools, then prioritize actions and predict root cause
Technical overview
• Custom outlier detection built using ML Toolkit Outlier assistant
• Built by Splunk Architect with no Data Science background
What Now?
Get the Machine Learning Toolkit from Splunkbase
Go watch Machine Learning Videos on Splunk YouTube Channel: http://tiny.cc/splunkmlvideos
Go to Machine Learning talks:
– Advanced Machine Learning in SPL with the Machine Learning Toolkit by Jacob Leverich
– Extending SPL with Custom Search Commands and the Splunk SDK for Python by Jacob Leverich
Several Customers and Partner Talks
– Cisco, Scianta Analytics, Asian Telco, etc.
Early Adopter and Customer Advisory Program: [email protected]:[email protected]:[email protected]
http://tiny.cc/splunkmlapp