di&a slides: descriptive, prescriptive, and predictive analytics
Post on 05-Apr-2017
The First Step in Information Management
www.firstsanfranciscopartners.com
Produced by:
MONTHLY SERIES
Brought to you in partnership with:
March 2, 2017
Descriptive, Prescriptive and Predictive Analytics
Polling Questions
§ What type of statistical analyses do you use or plan to use (can choose multiple answers)?
− Descriptive
− Predictive
− Prescriptive
− I don’t use any of these
− I don’t know the difference between these
pg 2© 2017 First San Francisco Partners www.firstsanfranciscopartners.com
§ How frequently do you use statistical analyses in your work?
− I don’t currently do any type of statistical analysis
− Less than once a week
− Once or a few times a week
− At least once a day
Topics for Today’s Webinar
§ Overview of statistical analysis process
− Forming a hypothesis
− Identifying appropriate sources
− Proving/Disproving the hypothesis
§ Types of data analysis
− Descriptive data analytics
− Predictive data analytics
− Prescriptive data analytics
§ How these types compare within the analytic environment
§ Key takeaways and suggested resources
[Figure: Descriptive, Predictive and Prescriptive analytics. Combine?]
The Process of Statistical Analysis
Form Hypotheses
• Null: Nothing special
• Alternative: Something unique, an actionable finding, etc.
Identify Data Source
• Don’t go overboard!
• Collect your own, OR
• Use secondary data
Prove/Disprove Hypothesis
• Is Type I or Type II error worse?
• Choose confidence level
• Reject/not reject null
When we have resource constraints, Statistical Analysis enables us to make quantitative inferences based on an amount of information we can analyze (a sample).
Step 1: Forming a Hypothesis
§ In statistical analysis, we have two hypotheses:
− Null hypothesis: Claims that any irregularities in the sample are due to chance
− Alternative hypothesis: Claims that irregularities in the sample are due to non-random causes (and would therefore reflect the population)
§ What are you really looking to discover/prove?
− Experiment 1:
§ Null: There is no difference in the amount sold when comparing salespeople who did and did not receive training.
§ Alternative: There is a difference in the amount sold when comparing salespeople who did and did not receive training.
− Experiment 2:
§ Null: The salespeople who received training do not sell more on average than the salespeople who did not receive training.
§ Alternative: Salespeople who received the training sell more on average than those who did not receive the training.
Step 2: Identifying Appropriate Sources
§ Remember, you don’t need Big Data for every decision!
§ Sometimes, knowing what data you don’t need is just as important as knowing what you do need. Keep your end decision in mind.
§ Potential sources of data:
− Primary data − collect new data
§ Who to include: Random sample, stratified random sample, etc.
§ How many to include: Sample size calculators online (free)
§ Determine the level of measurement needed for your desired analysis: categorical, ordinal, interval, ratio
§ As necessary, design a control group
− Secondary data − utilize existing data
§ Census records, syndicated data, government data, etc.
§ Consider your data needs, data cleanliness, cost, etc., when determining appropriate sources.
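As a rough idea of what an online sample size calculator does for the "how many to include" question, the sketch below assumes the common formula for estimating a proportion, n = z²·p(1−p)/e², with an optional finite population correction. The function name, defaults and the population of 2,000 are illustrative assumptions, not values from the webinar.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(confidence=0.95, margin_of_error=0.05,
                               expected_proportion=0.5, population=None):
    """Approximate sample size needed to estimate a proportion."""
    # z-score for a two-sided interval at the chosen confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = expected_proportion  # 0.5 is the most conservative assumption
    n = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    if population is not None:
        # Finite population correction: small populations need fewer responses
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)

n_large = sample_size_for_proportion()                  # very large population
n_small = sample_size_for_proportion(population=2000)   # hypothetical population of 2,000
print(n_large, n_small)
```

Tightening the margin of error or raising the confidence level increases the required sample, which is one reason to keep the end decision in mind before collecting data.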
Step 3: Proving/Disproving the Hypothesis
§ Establish a confidence level prior to analysis.
§ Confidence levels:
1. Determine how significant a difference/irregularity must be for you to prove/disprove your alternative hypothesis.
2. Determine how confident you can be in your decision.
§ Even with a high confidence level, you aren’t always right:
− Type I error: You reject the null hypothesis but shouldn’t have.
− Type II error: You do not reject the null hypothesis but should have.
− How to decrease the likelihood of these errors: change the confidence level, increase sample size (be aware of effect size), etc.
§ Determine which type of error is more detrimental to your investigation and set up your study accordingly.
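The link between the confidence level and Type I errors can be sketched with a small simulation. This is a minimal example under stated assumptions: both groups are drawn from the same normal population (so the null is true by construction), the population standard deviation is treated as known so a z-test can be used, and all numbers are hypothetical.

```python
import random
from statistics import NormalDist, mean

def type_i_error_rate(alpha=0.05, n=30, trials=5000, seed=1):
    """Fraction of simulated experiments that wrongly reject a true null."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cutoff
    sigma = 10                                    # assumed known population std. dev.
    se = (sigma ** 2 / n + sigma ** 2 / n) ** 0.5
    rejections = 0
    for _ in range(trials):
        # Both groups share one population, so any difference is due to chance
        group_a = [rng.gauss(100, sigma) for _ in range(n)]
        group_b = [rng.gauss(100, sigma) for _ in range(n)]
        z = (mean(group_a) - mean(group_b)) / se
        if abs(z) > z_crit:
            rejections += 1  # a Type I error
    return rejections / trials

rate_95 = type_i_error_rate(alpha=0.05)  # 95% confidence level
rate_99 = type_i_error_rate(alpha=0.01)  # 99% confidence level
print(rate_95, rate_99)
```

The false-rejection rate should land near alpha, so raising the confidence level lowers the Type I rate; the trade-off is a higher chance of a Type II error unless the sample size grows.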
Step 3: Proving/Disproving the Hypothesis

Group statistics (QPctQ3)
Training      N    Mean       Std. Deviation   Std. Error Mean
No training   74   102.643    9.95482          1.15722
Training      74   106.3889   9.83445          1.14323

Independent samples test (QPctQ3)
Statistic                                       Row 1       Row 2
Levene's Test for Equality of Variances: F      0.029
Levene's Test for Equality of Variances: Sig.   0.865
t-test for Equality of Means: t                 -2.303      -2.303
t-test for Equality of Means: df                146         145.978
Sig. (2-tailed)                                 0.023       0.023
Mean Difference                                 -3.74595    -3.74595
Std. Error Difference                           1.6267      1.6267
95% Confidence Interval of the Difference, Lower   -6.96086    -6.96087
95% Confidence Interval of the Difference, Upper   -0.53103    -0.53102

§ Confidence level = 95%
§ Alpha = 0.05
[Figure: Percent of 3rd Quarter Quota Sold by Trained vs. Untrained Salespeople (bar chart: No training, Training; axis 100 to 108)]
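The test output above can be approximately reproduced from the group statistics alone. The sketch below is a minimal illustration, not the tool that produced the slide: the function name is made up, and the critical value 1.976 is an assumed approximation of the two-tailed 5% t cutoff for about 146 degrees of freedom. With equal group sizes, the pooled and unpooled standard errors coincide.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """t statistic and confidence interval for a difference in means."""
    diff = mean1 - mean2
    # Standard error of the difference between two independent means
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = diff / se
    ci = (diff - t_crit * se, diff + t_crit * se)
    return diff, se, t, ci

# Summary statistics from the slide: no training vs. training
diff, se, t, ci = two_sample_t(102.643, 9.95482, 74,
                               106.3889, 9.83445, 74,
                               t_crit=1.976)  # assumed critical value

# Reject the null when the 95% interval for the difference excludes zero
reject_null = not (ci[0] <= 0 <= ci[1])
print(round(diff, 4), round(se, 4), round(t, 3), reject_null)
```

Because the whole interval sits below zero, the null of "no difference" is rejected at alpha = 0.05, which matches the reported two-tailed significance of 0.023.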
Types of Data Analysis
Descriptive
• Aims to help uncover valuable insight from the data being analyzed
• Answers the question “What happened?”
Predictive
• Helps forecast behavior of people and markets
• Answers the question “What could happen?”
Prescriptive
• Suggests conclusions or actions that may be taken based on the analysis
• Answers the question “What should be done?”
Descriptive Data Analytics
§ Though the simplest type, it is used most often.
§ Two types of descriptive analysis:
1. Measures of central tendency (tells us about the middle)
§ Mean − the average
§ Median − the midpoint of the responses
§ Mode − the response with the highest frequency
2. Measures of dispersion
§ Range − the min, the max and the distance between the two
§ Variance − the average of the squared differences between each point and the mean
§ Standard Deviation − the most common/standard way of expressing the spread of data (the square root of the variance)

Customer_ID   Items Purchased   Amount Spent
29304         1                 $1.09
28308         3                 $44.43
19962         21                $218.58
30281         1                 $73.02

[Figure: Mean, Median and Mode Amounts of Items Purchased (bar chart: Mean 6.5, Median 2, Mode 1)]
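The measures listed for descriptive analysis can be computed for the Items Purchased column of the four-customer table with Python's standard `statistics` module. This is a minimal sketch; it assumes the sample (n − 1) form of variance and standard deviation, which the slide does not specify.

```python
import statistics

# Items Purchased column from the four-customer table
items_purchased = [1, 3, 21, 1]

# Measures of central tendency
central = {
    "mean": statistics.mean(items_purchased),
    "median": statistics.median(items_purchased),
    "mode": statistics.mode(items_purchased),
}

# Measures of dispersion (sample variance and standard deviation assumed)
lowest, highest = min(items_purchased), max(items_purchased)
dispersion = {
    "range": (lowest, highest, highest - lowest),
    "variance": statistics.variance(items_purchased),
    "std_dev": statistics.stdev(items_purchased),
}

print(central)
print(dispersion)
```

The single customer who bought 21 items pulls the mean well above the median and mode, which is why it helps to look at more than one measure before choosing a model.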
Predictive Data Analytics
§ Some mistakenly assume predictive analysis is relevant only to predicting future events.
− However, in cases such as sentiment analysis, existing data (e.g., the text of a tweet) is used to predict non-existent data (whether the tweet is positive or negative).
§ Several of the models that can be used for predictive analysis are:
− Forecasting
− Simulation
− Regression
− Classification
− Clustering
Forecasting
§ Forecasting:
− Moving average technique: use the mean of prior periods to predict the next
§ The mean of periods 1−4 = period 5
§ The mean of periods 2−5 = period 6
− Exponential smoothing technique: similar, but more recent data points are weighted more heavily due to relevance
− Regression techniques
§ Use caution in forecasting: the longer the forecast horizon, the less accurate the projections.
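The two averaging techniques can be sketched in a few lines. This is a minimal example with hypothetical sales figures; the function names, the four-period window and the smoothing factor of 0.5 are assumptions for illustration. The period 6 forecast here reuses the forecast for period 5 as a stand-in for the actual value.

```python
def moving_average_forecast(history, window=4):
    """Forecast the next period as the mean of the last `window` periods."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def exponential_smoothing_forecast(history, alpha=0.5):
    """Simple exponential smoothing; a larger alpha weights recent periods more."""
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

sales = [100, 110, 120, 130]  # hypothetical values for periods 1-4

period_5 = moving_average_forecast(sales)               # mean of periods 1-4
period_6 = moving_average_forecast(sales + [period_5])  # mean of periods 2-5
smoothed = exponential_smoothing_forecast(sales, alpha=0.5)

print(period_5, period_6, smoothed)
```

On a steadily rising series the moving average lags behind the trend, while exponential smoothing tracks the latest value more closely; each forecast built on an earlier forecast also carries that forecast's error forward.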
[Figure: Net Income of Store C, Projected 2017-2020 (line chart: years 2006 to 2022, $0 to $25,000)]
Simulation
§ Simulation
− Queuing models: used to predict wait time and queue length
§ Results can be used to create staff schedules in a way that reduces inefficiencies, etc.
− Discrete event model: used in special situations when queuing cannot be used
§ Results can be used to identify bottlenecks, etc.
− Monte Carlo simulations: used to identify probable outcomes of a scenario based on many possible outcomes (uses random number generation and many iterations of the scenario).
§ Results can be used to predict the likelihood of profitability within the first two years, etc.
Queuing Model Example
[Figure: queuing model comparison, Scenario 1 vs. Scenario 2]
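As a rough illustration of what a queuing model computes, the sketch below assumes the textbook M/M/c model (random arrivals, exponential service times, c identical servers) and the Erlang C formula. The arrival rate, service rate and the two staffing scenarios are hypothetical and are not the values behind the slide's figure.

```python
import math

def mmc_queue(arrival_rate, service_rate, servers):
    """Steady-state metrics for an M/M/c queue using the Erlang C formula."""
    offered_load = arrival_rate / service_rate  # average number of busy servers
    utilization = offered_load / servers
    if utilization >= 1:
        raise ValueError("Unstable queue: arrivals outpace service capacity")
    # Erlang C: probability that an arriving customer has to wait
    top = offered_load ** servers / math.factorial(servers) / (1 - utilization)
    bottom = sum(offered_load ** k / math.factorial(k) for k in range(servers)) + top
    p_wait = top / bottom
    queue_length = p_wait * utilization / (1 - utilization)  # average number waiting
    wait_time = queue_length / arrival_rate                  # Little's law
    return {"utilization": utilization, "p_wait": p_wait,
            "queue_length": queue_length, "wait_time": wait_time}

# Hypothetical: 2 customers arrive per minute, each clerk serves 3 per minute
scenario_1 = mmc_queue(arrival_rate=2, service_rate=3, servers=1)
scenario_2 = mmc_queue(arrival_rate=2, service_rate=3, servers=2)

print(scenario_1["wait_time"], scenario_2["wait_time"])
```

Comparing the two dictionaries shows how much average wait time and queue length fall when a second clerk is added, which is the kind of result that can feed a staffing schedule.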
Monte Carlo Simulation Example
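A Monte Carlo estimate of the likelihood of profitability within the first two years might look like the sketch below. Every input is a hypothetical assumption (start-up cost, normally distributed monthly revenue and costs), and "profitable" is taken to mean a positive cumulative balance at the end of month 24.

```python
import random

def probability_profitable(trials=10_000, months=24, seed=42):
    """Share of simulated two-year runs that end with a positive balance."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    successes = 0
    for _ in range(trials):
        balance = -50_000  # hypothetical start-up cost
        for _ in range(months):
            revenue = rng.gauss(12_000, 4_000)  # hypothetical monthly revenue
            costs = rng.gauss(9_500, 1_500)     # hypothetical monthly costs
            balance += revenue - costs
        if balance > 0:
            successes += 1
    return successes / trials

p_profitable = probability_profitable()
print(f"Estimated probability of profitability: {p_profitable:.1%}")
```

Each iteration is one possible version of the two years; the share of iterations that end in profit is the estimate, and more iterations tighten it.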
Regression
§ Regression − generally speaking, used to understand the relationship between independent and dependent variables
§ Types of regression models:
− Logistic: used for categorical dependent variables (e.g., will customers shop at your store or a competitor?)
− Linear: used to identify a linear relationship between the dependent variable and at least one independent variable (e.g., daily store revenue predicted by the number of customers entering the store)
− Step-wise: used to identify a relationship between dependent/independent variables. This is done by adding/removing variables based on how those variables impact the overall strength of the model.
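The linear case (daily store revenue predicted by the number of customers entering the store) can be sketched with ordinary least squares on one predictor. The data points below are hypothetical and the function name is illustrative.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # share of variation in y explained by x
    return slope, intercept, r_squared

customers = [100, 150, 200, 250, 300]      # hypothetical daily foot traffic
revenue = [2100, 3050, 4100, 4950, 6050]   # hypothetical daily revenue ($)

slope, intercept, r_squared = fit_line(customers, revenue)
predicted_revenue = intercept + slope * 220  # forecast for a 220-customer day
print(slope, intercept, r_squared, predicted_revenue)
```

The slope reads as "additional revenue per additional customer"; R² gives a quick sense of the strength of the model, which is the quantity step-wise methods try to improve by adding or removing variables.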
Classification & Clustering
§ Classification: used to assign objects to one of several categories
− Sentiment analysis of social media postings
§ Clustering: another method of forming groups
− Intragroup differences are minimized
− Intergroup differences are maximized
− Commonly used to create and better understand customer groups
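One common way to form such groups is k-means, which the slides do not name; the sketch below assumes it as an example technique and applies it to a single hypothetical variable (annual spend per customer). Starting centroids are spread across the sorted data so the run is deterministic.

```python
def kmeans_1d(values, k=3, iterations=20):
    """Group one-dimensional values into k clusters (k >= 2)."""
    ordered = sorted(values)
    # Deterministic start: pick centroids spread across the sorted data
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest centroid
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its members
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # assignments have stabilized
        centroids = updated
    return centroids, clusters

annual_spend = [20, 25, 30, 200, 220, 210, 1000, 950]  # hypothetical customers
centroids, clusters = kmeans_1d(annual_spend, k=3)
print(centroids)
print(clusters)
```

Members of each cluster sit close to their own centroid (small intragroup differences) while the centroids sit far apart (large intergroup differences), giving low, mid and high spending customer groups.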
Prescriptive Data Analytics
§ Decisions can be formulated from descriptive and predictive analysis
− If I need to cut a product and I know that product C is least preferred and least profitable, I will cut product C.
§ However, prescriptive analytics explicitly tell you the decisions that should be made. This can be done using a variety of techniques:
− Linear programming
− Integer programming
− Mixed integer programming
− Nonlinear programming
Linear Programming Example

                    Product A   Product B   Product C   Product D   Product E
Quantity to Order
Profit per Unit     $5          $3          $20         $50         $200
Total Profit: $-

                    Product A   Product B   Product C   Product D   Product E   Used    Available
Storage Space       0.05        0.5         1           5           10                  1000
Selling Effort      0.25        5           0.5         2           7                   500
Minimum Order       100         15          20          60          5

Solution:

                    Product A   Product B   Product C   Product D   Product E
Quantity to Order   100         15          490         60          5
Profit per Unit     $5          $3          $20         $50         $200
Total Profit: $14,345.00

                    Product A   Product B   Product C   Product D   Product E   Used    Available
Storage Space       0.05        0.5         1           5           10          852.5   1000
Selling Effort      0.25        5           0.5         2           7           500     500
Minimum Order       100         15          20          60          5
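The reported solution can be checked by plugging the order quantities back into the model. The sketch below does not solve the linear program (a dedicated LP solver would do that); it only evaluates the objective and constraints from the tables and ranks products by profit per unit of selling effort, the resource that is fully used. Variable and function names are illustrative.

```python
products = ["A", "B", "C", "D", "E"]
profit_per_unit = [5, 3, 20, 50, 200]
storage_per_unit = [0.05, 0.5, 1, 5, 10]
effort_per_unit = [0.25, 5, 0.5, 2, 7]
minimum_order = [100, 15, 20, 60, 5]
STORAGE_AVAILABLE = 1000
EFFORT_AVAILABLE = 500

def evaluate(quantities, tolerance=1e-9):
    """Return total profit, resources used and whether all constraints hold."""
    total_profit = sum(p * q for p, q in zip(profit_per_unit, quantities))
    storage_used = sum(s * q for s, q in zip(storage_per_unit, quantities))
    effort_used = sum(e * q for e, q in zip(effort_per_unit, quantities))
    feasible = (
        storage_used <= STORAGE_AVAILABLE + tolerance
        and effort_used <= EFFORT_AVAILABLE + tolerance
        and all(q >= m for q, m in zip(quantities, minimum_order))
    )
    return total_profit, storage_used, effort_used, feasible

slide_solution = [100, 15, 490, 60, 5]
total_profit, storage_used, effort_used, feasible = evaluate(slide_solution)

# Profit earned per unit of selling effort, the binding resource
profit_per_effort = {name: p / e for name, p, e
                     in zip(products, profit_per_unit, effort_per_unit)}
best_use_of_effort = max(profit_per_effort, key=profit_per_effort.get)

print(total_profit, storage_used, effort_used, feasible, best_use_of_effort)
```

Selling effort is used up completely while storage has slack, and Product C returns the most profit per unit of effort, which is consistent with every product sitting at its minimum order except C.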
Comparing the Three Types of Data Analytics
§ Descriptive analysis is most common.
− Best practice to perform descriptive analyses prior to prescriptive/predictive
§ Understand that distribution, variance, skew, etc., may exclude certain models
§ How to know which type of analysis to pursue:
− How much time do you have?
− What resources are available to you?
− How accurate is your data? How accurate do you need the model/analysis to be?
− How popular/accepted is the model you are considering?
§ Don’t subscribe to “that’s how we’ve always done it,” but remember to use a model that stakeholders will accept.
Key Takeaways and Suggested Resources
§ Gaining meaningful insights from data requires planning, technical awareness and consistency.
§ Statistical analysis isn’t a replacement for your own logic (don’t go on statistical autopilot).
§ Utilize available resources (blogs, podcasts, articles, webinars and online courses) to learn more.
− Look for APPLIED statistics topics
§ Big data is not always required.
§ Basic understanding of the statistical analysis process goes a long way!
Podcast: Not So Standard Deviations
https://soundcloud.com/nssd-podcast
Guide: When Predictive Models Fail
searchdatamanagement.techtarget.com/ezine/Business-Information/When-predictive-analytics-models-produce-false-outcomes
Book: Statistics in Plain English, Timothy C. Urdan
Closing Q&A
Thank you!
See you Thursday, April 6 for our next DIA webinar, Building a Flexible and Scalable Analytics Architecture
Catch our webinar recap next week here: firstsanfranciscopartners.com/blog
John Ladley
@jladley
john@firstsanfranciscopartners.com
Kelle O’Neal
@kellezoneal
kelle@firstsanfranciscopartners.com