application of data mining for identifying and predicting ...application of data mining for...

33
i Application of Data Mining for identifying and predicting room bookings Catarina Caldeira Maçaroco Window of Opportunity Project Work report presented as partial requirement for obtaining the Master’s degree in Information Management and Business Intelligence.

Upload: others

Post on 09-Jun-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

i

ApplicationofDataMiningforidentifyingandpredictingroombookings

CatarinaCaldeiraMaçaroco

WindowofOpportunity

ProjectWorkreportpresentedaspartialrequirementforobtainingtheMaster’sdegreeinInformationManagementandBusinessIntelligence.

Page 2: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

i

NOVAInformationManagementSchool

InstitutoSuperiordeEstatísticaeGestãodeInformaçãoUniversidadeNovadeLisboa

APPLICATIONOFDATAMININGFORIDENTIFYINGANDPREDICTING

WHICHEVENTSLEADTOROOMBOOKINGS–

WINDOWOFOPPORTUNITY

by

CatarinaMaçaroco

ProjectWorkpresentedaspartialrequirementforobtainingtheMaster’sdegreeinInformationManagement,withaspecializationinKnowledgeManagementandBusinessIntelligence

Advisor:RobertoHenriques,PhD

February2019

Page 3: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

ii

ABSTRACT

Thisstudyinvestigatesthesearchpatternsforpredictinghotelbookings.UsingExpedia’ssearchandpurchasedata,Iidentifiedtheuser’sbookingwindowandwhicheventshaveahighereffectonthebookinglikelihood.

Thetourismindustryhasseenexponentialgrowthoverthelasttwodecades,muchduetoglobalsocio-economicchanges,globalizationandinternetmassification.Portugalisfinallyreapingitsshareofprofitandcontinueswinning“bestdestination”prizesyearafteryear.Itisnowpartofaverycompetitiveecosystemwheredistributionplaysadeterminantrole inwhetherthetouristicproductsurvivesornot. Big players like Expedia and Booking.com have taken control of a big chunk of themarket’srevenue because they understood that the large amounts of data they have enabled predictingdemandhenceofferinghighlycompetitivedeals.

Only by understanding the booking drivers one can negotiate better distribution deals and lowercommissions through making better sales predictions and enhancing marketing and revenuestrategies,hencethepurposeofthisstudybeingtousedataminingtofindpatternsinroombookings,enablingthisindustrytobecomeanevenmoreimportantsourceforthecountry’sGDP.

Throughtheanalysisofconsumerbehaviorandbookingtimesandtheuseoftimeseriesanalysisandmachinelearning,itispossibletofindpatternsthatcanbeappliedacrossnotonlyHospitalitybutotherindustriesaswell.

By looking for the connections and the relevance of each feature on the final predictions, a newwindowofopportunitywillopenformarketingandsalesprofessionals.

KEYWORDS

Salestrends;Micromoments;Macromoments;ZeroMomentofTruth;BookingWindow

Page 4: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

iii

INDEX

1. Introduction.................................................................................................................1

1.1.Backgroundandproblemidentification...............................................................1

1.2.StudyRelevance...................................................................................................2

1.3.StudyObjectives...................................................................................................3

2. Literaturereview.........................................................................................................4

3. Methodology...............................................................................................................7

3.1.ProblemDefinition...............................................................................................8

3.2.DataCollection.....................................................................................................8

3.3.DataSamplingandCleaning.................................................................................8

3.4.Software...............................................................................................................8

3.5.Selectingandfittingmodels.................................................................................8

3.6.Validatingresults..................................................................................................93.7.Featureengineering.............................................................................................9

3.8.Models................................................................................................................10

4. Resultsanddiscussion...............................................................................................12

4.1.DescriptiveAnalysis............................................................................................12

4.2.PredictiveAnalysis..............................................................................................12

4.3.PrescriptiveAnalysis...........................................................................................15

5. Conclusions................................................................................................................16

6. Limitationsandrecommendationsforfutureworks.................................................17

7. Bibliography...............................................................................................................18

8. Appendix....................................................................................................................21

8.1.Appendix1–DescriptiveAnalysis......................................................................21

8.2.Appendix2–PredictiveAnalysis........................................................................24

8.3.Appendix3-Automation...................................................................................26

Page 5: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

iv

LISTOFFIGURES

Figure1-Workflowdiagram.....................................................................................................7

Figure2-Dataikuworkflow....................................................................................................11Figure3-Searchwindowperweekpercontinent.................................................................21

Figure4-Searchwindowwhenbookingwithpackageorwithoutpackage..........................21

Figure5-Searchwindowifbookingincludesweekendordoesnotincludeweekend..........22Figure6-Searchwindowwhenbookingwithchildren..........................................................22

Figure7-Numberofbookingsincludingchildren..................................................................23

Figure8-Numberofbookingspercontinent.........................................................................23Figure9-Variablesimportance..............................................................................................25

Figure10-Bookinglikelihood.................................................................................................25

Figure11-Automationworkflow...........................................................................................26

Page 6: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

v

LISTOFTABLES

Table1-Hyperparameters.......................................................................................................9

Table2-Cohorts’results.........................................................................................................13Table3-Algorithmdetails......................................................................................................14

Table4-Qualitymetrics.........................................................................................................14

Table5-VariablesExplanation...............................................................................................24Table6-Models'results.........................................................................................................24

Page 7: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

vi

LISTOFABBREVIATIONSANDACRONYMS

API ApplicationProgrammingInterface

ARIMA AutoregressiveIntegratedMovingAverage

AHRESP AssociaçãodaHotelariaRestauraçãoeSimilaresdePortugal

CPC Costperclick

CMS ContentManagementSystem

GDP GrossDomesticProduct

LSTM Longshort-termmemory

MCC MatthewsCorrelationCoefficient

OTA Onlinetravelagent

ROCAUC ReceiverOperatingCharacteristicAreaUndertheCurve

ROI Returnoninvestment

SETAR Self-ExcitingThresholdAutoregressive

XGBOOST ExtremeGradientBoosting

ZMOT Zeromomentoftruth

Page 8: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

1

1. INTRODUCTION

Competitionforroombookingshasneverbeenthisfierce.FromHotelstodestinationmanagementcompaniestoOTAs,everycompanyiseagertotakeitscutfromthesellingprice(KevinMay,2017).

This study aims to use popular machine learning models to predict likelihood to book and actaccordinglytothatprobability.

Throughfixedcommissions,costperacquisitionorfixedfees,lodgingandaccommodationbusinessesarecompletelydependentonthirdpartybookingagents.Refusingfrombeingpartofthesystemwouldleadtocompleteisolationinamarketthathasbeengrowingatarateof2,5%peryear(INE,2017)interms of number of Hotels and where investors are confident in continuously stronger revenues,where tourism now represents 10% of the PortugueseGDP (Ferreira, 2017), currently the highestsource.

Analysingthecustomerbehaviourandpredictingthebookingwindowisessentialtogainacompetitiveadvantage in amarketwhere advertising spent is reachingmonstrous levels, with companies likeBooking.comspending$3,5billioninPayperclicklastyear(KevinMay,2017).

XBBoost proved to be theModel that performed best and provided themost accurate predictionpower.Theresultssuggestthatbyaddingfurtherrelevantvariablesthemodel’spredictivepowerwillincrease. I was able of identifying the users that aremost likely and least likely to book, creatingdifferentsegmentswhichcanbeverypowerfultogenerateeffectivestrategies, impactingtherightuserswiththerightnudgesthatleadtomorebookings,hencemorerevenue.

Withtheidentificationoftheexactperiodsandvariablesthatleadtoabooking,theadvertisingspentcanbebetterallocatedandrevenuestrategiescanbecomemoreeffective(TrefisTeam,2016).

1.1. BACKGROUNDANDPROBLEMIDENTIFICATION

WiththeincreasingimportanceoftourismbothgloballyandespeciallyforPortugalanditsGDP,itisessentialtotakefulladvantageoftheemergingopportunities.

Thetourismandhospitalitymarketsarewidelydiverseandcomplexmainlybecausetheydealwiththe human need for emotions and experiences (Wu, Mattila, & Hanks, 2015). When looking foraccommodationforeitherbusinessorleisure,theindividualismovedbyasenseofurgency,aneedforrelaxation,asuddenrushforsomethingnew,justtonameafewdrivers.Thesedriversarewhatmarketersarethrivingtograsp,becausedoingsomeansthatonereachestherevenuesourcefirst,henceavoidingorgainingcommissions.

Eventsthatleadtobookingsrangefrommundaneeverydayeventssuchasfeelingtiredafteralongdayandinneedforanescape,orSundaylunchwithfamilyleadingtobookingatrip.Itcanalsobecausedbyextremeevents,liketheEurocup’sfinal,ChristmasorNewYears’Eve.Still,forecastingforextremeeventsisextremelydifficultduetotheirlackoffrequency,somethingthatUberisinvestedinsolving(Laptev,Smyl,&Shanmugam,2017).

Companiesareusingonlinemarketingtoreachtheaudienceandcaptivatetheusers’interestbeforeothers (Torres, Singh,&Robertson-Ring, 2015), yet themarket is fierce and competitive. Knowing

Page 9: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

2

whencustomersbookisachancetouseGuerrillamarketingeffectively,butititisextremelydifficultandtimeconsumingtoplacetherightadvertinfrontoftherightconsumer,attherighttime(Hudson&Thal,2013).

ConsideringthatBooking.comisspending$3,5billioninCostPerClick(CPC),andiscurrentlyoneofGoogleAdwords’biggestclient(KevinMay,2017),mostcertainlyithasbeenconductingthesestudiesforquiteawhile.Nevertheless,itisnotintheirinteresttosharethisknowledge,hencethereasonfortheidentificationofthisproblemasonethatcanbetackledindependentlyinthisstudy.

Multipleapproacheshavebeenexploredbyrevenuemanagers,throughtheexplorationofhistoricaldataand relating thiswithpastand futureevents takingplace in the locationsofbuyerandseller(Constantino,Fernandes,&Teixeira,2016),yetitisnotanexactsciencethanratheritisahunch.

Thisstudywilltrytogoastepfurtherandpullthedatathatdeliverstrueinsightsandvalue,inthiscase,customerbehaviour.Customerbehaviouranalytics iswhatdrovetraditionalmarketingtothesecond plan and brought digital marketing into play. Contrary to traditional marketing, digitalmarketing’scorevalueisthateverythingcanbemeasuredthusenablingamuchcomprehensiveviewoftheclientandefficientstrategiestogeneratevalue(JulieCave,2016).

Google and Facebook, for example, store gigantic amounts of data about every single individual(Bangwayo-Skeete&Skeete,2015), this is their leverage fordelivering therightadvertising, to therightcustomer,attherighttime,leadingtoconversionsandadvertisersrelianceontheireffectivenessduetothehighROIresults.

Itisnowuptoeachcompanytodetangletheinformationtheypossess(salesdata)correlateittocity’sopendataandfindthepatternsthatcanleadtocompetitiveadvantage(PatrickWhyte,2017).

Thefollowingvariableswillbeconsideredinthisstudy:

-Salesdata

-Consumerbehaviour

-Economicsituation(individual,localandglobal)

-Socio-economicevents

Applyingtimeseriesandmachinelearningtosalesforecastenablesfindingthepatternsthatcanbetranslatedintobookingwindows(Bangwayo-Skeete&Skeete,2015).

Therelevanceofthisstudyanditsmainobjectiveswillbefurtherdescribedinthefollowingchapters.

1.2. STUDYRELEVANCE

Thetourismindustrywillemploy10%ofthepopulationworldwidebytheyear2027.InPortugalnow,thisindustryrepresentsalmost1millionjobs,numberswhichareexpectedtogrowto1.034.000jobs,representing22,6%ofthetotalemployedpopulationby2027(JoanaNunesMateus,2017).

Representing17%ofthePortugueseGDPandanindustrywherealmostaquarterofthePortuguesepopulationrelyon,onecaneasilycalculatethatthecommonlycharged10-25%commissiononevery

Page 10: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

3

salestransactionthroughintermediariessuchasBooking.comandExpediafordistribution,isrevenueinasense“lost”andthatcouldbepartiallyreacquiredifproperdataexplorationwasdone.

It is therefore of great importance that independent studies start being developed regarding thissubject,onlythroughcompleteawarenessofthesalesdriverscanonestayafloatandminimisetheadvertisingandcommissions’expenditureandownback itspowerandstrategydefinition,withoutbeingatthemercyofthird-partyvendors

1.3. STUDYOBJECTIVES

Themainquestionofthisstudyisposedas:Isitpossibletopredictthebookingwindowsusingsearchandbookingdata?

Theanalysisismadeonsaleshistoricaldataandsearchtrends.Thesewillaimtoanswerthefollowingspecificobjectives:

-WhichtrendscanbefoundthroughthedataprovidedbyExpediathroughKaggle?

Data from the Kaggle competition concerning the time of booking and for which dates will beanalysedtofindpatternsthatwillthenbecorrelatedwithothervariables.

-Whenisthebesttimetoreleasemarketingcampaigns?

Bycrossingthebookingtimeswithspecificvariables/timesofday/dayoftheweek, it ispossibletodetectthebookingspecificdrivers,thatenableagoodpredictionoftherighttimetoimpacttheuser.

-Whichdataimprovesdemodel’spredictabilityperformance?

Variousdatafeatureswillbeanalysed,themodel’saccuracywillvarydependingonwhichfeaturesareusedandthiswillultimatelyhaveaneffectandleadtoreachingthestudy’sobjective.Thiswillonlybepossibletotestoncevariousmodelsarecreated,ranandanalysed.

-Whichcorrelationscanbefoundtoaugmentdata’svalue?

Aftercollectingallthedatanecessaryforthisstudy,acorrelationbetweenthedatafeatureswillbemadetoseeiftheeventdidinfluencetheroombookings.

-Whichmodelbetterfitsthedata?

Severalmodelswillbeappliedtothedatatodetectthemodelwiththebestpredictiveoutcome.

Page 11: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

4

2. LITERATUREREVIEW

Thereareseveralpublishedresearchpapersonthetopicofforecastingtourismdemandandtrends,theirtechniquesandapproacheswerestudiedandtestedtoenablethisstudy.

Thisstudy’sobjectiveistoevaluatetheforecastingperformanceofartificialneuralnetworksrelativeto different time series models using Expedia’s search and booking data from Kaggle’s 2013competition(Kaggle,2015).

TheTourismindustrycontributedUS$7.6trilliontotheglobaleconomy,10.2%ofglobalGDP(Misrahi&Crotti,2018).InPortugalthecontributionofTourismtothecountry’sGDPisof7,1%,thereisaclearopportunity for growth when compared with the global landscape (INE, 2017). Tourism is one oftoday’sfastestgrowingeconomicactivitieshencetourismdemandforecastingbecomingessentialtomonitorandpredicttourisminflux,revenueforecastingandbudgetallocation.TheresearchersSongand Lee found it crucial to improve the accuracy and performance of analysis methodsbyexperimentingwithnewapproaches(Chan,Witt,Lee,&Song,2009).

It isnotpossibletostocktheunfilledairlineseats,unoccupiedhotelrooms,orunusedconcerthallseats.Duetotheperishablenatureofthetourismindustry,theneedforaccurateforecastsiscrucial(Law&Au,1999).

ArelevantarticleforthisstudyisForecastingtourismdemandtoCatalonia:Neuralnetworksvs.timeseriesmodels(Claveria&Torra,2014),inwhichtimeseriesandartificialneuralnetwork(NN)modelsareusedtoextractpatternsandpredictiveresultsfromCatalonia’stourismdemand.Therehasbeenanincreasedinterestinmoreadvancedpredictivetechniquesfortourismdemand.Whichistiedwithtourism becoming an increasingly stronger global industry. The use of Artificial Intelligence (AI)techniquesfordataanalysishasbeengrowingduetotheneedformorereliableandaccurateforecastsof tourismdemand that candealwith increasingcomplexity.This ismainlybecauseAImodelsarebettercapableofdealingwithnonlinearbehavior,characteristicoftraveldata,inthiscase,bookingsdata.Still,whencomparingtheforecastingaccuracyofthedifferentmodels,AutoregressiveIntegratedMoving Average (ARIMA) outperformed Self-Exciting Threshold Autoregressive (SETAR) and ANNmodels,especiallyforshortertimespans.Theoriginaldatasetpre-processingmaybethereasonfortheseresults,wheretherewasinformationlosswhenaccountingforthepresenceofseasonalityandeliminatingoutliers,whichleadstoaloweraccuracyofneuralnetworkforecasts.Neuralnetworkscanbeimprovedthroughstructureoptimization,addinglayersandmemoryvalues,hencefutureresearchneeds to consider whether the implementation of optimised neural networks and advances ondynamicnetworksdoimprovetourismdemandforecasting.

Inthepastfewyearsduetonewandmoreadvancedforecastingtechniques,suchasNeuralnetworksandGradientBoosting,and theneed formoreaccuratemetricsof tourismdemandthe interest inArtificial Intelligence (AI) and experimentation with those same techniques grew, mainly becauseoftheircapabilityofhandlingnonlinearbehaviour(Pai,Hung,&Lin,2014).

Theincreasingavailabilityoftechnologyatlowercostsenablesaswelltheever-wideradoptionandexperimentation.Technologycompanies,e.g.Google,Facebook,AmazonandUberarenowprovidingopensourcesoftwarethatenablesdevelopersanddatascientiststofurtherexplorethecapabilitiesofArtificialIntelligence(AI)andMachineLearning(ML),thisleadstohigherinterestincreatingbetter

Page 12: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

5

predictive solutions (Mukherjee & Lakshmanan, 2017). With cloud computing, it is possible tohave efficient energetic usage and ease in scaling and cost savings (Zhang, 2016). Upfront costcommitmentismuchlowernowwhencomparedtoon-premisessolutions.

Whenconsideringopendata,onestudyanalyzedGoogle’sTrendsdataandexaminedtheusefulnessof “hotels” “flights” and “destination country” search indicators andmeasured inwhat extent thesearchqueriesdataimprovedtheARandtheSARIMAmethodspredictingovernighttouristarrivals.ThetwelvemonthforecastresultsrevealthatAR-MIDASmodelsgavesuperiorpredictionstoARandSARIMA time seriesmodels in terms of the RootMean Squared Error (RMSE) andMeanAbsolutePercentError(MAPE)forecastingcriteria.

Googlequerysearchdatacanbeusedtoaccuratelyprojectfuturetouristarrivalsoverayear'shorizon.Thisstudycontributedtothegrowinginterestinforecastingusingwebtrafficdatamakingitrelevanttootherindustriesthatcanbenefitfromtheanalysisofwebsearchvolumehistoriestopredictusefultrends(Bangwayo-Skeete&Skeete,2015).

In2014astudylookedintoGoogle’sSearchDataandexaminedtheusefulnessofsearchindicatorssuchas“hotels”and“flights”andtesteditsimpactonthesimpleAutoregreessivemethod(AR)andtheSeasonalAutoregressiveIntegratedMovingAverage(SARIMA)(Bangwayo-Skeete&Skeete,2015).

The Tourism forecast combination using the CUSUM technique article (Chan et al., 2009)demonstratedthatthereisnosinglemethodthatoutperformedothersinforecastingaccuracy,itwasthecombinationofmethodsthatproducedbetterresults.

ThisleadtotheincreasedinterestinNNwhichperformedbetterthattimeseriesmethods,especiallyduetoitscapacitytodealwithnonlinearrelationshipbetweenpredictorsandpredictedvariables.

A 2014 study tested the accuracy of neural network ensemble prediction when compared withtraditionalmachine learningmethods and traditionalmathematical statisticsmethods for studyingChina’s inbound tourism market in which neural network ensemble had clear better predictivecapabilities(Bo&Shi-Ting,2014).

TheNNscapacitytoemulatethehumanbrainto identifypatterns inhistoricaldataandlearnfromexperiencetocapturefunctionalrelationshipsamongdatawhentheunderlyingprocessisunknown(Claveria,Monte,&Torra,2015)alsoleadstoitsshortcomingsofbeingofpoorcomprehensibilitydueits“blackboxmodel”,helpfulforpredictingresultsbut lackingincomprehendingthenatureofthestudyinquestionduetonotallowinginterpretabilityoftheoutcomesandcoefficients.

In2015’sstudy“Commontrendsininternationaltourismdemand:Aretheyusefultoimprovetourismpredictions?”(Claveriaetal.,2015)researchersmodelledtourismdemandincorporatingthecommontrends in international tourist arrivals from all visitor markets to a specific destinationandanalyzedwhethertheapproachallowedimprovingtheforecastingperformanceofNNmodels.Theyused threeNNs: themulti-layerperceptronnetwork (MLP), the radialbasis functionnetwork(RBF)andtheElmannetwork.

In the study “Univariateversusmultivariate time series forecasting: anapplication to internationaltourismdemand”(duPreez&Witt,2003)itwasfoundthatunivariatetimeseriesmodelswerenotsurpassedbymultivariatetimeseriesmodelsformoreaccurateforecasting.

Page 13: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

6

Experimentalresultsdemonstratedthattheforecastingefficiencyofaneuralnetworkissuperiortothat of multiple regression, naive, moving average and exponential smoothing. This indicates thefeasibilityofapplyinganeuralnetworkmodeltopracticalinternationaltourismdemandforecasting(Law&Au,1999).

Uber, a global car-hailing andmobility technology company based in San Francisco, United Statesconductedastudytocreateamodel for forecastingextremeevents.Theyusedthousandsof timeseriesbasedonLongShort-TermMemory(LSTM)totrainamulti-moduleneuralnetwork.TheychoseLSTMduetoitscapacityofmodellingcomplexnonlinearfeatureinteractionsthroughworkingwithlarge amounts of data across numerous dimensions and use of external variables and automaticfeatureextraction(Laptevetal.,2017).

AI can identify patternsor irregularities thatwouldotherwise stayhidden.MLhas the capacity ofspotting opportunities that can make the difference. Its value is exponentially increased whencombinedwithhumananalysis,thatiswheninsightandvisionenabledatatotakeformandtranslateintoactionablestrategies.

Page 14: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

7

3. METHODOLOGY

ForthisstudyIdevelopedthefollowingmethodologyanddiagram,illustratedinfigure1.Itdeliversaforecastingmethodthatwillenableaccuratepredictionsbasedonpastandfutureeventsaffectingroombookings.Themethodologyiscomposedofthefollowingeightstages:

§ Objectives

§ Datasources

§ DescriptiveAnalysis

§ FeatureEngineering

§ Modelling

§ ModelsFineTuning(ResultsandRevalidation)

§ ResultsAnalysis

§ Automationadvice

Figure1-Workflowdiagram

Page 15: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

8

3.1. PROBLEMDEFINITION

Toconducteffectiveforecasting,oneneedstoensuretheproblemdefinitionhasbeenfullyexploredanddefined.Theproblemdefinitiondescribedintheprevioussection,2.1,servesasacompassforthefollowingstepsandthesuccessfulevaluationofthemethodologyandtheobtainedresults.

3.2. DATASOURCES

Thissectionprovidesanoverviewofthedata,thepre-processingandcleaningstepstaken,aswellasfeatureselectionandengineering.

DatawasretrievedfromKaggle’sExpediacompetition(Kaggle,2015).Thedatacoversusers’searchandbookingdatafrom2013and2014,620440records.Thisincludesbothclickandbookingevents.

The competition goal was to predict booking outcomes for a user event, based on their searchpatterns.

Thisdatawasselectedduetotherelevantfeaturesthatenabletheuseofdifferentmodelsandproperprediction outcomes. The dataset contains features that provide general information about theusersuchasID,users’continentwhenbookingandthedestinationofthebooking.Italsoshowstheusers’behaviourwithcheckinandcheckoutdatesandtimeofbooking.Aswellasifatthetimeofbooking therewasor not a promotionbeingdisplayed. The variables explanation canbe found inAppendix2,table5.

3.3. DATASAMPLINGANDCLEANING

Fromthe largesetofdatacollectedfromKaggle’scompetition,only8,4%wereactualbookings,toovercomethisissueIrebalancedthesampletoincreasetheinstanceswheretherewasabooking.

Iremovedtherowswithmissingvaluesinorigin-destinationdistance,whichamountedto36,8%ofthedata.Aftercleaningthemissingvalues,Iwasleftwith409602rowsofdata,enoughforthisproject.Missingvaluescancomplicate theanalysisof thestudy. I chose todropthedata rowsrather thanimpute valuesbecause it accounted to36,8%of thedata,which is ahighpercentageof values toimputeandcouldinthefutureformulatewrongpredictions(Kang,2013).

3.4. SOFTWARE

Dataikuversion4.1wasthesoftwareusedinthisstudy.ItenablestheuseofMachineLearninginaclearmanner.IwasabletobuildandoptimizethemodelsinPythonwhichallowsforseamlessfutureintegrationsthroughanAPItoexternallibraries.

Dataiku enables the creation, training and deployment of advanced custom Machine LearningModelsthroughtheuseofPythonorR.ForthisstudyIchosePython.

3.5. SELECTINGANDFITTINGMODELS

Different approacheswere tested to be able to deal effectivelywith the different data types anddimensions. Flexibility and scalability are essential for this study andmodel developmentwhich ispossiblewiththeuseofDataiku.

Page 16: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

9

InthisstudyItrainedthedatawithLogisticRegression,RandomForest,XGBoostandNeuralNetworks.

Ichosethesemodelsduetotheircharacteristics,asfollows;LogisticRegression’soutcomesenableinterpretationwhichwasessentialtodeterminewhichvariablesaremostrelevantforthepurposeofthisstudy.RandomForestisaneasytousemodelwhichiscapableofdealingwithmultiplevariablesinaneffectiveway.Itiswidelyusedduetoitssimplicityandthefactthatitcanbeusedforclassificationandregressiontasks.NeuralNetworksarecomplexuninterpretablemodelsthathavegainedtractioninrecentyearsduetotheircapacitywithdealingwithmultiplevariablesandprovideveryaccuratepredictionswithverylargedatasets.XGBoostisdesignedforspeedandperformanceandoutperformsvariousmodelsinitspredictivecapabilities.IthasbeenusedinseveralKagglecompetitionsproducinggreat results andwinning several competitions. XGBoostwas engineered to have amore efficientcomputingtimeandusememoryresources inanoptimalway. Inrecentyears,datascientistshaveusedthesemodelstosuccessfullypredictoutcomessimilartothoseofthisstudy(SunilRay,2017).Amorein-depthexplanationofeachmodelcanbefoundinchapter3.8.

Thedatasetwasdividedinto3cohorts,1,2andacohortwithallthefeatures,eachcohorthasavaryingnumberoffeatures,refertoTable2inResultsandDiscussion.Iwantedtofindifaddingmorefeaturesinthetrainingwouldimpactthemodels’results.

3.6. VALIDATINGRESULTS

Oncethemodelsweretestedandresultsretrieved,thesewillneedtoberevalidatedtoensurethatthedevelopedmodelsareaccurateandcanbetestedwithnewsetsofdata,continuouslyproducingqualityresults.Seeingiftheresultscanbegeneralized.Iusedtheseparatesubsetofthedatasettovalidate if themodelwouldoverfitwiththedata I trained itwith.RefertoTable1 forthemodels’hyperparameters.

Table1-Hyperparameters

ForthisprojecttheassessmentmetricwasROCAUCsinceIwantedtohavepredictionsoptimisedfortrue positives versus false positives. It enables seeing how the model performs at categorisingoutcomes.

3.7. FEATUREENGINEERING

Somefeaturescombinedwithotherscanprovideinsightsintothedatatheyaimtorepresent.

Timeofsearchandtimeofbookingwereparsedinordertoprovideaclearerunderstandingofsearchandbookingtimes.

Page 17: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

10

Date_timedatawasparsedandthedatecomponentswereextractedintoYear,Month,Day,DayofWeekandHour.Inordertoseeifthereisatrendinmonthordayoftheweek.

Thesameprocesswasdoneforsrch_ciandsrch_co,check-inandcheck-outdates. Icomputedthetimedifferencebetweendate_timeandcheck-inandthedifferencebetweencheck-inandcheck-outdates.BinnedthesearchMonthsintoQuartersandconductedthesameprocessforthecheck-inandcheck-outmonths.

Also,binnedthesearchhourin4binswith6hourseach.Midnightto6am,6amto12pm,12pmto6pmand6pmtomidnight.

The target searchmonthwas extracted from the search date and the frombookingwindow. Thisfeaturecanproduceinsightsonhowcertainpropertiesmightbemoredesirablethanothersincertainmonths. This would enable confirming if a property in for example, Continent 3, would bemoresearchedforinJanuaryforstaysinSeptember,enablingthenbetterpredictions.

3.8. MODELS

ThemodelsusedinDataikuwere;RandomForest,LogisticRegression,XGBoostandArtificialNeuralNetwork.

Startingwith the LogisticRegression, it is a classificationalgorithm thatuses a linearmodelwhichcomputes thetarget featureasa linearcombinationof input feature. It ispronetooverfittingandsensitive toerrors in the inputdataset. Still, theuseofa simple linearalgorithmcanbehelpful inexploringthedataandreachinginsightsregardingthedata’sunderlyingstructure.

InadditiontothislinearmodelIselected3non-linearmodelstobettercapturethecomplexityofthedata,whichwere:

RandomForest,composedofmanydecisiontrees.Whereeachtreepredictsanoutcome,affectingthefinalansweroftheforest.Itisanensemblelearningmethodforclassification,regressionandothertasks.ByusingRandomForestIavoidedtheoverfittingofdecisiontrees,whichispossiblebyhavingarandomelementthatenablesthatalltreesintheforestarenot identical. It lacksexplainabilitybutgenerallyprovidesgood results. It candealwith themultivariatedataandbyaveragingacross thebookingprobabilities,shouldhelppreventover-fittingbyindividualtrees(NiklasDonges,2018).

XGBoost(eXtremeGradientBoostingmethod),anadvancedgradienttreeboostingalgorithm,whichusesparallelprocessing,regularization(thathelpspreventoverfitting)andearlystopping,makesitafast,scalableandaccuratealgorithm.Beinganensemblelearningmethoditcombinesthepredictivepowerofmultiplelearners.Theresultisasinglemodelwhichgivestheaggregatedoutputfromseveralmodels. The models forming the ensemble can be either from the same or different learningalgorithms.Still,theyhavebeenmostlyusedwithdecisiontrees.Alltheadditivelearnersinboostingaremodelledaftertheresidualerrorsateachstep.Theboostinglearnersmakeuseofthepatternsinresidual errors. At the stage where maximum accuracy is reached by boosting, the residuals arerandomlydistributedwithoutanypattern(RamyaBhaskarSundaram,2018).

Ithasbeenusedinreal-worldproductionpipelinesforadclick-throughratepredictionandprovidedstate-of-the-artresultswithvariousotherproblemssuchasstoresalesprediction;customerbehaviourpredictionandwebtextclassification(Chen&Guestrin,n.d.).

Page 18: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

11

ArtificialNeuralNetworks are inspiredby the functioningof neurons, consistentof several hiddenlayersofneuronswhichreceiveinputsandtransmittheseintothefollowinglayer.Itcandealwithnon-linearityallowingforcomplexdecisionfunctions. It lacks interpretabilityoffeatures importanceforthemodel’spredictiveoutcome.

Estimating feature importanceandmodel interpretability ingeneral isanareawhereHaldaretal.,took a step back with the move to NNs. Estimating feature importance is crucialin prioritizing engineering effort and guiding model iterations.The strength of NNs is in figuring out nonlinear interactions between the features. It is also theweakness when it comes to understanding what role aparticular feature is playing as nonlinear interactionsmake it very difficult to study any feature inisolation(Haldaretal.,n.d.).

Thetargetvariablewasifusersconductedabookingornot.

Allfeatureswereinputtedinthemodelexceptdate_time,user_id,srch_ciandsrch_co,sincethesehave direct correlation with the engineered features created which ultimately provide higherpredictionvalueandavoidoverfitting.

IoptimizedthemodelsforROCAUC,whichshowsmetheperformanceofthemodelsandthetruepositivesagainstthefalsepositivesrate.

TheDataikuflowshowninFigure2illustratesthestepstakeninthedataanalysisandmodellingstepsofthestudy.Thedatawasimported,cleanedandthendividedintotestandtraining.Thetrainingsetwastrainedwiththevariousvariablecohortsanddifferentmodels.

Finallytheresultsweredividedintwotoensurethattherewasnooverfittinginthedata.

Figure2-Dataikuworkflow

Page 19: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

12

4. RESULTSANDDISCUSSION

4.1. DESCRIPTIVEANALYSIS

By considering the Booking windows between continents, there are clear differences betweencontinents. Considering continents 4 and 1, the booking window has significantvariancethroughouttheyear,butamuchsoftervarianceforcontinents0,2and3.

For continent 4, week 3 has the highest average bookingwindowwith 108 days. And the lowestbookingwindowforweek26withbookingsbeingmade82daysinadvance.Refertoappendix1,figure3.

TheseresultscanprovidehighlyvaluableactionableinsightstoanycompanyadvertisingforHospitalityorHospitalityrelatedproductsandservices.

Theaveragesearchwindowwidelyvarieswhenconsideringthebookingsthat includeapackageornot.Whenincludingapackagetheaveragebookingwindowisof75,8daysagainst44,6dayswithoutapackage.Refertoappendix1,figure4.

There is notmuchdifference in the searchwindowwhen thebooking includes aweekendornot,rangingfrom50to54daysinbothscenarios.Appendix1,figure5.

Whenbookingwithchildrenthebookingwindowdrasticallychangesthemorechildrenyouhave,thereisaclearpatternwithinthedata.Ifbookingwithoutchildrentheaveragebookingwindowisof50,66days,whilewhenthebookingincludesachildtheaveragegoesupto54,87andgoingupto65,43dayswhenthebookingincludes3children.

Still,thesenumbersmaynothavestatisticalevidenceduetothescarcenumberofbookingswithmorechildren.78,5%ofbookingsaremadewith0children,11%withonechild,8,6%with2andIcouldseeasteepdeclineinbookingswith3children,representingonly1,4%ofthetotalbookinginthedataset.Refertoappendix1,figure6and7.

Mostbookingsaremadeforcontinent2,with63,9%ofbookings.Iassumethereforethatcontinent2is likely to be North America, due to the population volume and high numbers of internal travel.Appendix1,figure8.

Thesenumbersalsoreflectthehighermarketpenetrationinforcontinent2ratherthantheremainingcontinents.

4.2. PREDICTIVEANALYSIS

I ran threedifferent feature cohorts through themodels, addingmore features toevery cohort, itprovedtobethatthecohortwithallthefeatureshadthebestperformance.

Page 20: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

13

Table2-Cohorts’results

WhatIfoundisthatsomemodelsareconsistentlybetterthanothersinpredictinglikelihoodtobook.XGBoost continuously outperforms the other models. Also, the more variables I added the moreaccuratepredictionsthemodelswouldbeabletoachieve.Allresultscanbefoundinappendix2table6.

PleaserefertoTable3forthealgorithm’sdetails.

Page 21: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

14

Table3-Algorithmdetails

Intermsofvariablesimportanceinpredictivepower,hotelcluster,hotelmarket,searcheddestination,if packageornot, searchdurationandorigin-destinationdistanceproved tobe themost relevant.Pleaserefertoappendix2,figure9.

Itwaspossibletoidentifyandbinwhichusersaremostlikelytobook,mediumandleastlikelytobook.Refertoappendix2figure10.Thisallowssegmentationofnewcustomerswhichcanleadtodifferentstrategiesforeachsegment,enablingmoreprecisetargetingandbettercampaignsand/orpromotionsperformance.

AsIshowbelowthemodelwasgeneralenoughtoobtaingoodpredictions,refertoTable4.TheROCAUCwas0,7833.

Table4-Qualitymetrics

Page 22: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

15

4.3. PRESCRIPTIVEANALYSIS

AnextstepwouldbetoimplementanautomationworkflowusingintegrationsbetweenDataiku,theexistingdatabaseandtheContentManagementSystem(CMS)asillustratedinAppendix3,figure11.

TheseintegrationsarepossiblethroughZapier,aweb-basedservicethatenablesseamlessautomationworkflowsbetweendifferentapplications,itcanbedescribedasatranslatorbetweenwebAPIs.

It ispossibletoapplythemodeldevelopedwithpastdataandfeeditwithnewdatatopredictthelikelihoodofacustomerbooking.

Thisultimately leadstoreal-timewebsiteupdatesshowingthetargetedpromotionsor informationthatleadtheconsumertobuy.Andultimatelybettermarketingstrategies.

Page 23: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

16

5. CONCLUSIONS

Thisprojecthasoutlinedtheprocessofbuildingamodeltopredicthotelbookings.TheanalyzeddataincludedsearchpatternsandbookingsfromExpedia’sdatasetduringtheperiod2013-2014.

I foundthatwecanpredictwhensomeonewouldbookornot,basedonsearchpatternsandthatvariablessuchasthenumberofchildren,orthelocationwherethebookingisbeingmadehaveanimpactonthebookingwindow.

I was capable of identifying the users that aremost likely and least likely to book, creating threedifferentsegmentswhichareverypowerfultogenerateeffectivestrategiestoimpacttherightuserswiththerightnudgesthatwillleadtomorebookings,hencemorerevenue.

Thisultimatelyallowsmarketersandhospitalityprofessionalsorofanyactivityrelatedtohospitalityto stir theirmarketing efforts inmore effectiveways, creating the right promotions ormarketing‘nudges’atthemostrelevanttimeinthesearchandbookingfunnel.

Featureselectingprovedtobeextremelyimportantforreachinggoodresults,asincludingafullsetoffeaturescancreatetoomuchnoiseandmakeitdifficulttofindunderlyingpatternsthatwillmakethemodel better. This was not the case for this project, where including the whole set of variablesproducedthebestmodeloutcomes.

XGBoostprovedtobethemosteffectivePredictiveModel,withthehighestROCAUC.Hencethemodelmostsuitedforthisanalysisandtrainingoffuturefeaturecohortsforsearchandbookingdata.

The objectives of this study were different from the objectives established within the Kagglecompetitionforthisdataset.Thisstudy’stargetwastopredictthelikelihoodofbooking,whilstthecompetitionwascreatedtodeterminethelikelihoodofauserstayingat1ofthe100hotelgroups.

This dataset was selected due to having the relevant attributes for this study’s purpose. Kaggles’winningmodelsareveryspecificandtailoredforthecompetitionandwhenpresentedwithothersetsofdatadonotperformaswell.

HotelsandHospitalityprofessionalsfrommultipleindustrieshavecompleteaccesstothisdata,thecomputationalpoweravailabletodayandthevariousfreetoolinganddatasetsavailableprovideallthenecessaryinstrumentstoconductmeaningfulanalysisthatcanpushhospitalityrelatedcompaniestotheforefront.Itallcomesdowntopropertimeallocationandone’swillingnesstotestandplaywiththeavailableresources.

Page 24: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

17

6. LIMITATIONSANDRECOMMENDATIONSFORFUTUREWORKS

Futureworkscouldconsidercombiningexternalvariablesandextremeeventsandcrossingthesewiththedataathand,findingcorrelationswiththese,thatwillleadtobetterpredictivepower.

Automationofthemodel’spredictiveoutcomesshouldalsobeconsideredandfurtherexplainedinafutureproject.Allowingfordata-orientedmarketingstrategiesthatcansurelyoutperformtraditionalmarketingefforts.

Becausedatawasanonymised,Icouldnotseetheactualcontinentsandcountrieswherethebookingsweretakingplace.Thepriceswerealsounavailable.Meaningthatthedatahadpredictivepowerbutlackedinterpretabilitythatwouldenableactingupontheresults.

Infutureworks,amorecompletedataset,withconcretevalueswouldenablemuchbetterpredictionsandactionableoutcomes.Beingable to see theeffectofprice inbookingpatterns andhow smallvariationsinpricecaninfluencethepredictivepowerofthemodels.

It isalsoreasonabletoconsiderthatweatherconditionsanda location’seconomic/politicaland/orenvironmentalstabilityplayanimportantroleindeterminingpropensitytobookacertainpropertyonacertainlocation,leadingtochangeintourismdemand.Thesearefactorsthatdynamicallychangeinacontinuousway,henceamajorchallengewouldbeprovidinganacceptablemeasurementforthesefactorsinordertoincludetheminfuturepredictivemodelling.

Futurestudiesshouldalsoconsiderimplementingpsychographicstraitsandpersonastoprovidebetterpromotionstotheconsumersmostlikelytobuy.Offeringdifferentproductsanddifferentpackagesdependingontheusers’levelofextraversionorconsciousness,twotraitsthatprovedtobeaccuratepredictorsofuserspropensitytobuycertainproducts(Rauschnabel,Brem,&Ivens,2015).

Using recommender systems can also be highly valuable to accurately predict users’ purchasingbehaviour,henceshouldbeincludedinfuturepredictionstudies.

Page 25: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

18

7. BIBLIOGRAPHY

Bangwayo-Skeete,P.F.,&Skeete,R.W.(2015).CanGoogledataimprovetheforecastingperformanceoftouristarrivals?Mixed-datasamplingapproach.TourismManagement,46,454–464.https://doi.org/10.1016/J.TOURMAN.2014.07.014

Bo,X.,&Shi-Ting,L.(2014).20147thInternationalConferenceonIntelligentComputationTechnologyandAutomation,IntelligentComputationTechnologyandAutomation(ICICTA),20147thInternationalConferenceon,IntelligentComputationTechnologyandAutomation,InternationalConf.https://doi.org/10.1109/ICICTA.2014.91

Chan,C.K.,Witt,S.F.,Lee,Y.C.E.,&Song,H.(2009).TourismforecastcombinationusingtheCUSUMtechnique.TourismManagement,31(6),891–897.https://doi.org/10.1016/j.tourman.2009.10.004

Chen,T.,&Guestrin,C.(n.d.).XGBoost:AScalableTreeBoostingSystem.Retrievedfromhttps://github.com/dmlc/xgboost

Claveria,O.,Monte,E.,&Torra,S.(2015).Commontrendsininternationaltourismdemand:Aretheyusefultoimprovetourismpredictions?TourismManagementPerspectives,16,116–122.https://doi.org/10.1016/J.TMP.2015.07.013

Claveria,O.,&Torra,S.(2014).ForecastingtourismdemandtoCatalonia:Neuralnetworksvs.timeseriesmodels.EconomicModelling,36,220–228.https://doi.org/10.1016/J.ECONMOD.2013.09.024

Constantino,H.,Fernandes,P.O.,&Teixeira,J.P.(2016).ModelaçãodaProcuraTurísticaparaMoçambiqueIVCongressoInternacionaldeTurismodaESG/IPCATourismforthe21stCenturyModelaçãodaProcuraTurísticaparaMoçambiqueJoãoPauloTeixeira,(July).

duPreez,J.,&Witt,S.F.(2003).Univariateversusmultivariatetimeseriesforecasting:anapplicationtointernationaltourismdemand.InternationalJournalofForecasting,19(3),435–451.https://doi.org/10.1016/S0169-2070(02)00057-2

Ferreira,A.(2017).Expresso|Turismo.Portugaléo14.omaiscompetitivodomundo.RetrievedJune13,2017,fromhttp://expresso.sapo.pt/economia/2017-04-06-Turismo.-Portugal-e-o-14.-mais-competitivo-do-mundo

Haldar,M.,Abdool,M.,Ramanathan,P.,Xu,T.,Yang,S.,Duan,H.,…Legrand,D.(n.d.).ApplyingDeepLearningToAirbnbSearch.Retrievedfromhttps://arxiv.org/pdf/1810.09591.pdf

Hudson,S.,&Thal,K.(2013).TheImpactofSocialMediaontheConsumerDecisionProcess:ImplicationsforTourismMarketing.JournalofTravel&TourismMarketing.https://doi.org/10.1080/10548408.2013.751276

INE.(2017).PortaldoInstitutoNacionaldeEstatística.RetrievedJune13,2017,fromhttps://www.ine.pt/xportal/xmain?xpgid=ine_main&xpid=INE

JoanaNunesMateus.(2017).UmQuartodosPortuguesesVaiTrabalharParaoTurismo|ExpressoEmprego.RetrievedJuly4,2017,fromhttp://portugalarecrutar.expressoemprego.pt/noticias/um-quarto-dos-portugueses-vai-trabalhar-para-o-turismo/4353

JulieCave.(2016).DigitalMarketingVs.TraditionalMarketing:WhichOneIsBetter?-Digital

Page 26: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

19

Doughnut.RetrievedJuly10,2017,fromhttps://www.digitaldoughnut.com/articles/2016/july/digital-marketing-vs-traditional-marketing

Kaggle.(2015).ExpediaHotelRecommendations.Retrievedfromhttps://www.kaggle.com/c/expedia-hotel-recommendations

Kang,H.(2013).Thepreventionandhandlingofthemissingdata.KoreanJournalofAnesthesiology,64(5),402–6.https://doi.org/10.4097/kjae.2013.64.5.402

KevinMay.(2017).Googlecanrejoice:PricelineGroupspent$3.5billiononPPCin2016.RetrievedJune1,2017,fromhttps://www.tnooz.com/article/priceline-group-3-5-billion-advertising-2016/

Laptev,N.,Smyl,S.,&Shanmugam,S.(2017).EngineeringExtremeEventForecastingatUberwithRecurrentNeuralNetworks-UberEngineeringBlog.RetrievedJune18,2017,fromhttps://eng.uber.com/neural-networks/

Law,R.,&Au,N.(1999).AneuralnetworkmodeltoforecastJapanesedemandfortraveltoHongKong.Retrievedfromhttp://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.13.4253&rep=rep1&type=pdf

Misrahi,T.,&Crotti,R.(2018).TheTravel&TourismCompetitivenessReport2017Pavingthewayforamoresustainableandinclusivefuture.

Mukherjee,S.,&Lakshmanan,L.(2017).GoogleCloudprovidesaunified,streamlinedwaytoexecuteyourMLstrategy|GoogleCloudBigDataandMachineLearningBlog|GoogleCloudPlatform.RetrievedFebruary2,2018,fromhttps://cloud.google.com/blog/big-data/2017/11/google-cloud-provides-a-unified-streamlined-way-to-execute-your-ml-strategy

NiklasDonges.(2018).TheRandomForestAlgorithm.RetrievedJanuary3,2019,fromhttps://towardsdatascience.com/the-random-forest-algorithm-d457d499ffcd

Pai,P.-F.,Hung,K.-C.,&Lin,K.-P.(2014).Tourismdemandforecastingusingnovelhybridsystem.ExpertSystemswithApplications,41(8),3691–3702.https://doi.org/10.1016/J.ESWA.2013.12.007

PatrickWhyte.(2017).SmartCitiesNeedOpenDataandaWillingnesstoTestandLearn.RetrievedJune20,2017,fromhttps://skift.com/2017/06/15/smart-cities-need-open-data-and-a-willingness-to-test-and-learn/?utm_campaign=SkiftWeeklyReviewNewsletter&utm_source=hs_email&utm_medium=email&utm_content=53244701&_hsenc=p2ANqtz--3Fu3wJdXe_S4rGD8KhVQjb-vWPMvXMODGF9d-

RamyaBhaskarSundaram.(2018).UnderstandingtheMathbehindtheXGBoostAlgorithm.RetrievedJanuary6,2019,fromhttps://www.analyticsvidhya.com/blog/2018/09/an-end-to-end-guide-to-understand-the-math-behind-xgboost/

Rauschnabel,P.A.,Brem,A.,&Ivens,B.S.(2015).Whowillbuysmartglasses?Empiricalresultsoftwopre-market-entrystudiesontheroleofpersonalityinindividualawarenessandintendedadoptionofGoogleGlasswearables.ComputersinHumanBehavior,49(May),635–647.https://doi.org/10.1016/j.chb.2015.03.003

SunilRay.(2017).EssentialsofMachineLearningAlgorithms(withPythonandRCodes).RetrievedJanuary6,2019,fromhttps://www.analyticsvidhya.com/blog/2017/09/common-machine-learning-algorithms/

Page 27: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

20

Torres,E.N.,Singh,D.,&Robertson-Ring,A.(2015).Consumerreviewsandthecreationofbookingtransactionvalue:Lessonsfromthehotelindustry.InternationalJournalofHospitalityManagement.https://doi.org/10.1016/j.ijhm.2015.07.012

TrefisTeam.(2016).HereAreTheKeyGrowthAreasForPriceline’sBooking.com-Nasdaq.com.RetrievedJuly12,2017,fromhttp://www.nasdaq.com/g00/article/here-are-the-key-growth-areas-for-pricelines-bookingcom-cm723530?i10c.referrer=https%3A%2F%2Fwww.google.pt%2F

Wu,L.,Mattila,A.S.,&Hanks,L.(2015).Investigatingtheimpactofsurpriserewardsonconsumerresponses.InternationalJournalofHospitalityManagement,50,27–35.https://doi.org/10.1016/j.ijhm.2015.07.004

Zhang,L.(2016).Pricetrendsforcloudcomputingservices.WellesleyCollege.Retrievedfromhttp://repository.wellesley.edu/thesiscollection/386

Page 28: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

21

8. APPENDIX

8.1. APPENDIX1–DESCRIPTIVEANALYSIS

Figure3-Searchwindowperweekpercontinent

Figure4-Searchwindowwhenbookingwithpackageorwithoutpackage

Page 29: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

22

Figure5-Searchwindowifbookingincludesweekendordoesnotincludeweekend

Figure6-Searchwindowwhenbookingwithchildren

Page 30: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

23

Figure7-Numberofbookingsincludingchildren

Figure8-Numberofbookingspercontinent

Page 31: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

24

8.2. APPENDIX2–PREDICTIVEANALYSIS

Table5-VariablesExplanation

Table6-Models'results

Page 32: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

25

Figure9-Variablesimportance

Figure10-Bookinglikelihood

Page 33: Application of Data Mining for identifying and predicting ...Application of Data Mining for identifying and predicting room bookings ... ZMOT Zero moment of truth 1 1. INTRODUCTION

26

8.3. APPENDIX3-AUTOMATION

Figure11-Automationworkflow