Masters Research Dissertation Programme Case Studies 2016 CONSUMER DATA RESEARCH CENTRE www.cdrc.ac.uk Edited by Guy Lansley 7 October 2016


Masters Research Dissertation Programme

Case Studies 2016

CONSUMER DATA RESEARCH CENTRE

www.cdrc.ac.uk

Edited by Guy Lansley

7 October 2016


Foreword

The Consumer Data Research Centre’s Masters Research Dissertation Programme instigates several student-led research projects which seek to tackle topical problems put forward by industry. Each year we invite representatives from major retailers and organisations which handle consumer data to propose research ideas to Masters-level students. Students from all UK institutions are eligible to apply to undertake the projects via the CDRC website. Successful students complete the research over the summer with joint supervision from their academic tutors and their industry sponsor.

This year we welcomed 16 students from a diverse range of academic disciplines. Between them they used a wide range of software packages and analytical techniques in their research.

A selection of short summaries of their research is provided in this document.

If you are interested in becoming an industrial partner or are a student wishing to find out more please visit https://www.cdrc.ac.uk/retail-masters/ or contact Guy Lansley [email protected].


Modelling Multi-Channel Adoption at Sainsbury’s

Thomas Albery1, Charlotte Price1 and Tim Rains2

1University of Warwick, 2Sainsbury’s

Project Background

There is a substantial body of previous research which shows that customers who shop across multiple channels are good for business. For example, those who shop online and in-store are understood to be more profitable, more satisfied and more loyal than those who shop through only one channel. It is thought that this is due to the increased levels of convenience provided by multiple channels, as well as the more sophisticated set of interactions between business and customer enabled through multiple contact points. As a result, strategic approaches to retailing no longer focus on maximising the value of a customer’s next transaction, but on increasing the lifetime value of customers by encouraging them to shop across multiple channels (Chu & Pike, 2014).

Given the acknowledged benefits of multi-channel usage among customers, knowledge of the factors driving multi-channel adoption, as well as the ability to predict customers’ future channel choices, are important considerations. This research project was undertaken for J Sainsbury plc (Sainsbury’s) and was designed to investigate multi-channel shopping behaviour within Sainsbury’s grocery business. Sainsbury’s currently sells grocery products through over 700 convenience stores and 600 supermarkets, as well as taking an average of 215,000 online orders per week (J Sainsbury plc, 2015). The project had two aims:

Developing a statistical model to predict single-channel customers with a high likelihood of adopting multiple channels.

Explaining the main drivers of multi-channel adoption among Sainsbury’s customers.

Data and Methods

While a number of studies have investigated multi-channel adoption by surveying customers, this study aimed to do so using a sample of Sainsbury’s customer data. The sample used for the analysis was made up of 150,012 active Nectar Card users. These were customers who had signed up to Sainsbury’s Nectar Card loyalty scheme. Data was drawn from two consecutive years of transaction history, allowing for an assessment of change in behaviour over time. Logistic regression was used to model the data, allowing both the key drivers of multi-channel adoption to be identified and probabilities of multi-channel adoption to be derived at the customer level. Variables selected as potential predictors were based on a review of previous research. They included: a number of geodemographic indicators, some of which were drawn from open data sources, such as Office for National Statistics (ONS) data; distance variables, which provided information on levels of customer access to Sainsbury’s stores; and variables derived from transaction histories, such as the value and frequency of transactions.
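
As a rough illustration of how a logistic regression yields both driver effects (as odds ratios) and customer-level probabilities, the sketch below fits a model by gradient descent on made-up data. The two dummy variables, the toy adoption counts and the function names are assumptions for illustration only; they are not the variables, data or software used in the study.

```python
import math

def sigmoid(z):
    """Logistic function, written to stay numerically stable for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, x):
    """Probability of adoption for one customer; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

def fit_logistic(rows, labels, lr=0.5, epochs=5000):
    """Fit [intercept, b1, ..., bk] by batch gradient descent on the log-loss."""
    n, k = len(rows), len(rows[0])
    weights = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights

def make_group(features, adopters, size=10):
    """Return `size` identical customers, of which `adopters` have label 1."""
    return [(list(features), 1 if i < adopters else 0) for i in range(size)]

# Hypothetical features: [is_convenience_customer, is_online_customer];
# supermarket customers are the reference category (both dummies zero).
data = make_group([1, 0], 6) + make_group([0, 0], 4) + make_group([0, 1], 2)
rows = [x for x, _ in data]
labels = [y for _, y in data]

weights = fit_logistic(rows, labels)
odds_ratios = [math.exp(b) for b in weights[1:]]  # effect relative to supermarket
p_convenience = predict_proba(weights, [1, 0])    # customer-level probability
print(odds_ratios, p_convenience)
```

With these toy counts the fitted odds ratios come out close to 2.25 for convenience customers and 0.375 for online customers, each relative to supermarket customers. An odds ratio above 1 indicates increased odds of adoption and one below 1 indicates reduced odds.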

Key Findings

The findings of the research show that it is possible to predict multi-channel adoption with increased levels of accuracy in comparison to a random model, offering the possibility of more effective targeting of offers and communications aimed at encouraging new channel adoption. In terms of the key drivers, the study finds evidence that a customer’s previous channel is the most important factor, with convenience customers having increased odds of adoption and online customers having reduced odds, when compared to supermarket customers. The study finds evidence of neighbourhood effects, whereby larger numbers of neighbouring customers shopping in all three channels increase the odds of a given single-channel customer adopting multiple channels. Evidence is also provided that there are increased odds of adoption in areas with higher than average levels of non-white ethnic groups.


An investigation of what triggers customer activation of credit facilities

Apichaya Boonpratanporn1, Zenon Michaelides1, Simon Hill2, Nicola Dunford2

1University of Liverpool, 2Shop Direct

Project Background

Shop Direct wants to understand the behaviours that prompt its cash customers to start using credit facilities in order to provide the best possible user journeys and outstanding service for individual customers. The research, based on this business case study, is an investigation of the characteristics of those cash customers that are most likely to apply for credit facilities to purchase products in future.

Data and Methods

The investigation covered the period from June 2014 to July 2015. The total sample set was 374,320 customer records, including all customers who successfully converted their account and a sample of cash customers who purchased products during the period. The target was set as a binary indicator: 1 represents customers who converted to credit and 0 represents customers who did not. Data on customers’ characteristics and purchasing behaviours in the year before the conversion date were collected into one sample set. However, those who received email contact about cash to credit promotion were excluded due to the absence of any cash to credit marketing campaigns during the study period. The overall methodology consisted of data mining procedures. A decision tree was selected as the main technique for answering the research question, while logistic regression served as a competing model. Both data modelling processes were performed in SAS Enterprise Miner. Prior to building the models, data cleaning and missing value replacement procedures were run. This was especially important for the logistic regression model, which is quite sensitive to missing values.
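
To illustrate how a decision tree arrives at a rule such as a cut-off on the number of product views, the sketch below searches one numeric variable for the threshold that minimises weighted Gini impurity against a binary target. Gini impurity is an assumed splitting criterion (the source does not say which criterion was configured in SAS Enterprise Miner), and the view counts and labels are invented.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 when the list is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)

def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) of the best split `value <= threshold`."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, float("inf"))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2.0
        left = [y for v, y in pairs if v <= threshold]
        right = [y for v, y in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical data: product views per customer and whether they converted (1) or not (0).
product_views = [1, 2, 3, 8, 9, 10, 12]
converted = [0, 0, 0, 0, 1, 1, 1]

split = best_threshold(product_views, converted)
print(split)
```

A full tree repeats this search over every candidate variable and then recurses into each side of the chosen split. The sketch covers only the single-variable step.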

Key Findings

The results of the two algorithms were similar; however, the logistic regression contributed a broader range of answers. Given the high number of new members who converted their accounts, it became difficult to investigate them further due to the lack of historical purchasing data and profiles. Therefore, new members were removed from the scope of this analysis. The analysis revealed that customers who had previously been refused credit and frequently visited the website were most likely to convert their accounts. These two aspects were also found in the results of logistic regression. However, this algorithm contributed additional possible factors such as gender, customer segment group, age, tenure, products viewed, purchasing channels, and month of conversion (Table 1). The gain chart is used to measure model performance in terms of response rate (Figure 1). It is found that more predictive answers do not always generate a better result. It is suggested that a combination of the two algorithms provides the optimal insight.
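
The points on a cumulative gains chart can be computed as in the minimal sketch below: customers are ranked by model score, and for each top slice the share of all converters captured is recorded. The scores, labels and bin count are invented for illustration and are not taken from the study.

```python
def cumulative_gains(scores, labels, n_bins=10):
    """Return (share_of_customers_contacted, share_of_converters_captured) per bin."""
    # Rank customers from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_converters = sum(labels)
    n = len(ranked)
    points = []
    for b in range(1, n_bins + 1):
        cutoff = round(n * b / n_bins)
        captured = sum(label for _, label in ranked[:cutoff])
        points.append((cutoff / n, captured / total_converters))
    return points

# Hypothetical model scores and observed outcomes (1 = converted to credit).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

gains = cumulative_gains(scores, labels, n_bins=5)
for contacted, captured in gains:
    print(f"top {contacted:.0%} of customers -> {captured:.0%} of converters")
```

A model with no predictive power would capture converters in proportion to the customers contacted, so the further a curve lies above that diagonal, the better the ranking.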

Table 1. Output interpretation

| Model type | Effect variable | Evaluation output |
| --- | --- | --- |
| Logistic Regression | Gender | Female > male |
| Logistic Regression | Customer segment group | Most are in the Financially Stretched group, followed by customers in the Comfortable Communities group |
| Logistic Regression | Age | Younger age group |
| Logistic Regression | Tenure | More than 2 years |
| Logistic Regression | Visited site | 1. More frequently viewed products, especially in the electronics and furniture departments; 2. Viewed credit information before conversion |
| Logistic Regression | Purchasing behaviour | 1. Credit request refusal experience: yes > no; 2. Month of conversion: purchased in May; 3. Order channel: offline > online; 4. Payment pattern: switched to purchasing with a credit card before conversion; 5. Incentive use: used a discount code (pound off); 6. Low or no purchasing amount in Home products |
| Decision Tree | Visited site | Frequently viewed products; more than 8.5 times |
| Decision Tree | Purchasing behaviour | Experienced credit request refusal before |

Figure 1. Gain Chart

Value of the Research

This output could be further analysed to deliver a recommendation plan in order to create the optimal user journeys for pure cash customers and those who are likely to want to become credit customers in future. With this information, the customer teams can create retail and financial service strategies that serve the right information to the right people.


Social Energy Responsibility: Identifying Vulnerable Energy Customers Through a K-Means Clustering Approach

Ffion Carney1, Alex Singleton1 and Ben McKeown2

1University of Liverpool, 2E.ON UK

Project Background

In order to reduce domestic energy consumption and increase domestic energy efficiency, the government implemented the Energy Company Obligation (ECO), a scheme to obligate large energy suppliers to deliver energy efficiency measures to domestic premises. One of the obligations of this scheme is focused on improving the ability of low income and vulnerable households to heat their homes. In order to achieve this obligation, it is vital that energy companies are able to determine what constitutes a 'vulnerable household' and are also able to identify which areas should be targeted. This study therefore aimed to identify areas that contain a high proportion of vulnerable households and should be targeted as part of the ECO, by taking into account demographic and property characteristics alongside average annual energy consumption data.

Data and Methods

The main dataset used in this study was E.ON's in-house customer data, which consisted of actual and modelled annual electricity and gas consumption data alongside several demographic and property characteristics for over 3.6 million E.ON customers. The dataset was aggregated to Lower Layer Super Output Area (LSOA) level to allow for additional data from the 2011 Census to be included, primarily consisting of data on housing tenure, residence type and average property size that were not included in the main E.ON dataset. These variables were selected as they had all been shown to have a strong relationship with energy consumption, efficiency and vulnerability. The final dataset was then used to undertake a k-means clustering analysis, with the aim of identifying the cluster that contained LSOAs with the highest proportion of vulnerable households.
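
The clustering step can be sketched as follows: area-level variables are standardised and then grouped with k-means (Lloyd's algorithm), after which each cluster is profiled on the raw values. The two variables, the six toy areas, k = 2 and the deterministic farthest-point initialisation are all assumptions chosen to keep the example small and reproducible; the study used many more variables and found seven clusters.

```python
import math

def standardise(rows):
    """Convert each column to z-scores so no variable dominates the distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centres(rows, k):
    """Farthest-point initialisation (an assumption; it keeps the sketch deterministic)."""
    centres = [list(rows[0])]
    while len(centres) < k:
        farthest = max(rows, key=lambda r: min(sq_dist(r, c) for c in centres))
        centres.append(list(farthest))
    return centres

def kmeans(rows, k, max_iter=100):
    """Alternate assignment and centre updates until assignments stop changing."""
    centres = init_centres(rows, k)
    assignment = []
    for _ in range(max_iter):
        new_assignment = [min(range(k), key=lambda j: sq_dist(r, centres[j]))
                          for r in rows]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for j in range(k):
            members = [r for r, a in zip(rows, assignment) if a == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centres

# Hypothetical areas: [mean household income (thousand GBP), mean annual gas use (MWh)].
lsoas = [[18, 16], [20, 17], [19, 15], [45, 12], [50, 13], [48, 11]]
labels, _ = kmeans(standardise(lsoas), k=2)

def cluster_mean(cluster, column):
    values = [row[column] for row, lab in zip(lsoas, labels) if lab == cluster]
    return sum(values) / len(values)

# Profile clusters on the raw data: which one pairs low income with high consumption?
low_income_cluster = min(set(labels), key=lambda c: cluster_mean(c, 0))
print(labels, low_income_cluster, cluster_mean(low_income_cluster, 1))
```

Standardising first matters because k-means relies on Euclidean distance, so a variable measured on a larger scale would otherwise dominate the grouping.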

Key Findings

The k-means clustering algorithm was found to be an effective method of segmenting the dataset into seven distinct clusters. Comparing the defining characteristics of each of these clusters, alongside their electricity and gas consumption, allowed for the cluster containing LSOAs with the highest proportion of vulnerable households to be identified. Cluster 6, named 'Fuel Poor Private Renters', was identified as the most vulnerable cluster, primarily due to the high proportion of low income households along with the higher than expected average energy consumption when compared to income. This cluster also contained a significant proportion of solid walled properties and households suffering from fuel poverty.

In addition to identifying the cluster containing LSOAs with a high proportion of vulnerable households, this study also ranked the remaining clusters in terms of vulnerability based on their defining characteristics and average energy consumption, as seen in Figure 1.

Figure 1. Cluster vulnerability ranking

Value of the Research

This study provides a method for identifying areas that contain a high proportion of vulnerable E.ON energy customers and should therefore be targeted as part of the ECO. The vulnerability ranking produced in this study could also assist in any future targeting of energy efficiency measures, allowing for areas with a higher proportion of vulnerable households to be prioritised. This understanding of vulnerability is vital for the effective targeting of vulnerable households in order to ensure the successful implementation of energy efficiency measures and carbon policies in the future.


An analysis of Argos concession store performance located in Homebase and Sainsbury’s stores across the UK

Duncan Clayton-Stead1, Andy Newing1 and James Holden2

1University of Leeds, 2Argos

Project Background

Little is known about how concession stores behave and there are various reasons behind the development of concession stores within the retail sector. Factors such as competition, the development of online retailing and stricter planning laws are all causing retailers to manage their stores more efficiently. As a consequence retailers are looking at alternative methods in an attempt to continue their expansion. This dissertation intends to help in the understanding of how concession stores perform. In total 111 Argos concession stores located within Homebase and Sainsbury’s outlets were analysed in order to interpret how these stores perform and what the potential factors of store performance are.

Data and Methods

All datasets used within this research were provided by Argos. In total there were four categories of data. The first dataset used examined sales data for all 111 concession stores. This data was presented as concession postcode spend and provided sales figures for each postcode for every concession store. Average weekly sales figures had to be calculated because the volume of data for each store was not consistent, due to some stores being open for longer periods than others. A store attributes dataset was also provided which contained key characteristics of Argos concession stores, such as their store size and storage space. The demographic data provided was valuable in that it provided population figures for store catchments and the percentage of people in each catchment who classify within each of the Mosaic Classification categories (Experian, 2014). The final dataset included limited competition data, such as the number of competitor retail units within a 20 minute drive radius.

The research required an analysis of store performance in order to find out which Argos concession stores performed strongly. This was achieved by undertaking sales data analysis. The analysis found that there were three suitable measures of store performance. First, stores which consistently performed well on a weekly basis would have stronger average weekly sales. Second, the trading intensity of a store was used to examine the relationship between sales and a store’s storage capacity. Third, it was noted that during the week of Black Friday, all concession stores performed considerably better than they did in a typical week. Following this, the drivers of store performance were examined. A k-means clustering analysis was undertaken within IBM SPSS, which grouped together stores based on their three measures of store performance. An examination of store characteristics could then be made. Stores which had similar performance levels should show similarities in their characteristics, thus helping to find the drivers of concession store performance.
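
The first two performance measures can be sketched as below. Averaging over the weeks actually traded puts stores with different opening periods on a comparable footing. Trading intensity is computed here as average weekly sales per unit of storage capacity, which is an assumed definition (the summary does not give the exact formula). Store names and figures are invented.

```python
def average_weekly_sales(weekly_sales):
    """Mean sales per traded week, so stores open for different periods are comparable."""
    return sum(weekly_sales) / len(weekly_sales)

def trading_intensity(avg_weekly_sales, storage_capacity):
    """Average weekly sales per unit of storage capacity (assumed definition)."""
    return avg_weekly_sales / storage_capacity

# Hypothetical stores: store_b has been open for fewer weeks than store_a.
stores = {
    "store_a": {"weekly_sales": [1000, 1200, 1100, 900], "storage_capacity": 50},
    "store_b": {"weekly_sales": [2000, 2200], "storage_capacity": 200},
}

performance = {}
for name, store in stores.items():
    avg = average_weekly_sales(store["weekly_sales"])
    performance[name] = {
        "avg_weekly_sales": avg,
        "trading_intensity": trading_intensity(avg, store["storage_capacity"]),
    }

print(performance)
```

In this toy case store_b sells more per week but store_a sells more per unit of storage. This is the kind of contrast a per-capacity measure is meant to expose.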

Key Findings

The results from this research have provided a better understanding of how Argos concession stores perform and show that there are three suitable measures of concession store performance. The strongest performing Argos concession stores are small in size and have densely populated catchments with a large percentage of the typical Argos customer type. The weakest performing concession stores have small catchment populations, with a low number of competitor units within a 20 minute drive, suggesting that competition may not have much influence on concession store performance, contradicting that found in the literature.

Value of the Research

This research has provided a greater understanding as to how concession stores perform. The trade intensity of Argos concession stores could be beneficial for Argos when negotiating with partner retailers during the process of opening new stores, because smaller stores appear to have a stronger performance. The clustering process indicated the characteristics needed for strong performing stores. Argos could find similar locations to these strong performing stores for potential new store locations.


How does competitor presence influence the performance of click and collect sites?

Alec Davies1, Dani Arribas-Bel1 and Matthew Pratt2

1University of Liverpool, 2Sainsbury’s

Project Background

Click and collect is a relatively new service in the supermarket industry. Sainsbury’s has offered the service for less than two years, and little is known of the effect of competition, catchments and population characteristics on store performance. This study aimed to bring further understanding of performance and competition across click and collect services. Existing literature has demonstrated that, as well as local competitor counts, geodemographic factors have strong links to customer loyalty, and these were therefore also considered in the analysis.

Data and Methods

This paper uses the empirically tested gravitational model of Huff (1963) in order to produce non-linear catchments for Sainsbury’s grocery click and collect operation across England, following the methodology used by Dolega et al. (2016). This project utilised open software, notably R, chosen because its open source nature allows the analysis to be refined freely, along with QGIS for quick visualization. Catchment estimation required applying a methodology for retail centre catchments to Sainsbury’s grocery click and collect points by using an attractiveness measure built from store descriptives, mainly store size and trade intensity, to generate store catchments. Once the catchments were created, point in polygon analysis was used to derive competitor numbers. The study used two competition datasets, an in-house database and GeoLytix retail points. Both datasets were cleaned to only include major competitors offering similar product ranges. Geodemographic variables including the Index of Multiple Deprivation (IMD), the Internet User Classification (IUC), car or van availability, highest qualification, NS-SEC and ethnicity were also aggregated to store level using weighting, mean and mode (variable dependent), and merged with store descriptives for further analysis.
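
In its textbook form, the Huff model gives the probability that a customer at a given origin patronises each store as that store's attractiveness divided by a power of its distance, normalised over all competing stores. The sketch below implements that standard form. The exponent values, attractiveness scores and distances are illustrative assumptions, not the calibration used in the study (which was done in R).

```python
def huff_probabilities(attractiveness, distances, alpha=1.0, beta=2.0):
    """Patronage probabilities for one origin across all candidate stores.

    utility_j = attractiveness_j ** alpha / distance_j ** beta, and the
    probability of choosing store j is utility_j / sum of all utilities.
    """
    utilities = [(a ** alpha) / (d ** beta)
                 for a, d in zip(attractiveness, distances)]
    total = sum(utilities)
    return [u / total for u in utilities]

# Hypothetical origin with two collection points: the second is four times
# as attractive but twice as far away.
attractiveness = [100.0, 400.0]
distances_km = [1.0, 2.0]

probabilities = huff_probabilities(attractiveness, distances_km)
print(probabilities)
```

With a distance exponent of 2 the larger store's fourfold attractiveness is exactly offset by its doubled distance, so each store receives a probability of 0.5. Repeating the calculation for every origin area and keeping the areas where a store's probability exceeds a chosen cut-off is one way to trace a non-linear catchment.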

Key Findings

Point in polygon analysis of the catchments showed that Sainsbury’s own dataset was very similar to GeoLytix retail points and thus could be considered accurate. Once cleaned to only include major competitors, the datasets were very similar in count and distribution, although the individual competitor counts had some variance. Regression modelling was used to explore the effect of competition on demand. For both datasets, greater competition was associated with increased demand. Store characteristics and geodemographic factors of catchments were used to further assess the extent of the effects of competition on performance. Including store characteristics led to demand decreasing with competition, although the coefficient was not significant and was likely biased by the use of these factors in the attractiveness measure. Including the geodemographic factors of IMD, IUC and census variables led to an almost doubling of the effect of competition, and the models explained much more of the demand (with increased R-squared values).

Figure 1. Huff catchments and store demand for click and collect points

Value of the Research

The paper demonstrates a practical application of Huff catchments at the national level for individual stores. The paper has real world application with planned use in the decision making process for the next 5 years of click and collect at Sainsbury’s and also the selection of sites for the next 100 collection points, replacing linear catchment analysis. The study is of value not only to the sponsor but also to the wider online grocery market, showing easy application for more complex catchments and further consumer understanding.


Identifying Drivers of Full Price Sales of Clothing and Footwear for an Online Retailer

Jyldyz Djumalieva1, Teresa Brunsdon1, Matthew Doubleday2

1Sheffield Hallam, 2Shop Direct

Project Background

From the perspective of any retailer, the financial return is greatest for full price sales and, therefore, the strategic imperative is to explore ways to limit the level of promotional discount given to customers and to stimulate full price sales instead. In line with this, the primary objective of this work was to identify drivers of full price sales of women’s clothing and footwear (C&F). As a secondary objective, the impact of customers’ membership in a particular segment on their propensity to shop at full price was evaluated. The rationale for looking at customer segments was that while there are likely to be common factors influencing customer shopping behaviour, in a situation where distinct groups of customers exist within a population, individual groups might be influenced by these factors to a different extent.

Data and Methods

The study focused on customer and product characteristics related to C&F orders made by customers at Very.co.uk, one of the Shop Direct websites, during 2015 and 2016. To achieve the objectives of the study, cluster analysis was conducted as a first step to identify distinct segments of customers within the data. As a result, six clusters were formed, which were found to differ substantially on cash/credit status, the proportion of C&F items purchased with the Collect+ option and shopping diversity (i.e. the number of departments shopped from). Following this, a logistic regression model was fitted to estimate the probability of full price sales and, alongside other potential predictors, customer segment was included as an input variable.

Key Findings

Several significant predictors of full price sales were identified, including both factors that increase the likelihood of a full price sale and those that reduce it. The most prominent factors with a positive impact on full price sales were average price paid for a C&F item and previous spending on the C&F category. Among the strongest predictors with a negative impact on full price sales were average savings per C&F item, the proportion of C&F items purchased with the Collect+ option and the number of departments ordered from.

Customer segment was found to be a statistically significant predictor as well. There is strong evidence that the observed variation in propensity to shop at full price among clusters is explained to a large extent by the fact that the variables which were decisive in forming the clusters were also found to be significant predictors of full price C&F sales. For instance, number of departments and the proportion of C&F items purchased with Collect+ were the variables that separated the six clusters the most. These were also identified as some of the strongest predictors in the final model. At the same time, it is possible that assignment to clusters captures additional factors, not directly reflected by the input variables used. The obtained insights suggest that there are several distinct customer segments, which differ substantially in their propensity to shop at full price (Figure 1).

Figure 1. Estimated typical probability of full price sales across clusters. The size of the bubble refers to the size of cluster in 2015

Value of the Research

The study findings regarding customer segments and components of the predictive model are likely to enhance the industry’s understanding of customer characteristics that drive full-price sales. Equipped with more insight about the strength and direction of relationships between customer characteristics and shopping behaviour, online retailers could improve the effectiveness of promotional activities and reduce the proportion of discounted sales, thereby increasing profitability. This research also contributes to the industry’s state of knowledge about the performance and accuracy of various predictive modelling approaches.


The performance of Argos concessions in other stores

Natalia Gil1, Vassilis Kodogiannis1 and James Holden2

1University of Westminster, 2Argos

Project Background

Sainsbury's announcement of the acquisition of Argos will create a multi-product multi-channel retailer. As a consequence, as many as 200 of Argos's 845 stores are expected to close over the coming years with some relocated in Sainsbury's supermarkets. They are referred to as concession stores.

Similarly, Homebase also announced the purchase of part of Argos's business and therefore Argos concessions will also be located in Homebase stores. From an industry point of view, research has been done for Argos's own stores using only classical methods, but without comparing different forecasting techniques to predict and understand annual sales in this new concept of distribution.

Data and Methods

Five initial data sets provided by Argos comprise store characteristics, parent store attributes, catchment demographics, competition information and weekly sales information.

Weekly sales have been aggregated into annual sales. Taken as an indicator of store performance, annual sales have been analysed for the period from June 2015 to May 2016 using association and modelling techniques such as correlations, ANOVA, the t-test and their non-parametric equivalents, the Kruskal-Wallis and Mann-Whitney U tests.
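
As a small illustration of the non-parametric comparison, the sketch below aggregates weekly sales into an annual total and computes the Mann-Whitney U statistic for two groups of stores by direct pair counting. The sales figures and group names are invented. Only the statistic is computed here, whereas a full test would also derive a p-value from the distribution of U.

```python
def annual_sales(weekly_sales):
    """Aggregate a store's weekly sales into a single annual figure."""
    return sum(weekly_sales)

def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: pairs (a, b) with a > b, counting ties as 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical annual sales (thousand GBP) grouped by type of parent store.
sainsburys_hosted = [520, 610, 580]
homebase_hosted = [400, 450, 590]

u_sainsburys = mann_whitney_u(sainsburys_hosted, homebase_hosted)
u_homebase = mann_whitney_u(homebase_hosted, sainsburys_hosted)
example_total = annual_sales([10, 12, 11])  # three weeks only, to show the aggregation
print(u_sainsburys, u_homebase, example_total)
```

The two U values always sum to the number of cross-group pairs (here 3 x 3 = 9). A value far from half that total suggests one group tends to have higher sales than the other.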

Additionally, forecasting techniques including multiple linear regression analysis, multilayer perceptron neural networks and CHAID decision trees have been applied to forecast annual sales once the best predictors had been identified from the initial datasets. Input variables have been chosen from among the variables most significantly correlated with annual sales, with a few transformed variables to avoid multicollinearity.

Key Findings

Several hypotheses were tested using the above-mentioned techniques in order to establish whether there was a significant difference in the annual sales of concession stores based on geographic location, type of parent store, affluence of people in the catchment area and the number of competitors within five minutes' driving distance.

Figure 1. Argos datasets

Results from the analysis show there are no significant differences in annual sales based on geographical location (North, East, South and Central parts of the UK as well as location inside or outside of the M25).

Annual sales are higher at Sainsbury’s stores compared to Homebase where size of the store and space available to the public are significantly greater.

With regard to affluence, there is a significant positive correlation between less affluent social groups and annual sales, meaning Argos target markets are medium to low socioeconomic groups.

The presence of competitors within a five minute distance negatively affects annual sales. This negative relationship has not been found for competitors located at a twenty minute distance.

A few models have been compared using Multiple Linear Regression, CHAID Decision Trees and Multilayer Perceptron Neural Networks to forecast annual sales at the concession stores. A seven-input multiple linear regression model and a two-layer perceptron neural network have offered the best forecast.

Value of the Research

This research has found the main drivers of concession store performance that can help managerial staff to understand and explain Argos concession stores' sales results.

The forecast models, which predict annual sales of existing and future concession stores using not only classical methods but also machine learning techniques, are also intended to help future decision making.


Can interactive data visualizations enable a retailer to identify new insights about customer purchase behaviour?

Audrey Henkels1, Jason Dykes1 and Tim Rains2

1City University of London, 2Sainsbury’s

Project Background

Consumer behaviour in the retail industry is changing due to the rise in convenience and multi-channel retailing. In response to these changes, and to better understand the types of shopping missions undertaken by customers, Sainsbury’s has developed a four-tier classification system of shopping baskets. Through surveys and predictive clustering, Sainsbury’s has developed an algorithm that assigns each transaction into one of four mission types: 1) “Food for Now;” 2) “Food for Later Today/Tonight;” 3) “Food for Tomorrow/A Couple of Days;” and 4) “Food for Many Days.” Through this project, Sainsbury’s sought the development of interactive visualization(s) to better understand the trends regarding these four mission types in order to make decisions regarding store layout and planning.

Data and Methods

This research used the Design Study Methodology (DSM) to guide the process of eliciting requirements, and designing, building, and implementing interactive visualizations. The visualizations use visual data mining techniques to enable users to identify trends in a complex transaction dataset and were built on Processing, a Java-based open-source platform, according to the Incremental Development methodology. The DECIDE Framework was used to guide an evaluation session with three unique participants, who completed timed tasks using the visualizations and questionnaires rating the effectiveness and efficiency of the visualizations to address two main tasks. These tasks were: 1) Determine how mission type percentages from one category and week compare to another category and/or week within the dataset; and 2) Determine how mission type percentages differ by category, time of day, weekend vs. weekday, or week within the dataset.
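
The quantity behind both evaluation tasks is the share of transactions falling into each mission type for a chosen category and/or week. A minimal sketch of that aggregation is shown below in Python rather than in Processing, which the project used. The record layout, field names and sample transactions are hypothetical.

```python
from collections import Counter

MISSIONS = [
    "Food for Now",
    "Food for Later Today/Tonight",
    "Food for Tomorrow/A Couple of Days",
    "Food for Many Days",
]

def mission_percentages(transactions, category=None, week=None):
    """Percentage of transactions per mission type, optionally filtered."""
    selected = [
        t["mission"] for t in transactions
        if (category is None or t["category"] == category)
        and (week is None or t["week"] == week)
    ]
    if not selected:
        return {mission: 0.0 for mission in MISSIONS}
    counts = Counter(selected)
    return {mission: 100.0 * counts[mission] / len(selected) for mission in MISSIONS}

# Hypothetical transaction records (field names are assumptions).
transactions = [
    {"category": "Bakery", "week": 1, "mission": "Food for Now"},
    {"category": "Bakery", "week": 1, "mission": "Food for Now"},
    {"category": "Bakery", "week": 1, "mission": "Food for Many Days"},
    {"category": "Bakery", "week": 1, "mission": "Food for Tomorrow/A Couple of Days"},
    {"category": "Dairy", "week": 2, "mission": "Food for Many Days"},
]

bakery_week_1 = mission_percentages(transactions, category="Bakery", week=1)
dairy_week_2 = mission_percentages(transactions, category="Dairy", week=2)
print(bakery_week_1)
print(dairy_week_2)
```

Computing the same breakdown for two selections side by side is the comparison the first task describes. Grouping additionally by hour or by weekday versus weekend would support the second.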

Key Findings

Two visualizations were created through the project. Visualization #1 includes two side-by-side donut charts displaying mission type percentages for a given category and/or week. A line graph depicts the overall trend of the mission types for each selected category at the bottom of the visual and serves as a tool for users to select a given week. Visualization #2 displays circles whose area corresponds to the number of transactions of each mission type per hour, color-coded for weekends and weekdays, for a given category and/or week.

Figure 1. A demonstration of Visualization #1

The results from the evaluation of the two visualizations were very promising. Each of the three participants specified that he/she “strongly agreed” or “agreed” that the tasks were relevant to his/her role and responded that he/she “strongly agreed” or “agreed” that both visualizations were effective and efficient in addressing the two tasks. Each participant specified at least one new learning, a question, and a hypothesis that could be answered for each visualization. Additionally, each participant specified that he/she would use Visualization #1 to do his/her job.

Value of the Research

The evaluation results suggest that the visualizations created could help Sainsbury’s identify new insights into customer behaviour, which may help for planning purposes within existing stores, or to make forecasting decisions to expand, downsize, open, or close stores. The evaluation discussions suggested potential improvements that could be made and additional features that could be added to better address these tasks or other needs of the organization. This project may lead to more interactive visualizations using visual data mining being used at Sainsbury’s and other supermarkets or retailers to drive decision-making behaviour.


Youths Spending & Geodemographics

Chrysanthi Kollia1, Guy Lansley1 and Ben Gilbert2

1University College London, 2goHenry

Project Background

For many years, youths were considered as invisible consumers due to their absence from most consumer datasets. However, the understanding that youths have significant spending power gradually aroused the interest of researchers and the retail industry. This study seeks to analyse youths’ consumption habits by using the data from a youth banking card provider (GoHenry). Youths’ account details and their transaction data can provide a good insight into their consumption behaviour. Based on the existing literature, their consumption profile is known to be influenced by factors such as age, gender and the socio-economic background of their families. However, the extent of this relationship has not been extensively researched with a large dataset.

Data and Methods

For this study, data pertaining to users of the pre-paid debit card scheme whose ages range from 8 to 18 were provided. The data includes demographic information about the users as well as records of their card transactions. The protection of this very sensitive data has been ensured by accessing the data onsite at the JDI Research Laboratory at UCL only and appropriately aggregating any outputs. By comparing the demographics of users to population characteristics recorded from the 2011 Census, it was possible to estimate the representativeness of the data. The retailers from the transaction data were also aggregated into 13 groups: Supermarkets, Catering, Apparel, Health/Cosmetics, High Street Shops, Entertainment, Education, Transportation/Petrol Station, Amazon, Online Media and Subscription Services, Paypal, ATM and Miscellaneous. Paypal and Amazon were isolated as unique categories due to the large volume of transactions between them and the fact that neither fits neatly into the other retail categories. Considering the distribution of transactions between different categories, it was then possible to statistically cluster the shopping habits of regular users. The eventual aim was to understand whether trends in shopping behaviour can be linked to demographic and socio-economic characteristics.
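The distribution of transactions between categories can be represented as one share vector per user, which is the kind of input a clustering step would take. The following Python sketch builds such a vector over the 13 groups listed above. The function name, the input layout and the sample user are hypothetical, and the clustering method itself is not shown because the summary does not name it.

```python
from collections import Counter

# The 13 retail groups listed in the study.
CATEGORIES = [
    "Supermarkets", "Catering", "Apparel", "Health/Cosmetics",
    "High Street Shops", "Entertainment", "Education",
    "Transportation/Petrol Station", "Amazon",
    "Online Media and Subscription Services", "Paypal", "ATM",
    "Miscellaneous",
]

def category_shares(user_categories):
    """Share of a user's transactions falling in each retail group.

    `user_categories` is a list with one category label per transaction
    (a hypothetical layout used only for this example).
    """
    counts = Counter(user_categories)
    total = sum(counts.values())
    return [counts[c] / total for c in CATEGORIES]

# Hypothetical user with four transactions.
example_user = ["Supermarkets", "Supermarkets", "Catering", "Amazon"]
vector = category_shares(example_user)
print(dict(zip(CATEGORIES, vector)))
```

Working with shares rather than raw counts keeps frequent and infrequent card users comparable, which is one reason a study might restrict clustering to regular users.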

Key Findings

This study confirms that youths are a unique and yet diverse group of consumers. The research identified interesting trends. Girls typically spend more money from a younger age than boys, although this trend levels out for the older participants. The research also identified that the most popular retail category for youths was Supermarkets. However, trends in the popularity of different retail categories vary by demographic characteristics. Purchases in ‘Online Media and Subscription Services’ are mainly driven by boys while girls spent more money on Catering outlets. It was also interesting to consider variations in the average transaction amount per category.

Figure 1. The spread of spend between retail categories by different age groups based on a sample of users from 2014

Value of the Research

This research attempts to provide a good insight into how youths spend their money. While little is known about youths’ consumption patterns, research findings suggest that retailers should accept youths as a unique and diverse group who possibly change their behaviour substantially as they mature. The data from the banking card provider could also be useful for identifying longitudinal changes in consumer behaviour. Whilst this study primarily focused on a snapshot of data from one year, there is scope to acquire more data to understand the changing consumer attitudes and behaviours that develop as youths age, and how these may vary depending on geodemographic influences.


Understanding and Predicting Consumer Behaviour in Music Festivals with Machine Learning

Luis Francisco Mejia Garcia1, Guy Lansley1 and Ben Calnan2

1University College London, 2Movement Strategies

Project Background

Music festivals have become very popular social events across the globe and make a large proportion of their revenue from the sale of consumables within the festival sites. However, as these purchases are typically concentrated over a couple of days in temporary locations, relatively little insight can be achieved about consumers relative to the more sophisticated data and modelling approaches available to longstanding retailers with an established network of stores. Therefore, new technologies and innovative techniques could be useful for estimating temporal patterns in footfall across a festival site in order to model patronage at pop-up catering facilities.

Data and Methods

This study presents an exploratory analysis of newly available GPS data collected from the FYF music festival in the United States in order to estimate consumer behaviour inside the event. The data was originally collected from a mobile phone app made available to festival visitors. The festival app used the mobile phones’ positioning systems to record users’ locations at various time intervals. In total the data included 100 million records of user locations at various times, social media information on some users and the festival schedule. Six features were engineered to represent the factors that might influence users’ decisions about when to go to the bar areas. Machine learning algorithms such as Random Forest and Artificial Neural Networks were subsequently trained using these features to identify which are the most influential factors for estimating visits to the bar areas across the festival.
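Three of the six features reported in the findings (total time spent in the festival, time since the last visit to the bar, and distance from the closest bar) can be derived from a time-ordered list of location pings. The Python sketch below is a minimal illustration under stated assumptions: positions are planar coordinates in metres, a ping within a hypothetical `bar_radius` of a bar counts as a visit, and all names and sample values are invented for the example rather than taken from the study.

```python
import math

def distance_to_closest_bar(position, bars):
    """Euclidean distance from a position to the nearest bar."""
    return min(math.dist(position, bar) for bar in bars)

def engineer_features(pings, bars, bar_radius=10.0):
    """Build per-ping features for one user.

    `pings` is a list of (timestamp_seconds, x, y) sorted by time.
    `bar_radius` is an assumed threshold for counting a bar visit.
    """
    features = []
    first_time = pings[0][0]
    last_bar_visit = None
    for t, x, y in pings:
        d = distance_to_closest_bar((x, y), bars)
        since = None if last_bar_visit is None else t - last_bar_visit
        features.append({
            "time_in_festival": t - first_time,
            "time_since_last_bar_visit": since,
            "distance_to_closest_bar": d,
        })
        # Record the visit after computing features for this ping.
        if d <= bar_radius:
            last_bar_visit = t
    return features

# Hypothetical user track and a single bar location.
pings = [(0, 0.0, 0.0), (60, 100.0, 0.0), (120, 50.0, 0.0)]
bars = [(100.0, 0.0)]
features = engineer_features(pings, bars)
for row in features:
    print(row)
```

Rows like these could then be passed, with a label such as whether the user goes to a bar next, to a classifier of the kind named in the text.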

Key Findings

The influence of the six variables over time on Sunday is shown in Figure 1. The ‘total time spent by the users in the festival’ was shown to be the most influential factor on the consumer, followed by ‘time since their last visit to the bar’. The ‘distance from the closest bar’ feature didn’t prove to be a significant influencing factor. The ‘artist’s popularity’ seems to have more influence in the case of upcoming shows than in the past shows. The ‘gender’ feature is the least influential factor according to the results.

Figure 1. Feature Importances on Sunday

A prediction model was also devised and used on a sample of data from Sunday which were withheld from our initial analysis. The prediction model based on artificial neural networks presented an accuracy of 75% when compared to the actual results, despite the festival data only pertaining to two days of data.

The model also indicated that modifying organisational factors could increase the occupancy of the bars. For example, reducing the distance between the users and the bar areas by 30% could increase occupancy by 7% on average.

Value of the Research

The results of this study demonstrate the importance of a good feature engineering process, as the behaviour of people can be deduced from a relatively simple location dataset. The technology used for the collection of data in this project proved to be a good source of information useful for modelling the occupancy in bar areas.

The classification and predictive models analysed in this work could represent an opportunity for companies to implement a similar process in other scenarios. Occupancy of public spaces or the analysis of consumer behaviours across retail sites could be analysed with the correct feature selection and the appropriate machine learning models if similar data is made available.


Topic extraction and document classification on textual survey data with Unsupervised modelling techniques

Eirini Milaiou1, Guy Lansley1 and Chase Farmer2

1University College London, 2CACI UK

Project Background

In most customer surveys there is plenty of information in the form of comments in raw unstructured texts. Thus, the necessity of taking this information into account for understanding customer behaviours leads to the need for analysing this data within frameworks from text mining to extract underlying patterns. Topic modelling is a highly popular area of text mining for documents to automatically understand their content and extract high quality information, without human annotation. In this project the main aim is to identify and to capture the underlying topics and to cluster the users’ comments into those topics, using unsupervised topic modelling techniques. The research was based on data from a large survey of shopping centres across the UK.

Data and Methods

The main challenge was to handle the structure of the documents, which are short text messages without proper syntax. To achieve the set goals, topic extraction models were implemented to reveal the latent themes in the collection and to cluster the documents accordingly. Biterm, LDA with variational inference and Gibbs sampling, and topic modelling with distributed representation of words were the algorithms implemented and tested for this problem. Biterm topic modelling and Gaussian Mixture Models with distributed representation of words were examined due to their good performance on short documents. Even though LDA is not regarded in the literature as the most appropriate algorithm for modelling short documents, it was favoured because the documents in the data typically only represent singular topics. Most documents express their topics clearly and in few words. In that way, it is assumed that LDA can capture the underlying topics by operating on short documents. The evaluation of topic models is not a straightforward task due to the lack of labelled/test data. The evaluation was approached as a three-level procedure including qualitative and quantitative methods, based on the interpretation of the themes, topic coherence and the successful clustering of the documents. For the best results, the cosine similarity of the topics was also computed; no pair of topics exceeded a similarity of 0.4, illustrating that the themes are satisfyingly discrete.
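The cosine similarity check on topics can be illustrated with a few lines of Python. In this sketch each topic is a vector of word probabilities over a shared vocabulary, and the largest pairwise similarity is compared against the 0.4 level mentioned in the text. The three topic vectors are invented for the example and do not come from the survey data.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def max_pairwise_similarity(topics):
    """Largest cosine similarity over all distinct pairs of topics."""
    return max(
        cosine_similarity(topics[i], topics[j])
        for i, j in combinations(range(len(topics)), 2)
    )

# Hypothetical topic-word distributions over a four-word vocabulary.
topics = [
    [0.7, 0.2, 0.1, 0.0],
    [0.0, 0.1, 0.2, 0.7],
    [0.0, 0.8, 0.2, 0.0],
]
worst = max_pairwise_similarity(topics)
print(f"max pairwise similarity: {worst:.3f}")
print("topics look distinct" if worst < 0.4 else "some topics overlap")
```

A low maximum indicates that no two topics place their probability mass on the same words, which is the sense in which the themes are described as discrete.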

Key Findings

LDA with Gibbs sampling on single documents outperforms the rest of the models. The most satisfying model produced distinctive and informative topics, close to the industry’s expectations, and it also performed well on classifying comments into the appropriate topics. Following an exploration of perplexity scores, it was concluded that this particular dataset can be described by 13 distinct topics. Additionally, a combination of the extracted features of the documents with the numerical variables in the dataset highlights some patterns regarding the attributes of each shopping centre, patterns which would be inefficient to extract by human inspection. For instance, analysis of certain shopping centres using the composition of survey topics and specific ratings was carried out, which indicated potential issues the centres exhibit. Moreover, clustering of these centres was also conducted using the distribution of comments per topic in order to identify broad trends across the data.

Figure 1. Grouped words using the means from the components of a Gaussian Mixture Model

Value of the Research

The novelty of this work is that it assesses various topic extraction techniques on short comments, a useful tool for survey analysis. Additionally, for the evaluation of the results, a three-level assessment is suggested in order to counter the problem of lacking labelled data. Despite that limitation, the experiments on a real-world dataset from industry were successful and are useful for achieving quick insight on large volumes of textual data.


An Empirical Study into Co-op On-the-Go Stores’ Turn-in Rate Using a Scorecard Approach

Ying Shen1, Graham Clarke1 and Peter Woodhouse2

1University of Leeds, 2The Co-operative Food

Project Background

The convenience sector has become one of the growth engines in the UK grocery market. As the major supermarkets are facing a bottleneck in superstore development, there has been a heightened focus on improving their convenience store (c-store) offerings. Many consumers shop at c-stores due to their easy accessibility and extended opening hours, and their baskets typically only contain a few items. Therefore, the location and the turn-in rate (or the rate of store visits per passer-by) are vital to a convenience outlet. However, the location evaluation methodologies applied to big stores are redundant for c-stores because micro-level data is not applicable to those methods. Moreover, there is little knowledge on turn-in rates due to the difficulties of data collection. Thus, this project researches locational and consumer behaviour variables to explore the influential factors on the urban convenience store’s turn-in rate.

Data and Methods

The footfall and visitor data of 30 convenience stores located in the UK’s major cities were provided by Co-op. Other data was made available from Google Maps. After a review of shoppers’ behaviour and patronage decisions, a scorecard approach was undertaken consisting of four influential variables. The four variables (accessibility on foot, store visibility, distance to stations and road traffic) are scored by designated matrices designed for each factor. The weights of each sub-attribute in the matrix are evaluated. Regression models were applied across the sample stores to analyse the relationship between turn-in rates and the scorecard variables. After that, the results were validated in the validation store samples.
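The scorecard and regression steps can be sketched as follows: a weighted total is computed from the four variable scores, and a simple least-squares line relates total score to turn-in rate, with R squared as the measure of fit. This Python sketch is illustrative only. The weights, the score scale and the store figures are hypothetical placeholders, not values from the study.

```python
# Hypothetical weights for the four scorecard variables (assumed, not from the study).
WEIGHTS = {
    "accessibility_on_foot": 0.4,
    "store_visibility": 0.3,
    "distance_to_stations": 0.2,
    "road_traffic": 0.1,
}

def scorecard_total(scores, weights=WEIGHTS):
    """Weighted sum of the variable scores for one store."""
    return sum(weights[name] * scores[name] for name in weights)

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, with R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Hypothetical stores: (variable scores, observed turn-in rate).
stores = [
    ({"accessibility_on_foot": 5, "store_visibility": 4,
      "distance_to_stations": 3, "road_traffic": 2}, 0.080),
    ({"accessibility_on_foot": 3, "store_visibility": 3,
      "distance_to_stations": 2, "road_traffic": 4}, 0.055),
    ({"accessibility_on_foot": 2, "store_visibility": 1,
      "distance_to_stations": 1, "road_traffic": 3}, 0.030),
    ({"accessibility_on_foot": 4, "store_visibility": 5,
      "distance_to_stations": 4, "road_traffic": 1}, 0.085),
]
totals = [scorecard_total(scores) for scores, _ in stores]
rates = [rate for _, rate in stores]
slope, intercept, r_squared = fit_line(totals, rates)
print(f"turn-in rate ~ {intercept:.4f} + {slope:.4f} * score (R^2 = {r_squared:.3f})")
```

Given a fitted line, a predicted turn-in rate for a candidate site is `intercept + slope * scorecard_total(scores)`, which is the kind of simple calculation the approach is meant to allow.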

Key Findings

The demographic attributes of the catchment are critical to the patronage of supermarkets and hypermarkets; however, demographic factors are less significant for c-stores. Previous studies have indicated that customers from all social groups visit convenience outlets at relatively similar rates. The result of the two-tailed Pearson test is consistent with this: the socio-demographic variables are not significantly correlated with turn-in rate.

Both regression models illustrate the significant correlation between the predictors and the turn-in rate captured by the devices. The result shows that the multi-regression model provides a better fit (R square = 0.866) than the linear regression model tested with the scorecards’ total scores (R square = 0.846).

Figure 1. Linear regression scatter graph between total score and turn-in rate

The research shows that for convenience outlets, the exterior atmosphere and micro location factors play a more important role in consumer patronage than they do for larger store formats. Especially for the outlets located in major cities, customers would shop for their instant needs (like a newspaper, a meal for today, or refreshment). Therefore a convenient location and eye-catching appearance are important to attract consumers to visit the store.

Value of the Research

This is the first research on the convenience store turn-in rate and provides insights on c-stores located in metropolises. The scorecard approach is a feasible way to evaluate a shop’s attributes and predict the turn-in rate with simple calculations. The simplicity of this method enables easy deployment across the business and the wider industry. In light of this advantage, this approach could also be easily applied in other works. It can be used to predict changes in patronage rates when evaluating store refits. It can also be used for new convenience store location selection. With the predicted turn-in rate and the footfall data, sales and turnover can be predicted.


An Investigation into the Potential of Bluetooth Beacons to Monitor the Movement of People on Public Transport: A Preliminary Case Study of the Norwich Bus Network.

Daniel Stockdale1, Guy Lansley1 and Ben Calnan2

1University College London, 2Movement Strategies

Project Background

There are currently a limited number of methods to derive important movement data of passengers on most public transport systems, besides expensive roadside surveys. Without this data it is not possible to produce crucial origin-destination matrices or dwell times for transport planners. Bluetooth Beacon technology offers a possible alternative to traditional methods due to technological development, increased user engagement as well as a unique, persistent and anonymous ID making it suitable for tracking movement. This paper assessed the potential that Bluetooth data has in providing passenger movement data at a higher spatial and temporal granularity and at a much lower cost than has previously been available.

Data and Methods

The data supplied for this paper was provided by a proximity advertising company that has installed Bluetooth beacons on buses in Norwich. As the primary purpose of the data is to provide hyper-contextual adverts and not to estimate the movement of patrons across a public transport network, a significant amount of pre-processing is required on the raw data. This processing was essential to identify outliers and erroneous data.

The raw dataset contained 236,827 interactions between devices and Bluetooth beacons over a 358 day study period between December 2014 and November 2015. Post data processing, 708 unique journeys were observed from 220 distinct users over 91 buses.

Key Findings

The exploratory analysis yielded temporal movement patterns that are in line with results from relevant literature. Across the week, Sunday has the lowest count of passengers, presumably due to a reduced commuter flow as well as shorter retail hours. Over a daily period, the Bluetooth data also highlighted the morning and afternoon rush hour peaks in the week, with the peak number of trips taken later at the weekend.

Assigning the origin and destination of each journey to the nearest bus stop allowed the journey flows to be mapped spatially. Figure 1 highlights a polycentric pattern with the majority of the flows occurring to and from Norwich City Centre. A large proportion of the flows occur on the East side of the city and its surrounding suburbs. It was also possible to aggregate the bus stops to specific routes to estimate the strain on the bus networks. The combination of these results allows transport planners to improve the operational robustness of the location of buses, drivers and routes to meet future demand.
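Assigning journey endpoints to the nearest bus stop and counting the resulting stop-to-stop pairs yields a simple origin-destination table. The Python sketch below illustrates the idea using great-circle (haversine) distance. The stop names, coordinates and journeys are hypothetical, and the study's own matching rules and pre-processing are not reproduced here.

```python
import math
from collections import Counter

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    radius = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def nearest_stop(point, stops):
    """Name of the stop closest to `point`, where point is (lat, lon)."""
    return min(stops, key=lambda name: haversine_m(point[0], point[1], *stops[name]))

def od_flows(journeys, stops):
    """Count journeys per (origin stop, destination stop) pair."""
    flows = Counter()
    for origin, destination in journeys:
        flows[(nearest_stop(origin, stops), nearest_stop(destination, stops))] += 1
    return flows

# Hypothetical stops and journey endpoints (not the study's data).
stops = {"Stop A": (52.630, 1.297), "Stop B": (52.640, 1.310)}
journeys = [
    ((52.6301, 1.2971), (52.6399, 1.3099)),
    ((52.6310, 1.2980), (52.6395, 1.3090)),
    ((52.6402, 1.3101), (52.6299, 1.2969)),
]
flows = od_flows(journeys, stops)
for (origin, destination), count in flows.items():
    print(origin, "->", destination, count)
```

The linear scan over stops is adequate for a few hundred journeys; a larger network would probably call for a spatial index.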

Figure 1. A network graph showing the origin and destination flows at bus stop level

Value of the Research

The low volume of data remaining after processing limited the insights that could be drawn on passenger movements specifically in Norwich. However, the results obtained from the beacon deployment highlight the potential that Bluetooth technology has to capture the movement of individuals in a network. The temporal patterns observed are promising and a sound methodology was developed to assign journeys to bus stops and line colour. This means that data analysis can be carried out across larger public transport networks around the world to provide more meaningful insight to transport planners. Rolling this methodology out on larger networks would have the added advantage of using smart card systems as a ground truth value that would allow for further validation and the penetration rate to be established.


Customer segmentation using spatio-temporal data

Ellen Talbot1, Alex Singleton1 and Dean Riddlesden2

1University of Liverpool, 2Boots

Project Background

Shopping habits are changing; when, where and how. With data from a major High Street Retailer’s loyalty card members, this research project aims to begin to understand the when. Literature already covers much of the where and how; the digital revolution has reshaped the landscape of the high street immeasurably, and the literature is well informed on how people are using the freedom the internet offers in order to shop 24 hours a day, but relatively little is known about when people visit stores, what influences them to shop and spend when they do, what for and how frequently. With a focus on big data and exploratory methods, this project looks for patterns without basing them wholly on existing theory, instead inferring possible explanations from the results themselves.

Data and Methods

The report finds and describes four distinct clusters of customers from a dataset of members of a High Street Retailer’s loyalty card scheme, all of which have temporal spending differences, which could be applied in the business to create a new customer segmentation. Using the CLARA clustering algorithm overcomes the complexities of traditional data analysis and advocates the use of a data-driven 4th paradigm, coping well with an extremely large set of ‘Big Data’. It successfully handled a dataset of over 150 million rows. It operates by considering subsets of fixed size, sampling over the entire dataset so that time and storage requirements become linear in n rather than quadratic. Extracted from each cluster were a set of values which offered insight into the temporal patterns. By creating ranking tables from the computed results and joining these with the figures created over the three main temporal grains (daily, weekly and monthly), it was possible to write short descriptive statements analysing the characteristics of each cluster.
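The sampling idea behind CLARA can be shown in a compact form: run a k-medoids search on several fixed-size random samples, score each candidate set of medoids against the full dataset, and keep the best. The Python sketch below is a simplified illustration with a basic swap-based medoid search standing in for a full PAM implementation. The function names, parameters and toy data are assumptions and do not reflect the project's actual configuration.

```python
import random

def total_cost(points, medoids, dist):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)

def k_medoids_on_sample(sample, k, dist, rng):
    """Basic swap-based k-medoids search on a small sample."""
    medoids = rng.sample(sample, k)
    best = total_cost(sample, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in sample:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(sample, trial, dist)
                if cost < best:  # keep only strictly better swaps
                    medoids, best, improved = trial, cost, True
    return medoids

def clara(points, k, dist, n_samples=5, sample_size=40, seed=0):
    """Cluster on fixed-size samples; score each result on all points."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids = k_medoids_on_sample(sample, k, dist, rng)
        cost = total_cost(points, medoids, dist)  # linear in len(points)
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost

# Hypothetical one-dimensional data with two well-separated groups.
points = [(float(x),) for x in range(0, 20)] + [(float(x),) for x in range(100, 120)]
dist = lambda a, b: abs(a[0] - b[0])
medoids, cost = clara(points, 2, dist)
print(sorted(medoids), cost)
```

The expensive pairwise search only ever touches `sample_size` points, while the full dataset is visited once per sample to score the medoids, which is where the linear scaling in n comes from.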

Key Findings

The Pen Profiles for each cluster consolidate the analysis into short and descriptive statements and were given the following ‘short and interesting’ headings: Big Budget, Big Shop, those who spend the most money the most regularly; Weekday Browsers, those who are retired, browsing and spending very little; Pocket Money Pick-ups, young people who buy cheap items fairly frequently; and Sun and Santa Shoppers, who show an increase in spending around seasonal events such as summer holidays and Christmas.

Table 1. Cluster summary table - proportions

Figure 1. Average hourly spending pattern by cluster

Value of the Research

Overall, this report finds that including temporal data in an analysis for new customer segmentations does reveal useful new patterns, which can be placed within the existing literature. For example, young people tend not to spend a lot in the High Street Retailer at Christmas because they are choosing to move their shopping online, and older people are more likely to shop during the week because they are not constrained by work commitments. This research has opened the door for the continuation of research into temporal demographics. The scope for further study encompasses ideas such as the inclusion of product categories and store types to deepen the understanding of temporal patterns, for example, whether or not each cluster shops at a different store type given their needs and available time budget.


Clustering Market Baskets with Bagging and Latent Dirichlet Allocation at Customer and Transactional Levels

Mariflor Vega1, Ioanna Manolopoulou1, Ed Manley1 and Dani Theodoulou2

1University College London, 2Sainsbury’s

Project Background

This dissertation investigates the application of Latent Dirichlet Allocation (LDA) in order to cluster market baskets at customer and transactional levels; and introduces a bagging (or bootstrap aggregation) method to improve the stability of topic modelling using data from Sainsbury’s. The primary aim was to develop a means to understand the different types of customers based purely on the content of their baskets. Analysing customer behaviours by aggregating their transactions is only possible when customers swipe the loyalty card in exchange for loyalty points for the value of their purchases. However, 57% of transactions are recorded without a loyalty card, preventing the company from having a complete understanding of their customers and their different behaviours. Thus, the complementary need of building comparisons between loyalty and non-loyalty transactions arises in order to determine whether both types of transactions exhibit the same type of behaviours.

Data and Methods

Topic Models such as LDA were developed in order to uncover the hidden topical patterns in a collection of documents. The documents are defined as bags of words where the grammar and word order are disregarded, and word frequencies are document features. Implementing LDA for retail data not only allows us to discover interpretable topics that characterise different types of market baskets, but also handles the high variety of items. In our interpretation of topic modelling, transactions take the place of documents and the items replace the words, where the order of items does not play a significant role. However, topic model inference inherently produces different realizations of the underlying topic distributions, making a global interpretation challenging. We introduce a novel methodology which utilises Bagging in order to improve stability by identifying the topics that appear frequently throughout multiple realizations of LDA.
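The mapping from transactions to documents can be made concrete with a small bag-of-items count matrix, which is the usual input format for topic models such as LDA. The Python sketch below is illustrative only; the function name and the two sample baskets are hypothetical and no LDA fitting is performed.

```python
from collections import Counter

def bag_of_items(baskets):
    """Turn baskets into a vocabulary and a basket-by-item count matrix.

    Each basket plays the role of a document and each item the role of
    a word; item order within a basket is ignored.
    """
    vocabulary = sorted({item for basket in baskets for item in basket})
    matrix = []
    for basket in baskets:
        counts = Counter(basket)
        matrix.append([counts[item] for item in vocabulary])
    return vocabulary, matrix

# Hypothetical baskets, not Sainsbury's data.
baskets = [
    ["milk", "bread", "milk"],
    ["crisps", "chocolate"],
]
vocabulary, matrix = bag_of_items(baskets)
print(vocabulary)
for row in matrix:
    print(row)
```

For the customer-level experiment, the same function could be applied after concatenating all of a customer's baskets into one list, so that each customer becomes a single document.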

We implemented LDA and Bagging algorithms on four experiments. First, we identify types of customers through aggregated loyalty transactions. Second and third, we cluster loyalty and non-loyalty transactions independently. Fourth, we cluster both types of transactions in a balanced sample. Subsequently, we analysed the topics across the four experiments in order to identify the type of topics that only characterise either loyalty or non-loyalty transactions and their connection at the customer level.

Key Findings

We found a variety of topics that describe the type of customers and transactions, from baskets that contain fruits and vegetables to baskets that contain confectionery and snacks. The majority of shopping behaviours exist in both loyalty and non-loyalty transactions. However, there is a set of behaviours that reflect almost exclusively non-loyalty behaviours (these transactions include high proportions of tobacco sales). On the other hand, we have not found super categories that exclusively characterise loyalty behaviours. The implementation of Bagging alongside LDA retrieves seeds that generate topics that appear more frequently throughout multiple versions of LDA. Therefore, this implementation retrieves similar topic distributions for different runs, as opposed to different topic distributions for different realisations. We assessed this by calculating the similarity between analogous topics with and without Bagging over loyalty and non-loyalty transactions. We observed that for both types of transactions, LDA with Bagging retrieved 14% and 35% closer topics, concluding that Bagging improves the stability of LDA.
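One way to picture how recurring topics might be identified across runs is to match each topic from a reference run against the topics of other runs and record how often a close match exists. The Python sketch below uses cosine similarity with an assumed threshold of 0.8. The matching rule, the threshold and the toy topic vectors are illustrative assumptions and should not be read as the dissertation's Bagging procedure.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def recurrence(reference_topics, other_runs, threshold=0.8):
    """For each reference topic, the fraction of other runs with a similar topic.

    `threshold` is an assumed cut-off for calling two topics analogous.
    """
    fractions = []
    for topic in reference_topics:
        hits = sum(
            1 for run in other_runs
            if max(cosine(topic, other) for other in run) >= threshold
        )
        fractions.append(hits / len(other_runs))
    return fractions

# Hypothetical topic-item distributions over a three-item vocabulary.
reference = [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9]]
runs = [
    [[0.85, 0.15, 0.0], [0.0, 0.2, 0.8]],
    [[0.8, 0.2, 0.0], [0.3, 0.4, 0.3]],
]
fractions = recurrence(reference, runs)
print(fractions)
```

Topics with a recurrence close to 1 reappear in nearly every run and are the stable ones; a topic that recurs in few runs is more likely an artefact of a particular realization.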

Value of the Research

This research developed a practical application of topic modelling in order to cluster market baskets that describe customer behaviours and type of transactions. Some commercial applications of this research might be the development of directed marketing campaigns and a recommender system. The identification of topics that are almost exclusive of non-loyalty transactions could help the retailer tailor their stock to meet the needs of non-card holders. Furthermore, the research contributes to science by introducing a new method that improves consistency and stability on Topic Modelling results.