A Ride on the Consumer’s Rollercoaster of Choices:
Predicting Healthy Shopping Behavior
Linda Grondsma
14th of June 2016
Master Thesis
Date 14th of June
Name L.M. Grondsma
Student number 2228947
Educational program Master in Marketing Intelligence and Marketing Management
Department Department of Marketing
Educational institution Rijksuniversiteit Groningen
Address Peizerweg 18A,
9726 JJ Groningen
Phone number 06 37440748
Email address [email protected]
1st Supervisor Prof. Dr. ir. K. van Ittersum
2nd Supervisor Prof. Dr. T.H.A. Bijmolt
Company Plus Retail
Company supervisor A. Westerveld MSc, Consumer Insights Manager at Plus Retail
EXECUTIVE SUMMARY
As a result of the problem of obesity, a trend towards the promotion of healthier choices has evolved
over the past years. This problem is largely driven by overconsumption of unhealthy foods. A good
starting point to solve the problem is where people purchase their food products: grocery stores. Many
studies have been performed there at the product level, but due to recent technological developments it
is now possible to take the entire shopping trip into account. Preliminary studies have shown that there
is reason to believe that the healthiness of choices evolves across the shopping trip, a phenomenon called
‘Healthy Shopping Dynamics (HSD)’. The purpose of this research is to discover how such dynamics
evolve, what influences healthy shopping baskets and whether such dynamics can be used to forecast
the healthiness of future choices. This provides considerable insight into customers’ healthy shopping behavior.
To perform this research, basket-level scanner data was made available by Plus, one of the
largest grocery retailers in the Netherlands. This data was used to uncover drivers of healthy shopping
baskets and to find out what determines the healthiness of the next purchase.
The results of this study suggest that several drivers of the healthiness of shopping baskets and
of the healthiness of the next purchase can be distinguished. In particular the drivers of HSD seem to
have an impact on the healthiness levels. No effects were found for the three other drivers: general
promotions, health labels and economic health interventions. Most importantly, it seems that healthy
shopping dynamics initially evolve towards healthier behavior after the first few choices.
However, after a number of choices there is a tipping point and behavior becomes oriented towards
unhealthier choices, perhaps due to licensing effects where customers allow themselves to make
unhealthier choices if they already made healthy ones before. Moreover, the insignificance of general
promotions and health labels contradicts a great deal of existing literature.
All these findings have some specific implications for Plus, but also broader implications for
the entire grocery retailing sector. This study gives new insights into how healthy shopping dynamics
evolve over the course of a shopping trip and shows that analysing this type of data has a lot of potential for
future research.
Keywords: obesity, healthy shopping dynamics, healthy shopping baskets, in-store decision making,
scanner data
PREFACE
“What is success? I think it is a mixture of having a flair for the thing that you are doing; knowing that
is not enough, that you have got to have hard work and a certain sense of purpose”
Margaret Thatcher
When I started studying at the University of Groningen in 2011, I was not sure what degree would fit
my interests. I chose to follow the bachelor Business Economics, which is where I first came into contact
with marketing. Soon my interest in marketing began to grow and in February 2015 I started the
master’s tracks Marketing Management and Marketing Intelligence. Together, these tracks have taught me
several facets of the marketing field. I developed a passion for data analytics and am truly grateful that
I received the opportunity to put this passion into practice with this thesis. For this, I owe much to my
second supervisor prof. dr. T.H.A. Bijmolt, who suggested this project to me in the first place. I thank
both him and my first supervisor prof. dr. ir. K. van Ittersum for their thorough guidance, feedback and
their genuine interest in the research I performed. Moreover, I would like to thank Plus for the
opportunity to use their data and Astrid Westerveld MSc and Marco Maatman MSc for their
support, feedback and interesting insights. In addition, I would like to thank my research
partner Rutmer Faber. Even though we wrote two different theses, I appreciate the time we spent
together to help each other when needed.
This is also a good opportunity for me to thank my parents, Klaas Jan and Sita, and my sister
Daniëlle for their ongoing support during my study time in Groningen. Thank you for always believing
in my capabilities and giving me the chance to develop myself into the person I have become. My
friends have been of great value too and I would like to thank my best friend Hetty in particular. Not
only for her help with the development of this thesis, but also for making my study time here in
Groningen unforgettable. Finally, I am grateful for all the support I have received from my boyfriend
Ricardo. Whenever I was stressed, tired or just very enthusiastic, you were always by my side.
To you who are holding this thesis now: I hope you enjoy reading it!
Linda Grondsma
Groningen, June 2016
TABLE OF CONTENTS
EXECUTIVE SUMMARY
PREFACE
TABLE OF CONTENTS
1. INTRODUCTION
1.1 RESEARCH QUESTIONS
1.2 RELEVANCE
1.3 OUTLINE
2. THEORETICAL FRAMEWORK
2.1 CONCEPTUAL FRAMEWORK
2.2 HEALTHY SHOPPING DYNAMICS
2.2.1 HEALTH INDEX OF THE FIRST PURCHASE DECISION
2.2.2 HEALTH INDEX OF THE PREVIOUS PURCHASE DECISION
2.2.3 TREND
2.2.4 PEAKS
2.2.5 VOLATILITY
2.2.6 AVERAGE HEALTHINESS OF THE BASKET SO FAR
2.3 DRIVERS OF HSD
2.3.1 GENERAL PROMOTIONS
2.3.2 HEALTH LABELS
2.3.3 ECONOMIC HEALTH INTERVENTIONS
2.4 SELF-REGULATION THEORY
2.4.1 SELF-REGULATION AND HEALTHY SHOPPING DYNAMICS
2.4.2 SELF-REGULATION AND DRIVERS OF HSD
2.5 OVERVIEW HYPOTHESES
3. METHODOLOGY
3.1 DATA COLLECTION
3.2 SAMPLE AND CRITERIA SCREENING
3.3 OPERATIONALIZATION OF VARIABLES
3.3.1 HEALTH INDICES
3.3.2 TREND AND VOLATILITY
3.3.3 DRIVERS OF HSD
3.4 RESEARCH METHOD
4. RESULTS
4.1 MODEL 1
4.1.1 EXPLORATORY ANALYSIS
4.1.2 MODEL ASSUMPTIONS
4.1.3 INTERPRETATION
4.2 MODEL 2
4.2.1 EXPLORATORY ANALYSIS
4.2.2 MODEL ASSUMPTIONS
4.2.3 INTERPRETATION
4.3.4 PREDICTIVE VALIDITY
5. CONCLUSION
5.1 DISCUSSION
5.1.1 HEALTHY SHOPPING DECISIONS
5.1.2 DISCOVERING PATTERNS TO FORECAST DECISIONS
5.2 LIMITATIONS AND FURTHER RESEARCH
5.3 MANAGERIAL IMPLICATIONS
5.4 FINAL CONCLUSION
REFERENCES
APPENDICES
APPENDIX 1: DIFFERENCES BETWEEN STORES
APPENDIX 2: PROMOTION WEEK
APPENDIX 3: MODEL ASSUMPTIONS MODEL 1
3.1 MULTICOLLINEARITY
3.2 NORMALITY
3.3 HETEROSCEDASTICITY
APPENDIX 4: MODEL ASSUMPTIONS MODEL 2
4.1 MULTICOLLINEARITY
4.2 NORMALITY
4.3 HETEROSCEDASTICITY
1. INTRODUCTION
Over the past years, several trends in consumer food markets have evolved. The trend
towards ‘healthy choices’ in particular has been quite substantial, as a result of a worldwide problem:
obesity. Since 1980, worldwide obesity has doubled, leading to several consequences, such as
cardiovascular diseases, diabetes, musculoskeletal disorders and multiple types of cancer (World
Health Organization, 2015). Especially in the Netherlands, the rates for obesity are shocking: 40% of
the population aged 4 years and older is overweight and another 10% is considered to be
obese (CBS, 2014). According to Ng et al. (2014), overweight and obesity caused 3.4 million deaths,
3.9% of years of life lost, and 3.8% of disability-adjusted life-years in 2010 alone. On a large scale, this
problem is driven by overconsumption of unhealthy, energy-dense and nutrient-poor foods that have
high concentrations of fat, sugar, and salt (Asfaw, 2011). This identifies the need for healthier lifestyles,
which can be stimulated by grocery retailers themselves (Payne et al., 2014). In fact, many grocery
retailers aim their promotional activities towards stimulating customers to make more conscious,
healthy choices. In order to do this more effectively, it would be insightful to know how people shop
at a grocery store: what role the healthiness of customers’ shopping behavior plays and in what
sequence they buy products. Unfortunately, there is still little understanding of how this
influences the healthiness of the shopping basket.
Until now, most research in this area was limited to single-product purchases, which does not
give any indication on how customers behave throughout the supermarket (Waterlander et al., 2012;
2013). Preliminary results of a recent pilot study indicate that it is most likely that some patterns in
the way customers make healthy choices throughout the shopping trip exist (Van Ittersum and Bijmolt,
2015). Such a pattern could be called ‘Healthy Shopping Dynamics’ (HSD), as it represents the dynamics
of health-levels of sequential choices that customers make during a grocery shopping trip. These
dynamics then identify the influence of previous purchases on the next purchase or average
healthiness of the complete basket, which could be influenced by several other drivers as well. To
uncover what these drivers are and how they affect customers’ choices during the shopping trip, a
study needs to be performed to find out how customers can be triggered to make healthier purchase
decisions. A lot of research has been devoted to the role of emotions (Mukhopadhyay and Johar, 2007;
Williams and DeSteno, 2008; Chen and Sengupta, 2014) and self-regulation (Baumeister and
Heatherton, 1996) which can be linked to healthy shopping decisions. These are, however, not
practical theories to change behavior, but merely describe how people make decisions. In this study,
several different drivers will be identified to build a prediction model to see why customers decide to
make a certain (un)healthy decision.
To observe consumer decisions in the most natural way, data from a Dutch grocery retailer will
be used. Grocery retailers are one of the most interesting sources for observing consumer behavior
for several reasons. First, even though the growth of online stores is a major trend in consumer food
markets, still around 50% of all groceries are purchased in brick-and-mortar grocery stores (Glanz et
al., 2012). Second, several other trends are ongoing; for example, the recession of the past years has
led to a cost-saving orientation among the shopping public. As a result, shoppers’ stated priorities when
choosing their groceries are aimed towards quality, taste or price. Finally, the interest in making
healthy choices has grown among customers. However, their actual purchase behavior does not always
seem to follow this state of mind.
In this study, a first step will be made to uncover such drivers of (un)healthy shopping behavior
based on real purchase data. Plus, a large Dutch grocery retailer, has made available for this research
data that captures the sequence in which customers do their groceries. Plus is a grocery chain with
255 supermarkets held by 218 entrepreneurs (Plus, 2016a). The fact that all supermarkets are
franchises makes this retailer a special case compared to other grocery retailers. The entrepreneurs
have more freedom when it comes to including for instance local products in the supermarket, while
still following the national marketing campaigns. These campaigns incorporate the four brand values
that the company stands for: Attention, Quality, Local, and Responsible (Plus, 2015). Recently, Plus
received the award for being the best supermarket in promoting Corporate Social Responsibility,
making it even more interesting to take Plus as a source to study healthy shopping behavior (GfK,
2016).
1.1 RESEARCH QUESTIONS
This paragraph lists and explains several research questions that guide this study. Building on the
importance of finding drivers of healthy shopping behavior mentioned before, the
main goal of this research is to uncover what affects customers’ healthy decision-making. The HSD that
were mentioned before could be an example of such drivers. Real purchase data should uncover what
these dynamics look like in practice, as up to now the only preliminary results that reflect this are
based on experimental data (Van Ittersum and Bijmolt, 2015). When there is an idea of how such
dynamics evolve, the effect of HSD across all shopping decisions on the average healthiness of the
basket will be tested. Besides, the roles of emotion and self-regulation will be included in this research, as they might
explain why certain patterns exist.
Besides investigating the HSD over the whole shopping trip, the existence of other drivers of
healthy shopping behavior will be tested. So far, much research has been aimed at different health
interventions that can be performed by the grocery retailer, even though little is known about which
methods work best (Giesen et al., 2011; Wansink and Chandon, 2006; Waterlander et al., 2013). In
collaboration with Plus, two health interventions that may drive healthy shopping behavior are included
in this study: Economic Health Interventions and Health Labels. The first refers to specific marketing
campaigns of the supermarket (the promotion period in which relatively more fruits and vegetables are on
promotion). The second refers to health labels, which have been widely available on products in the
Netherlands since the introduction of ‘Het Vinkje’ in 2006 (Het Vinkje, 2016). This label is carried in the
food and beverage industry, retail and foodservice and was created to help consumers make healthier
decisions. Besides these two, another driver of shopping behavior is included: ‘General Promotions’.
Supermarkets have different price options and many customers do their groceries in a very
price-conscious manner (Glanz et al., 2012). Therefore, it is expected that such promotions also drive the
final decision and that price can be chosen over ‘healthiness’. This all results in the following research
question and subquestions:
1. How are healthy shopping decisions influenced during a shopping trip?
1.1 What do HSD look like in real purchases?
1.2 Do HSD affect the average healthiness of decisions throughout the shopping trip?
1.3 Do other drivers, such as health interventions and general promotions affect the average
healthiness of decisions throughout the shopping trip?
Addressing these issues will help uncover what influences healthy choices. However, this still poses
a problem, because knowing what drives healthy choices does not necessarily mean that customers’
behaviors are always intentionally influenced. Naturally, we assume that people are rational beings
and make conscious choices. That would imply that people’s behavior is then predictable. However, a
lot of unplanned buying occurs in the supermarket, which implies that behavior is perhaps not always
rational (Gilbride et al., 2015). Therefore, a prediction model will be formed afterwards, to investigate
whether people make rational choices and if it is even useful to try and influence people when they
are tempted to buy products impulsively. This results in the second research question:
2. Can a pattern be distinguished in the scanner data that can forecast the
healthiness of customers’ purchase decisions?
2.1 Can the drivers of healthy shopping decisions of Model 1 be used to predict customers’
behavior?
1.2 RELEVANCE
The objective of this study is to discover whether there is such a phenomenon as HSD, how the
dynamics of choices during the shopping trip affect the average healthiness level of the basket and if
such dynamics can be used to forecast the healthiness of future choices. This will give an indication of
how grocery retailers can direct customers towards exchanging unhealthier options for healthier
alternatives. Subsequently, the effects of two types of health interventions and of general promotions
will be incorporated into the model to investigate whether these have an impact on the HSD and eventually
influence the healthiness of the current decision, which gives the possibility to predict the healthiness
level of the next purchase.
Up to now, much research has been aimed at discovering what influences the healthiness
of the product decisions customers make during the shopping trip. This research has shown that, for
instance, the size and shape of a product’s package (Ordabayeva and Chandon, 2013; Wansink,
1996), the location of healthy and unhealthy food products inside the supermarket (Desai
and Ratneshwar, 2003) and the prices of these food products (An, 2012; Andreyeva, Long and Brownell,
2010) influence the healthiness of single-product purchases. However, little to no research has been
performed on subsequent purchases during the shopping trip. So far, only the study of Waterlander et
al. (2013) takes the total shopping trip into account. Outcomes of this study are striking: the positive
effect of the single product-purchases may be eradicated when taken as part of a larger shopping trip.
Additionally, food labels do not have a large effect on food purchases, whereas price discounts do
encourage the purchase of healthy products. However, these price cuts do not discourage the
purchase of unhealthy foods and lead to a larger end-of-trip basket. Therefore, several articles indicate
that more research is needed to unravel how pricing strategies can best be designed to result in overall
improved food purchases and what role food labels could play in reaching this goal. Besides, a pilot study
by Van Ittersum and Bijmolt (2015) has shown that there is reason to believe HSD exist.
Taking the previously mentioned studies together, there is enough reason to investigate this
matter. A question remains why this has not been investigated in the past. Due to the lack of availability
of grocery shopper data that takes the sequence of shopping into account, specific research to find
out how customers make healthy shopping decisions was simply not possible until this point. Recently,
supermarkets introduced the option for customers to skip the line by scanning their groceries
while shopping, using a handheld scanner. This scanner saves the sequence in which the groceries are
placed into the basket and thus provides the academic world with valuable insight into how customers shop.
This research is not only relevant from an academic perspective, but
aims to provide added value to Plus as well. As one of the brand values of the grocery retailer is focused
on responsibility and health, it is of great importance to understand how HSD evolve during the
shopping trips and how active health interventions of the supermarket can stimulate the purchases of
healthier food products. With the current consumer trends to eat healthier and the growing problem
of obesity, more supermarkets in the Netherlands are developing campaigns that focus on healthier,
more responsible purchases (Plus, 2016a). Plus can use the results of this study to stay competitive and
continuously keep its customers satisfied.
1.3 OUTLINE
The remainder of this study has the following structure. The next chapter shows the conceptual
models, followed by a deeper look at theories from previous literature that can be linked to the
relationships of the model, resulting in numerous hypotheses. Then, chapter 3 describes the data and
methodology of this study, resulting in a model that tests the stated hypotheses. Chapter 4 discusses
the results of the model and the final chapter contains a discussion of these results and a number of
managerial implications. Finally, limitations and guidelines for future research are discussed.
2. THEORETICAL FRAMEWORK
This chapter reviews existing literature to propose the underlying hypotheses to answer the research
questions. First, the conceptual model is represented in a visual way. Afterwards, numerous
paragraphs elaborate on the variables that are pointed out in the model and several hypotheses are
presented. Moreover, a number of psychological theories are linked to the relationships in the model
to give possible explanations for them. Finally, at the end of the chapter an
overview of all hypotheses is provided.
2.1 CONCEPTUAL FRAMEWORK
As briefly explained before, the objective of this study is to uncover the influence of health
interventions, promotions and HSD on the average health index of the basket, which is displayed in
Fig. 1 by the blue arrows. Subsequently, the aim is to uncover whether such choices are all rational and
if a prediction model can be estimated, which is visually displayed with the red variable and red arrows.
Fig. 1 Conceptual Framework
2.2 HEALTHY SHOPPING DYNAMICS
In the first chapter HSD were already mentioned briefly, but no clear definition of this phenomenon
has been stated yet. This paragraph will first discuss various studies that already looked into
shopping dynamics in general, followed by a short elaboration on previous work in the field of healthy
food purchases, resulting in a definition of the term.
Up to now, the existence of shopping dynamics in general has been demonstrated in several studies,
most notably by Dhar et al. (2007). They identified that customers go
through a process which is called the shopping momentum. This refers to the psychological impulse
that is provided when an initial purchase is made and that enhances the purchase of a second,
unrelated product. This theory was linked to previous work by Gollwitzer et al. (1990), which explains
the occurrence of the shopping momentum as a result of the psychological process caused by the initial
purchase. This makes the consumer move from a deliberative to an implemental mind-set, driving
subsequent purchases. Dhar et al. (2007) also describe manners in which this shopping momentum
can be interrupted.
Besides the shopping momentum, there is also an extensive body of literature on the
phenomenon of impulsive and unplanned purchase behavior. When customers walk through a
supermarket, they are confronted with many items that they could potentially purchase, possibly
leading to unplanned buying (Gilbride et al., 2015). In this state, two types of dynamics can be
distinguished: carryover effects of earlier purchases on subsequent unplanned versus planned
purchases, and a change in the probability of making an unplanned versus a planned purchase over
the course of the shopping trip (Gilbride et al., 2015). One of the reasons that such impulsive purchases
take place is affect, or more precisely the mood of the consumer at the moment of making the purchase
decision (Vohs and Faber, 2007). Several other theories, both social and psychological, could underlie
these dynamics (Cannuscio et al., 2014), which will be further discussed in paragraph 2.4.
There is thus existing literature on shopping dynamics, but surprisingly little research has been
performed in the area of shopping dynamics when purchasing healthy food products. Until now, the only
research directed at HSD was a pilot study by Van Ittersum and Bijmolt (2015), in which 54 MTurk
participants were asked to make eleven purchase decisions. For every product, four options
and a no-purchase option were available, and a picture of the product, its price and calorie information
were provided. The results of this study reveal a pattern in the way participants shop, which is visually
represented in Fig. 2. The pattern does not seem to evolve linearly, but rather in a ‘rollercoaster’ type of
manner.
Fig. 2 Healthy Shopping Dynamics - results from pilot study
Source: Van Ittersum and Bijmolt (2015)
Thus, according to this pilot study there is reason to believe that HSD exist. Along this line, the following
definition of HSD will be leading throughout this study: ‘HSD are shifts in the healthiness indices of all
combined purchase decisions throughout the shopping trip’. The following subparagraphs describe six
ways in which HSD can possibly be measured. It needs to be noted that there is a difference between
the meaning of the health index that is used in this study and healthiness. An increase in the health
index denotes a decrease in the healthiness of the basket/next purchase, since the health index is
based on the number of calories. When this number increases, the healthiness thus decreases. All
hypotheses will be stated in terms of the health index of the basket or of the next purchase.
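As a rough illustration of how the six HSD measures could be derived from a single shopping trip, the sketch below takes a list of calorie-based health indices in purchase order. The function name, the example values and the concrete formulas (trend as a least-squares slope over purchase position, peak as the maximum index, volatility as the population standard deviation) are assumptions made purely for illustration; the operationalization actually used in this study is described in chapter 3.

```python
from statistics import mean, pstdev


def hsd_measures(health_indices):
    """Summarise one shopping trip (hypothetical helper, not the thesis' own code).

    health_indices: calorie-based health index per purchase, in purchase order.
    A higher index means more calories, hence a less healthy choice.
    """
    n = len(health_indices)
    if n < 2:
        raise ValueError("at least two purchases are needed")

    # Trend (assumption): ordinary least-squares slope of index on position 1..n.
    positions = list(range(1, n + 1))
    x_bar = mean(positions)
    y_bar = mean(health_indices)
    numerator = sum((x - x_bar) * (y - y_bar)
                    for x, y in zip(positions, health_indices))
    denominator = sum((x - x_bar) ** 2 for x in positions)

    return {
        "first": health_indices[0],               # first purchase decision
        "previous": health_indices[-1],           # most recent purchase decision
        "trend": numerator / denominator,         # direction of change over the trip
        "peak": max(health_indices),              # assumption: highest index so far
        "volatility": pstdev(health_indices),     # assumption: population std. dev.
        "average_so_far": mean(health_indices),   # average index of the basket so far
    }


if __name__ == "__main__":
    # Made-up calorie values for four consecutive purchases.
    for name, value in hsd_measures([120, 80, 250, 90]).items():
        print(name, value)
```

Because the index is based on calories, a positive trend in this sketch would correspond to a basket that becomes less healthy as the trip progresses.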
2.2.1 HEALTH INDEX OF THE FIRST PURCHASE DECISION
The first experience in a sequence of experiences tends to have a stronger influence on the judgment
of individuals than the following experiences, due to primacy effects (Montgomery and Unnava, 2009).
An example of such effects is that when people memorize a list of words, they put greater attention
on the first words compared to the following ones, resulting in better memorization of them (Greene,
1986). In the context of healthy choice behavior, this suggests that the healthiness of the first purchase
would have a great impact on the customer’s subsequent shopping behavior. This
results in the following hypotheses:
H1A The health index of the first purchase decision is positively related to the average health
index of the basket
H2A The health index of the first purchase decision is positively related to the health index of the
next purchase
2.2.2 HEALTH INDEX OF THE PREVIOUS PURCHASE DECISION
Like the first decision, the last decision that was made also tends to be weighted more heavily by
customers, due to so-called recency effects (Greene, 1986). Sticking to the example of memorizing
words, this recency effect entails that people tend to recall items that they studied at the end more
often than those in the middle, just as they do with the first few items (Greene, 1986). Strong evidence
for recency effects was found in an experiment by Kahneman et al. (1993). They show that people
would rather feel pain for a longer amount of time, provided that this experience ends with a
pleasant feeling, than feel pain for a shorter amount of time without this pleasant
ending. Translated to the context of healthy shopping behavior, this implies that customers recall
the healthiness of their most recent purchase decision better than that of the decisions they made before
that, making the last purchase decision an interesting factor to take into consideration. Combining this
with the self-regulation theories on guilt that are described in paragraph 2.4, it is expected that
customers will compensate for the relative unhealthiness of their previous purchase with a healthier next
choice. This results in the second hypothesis for Model 2:
H2B The health index of the previous purchase decision is negatively related to the health index
of the next purchase
Moreover, recency effects are expected to dominate primacy effects in their influence on the
healthiness of the current purchase decision, because recall diminishes as the time since the first
decision increases (Greene, 1986). Therefore, the third hypothesis for Model 2 is stated:
H2C The effect of the health index of the previous purchase decision on the health index of the
next purchase decision is larger than the effect of the health index of the first purchase
2.2.3 TREND
A trend of subsequent experiences can either be increasing or decreasing. Consumers usually prefer
improvement over time to decline, which is called their negative time
preference (Loewenstein and Prelec, 1993). In the case of healthy purchase behavior, it can therefore be
expected that customers with an improving trend of healthy choices are more likely to choose a
relatively healthy product again than customers with a more negative trend. Even though the pilot
study by Van Ittersum and Bijmolt (2015) suggests that the pattern in HSD is not linear, it is still valuable
to discover whether healthy shopping behavior improves or declines throughout the shopping trip,
which results in the next hypotheses:
H1B An improving trend of healthy choices has a positive influence on the average health index
of the basket
H2D An improving trend of healthy choices has a positive influence on the health index of the
next purchase
2.2.4 PEAKS
During the shopping trip, peak moments in the level of healthiness will occur. Such peaks have an
impact on later choice behavior, because the most intensive moments are remembered the best
(Montgomery and Unnava, 2009). The same holds for the reversed situation: an extreme ‘low point’ is
also remembered more. It does not matter when the healthy/unhealthy peak takes place during the
shopping trip. When a very healthy choice is made, this might strengthen the motivation to make
healthy decisions again through feelings of pride, or to do the opposite and find the justification to
choose unhealthier products, which is called licensing (Khan and Dhar, 2006; Mukhopadhyay and
Johar, 2007; Williams and DeSteno, 2008). These concepts are further elaborated upon in paragraph
2.4. Therefore, it is expected that the healthy peak has an influence in both models, but the sign is
unknown. For the unhealthy peaks, it is expected that through feelings of guilt customers will tend to
make a healthier decision afterwards (Chen and Sengupta, 2014). This concept of guilt is also
discussed in paragraph 2.4. This results in the following hypotheses:
H1C Healthy peaks during the shopping trip have an influence on the average health index of
the basket
H1D Unhealthy peaks during the shopping trip have a negative influence on the average health
index of the basket
H2E Healthy peaks during the shopping trip have an influence on the health index of the next
purchase
H2F Unhealthy peaks during the shopping trip have a negative influence on the health index of
the next purchase
2.2.5 VOLATILITY
Volatility is a term that is often used to describe stock prices in financial markets. Stock prices tend to
vary a lot over time, and patterns with many peaks are considered volatile. In this area of
stock prices, many theories describe such volatile behaviors, one of which is the theory of
‘random walks’. This theory implies that a series of stock price changes has no memory, meaning that
the past history of the series cannot be used to predict the future in any meaningful way (Fama, 1965).
Of course, this is a very extreme theory and many changes in stock prices can be explained by current
events. In the context of shopping behavior, the volatility in shopping
decisions likewise cannot be explained conclusively. Research shows that more peaks in
emotional moment-to-moment evaluations (i.e. higher volatility) can result in both a feeling of
excitement that leads to a positive evaluation (Teixeira et al., 2012) and a feeling of uncertainty
that results in a negative evaluation (Anderson, 2003). Thus, existing literature does not seem to give
a clear direction for the effect of volatility. Applied to healthy shopping decisions, it seems logical that when
customers choose many products with very different levels of healthiness, it is harder to predict their
next move than for customers whose behavior is quite stable. Although it is expected that there
is some influence, the next hypotheses cannot give a conclusive direction:
H1E The volatility of the health indices of previous purchases influences the average health index
of the basket
H2G The volatility of the health indices of previous purchases influences the health index of the
next purchase
2.2.6 AVERAGE HEALTHINESS OF THE BASKET SO FAR
For Model 2 the dependent variable of Model 1 is included in the model as an additional driver of the
next purchase decision. When shopping, customers have orientations that differ from one another.
Some of them might go grocery shopping and buy certain items for hedonic reasons, whereas other
customers might feel the urge to buy healthy items (Arnold and Reynolds, 2003). The overall
healthiness level of the previous purchase decisions that were made could give an indication for
people’s intention to buy healthier products. Therefore, when customers have already made relatively
healthy choices overall, they are expected to have a higher probability of making a healthier decision again,
and vice versa. Such a relationship closely follows the ideas of self-regulation theory, which describes how
people set goals and how they need to control themselves in order to achieve such goals. If customers
shop for, on average, healthier groceries over the trip, this may indicate that they will choose healthier
products again. More elaboration on this and more accompanying theories will follow in paragraph
2.4. Taking everything in consideration, this results in the following hypothesis:
H2H The average health index of previous purchases has a positive influence on the health index
of the next purchase
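To make the six indicators of paragraph 2.2 more concrete, the minimal Python sketch below computes them for a single shopping trip. It is an illustration under stated assumptions, not the operationalization used in this study: the function name `hsd_indicators`, the least-squares slope as trend measure, the minimum and maximum as healthy and unhealthy peaks, and the population standard deviation as volatility are all hypothetical choices, and the example values are made up. Peaks are split into a healthy and an unhealthy one, so seven values are returned.

```python
from statistics import mean, pstdev


def hsd_indicators(indices):
    """Summarise the health indices of one trip (in scan order).

    Assumption: a higher health index means a less healthy purchase,
    so a negative trend slope corresponds to improving healthiness.
    """
    n = len(indices)
    if n < 2:
        raise ValueError("need at least two purchases")
    positions = list(range(1, n + 1))
    pos_mean = mean(positions)
    idx_mean = mean(indices)
    # Least-squares slope of health index on purchase position
    # (a hypothetical trend measure, not taken from the thesis).
    slope = sum(
        (p - pos_mean) * (h - idx_mean) for p, h in zip(positions, indices)
    ) / sum((p - pos_mean) ** 2 for p in positions)
    return {
        "first": indices[0],              # 2.2.1 first purchase decision
        "previous": indices[-1],          # 2.2.2 most recent purchase decision
        "trend": slope,                   # 2.2.3 trend
        "healthy_peak": min(indices),     # 2.2.4 lowest index = healthiest choice
        "unhealthy_peak": max(indices),   # 2.2.4 highest index = least healthy choice
        "volatility": pstdev(indices),    # 2.2.5 spread of the indices
        "average_so_far": idx_mean,       # 2.2.6 average of the basket so far
    }


trip = [0.8, 1.4, 0.9, 1.1]  # made-up health indices in scan order
print(hsd_indicators(trip))
```

In a setting like Model 2, such values would be recomputed after every scan, so that each next purchase is explained only by the purchases that preceded it.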
2.3 DRIVERS OF HSD
Now that HSD have been defined and several indicators have been established, a deeper
look is taken into what drives customers’ healthy choice behavior. First, these drivers could be general
promotions. Second, they could be health interventions. The importance of health interventions by
retailers or suppliers has been investigated in several studies (Giesen et al., 2011; Wansink and
Chandon, 2006; Waterlander et al., 2013). They distinguish between different methods by which
customers could be guided towards making healthier purchase decisions. In these articles two main
groups of health interventions can be identified: health labels and economic interventions. The first
subparagraph describes the effect of general promotions, which are also expected to influence buying
behavior. The following two subparagraphs elaborate on the aforementioned interventions.
2.3.1 GENERAL PROMOTIONS
Every week, supermarkets have different items on promotion. Such promotions are intended to make
customers make more unplanned purchases, and they succeed in doing so (Inman et al., 1990). In fact, low-
cognition customers even purchase goods that are on promotion merely on seeing the promotion
signal, without checking whether there is a real price discount (Inman et al., 1990). Moreover,
research has shown that promotions can accelerate purchases in two ways: first, by accelerating
customers’ purchases of the product and, second, by accelerating the shopping trip to the store
(Kahn and Schmittlein, 1992). General promotions in grocery stores therefore have a large impact.
However, they also result in more unplanned behavior, which has a negative impact as customers might
lose track of their shopping goal. More theories that can be linked to this are provided in paragraph
2.4. The more products on promotion are added to the basket, the higher the health index of the
basket is expected to be, thereby decreasing the healthiness. This results in the following two
hypotheses:
H1F General promotions have a positive influence on the average health index of the basket
H2I General promotions have a positive influence on the health index of the next purchase
2.3.2 HEALTH LABELS
With the growing problem of obesity, many regulatory agencies wonder whether low-fat nutrition
labels influence people’s food consumption (Wansink and Chandon, 2006). Up to now, researchers
have looked into the subject of health labels, with different results. In their work, Wansink and
Chandon (2006) found that the use of a ‘low-fat’ label has a significantly different impact on overweight
consumers compared to people with a normal weight. Their results show that low-fat labels lead to
overconsumption of snack foods by all consumers, but that these effects are stronger for people who
already are overweight. Besides, the presence of salient serving-size information such as ‘Contains two
Servings’ reduces overeating for people with a normal weight, but has no impact on overweight
consumers. This research thus indicates that food labels do have an impact on consumers.
However, people who are overweight do not pay enough attention to such health labels. To
encourage this attention, manufacturers and retailers could consider making labels more
explicit by altering the packages or promoting these characteristics more heavily. However,
other work by Waterlander et al. (2013) on health labels and pricing strategies to influence healthy
shopping behavior gives different results. The outcomes of this study show that price effects
overshadow the effects of food labels: the food labels by themselves do not have any significant effect on
the purchase of healthy foods.
In this study, the effectiveness of ‘Het Vinkje’, which was briefly mentioned before, will be
investigated. There are two types of Vinkjes as shown in Fig. 3: one with a green and another with a
blue circle (Het Vinkje, 2016).
Fig. 3 Het Vinkje
The green logo indicates that the food product belongs to the healthier products of the food pyramid
and contains important nutrients that you need on a daily basis. The blue logo indicates that the
product does not belong to the food pyramid and that you should not eat this too often, but that it is
a better choice within the product category. The effectiveness of the Vinkje is shown in internal
research by Plus (2016b), which indicates that 83% of the customers are aware of the Vinkjes and 65%
experience them as a positive addition. However, only 19% of the customers actively pay
attention to the Vinkjes while doing groceries.
Recently, the largest Dutch consumers’ association, ‘De Consumentenbond’, started a campaign
against this health label. According to the association, Dutch consumers are not well aware of what the two
labels mean, and the label does not result in healthier choices (Consumentenbond, 2016).
This is based on qualitative research among 1057 panel members. An interesting fact about these
studies of the Vinkjes is that they are based on questionnaires that were filled out by a panel.
There is, however, no known research that investigates the Vinkjes in a quantitative way. Therefore,
this study might add different insights into the effects of this health label.
Based on the previous research about the Vinkje and health labels in general, there seems to
be some inconclusiveness on the effect of health labels. Therefore, it is assumed that the health labels
themselves on the product do have an impact, but no direction of this relationship can be established.
This reflects the effect of the health label itself, not of the healthiness of the products that carry it. The label
does not necessarily concern a very healthy product: as the blue label indicates, it can also
be placed on a product that is a better choice within a relatively unhealthy product category. This
results in the following hypotheses:
H1G Health labels have an influence on the average health index of the basket
H2J Health labels have an influence on the health index of the next purchase
2.3.3 ECONOMIC HEALTH INTERVENTIONS
Purchase behavior can be influenced by economic shocks such as a recession, falling income or dramatic
increases in food prices (Andreyeva et al., 2010). Times like these create pressure to
purchase food that is lowest in cost, making processed, unhealthier foods more attractive. In theory,
there would be two ways to deal with situations like these and stimulate healthier purchase-behavior:
either lowering prices of healthy food products (i.e. a subsidy), or raising prices of relatively unhealthy
products (i.e. a fat tax). Different studies already indicated that mainly the first intervention, a subsidy
on healthier products, could be a successful way to stimulate healthy shopping. According to An (2012),
subsidizing healthier foods tends to be effective in modifying dietary behavior. The only constraint to
this finding is that long-term effectiveness and impact on the overall diet intake are unknown.
Waterlander et al. (2012) studied the effects of price subsidies and taxes on respectively
healthy and unhealthy foods throughout the entire shopping trip. They found that price increases on
unhealthy food products up to 25% of the original price do not result in differences in healthy food
purchases. This indicates that the tendency to purchase healthier food products will only increase
when a substantial tax on unhealthy food is introduced. Besides, their results showed that price
discounts on healthy foods have two effects. On the one hand, they encourage customers to purchase
healthy products. On the other hand, they lead customers to increase the energy content of the total shopping
basket, resulting in an equally (un)healthy shopping basket. This indicates that the complete purchase
process of customers is more dynamic and is not only explained by prices.
There thus seems to be a positive influence of economic interventions when single-product
purchases are made, but no change on the healthiness of the complete basket. Therefore, it is
expected that there is some influence of economic health interventions in both models, but no clear-cut
direction of that relationship. This results in the following hypotheses:
H1H Economic health interventions have an influence on the average health index of the basket
H2K Economic health interventions have an influence on the health index of the next purchase
2.4 SELF-REGULATION THEORY
This paragraph discusses different psychological theories that bear on how customers make
decisions in a grocery store. This study focuses on the decisions customers make during a shopping trip,
and there are many possible influences that can distract them. The paragraph continues by describing
several underlying mechanisms that possibly explain why customers make certain choices, in
combination with the variables that were described previously in the conceptual model.
When making the decision to buy a healthy product, a certain level of self-regulation is
required. Baumeister and Heatherton (1996) describe three ingredients of this self-regulation. First,
standards are important, which are ideals, goals or other conceptions of possible states. These
standards are essential, as either a dilemma of conflicting standards or even a lack of having any
obstructs effective self-regulation. Second, monitoring entails comparing the current state of being
with the standard, using feedback loops on one’s actions which are necessary to guide an
individual to their goals. When people cease to monitor their actions, they tend to lose control. The third
and last ingredient is called operate, which follows the second closely. If it turns out that the current
state is not compatible with the standards, a certain process is set in motion to change this. The first
two ingredients have been researched widely, but it is still unclear how the processes in the last
ingredient actually function, as they seem to be much more complex.
This self-regulation resource is, however, limited and can thus be depleted (Baumeister and
Heatherton, 1996). An individual simply cannot
regulate everything at once, although this capacity differs from case to case due to individual
differences. Besides, a person can become exhausted from facing many simultaneous demands and
can therefore sometimes fail at self-control in choices they would normally succeed in. Moreover, the
self-regulation muscle can be trained in order to make it stronger, and the more this is done, the easier
it becomes to self-regulate. Multiple studies have already shown the effects of certain stimulants that
boost this muscle. For example, in an experiment by Tice et al. (2007) people made an initial act of
self-regulation. After being shown a comedy video or given a surprise gift, their self-regulatory resource
was recharged, whereas people who did not experience these events showed resource depletion.
Another example is the research by Gailliot et al. (2007), whose results showed that acts of
self-control reduce blood glucose levels in the body, resulting in poor performance on subsequent
self-control tasks. They found that consuming a glucose drink restored these levels and improved self-control
performance. These examples indicate that self-regulatory resource depletion can be overcome,
posing an opportunity for grocery retailers to help customers achieve this.
2.4.1 SELF-REGULATION AND HEALTHY SHOPPING DYNAMICS
Some of the theories described in the previous paragraph can be closely linked to the decision process
that is researched in this study: is an individual going to choose something they might like better in
terms of taste but that is unhealthy, or can they regulate their actions and choose healthier products?
It goes without saying that customers need to have the internal goal to make such decisions. When
customers simply do not care about eating healthier, this theory cannot be applied.
Since the publication of the article of Baumeister and Heatherton (1996), many researchers
followed this up with studies on self-regulation theory and depletion of the self-regulatory resources.
In more recent work, an opposite phenomenon is found with regard to the self-regulation resource.
When two consecutive self-regulatory situations require similar control processes, the self-regulation
resource does not get exhausted, but is in fact enhanced (DeWitte et al., 2009).
Besides exercising this control, customers also experience subjective emotions during the
shopping trip. Different concepts such as licensing, pride and guilt can be linked to the process that is
captured within HSD. These three concepts all have their influence in different ways, but affect the
choice for healthy products in a positive way. First, licensing occurs when “a prior, virtuous intent
boosts people’s self-concepts, thus reducing negative self-attributions associated with the purchase of
relative luxuries” (Khan and Dhar, 2006, p. 256). In the current research context, this means that when
customers have the motivation to make a healthy choice, this boosts their self-concepts and results in
them feeling that it is justified to make a second, unhealthier choice. Second, pride can play a part
within the shopping trip. When a consumer has made the choice to purchase a healthy product instead of
a relatively unhealthy one, he or she has resisted temptation and exercised self-regulation, which gives a sense of
pride (Mukhopadhyay and Johar, 2007; Williams and DeSteno, 2008). Third, guilt plays its part too
when customers are unable to resist the temptation of buying a relatively unhealthy product, making
them feel more motivated to continue their shopping trip with the purchase of a healthier product
(Chen and Sengupta, 2014).
In addition to these three concepts that boost healthy shopping behavior, there is one final
concept that needs to be discussed briefly. It is possible that customers continue to buy unhealthy
foods after their first failure, which is called the What-The-Hell effect (Cochran and Tesser, 1996). The name itself
already reflects that this behavior cannot be explained by any of the previous theories and is thus
merely observed. All of these concepts will most likely be identified within the data that is made
available by Plus.
Decisions during the shopping process are sometimes rational, but as a result of distraction it
is possible that unplanned or impulsive buying occurs. A lot of research has been done in the field of
unplanned buying. Gilbride et al. (2015) investigate how unplanned versus planned purchases are
determined by elements of the current trip and previous shopping trips. Their findings indicate that
the probability for unplanned behavior increases as the shopping trip continues.
Since existing research does not seem to agree on how self-regulatory processes develop during
a series of choices, it is hard to predict HSD. In the next subparagraph, a closer look is taken at
the interconnection of self-regulation theory and drivers of HSD.
2.4.2 SELF-REGULATION AND DRIVERS OF HSD
It is also possible to connect self-regulation theory to health interventions and general promotions, as
these aim to influence customers’ choice behavior. Not much
research within self-regulation has been performed with regard to promotions within the store. At the start
of their shopping trip, most customers have some idea of what they want to buy, but shopping goals
might be a bit fuzzy (Lee and Ariely, 2006). As the trip proceeds, these goals become clearer.
Promotions influence customers’ spending more when their goals are less concrete than
when their goals are less fuzzy (Lee and Ariely, 2006). In general, promotions already seem to
influence behavior more than health labels (Waterlander et al., 2013). Thus the influence of promotions
on healthy shopping behavior is expected to be larger than the influence of health labels,
resulting in the last hypotheses:
H3A The effect of (healthy) promotions is larger than the effect of health labels in influencing
healthy shopping decisions in Model 1
H3B The effect of (healthy) promotions is larger than the effect of health labels in influencing
healthy shopping decisions in Model 2
The concepts of self-regulation that have been described before can be linked to these three drivers
of HSD as well. Licensing (Khan and Dhar, 2006), pride (Mukhopadhyay and Johar, 2007; Williams and
DeSteno, 2008) and guilt (Chen and Sengupta, 2014) might compensate for the mistake of buying a relatively
unhealthy product in the beginning of the trip, and as consumer goals become clearer, customers’
decisions might become less ambiguous. These two mechanisms can thereby strengthen each other. This
is especially applicable to the increased unplanned buying behavior as a result of general promotions
throughout the store.
2.5 OVERVIEW HYPOTHESES
Table 1 Overview of all hypotheses
Hypotheses Expected sign (+/-)
H1A The health index of the first purchase decision is positively related to the average health index of the basket +
H1B An improving trend of healthy choices has a positive influence on the average health index of the basket +
H1C Healthy peaks during the shopping trip have an influence on the average health index of the basket +/-
H1D Unhealthy peaks during the shopping trip have a negative influence on the average health index of the basket -
H1E The volatility of the health indices of previous purchases influences the average health index of the basket +/-
H1F General promotions have a positive influence on the average health index of the basket +
H1G Health labels have an influence on the average health index of the basket +/-
H1H Economic health interventions have an influence on the average health index of the basket +/-
H2A The health index of the first purchase decision is positively related to health index of the next purchase +
H2B The health index of the previous purchase decision is negatively related to the health index of the next purchase -
H2C The effect of the health index of the previous purchase decision on the health index of the next purchase decision is larger than the effect of the health index of the first purchase
H2D An improving trend of healthy choices has a positive influence on the health index of the next purchase +
H2E Healthy peaks during the shopping trip have an influence on the health index of the next purchase +/-
H2F Unhealthy peaks during the shopping trip have a negative influence on the health index of the next purchase -
H2G The volatility of the health indices of previous purchases influences the health index of the next purchase +/-
H2H The average health index of previous purchases has a positive influence on the health index of the next purchase +
H2I General promotions have a positive influence on the health index of the next purchase +
H2J Health labels have an influence on the health index of the next purchase +/-
H2K Economic health interventions have an influence on the health index of the next purchase +/-
H3A The effect of (healthy) promotions is larger than the effect of health labels in influencing healthy shopping decisions in Model 1
H3B The effect of (healthy) promotions is larger than the effect of health labels in influencing healthy shopping decisions in Model 2
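Read together, the hypotheses in Table 1 can be summarized as expected coefficient signs in two models. The additive linear form and all symbols below are assumptions made purely for illustration; this chapter does not state the actual model specification. Here $\overline{HI}_i$ is the average health index of basket $i$ and $HI_{i,t+1}$ the health index of the next purchase.

```latex
\begin{align*}
\text{Model 1:}\quad \overline{HI}_i &= \alpha_0 + \alpha_1 \mathit{First}_i + \alpha_2 \mathit{Trend}_i
  + \alpha_3 \mathit{HealthyPeak}_i + \alpha_4 \mathit{UnhealthyPeak}_i \\
  &\quad + \alpha_5 \mathit{Volatility}_i + \alpha_6 \mathit{Promo}_i + \alpha_7 \mathit{Label}_i
  + \alpha_8 \mathit{Econ}_i + \varepsilon_i \\
\text{Model 2:}\quad HI_{i,t+1} &= \beta_0 + \beta_1 \mathit{First}_i + \beta_2 \mathit{Previous}_{i,t}
  + \beta_3 \mathit{Trend}_{i,t} + \beta_4 \mathit{HealthyPeak}_{i,t} + \beta_5 \mathit{UnhealthyPeak}_{i,t} \\
  &\quad + \beta_6 \mathit{Volatility}_{i,t} + \beta_7 \overline{HI}_{i,t} + \beta_8 \mathit{Promo}_{i,t}
  + \beta_9 \mathit{Label}_{i,t} + \beta_{10} \mathit{Econ}_{i,t} + \varepsilon_{i,t}
\end{align*}
% Expected signs under this reading of Table 1:
%   Model 1: \alpha_1 > 0,\ \alpha_2 > 0,\ \alpha_4 < 0,\ \alpha_6 > 0;\ \alpha_3, \alpha_5, \alpha_7, \alpha_8 \neq 0
%   Model 2: \beta_1 > 0,\ \beta_2 < 0,\ \beta_3 > 0,\ \beta_5 < 0,\ \beta_7 > 0,\ \beta_8 > 0;\ \beta_4, \beta_6, \beta_9, \beta_{10} \neq 0
%   H2C: |\beta_2| > |\beta_1|
```

H3A and H3B compare the size of the promotion effect with that of the health label effect within Model 1 and Model 2 respectively, and therefore do not carry a sign of their own.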
3. METHODOLOGY
3.1 DATA COLLECTION
The data used in this study was made available by Plus, one of the largest Dutch grocery retailers. Since
the introduction of their new store concept a few years ago, Plus has made self-scanning
devices available to customers in many grocery stores. What distinguishes this data from regular
scanner data is that it records the sequence in which customers scanned the items as they walked
through the grocery store. In this manner, the way customers shop for groceries and the sequence in which
they make decisions can be traced precisely.
The data was collected in January and February of 2016 and contained one pre-promotion
week, one promotion week and one post-promotion week for three different Plus stores. These three
stores are similar in size, but differ in the type of neighbourhood where they are located. To
test whether there are any differences between the stores, a one-way ANOVA and three
regressions were performed (Appendix 1). These tests indicate that there are no major differences
between the three stores. Therefore, the customers of the three
stores are combined in one data file to look at the total customer database available for this research.
The three weeks for which data is available (i.e. the pre-promotion, promotion and post-promotion
week) were used to test the previously stated hypotheses.
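As an illustration of the store comparison, the sketch below computes the F statistic of a one-way ANOVA in plain Python. It is a minimal example under assumptions: the compared variable (here an average basket health index per customer), the function name `one_way_anova_f` and the numbers are all made up, and the actual tests are those reported in Appendix 1.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the F statistic of a one-way ANOVA for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Made-up average basket health indices for customers of three stores.
store_a = [0.95, 1.02, 1.10, 0.98]
store_b = [1.01, 0.97, 1.05, 1.03]
store_c = [0.99, 1.04, 0.96, 1.08]
print(one_way_anova_f([store_a, store_b, store_c]))
```

A small F value relative to the critical value of the F distribution with (k - 1, N - k) degrees of freedom would be in line with the conclusion that the stores do not differ much.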
In addition to this scanner data, the internal data of Plus on the nutritional information of their
products was also used to gain insight into the healthiness of purchases. The nutritional
information was, however, far from complete, and thus calorie information for all missing products had
to be looked up in product catalogues from other retailers, blogs and articles available online. When this
file was complete, it was merged with the scanner data file. Afterwards, the calories were recalculated
and transformed into indices. Paragraph 3.3 provides more detail on the operationalization of the
healthiness of the products.
3.2 SAMPLE AND CRITERIA SCREENING
The scanner data initially contained 47,082 customers, with shopping baskets ranging from a few to
hundreds of products. As stated before, HSD are expected to arise when customers make several
decisions when walking through the supermarket. Therefore, very short shopping trips that include
only a few choices are not very likely to capture these dynamics to the same extent as the longer trips.
To account for this, all shopping baskets were screened on two criteria:
1. HSD are expected to be found with different choices. When a customer chose the same
product a number of times in a row, this was treated as one choice. This
aggregation resulted in smaller baskets for almost all customers;
2. These smaller baskets were then categorized by remaining size. All baskets that contained fewer
than 10 different products were deleted from the dataset. This way the dataset is based on
customers with shopping baskets that were formed through a substantial number of choices.
As a result, the remaining baskets all contained at least 10 products without any duplicates.
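The two screening criteria can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually run on the Plus data: the function names, the dictionary layout and the product identifiers are hypothetical, and it is an assumption that only consecutive repeats are merged while the threshold of 10 is applied to the number of distinct products.

```python
from itertools import groupby

MIN_PRODUCTS = 10  # threshold stated in criterion 2


def collapse_repeats(scans):
    """Criterion 1: merge consecutive scans of the same product into one choice."""
    return [product for product, _ in groupby(scans)]


def screen_baskets(baskets):
    """Criterion 2: keep baskets with at least MIN_PRODUCTS different products.

    `baskets` maps a (hypothetical) customer id to the ordered list of scanned
    product ids; the returned baskets hold the collapsed choice sequences.
    """
    screened = {}
    for customer, scans in baskets.items():
        choices = collapse_repeats(scans)
        if len(set(choices)) >= MIN_PRODUCTS:
            screened[customer] = choices
    return screened


baskets = {
    "c1": ["milk", "milk", "bread"] + [f"p{i}" for i in range(9)],  # 11 different products
    "c2": ["cola", "cola", "chips"],                                # only 2 different products
}
print(sorted(screen_baskets(baskets)))  # → ['c1']
```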
Subsequently, due to issues with the missing nutritional information, it was necessary to take
subsamples from every store for every week. Samples of 300 customers were drawn per store per week, which resulted
in a final sample of 2,700 customers.
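The sample size follows from 3 stores x 3 weeks x 300 customers = 2,700. A minimal sketch of such a draw is given below. It assumes simple random sampling without replacement within each store-week cell, which the text does not specify; the function name, the fixed seed and the customer ids are hypothetical.

```python
import random


def draw_subsamples(customers_by_cell, per_cell=300, seed=2016):
    """Draw `per_cell` customers without replacement from every (store, week) cell."""
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    sample = {}
    for cell, customers in customers_by_cell.items():
        if len(customers) < per_cell:
            raise ValueError(f"cell {cell} has fewer than {per_cell} customers")
        sample[cell] = rng.sample(list(customers), per_cell)
    return sample


# Made-up customer ids: 1,000 screened customers in each of the 9 cells.
cells = {
    (store, week): list(range(1000))
    for store in ("store 1", "store 2", "store 3")
    for week in ("pre-promotion", "promotion", "post-promotion")
}
subsamples = draw_subsamples(cells)
print(sum(len(v) for v in subsamples.values()))  # → 2700
```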
3.3 OPERATIONALIZATION OF VARIABLES
In this sub-paragraph, the operationalization of several variables in the model will be described.
3.3.1 HEALTH INDICES
To measure the healthiness of purchase decisions, the nutritional information of the products that
customers bought is used. For customers, this information is available on the back of each product in
the supermarket, containing information about, for instance, sugar, calories, salt and carbohydrates. The
importance of such information becomes clear in the research by Burton et al. (2006), where
customers filled out a survey and participated in an experiment that aimed to uncover how well
customers are aware of the amounts of fat and calories. Their results indicate that a strikingly large number
of customers are not aware of the high amount of calories and fat in the food they consume. This
illustrates the added value of nutritional information, as it can have a positive impact on public health
and should therefore not be overlooked. Other studies also point to the added value
that calorie information has for customers’ awareness of the (un)healthiness of the food they consume
(Giesen et al., 2011). In this study, the number of calories that a product contains was used as
an indicator of the healthiness of that product. The number of calories per 100 grams, which is always
provided on the package of the product, was used, not only because this gives a comparable idea of the
relative (un)healthiness of products within their category, but also because this is the information that
customers have available when shopping at the grocery store.
In this study, nutritional information is used to form a ‘health index’ based on the relative
healthiness of every product within the entire product category, which leads to certain health scores
per product. An index of 1 implies that the healthiness of the product is average for the given product
category, an index <1 implies that the product is relatively healthy, and an index >1 implies that it is
relatively unhealthy. Besides, the average healthiness of each product category is calculated by taking
the average amount of calories of each product category, and creating health indices for every product
category. Then the health index of each product is multiplied by the health index of the product
category it belongs to. This operationalizes what healthy choices are and how they evolve over a
shopping trip. To illustrate this a bit more clearly, the process of creating the health indices is described
with the following formulas in two steps:
1. $\text{Average number of calories of the product category} = \dfrac{\text{Total number of calories within the product category}}{\text{Total number of products in the category}}$
2. $\text{Health Index of the product} = \dfrac{\text{Number of calories of Product } J}{\text{Average number of calories of the product category}}$
The second formula shows the final health index that was used for each product. This health index
adjusts the healthiness of the product for the healthiness of the product category it belongs to.
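The two formula steps can be illustrated with a short sketch. It implements only the two formulas as written (category average, then product calories divided by that average); the dictionary keys and the example products with their calorie values are invented for illustration.

```python
from collections import defaultdict


def category_averages(products):
    """Step 1: average calories per 100 g within each product category."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for product in products:
        totals[product["category"]] += product["kcal_per_100g"]
        counts[product["category"]] += 1
    return {category: totals[category] / counts[category] for category in totals}


def health_indices(products):
    """Step 2: calories of each product relative to its category average."""
    averages = category_averages(products)
    return {
        product["name"]: product["kcal_per_100g"] / averages[product["category"]]
        for product in products
    }


# Hypothetical example data, not taken from the study.
example_products = [
    {"name": "skimmed milk", "category": "dairy", "kcal_per_100g": 35.0},
    {"name": "whole milk", "category": "dairy", "kcal_per_100g": 65.0},
    {"name": "cream", "category": "dairy", "kcal_per_100g": 200.0},
]
print(health_indices(example_products))
```

In this example the category average is 100 kcal, so the first two products receive an index below 1 (relatively healthy) and the third an index above 1 (relatively unhealthy).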
Moreover, the calculated health indices under (2) were used as a basis to operationalize the
variables for the primacy and recency effects and the healthy and unhealthy peaks:
- Primacy effect = First adjusted Health Index (2) for each customer i
- Recency effect = Previous adjusted Health Index (2) for each customer i
- Healthy peak = Lowest/minimum adjusted Health Index (2) for each customer i
- Unhealthy peak = Highest/maximum adjusted Health Index (2) for each customer i
Finally, the dependent variable ‘average health index’ indicates whether the customer chose relatively
healthier or unhealthier products during the trip. This variable is calculated by
$$\text{Average Health Index}_i = \frac{\text{Total sum of health indices of all product choices by customer } i}{\text{Total amount of products purchased by customer } i}$$
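As an illustration, the primacy, recency, peak and average variables can be derived from one customer's ordered sequence of health indices. The sketch below assumes that the sequence contains the choices made so far, so that the last element is the 'previous' choice; the function and key names are hypothetical.

```python
def basket_features(health_index_sequence):
    """Summarise one customer's ordered health indices into the model variables."""
    sequence = list(health_index_sequence)
    if not sequence:
        raise ValueError("basket must contain at least one choice")
    return {
        "first": sequence[0],             # primacy effect
        "previous": sequence[-1],         # recency effect (assumed: most recent choice)
        "healthy_peak": min(sequence),    # lowest index = healthiest product
        "unhealthy_peak": max(sequence),  # highest index = unhealthiest product
        "average_hi": sum(sequence) / len(sequence),
    }
```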
3.3.2 TREND AND VOLATILITY
For every customer in the dataset, a trend line is calculated which describes the linear trend for the
health indices (HI) for all t moments during the shopping trip:
$HI_t = \beta_0 + \beta_1 t + \varepsilon_t$ (1)
Eq. (1) is estimated separately for each customer. The $\beta_1$ estimator reflects the linear trend.
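A minimal sketch of how the slope in Eq. (1) could be obtained for a single customer with ordinary least squares is given below, assuming the choice moments are numbered t = 1, ..., T. The function name is illustrative; the statistical software used in the study is not specified here.

```python
def trend_slope(health_indices):
    """OLS estimate of beta_1 in HI_t = beta_0 + beta_1 * t + e_t for one customer."""
    n = len(health_indices)
    if n < 2:
        raise ValueError("need at least two choices to fit a trend")
    moments = range(1, n + 1)
    t_mean = sum(moments) / n
    hi_mean = sum(health_indices) / n
    sxy = sum((t - t_mean) * (hi - hi_mean) for t, hi in zip(moments, health_indices))
    sxx = sum((t - t_mean) ** 2 for t in moments)
    return sxy / sxx
```

A positive slope means the health index rises over the trip (increasingly unhealthy choices); a negative slope means the opposite.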
In their study, Shehu et al. (2016) calculated the trend line in a similar way. They calculated volatility
as well and their method for calculating this will be closely followed in this study. In order to find out
how volatile the health indices are from choice to choice, the autocorrelation of the error terms of Eq.
(1) can be used. Autocorrelation can arise in two ways: positive or negative. Positive autocorrelation
refers to the occurrence of residuals in t that have the same sign as the residual in t-1. Negative
autocorrelation shows a pattern in which positive and negative residuals around the
trend line alternate (Leeflang et al., 2015). In the case of health indices, it is expected that healthy
and unhealthy choices will alternate frequently, resulting in a curved pattern, as was shown before
in Fig. 2. Therefore, negative autocorrelation would reflect this best. As a measure to evaluate
this autocorrelation, the Durbin-Watson statistic is used. This statistic ranges from 0 to 4, where a value
close to 0 indicates positive autocorrelation, close to 2 indicates non-autocorrelation and close to 4
indicates negative autocorrelation (Shehu et al., 2016). Therefore, a higher value of the Durbin-Watson
statistic indicates higher volatility. When following the method performed in the study by Shehu et al.
(2016), it becomes clear that in this dataset the average Durbin-Watson statistic is 1,6812, which
indicates that most of the patterns show slight positive autocorrelation. This implies that most cases
do not show strong variability.
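For reference, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below computes it for one customer's sequence after removing a linear trend; it is an illustration with hypothetical function names, not the exact procedure of Shehu et al. (2016).

```python
def linear_residuals(values):
    """Residuals of an OLS fit of the values on t = 1..T."""
    n = len(values)
    moments = range(1, n + 1)
    t_mean = sum(moments) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in moments)
    slope = sum((t - t_mean) * (v - v_mean) for t, v in zip(moments, values)) / sxx
    intercept = v_mean - slope * t_mean
    return [v - (intercept + slope * t) for t, v in zip(moments, values)]


def durbin_watson(residuals):
    """DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2, which lies between 0 and 4."""
    denominator = sum(e ** 2 for e in residuals)
    if denominator == 0:
        raise ValueError("all residuals are zero; the statistic is undefined")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    return numerator / denominator
```

A sequence that alternates between healthy and unhealthy choices yields a value above 2, which corresponds to the 'higher volatility' reading used in the text.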
3.3.3 DRIVERS OF HSD
The drivers of HSD are operationalized in the following way. The Economic Health Intervention of Plus
is indicated in the data by a dummy variable (0 indicates a ‘regular week’, 1 indicates the promotion
period). In Appendix 2, an example of what this promotion looks like is displayed. Within the data, for
the three stores two weeks are non-promotion and one week is during the health-focused promotion
period. Besides, the Health Labels that are used in this study, ‘Vinkjes’, are simply summed up per
customer. Afterwards, the average number of food products with such a health label is calculated, to
give insight into whether customers buy many or few products with a health label and what effects this
has. This is divided by the number of food products, because health labels cannot be placed on
non-food products. Finally, similar to the method used for the Health Labels, the amount of general
promotions purchased by a customer is also summed up and subsequently the average is calculated
to provide additional insights. For promotions, the total amount of products bought is considered,
since general promotions can also include non-food products. The following two formulas were used
to calculate the proportion of health labels and promotions purchased:
- $\text{Proportion of Health Labels}_i = \dfrac{\text{Total amount of products with a Health Label bought by customer } i}{\text{Total amount of food products bought by customer } i}$
- $\text{Proportion of Promotions}_i = \dfrac{\text{Total amount of products on promotion bought by customer } i}{\text{Total amount of products bought by customer } i}$
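The two proportions differ only in their denominator, which the following sketch makes explicit. The item representation (a dictionary with three boolean flags) is an assumption made for the example.

```python
def basket_proportions(items):
    """Proportion of health labels (per food product) and promotions (per product).

    Each item is a dict with boolean keys 'is_food', 'has_health_label'
    and 'on_promotion' (hypothetical field names).
    """
    n_total = len(items)
    n_food = sum(1 for item in items if item["is_food"])
    n_label = sum(1 for item in items if item["has_health_label"])
    n_promo = sum(1 for item in items if item["on_promotion"])
    return {
        # Health labels can only appear on food, so divide by food products.
        "health_labels": n_label / n_food if n_food else 0.0,
        # General promotions can include non-food, so divide by all products.
        "promotions": n_promo / n_total if n_total else 0.0,
    }
```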
3.4 RESEARCH METHOD
Since the data used for this study is of a numerical form, the research methods used are quantitative.
As stated before, there are two objectives to this research. The first objective is testing how healthy
shopping decisions are influenced during the shopping trip. This is Model 1, for which the drivers were
stated in the first conceptual model. The equation below describes the regression model for the first
model. It needs to be mentioned that the Durbin-Watson statistic will only be included in the
regression when the average value indicates that there is in fact autocorrelation. This is tested in the
next chapter.
Model 1
$$\text{Average HI}_{ij} = \beta_0 + \beta_1 \text{First}_{ij} + \beta_2 \text{HealthPeak}_i + \beta_3 \text{UnhealthPeak}_i + \beta_4 \text{Volatility}_i + \beta_5 \text{Trend}_i + \beta_6 \text{General Promotion}_i + \beta_7 \text{Health Labels}_i + \beta_8 \text{Economic Health Intervention}_i + \varepsilon_{ij}$$
Where
$\text{Average HI}_{ij}$ = The average health index of product j for customer i of all products in the final basket, which is calculated by $\frac{HI_1 + HI_2 + HI_3 + \dots + HI_{j-1}}{j-1}$ for each customer
$\text{First}_{ij}$ = The health index of the first product j chosen by customer i
$\text{HealthPeak}_i$ = The healthy peak in one of the health indices of the products in the basket of customer i
$\text{UnhealthPeak}_i$ = The unhealthy peak in one of the health indices of the products in the basket of customer i
$\text{Volatility}_i$ = The variability of the health indices in all the choices made by customer i
$\text{Trend}_i$ = The slope of the trend line of all health indices of the products bought by customer i
$\text{General Promotion}_i$ = The amount of products on promotion bought by customer i
$\text{Health Labels}_i$ = The amount of products containing a health label bought by customer i
$\text{Economic Health Intervention}_i$ = Dummy (0/1) that indicates whether the products were bought during the promotion or not by customer i
Model 2 will be presented after an explanation of the operationalization of the second dependent
variable: health index of the next purchase. The objective of Model 2 is to find out whether a pattern
can be distinguished in the healthy shopping decisions customers make while shopping for groceries.
One product category will be taken into account to test whether it is possible to build a prediction
model that forecasts how relatively healthy the next purchase decision in this category will be. In
theory, this could be attempted for all product categories in the supermarket. However, to test whether
it works, it will first be built for only one category. All categories have different healthiness scores
relative to the other categories, and some of them have larger differences between the health scores
of products within the category than others. In this study, the product category ‘dairy’ is chosen to test
the prediction model, for the following reasons:
1. When looking at the store plan of the average Plus supermarket, it becomes evident that the
dairy section is situated in the second part of the store. This means that customers already
made a number of choices before getting to this section, which provides the basis for building
a prediction model;
2. Dairy is usually perceived as quite a healthy product, as it contains calcium which is good for
the body and can prevent for instance osteoporosis at a later age (Voedingscentrum, 2016).
The data also shows that, compared to other product categories, dairy is a relatively
healthy category, with a health index far below one (0,3933). As this study investigates healthy
shopping behavior, it makes sense to choose a category that is healthy and to see whether the
healthy choices can be predicted;
3. Even though on average the dairy section contains relatively healthy products, within the
category itself the decisions can vary from very healthy (Lowest Health Index = 0,130) to very
unhealthy (Highest Health Index = 5,547). Therefore, even though the decision to buy a dairy
product per se is healthy, within the section the choice can still be unhealthy.
Since the dairy section offers a number of interesting factors, this category is chosen to build the
prediction model. In order to do this, the dataset had to be altered. All customers who did not buy any
dairy products were excluded, leaving a dataset containing 2003 customers. This means that all
customers are included: not only the ones that only made a few decisions before arriving at the dairy
section, but also customers who made more than fifty decisions. The dataset was split at the first dairy
purchase made by the customers, deleting all following product choices. The first dairy purchase
became the dependent variable. For the prediction part of the analysis, this variable was excluded
from the dataset and the remaining variables were used to predict the health indices of the first dairy
products. These results were compared to the actual results and a naïve model and several tests were
performed to show the predictive validity of the model.
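The way each basket is cut off at the first dairy purchase can be sketched as follows. The representation of a basket as an ordered list of (category, health index) pairs is an assumption for this example, as are the function and parameter names.

```python
def split_at_first_dairy(basket, target_category="dairy"):
    """Split an ordered basket at the first purchase in the target category.

    Returns (history, target_health_index), where `history` holds the choices
    made before the first dairy purchase and all later choices are dropped.
    Returns None when the customer bought no dairy, so that customer is excluded.
    """
    for position, (category, health_index) in enumerate(basket):
        if category == target_category:
            return basket[:position], health_index
    return None
```

The predictors of Model 2 would then be computed from `history` only, while the returned health index serves as the dependent variable.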
Model 2
As the conceptual model in paragraph 2.1 described already, one variable will be added to capture the
recency effect: ‘Previous’. Another variable that is added to the model is the ‘average HI of the basket
so far’. Model 1 investigated the drivers of this variable. This results in the following final outline for
Model 2:
$$HI_{iJ} = \beta_0 + \beta_1 \text{First}_{ij} + \beta_2 \text{Previous}_{ij} + \beta_3 \text{HealthPeak}_i + \beta_4 \text{UnhealthPeak}_i + \beta_5 \text{Volatility}_i + \beta_6 \text{Trend}_i + \beta_7 \text{General Promotion}_i + \beta_8 \text{Health Labels}_i + \beta_9 \text{Economic Health Intervention}_i + \beta_{10} \text{Average Healthiness Basket}_i + \varepsilon_{ij}$$
Where
$HI_{iJ}$ = Health index of the next product purchased J
$\text{First}_{ij}$ = The health index of the first product j chosen by customer i
$\text{Previous}_{ij}$ = The health index of the previous product j chosen by customer i
$\text{HealthPeak}_i$ = The healthy peak in one of the health indices of the products in the basket of customer i
$\text{UnhealthPeak}_i$ = The unhealthy peak in one of the health indices of the products in the basket of customer i
$\text{Volatility}_i$ = The variability of the health indices in all the choices made by customer i
$\text{Trend}_i$ = The slope of the trend line of all health indices of the products bought by customer i
$\text{General Promotion}_i$ = The amount of products on promotion bought by customer i
$\text{Health Labels}_i$ = The amount of products containing a health label bought by customer i
$\text{Economic Health Intervention}_i$ = Dummy (0/1) that indicates whether the products were bought during the promotion week or not by customer i
$\text{Average Healthiness Basket}_i$ = The average healthiness of the basket up to now compiled by customer i
4. RESULTS
4.1 MODEL 1
4.1.1 EXPLORATORY ANALYSIS
In order to obtain some preliminary insights about the data, a correlation matrix was created for all
the independent and dependent variables of Model 1. As Pearson’s correlation matrix in Table 2 shows,
there are multiple significant correlations between the variables. A possible explanation for these
correlations is that the first 6 variables in the list (Health Index, Volatility, HealthPeak, UnhealthPeak,
First and Trend) are all based on the health index. This seems to be quite a plausible explanation, as
the general promotions and health labels correlate only weakly with the health index. Whether
these correlations result in problems such as multicollinearity is tested in paragraph 4.1.2.
Table 2 Pearson’s Correlation matrix Model 1
                                   (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)
(1) Average Health Index           1        ,176**   ,197**   ,409**   -,063**  ,032     ,072**   ,126**   ,046*
(2) First                          ,176**   1        ,033     ,088**   -,095**  -,083**  -,020    -,002    -,008
(3) HealthPeak                     ,197**   ,033     1        -,088**  ,135**   ,071**   ,004     -,026    ,004
(4) UnhealthPeak                   ,409**   ,088**   -,088**  1        -,044*   ,015     -,056**  ,072**   ,046*
(5) Volatility                     -,063**  -,095**  ,135**   -,044*   1        -,017    -,040*   -,016    -,025
(6) Trend                          ,032     -,083**  ,071**   ,015     -,017    1        ,033     ,023     ,019
(7) Promotions                     ,072**   -,020    ,004     -,056**  -,040*   ,033     1        ,015     ,127**
(8) Health Labels                  ,126**   -,002    -,026    ,072**   -,016    ,023     ,015     1        -,032
(9) Economic Health Intervention   ,046*    -,008    ,004     ,046*    -,025    ,019     ,127**   -,032    1
* indicates that the correlation is significant at the 5% level (2-tailed)
** indicates that the correlation is significant at the 1% level (2-tailed)
Subsequently, the HSD that were mentioned before can be visualized since all the data is available. In
Fig. 4 below, the development of healthiness throughout the shopping trip is visualized. These are the
aggregated shopping decisions of all customers. Therefore, choice 1 is an aggregate of the relative
healthiness of product choice 1 for every customer. The floating trend line shows a fluctuating pattern.
The index is corrected for the average healthiness of each product category, and therefore it does not
matter in which sequence the categories are visited. The average number of products purchased
during all shopping trips is 18 items. Therefore, only these first 18 choices and a few additional items in the
sequence are displayed in Fig. 4.
Fig. 4 Healthy Shopping Dynamics
Note: <1 = relatively healthy, 1= relatively neutral and >1 = relatively unhealthy.
The trend line resembles the ‘rollercoaster’ pattern that was mentioned before. It shows that
customers usually start their shopping trip with a relatively unhealthy trend. Afterwards, from item 5
until item 17, customers seem to pick products that are relatively healthy (<1) for a very long time.
Then, it seems that the pattern changes and the trend of the shopping behavior becomes quite unhealthy.
Given that this graph was made with the shopping behavior of approximately 2.700 customers, it can
be said that this pattern is quite robust. The fluctuations in the figure suggest that HSD can be
distinguished in real purchases. If there were a flat line around 1, this would indicate that customers all
shop very differently and that the choices of all customers average out to the same value at each
point in the trip. The fact that there are many fluctuations throughout the shopping trip indicates
that HSD do evolve.
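The aggregation behind such an HSD curve can be sketched as a position-wise mean over all customers' sequences. This is a simplified illustration with hypothetical names; any smoothing applied to the trend line in the figure is not reproduced here.

```python
def hsd_curve(sequences, max_position):
    """Mean health index at each choice position, aggregated over customers.

    `sequences` is a list of ordered health-index lists, one per customer.
    Customers whose trip is shorter than a given position are skipped there.
    """
    curve = []
    for position in range(max_position):
        values = [seq[position] for seq in sequences if len(seq) > position]
        curve.append(sum(values) / len(values) if values else None)
    return curve
```

Values below 1 at a position indicate that, on average, customers pick a relatively healthy product at that point in the trip.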
Finally, it might be interesting to find out whether the economic health intervention leads to
any differences in healthiness of customers’ baskets at all. Of course this effect is included in the
regression model, but it is interesting to gain some extra insights into this promotional campaign.
Table 3 Independent Samples T-test for economic health intervention
Economic Health Interventions
              Economic Health Intervention   Mean     St. Deviation
Average HI    0                              ,9957    ,17311
              1                              1,0123   ,18492
Independent Samples T-Test
                                         Levene’s Test for Equality of Variances
                                         F       Significance   T-value   DF        Significance
Equal variances not assumed (α < 0,10)   ,001    ,076*          -2,244    1694,77   0,025
As Table 3 shows, the difference between the promotion and regular weeks is significant. What is striking
is that the average health index of the baskets is lower in weeks where there is no economic health
intervention, and thus the baskets are healthier in those weeks. This is not the expected effect,
considering that the promotional weeks focus on getting customers to choose healthier
products. Whether this effect also comes through in the regression analysis is examined in
paragraph 4.1.3.
Before finding out whether any of the drivers are statistically significant and interpreting the
betas of the variables of the regression model, the model assumptions need to be checked. In the case
that there are violations of these assumptions, remedies might change the betas and their
corresponding levels of significance (Leeflang et al., 2015).
4.1.2 MODEL ASSUMPTIONS
Model assumptions about multicollinearity, normality, autocorrelation and heteroscedasticity were
tested. First of all, multicollinearity is no issue in this model, as VIF scores range from 1,005 to 1,057
and tolerance levels from 0,947 to 0,995 (Appendix 3.1). According to Leeflang et al. (2015), VIF scores
of >5 and tolerance levels <0,2 indicate problems with multicollinearity, which is not the case here.
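The relation between the two diagnostics can be made explicit: tolerance is the reciprocal of the VIF, and the VIF of a predictor equals 1 / (1 - R²) of the regression of that predictor on the other predictors. The sketch below applies the cut-offs quoted from Leeflang et al. (2015); the function names are illustrative.

```python
def vif_from_r_squared(r_squared):
    """VIF of a predictor, given the R^2 of regressing it on the other predictors."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def tolerance(vif):
    """Tolerance is the reciprocal of the VIF."""
    return 1.0 / vif


def flags_multicollinearity(vif, vif_cutoff=5.0, tolerance_cutoff=0.2):
    """True when either cut-off quoted in the text is exceeded."""
    return vif > vif_cutoff or tolerance(vif) < tolerance_cutoff


# The lowest and highest VIF scores reported for Model 1.
for vif in (1.005, 1.057):
    print(vif, flags_multicollinearity(vif))
```

Because a VIF above 5 is equivalent to a tolerance below 0,2, the two checks always agree; both are shown only to mirror the wording of the text.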
Second, possible problems with normality were tested for the unstandardized residuals. Here,
the null hypothesis states that the error terms are normally distributed (Leeflang et al., 2015). Both
the Kolmogorov-Smirnov test and the Shapiro-Wilk test rejected this null hypothesis, each with a
significance level of p = 0,000 (Appendix 3.2). To account for this lack of
normality, the same regression was performed using a bootstrapping method, repeated 1.000 times.
The results of this bootstrap are practically identical to the results of the first regression, which
indicates that the output of this regression can be interpreted without problems (Appendix 3.2). In addition, the
plot of the unstandardized residuals of that model suggests that these residuals are
approximately normally distributed (Appendix 3.2).
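To illustrate the idea of a bootstrapped regression, the sketch below resamples observations with replacement and re-estimates an OLS slope each time. It is deliberately reduced to a single predictor and uses case resampling, which is an assumption; the exact bootstrap variant used for the multiple regression in this study is not documented here.

```python
import random


def ols_slope(xs, ys):
    """OLS slope of ys on xs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx


def bootstrap_slopes(xs, ys, n_resamples=1000, seed=0):
    """Re-estimate the slope on resamples of the observations (case resampling)."""
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct x values")
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_resamples:
        indices = [rng.randrange(n) for _ in range(n)]
        resampled_x = [xs[i] for i in indices]
        if len(set(resampled_x)) < 2:
            continue  # slope is undefined when all resampled x values coincide
        resampled_y = [ys[i] for i in indices]
        slopes.append(ols_slope(resampled_x, resampled_y))
    return slopes


def percentile_interval(values, lower=0.025, upper=0.975):
    """Simple percentile interval of the bootstrap distribution."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]
```

If the percentile interval of the bootstrapped coefficient leads to the same conclusion as the original estimate, the non-normal residuals are unlikely to distort the inference, which is the argument made in the text.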
Third, to test for autocorrelation the Durbin-Watson statistic was calculated once again. This
result (DW = 1,991) indicates that there is no reason to assume that autocorrelation plays a part, as a
value around 2 indicates that there is no autocorrelation (Leeflang et al., 2015).
Finally, heteroscedasticity was tested. In order to do so, the differences between the healthy
promotion weeks and regular weeks were taken into account. Levene’s test for equality of variances
was performed, where the null hypothesis states that there is equality of variances and thus
homoscedasticity. In case this hypothesis is rejected, there is a problem with the residuals. Since this
is not the case, this problem does not exist in this model (Appendix 3.3).
Since none of the model assumptions were violated in the sense that the regression output
changed, the results can be interpreted without re-estimating the model.
4.1.3 INTERPRETATION
In this paragraph, the results of the regression model (Table 4) are interpreted.
Table 4 Results Regression Model 1
Model Statistics
Model F-value        533,038   R2            0,613
Model Significance   0,001**   Adjusted R2   0,612
Regression Output
                               Beta    Std. Error   T-value   Significance
Constant                       ,654    ,010         63,705    ,001**
First HI                       ,043    ,004         9,602     ,001**
Healthy Peak                   ,305    ,011         27,012    ,001**
Unhealthy Peak                 ,075    ,001         57,379    ,001**
Volatility                     ,018    ,004         4,403     ,001**
Trend                          ,360    ,061         5,897     ,001**
Promotions                     ,017    ,021         ,837      ,403
Health Labels                  ,021    ,022         ,967      ,334
Economic Health Intervention   ,001    ,005         ,322      ,747
** indicates significance at the 1% level (2-tailed)
The results indicate that not only the model as a whole is statistically significant (F = 533,038, p = 0,001),
but also that many of the variables are statistically significant. The only exceptions are the promotions,
health labels and the dummy that indicates the difference between the promotion and regular weeks,
which have p-values higher than α = 0,10 (resp. p = 0,403, p = 0,334 and p = 0,747). Therefore,
hypotheses 1F, 1G and 1H cannot be supported by the data. It is interesting to see that the economic
health intervention is not significant in this model, whereas the separate test showed that there is a
difference in the average health index between the weeks. This could be explained by
the other variables that are present in the model, which might absorb the effect of this variable.
Hypothesis 3A, which suggested that the effect of promotions is larger than the effect of health labels,
cannot be supported either, due to the insignificance of both variables. These results mean that none
of the distinguished drivers of HSD have an influence on the average health index of the basket, which is
quite unexpected. After the interpretation of all variables, an additional test was performed for the
two health labels separately, to assess whether perhaps one of the two labels influences the basket
health. The R2 of 0,613 indicates that 61,3% of the variation in the average healthiness of the basket is
explained by the variables that are included in the model.
All other betas differ significantly from zero and therefore all contribute to the average
health index. The independent variables are interpreted one by one. First, the health index of the
first product purchased has a positive influence on the average health index of the basket (ß = 0,043,
p = 0,001), thereby supporting hypothesis 1A. This means that when the first purchase was one index
point higher and therefore unhealthier, it results in a shopping basket that was on average 0,043 index
points higher and therefore unhealthier and vice versa.
Second, the healthy peak (healthiest product bought) has a positive influence on the average
health index of the basket (ß = 0,305, p = 0,001), thereby supporting hypothesis 1C, which shows that
healthy peaks do indeed have an influence. Thus, when the health index of the healthiest product purchased
increases by 1 index point, this increases the average health index
of the shopping basket by 0,305 index points, thereby making the basket unhealthier. The other way
around, when the healthiest product becomes one index point healthier, this makes the
average basket healthier, lowering the index by 0,305 index points.
Third, the unhealthy peak (unhealthiest product bought) has a positive influence on the
average health index of the basket (ß = 0,075, p = 0,001), thereby rejecting hypothesis 1D. The result
indicates that when instead of this product an even unhealthier product of 1 index point higher is
purchased, this results in an average basket health that is 0,075 index points unhealthier and vice
versa. It was expected that out of guilt, the more extremely unhealthy peaks would influence the
basket health more positively (Chen and Sengupta, 2014). However, the evidence provided by the test
suggests that this is not the case.
Fourth, the volatility of the levels of healthiness throughout the shopping trip has a slight
positive influence on the average health index of the basket (ß = 0,018, p = 0,001), and thus a slightly
unfavorable effect on the healthiness of the basket, thereby supporting
hypothesis 1E, which stated that there would be some influence. This result indicates that when the
volatility increases by one unit, this results in a basket that is on average 0,018 index points
unhealthier.
Finally, the trend of the health indices based on the sequence in which customers did their
groceries has a significant, positive effect on the average health index of the basket (ß = 0,360, p =
0,001), thereby supporting hypothesis 1B. This means that when customers start making increasingly
unhealthy decisions, which increases the slope of the trend by 1 index point, this results in a basket
health index that is on average 0,360 index points higher and thereby unhealthier. The other way around,
when customers improve the healthiness of their shopping behavior throughout the
trip, this results in a basket that is on average healthier by 0,360 index points.
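The interpretation of the coefficients can be checked with a small worked example that plugs the estimates from Table 4 into the Model 1 equation. The coefficient values are taken from Table 4 (with decimal points instead of commas); the basket values in the example are invented for illustration.

```python
# Coefficients from Table 4 (Model 1).
MODEL_1_COEFFICIENTS = {
    "first": 0.043,
    "healthy_peak": 0.305,
    "unhealthy_peak": 0.075,
    "volatility": 0.018,
    "trend": 0.360,
    "promotions": 0.017,
    "health_labels": 0.021,
    "economic_health_intervention": 0.001,
}
MODEL_1_CONSTANT = 0.654


def predict_average_hi(basket):
    """Predicted average health index of a basket under Model 1."""
    return MODEL_1_CONSTANT + sum(
        coefficient * basket[name] for name, coefficient in MODEL_1_COEFFICIENTS.items()
    )


# Hypothetical basket, for illustration only.
example_basket = {
    "first": 1.0,
    "healthy_peak": 0.3,
    "unhealthy_peak": 2.5,
    "volatility": 1.7,
    "trend": 0.01,
    "promotions": 0.2,
    "health_labels": 0.1,
    "economic_health_intervention": 0,
}
unhealthier_start = dict(example_basket, first=example_basket["first"] + 1.0)
print(predict_average_hi(unhealthier_start) - predict_average_hi(example_basket))
```

The printed difference equals the coefficient of the first purchase (0,043, up to floating-point error), which is exactly the 'one index point higher' interpretation given for hypothesis 1A.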
There are two different types of health labels, the green (healthy choice) and blue (healthier
choice within the product category) labels. An additional independent samples t-test was performed
to identify whether there were any differences between these two labels. For each type of label a
dummy was created and the differences in the average health index of the basket were tested. For both
labels Levene’s test indicated that equal variances could be assumed. The results are summed up in
Table 5.
Table 5 Results Independent Samples T-Test green and blue health labels, Model 1
Green Health Label
              Dummy Green   Mean     St. Deviation
Average HI    0             ,9974    ,17708
              1             1,0053   ,17745
Independent Samples T-Test
                          Levene’s Test for Equality of Variances
                          F       Significance   T-value   DF     Significance
Equal variances assumed   ,001    ,971           -1,154    2695   ,248
Blue Health Label
              Dummy Blue    Mean     St. Deviation
Average HI    0             ,9984    ,17470
              1             1,0076   ,18281
Independent Samples T-Test
                          Levene’s Test for Equality of Variances
                          F       Significance   T-value   DF     Significance
Equal variances assumed   ,082    ,774           -1,244    2695   ,214
* indicates significance at the 5% level (2-tailed)
The output in Table 5 indicates the differences in average healthiness of the basket when at least one
green/blue label was purchased compared to when no green/blue label was bought. The test results
for both green and blue labels show no significant differences in the means of the baskets, which
means that neither of the two health labels has a separate effect.
4.2 MODEL 2
4.2.1 EXPLORATORY ANALYSIS
Again, a correlation matrix was created as part of the exploratory analysis (Table 6). The matrix
indicates that there are many significant correlations between the variables. Whether these
correlations will cause problems when estimating the model parameters, will be tested in the next
paragraph where model assumptions are tested.
Table 6 Pearson’s Correlation matrix Model 2
                      (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)      (10)     (11)
(1) First Dairy       1        ,001     -,003    ,295**   -,023    ,002     ,085**   ,057*    ,017     -,003    ,071**
(2) First             ,001     1        ,017     ,018     ,243**   -,092**  -,332**  -,040    -,068**  -,007    ,366**
(3) Previous          -,003    ,017     1        -,002    ,552**   ,103**   ,241**   -,101**  ,060**   ,003     ,461**
(4) HealthPeak        ,295**   ,018     -,002    1        -,217**  ,095**   -,305**  ,117**   ,107**   -,004    ,204**
(5) UnhealthPeak      -,023    ,243**   ,552**   -,217**  1        ,006     ,199**   -,140**  -,029    ,032     ,620**
(6) Volatility        ,002     -,092**  ,103**   ,095**   ,006     1        ,010     ,000     ,043     -,020    ,000
(7) Trend             ,085**   -,332**  ,241**   -,305**  ,199**   ,010     1        -,085**  -,083**  -,017    -,031
(8) Promo             ,057*    -,040    -,101**  ,117**   -,140**  ,000     -,085**  1        ,102**   ,053*    -,106**
(9) Health Labels     ,017     -,068**  ,060**   ,107**   -,029    ,043     -,083**  ,102**   1        -,044    ,015
(10) Ec. H. Interv.   -,003    -,007    ,003     -,004    ,032     -,020    -,017    ,053*    -,044    1        ,025
(11) Average HI       ,071**   ,366**   ,461**   ,204**   ,620**   ,000     -,031    -,106**  ,015     ,025     1
* indicates that the correlation is significant at the 5% level (2-tailed)
** indicates that the correlation is significant at the 1% level (2-tailed)
In addition to this exploratory analysis, another graph with HSD was created. However, in contrast to
the graph in paragraph 4.1 from the data of Model 1, the HSD curve in Fig. 5 is based on the smaller
data set that is cut off at the dairy section. The data suggests that the average number of choices made
before arriving at the dairy section is 13. Therefore, the HSD curve only shows the first 16
purchases made by customers, to be able to get a more complete picture of the pattern. The HSD curve
shows a trend that is quite similar to the beginning of the first HSD curve in Fig. 4.
Fig. 5 HSD curve Model 2
Finally, it might be interesting to find out whether the economic health intervention leads to any
differences in healthiness of customers’ first healthy choice. In theory it does not necessarily have to
be the case, since the promotion period mainly focused on the promotion of potatoes, fruit and
vegetables. The results in Table 7 indeed show no significant difference between the weeks. Of course the effect of the
economic health intervention is still included in the regression model.
Table 7 Independent Samples T-test for economic health intervention
Economic Health Interventions
              Economic Health Intervention   Mean     St. Deviation
Average HI    0                              ,89787   ,486014
              1                              ,89546   ,547446
Independent Samples T-Test
                          Levene’s Test for Equality of Variances
                          F       Significance   T-value   DF     Significance
Equal variances assumed   ,423    ,515           ,100      2001   ,920
Again, before interpreting the betas of the variables of this model, the model assumptions need to be
checked. In the case that there are violations of these assumptions, remedies might change the betas
and their corresponding levels of significance (Leeflang et al., 2015).
4.2.2 MODEL ASSUMPTIONS
In this paragraph, the model assumptions multicollinearity, normality, autocorrelation and
heteroscedasticity are tested again. First, multicollinearity is no issue in this dataset, as all VIF-scores
are <5 (range from 1,010 – 1,132) and tolerance levels >0,2 (range from 0,884 – 0,990) (Appendix 4.1).
Second, normality was visualized with a histogram and tested with the Kolmogorov-Smirnov
and the Shapiro-Wilk test (Appendix 4.2). The figure illustrates that there is a bell curve, albeit slightly
skewed. The tests need to point out whether this slight skewness affects normality. The
results of the tests indicated that the unstandardized residuals (that were derived with the logarithmic
dependent variable) were not normally distributed (Appendix 4.2). To deal with the non-normality, a
bootstrap was performed (Appendix 4.2). The outcomes of this bootstrap do not indicate differences
in the significance of the betas compared to the original regression model. Therefore, the regression
results can be interpreted without problems.
Third, the Durbin-Watson statistic was used to test for autocorrelation. The result (DW = 2,001)
indicates that autocorrelation does not play a part, as a value around 2 indicates no autocorrelation
(Leeflang et al., 2015).
Finally, heteroscedasticity was tested with Levene’s test for equality of variances, where once
again the unstandardized residuals were saved for the regression and the factor used was again the
healthy promotion week versus the regular weeks. The results indicate that there is no issue with the
variances over time (Levene statistic = 0,850, p = 0,357) (Appendix 4.3).
Since none of the assumptions were violated, the regression results are interpreted in the next
subparagraph.
4.2.3 INTERPRETATION
The regression output is summarized in Table 8.
Table 8 Results Regression Model 2
Model Statistics
Model F-value        13,056    R2            0,062
Model Significance   0,001**   Adjusted R2   0,058
Regression Output
                               Beta    Std. Error   T-value   Significance
Constant                       ,667    ,075         8,916     ,001**
First                          ,016    ,034         ,454      ,650
Previous                       -,033   ,016         -2,133    ,033*
Healthy Peak                   ,432    ,045         9,690     ,001**
Unhealthy Peak                 ,039    ,007         5,301     ,001**
Volatility                     -,009   ,022         -,399     ,690
Average HI basket              ,021    ,054         ,387      ,699
Trend                          1,363   ,336         4,051     ,001**
Promotions                     ,100    ,074         1,340     ,180
Health Labels                  -,003   ,082         -,032     ,975
Economic Health Intervention   -,011   ,024         -,452     ,651
* indicates significance at the 5% level (2-tailed)
** indicates significance at the 1% level (2-tailed)
Table 8 shows that this model as a whole is significant (F = 13,056, p = 0,001). The R2 of 0,062 indicates
that the variables included in the model explain only 6,2% of the variance in the health index of the
first dairy product purchased. This is much lower than the R2 of Model 1, which makes sense, because
Model 1 took the entire shopping trip into account and simply has more data points on which the
regression was based. Moreover, the dependent variable in Model 1, the average healthiness of the
basket, is linked more closely to the variables in the model than the first dairy purchase is. In
contrast to Model 1, many variables in the model are no longer significant: the
first product bought, volatility, average HI, promotions, health labels and economic health
intervention. Since the shopping basket is cut off at the point of the first dairy decision, it is possible
that the baskets have become too small to detect these effects. As a result, hypotheses 2A, 2G, 2H, 2I,
2J, and 2K cannot be supported in this study. As for Model 1, the effects of the two individual health
labels are tested after the interpretation of the significant variables. In addition, due to the
insignificance of the effects of both promotions and health labels, hypothesis 3B cannot be supported.
When looking at the results of the other independent variables, there are some similarities
and some differences when compared to Model 1. First, the variable indicating the health index of the
previous product chosen before arriving at the dairy section was added to the model. The results
indicate that the health index of the previous product choice has a negative influence on the health
index of the first dairy product purchased (ß = -0,033, p = 0,033), thereby supporting hypothesis 2B.
This indicates that when the health index of the previous product choice is 1 index point higher,
the health index of the first dairy product decreases by 0,033 index points, making the dairy decision
healthier. In Chap. 2 it was hypothesized that the effect of the previous decision would
be larger than that of the first decision, as recency effects were expected to be larger than primacy
effects. Due to the insignificance of the primacy effect, hypothesis 2C cannot be supported.
Second, the healthy peak (healthiest product bought so far) has a positive influence on the
health index of the first dairy product purchased (ß = 0,432, p = 0,001), thereby supporting hypothesis
2E and showing that healthy peaks do indeed have an influence. This implies that if the calorie index
of the healthiest product chosen so far decreases by 1 point, the calorie index of
the first dairy product choice decreases by 0,432 index points, making it healthier, and vice versa.
Third, the unhealthy peak (unhealthiest product bought so far) also has a positive influence on
the health index of the first dairy product purchased (ß = 0,039, p = 0,001), thereby rejecting hypothesis
2F. The result indicates that when, instead of this product, an even unhealthier product that is 1 index
point higher is purchased, the first dairy purchase is 0,039 index points unhealthier, and vice
versa. It was expected that, out of guilt, more extreme unhealthy peaks would make the dairy
purchase healthier, that is, lower its health index (Chen and Sengupta, 2014). However, this does
not turn out to be the case.
Finally, the trend of the health indices based on the sequence in which customers did their
groceries until the dairy section has a positive effect on the health index of the first dairy product
purchased (ß = 1,363, p = 0,001), thereby supporting hypothesis 2D. This effect indicates that when
customers make increasingly unhealthy decisions, such that the slope of the trend increases by 1
index point, the health index of the first dairy product chosen increases by 1,363. Conversely,
this means that when customers improve the healthiness of their shopping behavior
throughout the trip, this results in a dairy choice that is 1,363 index points healthier.
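The variables interpreted above can be made concrete with a small sketch that derives them from the sequence of health indices observed before the first dairy choice. This is only an illustration under stated assumptions: a lower (calorie-based) health index is taken to mean a healthier product, as in the interpretation above, and the trend is assumed to be the ordinary least squares slope of the health index on the position in the sequence. The function name, the dictionary keys and the example basket are hypothetical; volatility is left out because its exact operationalization is not restated in this section.

```python
def hsd_features(health_indices):
    """Summarize the purchases made before the focal (first dairy) choice."""
    n = len(health_indices)
    if n < 2:
        raise ValueError("need at least two prior purchases")
    positions = range(1, n + 1)
    pos_mean = sum(positions) / n
    hi_mean = sum(health_indices) / n
    # Assumption: trend = OLS slope of health index on sequence position.
    sxy = sum((p - pos_mean) * (h - hi_mean) for p, h in zip(positions, health_indices))
    sxx = sum((p - pos_mean) ** 2 for p in positions)
    return {
        "first": health_indices[0],             # primacy
        "previous": health_indices[-1],         # recency
        "healthy_peak": min(health_indices),    # lowest index = healthiest so far
        "unhealthy_peak": max(health_indices),  # highest index = unhealthiest so far
        "average": hi_mean,
        "trend": sxy / sxx,
    }


# Hypothetical basket of five purchases before the dairy section (not thesis data).
features = hsd_features([0.8, 0.5, 1.4, 1.1, 1.2])
print(features)
```

A positive trend value in this sketch corresponds to a sequence that becomes unhealthier as the trip progresses.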
Even though the health labels in the model were not significant, it is possible that there is a
minor effect of one of the labels that was cancelled out by the other, insignificant health
label. Again, an independent samples t-test was performed to identify whether one of the two health labels
affects the health index of the first dairy product differently. The results are summed up in Table 9
below.
Table 9 Results Independent Samples T-Test green and blue health labels, Model 2
Green Health Label
Variable | Dummy Green | Mean | St. Deviation
First Dairy Purchase | 0 | ,90794 | ,509980
First Dairy Purchase | 1 | ,88662 | ,503780
Independent Samples T-Test (F and first Significance: Levene’s Test for Equality of Variances)
Variances | F | Significance | T-value | DF | Significance
Equal variances assumed | ,185 | ,667 | 0,941 | 2001 | ,347
Blue Health Label
Variable | Dummy Blue | Mean | St. Deviation
First Dairy Purchase | 0 | ,88322 | ,472240
First Dairy Purchase | 1 | ,92543 | ,022235
Independent Samples T-Test (F and first Significance: Levene’s Test for Equality of Variances)
Variances | F | Significance | T-value | DF | Significance
Equal variances not assumed | 11,794 | ,001 | -1,643 | 1110,522 | ,101
Again, it becomes clear that there are no differences between the health labels individually and the
total effect of the health labels. In the test for the blue health labels equal variances could not be
assumed, due to the significance of Levene’s test for equality of variances. The difference between the
groups only just fails to reach significance. Had the difference been significant, the effect of the blue
health labels would have been negative, which would indicate that the health labels do not work and
even make future choices unhealthier. However, that cannot be stated, due to the insignificance of this test.
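For the case where equal variances cannot be assumed (the blue health label), the t-value and the non-integer degrees of freedom follow from the unequal-variance (Welch) form of the test. The sketch below shows that calculation; the function name and the two example samples are hypothetical and do not reproduce the thesis data, which were analyzed in SPSS.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """t-statistic and Welch-Satterthwaite degrees of freedom for unequal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = variance(sample_a) / n_a  # squared standard error of mean A
    se2_b = variance(sample_b) / n_b  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical health indices of first dairy purchases (not thesis data).
without_label = [1, 2, 3, 4, 5]
with_label = [2, 4, 6, 8, 10, 12]
t_value, dof = welch_t(without_label, with_label)
print(t_value, dof)
```

The degrees of freedom are generally not a whole number in this form of the test, which is why Table 9 reports a fractional DF for the blue label.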
4.3.4 PREDICTIVE VALIDITY
Before the prediction variables are computed and tested, the non-significant parameters should be
deleted. This results in a final model with four parameters: health index of the previous purchase
decision, the healthiest purchase made so far (healthy peak), the unhealthiest purchase made so far
(unhealthy peak), and the trend. Subsequently, the data is split into an estimation sample (all groceries
before the first dairy choice) and a validation sample (all first dairy choices). 70% of the sample is used
to form this estimation sample and the remaining 30% represents the validation sample. This division
was possible because the dataset is large enough to allow for these sizes. Subsequently, forecasted
values were estimated with SPSS and the quality of these forecasted values was tested. Two tests were
used for this evaluation: the MAPE and RAE. The MAPE (Mean Absolute Percentage Error) is a measure
that evaluates the predictive performance of the model and is used because it is
dimensionless and uses the absolute value of the error terms (Leeflang et al., 2015). The lower the
MAPE, the smaller the deviation of the predictions from the actual values. T denotes the total
sample size in the dataset and T* denotes the size of the estimation
sample that was used to generate the prediction results. Furthermore, y_t stands for the true health
index of the first dairy purchase, whereas ŷ_t indicates the predicted health index. For this model, the
MAPE is
\[ \mathrm{MAPE} = \frac{1}{T - T^{*}} \sum_{t=T^{*}+1}^{T} \frac{|y_t - \hat{y}_t|}{y_t} \times 100\% = 51{,}91\% \]
The MAPE is quite high at about 52%, which implies that the predicted values deviate on average by
roughly half of the actual values. It is therefore tested whether
this model still works better than a naïve model in which the health index of the first dairy
purchase of the previous customer is simply expected to be the same for the next customer (Leeflang et al.,
2015). To this end, the RAE is calculated, which weighs the prediction model against this naïve model.
For this model, the RAE is
\[ \mathrm{RAE} = \frac{\sum_{t=T^{*}+1}^{T} |y_t - \hat{y}_t|}{\sum_{t=T^{*}+1}^{T} |y_t - y_{t-1}|} = 0{,}69072 \]
The RAE is <1, which indicates that the regression model performs better than the naïve model. However,
given the high MAPE, this regression model is not suitable for
predicting customers’ healthy choices in the dairy section.
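The two validation measures above can be sketched in a few lines. The code follows the MAPE and RAE definitions given in this section, with the naïve forecast set to the previous observation; the function names and the example numbers are hypothetical and are not the thesis data. The MAPE is only defined when all actual values are non-zero.

```python
def mape(actual, predicted):
    """Mean absolute percentage error over the validation observations, in percent."""
    return 100.0 * sum(abs(y - f) / y for y, f in zip(actual, predicted)) / len(actual)


def rae(actual, predicted, naive):
    """Absolute model errors relative to the absolute errors of a naive forecast."""
    model_error = sum(abs(y - f) for y, f in zip(actual, predicted))
    naive_error = sum(abs(y - b) for y, b in zip(actual, naive))
    return model_error / naive_error


# Hypothetical validation sample (not thesis data).
last_estimation_value = 2.0          # y at t = T*, the last estimation observation
actual = [1.0, 2.0, 4.0]             # y_t for t = T*+1 .. T
predicted = [1.5, 2.0, 3.0]          # model forecasts for the same observations
# Naive forecast: each observation is predicted by the one before it (y_{t-1}).
naive = [last_estimation_value] + actual[:-1]

mape_value = mape(actual, predicted)
rae_value = rae(actual, predicted, naive)
print(mape_value, rae_value)
```

An RAE below 1 only says the model beats the naïve benchmark; as the results above show, the MAPE can still be too high for the forecasts to be useful.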
5. CONCLUSION
This study had two objectives. On the one hand, the goal was to discover what drives healthy shopping
behavior and whether HSD play a part in this, and on the other hand whether this healthy shopping
behavior follows a certain pattern and can therefore be forecasted. To address these objectives,
two models were built, for which a number of hypotheses were drafted. Table 10 below shows all
these hypotheses once again, together with the findings of the tests performed in Chap. 4.
Table 10 Overview of all hypotheses
Hyp. | Hypothesis | Support / Comment
H1A | The health index of the first purchase decision is positively related to the average health index of the basket | Yes
H1B | An improving trend of healthy choices has a positive influence on the average health index of the basket | Yes
H1C | Healthy peaks during the shopping trip have an influence on the average health index of the basket | Yes
H1D | Unhealthy peaks during the shopping trip have a negative influence on the average health index of the basket | No
H1E | The volatility of the health indices of previous purchases influences the average health index of the basket | Yes
H1F | General promotions have a positive influence on the average health index of the basket | No significant result
H1G | Health labels have an influence on the average health index of the basket | No significant result
H1H | Economic health interventions have an influence on the average health index of the basket | No significant result
H2A | The health index of the first purchase decision is positively related to health index of the next purchase | No significant result
H2B | The health index of the previous purchase decision is negatively related to the health index of the next purchase | Yes
H2C | The effect of the health index of the previous purchase decision on the health index of the next purchase decision is larger than the effect of the health index of the first purchase | Not tested due to insignificance 2A
H2D | An improving trend of healthy choices has a positive influence on the health index of the next purchase | Yes
H2E | Healthy peaks during the shopping trip have an influence on the health index of the next purchase | Yes
H2F | Unhealthy peaks during the shopping trip have a negative influence on the health index of the next purchase | No
H2G | The volatility of the health indices of previous purchases influences the health index of the next purchase | No significant result
H2H | The average health index of previous purchases has a positive influence on the health index of the next purchase | No significant result
H2I | General promotions have a positive influence on the health index of the next purchase | No significant result
H2J | Health labels have an influence on the health index of the next purchase | No significant result
H2K | Economic health interventions have an influence on the health index of the next purchase | No significant result
H3A | The effect of (healthy) promotions is larger than the effect of health labels in influencing healthy shopping decisions in Model 1 | Not tested due to insignificance 1F and 1G
H3B | The effect of (healthy) promotions is larger than the effect of health labels in influencing healthy shopping decisions in Model 2 | Not tested due to insignificance 2I and 2J
5.1 DISCUSSION
In this paragraph the outcomes of both models are discussed while taking into consideration the theory
developed in Chap. 2. Moreover, the research questions addressed in this study are answered in the
two paragraphs below.
5.1.1 HEALTHY SHOPPING DECISIONS
The first model was built to answer the first research question: How are healthy
shopping decisions influenced during a shopping trip? and its three sub-questions. The first of these
sub-questions asked what HSD look like in real purchases. The scanner data that was made available by
Plus consisted solely of 2.700 real shopping trips and was thus able to present a realistic picture. The
HSD curve in Fig. 4 (paragraph 4.1.1) shows that the healthiness of these purchases along the
sequence of the shopping trip is not flat. The curve suggests that resource depletion may play a role
in the consumer’s mind (Baumeister and Heatherton, 1996). As the curve indicates, the first half of the
trip (after the first 5 choices) consists of relatively healthy choices, whereas the second half is relatively
unhealthier. This is consistent with the idea that customers’ self-regulation resources are limited and thus,
at a certain point, become depleted (Baumeister and Heatherton, 1996). This
pattern poses an opportunity for grocery retailers to
help customers with this problem.
To answer the second sub-question, which looked into the effect of HSD on the average health
index of the basket, several tests were performed. The first purchase, healthiest product purchased,
unhealthiest product purchased and the volatility were the HSD-variables that affected the average
health index of the basket. Primacy effects were found to affect the average health index of the basket
positively, indicating that this decision in the sequence is remembered well by customers
(Montgomery and Unnava, 2009). Moreover, both healthy and unhealthy peaks have a positive
influence on the average health index of the basket. It was expected that moments of extremeness are
remembered better by customers (Montgomery and Unnava, 2009). However, the opposite effect was
expected for the unhealthy peak. Apparently feelings of guilt do not necessarily matter, which may be
because many customers simply do not yet care about shopping for healthy products (Chen
and Sengupta, 2014). They do not compensate for their unhealthy choices with healthy ones, which could
explain this phenomenon. The positive effects for the healthy peaks may be caused by an underlying
feeling of pride (Mukhopadhyay and Johar, 2007; Williams and DeSteno, 2008). Finally, volatility
also showed a positive influence on the average health index of the basket. The rationale behind
this effect is straightforward: when customers show more volatile behavior (the Durbin-Watson statistic
increases toward negative autocorrelation), the average health index of the basket becomes higher,
thereby making it unhealthier. This implies that when the choice behavior fluctuates substantially,
customers lose touch with their shopping goals, resulting in a negative influence on the healthiness.
Therefore, the answer to the second sub-question is that the established HSD drivers affect the
average health index of the shopping basket at the end of the trip.
Besides these HSD-related variables, three other drivers were included in the model to
explain healthy choices, which addresses the last sub-question. First, buying more promotions has no
significant effect on the healthiness of shopping baskets. It was expected that when customers buy
more promotions, for instance due to unplanned buying, they lose track of their healthy shopping goals,
but this did not turn out to be the case (Inman et al., 1990). Second, the total effect of health labels
did not significantly influence the average healthiness of the shopping basket. This finding is in line
with what the Consumentenbond has concluded from its own research (Consumentenbond, 2016).
It claims that ‘Het Vinkje’ does not have a positive influence on customers’ healthy shopping
behavior. This quantitative study found the same result as its panels indicated. There was also no
individual effect of either of the two health labels. Finally, the economic health
interventions that are used by Plus do not lead to differences in the healthiness of the shopping
baskets. This implies that, however much the campaign aims to influence customers’ healthy shopping
behavior, this does not happen. Even though this effect was not significant in the model, a closer look at
these economic health interventions showed that between regular and promotional weeks there is a
difference in the average health index of the basket. Surprisingly, it turns out that in regular weeks
customers have healthier end-of-trip baskets than during the promotion. Considering that the
health-focused promotion is aimed at helping customers buy more products in the potatoes, fruit and
vegetable section, it is striking that the baskets are unhealthier during these weeks. Perhaps this can
be explained by licensing effects, which cancel out the effect of a few healthy products purchased
through compensation with unhealthier products later on in the shopping trip (Waterlander et al.,
2013). The overall effect of the economic health intervention in the regression model, however, was
not significant. Therefore, the answer to the third sub-question is that none of the three identified
drivers significantly affects the average health index of the shopping basket at the end of the trip.
Considering these results, there are thus several drivers that influence healthy
shopping decisions during the shopping trip, but only in the form of HSD. In many cases, these effects
can be explained by underlying psychological processes of which customers may not even be aware.
5.1.2 DISCOVERING PATTERNS TO FORECAST DECISIONS
The second model has a different purpose than the first one, as it tries to find an answer to the second
research question: Can a pattern be distinguished in the scanner data that can forecast the healthiness
of customers’ purchase decisions? and a sub-question. This sub-question asks whether the drivers that
were used in Model 1 can be used in Model 2 to predict consumer behavior. Outcomes of tests
indicated that that was the case for some of the variables.
The primacy effects that were significant in the first model turned out not to have an impact
in this second model. Recency effects, by contrast, do affect the health index of the first dairy
purchase. However, this effect is negative, which could be explained by feelings of guilt of the customer
or licensing effects (Chen and Sengupta, 2014; Khan and Dhar, 2006). When a customer chooses a
relatively unhealthy product before going to the dairy section, this results in a relatively healthy dairy
purchase (due to guilt) and vice versa (licensing). It slowly becomes clear that psychological processes
in the customer’s mind play a bigger part than can be seen or perhaps even consciously felt during the
shopping trip.
Moreover, the healthiest product purchased (healthy peak) is found to be of
influence as well, as expected. Moments of extremeness stick in the mind and have an impact on
behavior (Montgomery and Unnava, 2009). The result indicates that when the calorie index of the
healthiest choice rises, the health index of the first dairy purchase rises as well. This is also
reflected in the trend throughout the shopping trip so far: a rising trend in terms of calories
results in an unhealthier dairy choice and vice versa. This effect is also shown in the HSD curves from
both Model 1 and Model 2. In the beginning of the trip (where the first dairy decision also takes place,
when taking the average of all customers), the behavior seems to be quite determined after the first
five purchases. This might be explained by customers’ negative time preference, where they prefer an
improving trend over time to a declining one (Loewenstein and Prelec, 1993). In the part of
the trip until the dairy section customers seem to keep their behavior under control, as there is a steady
trend towards healthier behavior. The aforementioned resource depletion does not play a part in
this part of the store (Baumeister and Heatherton, 1996). This is an important finding, as depletion
apparently starts playing a role in another section of the store. Besides the healthiest product purchased,
the unhealthiest product purchased (unhealthy peak) has a positive influence on the health index of
the first dairy product purchased. As in the first model, this is not in accordance with the
expectations, according to which unhealthy peaks would make the next choice healthier due to
feelings of guilt (Chen and Sengupta, 2014). It is more in accordance with the described
‘What-The-Hell-effect’, where customers continue to choose unhealthy foods after a first failure (Cochran and
Tesser, 1996). The unhealthiest product bought does not necessarily have to be the first unhealthy
product chosen, but it does affect the healthiness of the first dairy purchase in a negative way.
Again, the three separate drivers that were distinguished (promotions, health labels and
economic health intervention) all proved insignificant in determining the level of the health
index of the first dairy purchase. These findings are similar to the findings in Model 1, which were
discussed in the previous paragraph.
5.2 LIMITATIONS AND FURTHER RESEARCH
Even though this research has provided a lot of insight into the underexplored area of basket-level
scanner data, there are a number of limitations to the study. First of all, one of the larger limitations is
that the health index that is used to assess the healthiness of every purchase decision is solely based
on the amount of calories that the product contains. This information was easily available for each
product and therefore the health index was limited to the amount of calories. However, to get a
more complete view of the real healthiness of a product, more information such as the amount of sugar,
fat or salt could be used as well.
Second, the dataset was limited in several ways. For starters, there was no information available
about the customers. This made it impossible to distinguish behavior between different customers, so
no insight could be derived into, for instance, differences between men and women. Besides,
there was no information available on what customers’ shopping goals were. Customers who already
intend to shop for healthy groceries and customers who do not care about this will most likely have
very different baskets. If it were possible to distinguish between them, it could become clear
how their behaviors differ and how grocery retailers can help everyone towards a healthier lifestyle.
Another limitation of the dataset lies in the samples that needed to be drawn before the
analyses were performed. Due to a large amount of missing calorie information, it was necessary to
draw these samples. Therefore, not all of the available customer data could be used. Besides, the data
was collected during only 3 weeks in 3 different stores. To acquire more in-depth insights, it would be
interesting to include additional weeks and look at differences that could result from, for instance,
the time of the year. It seems plausible that customers’ shopping behaviors are different during the
winter than in summer time. If that is the case, grocery retailers could get a more specific idea
of how to target customers differently throughout the year.
The model created in this study did not predict the next choice in the supermarket
well. One solution for this could be to include more possible drivers of the next purchase made or
to include more customer information. Besides, it is possible that the dairy section is not the most
appropriate place in the grocery store to predict the healthiness of the next purchase. However, it
is also possible that consumer behavior simply cannot be modelled in this way. More
research with this type of basket-level scanner data is thus needed.
Besides, customers seem to have the tendency to start their trip with a relatively healthy trend
until approximately the seventeenth decision. After this point, their decision behavior switches
towards relatively unhealthier choices. It is still unknown what causes this switch and therefore more
research needs to be performed to be able to forecast in which section or after how many choices
customers pass this tipping point.
Finally, the psychological theories discussed, such as licensing, were reflected in the patterns that
the analyses discovered (Khan and Dhar, 2006). The fact that this study used real-life data makes this
finding insightful, since it suggests that such effects exist outside the laboratory. This implies that this
type of data should be used further in the future. Another advantage of this kind of data is that the
customers who did the groceries were not aware that their data was going to be used for this research.
In a way, they might even have made decisions that represent their real behavior more truly than if they
had had to pass a cash register, since the person behind this register would see what customers buy.
Therefore, it might also be an interesting future study to look at the differences between the healthiness of
baskets of customers who used a hand scanner versus customers who paid for their groceries at a register.
5.3 MANAGERIAL IMPLICATIONS
In the Dutch grocery environment, the importance of corporate social responsibility is growing and
Plus is a good example of a supermarket that is trying to help customers do their groceries
in a healthier, more responsible way. With its promotional campaign it aims to help customers
purchase healthier products. However, the results of this study indicate that the
differences in healthiness of the baskets between regular and promotional weeks are the
opposite of what was expected: customers have unhealthier baskets during the promotion week
compared to the weeks around the promotion. It is therefore likely that customers compensate for
their healthier choices with unhealthier ones, which gives an unwanted result. Plus might want to
change its promotional campaign slightly. The results also indicate, for instance,
that primacy effects have an influence on the healthiness of shopping baskets. This implies that
advertisements for healthy products could perhaps be placed at the beginning of the store, since
customers seem to remember this well throughout the trip. A display with healthy recipes,
clear posters showing which healthy promotions can be bought, or in-store demonstrations of how to make
healthy food with the promotions of that week could be used here. These are of course simply
some examples.
Moreover, the effect of the health labels was not found to be significant, which supports the
research of the Consumentenbond, which is already trying to get the ‘Vinkje’ out of the supermarkets.
These specific health labels are probably too confusing. If this health label were to disappear, it could be
replaced with a new health label. There is an opportunity in the Dutch grocery market to
introduce a new type of label or to improve the understanding of the current health labels.
Finally, HSD seem to exist and evolve as the shopping trip continues and a pattern can be
distinguished. Customers seem to start their trips relatively healthy, except for the first 5 purchases.
The research has suggested that the first purchase a person makes influences the healthiness of the
basket. Given that the first few purchases show a negative trend towards unhealthier products, the
proposed implication of increasing marketing at the beginning of the store might also work to tackle
this phenomenon.
Moreover, after about 17 choices this trend switches to relatively unhealthy choices for a long
time. It looks like customers are trying to purchase relatively healthy products, but seem to struggle as
their shopping trip evolves, which is called depletion of the self-regulation resource. This poses an
opportunity for grocery retailers to help their customers. Various studies have indicated ways to
help overcome this depletion, such as watching a comedy movie halfway, drinking a glucose drink or
receiving a surprise gift. Especially this last one could be used by grocery retailers in a trial. An example
of how this could work is by using the hand scanner that the customer carries. For instance, after choosing
seventeen different products a message pops up saying the customer gets a 50% discount on a selected
healthy product if they purchase it in this trip. This could cause their self-regulation muscle to recharge.
5.4 FINAL CONCLUSION
This study has resulted in several implications for Dutch grocery retailers. The low predictive
validity of the model indicates that, even though drivers of future behavior can be
distinguished, they do not provide a strong model for prediction. The rollercoaster-like patterns that
were established by the HSD nevertheless give reason to believe that drivers of future behavior can be
found. The basket-level data provided interesting insights into the shopping behavior of customers, and
many new insights can be drawn from this type of data. Future research based on the same type
of data can help to understand healthy shopping dynamics even better and to positively influence the
healthiness of consumers’ shopping behavior.
REFERENCES
An, R. (2012). “Effectiveness of subsidies in promoting healthy food purchases and consumption: A
review of field experiments”, Public Health Nutrition 16 (7), 1215-1228.
Anderson, C.J. (2003). “The psychology of doing nothing: Forms of decision avoidance result from
reason and emotion”, Psychological Bulletin 129 (1), 139-167.
Andreyeva, T., M.W. Long and K.D. Brownell (2010). “The impact of food prices on consumption: A
systematic review of research on the price elasticity of demand for food”, American Journal of
Public Health 100 (2), 216-222.
Arnold, M.J. and K.E. Reynolds (2003). “Hedonic shopping motivations”, Journal of Retailing 79, 77-95.
Asfaw, A. (2011). “Does consumption of processed foods explain disparities in the body weight of
individuals? The case of Guatemala”, Health Economics 20, 184-195.
Baumeister, R.F. and T.F. Heatherton (1996). “Self-regulation failure: an overview”, Psychological
Inquiry 7 (1), 1-15.
Burton, S., E.H. Creyer, J. Kees and K. Huggins (2006). “Attacking the obesity epidemic: The potential
health benefits of providing nutrition information in restaurants”, American Journal of Public
Health 96 (9), 1669-1675.
Cannuscio, C.C., A. Hillier, A. Karpyn and K. Glanz (2014). “The social dynamics of healthy food shopping
and store choice in an urban environment”, Social Science & Medicine 122, 13-20.
Centraal Bureau voor de Statistiek (2014). “Obesity increases risk of chronic disorders” [online].
Accessed on the 16th of February 2016. http://www.cbs.nl/en-GB/menu/themas/gezondheid-
welzijn/publicaties/artikelen/archief/2014/2014-3939-wm.htm.
Chen, F. and J. Sengupta (2014). “Forced to be bad: The positive impact of low-autonomy vice
consumption on consumer vitality”, Journal of Consumer Research 41 (4), 1089-1107.
Cochran, W. and A. Tesser (1996). “The “What the Hell” effect: Some effects of goal proximity and goal
framing on performance”, In: Martin, L.L. and A. Tesser (1996). “Striving and feeling: Interactions
among goals, affect and self-regulation”, 99-120.
Consumentenbond (2016). “Waarom de vinkjes van de verpakkingen af moeten” [online]. Accessed on
the 30th of May 2016. http://www.consumentenbond.nl/campagnes/vinkjes/waarom-de-
vinkjes-van-de-verpakkingen-af-moeten/.
Desai, K.K. and S. Ratneshwar (2003). “Consumer perceptions of product variants positioned on
atypical attributes”, Journal of the Academy of Marketing Science 31 (1), 22-35.
DeWitte, S., S. Bruyneel and K. Geyskens (2009). “Self-regulating enhances self-regulation in
subsequent consumer decisions involving similar response conflicts”, Journal of Consumer
Research 36 (3), 394-405.
53
Dhar, R., J. Huber and U. Khan (2007). “The shopping momentum effect”, Journal of Marketing
Research 44, 270-378.
Fama, E.F. (1965). “Random walks in stock market prices”, Financial Analysts Journal 21 (5), 55-59.
Gailliot, M.T., R.F. Baumeister, C.N. DeWall, J.K. Maner, E.A. Plant, D.M. Tice, L.E. Brewer and B.J.
Schmeichel (2007). “Self-control relies on glucose as a limited energy source: Willpower is more
than a metaphor”, Journal of Personality and Social Psychology 92 (2), 325-336.
GfK (2016). “GfK MVO Rapport”.
Giesen, J.C.A.H., C.R. Payne, R.C. Havermans and A. Jansen (2011). “Exploring how calorie information
and taxes on high-calorie foods influence lunch decisions”, The American Journal of Clinical
Nutrition 93, 689-694.
Gilbride, T.J., J.J. Inman and K.M. Stilley (2015). “The role of within-trip dynamics in unplanned versus
planned purchase behavior", Journal of Marketing 79, 57-73.
Glanz, K., M.D.M. Bader and S. Iyer (2012). “Retail grocery store marketing strategies and obesity. An
integrative review”, American Journal of Preventive Medicine 42 (5), 503-512.
Gollwitzer, P.M., H. Heckhausen and H. Ratajczak (1990). “From wighing to willing: Approaching a
change decision through pre- or postdecisional mentation”, Organizational Behavior and Human
Decision Processes 45, 41-65.
Greene, R.L. (1986). “Sources of recency effect in free recall”, Psychological Bulletin 99 (2), 221-228.
Inman, J.J., L. McAlister and W.D Hoyer (1990). “Promotion signal: Proxy for a price cut?”, The Journal
of Consumer Research 17 (1), 74-81.
Kahneman, D., Fredrickson, B.L., Schreiber, C.A. and Redelmeier, D.A. (1993). “When more pain is
preferred to less: Adding a better end”, Psychological Science 4 (6), 401-405.
Kahn, B.E., and D.C. Schmittlein (1992). “The relationship between purchases made on promotion and
shopping trip behavior”, Journal of Retailing 68 (3), 294-315.
Khan, U. and R. Dhar (2006). “Licensing effect in consumer choice”, Journal of Marketing Research 43,
259-266.
Lee, L. and D. Ariely (2006). “Shopping goals, goal concreteness, and conditional promotions”, Journal
of Consumer Research 33, 60-70.
Leeflang, P.S.H., J.E. Wieringa, T.H.A. Bijmolt and K.H. Pauwels (2015). “Modeling markets. Analyzing
marketing phenomena and improving marketing decision making”. Springer Science and
Business Media, New York, USA.
Loewenstein, G.E. and D. Prelec (1993). “Preferences for sequences of outcomes”, Psychological
Review 100 (1), 91-108.
Malhotra, N.K. (2009). “Marketing research. An Applied Orientation”. 6th edition, Prentice Hall, New
Jersey, USA.
54
Montgomery, N.V. and H.R. Unnava (2009). “Temporal sequence effects: A memory framework”,
Journal of Consumer Research 36 (1), 83-92.
Mukhopadhyay, A. and G.V. Johar (2007). “Tempted or not? The effect of recent purchase history on
responses to affective advertising”, Journal of Consumer Research 33.
Ng, M., T. Fleming, M. Robinson, … , E. Gakidou (2014). “Global, regional, and national prevalence of
overweight and obesity in children and adults during 1980-2013: A systematic analysis for the
Global Burden of Disease Study 2013”, Lancet 384, 766-781.
Ordabayeva. N. and P. Chandon (2013). “Predicting and managing consumers’ package size
impressions”, Journal of Marketing 77, 123-137.
Payne, C.R., M. Niculescu, D.R. Just and M.P. Kelly (2014). “Shopper marketing nutrition interventions”,
Psychology & Behavior 136, 111-120.
Plus (2015). “Commercieel Jaarplan 2015”.
Plus (2016a). “Commercieel Jaarplan 2016”.
Plus (2016b). “Rapportage Het Vinkje 2016”.
Shehu, E., T.H.A. Bijmolt and M. Clement (2016). “Effects of Likability Dynamics on Consumers’
Intention to Share Online Video Advertisements”, Journal of Interactive Marketing 35, 27-43.
Teixeira, T.S., M. Wedel and R. Pieters (2012). “Moment-to-moment optimal branding in TV
commercials: Preventing avoidance by pulsing”, Marketing Science 29 (5), 783-804.
Tice, D.M., R.F. Baumeister, D. Shmueli and M. Muraven (2007). “Restoring the self: Positive affect
helps improve self-regulation following ego depletion”, Journal of Experimental Social
Psychology 43, 379-384.
Van Ittersum, K. and T.H.A. Bijmolt (2015). “Healthy shopping dynamics: The origin of healthy shopping
baskets”, forthcoming.
Vinkje (2016). Accessed on the 26th of February 2016. http://www.hetvinkje.nl/over-het-vinkje/.
Voedingscentrum (2016). Accessed on the 24th fo May 2016. http://www.voedingscentrum.nl/
encyclopedie/calcium.aspx
Wansink, B. (1996). “Can package size accelerate usage volume?”, Journal of Marketing 60, 1-14.
Wansink, B. and P. Chandon (2006). “Can “low-fat” nutriction labels lead to obesity?”, Journal of
Marketing Research 43, 605-617.
Waterlander, W.E., I.H.M. Steenhuis, M.R. de Boer, A.J. Schuit and J.C. Seidell (2012). “Introducing
taxes, subsidies or both: The effects of various food pricing strategies in a web-based
supermarket randomized trial”, Preventive Medicine 54, 323-330.
Waterlander, W.E., I.H.M. Steenhuis, M.R. de Boer, A.J. Schuit and J.C. Seidell (2013). “Effects of
different discount levels on healthy products coupled with a healthy choice label, special offer
55
label or both: Results from a web-based supermarket experiment”, International Journal of
Behavioral Nutrition and Physical Activity 10, 1-8.
Williams, L.A. and D. DeSteno (2008). “Pride and perseverance: The motivational role of pride”, Journal
of Personality and Social Psychology 94 (6), 1007-1017.
World Health Organization (2015). “Obesity and overweight, fact sheet 311” [online]. Accessed on the
16th of February 2016. http://www.who.int/mediacentre/factsheets/fs311/en/.
56
APPENDICES
APPENDIX 1: DIFFERENCES BETWEEN STORES
Descriptive
Store ID  Mean    Std. Deviation
220       0,6211  0,22802
722       0,6084  0,21733
895       0,6180  0,25069
One-Way ANOVA test results
Source          Sum of Squares  DF    Mean Square  F      Significance
Between Groups  ,127            2     ,063         1,172  ,310
Within Groups   145,557         2694  ,054
Total           145,684         2696
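The F statistic in the table above can be rechecked from the reported sums of squares and degrees of freedom. The following is a minimal sketch in Python, not part of the original analysis; the variable names are illustrative, and the inputs are the rounded values printed in the table, so the result only agrees with the reported F up to rounding.

```python
# Values copied from the One-Way ANOVA table (decimal commas written as points).
n_baskets = 2697   # implied by the total DF of 2696
n_stores = 3       # stores 220, 722 and 895

ss_between = 0.127
ss_within = 145.557

# Degrees of freedom follow from the number of groups and observations.
df_between = n_stores - 1
df_within = n_baskets - n_stores

# Mean squares are sums of squares divided by their degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# The F statistic is the ratio of the two mean squares.
f_ratio = ms_between / ms_within

print(f"DF between = {df_between}, DF within = {df_within}")
print(f"MS between = {ms_between:.4f}, MS within = {ms_within:.4f}")
print(f"F = {f_ratio:.3f}")
```

The recomputed F differs from the reported 1,172 only in the third decimal, which is consistent with the sums of squares having been rounded before printing.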
Regression results of 3 stores
Variable                      Store 220 Beta  Sig.    Store 722 Beta  Sig.    Store 895 Beta  Sig.
Constant                      ,158            ,001**  ,223            ,001**  ,222            ,001**
First                         ,045            ,001**  ,110            ,001**  ,039            ,007**
Healthy Peak                  ,593            ,001**  ,484            ,001**  ,470            ,001**
Unhealthy Peak                ,138            ,001**  ,087            ,001**  ,116            ,001**
Volatility                    -,027           ,024*   -,021           ,098~   -,030           ,030*
Trend                         -,075           ,626    ,067            ,788    ,247            ,200
Promotions                    ,085            ,151    ,190            ,004**  ,307            ,001**
Health Labels                 ,331            ,001**  ,108            ,177    ,232            ,003**
Economic Health Intervention  ,024            ,078~   -,005           ,715    ,002            ,880
~ indicates significance at the 10% level (2-tailed); * indicates significance at the 5% level (2-tailed); ** indicates significance at the 1% level (2-tailed)
APPENDIX 2: PROMOTION WEEK
(Example promotion week deleted)
APPENDIX 3: MODEL ASSUMPTIONS MODEL 1
3.1 MULTICOLLINEARITY
Variable Tolerance VIF
First HI ,990 1,010
Healthy Peak ,946 1,057
Unhealthy Peak ,955 1,047
Volatility ,985 1,015
Trend ,960 1,041
Promotions ,972 1,029
Health Labels ,995 1,005
Economic Health Intervention ,979 1,022
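The two columns of the multicollinearity table are linked by VIF = 1 / Tolerance, and Tolerance = 1 - R², where R² comes from regressing each predictor on the remaining predictors. The sketch below, added for illustration only, recomputes the VIF values from the tolerances printed above; the dictionary names are assumptions and the inputs are the rounded table values.

```python
# Tolerance and VIF values copied from the table (decimal commas written as points).
tolerance = {
    "First HI": 0.990,
    "Healthy Peak": 0.946,
    "Unhealthy Peak": 0.955,
    "Volatility": 0.985,
    "Trend": 0.960,
    "Promotions": 0.972,
    "Health Labels": 0.995,
    "Economic Health Intervention": 0.979,
}
reported_vif = {
    "First HI": 1.010,
    "Healthy Peak": 1.057,
    "Unhealthy Peak": 1.047,
    "Volatility": 1.015,
    "Trend": 1.041,
    "Promotions": 1.029,
    "Health Labels": 1.005,
    "Economic Health Intervention": 1.022,
}

for name, tol in tolerance.items():
    vif = 1.0 / tol          # variance inflation factor
    r_squared = 1.0 - tol    # share of the predictor explained by the others
    print(f"{name:30s} VIF = {vif:.3f} (reported {reported_vif[name]:.3f}), "
          f"R^2 = {r_squared:.3f}")

# Largest gap between recomputed and reported VIF; small gaps reflect rounding.
max_gap = max(abs(1.0 / tolerance[k] - reported_vif[k]) for k in tolerance)
print(f"Largest difference: {max_gap:.4f}")
```

Every recomputed value matches the table to within rounding, and no predictor shares more than roughly 5% of its variance with the others.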
3.2 NORMALITY
Kolmogorov-Smirnov and Shapiro-Wilk Tests for normality
Kolmogorov-Smirnov Shapiro-Wilk
Statistic DF Sig. Statistic DF Sig.
Unstandardized Residual 0,041 2697 0,000** 0,964 2697 0,000**
** indicates significance at the 1% level (2-tailed)
Plot of distribution of unstandardized residuals
Regression and Bootstrap results
Regression Output Beta Std. Error T-value Significance
Constant ,654 ,010 63,705 ,001**
First HI ,043 ,004 9,602 ,001**
Healthy Peak ,305 ,011 27,012 ,001**
Unhealthy Peak ,075 ,001 57,379 ,001**
Volatility ,018 ,004 4,403 ,001**
Trend ,360 ,061 5,897 ,001**
Promotions ,017 ,021 ,837 ,403
Health Labels ,021 ,022 ,967 ,334
Economic Health Intervention ,001 ,005 ,322 ,747
Bootstrap Output Beta Bias Std. Error Significance
Constant ,654 ,000 ,013 ,001**
First HI ,043 3,143E-5 ,007 ,001**
Healthy Peak ,305 ,000 ,012 ,001**
Unhealthy Peak ,075 -5,811E-5 ,002 ,001**
Volatility ,018 3,651E-5 ,004 ,001**
Trend ,360 -,004 ,119 ,004**
Promotions ,017 ,001 ,022 ,402
Health Labels ,021 -,001 ,023 ,352
Economic Health Intervention ,001 4,581E-5 ,005 ,755
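The Bias and Std. Error columns of the bootstrap output can be illustrated with a small example. The sketch below assumes a simple case-resampling bootstrap (the resampling scheme is not specified in this appendix) and uses synthetic data with skewed errors, not the Plus scanner data; the function names, the single predictor and all numeric settings are hypothetical.

```python
import random
from statistics import mean, stdev


def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def bootstrap_slope(xs, ys, n_boot=1000, seed=0):
    """Return (estimate, bias, std_error) from a case-resampling bootstrap."""
    rng = random.Random(seed)
    n = len(xs)
    estimate = ols_slope(xs, ys)
    draws = []
    for _ in range(n_boot):
        # Resample whole (x, y) cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(ols_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    bias = mean(draws) - estimate   # mean bootstrap estimate minus original
    std_error = stdev(draws)        # spread of the bootstrap estimates
    return estimate, bias, std_error


# Synthetic data (hypothetical): skewed, mean-zero errors around a linear trend.
data_rng = random.Random(42)
xs = [data_rng.random() for _ in range(200)]
ys = [0.6 + 0.3 * x + (data_rng.expovariate(10) - 0.1) for x in xs]

estimate, bias, std_error = bootstrap_slope(xs, ys)
print(f"slope = {estimate:.3f}, bias = {bias:.5f}, bootstrap SE = {std_error:.3f}")
```

Because the bootstrap standard error does not rely on normally distributed residuals, comparing it with the regression standard error is a reasonable check when the normality tests above reject normality.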
3.3 HETEROSCEDASTICITY
Levene’s Test for Homogeneity of Variance
Levene Statistic Df1 Df2 Significance
2,404 1 2695 0,121
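The degrees of freedom in the table above (Df1 = 1, Df2 = 2695) correspond to a comparison of two groups over 2697 observations, since Df1 = k - 1 and Df2 = N - k. The sketch below shows how a mean-centred Levene statistic is computed; it is an illustration under assumptions, the two groups are made-up values, and the grouping variable used in the thesis is not stated here.

```python
from statistics import mean


def levene_statistic(groups):
    """Mean-centred Levene statistic W with its degrees of freedom (df1, df2)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Absolute deviation of every observation from its own group mean.
    z = []
    for g in groups:
        g_mean = mean(g)
        z.append([abs(y - g_mean) for y in g])

    z_group_means = [mean(zi) for zi in z]
    z_grand_mean = sum(sum(zi) for zi in z) / n_total

    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(zi) * (m - z_grand_mean) ** 2
                  for zi, m in zip(z, z_group_means))
    within = sum((v - m) ** 2
                 for zi, m in zip(z, z_group_means) for v in zi)

    df1, df2 = k - 1, n_total - k
    w = (between / df1) / (within / df2)
    return w, df1, df2


# Hypothetical example: group_a is clearly more spread out than group_b.
group_a = [0.2, 0.4, 0.6, 0.8]
group_b = [0.45, 0.50, 0.50, 0.55]
w, df1, df2 = levene_statistic([group_a, group_b])
print(f"W = {w:.3f}, df1 = {df1}, df2 = {df2}")
```

A large W relative to the F distribution with (df1, df2) degrees of freedom indicates unequal variances; the reported significance of 0,121 means that equal variances are not rejected at the 5% level.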
APPENDIX 4: MODEL ASSUMPTIONS MODEL 2
4.1 MULTICOLLINEARITY
Variable Tolerance VIF
First ,935 1,069
Previous ,884 1,132
Healthy Peak ,885 1,130
Unhealthy Peak ,929 1,077
Volatility ,986 1,014
Average HI Basket ,885 1,130
Trend ,946 1,057
Promotions ,959 1,043
Health Labels ,981 1,019
Economic Health Intervention ,990 1,010
4.2 NORMALITY
Kolmogorov-Smirnov and Shapiro-Wilk Tests for normality
Kolmogorov-Smirnov Shapiro-Wilk
Statistic DF Sig. Statistic DF Sig.
Unstandardized Residual 0,096 1970 0,000** 0,873 1970 0,000**
** indicates significance at the 1% level (2-tailed)
Plot of distribution of unstandardized residuals
Regression and Bootstrap results
Regression Output Beta Std. Error T-value Significance
Constant ,667 ,075 8,916 ,001**
First ,016 ,034 ,454 ,650
Previous -,033 ,016 -2,133 ,033*
Healthy Peak ,432 ,045 9,690 ,001**
Unhealthy Peak ,039 ,007 5,301 ,001**
Volatility -,009 ,022 -,399 ,690
Average HI Basket ,021 ,054 ,387 ,699
Trend 1,363 ,336 4,051 ,001**
Promotions ,100 ,074 1,340 ,180
Health Labels -,003 ,082 -,032 ,975
Economic Health Intervention -,011 ,024 -,452 ,651
Bootstrap Output Beta Bias Std. Error Significance
Constant ,667 -,002 ,079 ,001**
First ,016 ,002 ,037 ,665
Previous -,033 -,001 ,013 ,015*
Healthy Peak ,432 -,006 ,052 ,001**
Unhealthy Peak ,039 ,000 ,007 ,001**
Volatility -,009 ,001 ,021 ,670
Average HI Basket ,021 ,001 ,054 ,675
Trend 1,363 ,020 ,408 ,002**
Promotions ,100 ,001 ,074 ,184
Health Labels -,003 -,001 ,078 ,969
Economic Health Intervention -,011 -,001 ,024 ,656
4.3 HETEROSCEDASTICITY
Levene’s Test for Homogeneity of Variance
Levene Statistic Df1 Df2 Significance
0,850 1 1968 0,357