STUDY OF FACTORS AFFECTING THE BUYING BEHAVIOUR OF PEPSI
SUBMITTED BY
LIJO JOHN
AJURAM K SALIM
M.TECH
IEM,MED
NIT CALICUT
1. INTRODUCTION
In most marketing research applications, some form of survey is the method used, whether it is conducted by mail, by telephone, in person or, more recently, over the internet. The other widely used class of study is experimentation. Just as in a laboratory, we manipulate some variables and analyze the resulting variation in other variables. Experiments have a very wide range of application in marketing research. Whenever a marketing mix variable such as price, a specific promotion or a type of distribution is changed, we want to know its effect. An experiment can be run with one dependent variable and either one independent variable or more than one independent variable.
An experiment with one independent variable is analysed by one-way ANOVA. ANOVA stands for Analysis of Variance, a generic term for studying the effect of one or more factors on a single dependent variable. When more than one dependent variable is studied, MANOVA (Multivariate Analysis of Variance) is used. In this mini project we have used ANOVA extensively to study the effects of different independent variables on the liking of Pepsi.
1.1 Motivation
Pepsi is one of the major companies in the world, with an annual turnover of more than a billion dollars. The brand Pepsi has become a household name across the globe. The major product offered by the company is a beverage marketed as Pepsi itself. Keeping the popularity of Pepsi in mind, we have made a modest effort to study the various factors which lead to the liking of the beverage. Out of many factors, we have limited ourselves to seven major factors, namely SKU, gender, age group, income, nature of drink, peer and place. We have tried to analyze the effect of each factor on the liking of Pepsi.
The results of this study will be helpful in understanding market behaviour and in making strategies based on the factors which affect the liking of Pepsi the most.
1.2 Statement of problem
To study the effects of different factors on the liking of the brand Pepsi, and thereby to conclude which factors should be given more importance when making future strategies.
2. Analysis
A sample of 100 respondents was selected from the population consisting of students of NIT Calicut. Each respondent was asked to rate Pepsi on a 10 point scale. The responses of the 100 respondents are summarized in the table in the appendix.
The data collected gave us the following information about each respondent.
1. Gender – male or female
2. Age group – less than 18, 18-23, 24-30, above 30
3. Income – less than 50,000, 50,001-150,000, above 150,000
4. Nature – normal or chilled
5. Peers – with friends, with boyfriend or girlfriend, with family or alone
6. Place – office/college, college canteen, hangout places like theatres, home
We have considered different stock keeping units (SKUs) for Pepsi, viz. Pepsi My Can, 600 ml, 1 lit, 1.5 lit and 2 lit.
All respondents were asked to rate how satisfied they feel after consuming Pepsi from any one of the SKUs. This was done to check whether the source has any effect on the satisfaction derived from the product. Other data were also collected to find out whether factors such as gender, age and income have any effect on the satisfaction derived.
The first analysis we conducted on the collected data was to find out the nature of the data. We used pie charts to categorize the data. These indicated that, for almost all factors (gender, age, income, nature, peers and place), the categories had approximately equal numbers of respondents. Approximately equal numbers of people were also asked to rate the satisfaction they derive from consuming Pepsi from the different SKUs.
The next step in the analysis was to find out which factors had an effect on the rating given by the respondents for the level of satisfaction. This was found by using ANOVA. The rating was subjected to one-way ANOVA against each of the independent variables in turn.
Furthermore, we carried out multivariate analysis to find out which factors among age, gender and income had an effect on the rating when combined with the SKU.
Finally, we tried to group the respondents into clusters to study the nature of their behaviour and the other factors that define each cluster.
For the analyses stated above we used the statistical packages SPSS and Minitab.
3. Description of various perspectives
In this section we briefly present the results obtained by the various analyses. First we evaluate the nature of the data collected. This has been done with the help of pie charts.
Fig 1 : stock keeping unit
Here we can see that 14%, 22%, 25%, 18% and 21% of the respondents were asked to rate their level of satisfaction for My Can, 600 ml, 1 lit, 1.5 lit and 2 lit respectively. The respondents were distributed so that all the SKUs were rated by roughly equal numbers of people.
Next we can see how the respondents are divided among the categories of the other factors.
Fig 2: gender
The sample contains approximately the same number of male and female respondents. This has been done to avoid any kind of bias.
The next figure shows the share of respondents who fell into the different age groups. It shows that 24%, 27%, 28% and 21% of respondents fall in the age groups of less than 18, 18-23, 24-30 and above 30 respectively.
Fig 3 : age group
Fig 4: income
Figure 4 represents the information about the percentage of different income groups present in the data.
Fig 5: nature of drink
Figure 5 shows that 93% of respondents like Pepsi when it is chilled and just 7% like it when it is not chilled.
Figure 6 shows that 72% of respondents like to enjoy Pepsi with friends, compared to just 19% who enjoy it the most with their boyfriends or girlfriends; 6% like to have Pepsi with their family, while a meagre 3% like to enjoy Pepsi alone.
Figure 7 shows the distribution of the respondents by the place where they consume Pepsi. 49% of the respondents felt that they enjoy Pepsi the most when they are at hangout centres like malls, theatres, etc., whereas 27% were of the opinion that they enjoy Pepsi the most when they are at their college canteen. 17% and 7% were of the opinion that they enjoy Pepsi the most when they are at office/college and at home respectively.
Thus by analysing the different charts we can see that the data give a reasonably good estimate of the behaviour of the population. But it is too early to draw any conclusion without further analysis. The preliminary analysis does not give us any detailed idea about the relationships these variables have with each other.
Thus, before making any further comments, the results of the other tests will be analysed too.
Fig 6: peer
Fig 7: place
In the second stage of analysis, we tried to find the effects of the different factors on the rating of the different SKUs. This was done with the help of ANOVA. In ANOVA the null hypothesis is that there is no variation in the population means, i.e. μ1 = μ2 = ... = μn. The null hypotheses that we have stated here are listed below.
Null Hypotheses:
a. There is no significant difference in rating for the different SKU’s kept by Pepsi on the level of satisfaction derived by the customer.
b. There is no significant difference in rating due to gender on the level of satisfaction derived by the customer.
c. There is no significant difference in rating due to age group on the level of satisfaction derived by the customer.
d. There is no significant difference in rating due to the income on the level of satisfaction derived by the customer.
e. There is no significant difference in rating due to the nature of drink on the level of satisfaction derived by the customer.
f. There is no significant difference in rating due to the peer on the level of satisfaction derived by the customer.
g. There is no significant difference in rating due to the place on the level of satisfaction derived by the customer.
These null hypotheses were tested by one-way ANOVA using SPSS. The results are tabulated below.
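As a rough illustration of what a one-way ANOVA computes, the sketch below derives the between-groups and within-groups sums of squares and the F ratio from raw ratings. It is not the SPSS procedure itself; the function name `one_way_anova` and the toy ratings are assumptions made for this example and are not the survey data.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, F).

    `groups` is a list of lists, one list of ratings per factor level.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual ratings around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat

# Hypothetical ratings for three levels of a factor (not the survey data)
toy_groups = [[7, 8, 9], [4, 5, 6], [1, 2, 3]]
ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(toy_groups)
print(ss_b, ss_w, df_b, df_w, f_stat)
```

A large F means the group means differ by more than the spread within groups would suggest; the Sig. column in the tables is the probability of an F at least this large under the null hypothesis.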
Table 1: ANOVA results for SKU
ANOVA
rating
Sum of Squares df Mean Square F Sig.
Between Groups 9.018 4 2.255 .300 .037
Within Groups 713.342 95 7.509
Total 722.360 99
Table 2: ANOVA results for gender
ANOVA
rating
Sum of Squares df Mean Square F Sig.
Between Groups 24.891 1 24.891 3.463 .066
Within Groups 697.291 97 7.189
Total 722.182 98
Table 3: ANOVA results for age group
ANOVA
rating
Sum of Squares df Mean Square F Sig.
Between Groups 9.325 2 4.663 .634 .533
Within Groups 713.035 97 7.351
Total 722.360 99
Table 4: ANOVA results for income
ANOVA
rating
Sum of Squares df Mean Square F Sig.
Between Groups 1.534 2 .767 .103 .902
Within Groups 720.826 97 7.431
Total 722.360 99
Table 5: ANOVA results for nature
ANOVA
rating
Sum of Squares df Mean Square F Sig.
Between Groups 1.328 1 1.328 .180 .002
Within Groups 721.032 98 7.357
Total 722.360 99
Table 6: ANOVA results for peer
ANOVA
rating
Sum of Squares df Mean Square F Sig.
Between Groups 26.935 3 8.978 1.239 .001
Within Groups 695.425 96 7.244
Total 722.360 99
Table 7: ANOVA results for place
ANOVA
rating
Sum of Squares df Mean Square F Sig.
Between Groups 14.794 3 4.931 .669 .032
Within Groups 707.566 96 7.370
Total 722.360 99
All the above tests were carried out at the 95% confidence level. When we observe the probability
values (Sig.), we can see that for SKU, nature, peer and place the ANOVA shows significant
values, i.e. less than 0.05, which is the α value or significance level.
Therefore we can infer that SKU, nature, peer and place show a significant effect on the
rating, or the satisfaction derived by consuming Pepsi. On the other hand, it can be seen that
income, gender and age group do not show any significant effect on the ratings. But it is too
early to draw conclusions.
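The Mean Square and F columns of the ANOVA tables follow directly from the sums of squares and degrees of freedom. The sketch below recomputes them for the SKU and gender tables; the helper name `f_from_table` is an assumption introduced for this example, and the input numbers are copied from Tables 1 and 2.

```python
def f_from_table(ss_between, df_between, ss_within, df_within):
    """Recompute mean squares and the F ratio from an ANOVA table row pair."""
    ms_between = ss_between / df_between   # Mean Square, Between Groups
    ms_within = ss_within / df_within      # Mean Square, Within Groups
    return ms_between, ms_within, ms_between / ms_within

# Values taken from Table 1 (SKU) and Table 2 (gender)
sku = f_from_table(9.018, 4, 713.342, 95)
gender = f_from_table(24.891, 1, 697.291, 97)

print("SKU:    MSB=%.3f MSW=%.3f F=%.3f" % sku)
print("Gender: MSB=%.3f MSW=%.3f F=%.3f" % gender)
```

The same check can be repeated for the remaining tables by substituting their sums of squares and degrees of freedom.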
Next we used MANOVA to find the level of interaction of these variables on the rating or the
amount of satisfaction derived by the customer.
The results are tabulated below. This part of analysis was done with the help of statistical
package called Minitab.
General Linear Model: RATING versus SKU, GENDER

MANOVA for SKU
s = 1    m = 1.0    n = 45.5
                    Test                   DF
Criterion           Statistic   F       Num   Denom   P
Wilks'              0.98634     0.322   4     93      0.863
Lawley-Hotelling    0.01385     0.322   4     93      0.863
Pillai's            0.01366     0.322   4     93      0.863
Roy's               0.01385

MANOVA for GENDER
s = 1    m = -0.5    n = 45.5
                    Test                   DF
Criterion           Statistic   F       Num   Denom   P
Wilks'              0.96505     3.368   1     93      0.070
Lawley-Hotelling    0.03622     3.368   1     93      0.070
Pillai's            0.03495     3.368   1     93      0.070
Roy's               0.03622

General Linear Model: RATING versus SKU, AGE

MANOVA for SKU
s = 1    m = 1.0    n = 45.0
                    Test                   DF
Criterion           Statistic   F       Num   Denom   P
Wilks'              0.99394     0.140   4     92      0.067
Lawley-Hotelling    0.00610     0.140   4     92      0.067
Pillai's            0.00606     0.140   4     92      0.067
Roy's               0.00610

MANOVA for AGE
s = 1    m = 0.5    n = 45.0
                    Test                   DF
Criterion           Statistic   F       Num   Denom   P
Wilks'              0.92475     2.496   3     92      0.005
Lawley-Hotelling    0.08138     2.496   3     92      0.005
Pillai's            0.07525     2.496   3     92      0.005
Roy's               0.08138

General Linear Model: RATING versus SKU, INCOME

MANOVA for SKU
s = 1    m = 1.0    n = 45.5
                    Test                   DF
Criterion           Statistic   F       Num   Denom   P
Wilks'              0.98974     0.241   4     93      0.914
Lawley-Hotelling    0.01036     0.241   4     93      0.914
Pillai's            0.01026     0.241   4     93      0.914
Roy's               0.01036

MANOVA for INCOME
s = 1    m = 0.0    n = 45.5
                    Test                   DF
Criterion           Statistic   F       Num   Denom   P
Wilks'              0.98932     0.502   2     93      0.607
Lawley-Hotelling    0.01080     0.502   2     93      0.607
Pillai's            0.01068     0.502   2     93      0.607
Roy's               0.01080
Before we go on to interpret the results, we would like to provide some basic
information about the output obtained.
Approx. F (column F) - This is the approximate F statistic for the given effect and test statistic.
Hypoth. DF (column Num) - This is the number of degrees of freedom in the model.
Error DF (column Denom) - This is the number of degrees of freedom associated with the model
errors. There are instances in MANOVA when the degrees of freedom may be a non-integer.
Sig. of F (column P) - This is the p-value associated with the F statistic and the hypothesis
and error degrees of freedom of a given effect and test statistic. The null hypothesis that a
given predictor has no effect on the outcome is evaluated with regard to this p-value. For
a given alpha level, if the p-value is less than alpha, the null hypothesis is rejected. If not,
then we fail to reject the null hypothesis.
Pillai's - This is Pillai's Trace, one of the four multivariate criteria test statistics used in
MANOVA. We can calculate Pillai's Trace from the eigenvalues: divide each eigenvalue by
(1 + the eigenvalue), then sum these ratios.
Lawley-Hotelling - This is the Lawley-Hotelling Trace. It is very similar to Pillai's Trace. It is
the sum of the eigenvalues and is a direct generalization of the F statistic in ANOVA.
Wilks' - This is Wilks' Lambda. This can be interpreted as the proportion of the variance in
the outcomes that is not explained by an effect. To calculate Wilks' Lambda, for each
eigenvalue, calculate 1/(1 + the eigenvalue), then find the product of these ratios.
Roy's - This is Roy's Largest Root. We can calculate this value by dividing the largest
eigenvalue by (1 + largest eigenvalue). In the Minitab output above, Roy's value equals the
Lawley-Hotelling Trace, which suggests that the largest eigenvalue itself is reported there.
Because it is based only on the maximum eigenvalue, it can behave differently from the other
three test statistics. In instances where the other three are not significant and Roy's is
significant, the effect should be considered not significant.
Note - The F statistic for Wilks' Lambda is calculated exactly. For the other test statistics,
the F values are approximate.
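The definitions above can be sketched in a few lines. This is a minimal illustration, not the Minitab implementation: the function name `manova_criteria` is an assumption, and the single eigenvalue 0.01385 is assumed to be the one behind the SKU block of the first model (with s = 1 there is only one eigenvalue, and it equals the Lawley-Hotelling value).

```python
from math import prod

def manova_criteria(eigenvalues):
    """Compute the four multivariate criteria from a list of eigenvalues."""
    largest = max(eigenvalues)
    return {
        # Product of 1 / (1 + eigenvalue)
        "wilks": prod(1.0 / (1.0 + ev) for ev in eigenvalues),
        # Sum of eigenvalue / (1 + eigenvalue)
        "pillai": sum(ev / (1.0 + ev) for ev in eigenvalues),
        # Sum of the eigenvalues
        "lawley_hotelling": sum(eigenvalues),
        # Largest eigenvalue itself (matches the Roy's rows in the tables above)
        "roy_largest_eigenvalue": largest,
        # Largest eigenvalue / (1 + largest eigenvalue), as in the text definition
        "roy_ratio": largest / (1.0 + largest),
    }

# Assumed eigenvalue for the SKU effect in RATING versus SKU, GENDER
criteria = manova_criteria([0.01385])
for name, value in criteria.items():
    print(f"{name}: {value:.5f}")
```

With this eigenvalue the Wilks' and Pillai's values agree with the 0.98634 and 0.01366 shown in the first MANOVA block.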
Here too we are interested in the probability values. When we observe the values, we
can see that the MANOVA for rating versus SKU and age shows significant results.
The MANOVA was also carried out at the 95% confidence level. The MANOVA results for rating
versus SKU and age show a probability value of 0.005 for age. This shows that the null
hypothesis that age has no effect on the rating can be rejected. Thus it can be inferred that
age has some effect on the rating of the satisfaction derived by consuming Pepsi.
Therefore we can conclude that there are five factors which show some effect on the ratings.
They are SKU, age, nature, peer and place.
In the final stage of the analysis we tried to cluster the respondents depending upon the
factors that affect the satisfaction derived from drinking Pepsi.
The first step in cluster analysis is to find the approximate number of clusters in the sample.
This can be found by hierarchical clustering. The results of the hierarchical clustering are
tabulated below.
Table 7: Hierarchical Clustering
Agglomeration Schedule
         Cluster Combined                       Stage Cluster First Appears
Stage    Cluster 1   Cluster 2   Coefficients   Cluster 1   Cluster 2   Next Stage
1        69          88          .000           0           0           28
2        36          63          .000           0           0           72
3        18          55          .000           0           0           29
4        5           51          .000           0           0           26
5        8           42          .000           0           0           55
6        38          40          .000           0           0           12
...
79       6           25          4.500          19          64          86
80       11          17          4.694          61          75          89
81       5           12          4.731          71          72          92
82       2           30          4.750          76          0           95
83       7           28          4.778          68          58          87
84       9           59          5.000          74          0           90
85       10          52          5.000          70          0           88
86       3           6           5.156          73          79          89
87       7           21          5.444          83          46          90
88       10          15          6.133          85          63          91
89       3           11          6.549          86          80          93
90       7           9           6.883          87          84          97
91       10          14          7.375          88          62          94
92       5           34          7.508          81          78          94
93       3           83          7.630          89          39          95
94       5           10          8.250          92          91          96
95       2           3           9.042          82          93          98
96       5           76          9.118          94          0           97
97       5           7           9.244          96          90          98
98       2           5           12.467         95          97          99
99       1           2           13.295         77          98          0
The agglomeration schedule can be used to estimate the number of clusters present within
the data. This is done by taking the differences between successive values in the
coefficients column. In the table given above, the maximum difference between coefficients
is found between the coefficients of stages 97 and 98. The value of this difference is
3.223 (12.467 - 9.244). With 100 cases, stopping after stage 97 leaves three clusters, so
the approximate number of clusters present in the population can be taken as three.
K-means cluster analysis asks for the number of clusters present. This is where the number
of clusters found by hierarchical clustering comes into action. The results of the K-means
cluster analysis are tabulated below.
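The rule of looking for the largest jump in the coefficients column can be sketched as follows. The stage and coefficient pairs are copied from the last rows of the agglomeration schedule; the function name `largest_jump` and the variable names are assumptions made for this illustration.

```python
# (stage, coefficient) pairs from the final stages of the agglomeration schedule
schedule = [
    (94, 8.250), (95, 9.042), (96, 9.118),
    (97, 9.244), (98, 12.467), (99, 13.295),
]
n_cases = 100  # number of respondents being clustered

def largest_jump(schedule):
    """Return (jump, stage_before, stage_after) for the biggest coefficient increase."""
    jumps = [
        (later[1] - earlier[1], earlier[0], later[0])
        for earlier, later in zip(schedule, schedule[1:])
    ]
    return max(jumps)

jump, stage_before, stage_after = largest_jump(schedule)
# Each stage merges two clusters, so after `stage_before` merges this many remain
n_clusters = n_cases - stage_before
print(f"largest jump {jump:.3f} between stages {stage_before} and {stage_after}")
print(f"suggested number of clusters: {n_clusters}")
```

The suggested count is then supplied to the K-means procedure as the number of clusters.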
Table 8: Initial Cluster Centers
Initial Cluster Centers
                      Cluster
                      1       2       3
stock keeping unit    3.00    1.00    5.00
gender                2.00    1.00    1.00
age group             4.00    1.00    3.00
income                3.00    1.00    2.00
nature                1.00    1.00    1.00
peer                  2.00    2.00    2.00
place                 1.00    4.00    4.00
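K-means starts from centers like these, assigns each respondent to the nearest center, and then moves each center to the mean of its members. The sketch below shows those two steps with the initial centers from Table 8. It is only an illustration under stated assumptions, not the SPSS routine: the function names and the example respondent are hypothetical, and plain Euclidean distance on the coded values is assumed.

```python
from math import dist

# Initial cluster centers from Table 8, in the order:
# SKU, gender, age group, income, nature, peer, place
CENTERS = {
    1: (3, 2, 4, 3, 1, 2, 1),
    2: (1, 1, 1, 1, 1, 2, 4),
    3: (5, 1, 3, 2, 1, 2, 4),
}

def nearest_cluster(respondent, centers=CENTERS):
    """Assignment step: label of the center closest to the respondent."""
    return min(centers, key=lambda label: dist(respondent, centers[label]))

def update_center(members):
    """Update step: the new center is the column-wise mean of the members."""
    n = len(members)
    return tuple(sum(column) / n for column in zip(*members))

# Hypothetical respondent, coded in the same column order as the centers
example = (5, 1, 3, 2, 2, 2, 4)
label = nearest_cluster(example)
print("assigned to cluster", label)
```

Repeating the two steps until the assignments stop changing yields the final clusters.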
Table 9: Number of Cases in each Cluster
Number of Cases in each Cluster
Cluster    1    31.000
           2    39.000
           3    30.000
Valid           100.000
Missing         .000
Table 10: ANOVA
ANOVA
                      Cluster               Error
                      Mean Square   df      Mean Square   df      F         Sig.
stock keeping unit    52.979        2       .753          97      70.355    .000
gender                .139          2       .252          97      .550      .579
age group             13.362        2       .908          97      14.709    .000
income                .675          2       .598          97      1.128     .328
nature                .225          2       .062          97      3.600     .031
peer                  .371          2       .386          97      .961      .386
place                 19.114        2       .542          97      35.292    .000
Table 8 shows the initial cluster centers present in the data. Based on this table we can draw
conclusions about the clusters.
Cluster 1
The respondents in cluster 1 have given the maximum rating to the 1 lit SKU. Cluster 1
mainly consists of female respondents. This cluster also mainly consists of
respondents who are above 30 years of age. The average income of the respondents of this
cluster is more than Rs 150000. The respondents like to consume Pepsi when it is chilled.
Moreover, the respondents like to enjoy Pepsi the most with friends and at their home.
Cluster 2
The respondents of cluster 2 have liked the Pepsi My Can the most. This cluster mainly
consists of male respondents. The average age of the members of this cluster is less than
18 years, whereas the average income is less than Rs 50000. The respondents like
Pepsi when it is chilled and like to enjoy it most when they are with their friends and at
hangout centres like malls, theatres, etc.
Cluster 3
The respondents of cluster 3 have liked the 2 lit pack the most. This cluster mainly consists
of male respondents with an average age of 24 to 30 years and an
income of Rs 50000 to Rs 150000. The respondents of this cluster also like Pepsi when it is
chilled and like to enjoy it with friends at different hangout centres.
Thus we have classified the respondents into different clusters. Now, when we look at
the ANOVA table of the output (Table 10), at the 95% confidence level SKU, age group, place
and nature are the factors that are significant (Sig. below 0.05). This can be interpreted as
follows: these are the factors on which the clusters differ, whereas gender, income and peer
(Sig. above 0.05) do not differentiate between the clusters. Nature is the weakest of the four
(Sig. = .031), which is evident too: since all the clusters like Pepsi chilled, it hardly makes
a criterion for differentiating between the clusters.
When we compare these results with the results obtained in ANOVA and MANOVA, the results
are broadly consistent, i.e. largely the same factors are significant there. This means
that, as a company, we have to give more importance to these factors, because irrespective
of income, gender and peer all the customers like to enjoy Pepsi when it is served chilled, at
hangout centres, when they are with friends.
4. Results
The analyses performed on the collected data showed that SKU, nature, peer and
place have a significant effect on the ratings given for the level of satisfaction derived from
the consumption of Pepsi. These factors should therefore be taken care of, and new
promotion plans should be made for providing Pepsi to customers in ways that satisfy these
requirements. Another intensive market research study should be conducted to find out which
type of SKU is in most demand and how that demand is correlated with other factors such as
the age of the customers, the point of sale, etc. After getting the results of these surveys,
some of the existing SKUs can be dropped and others can be introduced.
It should be the company's responsibility to provide chilled Pepsi to its customers. For this,
some methods should be found, such as providing complimentary refrigerators to its high-volume
sales outlets like colleges, malls, multiplexes, etc. Moreover, an intensive ad campaign
should be run to exploit the effect of peers on the sale of the product. Ads should clearly
relate to the younger generation so as to capitalize on their sentiments of friendship.
Pepsi should be made easily available at all those hangout centres in the city where
youngsters usually meet and party.
Another factor that can be noticed is that age group is also important. This
should also be taken into account and carefully analysed to see which age group
is the largest consumer; depending on that data we can plan strategies in future.
5. Conclusion
The results and their analysis have been discussed in detail in the sections above.
In this section we would like to conclude our work with a few suggestions for the scope of
future work. As is evident, a lot of other research data is needed to draw a comprehensive
and complete conclusion. We have to check the impact of age group on the sales of
Pepsi. For this a survey has to be conducted to collect the relevant data. Next we also need
to find which kind of ads has the maximum impact and to see whether the media also has some
impact on sales. We have to check for the correlation between these factors and sales.
References
• Levin, R. I. and Rubin, D. S., Statistics for Management, Pearson Education
• Anderson, Sweeney, Williams, Statistics for Business and Economics, Cengage Learning
• Nargundkar, R., Marketing Research-Text and Cases, Tata McGraw Hill
Appendix
SL NO  SKU  RATING  GENDER  AGE GROUP  INCOME  NATURE  PEER  PLACE
1      4    9       1       4          2       2       2     1
2      5    9       2       3          2       2       2     3
3      4    6       1       2          3       2       2     2
4      3    5       2       4          3       2       2     1
5      2    7       2       3          3       2       2     4
6      5    9       1       1          3       2       3     4
7      2    4       2       1          3       2       3     3
8      4    8       2       3          3       2       2     2
9      2    3       2       2          2       2       2     1
10     2    7       2       4          3       2       2     2
11     4    1       2       2          1       2       2     3
12     3    10      1       3          1       2       2     4
13     5    9       2       1          3       2       2     3
14     2    8       1       3          3       2       4     2
15     1    5       2       4          1       2       3     2
16     4    4       1       2          2       2       2     3
17     3    6       2       1          2       2       2     4
18     3    5       2       2          3       2       2     4
19     5    1       1       4          1       2       2     4
20     2    6       1       3          3       2       2     4
21     1    8       2       2          1       2       2     3
22     3    9       1       4          2       2       2     4
23     4    10      1       3          3       2       2     4
24     3    2       2       1          2       2       2     4
25     5    9       1       2          2       1       2     4
26     5    5       1       4          3       2       2     4
27     5    7       2       2          2       2       2     4
28     1    6       1       3          2       2       3     4
29     3    4       2       1          1       2       1     3
30     5    2       1       4          1       2       1     3
31     4    3       2       1          3       2       2     3
32     2    9       1       3          1       2       3     3
33     1    4       1       2          3       2       4     3
34     3    6       1       4          2       2       4     3
35     5    5       1       2          3       2       2     3
36     2    7       1       3          2       1       2     3
37     4    2       1       3          1       1       2     4
38     2    1       1       2          2       2       2     2
39     5    1       2       4          3       2       2     2
40     2    10      1       2          2       2       2     2
41     1    8       1       1          2       2       2     3
42     4    9       2       3          3       2       2     2
43     2    5       2       3          2       2       3     3
44     2    3       2       2          3       2       3     3
45     4    2       2       4          2       2       3     3
46     2    9       2       4          3       2       3     3
47     4    2       2       1          1       2       3     3
48     3    7       2       2          3       2       3     4
49     3    8       2       4          2       2       2     4
50     3    5       2       2          1       2       2     4
51     2    4       2       3          3       2       2     4
52     1    3       2       4          2       2       2     4
53     1    2       2       1          1       2       2     4
54     3    2       1       3          2       2       2     4
55     3    1       2       2          3       2       2     4
56     2    7       2       3          1       2       2     4
57     1    5       1       1          3       2       2     3
58     1    4       2       1          3       2       2     3
59     1    3       2       1          3       2       2     1
60     3    5       1       4          2       2       2     1
61     2    9       2       2          3       1       2     2
62     3    4       1       1          3       1       2     2
63     2    7       1       3          2       1       2     3
64     3    3       2       1          1       2       2     4
65     5    6       1       3          2       2       2     4
66     3    9       1       3          3       2       2     4
67     5    2       1       4          2       2       2     4
68     3    3       2       2          2       2       2     4
69     4    5       2       1          3       2       2     4
70     5    8       2       3          1       2       2     4
71     1    6       1       2          3       2       3     4
72     5    3       2       3          2       2       3     4
73     5    4       2       1          2       2       3     4
74     4    4       2       3          1       2       2     4
75     3    7       1       4          3       2       2     4
76     2    4       1       2          1       2       4     4
77     5    2       1       1          3       2       4     4
78     3    9       1       2          2       2       2     3
79     3    10      1       3          1       2       2     3
80     4    10      1       1          3       2       2     3
81     2    3       2       1          2       2       2     3
82     5    8       2       3          2       2       2     1
83     5    1       1       2          1       2       2     2
84     1    6       2       3          3       2       2     2
85     5    8       2       4          2       2       2     2
86     4    9       2       1          2       2       2     4
87     3    3       1       1          2       2       2     4
88     4    5       2       1          3       2       2     4
89     3    6       1       1          1       2       2     4
90     5    3       2       3          3       2       1     4
91     4    2       2       4          3       2       3     1
92     2    6       1       2          2       2       3     2
93     1    9       1       3          1       2       3     2
94     5    10      1       2          2       2       3     2
95     3    7       2       3          3       2       3     4
96     2    8       2       2          2       2       2     4
97     4    8       2       2          1       2       2     4
98     1    4       2       2          3       2       2     4
99     2    7       1       4          2       2       2     4
100    3    4       2       4          3       1       4     4