[Big Data] Simple Exercise of Consumer Preferences Analysis Based on Twits for SAB Miller LATAM Brands By Gustavo Pabón May 2014

Upload: gustavo-pabon

Post on 09-Apr-2017



Our group mission:

To own and nurture local and

international brands that are the

first choice of the consumer

How to measure consumer preferences?

Twitter may help.

This presentation reports the results of a simple exercise in consumer preference analysis based on tweets collected from 2 April 2014 to 26 April 2014.

On a scale from 1 to 5*, the weighted average consumer preference for SAB Miller LATAM brands was:

4.76

* The scale is explained on the next slide.

© SABMiller plc 2012

Internal Use / Confidential / Secret

Exercise Summary

Tweet sample: streaming date range and filter

Tweets were streamed from 2 April 2014 to 26 April 2014, with a gap from 12 April 2014 to 15 April 2014. The gap was caused by a technical issue in the streaming program. The filter was based on the keywords listed on the following slides.

Tweet sample size

The raw data set contained 33,853 tweets.

The first data selection step filtered out tweets not related to SAB Miller LATAM or global brands, using a bag-of-words technique. This reduced the sample to 20,044 tweets.

The second data selection step filtered out tweets not related to consumer preferences, using crowdsourcing (Amazon Mechanical Turk). This reduced the sample to 3,669 tweets.
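The first selection step can be illustrated with a minimal sketch. The slides do not give the word list or the matching rule, so the `BEER_CONTEXT` vocabulary, the tokenizer, and the `min_hits` threshold below are all assumptions chosen for illustration. The idea is that a keyword such as #aguila can match tweets about eagles, so a tweet is kept only when its bag of words also contains beer-related vocabulary.

```python
import re

# Hypothetical beer-related vocabulary; the real word list is not in the slides.
BEER_CONTEXT = {"cerveza", "cervezas", "beer", "birra", "fria", "fría", "pola"}

# A token is a run of word characters, optionally prefixed by # or @.
TOKEN_RE = re.compile(r"[#@]?\w+")

def bag_of_words(text):
    """Return the set of lowercase tokens in a tweet, with # and @ stripped."""
    return {tok.lstrip("#@").lower() for tok in TOKEN_RE.findall(text)}

def is_brand_related(text, vocabulary=BEER_CONTEXT, min_hits=1):
    """Keep a tweet when at least min_hits vocabulary words appear in it."""
    return len(bag_of_words(text) & vocabulary) >= min_hits

tweets = [
    "Una #aguila bien fria con los amigos",
    "Vi un #aguila volando sobre la montaña",
]
kept = [t for t in tweets if is_brand_related(t)]
```

With this made-up vocabulary, only the first sample tweet is kept, because "fria" appears in its bag of words.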

Consumer preference scale from 1 to 5

Using crowdsourcing, each tweet was classified by 10 different people into one of three categories: 1 (negative preference), 3 (neutral), or 5 (positive preference). If more than 6 people agreed on a category, the tweet was assigned that category; otherwise, it was classified as neutral.
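The agreement rule above can be sketched as follows. This is a sketch, not the program used in the exercise: the function name `consolidate` and the reading of "more than 6" as "7 or more of the 10 raters" are assumptions.

```python
from collections import Counter

NEGATIVE, NEUTRAL, POSITIVE = 1, 3, 5

def consolidate(ratings, min_agreement=7):
    """Return the category chosen by at least min_agreement raters.

    Falls back to NEUTRAL when no category reaches that level of agreement
    (assumption: "more than 6" of 10 raters means 7 or more).
    """
    if not ratings:
        return NEUTRAL
    label, votes = Counter(ratings).most_common(1)[0]
    return label if votes >= min_agreement else NEUTRAL

clear_positive = consolidate([POSITIVE] * 7 + [NEUTRAL] * 3)
split_vote = consolidate([POSITIVE] * 6 + [NEGATIVE] * 4)
```

Here `clear_positive` is 5, because seven raters agree, and `split_vote` is 3, because six raters are not enough.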

Why crowdsourcing?

It is very difficult for an automatic sentiment analysis program to work with tweets. They are often poorly written and contain a lot of slang and sarcasm. In addition, Spanish internet language has not been studied as much as English. Human raters typically agree 79% of the time*, while a program is at most 70% accurate. The first run of an automatic sentiment analysis was able to classify just 151 tweets.


* Taken from Ogneva, M. "How Companies Can Use Sentiment Analysis to Improve Their Business". Retrieved 2012-12-13.


Keywords used for streaming filter (1 of 2)

Global brands

@Grolsch, #Grolsch, @Miller_Global, @MillerCoors, #MillerGenuineDraft, @Birra_Peroni, @peroniclub, #PeroniNastroAzzurro, #peroni, @Pilsner_Urquell, #PilsnerUrquell, @MillerLite, #millerlite, @MGD_Argentina, @MillerLiteAR, @MillerLiteCol, @millerlitehn, @MillerPanama, @MillerLitepa, @Miller_SLV.

Argentina’s brands

@CervezaIsenbeck, #isenbeck, @Warsteiner, @WarsteinerAR, #Warsteiner.

Colombia’s brands

@CervezaAguila, #AguilaLight, #aguila, #CervezaAguila, @clubcolombia, #ClubColombia, #clubcolombiadorada, #clubcolombiaroja, #clubcolombianegra, @cervezacostena, #CervezaCosteña, #CervezaCostena, @PilsenCerveza, #Pilsen, @CervezaPoker, #CervezaPoker, @pokerligera, #pokerligera, #colaypola, @ReddsColombia, #redds.

Ecuador’s brands

@ClubPremiumEc, #ClubPremium, #ClubPremiumRoja, #ClubPremiumNegra, @cervezaconquer, @PilsenerEcuador, @Miller_Ecuador


Keywords used for streaming filter (2 of 2)

El Salvador’s brands

@BarenaHN, #Barena, @Barena_Peru, @PilsenerSV, @PilsenerLiteSV, #PilsenerLite, #Pilsener, @RegiaSV, #regiaextra, @SupremaSV, #cervezasuprema, @GoldenSV, #cervezagolden

Honduras’ brands

@ImperialHN, #CervezaImperial, #imperialhn, @PortRoyalhn_com, @SalvaVidaHn, #salvavida, #salvavidahn, #cervezasalvavida, @BarenaHN, #Barena

Panamá’s brands

@cervezaaltlas, #cervezaatlas, @Cerveza_BALBOA, #cervezabalboa

Peru’s brands

@cerarequipena, #cervezaarequipena, #cervezaarequipeña, #arequipeña, #arequipena, #Barena, @Barena_Peru, @CristalPeru, #cervezacristal, @cusquenaperu, #cusqueñaperu, #cusquenaperu, #cervezacusqueña, #cervezacusquena, #cusqueñamalta, @Pilsen_Callao, #PilsenCallao, @Pilsen_Trujillo, #PilsenTrujillo, #CervezaSanJuan, @Backus_Ice, #BackusIce


Result Analysis


Consumer preference rating consolidated by country

[Chart by country: number of tweets and average consumer preference rate]

Using SSD (sum of squared distances) computed from "number of tweets" and "rate", El Salvador had the highest rate: 4.83.

Using SSD, Argentina had the lowest rate: 4.32.
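A consolidation by country, together with an overall weighted average like the one reported at the start, could be computed along these lines. The slides do not say which weights were used, so weighting each group's average by its number of tweets is an assumption, as are the function names and the sample data. The SSD ranking is not reproduced here because its exact definition is not given.

```python
from collections import defaultdict

def by_group(rated_tweets):
    """Map each group (e.g. a country) to (number of tweets, average rate)."""
    totals = defaultdict(lambda: [0, 0])
    for group, rate in rated_tweets:
        totals[group][0] += 1     # tweet count
        totals[group][1] += rate  # sum of rates
    return {g: (n, s / n) for g, (n, s) in totals.items()}

def weighted_average(summary):
    """Average of the group rates, weighted by each group's tweet count."""
    total = sum(n for n, _ in summary.values())
    return sum(n * avg for n, avg in summary.values()) / total

# Made-up (group, rate) pairs, not data from the exercise.
sample = [("A", 5), ("A", 5), ("A", 3), ("B", 1)]
summary = by_group(sample)
overall = weighted_average(summary)
```

Weighting by tweet count makes the overall figure equal to the plain mean over all rated tweets, so groups with few tweets have little influence on it.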

Consumer preference rating consolidated by brand

Using SSD, Pilsener (El Salvador) had the highest rate: 4.85.

Using SSD, Barena had the lowest rate: 3.94.

Consumer preference rating consolidated by date and day of week

Consumer preference rating consolidated by date

The gap in the data is due to a technical issue in the streaming program.

Consumer preference rating consolidated by day of week

From Wednesday to Friday, both the number of tweets and the rate increase.

From Friday to Sunday, both the number of tweets and the rate decrease.
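Consolidating by day of week amounts to bucketing each rated tweet by a value derived from its timestamp. The sketch below assumes timestamps stored as `MM/DD/YYYY HH:MM` strings in a single time zone; the slides specify neither the format nor the time zone, and the names `consolidate_by`, `weekday`, and `hour` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def consolidate_by(rated_tweets, key, fmt="%m/%d/%Y %H:%M"):
    """Group (timestamp, rate) pairs by key(datetime).

    Returns {bucket: (number of tweets, average rate)}.
    """
    buckets = defaultdict(list)
    for timestamp, rate in rated_tweets:
        buckets[key(datetime.strptime(timestamp, fmt))].append(rate)
    return {b: (len(r), sum(r) / len(r)) for b, r in buckets.items()}

def weekday(dt):
    """Bucket key: English weekday name, independent of the system locale."""
    return DAYS[dt.weekday()]

def hour(dt):
    """Bucket key: hour of day, 0 to 23."""
    return dt.hour

# Made-up sample, not data from the exercise.
sample = [("04/02/2014 10:15", 5), ("04/04/2014 22:40", 5), ("04/04/2014 23:05", 3)]
by_day = consolidate_by(sample, weekday)
by_hour = consolidate_by(sample, hour)
```

Passing the bucketing rule as a function lets the same helper produce the per-day and per-hour views.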

Consumer preference rating consolidated by hour

Most tweets were posted between 10 am and 1 am, but on average the rates do not change much.

Word cloud of common words in positive-preference tweets.

Word cloud of common words in negative-preference tweets.
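The input to a word picture of this kind is a table of word frequencies. A minimal sketch of how such a table could be built is shown below; the stop-word list, the minimum word length, and the function name `common_words` are assumptions, and the slides do not say which tool produced the original pictures.

```python
import re
from collections import Counter

# Hypothetical, deliberately short Spanish stop-word list.
STOPWORDS = {"de", "la", "el", "que", "y", "en", "una", "un", "con", "los", "las"}

def common_words(tweets, top=10):
    """Return the `top` most frequent words as (word, count) pairs."""
    counts = Counter()
    for text in tweets:
        for word in re.findall(r"\w+", text.lower()):
            # Skip stop words and very short tokens, which carry little meaning.
            if word not in STOPWORDS and len(word) > 2:
                counts[word] += 1
    return counts.most_common(top)

# Made-up sample, not data from the exercise.
sample = ["Que rica la cerveza", "La cerveza fria", "cerveza!"]
top_word = common_words(sample, top=1)
```

Running it separately on the positive-preference and negative-preference tweets gives one frequency table per picture.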

Conclusion

From this simple exercise, I conclude that sentiment and opinion analysis of tweets related to SAB Miller LATAM brands can be an alternative tool for effectively measuring customer preferences.
