Bringing big data to life

Post on 15-Apr-2017

Jeroen Hardon

Big data is everywhere

1.3 exabytes

2.9 million per second

375 megabytes per day

24 petabytes per day

50 million per day

700 billion minutes per month

73 items per second

20 hours per minute

A journey in segmentation with data scientists and big data.

What was the problem?

What was the solution?

How well did it work?

Original segmentation study

Needs-based segmentation

7 segments created

Classifier tool built, using 10 questions

This resulted in a happy client.

“Let’s tag a segment to each person in our database of 40 million”

12,000 people from the database answered the classifier questions

Those 12,000 were each classified into 1 of the 7 segments

Attitudinal segments not explained by demographics

Attitudes ≠ Demographics

Revised segments should align better with big data

Must predict original segments in segmentation study

Merging the 2 types of data

New classification tool

The database and survey demographics did not match

We built classifiers by matching survey data to resemble the database

We generated many samples of our survey data and built an ensemble of classifiers
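The resampling-plus-ensemble idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the slides do not say how the survey was matched to the database or which classifier was used, so the single `group` demographic, the `answer` field, the toy majority-rule classifier, and all data values here are hypothetical.

```python
import random
from collections import Counter

def resample_to_match(survey, target_share, n, rng):
    """Draw n survey rows (with replacement) so the group mix follows target_share."""
    by_group = {}
    for row in survey:
        by_group.setdefault(row["group"], []).append(row)
    groups = [g for g in target_share if g in by_group]
    weights = [target_share[g] for g in groups]
    picked = rng.choices(groups, weights=weights, k=n)
    return [rng.choice(by_group[g]) for g in picked]

def fit_majority_rule(sample):
    """Toy stand-in classifier: most common segment for each answer pattern."""
    votes = {}
    for row in sample:
        votes.setdefault(row["answer"], Counter())[row["segment"]] += 1
    fallback = Counter(r["segment"] for r in sample).most_common(1)[0][0]
    table = {a: c.most_common(1)[0][0] for a, c in votes.items()}
    return lambda answer: table.get(answer, fallback)

def ensemble_predict(models, answer):
    """Majority vote over all classifiers in the ensemble."""
    return Counter(m(answer) for m in models).most_common(1)[0][0]

rng = random.Random(0)
# Hypothetical survey in which young respondents are over-represented.
survey = (
    [{"group": "young", "answer": "a", "segment": 1}] * 60
    + [{"group": "young", "answer": "b", "segment": 2}] * 20
    + [{"group": "old", "answer": "a", "segment": 1}] * 5
    + [{"group": "old", "answer": "b", "segment": 2}] * 15
)
database_share = {"young": 0.3, "old": 0.7}  # assumed database demographics

# Many resamples that resemble the database, one classifier per resample.
samples = [resample_to_match(survey, database_share, 200, rng) for _ in range(25)]
models = [fit_majority_rule(s) for s in samples]
print(ensemble_predict(models, "a"), ensemble_predict(models, "b"))
```

Each resample has roughly the database's demographic mix, so every classifier in the ensemble is trained on data that looks like the population it will later be applied to.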

Ensembles

While building ensembles of classifiers helped, it was still inadequate.

We needed to strengthen the demographic / behavioral signal

Expectation Maximization

How do I "assign" each of the individual fruits to a tree type?

What are the characteristics of the fruit of each tree type?
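The two fruit questions map onto the two steps of expectation maximization: the E-step assigns each fruit to a tree type (softly, as probabilities), and the M-step re-estimates the characteristics of each tree type from those assignments. A minimal sketch, assuming just two tree types, a single measured characteristic (fruit weight), and normally distributed weights. None of these specifics come from the slides, and the data are simulated.

```python
import math
import random

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def em_two_trees(weights, iters=50):
    """EM for a two-component 1-D Gaussian mixture over fruit weights."""
    means = [min(weights), max(weights)]             # crude starting guess
    sd0 = (max(weights) - min(weights)) / 4 or 1.0
    sds = [sd0, sd0]
    shares = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: how likely is each fruit to come from each tree type?
        resp = []
        for x in weights:
            p = [shares[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: what does the fruit of each tree type look like?
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, weights)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, weights)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)
            shares[k] = nk / len(weights)
    return means, sds, shares, resp

rng = random.Random(1)
# Simulated fruit weights in grams from two hypothetical tree types.
fruits = [rng.gauss(100, 10) for _ in range(300)] + [rng.gauss(200, 15) for _ in range(300)]
means, sds, shares, resp = em_two_trees(fruits)
print([round(m) for m in means], [round(s, 2) for s in shares])
```

Neither question can be answered without the other, which is why the two steps alternate until the estimates stop changing.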

Model 1

Observed data:

Initial segmentation data (6,500 respondents): known, fixed segment

Augment of 12,000 from big data: unknown segment

Model 2

Observed data:

Initial segmentation data (6,500 respondents): known, fixed segment

Augment of 12,000 from big data: unknown segment

+ Big data variables
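One common way to run expectation maximization when some records have a known, fixed segment and others do not is to keep the known assignments frozen and update only the unknown ones. The sketch below shows that pattern. It is an assumption about the general setup rather than the model actually used: the slides do not give the model form, and the single numeric variable, the two segments, and the simulated data are all hypothetical.

```python
import math
import random

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def semi_supervised_em(labeled, unlabeled, n_segments, iters=30):
    """labeled: list of (x, segment) pairs; unlabeled: list of x values."""
    xs = [x for x, _ in labeled] + list(unlabeled)
    # Known segments are fixed one-hot assignments that never change.
    fixed = [[1.0 if k == seg else 0.0 for k in range(n_segments)] for _, seg in labeled]
    # Unknown segments start with zero weight, so the first fit uses the survey only.
    soft = [[0.0] * n_segments for _ in unlabeled]
    params = []
    for _ in range(iters + 1):
        resp = fixed + soft
        # M-step: per-segment share, mean and spread from all records.
        sizes = [sum(r[k] for r in resp) for k in range(n_segments)]
        params = []
        for k in range(n_segments):
            mean = sum(r[k] * x for r, x in zip(resp, xs)) / sizes[k]
            var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, xs)) / sizes[k]
            params.append((sizes[k] / sum(sizes), mean, max(math.sqrt(var), 1e-3)))
        # E-step: update segment probabilities for the unknown records only.
        soft = []
        for x in unlabeled:
            p = [share * normal_pdf(x, mean, sd) for share, mean, sd in params]
            total = sum(p) or 1e-300
            soft.append([pk / total for pk in p])
    return params, soft

rng = random.Random(0)
# Simulated survey (segment known) and augment (segment unknown, shifted distribution).
survey = [(rng.gauss(0, 1), 0) for _ in range(40)] + [(rng.gauss(4, 1), 1) for _ in range(40)]
augment = [rng.gauss(1, 1) for _ in range(100)] + [rng.gauss(5, 1) for _ in range(300)]
params, soft = semi_supervised_em(survey, augment, 2)
for share, mean, sd in params:
    print(round(share, 2), round(mean, 2), round(sd, 2))
```

In this toy run the segment estimates move away from the survey-only fit and toward the larger augment, which is the kind of alignment with big data the revised segments were meant to achieve.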

We got classifiers that were slightly less accurate at predicting the survey data, but much better aligned with the big data. We made sure the predictive accuracy did not drop below 70% (originally 80%).

How well did it work?

Data source: survey data of 6,500

Rows: initial classifier segment. Columns: revised classifier segment.

        Seg 1  Seg 2  Seg 3  Seg 4  Seg 5  Seg 6  Seg 7
Seg 1     564     84     15     56     36     14     18
Seg 2      68    844     84     13      7     13     10
Seg 3      33     72    561      2      3      1      5
Seg 4      34      8      0    567      5     81     29
Seg 5      27     12      1      6    635     50     57
Seg 6      21     27      6     76     43    873     30
Seg 7      18     28      9     50     59     52   1193

Only 19% changed

Data source: augment of 12,000

Rows: initial classifier segment. Columns: revised classifier segment.

        Seg 1  Seg 2  Seg 3  Seg 4  Seg 5  Seg 6  Seg 7
Seg 1     135    102     18     66    207    157     45
Seg 2     119    545    171     58    174    203    101
Seg 3      55    113    316     44    240    219     72
Seg 4      90     67      4    283    233    287     69
Seg 5     303    169     41    216   1994    925    205
Seg 6     325    259     36    261    646   1591    127
Seg 7      52     26      3     90    193    191    156

Over 58% changed
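The "changed" percentages can be checked directly from the two cross-tabulations: everyone off the diagonal moved to a different segment between the initial and the revised classifier. A short check, with the cell counts copied from the slides (the variable and function names are just for illustration):

```python
# Rows: initial classifier segment. Columns: revised classifier segment.
survey_table = [
    [564, 84, 15, 56, 36, 14, 18],
    [68, 844, 84, 13, 7, 13, 10],
    [33, 72, 561, 2, 3, 1, 5],
    [34, 8, 0, 567, 5, 81, 29],
    [27, 12, 1, 6, 635, 50, 57],
    [21, 27, 6, 76, 43, 873, 30],
    [18, 28, 9, 50, 59, 52, 1193],
]
augment_table = [
    [135, 102, 18, 66, 207, 157, 45],
    [119, 545, 171, 58, 174, 203, 101],
    [55, 113, 316, 44, 240, 219, 72],
    [90, 67, 4, 283, 233, 287, 69],
    [303, 169, 41, 216, 1994, 925, 205],
    [325, 259, 36, 261, 646, 1591, 127],
    [52, 26, 3, 90, 193, 191, 156],
]

def share_changed(table):
    """Fraction of people whose segment differs between the two classifiers."""
    total = sum(sum(row) for row in table)
    unchanged = sum(table[i][i] for i in range(len(table)))
    return 1 - unchanged / total

print(round(100 * share_changed(survey_table), 1))   # 19.4
print(round(100 * share_changed(augment_table), 1))  # 58.2
```

The survey table sums to exactly 6,500. The augment table as transcribed sums to 12,002, which is close enough to the stated 12,000 that the reported percentage is unaffected.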

Conclusions

Big data cannot predict everything

No need to be scared of big data.

Surveys and big data can coexist

Expectation maximization provides a framework for joint modeling

So what?
