

SET-TOP BOX ANALYTICS

Srinivasan Sivaramakrishnan, Dell EMC

Amarendra Tummala, Dell EMC

Luciano Tozato, Dell EMC

Wei Lin, Dell EMC


2016 EMC Proven Professional Knowledge Sharing 2

Table of Contents

Overview
Introduction
    Key Findings and Benefits
    Methodology
Data Sources and Discovery
Boston Demographics and TV Viewing Patterns
    Total Duration Watched by Ethnicity
    Breakdown of Duration Watched and Size of Household by Region
    Breakdown of Duration Watched by Channel and Gender
    Box Plot of Total Duration
    Breakdown of Duration Watched by Age, Household Size and Income group
TV Commercial Analysis
    Commercial Types
    Commercial Popularity
Channel Analysis
    Primetime TV View
    Daily Channel Popularity
    Most Watched
    Subscriber Watch Pattern
Subscriber Propensity Index
Findings I: Predict likely to Watch Commercials
Findings II: Subscriber Viewership Profiling
    Exploring High Performers of Cluster 5
Findings III: Predict and Classify Subscribers
Return on Investment
Conclusion
References


Overview

This article discusses applying data science to TV subscriber behavior and viewership patterns. It illustrates how the TV broadcasting industry can benefit from using data analytics on TV audience behavioral patterns, and describes how a viewer's channel-switching behavior can be analyzed to generate a range of analytics results. TV broadcasting and advertising industries generally use Nielsen ratings to measure the audience size and composition of television programming in the United States. Instead of relying on Nielsen ratings, TV broadcasters can now generate their own metrics by applying data analytics to viewers' channel-switching behavior. This is just one example; there are many other uses for this type of data analytics.

The recent digital revolution has expanded TV viewing from traditional TV sets to other internet-based devices. Correspondingly, TV signal transmission has expanded from the traditional set-top box (STB) to streaming devices such as IPTV devices and Over-the-Top (OTT) services. Regardless of how the TV program signal is transmitted or on what device it is watched, the basic behavior of the TV subscriber remains the same: they switch channels to watch programming of their liking, and that switching produces viewing patterns. The same data science can be applied to all types of TV watching models as long as there is a way to capture subscriber click streams. This article is a study that focuses on STB data and the analytics applied to it.

Introduction

Set-Top Box (STB) Analytics is the application of data science to TV subscriber behavior and viewership patterns. In recent years, the media industry has embraced digital technology in many ways. One noticeable change is that TV and cable companies now send their broadcast signals in digital form. On the receiving end, households are being equipped with STB devices capable of two-way communication. Not only do these devices receive broadcast signals, they also enable TV viewers to request on-demand programming. Additionally, these devices can collect viewers' clicking behavior. This has opened up opportunities to collect and analyze second-by-second channel clicking behavior from millions of households. Combining this data with detailed TV broadcaster airing logs provides a wealth of insight into TV audience behavior [8]. Applying data science to this data opens up many opportunities for TV broadcasters.

Key Findings and Benefits

• STB Analytics will help TV broadcasters change their business models from targeting broad audiences to targeting individual and localized content consumers.

• The analysis provided valuable insights into subscriber viewing patterns:

o Derived a new metric called the Viewer Propensity Index (PI) that measures an uninterrupted TV viewing pattern

o Predicted and classified the subscriber population into Avid and Normal viewers based on the propensity index

o Created subscriber profiles and segments based on demographic attributes and TV viewing patterns through clustering

o Used collaborative filtering to promote subscribers from a lower to a higher propensity index within the same cluster

o Analyzed the popularity of programming content and commercials by timeslot

o Built a subscriber – program – commercial value chain to create a viewership behavior profile for subscribers

o Predicted future programming and recommended ‘likely to watch’ commercials

• These insights enabled targeted programming and commercials based on viewer segments and behavior, thus enabling better campaign management.

• TV broadcasters can generate their own metrics instead of relying on Nielsen ratings. These metrics help reduce cost and increase revenue by enabling them to:

o Negotiate lower content/programming fees and thus reduce the overall cost

o Negotiate higher advertisement rates on popular and likely-to-watch programs and thus increase revenue

• This analytics can help realize a potential Return on Investment of above 100% over a five-year period, through increased advertisement revenue and decreased programming cost.

Methodology

Various analytics models can be used to analyze second-by-second STB clicking behavior from millions of households, combined with detailed TV broadcaster airing logs and viewer demographic information.

• Using a Decision Tree [9] based model, this analytics can classify subscribers into Avid and Normal viewers based on the viewer propensity index, which in turn helps TV broadcasters generate their own metrics and compare them against Nielsen ratings when negotiating lower programming fees.

• Using Association Rules [5] and confidence metrics, this analytics can recommend ‘likely to watch’ commercials with an associated probability. For example, subscribers who have watched “Old Navy Store” are likely to watch “Rolex” with 88% probability. These recommendations, along with the higher propensity index, can be used to negotiate higher advertisement rates.

• Using K-Means clustering [4], the population was segmented into cluster groups based on demographic attributes, propensity index and viewing duration. These clusters help in profiling subscribers based on the above characteristics, which supports better campaign management and targeted commercials.

• Using Link Analysis [10], customized subscriber – program – commercial value chain segments can be created to understand the watch preferences of Avid viewers.

• By applying collaborative filtering [11] to the key Association Rules of Avid subscribers in each cluster, TV broadcasters can target the Normal viewers within that cluster and promote them to potential “Avid viewers”. This increases revenue by tapping this unused potential.

Data Sources and Discovery

The input data for this STB Analytics study comes from both TV broadcasting processes and STB device click stream data. The simulated primary input data sets are listed below.

• Subscriber Data: Subscriber demographics, including socioeconomic attributes [1] of TV viewers. This helps in understanding more about subscribers.

• TV Playlist: The TV broadcasters’ internal TV guide [3], which lists programs, commercials, promos, etc. along with their airing details. This helps us understand the types of content viewers are exposed to.

• STB Data: Click stream viewership activity, i.e., channel watching behavior details for each subscriber. This data can be joined with the TV playlist to see which subscribers have watched which content.

• Content: Content and commercial [2] details such as Type, Genre, etc.

Below is a snapshot of each simulated input data set.

Figure 1: Sample Input Data
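The join between STB data and the TV playlist described above can be pictured as an interval-overlap match on the same channel. The following is a minimal sketch only: the row layouts, the minute-based timestamps and the sample values are assumptions made for illustration and are not the schema used in the study.

```python
# Hypothetical playlist rows: (channel, content, start_minute, end_minute).
playlist = [
    ("CBS", "48 Hours", 0, 20),
    ("CBS", "Red Lobster", 20, 21),
    ("CBS", "48 Hours", 21, 60),
]

# Hypothetical STB click-stream rows: (subscriber_id, channel, start_minute, end_minute).
stb = [
    (906, "CBS", 10, 25),
    (3148, "NBC", 0, 60),
]


def watched_content(stb_rows, playlist_rows):
    """Return (subscriber_id, content, minutes_watched) for every overlap."""
    matches = []
    for sub, channel, start, end in stb_rows:
        for p_channel, content, p_start, p_end in playlist_rows:
            # Minutes during which the viewing interval and the airing overlap.
            overlap = min(end, p_end) - max(start, p_start)
            if channel == p_channel and overlap > 0:
                matches.append((sub, content, overlap))
    return matches


viewership = watched_content(stb, playlist)
```

A nested loop is enough for a sketch; at the row counts listed below, a sort-merge or database join would be the more realistic choice.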

This analysis covers the Boston viewing area, using that city’s demographic and economic profiles. It was performed on a small set of sample data created to the specifications below:

• 5000 TV subscribers

• 4 TV channels: CBS, NBC, FOX and ABC

• 100+ commercials in 30+ different categories

• 8500+ TV playlist incidents for one month of primetime TV schedule

• 225,000 TV viewing incidents from STB

• 4.7M+ rows of TV commercial viewership

• 900K+ rows of TV program viewership

• Time frame: August 3 to August 30, 2015

• Primetime is considered to be 8 pm to 10 pm

STB Metrics

The sections below describe the STB Analytics process and its findings in more detail. Before getting there, it is important to understand the following STB metrics terminology.

Total Duration – The amount of time a program or commercial is watched on an STB.

Click Count – The number of times a subscriber has switched channels back and forth.

Minutes/Click – Minutes watched divided by the click count.

Popularity – The number of times an ad/commercial has been aired.
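As an illustration of the first three metrics, the sketch below derives Total Duration, Click Count and Minutes/Click per subscriber from a handful of click-stream rows. The row layout and values are hypothetical, and treating a subscriber who never switches channels as having one click (to avoid division by zero) is an assumption of this sketch, not something the study specifies.

```python
from collections import defaultdict

# Hypothetical click-stream rows: (subscriber_id, channel, start_minute, end_minute).
events = [
    (1, "CBS", 0, 30),
    (1, "NBC", 30, 45),
    (1, "CBS", 45, 120),
    (2, "FOX", 0, 10),
    (2, "ABC", 10, 20),
]


def stb_metrics(rows):
    """Return {subscriber_id: (total_duration, click_count, minutes_per_click)}."""
    duration = defaultdict(float)
    clicks = defaultdict(int)
    last_channel = {}
    # Process each subscriber's rows in time order.
    for sub, channel, start, end in sorted(rows, key=lambda r: (r[0], r[2])):
        duration[sub] += end - start
        # A click is counted whenever the channel differs from the previous row.
        if sub in last_channel and last_channel[sub] != channel:
            clicks[sub] += 1
        last_channel[sub] = channel
    result = {}
    for sub, total in duration.items():
        # Assumption: use a denominator of at least 1 so the ratio is defined.
        result[sub] = (total, clicks[sub], total / max(clicks[sub], 1))
    return result


metrics = stb_metrics(events)
```

In this toy data, subscriber 1 watches 120 minutes across two channel switches, giving 60 minutes per click, while subscriber 2 watches 20 minutes with one switch.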

Boston Demographics and TV Viewing Patterns

Neighborhoods of Boston: The picture shows the different neighborhoods of Boston. The subscribers

from the following neighborhoods were considered for analysis.

Figure 2: Neighborhoods of Boston [6]


Total Duration Watched by Ethnicity

The pie chart shows the percentage of TV duration watched by Boston subscribers of different ethnicities. White subscribers account for the largest share of viewing, close to 47%. African American subscribers account for 23.15% and Asian subscribers for 9% of net duration for August.

Figure 3: Duration Watched by Ethnicity

Breakdown of Duration Watched and Size of Household by Region

Figure 4: Duration Watched and Size of Household by Region


Figure 4 above shows the breakdown of net TV duration watched across all regions. The upper part represents the household size for each Boston neighborhood by region. The visual shows that Dorchester has the most people (by household size) and, correspondingly, a high total viewing duration.

Breakdown of Duration Watched by Channel and Gender

Figure 5: Duration Watched by Channel and Gender

Figure 5 shows the breakdown of TV viewership by gender and channel for all the neighborhoods/regions of Boston. Compared to men, considerably more women watch CBS. Dorchester again has a higher proportion of female viewers watching CBS than the other regions for the month of August.

Box Plot of Total Duration

Figure 6 shows the box plot of total duration by region. Dorchester is again an outlier, as it has a substantial share of the viewers who have watched more TV.


Figure 6: Box Plot of Total Duration

Breakdown of Duration Watched by Age, Household Size and Income group

The next three charts show the breakdown of TV Duration watched by Age, Household Size and Income

group.

Figure 7 shows the sum of duration watched across all ages. The distribution is somewhat skewed toward younger subscribers, who account for a larger share of duration.

Figure 7: Duration by age


Figure 8: Duration by Household size

From Figure 8, we see that a household size of 2 has the highest amount of TV duration compared to

other household groups. Meanwhile, Figure 9 shows the breakdown of Total Duration by all income

groups. Income group “Less than $20,000” had a high share of total duration watched compared to

other income groups.

Figure 9: Duration by income group

TV Commercial Analysis

The next set of analyses focuses on the different types of commercials and their popularity.

Commercial Types

Figure 10 is a bubble chart in which each bubble is a commercial category, sized by the number of times commercials in that category were aired during August. The bigger the bubble, the more often the category was aired. Health and Beauty and Beverage are two of the most frequently aired commercial types in August 2015.


Figure 10: Commercial categories sized by popularity

Commercial Popularity

Drilling down into the commercials shows the specific ads that were popular during August, in decreasing order of ad count. Marriot and Princeton University were aired the most times in August.


Figure 11: Commercials by popularity

The chart below shows the links between different commercial categories.

Figure 12: Links between commercials


Channel Analysis

Four channels are considered for TV viewership in August 2015: CBS, NBC, FOX and ABC. Figure 13 shows the number of times viewers tuned in to each of the four channels. CBS is the most sought-after channel, with more than 60,000 tune-ins during primetime in August 2015. It is followed by NBC, ABC and FOX.

Figure 13: Clicker count by channel

Primetime TV View

Figure 14 shows that more content was aired between 8 pm and 9 pm than between 9 pm and 10 pm. Content here refers to either a program or a commercial. The subsequent trend chart also shows that only FOX aired more content during the 9 pm to 10 pm segment, whereas all other channels reduced their primetime content as they moved into the 9 pm segment.

Figure 14: Primetime channel popularity

As seen in Figure 15, apart from FOX, the number of content segments (programs or commercials) has

gone down at 9 pm primetime when compared to 8 pm for all other channels.

Figure 15: Primetime channel popularity trend

Daily Channel Popularity

Figure 16 shows the count of commercials aired on each day in August for each channel. Over the course of the month, CBS had upward blips around August 12th and 18th, when most of the other channels aired relatively fewer commercials. Overall, all four channels aired a similar number of commercials over time in August. NBC is relatively flat, whereas ABC, CBS and FOX show more variation.

Figure 16: Daily channel popularity

Figure 17 depicts what is known as a Tree Map which is similar to a heat map. The darker the color the

greater the popularity in terms of number of commercials played. It can be seen that on Mondays CBS

and FOX are more popular than NBC and ABC. Similarly, ABC is more popular on Sundays when

compared to other channels. Thursday CBS primetime is the least popular.


Figure 17: Heat map of commercial popularity

Most Watched

Figure 18: Heat map of program viewership


Figure 18 shows the most watched program/commercials across all the subscriber population in Boston

for the month of August. We can infer that America’s Got Talent is most watched in terms of viewing

duration across all subscribers followed by 48 Hours and American Ninja Warriors.

Subscriber Watch Pattern

The two visuals below show the subscriber viewing pattern based on minutes watched for the month of August 2015. The shaded portion on the right represents the forecasted duration for the month of September.

Subscriber 1

Figure 19: Subscriber 1 watch pattern

Subscriber 2

Figure 20: Subscriber 2 watch pattern


Subscriber Propensity Index

The Propensity Index quantifies the uninterrupted TV viewership of a subscriber, which is usually not measurable [7]. It is a measure of a subscriber’s viewing behavior, calculated as the weighted sum of Age (Age Propensity), Household Size (House Propensity), and Minutes per Click to give one standardized value for each subscriber. The Propensity Index ranges from 0 to 1. A subscriber with a value of 0.99, for example, has among the best uninterrupted TV viewing behavior.

Propensity Index = Age Propensity + House Propensity + Minutes per Click (each term weighted)

The sum of the three weighted attributes above gives the value of the Propensity Index. The index is very useful in the current digital TV era, where having a TV/set-top box turned on does not always translate into viewership because viewers may simply flip channels during commercials. The index penalizes, or flags, subscribers who hop to other channels during commercial breaks and are therefore not effectively viewers of commercials.

The snapshot below classifies subscribers as Avid or Normal Watchers based on their Propensity Index score. A score greater than 0.5 indicates an Avid Watcher, while a score less than 0.5 indicates a Normal Watcher.
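To make the weighted-sum idea concrete, here is a minimal sketch of how such an index could be computed and thresholded. The weights, the min-max scaling to the 0 to 1 range, and the choice that larger age and household size raise the index are all assumptions made for illustration; the study does not publish its weights or scaling. A score of exactly 0.5 is treated as Normal here, which is also an assumption.

```python
# Hypothetical weights that sum to 1; the source does not state the real ones.
WEIGHTS = {"age": 0.4, "household": 0.2, "minutes_per_click": 0.4}


def min_max(values):
    """Scale a list of numbers to the 0..1 range (all zeros if constant)."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def propensity_index(ages, household_sizes, minutes_per_click):
    """Weighted sum of the three scaled attributes, one value per subscriber."""
    a = min_max(ages)
    h = min_max(household_sizes)
    m = min_max(minutes_per_click)
    return [
        WEIGHTS["age"] * x + WEIGHTS["household"] * y + WEIGHTS["minutes_per_click"] * z
        for x, y, z in zip(a, h, m)
    ]


def watch_category(pi, threshold=0.5):
    """Avid above the threshold, Normal otherwise."""
    return "Avid" if pi > threshold else "Normal"


scores = propensity_index([20, 40, 60], [1, 2, 3], [5, 10, 30])
categories = [watch_category(s) for s in scores]
```

Because each scaled attribute lies between 0 and 1 and the weights sum to 1, the resulting score also stays between 0 and 1, matching the range described above.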


Figure 21: Subscribers by watch category

The above bar chart shows each subscriber colored by which category they fall into. There are 1556 Avid

Watchers and 3444 Normal Watchers out of a 5000 subscriber sample population overall.

Correlation with other attributes

The correlation matrix below shows a much stronger positive correlation between Age and Propensity Index than between Duration Watched and Propensity Index. A stronger correlation is represented by a larger filled area in the pie chart.


Figure 22: Correlation graph

Findings I: Predict likely to Watch Commercials

We first map each subscriber’s watch time to the commercials aired on that date, as shown in the table snapshot below. This gives us the viewership data for each subscriber.

Table 1: Subscriber STB click stream data

We then use Association Rule Mining [5] to find key associations across commercials. Table 2 shows some of the top association rules, ranked by confidence, for commercials on CBS primetime Saturdays in August 2015. For example, one key rule says there is an 87.5% chance that subscribers who watched a commercial for Holland America Line will also watch the commercial for Nissan Motor Corp.
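For readers unfamiliar with the confidence metric, the sketch below computes support and confidence for a single rule. The ten viewing "baskets" are hypothetical and were constructed so that the rule reproduces the 87.5% figure quoted above; they are not the study's data, and a full rule-mining algorithm such as Apriori would enumerate candidate rules rather than score one by hand.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent in t and consequent in t)
    ante = sum(1 for t in transactions if antecedent in t)
    support = both / n                          # share of all baskets with both items
    confidence = both / ante if ante else 0.0   # share of antecedent baskets with the consequent
    return support, confidence


# Hypothetical baskets: the set of commercials each subscriber watched.
baskets = (
    [{"Holland America Line", "Nissan Motor Corp"}] * 7
    + [{"Holland America Line"}]
    + [{"Nissan Motor Corp"}, {"Rolex"}]
)

support, confidence = rule_stats(baskets, "Holland America Line", "Nissan Motor Corp")
```

Here 8 baskets contain Holland America Line and 7 of those also contain Nissan Motor Corp, so the confidence is 7/8 = 0.875 and the support is 7/10 = 0.7.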


Table 2: Key association rules


Findings II: Subscriber Viewership Profiling

Through K-means Clustering [4], we created cluster segments to profile the subscribers into ten

different target groups as shown below in Figure 23. The cluster segments were driven by the attributes

duration watched, propensity index and other demographic attributes. The scatter plot of Propensity

Index (PI) Vs Duration Watched below shows each cluster segment in different colors. Each point

denotes a subscriber with labels for Age, PI and Ethnicity.

Figure 23: Scatter plot of subscriber cluster segments
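The clustering step can be sketched with a plain implementation of Lloyd's k-means algorithm. This is an illustration under simplifying assumptions: it uses only two already-scaled features (propensity index and duration), six hypothetical subscribers, two clusters instead of ten, and seeds the centroids with the first k points rather than a random or k-means++ initialization. The tool and settings used in the study are not stated in the source.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical subscribers: (propensity index, duration scaled to 0..1).
subscribers = [
    (0.90, 0.80), (0.20, 0.10), (0.85, 0.90),
    (0.10, 0.20), (0.80, 0.85), (0.15, 0.15),
]

labels, centroids = kmeans(subscribers, k=2)
```

Features on very different scales (for example minutes of duration versus a 0 to 1 index) should be normalized before clustering, otherwise the larger-scale feature dominates the distance.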

Figure 24 below shows the number of subscribers in each cluster, colored by watch category. Cluster 5 has an almost equal proportion of Avid and Normal Watchers when compared to other clusters.


Figure 24: Number of subscribers in each cluster

Figure 25: Zooming on low performers of cluster 5

• The above snapshot shows subscribers in cluster 5 who have a high watching duration but a low propensity index.

• Hence, within cluster 5, we can treat these subscribers as a new focus group to promote to a high propensity index.

• The link graph [10] below shows the links among commercial categories for top performers in cluster 5.


Figure 26: Link graph of top subscriber commercial categories


Table 3 lists the rules generated from running association rule mining on high performers of cluster 5.

Table 3: Association rules from cluster 5 high performers

High Performers in Cluster 5

• With this information, we can see the top associations between programs and commercials.

• There is a 90% chance that subscribers who have watched the program “48 Hours” will also watch the commercial for “Red Lobster”; these two items appear together in 47% of transactions.

Top Rules:

We can use the recommendation rules of the high performers to treat, incubate and promote the low performers within the same cluster. By repeating this process for the other 9 clusters, viewers with a low PI score can be moved to a higher level within their corresponding clusters. This technique is very similar to collaborative filtering [11], since it bases each recommendation on the behavior and habits of the subscriber’s peers.
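One way to picture this promotion step is a small rule-based recommender: rules mined from a cluster's high performers are applied to what a low performer in the same cluster has already watched. The sketch below is an assumption about how such a step could look, not the study's implementation. The first rule echoes the “48 Hours” and “Red Lobster” example above; the second rule, its confidence and the 0.8 cutoff are hypothetical.

```python
# Rules as (antecedent, consequent, confidence), mined from high performers.
rules = [
    ("48 Hours", "Red Lobster", 0.90),
    ("Extant", "Rolex", 0.55),  # hypothetical low-confidence rule
]


def recommend(watched, peer_rules, min_confidence=0.8):
    """Suggest unseen content whose rule antecedent the subscriber has watched."""
    suggestions = []
    # Strongest rules first, so the most reliable suggestions lead the list.
    for antecedent, consequent, confidence in sorted(peer_rules, key=lambda r: -r[2]):
        if (
            confidence >= min_confidence
            and antecedent in watched
            and consequent not in watched
            and consequent not in suggestions
        ):
            suggestions.append(consequent)
    return suggestions


low_performer_history = {"48 Hours", "Extant"}
suggested = recommend(low_performer_history, rules)
```

With the cutoff at 0.8 only the 90% rule fires, so the low performer is shown “Red Lobster” but not “Rolex”.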

Figure 27 shows the support and confidence for the key rules.

Figure 27: Association rules

Exploring High Performers of Cluster 5

• The link graph [10] below shows the interaction between TV programs and high performers from cluster 5.

• Each program on the left is colored differently, and the pie chart shows the subscriber watching pattern for every node.

• Subscriber 3148 only watches “Extant” and “Bachelor in Paradise”, whereas Subscriber 906 watches all 4 shows.


Figure 28: High performers Subscriber-Program link graph

Figure 29 shows some of the interactions between subscribers and commercial types.

Figure 29: High performers Subscriber-Program-Commercial link graph


Findings III: Predict and Classify Subscribers

Using a decision tree model, one can predict whether a subscriber will be an Avid Watcher or a Normal Watcher based on key attributes. In the classification tree [9] below, total duration and clicker count are the key splitting attributes that decide the Propensity Index; Age and Household Size are set aside because they were used directly in creating the PI metric. The aim is to predict the Propensity Index for a new set of subscribers when we have no information on their demographics (Age, Household Size, etc.). Hence, a fully mature model with an exhaustive training set can predict and classify subscribers as Avid or Normal Watchers from just the total duration they have watched TV and their number of clicks.

In Figure 30, the major split is at a total duration of 1622 minutes. The next split is based on whether the clicker count falls above or below 39 or 45. These criteria decide whether one is going to be an Avid or Normal Watcher. In the first bin (bottom left of the chart), a subscriber with less than 142 minutes of total duration and a clicker count of less than 39 will be a Normal Watcher 75% of the time (probability of 0.75) and an Avid Watcher only 25% of the time. The other 14 bins are constructed similarly based on the splits.

Figure 30: Classification tree

Training set: 4000 subscribers

Test set: 1000 subscribers


Classification Matrix

From the matrix above, we see that 43 Avid Watchers and 636 Normal Watchers have been classified correctly in the test set. The remaining 321 (278 + 43) subscribers have been classified incorrectly.

Table 4: Test set after classification

Test Misclassification rate: 321/1000 = 0.321

Model accuracy on the sample test data: 67.9%

The model can be matured further as the training set grows.
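
The accuracy and misclassification figures above follow directly from the four counts quoted in the text. The short check below recomputes them; the variable names are illustrative, and because the text does not say which misclassified count belongs to which class, the two are only summed.

```python
# Counts quoted in the text for the 1000-subscriber test set.
correct_avid = 43       # Avid Watchers classified correctly
correct_normal = 636    # Normal Watchers classified correctly
misclassified = 278 + 43  # the two error counts; their class assignment is not stated

total = correct_avid + correct_normal + misclassified
accuracy = (correct_avid + correct_normal) / total
misclassification_rate = misclassified / total

print(f"test set size:          {total}")
print(f"accuracy:               {accuracy:.3f}")
print(f"misclassification rate: {misclassification_rate:.3f}")
```

The two rates sum to 1, so either can be reported once the other is known.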

Return on Investment

There are measurable returns on this STB Analytics investment, and they can be realized by TV broadcasters of any size, from small to large. Along with the operational benefits, there is a substantial financial return on investment. For example, with an investment of $8 million over five years, we predict additional revenue of $20 million with a net return of $11 million. This is based on a 1% increase every year on the existing $200 million in ad revenue and, similarly, a reduction of $1 per viewer in programming cost on a 2 million viewer base every year.
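
One reading of these inputs that reproduces the $20 million figure is sketched below. It assumes a flat 1% of the existing $200 million each year (no compounding) and counts the programming savings together with the ad revenue gain; both assumptions are ours, since the paper does not spell out the calculation. All amounts are in millions of dollars.

```python
YEARS = 5
ad_revenue = 200.0         # existing annual ad revenue ($M), from the text
ad_uplift_rate = 0.01      # 1% increase per year, assumed flat (no compounding)
viewers_millions = 2.0     # viewer base in millions, from the text
saving_per_viewer = 1.0    # programming cost reduction per viewer per year ($)

ad_gain_per_year = ad_revenue * ad_uplift_rate               # $M per year
saving_per_year = viewers_millions * saving_per_viewer       # $M per year
additional_revenue = YEARS * (ad_gain_per_year + saving_per_year)

print(f"ad gain per year:      ${ad_gain_per_year:.1f}M")
print(f"programming saving:    ${saving_per_year:.1f}M per year")
print(f"five-year total:       ${additional_revenue:.1f}M")
```

The $11 million net return is the paper's own figure. Simply subtracting the $8 million investment from this total would give $12 million, so the net figure presumably reflects an adjustment that is not stated, and the sketch does not attempt to reproduce it.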

Conclusion

This paper showcases various data science and analytical methods that can be leveraged to address some of the most common challenges in the Set-Top Box industry. With the newly devised Propensity Index metric, it is now possible to quantify uninterrupted TV viewership patterns. Through clustering, we were able to profile and build customer segments based on subscriber preferences and habits. Using association rules, we can treat, incubate and promote them with the right set of programs and commercials to turn them into Avid Subscribers. In this way, we can tap into the untapped potential hidden inside each cluster segment. By further dissecting and analyzing the subscriber-program-commercial linkage value chain, it is now possible to build personalized offers and recommendations for every subscriber. These analytics can change the business model from broader content management to more customized content creation and marketing. This not only results in better campaign management but also helps broadcasters negotiate higher advertising rates, thus increasing revenue. The analytics can be enriched further by bringing in additional external data sources such as Census and Zillow, as well as social media sources like Facebook, Twitter and Yelp, in order to build a holistic analytic solution.

References

[1] http://www.bostonredevelopmentauthority.org/research-maps/research/overview

[2] http://www.ispot.tv/browse

[3] http://www.tvguide.com/listings/

[4] https://en.wikipedia.org/wiki/K-means_clustering

[5] https://en.wikipedia.org/wiki/Association_rule_learning

[6] http://www.overdosesolutions.net/?page_id=19

[7] Consumer Micro-Behavior and TV Viewership Patterns: Data Analytics for the Two-Way Set-Top Box – ICEC 2012 – by Ray M. Chang, Robert J. Kauffman and Insoo Son

[8] Evaluating TV Ad Campaigns Using Set-Top Box Data – Google, Inc. – by Sundar Dorai-Raj, Yannet Interian, and Dan Zigmond

[9] Decision Trees for Predictive Modeling – SAS Institute Inc – by Padraic G. Neville

[10] http://www.csc.ncsu.edu/faculty/samatova/practical-graph-mining-with-R/sample/chapter_5_LinkAnalysis.pdf

[11] https://en.wikipedia.org/wiki/Collaborative_filtering

Dell EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” DELL EMC MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Use, copying and distribution of any Dell EMC software described in this publication requires an applicable software license.

Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries.