Take-Away TV: Recharging Work Commutes with Greedy and Predictive Preloading of TV Content
Mining TV-on-demand Services
EPSRC project
Dmytro Karamshuk
Large-scale study of BBC iPlayer
May 2013 – Jan 2014
Users – 32M/month (≈ 50% of the UK population of 64M)
IP addresses – 20M/month
Sessions – 1.9 billion
2 × INFOCOM 2015, ToN 2015, JSAC 2016
Longitudinal View across ISPs
Fixed-line Internet market (5 representative providers)
Mobile market is more dynamic than the fixed-line Internet market
Mobile Internet market (5 representative providers)
Data caps decrease market share
All-you-can-eat data (M1, M5)
Limited-cap data packages (M2–M4)
All-you-can-eat plans boost user consumption
Temporal Patterns in Different ISPs
Fixed-line accesses (F1–F5) peak in the evening hours
Mobile users watch more during commutes
[Figure panels: Fixed-line ISPs; Mobile ISPs with limited data caps]
There is a problem…
Internet connectivity on UK trains is poor
A study shows that 23.2% of 3G packets and 37.2% of 4G packets on major train routes failed
A useful insight: users watch across networks
Users finish watching episodes across different sessions and networks
[Figure: per-user completion ratio for Fixed-line ISPs and Mobile ISPs]
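A completion ratio that spans sessions and networks can be computed by summing watched time per user and episode before dividing by the episode length. The sketch below assumes a hypothetical log of `(user_id, episode_id, network, seconds_watched)` records; `completion_ratios` and the record layout are illustrative, not the study's actual pipeline.

```python
from collections import defaultdict


def completion_ratios(sessions, episode_length):
    """Per-(user, episode) completion ratio summed over all of a user's sessions.

    sessions: iterable of (user_id, episode_id, network, seconds_watched),
        a hypothetical log format.
    episode_length: dict episode_id -> episode duration in seconds.
    Returns dict (user_id, episode_id) -> (ratio, {network: share of watched seconds}).
    """
    watched = defaultdict(float)
    by_network = defaultdict(lambda: defaultdict(float))
    for user, episode, network, seconds in sessions:
        watched[(user, episode)] += seconds
        by_network[(user, episode)][network] += seconds

    result = {}
    for (user, episode), total in watched.items():
        # Cap at 1.0 so re-watched segments do not push the ratio above 100%.
        ratio = min(total / episode_length[episode], 1.0)
        per_net = by_network[(user, episode)]
        shares = {net: s / total for net, s in per_net.items()} if total > 0 else {}
        result[(user, episode)] = (ratio, shares)
    return result
```

A user who starts an episode at home and finishes it on a phone shows up with a ratio near 1.0 and non-zero shares for both networks.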
Speculative Content Pre-fetching
Pre-fetch at home, watch during commutes
Not very efficient…
Per-user mobile savings with pre-fetching
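One way to picture the speculative approach is a greedy policy that pre-fetches the same k globally most-watched episodes for everyone, then measures what share of each user's mobile bytes those episodes would have covered. This is a minimal sketch under that assumption; `greedy_prefetch_set`, `per_user_mobile_savings`, and the record formats are hypothetical.

```python
from collections import Counter, defaultdict


def greedy_prefetch_set(history, k):
    """Hypothetical greedy policy: the k globally most-watched episodes.

    history: iterable of (user_id, episode_id) views.
    """
    counts = Counter(episode for _, episode in history)
    return {episode for episode, _ in counts.most_common(k)}


def per_user_mobile_savings(mobile_views, prefetched):
    """Fraction of each user's mobile bytes that a home pre-fetch would have served.

    mobile_views: iterable of (user_id, episode_id, bytes) watched on mobile.
    prefetched: dict user_id -> set of episode_ids pre-fetched at home.
    """
    total = defaultdict(int)
    saved = defaultdict(int)
    for user, episode, nbytes in mobile_views:
        total[user] += nbytes
        if episode in prefetched.get(user, set()):
            saved[user] += nbytes
    return {user: saved[user] / total[user] for user in total if total[user] > 0}
```

Because every user receives the same set, users whose tastes diverge from the global top-k see little benefit, which is one plausible reason the savings are modest.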
Can we do better with predictive preloading?
Towards Predicting User Preferences
Featured content
Most Popular Content
How important is UI guidance?
For 20% of users, more than 60% of their accesses come from the Front Page
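The front-page statistic is a per-user share followed by a count over users. A small sketch, assuming each access is logged as `(user_id, source)` with a hypothetical `"front_page"` label for accesses that start from the Front Page:

```python
from collections import defaultdict


def front_page_shares(accesses):
    """Per-user share of accesses that start from the Front Page.

    accesses: iterable of (user_id, source); "front_page" is a hypothetical label.
    """
    total = defaultdict(int)
    front = defaultdict(int)
    for user, source in accesses:
        total[user] += 1
        if source == "front_page":
            front[user] += 1
    return {user: front[user] / total[user] for user in total}


def share_of_users_above(shares, threshold=0.6):
    """Fraction of users whose front-page share is strictly above the threshold."""
    if not shares:
        return 0.0
    return sum(1 for s in shares.values() if s > threshold) / len(shares)
```

On real logs, the slide's finding would correspond to `share_of_users_above(shares, 0.6)` returning about 0.2.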
Content Types
11 channels
11 categories and 172 genres
thousands of shows
User Focus on Different Content Types
[Figure: share of users with all their sessions from 1, 2, or 3 out of 11 channels; 1, 2, or 3 out of 11 categories; 1–4 out of 171 genres; 1–4 out of thousands of shows]
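The focus plot can be reproduced by counting the distinct channels (or categories, genres, shows) in each user's sessions. The sketch assumes a cumulative reading ("at most k distinct labels"), which the slide does not state explicitly; `focus_distribution` and the `(user_id, label)` format are hypothetical.

```python
from collections import defaultdict


def focus_distribution(sessions, max_k):
    """Share of users whose sessions all fall within at most k distinct labels.

    sessions: iterable of (user_id, label), where label is a channel, category,
        genre or show (hypothetical log format).
    Returns dict k -> share of users, for k = 1..max_k (cumulative reading).
    """
    labels = defaultdict(set)
    for user, label in sessions:
        labels[user].add(label)
    n_users = len(labels)
    if n_users == 0:
        return {k: 0.0 for k in range(1, max_k + 1)}
    return {
        k: sum(1 for seen in labels.values() if len(seen) <= k) / n_users
        for k in range(1, max_k + 1)
    }
```

Running it once per content type (channel, category, genre, show) gives one curve per panel.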
Engineering Features (feature importance)
User Preferences (total importance: 0.555)
  content category: 0.038
  content genre: 0.063
  category affinity: 0.042
  genre affinity: 0.103
  show affinity: 0.179
  channel affinity: 0.043
  content age: 0.087
UI Guidance (total importance: 0.292)
  featured content: 0.061
  featured position: 0.061
  content popularity rank: 0.071
  popularity position: 0.008
  featured probability: 0.091
Repeatedly Watched Content (total importance: 0.154)
  previously watched: 0.066
  completion ratio: 0.081
  probability of re-watching: 0.007
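The affinity features are not defined on the slide. One plausible definition, used here purely as an assumption, is the share of a user's past views that match the candidate episode on a given attribute. `affinity`, `feature_vector`, and the dictionary-based episode records are hypothetical.

```python
def affinity(history, attribute, episode):
    """Hypothetical affinity feature: share of the user's past views that match
    the candidate episode on one attribute ("show", "genre", "channel", ...)."""
    if not history:
        return 0.0
    matches = sum(1 for past in history if past[attribute] == episode[attribute])
    return matches / len(history)


def feature_vector(history, episode, attributes=("show", "genre", "category", "channel")):
    """Build one affinity feature per attribute for a (user, episode) pair."""
    return {f"{a}_affinity": affinity(history, a, episode) for a in attributes}
```

Under this definition, show affinity is the most specific signal, which is at least consistent with it carrying the highest importance in the list above.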
Supervised Learning
Problem: for a given user U and an episode E, predict whether U will watch E
Binary classification problem: f(U, E) → {0, 1}
Random Forest: fast, with good performance on high-dimensional data
Negative examples: randomly sampled from episodes the user did not watch
Predictions: predict the probability of watching, then rank all episodes by that probability
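The training-set construction and ranking steps above can be sketched with the standard library. The model itself is left as a pluggable `predict_proba(user, episode)` callable: the slides use a Random Forest, and the stand-in scorer in the test is only for illustration. Function names and data formats are hypothetical.

```python
import random


def build_training_set(watched, catalog, negatives_per_positive=1, seed=0):
    """Label watched (user, episode) pairs 1 and randomly sampled unwatched pairs 0.

    watched: dict user_id -> set of episode_ids the user watched.
    catalog: list of all episode_ids.
    """
    rng = random.Random(seed)
    examples = []
    for user, seen in watched.items():
        unseen = [e for e in catalog if e not in seen]
        for episode in seen:
            examples.append((user, episode, 1))
        # Negative sampling: draw unwatched episodes without replacement.
        k = min(len(unseen), negatives_per_positive * len(seen))
        for episode in rng.sample(unseen, k):
            examples.append((user, episode, 0))
    return examples


def rank_episodes(user, catalog, predict_proba, top_k=10):
    """Score every episode with P(watch | user, episode) and keep the top_k."""
    ranked = sorted(catalog, key=lambda e: predict_proba(user, e), reverse=True)
    return ranked[:top_k]
```

In practice `predict_proba` could wrap scikit-learn's `RandomForestClassifier.predict_proba`, taking the positive-class column for the pair's feature vector.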
Accuracy of Personalized Predictions
For 50% of users, there is a more than 70% chance that what they watch falls within the Top-10 predictions
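One way to read the Top-10 figure, assumed here rather than taken from the source, is a per-user hit rate: the share of episodes a user actually watched that appear in their top-10 list, summarized by the median across users. `hit_rate_at_k` and `median_hit_rate` are hypothetical names.

```python
from statistics import median


def hit_rate_at_k(predicted, watched, k=10):
    """Share of a user's actually-watched episodes that appear in their top-k list.

    Returns None for users with no watched episodes in the evaluation period.
    """
    if not watched:
        return None
    top = set(predicted[:k])
    return sum(1 for episode in watched if episode in top) / len(watched)


def median_hit_rate(predictions, actual, k=10):
    """Median per-user hit rate over users with at least one watched episode."""
    rates = []
    for user, watched in actual.items():
        rate = hit_rate_at_k(predictions.get(user, []), watched, k)
        if rate is not None:
            rates.append(rate)
    return median(rates)
```

Under this reading, a median above 0.7 would correspond to "for 50% of users, over 70%".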
When do we do predictions?
Front pages are updated overnight and remain largely unchanged for 24 hours
How much traffic can be saved?
Predictive pre-fetching can potentially save nearly 71% of mobile traffic
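Since front pages change overnight and then stay stable for about a day, a natural scheme is to compute each user's predicted set once per day and pre-fetch it at home the night before. The sketch below measures the overall share of mobile bytes such per-user, per-day sets would have covered; `mobile_traffic_saved` and the record formats are hypothetical.

```python
def mobile_traffic_saved(mobile_views, daily_predictions):
    """Overall fraction of mobile bytes served from a nightly home pre-fetch.

    mobile_views: iterable of (user_id, day, episode_id, bytes) watched on mobile.
    daily_predictions: dict (user_id, day) -> set of episode_ids pre-fetched
        at home the night before that day.
    """
    total = saved = 0
    for user, day, episode, nbytes in mobile_views:
        total += nbytes
        if episode in daily_predictions.get((user, day), ()):
            saved += nbytes
    return saved / total if total else 0.0
```

Unlike a single global set, the keys here are `(user, day)` pairs, so each user gets a personalized list refreshed once per front-page update.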
We made mobile users happy! How about the rest?
Access Patterns
[Figure panels: average number of sessions per user; correlation with Internet speed]
Content Delivery for Home Broadband
Install more distributed caches
May require significant investment
Any alternatives?
Problem: how to handle peak load from 32M users
Alternative: Peer-assisted Content Delivery
[Diagram: content servers streaming to many users]
On average, 5K users are online every second in the first day after release
5K duplicate streams every second!
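A rough way to estimate how many of those duplicate streams peers could absorb is to check, for each session, whether another viewer of the same content is already online when it starts. This simple overlap rule is an assumption for illustration, not the theoretical model from the talk, and it ignores upload capacity; `peer_assisted_savings` is hypothetical.

```python
from collections import defaultdict


def peer_assisted_savings(sessions):
    """Fraction of sessions that could be served by peers instead of servers.

    sessions: list of (content_id, start, end) in seconds.
    Simplifying assumption: a session can be served by peers if some
    earlier-starting session of the same content is still online at its start.
    """
    by_content = defaultdict(list)
    for content, start, end in sessions:
        by_content[content].append((start, end))

    offloaded = 0
    for intervals in by_content.values():
        intervals.sort()
        for i, (start, _) in enumerate(intervals):
            # A peer is any earlier-starting session whose end is after this start.
            if any(end > start for _, end in intervals[:i]):
                offloaded += 1
    return offloaded / len(sessions) if sessions else 0.0
```

With thousands of concurrent viewers per popular episode, almost every session after the first finds an online peer, which is why the offloaded fraction can be high.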
Ask users for assistance
Elegant Theoretical Model for very Complex Behavior
Savings of around 88% can be achieved
[Figure: Data Analysis vs. Theoretical Model, $G(c) = 1 - e^{-c}$]
Why does it work?
The top 5% of the content corpus accounts for 80% of traffic
Most accesses happen in the first day after release
Yes, it’s all about very popular content
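The concentration claim is a top-share computation: sort items by views and take the fraction of total traffic contributed by the top 5%. A minimal sketch, assuming a hypothetical `views_per_item` mapping from episode to view count:

```python
def top_share(views_per_item, top_fraction=0.05):
    """Share of total traffic from the most-viewed top_fraction of items.

    views_per_item: dict item_id -> number of views (or bytes served).
    At least one item is always counted as "top".
    """
    counts = sorted(views_per_item.values(), reverse=True)
    if not counts:
        return 0.0
    n_top = max(1, round(len(counts) * top_fraction))
    return sum(counts[:n_top]) / sum(counts)
```

On the iPlayer data, the slide reports that `top_share(views, 0.05)` would come out at roughly 0.8.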
Dmytro Karamshuk, King’s College London
“True genius resides in the capacity for evaluation of uncertain, hazardous, and conflicting information” – Winston Churchill