personalizing a stream of content

28
Personalizing a Stream of Content Saving a Legacy Broadcaster from the Graying of Radio May 21, 2016

Upload: anthea-watson-strong

Post on 15-Apr-2017

135 views

Category:

Technology


2 download

TRANSCRIPT

Page 1: Personalizing a Stream of Content

Personalizing a Stream of ContentSaving a Legacy Broadcaster from the Graying of Radio

May 21, 2016

Page 2: Personalizing a Stream of Content

NB: Our Team signed an NDA and will refer to our partner organization as the “Broadcaster” within

published materials.

Page 3: Personalizing a Stream of Content

Understanding the problemCompany

The Broadcaster is a legacy

media organization that

produces and distributes

audio content to radio

stations around the

country.

Context

Younger Americans have

different media

consumption habits,

expectations, and aesthetic

inclinations than the

millions of loyal listeners

our partner has served

since it was founded.

Problem

A drop in younger

listeners has made a dent

in the Broadcaster’s

audience. Without a large

younger audience to

replace older Americans

who die, the future of the

organization is in jeopardy.

Page 4: Personalizing a Stream of Content

The median age of our partner’s radio audience has steadily climbed in the last two decades.

Page 5: Personalizing a Stream of Content

Nora Smith● Her parents listen to the

Broadcaster and when the

station is on at home.

● She doesn’t own a radio at home.

● She doesn’t use the car radio.

● Listens to spotify via or local

music files on phone.

● She gets her news and audio

stories from podcasts.

Page 6: Personalizing a Stream of Content

For any given user, for any given hour, should we serve

them a Podcast or News?

Page 7: Personalizing a Stream of Content

Hypothesis:Listening Sessions

Everyone wants news, unless

interactions with the app in previous

listening sessions show the user

prefers podcasts at this time of day.

We hypothesize that user preference

can be inferred from users’

interactions with the app during

previous listening session.

Page 8: Personalizing a Stream of Content

Ingestion and Wrangling

Page 9: Personalizing a Stream of Content

Start time

Completion

Data deep-dive

Raw Data Implicit User Signals Explicit User Signals

Dated 8/2014-2/2016

614,000+ Unique Users

98,000,000+ Records

Shared

Search Begin

Search Complete

Skipped

Thumbs Up

Page 10: Personalizing a Stream of Content

Ingesting the Data

Page 11: Personalizing a Stream of Content

App Interactions over Time

Page 12: Personalizing a Stream of Content

User Trends

Interactions by Day Interactions by Hour

Page 13: Personalizing a Stream of Content

Time

User interaction

Types of Interaction: Complete, Start, Skip, Search Begin, Search Complete, Thumbs Up, Share

Defining Listening Sessions

Page 14: Personalizing a Stream of Content

Calculate Story Duration

Duration

Page 15: Personalizing a Stream of Content

If gap is ≤ 10 seconds,

assume next story is part

of same listening session

8 3 120 1 1 80

Measure the Gap Length Between Content

Page 16: Personalizing a Stream of Content

Session One Session Two Session Three

Define Sessions

Page 17: Personalizing a Stream of Content

User Trends

Users over Total Listening Time in Seconds Total Actions by Type

Page 18: Personalizing a Stream of Content

Data for Machine Learning

prev_duration

prev_num_ratings

prev_avg_rating_news

prev_avg_rating_podcast

prev_shift

prev_num_news

prev_num_podcast

prev_num_complete

prev_num_thumbup

prev_num_skip

prev_num_searchcomplete

time_diff_hr

Sample One: All sessions related to

randomly selected 10K users

Sample Two: Randomly chosen

20K sessions

Sample Three: A set of 20K sessions

that reflected the total population’s

behavior

Sample Four: A set of 50K sessions

that reflected the total population’s

behavior

Twelve Features Four Sample Sets

Page 19: Personalizing a Stream of Content

Feature Analysis and Modeling

Page 20: Personalizing a Stream of Content

20K Randomly

Selected Sessions

20K Reflecting

Total Pop.

10K Users with All

Sessions

50K Reflecting

Total Pop.

Extra Trees Clf1 0.191344

precision 0.291667

recall 0.142373

f1 0.190164

precision 0.305263

recall 0.138095

f1 0.250842

precision 0.330313

recall 0.202195

f1 0.221719

precision 0.316129

recall 0.170732

Random Forest Clf1 0.170213

precision 0.271186

recall 0.124031

f1 0.170648

precision 0.308642

recall 0.117925

f1 0.253954

precision 0.369869

recall 0.193357

f1 0.204947

precision 0.390135

recall 0.138978

GaussianNBf1 0.293173

precision 0.287402

recall 0.299180

f1 0.257757

precision 0.248848

recall 0.267327

f1 0.311551

precision 0.305672

recall 0.317661

f1 0.288052

precision 0.294807

recall 0.281600

BernoulliNBf1 0.285156

precision 0.295547

recall 0.275472

f1 0.234987

precision 0.258621

recall 0.215311

f1 0.300412

precision 0.313837

recall 0.288088

f1 0.280107

precision 0.317172

recall 0.250799

SVMf1 0.030651

precision 0.666667

recall 0.015686

f1 0.000000

precision 0.000000

recall 0.000000

Cannot Compute f1 0.015723

precision 0.555556

recall 0.007974

LRf1 0.040268

precision 0.400000

recall 0.021201

f1 0.010101

precision 0.142857

recall 0.005236

f1 0.050543

precision 0.441729

recall 0.026805

f1 0.032949

precision 0.714286

recall 0.016863

Page 21: Personalizing a Stream of Content

Feature 0 prev_duration

Feature 1 prev_num_ratings

Feature 2 prev_avg_rating_news

Feature 3 prev_avg_rating_podcast

Feature 4 prev_shift

Feature 5 prev_num_news

Feature 6 prev_num_podcast

Feature 7 prev_num_complete

Feature 8 prev_num_thumbup

Feature 9 prev_num_skip

Feature 10 prev_num_searchcomplete

Feature 11 time_diff_hr

Machine Learning Results of Random Forest Classifier

Page 22: Personalizing a Stream of Content

Model Refinement and Tuning

20K Randomly

Selected Sessions

20K Reflecting

Total Pop.

10K Users with

All Sessions

50K Reflecting

Total Pop.

GaussianNBf1 0.308000

precision 0.331897

recall 0.287313

f1 0.359091

precision 0.389163

recall 0.333333

f1 0.292221

precision 0.313606

recall 0.273565

f1 0.269928

precision 0.290448

recall 0.252115

LRf1 0.084507

precision 0.461538

recall 0.046512

f1 0.090535

precision 0.354839

recall 0.051887

f1 0.046577

precision 0.423762

recall 0.024643

f1 0.037037

precision 0.354839

recall 0.019538

Page 23: Personalizing a Stream of Content

Broadcaster Data without Sessions GaussianNB

f1 0.967882

precision 0.937957

recall 0.999780

LRf1 0.968729

precision 0.940750

recall 0.998425

Page 24: Personalizing a Stream of Content

Conclusion: Session Data v Non Session Data

Precision Recall F1-score Support

News

0.88 0.9 0.89 8497

Podcast

0.32 0.28 0.3 1435

Total

0.8 0.81 0.8 9932

Precision Recall F1-score Support

News

0.99 0.99 0.99 3656335

Podcast

0.92 0.85 0.89 323856

Total

0.98 0.98 0.98 3980191

Validation of provided data with the Broadcaster validation set and Broadcaster model

Validation of extracted data with 10K sample and GaussianNB model

Page 25: Personalizing a Stream of Content

Conclusions and Future Investigations

Page 26: Personalizing a Stream of Content

Some about our Troubles and Lessons Learned

● Hosting on Dreamhost.

● We used up a lot of time trying SVM.

● Feature weighting can be performed with tree classifiers.

● During the beginning of the project, staying flexible.

● User segmentation.

● Don’t be afraid to network to find real world problems with real world data.

● When in doubt, Google it!

Page 28: Personalizing a Stream of Content

The End