Machine Learning at Netflix Scale

Post on 06-May-2015

Category:

Engineering

DESCRIPTION

Netflix is the world’s leading Internet television network with over 48 million members in more than 40 countries enjoying more than one billion hours of TV shows and movies per month, including original series. Netflix uses machine learning to deliver a personalized experience to each one of our 48 million users. In this talk you will hear about the machine learning algorithms that power almost every part of the Netflix experience, including some of our recent work on distributed Neural Networks on AWS GPUs. You will also get insight into our innovation approach, which includes offline experimentation and online A/B testing. Finally, you will learn about the system architectures that enable all of this at Netflix scale.

TRANSCRIPT

Machine Learning At Netflix Scale

Aish Fenton Manager - Research Engineering @aishfenton

Everything is a recommendation

Top Picks for Aish

Movies based on books

Because you watched Bob’s Burgers

Rank based on your taste

75% of plays come from homepage

Back Story…

Proxy question:
▪ Accuracy in predicted rating
▪ Improve by 10% = $1 million!

What we were interested in:
▪ High-quality recommendations

predicted

actual
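The slide does not name the accuracy metric. The Netflix Prize scored entries by root-mean-square error (RMSE) between predicted and actual ratings, so the proxy question can be sketched as below. The ratings are made-up values for illustration, not data from the talk.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical ratings on a 1-5 star scale (illustrative only).
predicted = [3.5, 4.0, 2.0, 5.0]
actual = [4.0, 4.0, 1.0, 4.5]
score = rmse(predicted, actual)
print(f"RMSE: {score:.4f}")
```

Under this reading, "improve by 10%" means lowering the RMSE by 10% relative to the baseline system.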

SVD RBMs

Top two results still used in production!

2006 → 2013

• > 44M members

• > 40 countries

• > 5B hours in Q3 2013

• Log 100B events/day

• 31.62% of peak US downstream traffic

Data and Models

▪ > 40M subscribers
▪ Ratings: ~5M/day
▪ Searches: > 3M/day
▪ Plays: > 50M/day
▪ Streamed hours:
  o 5B hours in Q3 2013

Geo Info

Time

Impressions

Device Info

Metadata

Social

Ratings

Demographics

Member Behavior

Plays

Figure: matrix factorization, R ≈ U × M. The Latent User Vector for Aish (u1 u2 u3) and the Latent Item Vector for House of Cards (m1 m2 m3) combine to give the entry 3.53 in R.
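The slides do not show how the latent vectors are learned. One common approach is stochastic gradient descent on the observed ratings. The sketch below assumes that approach. The function names, hyperparameters, and rating triples are all hypothetical.

```python
import random


def train_mf(ratings, n_users, n_items, k=3, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Fit latent user vectors U and item vectors M by SGD on (user, item, rating) triples."""
    rng = random.Random(seed)
    # Small positive random initialization.
    U = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    M = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(a * b for a, b in zip(U[u], M[i]))
            for f in range(k):
                uf, mf = U[u][f], M[i][f]
                # Gradient step on squared error with L2 regularization.
                U[u][f] += lr * (err * mf - reg * uf)
                M[i][f] += lr * (err * uf - reg * mf)
    return U, M


def predict(U, M, u, i):
    """Predicted rating: dot product of the user and item latent vectors."""
    return sum(a * b for a, b in zip(U[u], M[i]))


def squared_error(U, M, ratings):
    """Total squared error over the observed ratings."""
    return sum((r - predict(U, M, u, i)) ** 2 for u, i, r in ratings)


# Hypothetical observations: (user index, item index, rating).
ratings = [(0, 0, 4.0), (0, 1, 2.0), (1, 0, 5.0), (1, 2, 3.0), (2, 1, 1.0), (2, 2, 4.0)]
U0, M0 = train_mf(ratings, n_users=3, n_items=3, epochs=0)  # untrained starting point
U, M = train_mf(ratings, n_users=3, n_items=3)
print(squared_error(U0, M0, ratings), squared_error(U, M, ratings))
```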

My rating for House of Cards = Mean Rating + My Bias + Movie Bias + Interaction

3.55 = 2.50 + (-1.5) + 1.2 + p·q
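The decomposition on this slide can be written as a short function. The mean rating and the two biases below are the slide's numbers, and together they imply an interaction term of 1.35. The latent vectors are hypothetical values chosen so that their dot product equals 1.35.

```python
def predict_rating(mean_rating, user_bias, item_bias, user_vec, item_vec):
    """Mean rating + user bias + item bias + dot product of the latent vectors."""
    interaction = sum(p * q for p, q in zip(user_vec, item_vec))
    return mean_rating + user_bias + item_bias + interaction


# Hypothetical latent vectors: 0.5*1.0 + 1.0*0.7 + 0.3*0.5 = 1.35
p_aish = [0.5, 1.0, 0.3]
q_house_of_cards = [1.0, 0.7, 0.5]

# 2.50, -1.5 and 1.2 are the values shown on the slide.
rating = predict_rating(2.50, -1.5, 1.2, p_aish, q_house_of_cards)
print(round(rating, 2))
```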

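Figure: tensor factorization. Time is added as a third factor T (t1 t2 t3) alongside U (u1 u2 u3) and M (m1 m2 m3). The values shown for Aish and House of Cards are 3.53, 2.35 and 1.34.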
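The slide does not specify the form of the tensor model. A minimal sketch, assuming a CP-style (canonical polyadic) factorization in which each latent factor of the user, the item, and the time slice are multiplied together and summed. All vectors and the weekend/weekday names are hypothetical.

```python
def predict_with_time(user_vec, item_vec, time_vec):
    """CP-style three-way interaction: sum over factors k of u_k * m_k * t_k."""
    if not (len(user_vec) == len(item_vec) == len(time_vec)):
        raise ValueError("all three vectors need the same number of factors")
    return sum(u * m * t for u, m, t in zip(user_vec, item_vec, time_vec))


# Hypothetical latent vectors with three factors each.
u_aish = [1.0, 2.0, 0.5]
m_house_of_cards = [2.0, 0.5, 1.0]
t_weekend = [1.0, 1.0, 1.0]  # all ones: reduces to the plain user-item dot product
t_weekday = [0.5, 1.0, 2.0]  # re-weights each factor for a different time slice

weekend_score = predict_with_time(u_aish, m_house_of_cards, t_weekend)
weekday_score = predict_with_time(u_aish, m_house_of_cards, t_weekday)
print(weekend_score, weekday_score)
```

With a time vector of all ones the model is the same as the two-way factorization, which is one way to see the tensor version as an extension of it.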

▪ Matrix/Tensor Factorization
▪ Regression models (Logistic, Linear, Elastic nets)
▪ Factorization Machines
▪ Restricted Boltzmann Machines
▪ Markov Chains & other graph models
▪ Clustering / Topic Models
▪ Neural Networks
▪ Association Rules
▪ GBDT/RF
▪ …

Figure: bar chart of Improvement Over Baseline (axis 0% to 300%) for three models: Popularity; + Ratings; + More Features & Optimized Models.

Anatomy of a Machine Learning Platform

Problem

Data

Experiment Offline

Produce Model

Test / Metrics

Near-line

Online

UI Clients

Event Distribution

Online Algs

Model Trainer

Pre-compute

AB Test Metrics

API Layer

Monitoring

Offline

Hadoop / Data Warehouse

Experimentation Platform

S3 / HDFS

Offline Metrics

Query Tools

Models

Models

Many different types of data…

▪ App Logs
▪ User Actions
  o Ratings
  o Plays
  o Queue Adds
▪ Algo Actions
  o Impressions (Presentation Bias)
▪ Context
  o Device Info
  o User Demographics
  o Social
  o Time
▪ …

Embedded

Embedded

Weights

Real-time popularity of movie

Example: Neural Network Training

θ

Input, Hidden Layer, Output

Input, Hidden Layers, Output
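The diagrams show a network with parameters θ mapping an input through one or more hidden layers to an output. A minimal sketch of that forward pass follows, assuming fully connected layers with sigmoid activations, which the slides do not specify. The layer sizes and weights are hypothetical, and training (backpropagation, distribution across GPUs) is not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One fully connected layer; weights[j] holds the incoming weights of unit j."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, theta):
    """Run the input through every layer; theta is a list of (weights, biases) pairs."""
    activations = inputs
    for weights, biases in theta:
        activations = dense(activations, weights, biases)
    return activations


# Hypothetical parameters: 2 inputs -> 3 hidden units -> 1 output.
theta = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2]], [0.05]),
]
output = forward([1.0, 0.5], theta)
print(output)
```

Adding more `(weights, biases)` pairs to `theta` gives the multi-layer case from the second diagram.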

Neural Network Training

1,536 cores

G2 Instances: $0.60 per hour

But… things can go astray

Figure: R ≈ U × M, with the item matrix M labeled Pre-compute and the user vector (u1 u2 u3) labeled Online.

Aish played HoC

Publish new model for Aish
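The slides do not say how the per-member update is computed. One way to read "pre-compute M, update the user vector when an event arrives" is to hold the item factors fixed and refit only that member's latent vector. The sketch below assumes that reading and treats each play as a hypothetical numeric feedback value. The function name, item vectors, and feedback values are illustrative, not Netflix's.

```python
def update_user_vector(user_vec, item_factors, feedback, lr=0.05, reg=0.02, steps=100):
    """Refit one user's latent vector by gradient steps, with item factors held fixed."""
    vec = list(user_vec)
    for _ in range(steps):
        for item, value in feedback.items():
            q = item_factors[item]
            err = value - sum(p * x for p, x in zip(vec, q))
            # Only the user vector moves; the pre-computed item vectors are read-only.
            vec = [p + lr * (err * x - reg * p) for p, x in zip(vec, q)]
    return vec


def total_error(user_vec, item_factors, feedback):
    """Squared error of the user vector against the feedback values."""
    return sum(
        (value - sum(p * x for p, x in zip(user_vec, item_factors[item]))) ** 2
        for item, value in feedback.items()
    )


# Hypothetical pre-computed item vectors and feedback values for one member.
item_factors = {"House of Cards": [1.0, 0.5, 0.2], "Bob's Burgers": [0.2, 1.0, 0.4]}
aish_old = [0.1, 0.1, 0.1]
aish_feedback = {"House of Cards": 4.0, "Bob's Burgers": 3.0}

aish_new = update_user_vector(aish_old, item_factors, aish_feedback)
print(total_error(aish_old, item_factors, aish_feedback),
      total_error(aish_new, item_factors, aish_feedback))
```

Because only one short vector is refit, an update like this is cheap enough to run per event, while the item matrix is rebuilt in the batch pre-compute step.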

Aish Fenton @aishfenton https://www.linkedin.com/profile/view?id=47917219
