Page 1: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Improving the power of a picture via A/B testing
Gopal Krishnan, Director of Engineering
Dale Elliott, Senior Software Engineer
Kenny Xie, Senior Data Scientist

Page 5: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

TV is a lean-back experience

Page 6: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

90 seconds

Page 8: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Pop Quiz

Page 9: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

A round plane figure whose boundary (the circumference) consists of points equidistant from a fixed point (the center).

Page 11: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

A round plane figure whose boundary (the circumference) consists of points equidistant from a fixed point (the center).

Page 13: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Can we do better?

Page 15: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Sensitivity test

Page 16: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

The Short Game

Page 17: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Single title A/B test result

14% better 6% better

Page 18: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Testable Hypothesis

Page 19: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Displaying better artwork will result in greater engagement and retention by helping members discover stories they will enjoy even faster.

Page 20: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Data Driven

Page 21: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Feedback loop

Components:
• Device (PS3, website, etc.)
• Netflix API service
• Beacon (telemetry collection service)
• Hive (computes artwork performance metrics for every title/country/locale)
• Netflix Image Library

Steps:
• Serve artwork based on A/B logic
• Collect plays & client impressions
• Feed with artwork based on perf metric

Page 22: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Anatomy of artwork

Page 23: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Stable Image id for ground truth data

source-file-id-1, source-file-id-2, source-file-id-3


Page 24: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Diversity matters

Page 25: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Diversity matters

Page 26: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Pop Quiz

1 2

4 5 6


Page 27: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Building the A/B tests


Page 28: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Pairs of Explore and Exploit Tests

Explore Test
• Current production explore
• New explore

Exploit Test
• Current production exploit
• New exploit

● No member overlap
● Explore and exploit allocation happens


Page 29: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Multi-title explore allocation test

Page 30: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Test Evolution: Single Title to Multiple Titles

          Cell 1          Cell 2         Cell 3         Cell 4         Cell 5         Cell 6
Title 1   Control Image   Test Image 1   Test Image 2   Test Image 3   Test Image 4   Test Image 5
Title 2   Control Image   Test Image 1   Test Image 2   Test Image 3   Test Image 4   Test Image 5
...       ...             ...            ...            ...            ...            ...
Title n   Control Image   Test Image 1   Test Image 2   Test Image 3   Test Image 4   Test Image 5

Single title, multi-cell test

Page 31: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Engineering implementation / complexity

• Our A/B infrastructure is optimized for comparing test cells to each other

• Need to compare data across cells for one title out of many

• Avoid creating hundreds of tests (one per title)

Page 32: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Engineering implementation / complexity

Solution:
• Treat all the members who see a title’s images as a virtual test
• Impression tracking -- not just test cell allocation -- defines test population per title

[Diagram labels: Allocated Members; Title A impressions; Title B impressions]
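The "virtual test" idea above can be sketched as a grouping of impression records. This is only an illustration: the `Impression` shape, the field names, and the in-memory maps are assumptions, not the production data model, which the slides do not show.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class VirtualTests {
    /** One logged impression; the field names are assumptions of this sketch. */
    static final class Impression {
        final long memberId;
        final long titleId;
        final String imageId;

        Impression(long memberId, long titleId, String imageId) {
            this.memberId = memberId;
            this.titleId = titleId;
            this.imageId = imageId;
        }
    }

    /**
     * Builds titleId -> imageId -> members who saw that image.
     * Each title's inner map is that title's "virtual test" population.
     */
    static Map<Long, Map<String, Set<Long>>> populationsByTitle(List<Impression> impressions) {
        Map<Long, Map<String, Set<Long>>> byTitle = new TreeMap<>();
        for (Impression imp : impressions) {
            byTitle.computeIfAbsent(imp.titleId, t -> new TreeMap<>())
                   .computeIfAbsent(imp.imageId, i -> new TreeSet<>())
                   .add(imp.memberId);
        }
        return byTitle;
    }

    public static void main(String[] args) {
        // Made-up IDs. Member 1 never saw title 200, so member 1 is not in its population.
        List<Impression> log = Arrays.asList(
            new Impression(1L, 100L, "control"),
            new Impression(2L, 100L, "test-1"),
            new Impression(1L, 100L, "control"),   // repeat impression, counted once
            new Impression(2L, 200L, "control"));
        System.out.println(populationsByTitle(log));
    }
}
```

A member enters a title's population only after an impression of that title is logged, which is what separates this from plain test-cell allocation.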

Page 33: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Problems with multi-title, multi-cell test

• Cohorts of testers who all saw the same set of images

• Same number of images for every title

Page 34: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Single-cell explore allocation test

Page 35: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Test Evolution: Images per title
Multi-cell explore evolves to Single-cell explore

Virtual Tests inside one test cell

Title 1
“Cells”   1         2         3         4         5         6
Image     Control   Image 1   Image 2   Image 3   Image 4   Image 5

Title 2
“Cells”   1         2         3         4
Image     Control   Image 1   Image 2   Image 3

Page 36: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Engineering implementation / complexity

Goals
• No cohorts
• Image stickiness
• No persistent storage

We used a deterministic, pseudo-random calculation
• new Random(memberID * titleId).nextInt(numImages)
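A minimal sketch of how that calculation meets the goals. Only the seed expression comes from the slide; the class name, method name, and IDs are hypothetical. Because `java.util.Random` produces the same sequence for the same seed, the choice is repeatable for a given member and title without storing anything.

```java
import java.util.Random;

final class StickyImagePicker {
    /**
     * Returns an image index in [0, numImages). The seed expression is the one
     * shown on the slide; everything else here is illustrative.
     */
    static int pickImageIndex(long memberId, long titleId, int numImages) {
        // Same seed, same first draw: the member sees the same image on every
        // visit, and nothing has to be persisted.
        return new Random(memberId * titleId).nextInt(numImages);
    }

    public static void main(String[] args) {
        long memberId = 12345L;   // made-up IDs
        long titleId = 678L;
        int first = pickImageIndex(memberId, titleId, 6);
        int again = pickImageIndex(memberId, titleId, 6);
        // prints: same image on repeat visit: true
        System.out.println("same image on repeat visit: " + (first == again));
    }
}
```

Since the seed depends on the title as well as the member, a member's draw for one title is separate from the draw for another, which is how this scheme avoids fixed cohorts. Stickiness in this sketch holds only while `numImages` for a title stays the same.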

Page 37: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Engineering implementation / complexity

Single-cell explore test: no persistence needed

Cells     Cell 1       Cell 2
Title 1   Ctrl Image   Random of [Ctrl, Test 1, ... Test X1]
Title 2   Ctrl Image   Random of [Ctrl, Test 1, ... Test X2]
...       ...          ...
Title n   Ctrl Image   Random of [Ctrl, Test 1, ... Test Xn]

Random assignment to all test members.

[Diagram labels: Netflix Image Lib.; Image Data Feed (Title ID, Image Lists); Netflix API Service]

Page 38: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

● No more cohorts

● Flexible

● Clear winners for many titles

● Overall win based on key metrics

Can we do better?


Page 39: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain


• Overexposure of under-performing images

• Underexposure of niche titles

• Unfair burden on testers

Page 40: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Title-level allocation test

Page 41: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Solution: Title-Level Allocation

• Limit allocated members per title

• Less exposure of under-performing images

• Still get enough data to determine winner

• Allocate from a gigantic pool

• More exposure for niche titles

• Spreads testing burden

Page 42: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Test Evolution: Testers per title

[Diagram: testers allocated to Title A, Title B, and Title C]

● Some titles have few testers in the small pool

● Most titles have full testing allocation from larger pool

Page 43: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Engineering implementation / complexity
• Goals from previous test
    • No cohorts
    • Image stickiness
    • No persistent storage
• New goals
    • Less exposure for under-performing images
    • More exposure for niche titles
    • Faster decision and rollout of winning images
• This time, we needed to persist the allocations

Page 44: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

[Flow diagram. Components: Member; Netflix API Service; Image Data Feed; Netflix Image Library; Yellow Square; Title Metadata Service (VMS). Steps: Title fully Allocated; Allocate with Random Assignment; Log and store Allocation; Select Assigned Image; Select Control Image]
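One possible reading of the allocation flow in the diagram, as a sketch. The order of the checks, the per-title cap, the in-memory map standing in for the persistent store, and the convention that index 0 is the control image are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

final class TitleLevelAllocator {
    static final int CONTROL_IMAGE = 0;            // convention of this sketch only

    private final int maxMembersPerTitle;          // hypothetical per-title cap
    private final Random random;
    // titleId -> (memberId -> image index); in-memory stand-in for the persistent store
    private final Map<Long, Map<Long, Integer>> allocations = new HashMap<>();

    TitleLevelAllocator(int maxMembersPerTitle, long seed) {
        this.maxMembersPerTitle = maxMembersPerTitle;
        this.random = new Random(seed);
    }

    /** Returns the index of the image to show this member for this title. */
    int selectImage(long memberId, long titleId, int numImages) {
        Map<Long, Integer> forTitle = allocations.computeIfAbsent(titleId, t -> new HashMap<>());
        Integer assigned = forTitle.get(memberId);
        if (assigned != null) {
            return assigned;                       // already allocated: select assigned image
        }
        if (forTitle.size() >= maxMembersPerTitle) {
            return CONTROL_IMAGE;                  // title fully allocated: select control image
        }
        int image = random.nextInt(numImages);     // allocate with random assignment
        forTitle.put(memberId, image);             // "log and store allocation"
        return image;
    }

    /** Number of members currently allocated to the given title. */
    int allocatedMembers(long titleId) {
        Map<Long, Integer> forTitle = allocations.get(titleId);
        return forTitle == null ? 0 : forTitle.size();
    }

    public static void main(String[] args) {
        TitleLevelAllocator allocator = new TitleLevelAllocator(2, 42L);
        for (long member = 1; member <= 3; member++) {
            System.out.println("member " + member + " -> image "
                + allocator.selectImage(member, 100L, 4));
        }
    }
}
```

Capping allocations per title is what limits exposure of under-performing images, and it is also why this design has to remember who was allocated, unlike the earlier seed-based approach.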


Page 45: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain


● Underestimated traffic

● Many titles allocated per member at once

● Write to Y2 for every allocation

Result: Service disruption; we had to turn off the test

Page 46: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Scaling

[Diagram. Components: Netflix API Service; Image Data Feed; Netflix Image Library; Yellow Square. Steps: Allocate with Random Assignment; Log and store Allocation. Annotation: 1 write per member every 30 sec.]

Storing allocations as they occurred overloaded Yellow Square.

Now, we log them to a stream and consolidate many writes into one.
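The consolidation step might look roughly like the following. This is a sketch under stated assumptions: the class, its methods, and the string encoding of an allocation are hypothetical, and a real system would read from a stream and write to the store rather than return batches from an in-memory buffer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AllocationWriteBatcher {
    // memberId -> allocations logged since the last flush, encoded as "titleId:imageIndex"
    private final Map<Long, List<String>> pending = new LinkedHashMap<>();
    private int writesIssued = 0;

    /** Cheap append; stands in for logging an allocation event to a stream. */
    void log(long memberId, long titleId, int imageIndex) {
        pending.computeIfAbsent(memberId, m -> new ArrayList<>())
               .add(titleId + ":" + imageIndex);
    }

    /**
     * Meant to run on a timer (the slide mentions 30 seconds). Returns one
     * consolidated batch per member; each batch represents a single store write.
     */
    Map<Long, List<String>> flush() {
        Map<Long, List<String>> batches = new LinkedHashMap<>(pending);
        pending.clear();
        writesIssued += batches.size();
        return batches;
    }

    int writesIssued() {
        return writesIssued;
    }

    public static void main(String[] args) {
        AllocationWriteBatcher batcher = new AllocationWriteBatcher();
        batcher.log(1L, 100L, 2);   // three allocations for member 1 ...
        batcher.log(1L, 200L, 0);
        batcher.log(1L, 300L, 1);
        batcher.log(2L, 100L, 3);   // ... and one for member 2
        System.out.println(batcher.flush());
        System.out.println("store writes: " + batcher.writesIssued());
    }
}
```

In the example, four allocations turn into two store writes, one per member, instead of four.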

Page 48: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Who to Test on?

Test on the same population you are planning to roll out the changes to

Page 49: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Two Member Cohorts

• New Members are assigned to the experimental condition at the time of sign-up

• Existing Members are assigned to the experimental condition any time after their free trial has ended

Page 50: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Decision Focuses More on New Members

• A “pure” sample which is not tainted by a previous Netflix experience

• A more sensitive sample (“on the fence”)

Page 51: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Tiers of Metrics
• Primary: Customer retention
• Secondary: Streaming hours
• Tertiary: all other customer engagement metrics
    • Play rate
    • Number of Netflix visits
    • ...

Page 52: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

How to Pick the Winner in Explore?

• Take fraction = (number of users who played the title) / (number of users who saw the title)

• Correlated with retention

• Measurable from day one
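As a worked illustration of the take fraction definition, the sketch below computes it per image and returns the image with the highest value. The counts are made up, the names are hypothetical, and the sketch deliberately ignores statistical significance and the questions raised on the following slides (what counts as a play, where the impression appeared).

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TakeFraction {
    /** take fraction = members who played the title / members who saw the title */
    static double takeFraction(long membersWhoPlayed, long membersWhoSaw) {
        if (membersWhoSaw <= 0) {
            throw new IllegalArgumentException("need at least one member who saw the title");
        }
        return (double) membersWhoPlayed / membersWhoSaw;
    }

    /** counts: imageId -> {membersWhoPlayed, membersWhoSaw}. Ties keep the earlier entry. */
    static String highestTakeFraction(Map<String, long[]> counts) {
        String best = null;
        double bestFraction = -1.0;
        for (Map.Entry<String, long[]> e : counts.entrySet()) {
            double fraction = takeFraction(e.getValue()[0], e.getValue()[1]);
            if (fraction > bestFraction) {
                bestFraction = fraction;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, long[]> counts = new LinkedHashMap<>();
        counts.put("control", new long[] {30, 200});   // made-up counts
        counts.put("test-1", new long[] {45, 200});
        System.out.println("highest take fraction: " + highestTakeFraction(counts));
    }
}
```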

Page 53: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

What is a Play?

Page 54: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

What is a Play?

Page 55: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

What is a Play?

Page 56: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Does Impression Location Matter?

Page 57: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Does Impression Location Matter?

Page 58: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Does Impression Location Matter?

Page 59: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Does it Matter How Many Impressions it Takes to Play?

Netflix just recommended an awesome show to me and I am going to watch it!!!

Page 60: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Does it Matter How Many Impressions it Takes to Play?

I have seen the show on Netflix a few times. Maybe, I should try it...

Page 61: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Take Fraction is NOT as trivial as its definition implies.

Page 62: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

How to Make the Final Decision?

Final decision is based on the exploit test
• Retention movement
• Streaming hours movement
• Engagement with titles explored in the test and with titles not explored in the test
• ...

Page 63: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Our Image Selection Test is a Win!

• Improved customer retention

• Improved customer engagement

Page 64: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Some Learnings

Page 65: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Emotions are excellent at conveying complex nuances

Page 66: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Great stories travel - but regional nuances can be powerful

Page 67: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Nice Guys Often Finish Last

Page 68: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Contact: Gopal Krishnan, Dale Elliott, Kenny Xie

More details available at Netflix techblog.

Talk to us outside at the booth.
