Yale LANS ShadowStream: Performance Evaluation as a Capability in Production Internet Live Streaming Networks Chen Tian Richard Alimi Yang Richard Yang David Zhang Aug. 16, 2012



Live Streaming is a Major Internet App


Poor Performance After Updates

Lacking sufficient evaluation before release


Don’t We Already Have …

Testbeds

• Emulab

• PlanetLab

• …

Testing Channels

• Gradually rolling out

They are not enough!


Live Streaming Background

We focus on hybrid live streaming systems: CDN + P2P



Testbed: Misleading Results at Small Scale

Piece Missing Ratio for two settings (Production Default, With Connection Limit) at two scales (Small-Scale, Large-Scale): 3.7%, 0.7%, 64.8%, 3.5%

Live streaming performance can be highly non-linear.


Testbed: Misleading Results due to Missing Features

                                LAN Style (Same BW)    ADSL Style (Same BW)
Piece Missing Ratio             1.5%                   7.3%
# Timed-out Requests            1404.25                2548.25
# Received Duplicate Packets    0                      633
# Received Outdated Packets     5.65                   154.20

Realistic features can have large performance impacts.


Testing Channel: Lacking QoE Protection


Testing Channel: Lacking Orchestration

[Figure: two plots of Number of Peers against Time (Seconds). Panel "What we want is …" shows the Expected curve; panel "What we have is …" shows the Expected and Provided curves.]


ShadowStream Design Goal

Use production network for testing with

• Protection of real user QoE

• Transparent orchestration of testing conditions


Roadmap

Motivation

Protection Design

Orchestration Design

Evaluations

Conclusions and Future Work


Protection: Basic Scheme

Note: R denotes Repair, E denotes Experiment


Example Illustration: E Success


Example Illustration: E Fail


How to Repair?

• Choice 1: dedicated CDN resources (R=rCDN)

– Benefit: simple

– Limitations

• requires resource reservation, e.g., 100,000 clients x 1 Mbps = 100 Gbps

• may not work well when there is a network bottleneck
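The reservation figure quoted for Choice 1 is a straight multiplication. A minimal sketch of that arithmetic follows; the function name and the decimal unit convention (1 Gbps = 1,000 Mbps) are assumptions made for illustration, not part of the talk.

```python
def reserved_cdn_capacity_gbps(num_clients: int, stream_rate_mbps: float) -> float:
    """Worst-case dedicated capacity if every client must be repaired from the CDN."""
    total_mbps = num_clients * stream_rate_mbps
    return total_mbps / 1000.0  # assumption: 1 Gbps = 1,000 Mbps


# The example from the slide: 100,000 clients at 1 Mbps each.
print(reserved_cdn_capacity_gbps(100_000, 1.0))  # 100.0
```

The cost grows linearly with the number of clients under test, which is why the reservation is listed as a limitation.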


How to Repair?

• Choice 2: production machine (R=production)

– Benefit 1: Larger resource pool

– Benefit 2: Fine-tuned algorithms

– Benefit 3: A unified approach to protection & orchestration (later)


R=Production: Resource Competition

Repair and Experiment compete on client upload bandwidth

Competition leads to underestimation of Experiment performance


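R=Production: Misleading Result

[Figure: curve y = m(θ) against x = θ, together with the line x + y = θ0; axis labels "missing ratio" and "repair demand"; marked points θ0, θR, θL, and θ*; annotations "accurate result" and "misleading result".]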


Putting Together: PCE

[Figure: diagram with E and P; of 100% of the tasks, 95% success and 5% failure.]


Implementing PCE

Requirements

• Testing state is transparent to each streaming machine

• Streaming machines are isolated from each other


Implementing PCE: base observation

• A simple partitioned sliding window partitions downloading tasks among P, C, and E automatically

[Figure: sliding window annotated with "unavailable", "unavailable", "piece missing", and "responsibility transferred".]
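One way to picture the partitioned sliding window is as three adjacent ranges behind the source point, so that a piece that is still missing changes owner simply because the window slides. The sketch below is an illustration under assumptions: the order E, then C, then P as a piece ages, the window sizes, and the function name are all hypothetical; only the names elag, clag, and plag are borrowed from the backup slides.

```python
from typing import List, Optional


def responsible_machine(piece: int, sourcepoint: int,
                        elag: int, clag: int, plag: int) -> Optional[str]:
    """Return which machine should fetch a still-missing piece, or None if nobody should."""
    age = sourcepoint - piece  # how far the piece is behind the newest piece
    if age < 0:
        return None            # not produced yet: unavailable
    if age < elag:
        return "E"             # assumed: Experiment owns the newest pieces
    if age < elag + clag:
        return "C"             # assumed: responsibility transferred to C
    if age < elag + clag + plag:
        return "P"             # assumed: P is the last line of protection
    return None                # past the end of the window: too late to fetch


# Track one missing piece (index 100) while the source point advances.
owners: List[Optional[str]] = [
    responsible_machine(100, s, elag=4, clag=2, plag=2) for s in range(99, 109)
]
print(owners)
```

Because ownership depends only on the piece index and the current source point, no machine needs to be told about the others; the transfer of responsibility falls out of the window position.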


Client Components


Orchestration Challenges

• How to start an Experiment streaming machine

– Transparent to real viewers

• How to control the arrival/departure of each Experiment machine in a scalable way

[Figure: components labeled Orchestrator, Streaming Hypervisor, P, C, E, and client.]


Transparent Orchestration Idea

Viewer Enters Channel


Transparent Orchestration Idea

Experiment Enters Testing

[Figure: E and R shown relative to the real playpoint and the virtual playpoint.]


Transparent Orchestration Idea

Experiment Leaves Testing

[Figure: E and R shown relative to the real playpoint and the virtual playpoint.]


Distributed Activation of Testing

• Orchestrator distributes parameters to clients

• Each client independently generates its arrival time according to the same distribution function F(t)

• Together they achieve the global arrival pattern

– Cox and Lewis Theorem
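The idea of each client drawing its own arrival time can be sketched as follows. This is a minimal illustration, assuming F(t) is the normalized cumulative form of a target arrival-rate curve and that the curve is piecewise constant over equal-length slots; the rate values, slot length, and function names are hypothetical.

```python
import random
from typing import List, Sequence


def sample_arrival_time(rates: Sequence[float], slot_seconds: float,
                        rng: random.Random) -> float:
    """Draw one arrival time from F(t) for a piecewise-constant rate curve."""
    if sum(rates) <= 0:
        raise ValueError("at least one slot needs a positive arrival rate")
    # Pick a slot with probability proportional to its expected arrivals ...
    slot = rng.choices(range(len(rates)), weights=rates, k=1)[0]
    # ... then place the arrival uniformly inside that slot.
    return (slot + rng.random()) * slot_seconds


def arrivals_per_slot(times: Sequence[float], num_slots: int,
                      slot_seconds: float) -> List[int]:
    """Histogram of arrival times, one bucket per slot."""
    counts = [0] * num_slots
    for t in times:
        counts[min(int(t // slot_seconds), num_slots - 1)] += 1
    return counts


# Hypothetical parameters the orchestrator would distribute to every client.
RATES = [1.0, 3.0, 0.0, 1.0]  # relative arrival rate per slot
SLOT_SECONDS = 60.0

# Each client uses its own random generator; there is no coordination.
clients = [random.Random(client_id) for client_id in range(5000)]
times = [sample_arrival_time(RATES, SLOT_SECONDS, rng) for rng in clients]
counts = arrivals_per_slot(times, len(RATES), SLOT_SECONDS)
print(counts)
```

The per-slot counts come out roughly proportional to the rate curve even though no client knows what the others drew, which is the property the slide attributes to the Cox and Lewis theorem.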


Orchestrator Components


Software Implementation

• Compositional Runtime

– Modular design, including scheduler, dynamic loading of blocks, etc.

– 3400 lines of code

• Pre-packaged blocks

– HTTP integration, UDP sockets and debugging

– 500 lines of code

• Live streaming machine

– 4200 lines of code


Experimental Opportunities


Protection and Accuracy

Piece Missing Ratio

                        Virtual Playpoint    Real Playpoint
Buggy                   8.73%                N/A
R=rCDN                  8.72%                0%
R=rCDN w/ bottleneck    8.81%                5.42%


Protection and Accuracy

Piece Missing Ratio

                            Virtual Playpoint    Real Playpoint
PCE bottleneck              9.13%                0.15%
PCE w/ higher bottleneck    8.85%                0%
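One reading of the two columns is that the ratio at the virtual playpoint counts pieces the Experiment failed to deliver on its own, while the ratio at the real playpoint counts pieces still missing after Repair has had its chance. The sketch below computes both under that assumption; the record type and field names are hypothetical and the sample numbers are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PieceRecord:
    by_experiment: bool  # assumed: delivered by Experiment before the virtual playpoint
    by_repair: bool      # assumed: delivered by Repair before the real playpoint


def missing_ratios(records: List[PieceRecord]) -> Tuple[float, float]:
    """Return (missing ratio at virtual playpoint, missing ratio at real playpoint)."""
    if not records:
        raise ValueError("need at least one piece record")
    virtual_missing = sum(not r.by_experiment for r in records)
    real_missing = sum(not (r.by_experiment or r.by_repair) for r in records)
    return virtual_missing / len(records), real_missing / len(records)


# Illustrative data: 100 pieces, Experiment misses 9 of them, Repair recovers all 9.
records = [PieceRecord(by_experiment=True, by_repair=False) for _ in range(91)]
records += [PieceRecord(by_experiment=False, by_repair=True) for _ in range(9)]
print(missing_ratios(records))  # (0.09, 0.0)
```

Under this reading, a useful scheme keeps the first number close to the unprotected value (accuracy) while driving the second toward zero (protection).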


Orchestration: Distributed Activation


Utility on Top: Deterministic Replay

Control non-deterministic inputs

• Event

• Message

• Random seeds

Practical per-client log size

Setting                       Log Size
100 clients; 650 seconds      223 KB
300 clients; 1,800 seconds    714 KB
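The principle behind deterministic replay can be shown with a toy client: if its behaviour depends only on a random seed and the ordered stream of events and messages, logging those inputs is enough to reproduce a run. This is a sketch of the general technique, not the ShadowStream implementation; the function, log format, and payload names are hypothetical.

```python
import random
from typing import List, Tuple

LogEntry = Tuple[str, str]  # (kind, payload), kind is "event" or "message"


def run(seed: int, inputs: List[LogEntry]) -> List[str]:
    """A toy client whose output depends only on the seed and the ordered inputs."""
    rng = random.Random(seed)
    trace = []
    for kind, payload in inputs:
        # The random draw stands in for any randomized decision, e.g. peer choice.
        trace.append(f"{kind}:{payload}:{rng.randint(0, 9)}")
    return trace


# Live run: record the seed and every event or message in arrival order.
seed = 2012
log: List[LogEntry] = [("event", "timer"), ("message", "piece-17"), ("event", "join")]
live = run(seed, log)

# Replay: feed the recorded seed and log back in.
replayed = run(seed, log)
print(live == replayed)  # True
```

Only the inputs are logged, not the resulting state, which is consistent with the small per-client log sizes reported on the slide.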


Contributions

• Design and implementation of a novel live streaming network that introduces performance evaluation as an intrinsic capability in production networks

– Scalable (PCE) protection of QoE despite large-scale Experiment failures

– Transparent orchestration for flexible testing


Future Work

• Large-scale deployment and evaluation

• Apply the Shadow (Experiment->Validation->Repair) scheme to other applications

• Extend the Shadow (Experiment->Validation->Repair) scheme

– E.g., Repair does not have to do the same job as Experiment, as long as it masks visible failures


Adaptive Rate Streaming Repair

            Accuracy    Protected QoE    Protection Overhead
Follow      1.26x       1.59x            1.49 Kbps
Base        1.26x       1.42x            3.69 Kbps
Adaptive    1.26x       1.58x            1.39 Kbps


Thank you!


Questions?


backup


Related Work

• Debugging and evaluation of distributed systems

– e.g., ODR, Friday, DieCast

• Based on a key observation

• Allows scenario customization

• FlowVisor

– Allocate a fixed portion of tasks and resources


Why Not Testing Channel: orchestration

What we want is … What we have is …


Experiment Specification & Triggering

• A test should define:

– One or more classes of clients

– Client-wide arrival rate functions

– Client-wide life duration function

• Triggering Condition: prediction based
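The three items a test should define map naturally onto a small data structure. The sketch below is one possible shape, assuming each client class carries its own arrival rate function and life duration function; every class, field, and example name is hypothetical and not taken from the ShadowStream code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ClientClass:
    name: str                               # hypothetical label for the class
    arrival_rate: Callable[[float], float]  # clients per second at time t
    life_duration: Callable[[], float]      # session length in seconds


@dataclass
class ExperimentSpec:
    classes: List[ClientClass]

    def total_arrival_rate(self, t: float) -> float:
        """Sum of the arrival rates of all client classes at time t."""
        return sum(c.arrival_rate(t) for c in self.classes)


# Illustrative specification with two made-up classes.
spec = ExperimentSpec(classes=[
    ClientClass("cable", arrival_rate=lambda t: 2.0, life_duration=lambda: 600.0),
    ClientClass("adsl", arrival_rate=lambda t: 0.5 * t, life_duration=lambda: 300.0),
])
print(spec.total_arrival_rate(10.0))  # 7.0
```

Keeping the functions as plain callables lets the same structure describe constant, ramping, or trace-driven arrival patterns.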


Experiment Transition

• Connectivity Transition

• Playbuffer State Transition

• More details in the paper: Replace Early Departed Clients, Independent Departure Control

[Figure: four panels (a) to (d) showing the P, C, and E windows relative to the sourcepoint. Labels: sourcepoint ae,i in (a); sourcepoint ae,i - elag in (b); ae - Tspread, sourcepoint ae, Tspread, Data, and plag + clag in (c); sourcepoint in [ae - Tspread - elag, ae], Tspread, Data, and plag + clag in (d).]


ShadowStream Design Goal

By adding protection and orchestration into production networks, we have … Live Testing!

[Figure labels: Production networks, Testbeds.]


State of the Art: Hybrid Systems


Putting Together: ShadowStream

• The first system, in the context of live streaming, that can perform live testing with both protection and orchestration

• Design the Repair system that can simultaneously provide protection and experiment accuracy

• Fully implemented and evaluated


Problem: Resource Competition

• Repair and Experiment compete on key resource (client upload bandwidth)

• Competition may lead to systematic underestimation of Experiment performance

How to get around?


Experiment Orchestration

• Experiment Specification & Triggering

• Independent Arrivals Control

• Experiment Transition

• Replace Early Departed Clients

• Independent Departure Control


Example Illustration

[Figure labels: Download Window, Repair Window.]


From Idea to System


Extended Works

• Dynamic Streaming

• Deterministic Replay


Example Illustration XX