on managing continuous media data edward chang hector garcia-molina stanford university

On Managing Continuous Media Data

Edward Chang, Hector Garcia-Molina
Stanford University

Challenges

Large Volume of Data
  MPEG2 100 Minute Movie: 3-4 GBytes

Large Data Transfer Rate
  MPEG2: 4 to 6 Mbps
  HDTV: 19.2 Mbps

Just-in-Time Data Requirement
Simultaneous Users

...Challenges

Traditional Optimization Objectives:
  Maximizing Throughput!
  Maximizing Throughput!!
  Maximizing Throughput!!!

How about Cost?
How about Initial Latency?

Related Work

USC (S. Ghandeharizadeh)
UCLA (R. Muntz)
UBC (Raymond Ng)
Bell Labs. (B. Ozden)
IBM Tom Watson Labs. (P. Yu)
etc.

Outline

Server (Single Disk)
  Revisiting Conventional Wisdom
  Minimizing Cost
  Minimizing Initial Latency

Server (Parallel Disks)
  Balancing Workload
  Minimizing Cost & Initial Latency

Client
  Handling VBR
  Supporting VCR-like Functions

Conventional Wisdom (for Single Disk)

Reducing Disk Latency leads to Better Disk Utilization

Reducing Disk Latency leads to Higher Throughput

Increasing Disk Utilization leads to Improved Cost Effectiveness

Is Conventional Wisdom Right?

Does Reducing Disk Latency lead to Better Disk Utilization?

Does Reducing Disk Latency lead to Higher Throughput?

Does Increasing Disk Utilization lead to Improved Cost Effectiveness?

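Tseek: Disk Latency
TR: Disk Transfer Rate
DR: Display Rate
S: Segment Size (Peak Memory Use per Request)
T: Service Cycle Time
N: Number of Simultaneous Users

$$S = DR \times T$$

$$T = N \times (T_{seek} + S/TR)$$

Solving the two equations for S:

$$S = \frac{N \times TR \times DR \times T_{seek}}{TR - N \times DR}$$

S is directly proportional to Tseek

$$D_{util} = \frac{S/TR}{S/TR + T_{seek}} = \frac{N \times DR}{TR}$$

Dutil is Constant! (It Does Not Depend on Tseek)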
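The derivation above can be checked numerically. The sketch below is illustrative only: the function names and the parameter values (40 users, TR = 10 MB/s, DR = 0.1875 MB/s) are assumptions chosen for the example, not figures from the slides.

```python
def segment_size(n, tr, dr, tseek):
    """S = N*TR*DR*Tseek / (TR - N*DR); only defined while N*DR < TR."""
    if n * dr >= tr:
        raise ValueError("N * DR must stay below TR")
    return n * tr * dr * tseek / (tr - n * dr)


def disk_utilization(n, tr, dr, tseek):
    """Dutil = (S/TR) / (S/TR + Tseek): share of each IO spent transferring data."""
    transfer_time = segment_size(n, tr, dr, tseek) / tr
    return transfer_time / (transfer_time + tseek)


# Assumed values for illustration: N = 40, TR = 10 MB/s, DR = 0.1875 MB/s.
for tseek in (0.005, 0.010, 0.020):
    s = segment_size(40, 10.0, 0.1875, tseek)
    u = disk_utilization(40, 10.0, 0.1875, tseek)
    print(f"Tseek={tseek:.3f}s  S={s:.3f} MB  Dutil={u:.3f}")
```

Halving Tseek halves the segment size (and so the memory per request), while the utilization stays at N × DR / TR.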

Disk Utilization

Does Reducing Disk Latency lead to Better Disk Utilization? NO!

Does Reducing Disk Latency lead to Higher Throughput?

What Affects Throughput?

Disk Latency

Memory Utilization

Disk Utilization

Throughput

Memory Requirement

We Examine Two Disk Scheduling Policies’ Memory Requirement
  Sweep (Elevator Policy): Enjoys the Minimum Seek Overhead
  Fixed-Stretch: Suffers from High Seek Overhead

Per User Peak Memory Use

$$S = \frac{N \times TR \times DR \times T_{seek}}{TR - N \times DR}$$

Sweep (Elevator)

Disk Latency: Minimum
IO Time Variability: Very High

Sweep (Elevator)

Memory Sharing: Poor
Total Memory Requirement: 2 * N * Ssweep

Fixed-Stretch

Disk Latency: High (because of Stretch)
IO Variability: No (because of Fixed)

Fixed-Stretch

Memory Sharing: Good
Total Memory Requirement: 1/2 * N * Sfs

Throughput

Sweep
  Total Memory Requirement: 2 * N * Ssweep
  Available Memory = 40 MBytes
  N = 40

Fixed-Stretch
  Total Memory Requirement: 1/2 * N * Sfs
  Available Memory = 40 MBytes
  N = 42 (Higher Throughput)

* Based on A Realistic Case Study Using Seagate Disks
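A minimal sketch of how such a comparison can be computed: find the largest N whose total memory requirement fits the budget, using 2 * N * S for Sweep and 1/2 * N * S for Fixed-Stretch. The disk parameters here (TR, DR, and both seek overheads) are hypothetical, not the Seagate case-study values, so the resulting N will differ from the slide's 40 and 42.

```python
def total_memory(n, tr, dr, tseek, factor):
    """factor * N * S, with S = N*TR*DR*Tseek / (TR - N*DR)."""
    s = n * tr * dr * tseek / (tr - n * dr)
    return factor * n * s


def max_streams(memory, tr, dr, tseek, factor):
    """Largest N with N*DR < TR whose total memory fits in `memory`."""
    n = 0
    while (n + 1) * dr < tr and total_memory(n + 1, tr, dr, tseek, factor) <= memory:
        n += 1
    return n


# Hypothetical parameters: TR = 10 MB/s, DR = 0.1875 MB/s, 40 MB of memory.
# Fixed-Stretch is assumed to pay twice the per-IO seek overhead of Sweep.
sweep_n = max_streams(40.0, 10.0, 0.1875, tseek=0.010, factor=2.0)
fixed_stretch_n = max_streams(40.0, 10.0, 0.1875, tseek=0.020, factor=0.5)
print("Sweep:", sweep_n, "Fixed-Stretch:", fixed_stretch_n)
```

Under these assumptions the better memory sharing of Fixed-Stretch outweighs its higher seek overhead.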

What Affects Throughput?

Disk Latency

Memory Utilization

Disk Utilization

Throughput

Does Reducing Disk Latency lead to Higher Throughput? NO!

Per Stream Cost

Per-Stream Memory Cost (Cm: Unit Memory Cost)

$$C_m \times S = \frac{C_m \times N \times TR \times DR \times T_{seek}}{TR - N \times DR}$$

Example

Disk Cost: $200 per unit
Memory Cost: $5 per MByte

Supporting N = 40 Requires 60 MBytes of Memory: $200 + $300 = $500
Supporting N = 50 Requires 160 MBytes of Memory: $200 + $800 = $1,000

For the same cost $1,000, it’s better to buy 2 Disks and 120 Mbytes to support N = 80 Users!
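The arithmetic in this example can be reproduced with a few lines. The prices and memory sizes are the ones quoted on the slide; the function name is illustrative.

```python
DISK_COST = 200.0          # dollars per disk (from the slide)
MEMORY_COST_PER_MB = 5.0   # dollars per MByte (from the slide)


def per_stream_cost(n_disks, memory_mb, n_streams):
    """Return (total cost, cost per stream) for a configuration."""
    total = n_disks * DISK_COST + memory_mb * MEMORY_COST_PER_MB
    return total, total / n_streams


for disks, memory, streams in [(1, 60, 40), (1, 160, 50), (2, 120, 80)]:
    total, each = per_stream_cost(disks, memory, streams)
    print(f"{disks} disk(s), {memory} MB, N={streams}: ${total:.0f} total, ${each:.2f} per stream")
```

Pushing one disk from 40 to 50 users raises the per-stream cost from $12.50 to $20.00, while two disks serving 40 users each stay at $12.50.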

Memory Use is Critical

Does Reducing Disk Latency lead to Higher Throughput? NO!

Does Increasing Disk Utilization lead to Improved Cost Effectiveness? NO!

So What?

Initial Latency

What is it?
  The time from when a request arrives at the server to when the data is available in the server’s main memory

Where is it important?
  Interactive applications (e.g., video game)
  Interactive features (e.g., fast-scan)

Sweep (Elevator)

Fixed-Stretch

Space Out IOs

Fixed-Stretch

Our Contribution: BubbleUp

Fixed-Stretch Enjoys Fine Throughput

BubbleUp Remedies Fixed-Stretch to Minimize Initial Latency

Schedule Office Work

8am: Host a Visitor
9am: Do Email
10am: Write Paper
11am: Write Paper
Noon: Lunch

BubbleUp

Empty Slots are Always Next in Time

No Additional Memory Required
  Fill the Buffer up to the Segment Size

No Additional Disk Bandwidth Required
  The Disk Is Idle Otherwise
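The idea of keeping empty slots "next in time" can be illustrated with a toy slot scheduler. This is a simplified model written for this summary, not the authors' algorithm: it assumes one IO per fixed-length slot, at most `n_slots` active streams, no departures, and that serving a stream early simply tops its buffer up to the segment size. All names are hypothetical.

```python
from collections import deque


def bubbleup_schedule(n_slots, arrivals, horizon):
    """Return the stream served in each slot (None when the disk is idle).

    arrivals maps a slot index to the stream ids that arrive just before it.
    """
    active = deque()   # admitted streams; the head has waited longest
    waiting = deque()  # newly arrived requests not yet admitted
    served = []
    for slot in range(horizon):
        waiting.extend(arrivals.get(slot, []))
        if waiting and len(active) < n_slots:
            stream = waiting.popleft()   # newcomer takes the very next slot
        elif active:
            stream = active.popleft()    # otherwise serve the oldest stream early
        else:
            served.append(None)          # nothing to do: disk stays idle
            continue
        active.append(stream)            # served stream goes to the back
        served.append(stream)
    return served


print(bubbleup_schedule(4, {0: ["a"], 1: ["b"], 5: ["c"]}, horizon=8))
```

In this model a new request is served in the slot right after it arrives, and no admitted stream waits more than `n_slots` slots between IOs, because at most `n_slots - 1` other streams can be served in between.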

Evaluation

Fast-Scan

Data Placement Policies

Please refer to our publications

Chunk Allocation

Allocate Memory in Chunks
  A Chunk = k * S

Replicate the Last Segment of a Chunk in the Beginning of Next Chunk

Example
  Chunk 1: s1, s2, s3, s4, s5
  Chunk 2: s5, s6, s7, s8, s9
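A minimal sketch of the chunking rule, assuming a chunk holds k segments and each new chunk starts with a copy of the previous chunk's last segment. The function name is illustrative; placement of chunks on disk is not modeled.

```python
def allocate_chunks(segments, k):
    """Split segments into chunks of up to k, repeating each chunk's last segment."""
    if k < 2:
        raise ValueError("k must be at least 2 so that chunks make progress")
    chunks = []
    start = 0
    while start < len(segments):
        chunks.append(segments[start:start + k])
        if start + k >= len(segments):
            break            # this chunk reached the end of the movie
        start += k - 1       # next chunk begins at the replicated segment
    return chunks


movie = [f"s{i}" for i in range(1, 10)]   # s1 .. s9
print(allocate_chunks(movie, k=5))
```

With k = 5 and nine segments this reproduces the two chunks of the example above.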

Chunk Allocation

Largest-Fit First
Best Fit (Last Chunk)

18 Segment Placement

Largest-Fit First

Best Fit

Unbalanced Workload

Balanced Workload

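Per Stream Memory Use (Use M Disks Independently)

Streams Supported: M × N

$$S = \frac{N \times TR \times DR \times T_{seek}}{TR - N \times DR}$$

Per Stream Memory Use (Use M Disks As One Disk)

Streams Supported: M × N

$$S' = \frac{(N \times M) \times (TR \times M) \times DR \times T_{seek}}{TR \times M - N \times M \times DR}$$

$$S' = \frac{M \times N \times TR \times DR \times T_{seek}}{TR - N \times DR} = M \times S$$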
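A quick numeric check of the S' = M × S relation, treating the M striped disks as one virtual disk with transfer rate M × TR serving M × N streams. The parameter values and function names are assumptions for illustration.

```python
def segment_size(n, tr, dr, tseek):
    """S = N*TR*DR*Tseek / (TR - N*DR) for one disk serving N streams."""
    return n * tr * dr * tseek / (tr - n * dr)


def striped_segment_size(m, n, tr, dr, tseek):
    """S' for M disks used as one virtual disk: M*N streams at rate M*TR."""
    return segment_size(m * n, m * tr, dr, tseek)


# Assumed values: M = 4 disks, N = 40 streams per disk, TR = 10 MB/s,
# DR = 0.1875 MB/s, Tseek = 10 ms.
s = segment_size(40, 10.0, 0.1875, 0.010)
s_prime = striped_segment_size(4, 40, 10.0, 0.1875, 0.010)
print(f"S = {s:.3f} MB, S' = {s_prime:.3f} MB, ratio = {s_prime / s:.1f}")
```

Both configurations serve M × N streams, but striping multiplies the per-stream memory by M.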

…Continue

Challenges

Using M Disks Independently:
  Unbalanced Workload
  Low Per-Stream Memory Cost

Using M Disks As One Virtual Disk (i.e., Employing Fine-Grained Striping):
  Balanced Workload
  High Per-Stream Memory Cost

Our Approach (2DB)

Use Disks Independently
  To Minimize Cost

Replicate Hot Movies (20% Movies)
  To Balance Workload

Use BubbleUp
  To Minimize Initial Latency

2D BubbleUp (2DB)

Intelligent Data Placement
Efficient Request Scheduling
FODO, 1998

2DB Data Placement: Chunk Allocation

2DB Scheduling

Formally, This Is a Bipartite Weighted Matching Problem
  Can be solved using the Hungarian method in O(V^3), where V = NM

We Use a Greedy Method to Reduce the Problem to a Bipartite Unweighted Matching Problem
  Can be solved in O(M^2)
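To make the unweighted matching step concrete, here is a generic augmenting-path matcher that assigns each request to one disk holding a copy of its data, with at most one request per disk per slot. This is a textbook technique shown for illustration only: it is not the authors' greedy method, its worst-case cost is O(V × E) rather than the O(M^2) quoted above, and the names and input format are assumptions.

```python
def match_requests_to_disks(candidates, n_disks):
    """candidates[r] lists the disks that hold request r's data.

    Returns a dict mapping each matched request to its assigned disk.
    """
    disk_owner = [None] * n_disks   # request currently assigned to each disk

    def try_assign(request, visited):
        for disk in candidates[request]:
            if disk in visited:
                continue
            visited.add(disk)
            # Take a free disk, or move its current owner to another replica.
            if disk_owner[disk] is None or try_assign(disk_owner[disk], visited):
                disk_owner[disk] = request
                return True
        return False

    for request in range(len(candidates)):
        try_assign(request, set())
    return {r: d for d, r in enumerate(disk_owner) if r is not None}


# Request 0 has replicas on disks 0 and 1, request 1 only on disk 0,
# request 2 on disks 1 and 2.
print(match_requests_to_disks([[0, 1], [0], [1, 2]], n_disks=3))
```

Replication is what gives the matcher room to work: request 0 can move to its second copy so that request 1, which has only one copy, is still served.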

Why 2DB Works?

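n balls, n urns, finite n (maximum load in any urn):
  One random destination per ball: $\frac{\ln n}{\ln \ln n}(1 + o(1))$
  Two possible destinations per ball: $\frac{\ln \ln n}{\ln 2} + O(1)$

m balls, n urns, m > n, and infinite m and n (maximum load in any urn):
  d: number of possible destinations
  $\frac{\ln \ln n}{\ln d}(1 + o(1)) + O(m/n)$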
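The effect of having several possible destinations can be explored with a small simulation. This sketch assumes each ball (request) draws d distinct urns (disks) uniformly at random and goes to the least loaded one; the function name and the sample sizes are illustrative, and the printed numbers depend on the random seed.

```python
import random


def max_load(n_urns, n_balls, d, seed=0):
    """Throw balls one at a time; each picks the emptiest of d random urns."""
    rng = random.Random(seed)
    loads = [0] * n_urns
    for _ in range(n_balls):
        choices = rng.sample(range(n_urns), d)          # d distinct candidate urns
        target = min(choices, key=lambda u: loads[u])   # least loaded candidate
        loads[target] += 1
    return max(loads)


for d in (1, 2, 3):
    print(f"d={d}: max load = {max_load(10_000, 10_000, d)}")
```

Going from one destination to two typically shrinks the maximum load noticeably, which is the intuition behind replicating hot movies so that a request has more than one disk to go to.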

What 2DB Costs?

Storage Cost
  Additional disk cost = % hot movies
  Typically 20% of movies are subscribed 80% of the time

Throughput
  Throughput is scaled back by a fraction to achieve balanced work

Evaluation

2DB Achieves Balanced Workload with High Throughput
  Compared to, e.g., some dynamic load balancing schemes

2DB Incurs Low Additional Storage Cost

2DB Enjoys Minimum Initial Latency

Media Client

Most Studies Assume Dumb Clients

We Propose Smart Clients for
  Handling VBR
  Supporting VCR-like Functions

Handling VBR

Server Can Handle VBR
  Frame rate fluctuates but the moving average does not fluctuate as much
  Rates even out when N is large, which is typically the case

...VBR

But, the Server Cannot Eliminate Bitrate Mismatch
  Packetization and Channel Delay can change the bitrate

The Solution Must Be at the Client

Supporting VCR-like Functions

Pause
  Phone call interruptions
  Biological needs

Fast Forward
  Catching up with the program after a pause

Instant Replay

How to Pause A Movie?

Broadcast TV Cannot Be Paused

Pausing Via a Point-to-point Link Affects the Server’s Scheduling

Caching!!!
  Main Memory Caching? Too expensive! (19.2 Mbps * 20 min ≈ 2.9 GBytes)

Buffer Management

Challenges

Must Ensure Arriving Bits Do Not Overflow the Network Buffer

Must Ensure Decoder Buffer Does Not Underflow

Must Work for Any Off-the-shelf Disks, CPU Box

Our Contribution: MEDIC

MEDIC: MEmory & Disk Integrated Cache

MEDIC Manages IOs Between Memory and Disk Efficiently
  Only 4 MBytes main memory needed!!!
  Make a set-top box affordable

MEDIC Adapts to Hardware Configuration

Regular Playback
Pause
Resume Regular Playback
Fast Forward
Instant Replay (not shown)

Visualize MEDIC

Conclusions (Contributions in Blue)

Server (Single Disk)
  Revisiting Conventional Wisdom
  Minimizing Cost
  Minimizing Initial Latency

…Conclusions

Our Server Supports Low Latency Playback and Fast Forward

Our Client Supports Pause and Low Latency Instant Replay

Together, We Propose A Complete End-to-end Solution for Continuous Media Data Delivery!

Future Work

Enhancing MEDIC for Managing Heterogeneous Data, from Both Broadcast & Internet Channels
  Video Panoramas
  Interactive TV

Indexing Videos for Replay
  Video/Image databases
