Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher


Page 1: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Tal Ben-Nun, Scl. Eng & CS, Hebrew University
Yoav Etsion, CS Dept, Barcelona SC Ctr
Dror Feitelson, Scl. Eng & CS, Hebrew University

Supported by the Israel Science Foundation, grant no. 28/09

Page 6: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

• Goal is to control share of resources, not to optimize performance – important in virtualization
• Same module used for diverse resources
• Mechanism used: dispatch the most deserving client at each instant
• Selection of deserving client using virtual time formalism
• Implemented and measured in Linux

Page 7: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Motivation

Context: VMM for server consolidation
• Multiple legacy servers share physical platform
• Improved utilization and easier maintenance
• Flexibility in allocating resources to virtual machines
• Virtual machines typically run a single application (“appliances”)

Page 8: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Motivation

Assumed goal: enforce predefined allocation of resources to different virtual machines (“fair share” scheduling)
• Based on importance / SLA
• Can change with time or due to external events

Problem: what is “30% of the resources” when there are many different resources, and diverse requirements?

Page 9: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Global Scheduling

“Fair share” usually applied to a single resource
• But what if this resource is not a bottleneck?

Global scheduling idea:
1) Identify the system bottleneck resource
2) Apply fair share scheduling on this resource
3) This induces appropriate allocations on other resources

This paper: how to apply fair-share scheduling on any resource in the system

Page 10: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Previous Work I: Virtual Time

• Accounting is inversely proportional to allocation
• Schedule the client that is farthest behind
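The two points above can be illustrated with a minimal sketch. It is not code from the talk: the names `VTClient`, `charge`, `pick_next`, and `simulate`, as well as the fixed quantum, are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class VTClient:
    name: str
    weight: float        # relative allocation (assumed to be > 0)
    vtime: float = 0.0   # virtual time accumulated so far

    def charge(self, used: float) -> None:
        # Accounting is inversely proportional to allocation:
        # a client with a larger weight ages more slowly per unit used.
        self.vtime += used / self.weight


def pick_next(clients):
    # The client that is farthest behind has the smallest virtual time.
    return min(clients, key=lambda c: c.vtime)


def simulate(clients, quanta, quantum=1.0):
    # Repeatedly run the client that is farthest behind and count its turns.
    runs = {c.name: 0 for c in clients}
    for _ in range(quanta):
        chosen = pick_next(clients)
        chosen.charge(quantum)
        runs[chosen.name] += 1
    return runs
```

With weights 2 and 1, the client with weight 2 should receive about twice as many quanta as the other.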

Page 11: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Previous Work II: Traffic Shaping

• Leaky bucket
  – Variable requests
  – Constant rate transmission
  – Bucket represents buffer
• Token bucket
  – Variable requests
  – Constant allocations
  – Bucket represents stored capacity
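A token bucket as described above can be sketched as follows. This is a generic illustration, not code from the talk: the class name, the explicit `now` argument (used instead of a real clock so the behaviour is deterministic), and the choice to start with a full bucket are all assumptions.

```python
class TokenBucket:
    """Tokens arrive at a constant rate; the bucket stores unused capacity."""

    def __init__(self, rate, capacity):
        self.rate = rate            # tokens added per second (constant allocation)
        self.capacity = capacity    # maximum number of stored tokens
        self.tokens = capacity      # assumption: the bucket starts full
        self.last = 0.0             # time of the last refill

    def _refill(self, now):
        # Credit the tokens earned since the last refill, up to the capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_consume(self, amount, now):
        # A variable-size request succeeds only if enough tokens are stored.
        self._refill(now)
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False
```

A request larger than the stored capacity is refused until enough time has passed for the bucket to refill.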

Page 12: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Putting them Together: RSVT

• “Resource sharing”: all clients make progress continuously
  – Generalization of processor sharing
• Each job has its ideal resource sharing progress
  – This is considered to be the allocation a_i
  – Grows at constant rate
• Each job has its actual consumption c_i
  – Grows only when job runs
• Scheduling priority is the difference: p_i = a_i – c_i

Page 13: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Example

• Three clients
• Allocations roughly 50%, 30%, 20%
• Consumption always occurs in resource time

[Figure: consumed resource time (vertical axis) versus wallclock time (horizontal axis)]

Page 14: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Bookkeeping

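• The set of active jobs is A
• The relative allocation of job i is r_i
• During an interval T job k has run
• Update allocations:

\[ a_i \mathrel{+}= \frac{r_i}{\sum_{j \in A} r_j} \, T \]

• Update consumptions:

\[ c_i \mathrel{+}= \begin{cases} T & i = k \\ 0 & \text{otherwise} \end{cases} \]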
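The bookkeeping rules can be tried out with a short simulation. This is a minimal sketch under stated assumptions, not the authors' kernel code: the function names, the dictionary-based state, and the fixed interval T are hypothetical, and all jobs are assumed to stay active throughout.

```python
def update(alloc, cons, rel, active, ran, T):
    """Apply one bookkeeping step after job `ran` used the resource for T."""
    total = sum(rel[j] for j in active)
    for i in active:
        # Every active job is credited its relative share of the interval.
        alloc[i] += rel[i] / total * T
    # Only the job that actually ran is charged for the interval.
    cons[ran] += T


def dispatch(alloc, cons, active):
    # Pick the job with the highest priority p_i = a_i - c_i.
    return max(active, key=lambda i: alloc[i] - cons[i])


def simulate(rel, steps, T=1.0):
    # Assumption: the active set is fixed and contains every job in `rel`.
    active = list(rel)
    alloc = {i: 0.0 for i in rel}
    cons = {i: 0.0 for i in rel}
    for _ in range(steps):
        k = dispatch(alloc, cons, active)
        update(alloc, cons, rel, active, k, T)
    return alloc, cons
```

Calling `simulate({"A": 5, "B": 3, "C": 2}, steps=1000)` should give consumptions close to 50%, 30%, and 20% of the total, since each job's consumption stays within a few intervals of its allocation.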

Page 15: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

The Active Set

• Active jobs (the set A) are those that can use the resource now
• Allocations are relative to the active set
• The active set may change
  – New job arrives
  – Job terminates
  – Job stops using resource temporarily
  – Job resumes use of resource

Page 16: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Grace Period

• Intermittent activity: process data / send packet
• Should retain allocations even when inactive
• Thus a_i continues to grow during grace period after the job becomes inactive
• Grace period reflects notion of continuity
• Sub-second time scale

Page 17: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Rebirth

• Resumption after very long inactive periods should be treated as new arrivals
• Due to grace period, job that becomes inactive accrues extra allocation
• Forget this extra allocation after rebirth period (set a_i = c_i)
• Two orders of magnitude larger than grace period
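The interplay of the grace period and the rebirth period can be sketched as below. This is only an illustration: the constants `GRACE` and `REBIRTH`, the `Client` fields, and the `tick` function are assumptions chosen to match the stated time scales (sub-second, and two orders of magnitude larger), not values or names taken from the talk.

```python
from dataclasses import dataclass
from typing import Optional

GRACE = 0.1      # seconds; assumed value on a sub-second time scale
REBIRTH = 10.0   # seconds; assumed value, two orders of magnitude larger


@dataclass
class Client:
    rel: float                          # relative allocation r_i
    alloc: float = 0.0                  # allocation a_i
    cons: float = 0.0                   # consumption c_i
    idle_since: Optional[float] = None  # None while the client uses the resource


def in_active_set(c, now):
    # A client counts as active while busy or still within its grace period.
    return c.idle_since is None or now - c.idle_since < GRACE


def tick(clients, now, T):
    """Distribute T units of resource time, then apply the rebirth rule."""
    active = [c for c in clients if in_active_set(c, now)]
    total = sum(c.rel for c in active)
    for c in active:
        c.alloc += c.rel / total * T
    for c in clients:
        if c.idle_since is not None and now - c.idle_since >= REBIRTH:
            # Forget the extra allocation accrued during the grace period.
            c.alloc = c.cons
```

A client that went idle keeps accruing allocation for `GRACE` seconds, stops accruing after that, and is reset to a_i = c_i once it has been idle for `REBIRTH` seconds.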

Page 18: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Implementation

• Kernel module with generic functionality
  – Create / destroy module
  – Create / destroy client
  – Make request / set active / set inactive
  – Make allocations
  – Dispatch
  – Check-in (note resource usage)
• Glue code for specific subsystems
  – Currently networking and CPU
  – Plan to add disk I/O

Page 19: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Networking Glue Code

Use the Linux QoS framework: create RSVT queueing discipline

[Figure: network stack App → TCP → IP → QoS (queueing discipline) → NIC]

Page 20: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Networking Glue Code

Non-RSVT traffic has priority (e.g. NFS traffic) and is counted as dead time

[Figure: App → TCP → IP → “RSVT?” decision; no: send immediately; yes: enqueue, then select and send → NIC]

Page 21: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

CPU Scheduling Glue Code

• Use Linux modular scheduling core
• Add an RSVT scheduling policy
  – RSVT module essentially replaces the policy runqueue
  – Initial implementation only for uniprocessors
• CFS and possibly other policies also exist and have higher priority
  – When they run, this is considered dead time

Page 22: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Timer Interrupts

• Linux employs timer interrupts (250 Hz)
• Allocations are done at these times
  – Translate time into microseconds
  – Subtract known dead time (unavailable to us)
  – Divide among active clients according to relative allocations
  – Bound divergence of allocation from consumption
• Also handling of grace period (mark as inactive)
• Also handling of rebirth (set a_i = c_i)
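The allocation steps performed at each timer interrupt can be sketched as follows. This is a user-space illustration, not the kernel implementation: the function name, the dictionary state, and in particular `MAX_LEAD_US` and the one-sided cap used to bound divergence are assumptions, since the slides do not say how the bound is enforced.

```python
HZ = 250                        # timer interrupt frequency from the slide
TICK_US = 1_000_000 // HZ       # one tick translated into microseconds (4000)
MAX_LEAD_US = 100_000           # assumed bound on a_i - c_i; not from the talk


def on_timer_tick(alloc, cons, rel, active, dead_us):
    """Credit one tick of resource time to the active clients."""
    # Subtract known dead time, which was unavailable to the RSVT clients.
    usable = max(TICK_US - dead_us, 0)
    total = sum(rel[i] for i in active)
    if total == 0:
        return
    for i in active:
        # Divide the usable time according to relative allocations.
        alloc[i] += usable * rel[i] / total
        # Bound divergence: allocation may lead consumption by a fixed cap only.
        alloc[i] = min(alloc[i], cons[i] + MAX_LEAD_US)
```

With relative allocations 3 and 1 and 1000 microseconds of dead time, the two clients are credited 2250 and 750 microseconds for that tick.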

Page 23: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Multi-Queue

• At dispatch, need to find client with highest priority
• But priorities change at different rates
• Solution: allow only a limited discrete set of relative priorities
• Each priority has a separate queue
• Maintain all clients in each queue in priority order
• Only need to check the first in each queue to find the maximum
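One possible way to realize the multi-queue idea is sketched below. It is an assumption-laden illustration, not the authors' data structure: the class `MultiQueue`, the per-queue `base` offset (which lets all clients of one queue gain allocation without touching each of them), and the `step` method are hypothetical. The sketch relies on the observation that clients sharing the same relative allocation gain priority at the same rate, so their order within a queue changes only when one of them runs.

```python
import bisect


class MultiQueue:
    def __init__(self, rates):
        # One queue per permitted relative allocation, sorted by ascending key.
        self.queues = {r: [] for r in rates}
        # Allocation credited so far to every client of a given rate.
        self.base = {r: 0.0 for r in rates}

    def add(self, name, rate, priority=0.0):
        # Store the priority relative to the queue's shared offset.
        bisect.insort(self.queues[rate], (priority - self.base[rate], name))

    def step(self, T):
        """Dispatch the highest-priority client for an interval T."""
        # Only the last (highest) entry of each queue needs to be inspected.
        heads = [(q[-1][0] + self.base[r], r)
                 for r, q in self.queues.items() if q]
        if not heads:
            return None
        _, r = max(heads)
        key, name = self.queues[r].pop()
        # Allocation: each rate class advances by its share of the interval.
        total = sum(rate * len(q) for rate, q in self.queues.items()) + r
        for rate in self.base:
            self.base[rate] += rate / total * T
        # Consumption: only the client that ran is charged and repositioned.
        bisect.insort(self.queues[r], (key - T, name))
        return name
```

Each dispatch inspects one entry per queue, so the cost of finding the maximum depends on the number of distinct rates rather than on the number of clients.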

Page 24: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Experiment – Basic Allocations

rate   bandwidth
1      30.89 ± 0.05
2      61.41 ± 0.02

Page 25: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Experiment – Basic Allocations

rate   bandwidth
1      15.69 ± 0.11
2      30.81 ± 0.03
3      46.10 ± 0.03

Page 26: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Experiment – Active Set

Page 27: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Experiment – Grace Period

Page 28: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Experiment – Rebirth

Page 29: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Experiment – Throttling

• Two competing MPlayers
• The one with higher allocation does not need all of it
  – Allocation tracks consumption

Page 30: Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Conclusions

• Demonstrated generic virtual-time based resource sharing dispatcher
• Need to complete implementation
  – Support for I/O scheduling
  – More details, e.g. SMP support
• Building block of global scheduling vision