Model-Based Resource Provisioning in a Web Service Utility

Ronald P. Doyle
IBM, Research Triangle Park
[email protected]

Jeffrey S. Chase, Omer M. Asad, Wei Jin, Amin M. Vahdat
Department of Computer Science, Duke University
{chase, jin, vahdat}@cs.duke.edu

R. Doyle is also a PhD student at Duke Computer Science. O. Asad is currently at Sun Microsystems. This work is supported by the U.S. National Science Foundation through grants CCR-00-82912, ANI-126231, and EIA-9972879, and by IBM, Network Appliance, and HP Laboratories.

Abstract

Internet service utilities host multiple server applications on a shared server cluster. A key challenge for these systems is to provision shared resources on demand to meet service quality targets at least cost. This paper presents a new approach to utility resource management focusing on coordinated provisioning of memory and storage resources. Our approach is model-based: it incorporates internal models of service behavior to predict the value of candidate resource allotments under changing load. The model-based approach enables the system to achieve important resource management goals, including differentiated service quality, performance isolation, storage-aware caching, and proactive allocation of surplus resources to meet performance goals. Experimental results with a prototype demonstrate model-based dynamic provisioning under Web workloads with static content.

1 Introduction

The hardware resources available to a network service determine its maximum request throughput and—under typical network conditions—a large share of the response delays perceived by its clients. As hardware performance advances, emphasis is shifting from server software performance (e.g., [19, 26, 27, 37]) to improving the manageability and robustness of large-scale services [3, 5, 12, 31]. This paper focuses on a key subproblem: automated on-demand resource provisioning for multiple competing services hosted by a shared server infrastructure—a utility. It applies to Web-based services in a shared hosting center or a Content Distribution Network (CDN).

The utility allocates each service a slice of its resources, including shares of memory, CPU time, and available throughput from storage units. Slices provide performance isolation and enable the utility to use its resources efficiently. The slices are chosen to allow each hosted service to meet service quality targets (e.g., response time) negotiated in Service Level Agreements (SLAs) with the utility. Slices vary dynamically to respond to changes in load and resource status. This paper addresses the provisioning problem: how much resource does a service need to meet SLA targets at its projected load level? A closely related aspect of utility resource allocation is assignment: which servers and storage units will provide the resources to host each service?

Previous work addresses various aspects of utility resource management, including mechanisms to enforce resource shares (e.g., [7, 9, 36]), policies to provision shares adaptively [12, 21, 39], admission control with probabilistically safe overbooking [4, 6, 34], scheduling to meet SLA targets or maximize yield [21, 22, 23, 32], and utility data center architectures [5, 25, 30].

The key contribution of this paper is to demonstrate the potential of a new model-based approach to provisioning multiple resources that interact in complex ways. The premise of model-based resource provisioning (MBRP) is that internal models capturing service workload and behavior can enable the utility to predict the effects of changes to the workload intensity or resource allotment. Experimental results illustrate model-based dynamic provisioning of memory and storage shares for hosted Web services with static content. Given adequate models, this approach may generalize to a wide range of services including complex multi-tier services [29] with interacting components, or services with multiple functional stages [37]. Moreover, model-based provisioning is flexible enough to adjust to resource constraints or surpluses exposed during assignment.



Figure 1: A utility OS. A feedback-controlled policy module or executive monitors load and resource status, determines resource slices for each service, and issues directives to configure server shares and direct request traffic to selected servers.

This paper is organized as follows. Section 2 motivates the work and summarizes our approach. Section 3 outlines simple models for Web services; Section 4 describes a resource allocator based on the models, and demonstrates its behavior in various scenarios. Section 5 describes our prototype and presents experimental results. Section 6 sets our approach in context with related work, and Section 7 concludes.

2 Overview

Our utility provisioning scheme is designed to run within a feedback-controlled resource manager for a server cluster, as depicted in Figure 1. The software that controls the mapping of workloads to servers and storage units is a utility operating system; it supplements the operating systems on the individual servers by coordinating traditional OS functions (e.g., resource management and isolation) across the utility as a whole. An executive policy component continuously adjusts resource slices based on smoothed, filtered observations of traffic and resource conditions, as in our previous work on the Muse system [12]. The utility enforces slices by assigning complete servers to services, or by partitioning the resources of a physical server using a resource control mechanism such as resource containers [9] or a virtual machine monitor [36]. Reconfigurable redirecting switches direct request traffic toward the assigned servers for each service.

This paper deals with the executive's policies for provisioning resource slices in this framework. Several factors make it difficult to coordinate provisioning for multiple interacting resources:

- Bottleneck behavior. Changing the allotment of resources other than the primary bottleneck has little or no effect. On the other hand, changes at a bottleneck affect demand for other resources.

- Global constraints. Allocating resources under constraint is a zero-sum game. Even when data centers are well-provisioned, a utility OS must cope with constraints when they occur, e.g., due to abnormal load surges or failures. Memory is often constrained; in the general case there is not enough to cache the entire data set for every service. Memory management can dramatically affect service performance [26].

- Local constraints. Assigning service components (capsules) to specific nodes leads to a bin-packing problem [3, 34]. Solutions may expose local resource constraints or induce workload interference.

- Caching. Sizing of a memory cache in one component or stage can affect the load on other components. For example, the OS may overcome a local constraint at a shared storage unit by increasing the server memory allotment for one capsule's I/O cache, freeing up storage throughput for use by another capsule.

The premise of MBRP is that network service loads have common properties that allow the utility OS to predict their behavior. In particular, service loads are streams of requests with stable average-case behavior; by continuously feeding observed request arrival rates to the models, the system can predict resource demands at each stage and adapt as those demands change. Moreover, the models enable the system to account for resource interactions in a comprehensive way, by predicting the effects of planned resource allotments and placement choices. For example, the models can answer questions like: "how much memory is needed to reduce this service's storage access rate by 20%?". Figure 2 depicts the use of the models within the utility OS executive.

MBRP is a departure from traditional resource management, which uses reactive heuristics with limited assumptions about application behavior. Our premise is that MBRP is appropriate for a utility OS because it hosts a smaller number of distinct applications that are both heavily resource-intensive and more predictable in their average per-request resource demands. In many cases the workloads and service behavior have been intensively studied. We propose to use the resulting models to enable dynamic, adaptive, automated resource management.

For example, we show that MBRP can provision for average-case response time targets for Web services with static content under dynamically varying load. First, the system uses the models to generate initial candidate resource allotments that it predicts will meet response time targets for each service. Next, an assignment phase maps service components to specific servers and storage units in the data center, balancing affinity, migration cost, competing workloads, and local constraints on individual servers and storage units. The system may adjust candidate allotments to compensate for local constraints or resource surpluses discovered during the assignment phase.

Figure 2: Using models to evaluate candidate resource allotments in a utility OS. The executive refines the allotments until it identifies a resource assignment that meets system goals.

In this context, MBRP meets the following goals in an elegant and natural way:

- Differentiated service quality. Since the system can predict the effects of candidate allotments on service quality, it can plan efficient allotments to meet response time targets, favoring services with higher value or more stringent SLAs.

- Resource management for diverse workloads. Each service has a distinct profile for reference locality and the CPU-intensity of requests. The model parameters capture these properties, enabling the executive to select allotments to meet performance goals. These may include allocating surplus resources to optimize global metrics (e.g., global average response time, yield, or profit).

- Storage-aware caching. Storage units vary in their performance due to differences in their configurations or assigned loads. The models capture these differences, enabling the resource scheduler to compensate by assigning different amounts of memory to reduce the dependence on storage where appropriate. Recent work [16] proposed reactive kernel heuristics with similar goals.

- On-line capacity planning. The models can determine aggregate resource requirements for all hosted services at observed or possible load levels. This is useful for admission control or to vary hosting capacity with offered load.

This flexibility in meeting diverse goals makes MBRP a powerful basis for proactive resource management in utilities.

Figure 3: A simple model for serving static Web content. Requests arrive at rate λ (which varies with time) and incur an average service demand D_C in a CPU. An in-memory cache of size M absorbs a portion H of the requests as hits; the misses generate requests to storage at rate λ_S.

Parameter   Meaning
α           Zipf locality parameter
λ           Offered load in requests/s
S           Average object size
T           Total number of objects
M           Memory size for object cache
D_C         Average per-request CPU demand
μ_S, φ      Peak storage throughput in IOPS

Table 1: Input parameters for Web service models.

3 Web Service Models

This section presents simplified analytical models to predict the performance of Web services with static content, based on the system model depicted in Figure 3. Table 1 summarizes the model parameters and other inputs. Table 2 summarizes the model outputs, which guide the choices of the resource allocator described in Section 4.

The models are derived from basic queuing theory and recent work on performance modeling for Web services. They focus on average-case behavior and abstract away much detail. For example, they assume that the network within the data center is well-provisioned. Each of the models could be extended to improve its accuracy; what is important is the illustration they provide of the potential for model-based resource provisioning.

3.1 Server Memory

Many studies indicate that requests to static Web objects follow a Zipf-like popularity distribution [11, 38]. The probability p_i of a request to the i-th most popular object is proportional to 1/i^α, for some parameter α. Many requests target the most popular objects, but the distribution has a heavy tail of unpopular objects with poor reference locality. Higher α values increase the concentration of requests on the most popular objects. We assume that object size is independent of popularity [11], and that size distributions are stable for each service [35].

Parameter   Meaning
R_C         CPU response time
H           Object cache hit ratio
λ_S         Storage request load in IOPS
R_S         Average storage response time
R           Average total response time

Table 2: Performance measures predicted by Web models.

Given these assumptions, a utility OS can estimate the hit ratio for a memory size M from two parameters: α, and T, the total size of the service's data (consider M and T to be in units of objects). If the server effectively caches the most popular objects (i.e., assuming perfect Least Frequently Used or LFU replacement), and ignoring object updates, the predicted object hit ratio H is given by the portion of the Zipf-distributed probability mass that is concentrated in the M most popular objects. We can closely approximate this H by integrating over a continuous form of the Zipf probability distribution function [11, 38]. The closed form solution reduces to:

    H = (M^(1-α) - 1) / (T^(1-α) - 1)    (1)

Zipf distributions appear to be common in a large number of settings, so this model is more generally applicable. While pure LFU replacement is difficult to implement, a large body of Web caching research has yielded replacement algorithms that approximate LFU; even poor schemes such as LRU are qualitatively similar in their behavior.
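The hit-ratio model is simple enough to evaluate directly. The sketch below implements Equation (1) as reconstructed above; the function name is ours, and the restriction α ≠ 1 is an assumption we add to keep the closed form well-defined.

```python
def hit_ratio(M, T, alpha):
    """Predicted object hit ratio H (Equation 1): the share of Zipf
    probability mass covered by caching the M most popular of T
    objects under ideal LFU replacement.
    Assumes alpha != 1 and 1 <= M <= T."""
    return (M ** (1 - alpha) - 1) / (T ** (1 - alpha) - 1)
```

For example, with α = 0.9 a cache holding 1,000 of 20,000 objects (5% of the fileset) already absorbs a majority of requests, reflecting the concentration of Zipf workloads on popular objects.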

3.2 Storage I/O Rate

Given an estimate of the cache hit ratio H, the service's storage I/O load is easily derived: for a Web load of λ requests per second, the I/O throughput demand λ_S for storage (in IOPS, or I/O operations per second) is:

    λ_S = λ (1 - H) S    (2)

The average object size S is given as the number of I/O operations to access an object on a cache miss.

3.3 Storage Response Time

To determine the performance impact of disk I/O, we must estimate the average response time R_S from storage. We model each storage server as a simple queuing center, parameterized for D disks; to simplify the presentation, we assume that the storage servers in the utility are physically homogeneous. Given a stable request mix, we estimate the utilization of a storage server at a given IOPS request load λ_S as λ_S/μ_S, where μ_S is its saturation throughput in IOPS for that service's file set and workload profile. We determine μ_S empirically for a given request mix. A well-known response time formula then predicts the average storage response time as:

    R_S = (D/μ_S) / (1 - λ_S/μ_S)    (3)

Figure 4: Predicted and observed R_S for an NFS server (Dell 4400, Seagate Cheetah disks, 256MB, BSD/UFS) under three synthetic file access workloads with different fileset sizes (μ_S = 453, 587, and 660 IOPS).

This model does not reflect variations in cache behavior within the storage server, e.g., from changes to M or λ_S. Given the heavy-tailed nature of Zipf distributions, Web server miss streams tend to show poor locality when the Web server cache (M) is reasonably provisioned [14].

Figure 4 shows that this model closely approximates storage server behavior at typical utilization levels under low-locality synthetic file access loads (random reads) with three different fileset sizes (T). We used the FreeBSD Concatenated Disk Driver (CCD) to stripe the filesets across D disks, and the fstress tool [2] to generate the workloads and measure R_S.

Since storage servers may be shared, we extend the model to predict R_S when the service receives an allotment φ of storage server resources, representing the maximum storage throughput in IOPS available to the service within its share:

    R_S = (D/μ_S) / (1 - λ_S/φ)    (4)

This formula assumes that some scheduler enforces proportional shares φ/μ_S for storage. Our prototype uses Request Windows [18] to approximate proportional sharing. Other approaches to differentiated scheduling for storage [23, 33] are also applicable.
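Equations (2) and (4) compose naturally: the hit ratio determines the storage I/O rate, and the storage share determines the response time at that rate. A minimal sketch follows, using the formulas as reconstructed above; all names are ours, and the saturation guard is an assumption.

```python
def storage_iops(lam, H, S):
    """Storage I/O demand lambda_S in IOPS (Equation 2): misses from
    lam requests/s at hit ratio H, each costing S I/O operations."""
    return lam * (1.0 - H) * S

def storage_response_time(lam_S, phi, mu_S, D):
    """Average storage response time R_S in seconds (Equation 4 as
    reconstructed here): a D-disk server with saturation throughput
    mu_S IOPS, of which the service may use at most phi.
    Valid only below saturation (lam_S < phi <= mu_S)."""
    assert 0 <= lam_S < phi <= mu_S, "storage share saturated"
    return (D / mu_S) / (1.0 - lam_S / phi)
```

For instance, a share of φ = 200 IOPS running at 70% utilization (λ_S = 140) on a 4-disk server with μ_S = 453 yields R_S = (4/453)/0.3, roughly 29 ms, in line with the range shown in Figure 4.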

3.4 Service Response Time

We combine these models to predict the average total response time R for the service, deliberately ignoring congestion on the network paths to the clients (which the utility OS cannot control). Given a measure D_C of average per-request service demand on the Web server CPU, CPU response time R_C is given by a simple queuing model similar to the storage model above; previous work [12] illustrates the use of such a model to adaptively provision CPU resources for Web services. The service's average response time R is simply:

    R = R_C + (1 - H) R_S    (5)

This ignores the CPU cost of I/O and the effects of prefetching for large files. The CPU and storage models already account (crudely) for these factors in the average case.
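Equation (5) combines the pieces: only the miss fraction of requests pays the storage penalty. A one-line sketch (names ours):

```python
def total_response_time(R_C, H, R_S):
    """Average total response time R (Equation 5): CPU response time
    plus the (1 - H) fraction of requests that also wait on storage."""
    return R_C + (1.0 - H) * R_S
```

Solving this expression for H yields Equation (6) in Section 4, which Candidate uses to turn a response time target into a required hit ratio.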

3.5 Discussion

These models are cheap to evaluate and they capture the key behaviors that determine application performance. They were originally developed to improve understanding of service behavior and to aid in static design of server infrastructures; since they predict how resource demands change as a function of offered load, they can also act as a basis for dynamic provisioning in a shared hosting utility. A key limitation is that the models assume stable average-case per-request behavior, and they predict only average-case performance. For example, the models here are not sufficient to provision for probabilistic performance guarantees. Also, since they do not account for interference among workloads using shared resources, MBRP depends on performance isolation mechanisms (e.g., [9, 36]) that limit this interference. Finally, the models do not capture overload pathologies [37]; MBRP must assign sufficient resources to avoid overload, or use dynamic admission control to prevent it.

Importantly, the models are independent of the MBRP framework itself, so it is possible to replace them with more powerful models or extend the approach to a wider range of services. For example, it is easy to model simple dynamic content services with a stable average-case service demand for CPU and memory. I/O patterns for database-driven services are more difficult to model, but a sufficient volume of requests will likely reflect a stable average-case behavior.

MBRP must parameterize the models for each service with the inputs from Table 1. The load and average-case service demand parameters are readily obtainable (e.g., as in Muse [12] or Neptune [32]), but it is an open question how to obtain α and μ_S from dynamic observations. The system can detect anomalies by comparing observed behavior to the model predictions, but MBRP is "brittle" unless it can adapt or reparameterize the models when anomalies occur.

4 A Model-Based Allocator

This section outlines a resource provisioning algorithm that plans least-cost resource slices based on the models from Section 3. The utility OS executive periodically invokes it to adjust the allotments, e.g., based on filtered load and performance observations. The output is an allotment vector for each service, representing a CPU share together with memory and storage allotments (M, φ). The provisioning algorithm comprises three primitives designed to act in concert with an assignment planner, which maps the allotted shares onto specific servers and storage units within the utility. The resource provisioning primitives are as follows:

- Candidate plans initial candidate allotment vectors that it predicts will meet SLA response time targets for each service at its load λ.

- LocalAdjust modifies a candidate vector to adapt to a local resource constraint or surplus exposed during assignment. For example, the assignment planner might place some service on a server with insufficient memory to supply the candidate M; LocalAdjust constructs an alternative vector that meets the targets within the resource constraint, e.g., by increasing φ to reduce R_S.

- GroupAdjust modifies a set of candidate vectors to adapt to a resource constraint or surplus exposed during assignment. It determines how to reduce or augment the allotments to optimize a global metric, e.g., to minimize predicted average response time. For example, the assignment planner might assign multiple hosted services to share a network storage unit; if the unit is not powerful enough to meet the aggregate resource demand, then GroupAdjust modifies each vector to conform to the constraint.

Page 6: Model-Based Resource Provisioning in a Web Service Utility

We have implemented these primitives in a prototype executive for a utility OS. The following subsections discuss each of these primitives in turn.

4.1 Generating Initial Candidates

To avoid searching a complex space of resource combinations to achieve a given performance goal, Candidate follows a simple principle: build a balanced system. The allocator configures CPU and storage throughput (φ) allotments around a predetermined average utilization level ρ_target. The ρ_target may be set in a "sweet spot" range from 50-80% that balances efficient use of storage and CPU resources against queuing delays incurred at servers and storage units. The value for ρ_target is a separate policy choice. Lower values for ρ_target provide more "headroom" to absorb transient load bursts for the service, reducing the probability of violating SLA targets. The algorithm generalizes to independent ρ_target values for each (service, resource) pair.

The Candidate algorithm consists of the following steps:

- Step 1. Predict CPU response time R_C at the configured ρ_target, as described in Section 3.4. Select an initial value for the storage allotment φ.

- Step 2. Using μ_S and ρ_target, predict storage response time R_S using Equation (4). Note that the allocator does not know λ_S at this stage, but it is not necessary because R_S depends only on μ_S and the ratio λ_S/φ, which is given by ρ_target.

- Step 3. Determine the required server memory hit ratio (H) to reach the SLA target response time R, using Equation (5) and solving for H as:

    H = 1 - (R - R_C) / R_S    (6)

- Step 4. Determine the storage arrival rate λ_S from λ and H, using Equation (2). Determine and assign the resulting candidate storage throughput allotment φ = λ_S / ρ_target.

- Step 5. Loop to step 2 to recompute storage response time R_S with the new value of φ. Loop until the difference of successive φ values is within a preconfigured percentage of φ.

- Step 6. Determine and assign the memory allotment M necessary to achieve the hit ratio H, using Equation (1).
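The six steps above form a small fixed-point iteration. The sketch below is a loose rendering under the equations as reconstructed in Section 3; the initial φ guess, the clamping of H, the convergence tolerance, and the parameter names are our assumptions, not details from the algorithm itself.

```python
def candidate(lam, R_target, R_C, S, T, alpha, D, mu_S,
              rho_target=0.7, tol=0.01, max_iter=20):
    """Sketch of the Candidate primitive: returns an (M, phi) allotment
    vector predicted to meet response time target R_target at load lam."""
    phi = lam * S / rho_target          # step 1: guess as if H = 0
    H = 0.0
    for _ in range(max_iter):
        # step 2, Eq. (4): at utilization rho_target, R_S needs only
        # mu_S and the ratio lam_S / phi = rho_target
        R_S = (D / mu_S) / (1.0 - rho_target)
        # step 3, Eq. (6): hit ratio required to reach the SLA target
        H = min(max(1.0 - (R_target - R_C) / R_S, 0.0), 1.0)
        lam_S = lam * (1.0 - H) * S     # step 4, Eq. (2)
        new_phi = lam_S / rho_target
        if abs(new_phi - phi) <= tol * max(phi, 1e-9):  # step 5
            phi = new_phi
            break
        phi = new_phi
    # step 6: invert Eq. (1) for the memory that achieves hit ratio H
    M = (H * (T ** (1 - alpha) - 1) + 1) ** (1.0 / (1 - alpha))
    return M, phi
```

With illustrative inputs (λ = 100 req/s, a 20 ms target, R_C = 5 ms, S = 2, T = 20,000, α = 0.9, D = 4, μ_S = 453), the iteration settles on a moderate hit ratio and a storage share well below μ_S, matching the balanced-system intent.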

Note that the candidate M is the minimum required to meet the response time target. Given reasonable targets, Candidate leaves memory undercommitted. To illustrate, Figure 5 shows candidate storage and memory allotments (M, φ) for an example service as offered Web load λ increases along the x-axis. Candidate responds to increasing load by increasing φ rather than M. This is because increasing λ does not require a higher hit ratio H to meet a fixed response time target. For a fixed M and corresponding H, λ_S grows linearly with λ, and so the storage allotment φ also grows linearly following φ = λ_S/ρ_target.

Figure 5: Using Candidate and LocalAdjust to determine memory and storage allotments for a service. As service load (λ) increases, Candidate increases the storage share φ to meet response time targets. After encountering a resource constraint at φ = 200 IOPS, LocalAdjust transforms the candidate allotments to stay on target by adding memory instead.

Figure 5 also shows how the provisioning scheme adjusts the vectors to conform to a resource constraint. This example constrains φ to 200 IOPS, ultimately forcing the system to meet its targets by increasing the candidate M. Candidate itself does not consider resource constraints; the next two primitives adapt allotment vectors on a local or group basis to conform to resource constraints, or to allocate surplus resources to improve performance according to system-wide metrics.

4.2 Local Constraint or Surplus

The input to LocalAdjust is a candidate allotment vectorand request arrival rate

�, together with specified con-

straints on each resource. The output of LocalAdjust isan adjusted vector that achieves a predicted average-caseresponse time as close as possible to the target, whileconforming to the resource constraints. Since this pa-per focuses primarily on memory and storage resources,we ignore CPU constraints. Specifically, we assume thatthe expected CPU response time

� �for a given ±�²´³�µ�¶)·A²

Page 7: Model-Based Resource Provisioning in a Web Service Utility

0

5

10

15

20

25

30

35

40

0 20 40 60 80 100Available Memory (MB)

Mem

ory

Allo

tmen

t (M

B)

É = .6, T= 20000 ObjectsÉ = .6, T= 40000 ObjectsÉ = .9, T = 20000 ObjectsÉ = .9, T = 40000 Objects

0

5

10

15

20

25

30

35

40

45

50

0 20 40 60 80 100

Available Memory (MB)

Mem

ory

Allo

tmen

t (M

B)

Ê = 25Ê = 50Ê = 75Ê = 100

Figure 6: Memory allotments by GroupAdjust for four competing services with different caching profiles (left) and differentrequest arrival rates (right).

is fixed and achievable. CPU allotments are relativelystraightforward because memory and storage allotmentsaffect per-request CPU demand only minimally. For ex-ample, if the CPU is the bottleneck, then the allotmentsfor other resources are easy to determine: set

�to the

saturation request throughput rate, and provision otherresources as before.

If the storage constraint falls below the candidate stor-age allotment � , then LocalAdjust assigns the maximumvalue to � , and rebalances the system by expanding thememory allotment � to meet the response time targetgiven the lower allowable request rate

� � for storage.Determine the allowable

� � � �S± ²´³�µ�¶)·A² at the precon-figured ± ²´³)µ;¶)·A² . Determine the hit ratio

�needed to

achieve this� � using Equation (2), and the memory al-

lotment � to achieve�

using Equation (1).

Figure 5 illustrates the effect of LocalAdjust on the candidate vector under a storage constraint capping S at a fixed IOPS level. As load λ increases, LocalAdjust meets the response time target by holding S to the maximum and growing M instead. The candidate M varies in a (slightly) nonlinear fashion because H grows as M increases, so larger shares of the increases to λ are absorbed by the cache. This effect is partially offset by the dynamics of Web content caching captured in Equation (1): due to the nature of Zipf distributions, H grows logarithmically with M, requiring larger marginal increases to M to effect the same improvement in H.

If memory is constrained, LocalAdjust sets M to the maximum and rebalances the system by expanding S (if possible). The algorithm is as follows: determine H and λ_S at M using Equations (1) and (2), and use λ_S to determine the adjusted storage allotment as S = λ_S / ρ_target. Then compensate for the reduced H by increasing S further to reduce storage utilization levels below ρ_target, improving the storage response time R_S.

If both M and S are constrained, assign both to their maximum values and report the predicted response time using the models in the obvious fashion. LocalAdjust adjusts allotments to consume a local surplus in the same way. This may be useful if the service is the only load component assigned to some server or set of servers. Surplus assignment is more interesting when the system must distribute resources among multiple competing services, as described below.
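The memory-constrained branch admits a matching sketch, again using assumed stand-ins for Equations (1) and (2) (a Zipf-style hit-ratio power law, and a miss stream sized as (1 - H) · λ); the parameter values are hypothetical.

```python
# Sketch of LocalAdjust's memory-constrained branch. The hit-ratio and
# storage-load formulas are illustrative stand-ins for the paper's
# Equations (1) and (2), parameterized by a caching profile (alpha, T).

def memory_constrained_adjust(load, M_max, rho_target, alpha, T):
    M = M_max                               # pin memory at its constraint
    H = min(1.0, (M / T) ** (1.0 - alpha))  # assumed Equation (1)
    lambda_S = (1.0 - H) * load             # assumed Equation (2): miss stream
    S = lambda_S / rho_target               # capacity keeping utilization at rho_target
    return M, S

M, S = memory_constrained_adjust(load=1000.0, M_max=2000.0,
                                 rho_target=0.5, alpha=0.9, T=40000)
```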

4.3 Group Constraint or Surplus

GroupAdjust adjusts a set of candidate allotment vectors to consume a specified amount of memory and/or storage resource. GroupAdjust may adapt the vectors to conform to resource constraints or to allocate a resource surplus to meet system-wide goals.

For example, GroupAdjust can reprovision available memory to maximize hit ratio across a group of hosted services. This is an effective way to allocate surplus memory to optimize global response time, to degrade service fairly when faced with a memory constraint, or to reduce storage loads in order to adapt to a storage constraint. Note that invoking GroupAdjust to adapt to constraints on specific shared storage units is an instance of storage-aware caching [16], achieved naturally as a side effect of model-based resource provisioning.

Figure 7: Using Candidate and GroupAdjust to allocate memory to competing services; in this example, the services are identical except for differing SLA response time targets. Candidate determines the minimum memory to meet each service's target; GroupAdjust allocates any surplus to the services with the least memory.

GroupAdjust assigns available memory to maximize hit ratio across a group as follows. First, multiply Equation (1) for each service by its expected request arrival rate λ. This gives a weighted value of service hit rates (hits or bytes per unit of time) for each service, as a function of its memory allotment M. The derivative of this function gives the marginal benefit of allocating a unit of memory to each service at its current allotment M. A closed-form solution is readily available, since Equation (1) was obtained by integrating over the Zipf probability distribution (see Section 3.1). GroupAdjust uses a gradient-climbing approach (based on the Muse MSRP algorithm [12]) to assign each unit of memory to the service with the highest marginal benefit at its current M. This algorithm is efficient: it runs in time proportional to the product of the number of candidate vectors to adjust and the number of memory units to allocate.
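The gradient-climbing allocation can be sketched as below. The weighted hit-rate model is an illustrative assumption standing in for Equation (1), and the service names and profiles are hypothetical; the loop structure mirrors the stated cost, proportional to (number of services) × (number of memory units).

```python
# Sketch of GroupAdjust's gradient-climbing memory allocation (in the
# spirit of the Muse MSRP algorithm). The hit-ratio model is an assumed
# stand-in for Equation (1).

def marginal_benefit(M, load, alpha, T, unit=1.0):
    """Weighted hit-rate gain (hits/sec) from one more unit of memory,
    under an assumed Zipf-style model H(M) = (M/T)^(1-alpha)."""
    def weighted_hits(m):
        return load * min(1.0, (m / T) ** (1.0 - alpha))
    return weighted_hits(M + unit) - weighted_hits(M)

def group_adjust(services, total_memory, unit=1.0):
    """Greedily give each memory unit to the service with the highest
    marginal benefit at its current allotment."""
    allot = {name: 0.0 for name in services}
    for _ in range(int(total_memory / unit)):
        best = max(services,
                   key=lambda n: marginal_benefit(allot[n], *services[n], unit=unit))
        allot[best] += unit
    return allot

services = {                      # hypothetical (load, alpha, T) caching profiles
    "svc_a": (100.0, 0.9, 20000),
    "svc_b": (25.0, 0.6, 40000),
}
shares = group_adjust(services, total_memory=1000, unit=10.0)
```

With these profiles, the high-load, high-locality service captures most of the memory, and the remainder flows to the other service once its own marginal benefit falls below the competitor's.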

To illustrate, Figure 6 shows the memory allotments of GroupAdjust to four competing services with different cache behavior profiles (α, T, λ), as available memory increases on the x-axis. The left-hand graph varies the α and T parameters, holding offered load λ constant across all services. Services with higher α concentrate more of their references on the most popular objects, improving cache effectiveness for small M, while services with lower T can cache a larger share of their objects with a given M. The graph shows that for services with the same α values, GroupAdjust prefers to allocate memory to services with lower T. In the low-memory cases, it prefers higher-locality services (higher α); as total memory increases and the most popular objects of high-α services fit in cache, the highest marginal benefit for added memory moves to lower-α services. These properties of Web service cache behavior result in the crossover points for services with differing α values.

The right-hand graph in Figure 6 shows four services with equal α and T but varying offered load λ. Higher λ increases the marginal benefit of adding memory for a service, because a larger share of requests go to that service's objects. GroupAdjust effectively balances these competing demands. When the goal is to maximize global hit ratio, as in this example, a global perfect-LFU policy in the Web servers would ultimately yield the same M shares devoted to each service. Our model-based partitioning scheme produces the same result as the reactive policy (if the models are accurate), but it also accommodates other system goals in a unified way. For example, GroupAdjust can use the models to optimize for global response time in a storage-aware fashion; it determines the marginal benefit in response time for each service as a function of its M, factoring in the reduced load on shared storage. Similarly, it can account for service priority by optimizing for "utility" [12, 21] or "yield" [32], composing the response time function for each service with a yield function specifying the value for each level of service quality.

4.4 Putting It All Together

Candidate and GroupAdjust work together to combine differentiated service with other system-wide performance goals. Figure 7 illustrates how they allocate surplus memory to optimize global response time for four example services. In this example, the hosted services have identical λ and caching profiles (α, T, λ), but their SLAs specify different response time targets. Candidate allocates the minimum memory (total 46 MB) to meet the targets, assigning more memory to services with more stringent targets. GroupAdjust then allocates surplus memory to the services that offer the best marginal improvement in overall response time. Since the services have equivalent behavior, the surplus goes to the services with the smallest allotments.
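The two-phase flow in Figure 7 can be sketched as follows, with an assumed response-time model (cache hits avoid a fixed storage penalty) standing in for the paper's equations; the targets and parameters are hypothetical.

```python
# Sketch of Candidate + surplus allocation: find the minimum memory meeting
# each SLA target, then spread the surplus. The models here are illustrative
# assumptions, not the paper's exact Equations.

def min_memory_for_target(target_ms, base_ms, T, alpha):
    """Assumed model: response time = base_ms * (1 - H), where
    H(M) = (M/T)^(1-alpha). Invert for the smallest M meeting target_ms."""
    H = max(0.0, 1.0 - target_ms / base_ms)
    return T * H ** (1.0 / (1.0 - alpha))

def allocate(targets_ms, total_memory, base_ms=200.0, T=20000, alpha=0.9, unit=1.0):
    # Phase 1 (Candidate): minimum memory per service.
    allot = {s: min_memory_for_target(t, base_ms, T, alpha)
             for s, t in targets_ms.items()}
    surplus = total_memory - sum(allot.values())
    # Phase 2 (surplus): with identical service behavior, each surplus unit
    # goes to the smallest allotment (the best marginal improvement).
    for _ in range(int(surplus / unit)):
        smallest = min(allot, key=allot.get)
        allot[smallest] += unit
    return allot

allot = allocate({"s1": 50.0, "s2": 100.0, "s3": 150.0, "s4": 180.0},
                 total_memory=5000.0)
```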

Figure 8 further illustrates the flexibility of MBRP to adapt to observed changes in load or system behavior. It shows allotments [M, S] for three competing services s0, s1, and s2. The system is configured to consolidate all loads on a shared server and storage unit, meet minimal response time targets when possible, and use any surplus resources to optimize global response time. The services begin with identical arrival rates λ, caching profiles (α, T, λ), and response time targets. The experiment modifies a single parameter in each of six steps and shows its effect on the allotments; the changes through the steps are cumulative.

Figure 8: Allotments [M, S] for three competing services in a multiple-step experiment: (1) services equal; (2) s0's λ doubled; (3) s1's λ tripled; (4) s2's α reduced from .9 to .6; (5) s2's SLA reduced 40%; (6) total memory reduced 10%. For each step, the graphs represent the percentage of each resource (storage bandwidth and memory) received by each service.

• In the base case, all services receive equal shares.

• Step 2 doubles λ for s0. The system shifts memory to s0 to reflect its higher share of the request load, and increases its S to rebalance storage utilization to ρ_target, compensating for a higher storage load.

• Step 3 triples the arrival rate for s1. The system shifts memory to s1; note that the memory shares match each service's share of the total request load. The system also rebalances storage by growing s1's share at the expense of the lightly loaded s2.

• Step 4 reduces the cache locality of s2 by reducing its α from .9 to .6. This forces the system to shift resources to s2 to meet its response time target, at the cost of increasing global average response time.

• Step 5 lowers the response time target for s2 by 40%. The system further compromises its global response time goal by shifting even more memory to s2 to meet its more stringent target. The additional memory reduces the storage load for s2, allowing the system to shift some storage resource to the more heavily loaded services.

• The final step in this experiment reduces the amount of system memory by 10%. Since s2 holds the minimum memory needed to meet its target, the system steals memory from s0 and s1, and rebalances storage by increasing s1's share slightly.

Figure 9: Predicted and observed storage I/O request rate vary with changing memory allotment for a Dash server under a typical static Web load (a segment of a 2001 trace of www.ibm.com).

5 Prototype and Results

To evaluate the MBRP approach, we prototyped key components of a Web service utility (as depicted in Figure 1) and conducted initial experiments using Web traces and synthetic loads. The cluster testbed consists of load-generating clients, a reconfigurable L4 redirecting switch (from [12]), Web servers, and network storage servers accessed using the Direct Access File System protocol (DAFS [13, 24]), an emerging standard for network storage in the data center. We use the DAFS implementation from [24] over an Emulex cLAN network.

Figure 10: Arrival rate λ for two services handling synthetic load swells under the Dash Web server.

The prototype utility OS executive coordinates resource allocation as described in Section 4. It periodically observes request arrival rates (λ) and updates resource slices to adapt to changing conditions. The executive implements its actions through two mechanisms. First, it issues directives to the switch to configure the active server sets for each hosted service; the switch distributes incoming requests for each service evenly across its active set. Second, it controls the resource shares allocated to each service on each Web server.
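A hypothetical sketch of this control loop appears below; `set_active_servers`, `set_share`, and the planner interface are invented names for the two mechanisms just described, not the prototype's actual API.

```python
# Sketch of the executive's periodic control interval: observe smoothed
# arrival rates, replan allotments, then push the plan through the two
# control mechanisms. All interface names are assumptions.

class Executive:
    def __init__(self, switch, servers, planner):
        self.switch = switch
        self.servers = servers      # {server_name: server_handle}
        self.planner = planner      # rates -> {service: (active_set, shares)}

    def step(self, observed_rates):
        plan = self.planner(observed_rates)
        for service, (active_set, shares) in plan.items():
            # Mechanism 1: configure the redirecting switch's active set.
            self.switch.set_active_servers(service, active_set)
            # Mechanism 2: set per-server resource shares for the service.
            for server, share in shares.items():
                self.servers[server].set_share(service, share)
```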

To allow external resource control, our prototype uses a new Web server that we call Dash [8]. Dash acts as a trusted component of the utility OS; it provides a protected, resource-managed execution context for services, and exports powerful resource control and monitoring interfaces to the executive. Dash incorporates a DAFS user-level file system client, which enables user-level resource management in the spirit of Exokernel [19], including full control over file caching and data movement [24]. DAFS supports fully asynchronous access to network storage, enabling a single-threaded, event-driven Web server structure as proposed in the Flash Web server work [27]; hence the name Dash. In addition, Dash implements a decentralized admission control scheme called Request Windows [18] that approximates proportional sharing of storage server throughput. The details and full evaluation of Dash and Request Windows are outside the scope of this paper.

For our experiments, the Dash and DAFS servers run on SuperMicro SuperServer 6010Hs with 866 MHz Pentium-III Xeon CPUs; the DAFS servers use one 9.1 GB 10,000 RPM Seagate Cheetah drive. Dash controls memory usage as reported in the experiments. Web traffic originates from a synthetic load generator ([10]) or Web trace replay as reported; the caching profiles (α, T, λ) are known a priori and used to parameterize the models. All machines run FreeBSD 4.4.

Figure 11: Memory allotments M for two services handling synthetic load swells under the Dash Web server. As load increases, LocalAdjust provisions additional memory to the services to keep the response time within SLA limits. Service 1 is characterized by higher storage access costs and therefore receives more memory to compensate.

We first present a simple experiment to illustrate the Dash resource control and to validate the hit ratio model (Equation (2)). Figure 9 shows the predicted and observed storage request rate λ_S in IOPS as the service's memory allotment M varies. The Web load is an accelerated 40-minute segment of a 2001 IBM trace [12] with steadily increasing request rate λ. Larger M improves the hit ratio for the Dash server cache; this tends to reduce λ_S, although λ_S reflects changes in λ as well as hit ratio. The predicted λ_S approximates the observed I/O load; the dip partway through the run is due to a transient increase in request locality, causing an unpredicted transient improvement in cache hit ratio. Although the models tend to be conservative in this example, the experiment demonstrates the need for a safety margin to protect against transient deviations from predicted behavior.

To illustrate the system's dynamic behavior in storage-aware provisioning, we conducted an experiment with two services with identical caching profiles (α, T, λ) and response time targets, serving identical synthetic load swells on a Dash server. The peak IOPS throughputs available at the storage server for each service (reflected in the storage constraint parameters) are constrained at different levels, with a more severe constraint for service 1. Figure 10 shows the arrival rates λ and the values smoothed by a "flop-flip" stepped filter [12] for input to the executive. Figure 11 shows the memory allotments for each service during the experiments, and Figure 12 shows the resulting storage loads λ_S. The storage constraints force the system to assign each service more memory to meet its target; as load increases, it allocates proportionally more memory to service 1 because it requires a higher H to meet the same target. As a result, service 1 shows a lower I/O load on its more constrained storage server. This is an example of how the model-based provisioning policies (here embodied in LocalAdjust) achieve similar goals to storage-aware caching [16].

Figure 12: I/O rate (λ_S) for two services handling synthetic load swells under the Dash Web server. As additional memory is allocated to the services to meet SLA targets, the storage arrival rate decreases. Service 1's storage load decreases at a greater rate due to the additional memory allocated to that service.

Figure 13: Arrival rates for competing services (left) and client response time with and without a bin-packing assignment phase to switch Web servers (right) handling the service.

The last experiment uses a rudimentary assignment planner to illustrate the role of assignment in partitioning cluster resources for response time targets. We compared two runs of three services on two Dash servers under the synthetic loads shown on the left-hand side of Figure 13, which shows a saturating load spike for service 3. In the first run, service 1 is bound to server A and services 2 and 3 are bound to server B. This results in a response time jump for service 2, shown in the right-hand graph of Figure 13; since the system cannot meet targets for both services, it uses GroupAdjust to provision B's resources for the best average-case response time. The second run employs a simple bin-packing scheme to assign the provisioned resource slices to servers. In this run, the system reassigns service 2 to A when the load spike for service 3 exposes the local resource constraint on B; this is possible because Candidate determines that there are sufficient resources on A to meet the response time targets for both services 1 and 2. To implement this choice, the executive directs the switch to route requests for service 2 to A rather than B. This allows service 2 to continue meeting its target. This simple example shows the power of the model-based provisioning primitives as a foundation for comprehensive resource management for cluster utilities.
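The assignment step can be illustrated with a first-fit-decreasing bin packer over [memory, IOPS] slices. This is an assumed stand-in for the paper's rudimentary assignment planner, not its actual algorithm; the capacities and slices below are hypothetical.

```python
# First-fit-decreasing sketch of the assignment phase: pack provisioned
# (memory, iops) slices onto servers with fixed capacities.

def assign(slices, capacities):
    """slices: {service: (mem, iops)}; capacities: {server: (mem, iops)}.
    Returns {service: server}, or None if some slice does not fit."""
    free = {srv: list(cap) for srv, cap in capacities.items()}
    placement = {}
    # Pack the largest slices first to reduce fragmentation.
    for svc, (mem, iops) in sorted(slices.items(), key=lambda kv: -sum(kv[1])):
        for srv, (free_mem, free_iops) in free.items():
            if mem <= free_mem and iops <= free_iops:
                free[srv] = [free_mem - mem, free_iops - iops]
                placement[svc] = srv
                break
        else:
            return None  # constraint exposed: no server can host this slice
    return placement

# Example in the spirit of Figure 13: service 3's swollen slice forces it
# onto one server, and the remaining slices pack onto the other.
placement = assign(
    {"s1": (40, 200), "s2": (30, 150), "s3": (90, 700)},
    {"A": (100, 600), "B": (100, 800)},
)
```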

6 Related Work

Adaptive resource management for servers. Aron [6] uses kernel-based feedback schedulers to meet latency and throughput goals for network services running on a single shared server. Abdelzaher et al. [1] have addressed the control-theoretical aspects of feedback-controlled Web server performance management.

SEDA [37] reactively provisions CPU resources (by varying the number of threads) across multiple service stages within a single server. SEDA emphasizes request admission control to degrade gracefully in overload. A utility can avoid overload by expanding resource slices and recruiting additional servers as traffic increases. SEDA does not address performance isolation for shared servers.

Resource management for cluster utilities. Several studies use workload profiling to estimate the resource savings of multiplexing workloads in a shared utility. They focus on the probability of exceeding performance requirements for various degrees of CPU overbooking [4, 6, 34]. Our approach varies the degree of overbooking to adapt to load changes, but our current models consider only average-case service quality within each interval. The target utilization parameters (ρ_target) allow an external policy to control the "headroom" to handle predicted load bursts with low probability of violating SLA targets.

There is a growing body of work on adaptive resource management for cluster utilities under time-varying load. Our work builds on Muse [12], which uses a feedback-controlled policy manager and redirecting switch to partition cluster resources; [39] describes a similar system that includes a priority-based admission control scheme to limit load. Levy [21] presents an enhanced framework for dynamic SOAP Web services, based on a flexible combining function to optimize configurable class-specific and cluster-specific objectives. Liu [22] proposes provisioning policies (SLAP) for e-commerce traffic in a Web utility, and focuses on maximizing SLA profits. These systems incorporate analytical models of CPU behavior; MBRP extends them to provision for multiple resources including cluster memory and storage throughput.

Several of these policy-based systems rely on resource control mechanisms, such as Resource Containers [9] or VMware ESX [36], to allow performance-isolated server consolidation. Cluster Reserves [7] extends a server resource control mechanism (Resource Containers) to a cluster. These mechanisms are designed to enforce a provisioning policy, but they do not define the policy. Cluster Reserves adjusts shares on individual servers to bound aggregate usage for each hosted service. It is useful for utilities that do not manage request traffic within the cluster. In contrast, our approach uses server-level (rather than cluster-level) resource control mechanisms in concert with redirecting switches.

The more recent Neptune [32] work proposes an alternative to provisioning (or partitioning) cluster resources. Neptune maximizes an abstract SLA yield metric, similarly to other utility resource managers [12, 21, 22]. Like SLAP, each server schedules requests locally to maximize per-request yield. Neptune has no explicit provisioning; it distributes requests for all services evenly across all servers, and relies on local schedulers to maximize global yield. While this approach is simple and fully decentralized, it precludes partitioning the cluster for software heterogeneity [5, 25], memory locality [14, 26], replica control [17, 31], or performance-aware storage placement [3].

Virtual Services [29] proposes managing services with multiple tiers on different servers. This paper shows how MBRP coordinates provisioning of server and storage tiers; we believe that MBRP can extend to multi-tier services.

Memory/storage management. Kelly [20] proposes a Web proxy cache replacement scheme that considers the origin server's value in its eviction choices. Storage-Aware Caching [16] develops kernel-based I/O caching heuristics that favor blocks from slower storage units. MBRP shares the goal of a differentiated caching service, but approaches it by provisioning memory shares based on a predictive model of cache behavior, rather than augmenting the replacement policy. MBRP supports a wide range of system goals in a simple and direct way, but its benefits are limited to applications for which the system has accurate models.

Several systems use application-specific knowledge to manage I/O caching and prefetching. Patterson et al. [28] use application knowledge and a cost/benefit algorithm to manage resources in a shared I/O cache and storage system. Faloutsos [15] uses knowledge of database access patterns to predict the marginal benefit of memory in reducing I/O demands for database queries.

Hippodrome [3] automatically assigns workload components to storage units in a utility data center. Our work is complementary and uses a closely related approach. Hippodrome is model-based in that it incorporates detailed models of storage system performance, but it has no model of the applications themselves; in particular, it cannot predict the effect of its choices on application service quality. Hippodrome employs a backtracking assignment planner (Ergastulum); we believe that a similar approach could handle server assignment in a utility data center using our MBRP primitives.

7 Conclusion

Scalable automation of large-scale network services is a key challenge for computing systems research in the next decade. This paper presents a software framework and policies to manage network services as utilities whose resources are automatically provisioned and sold according to demand, much as electricity is today. This approach can improve the robustness and efficiency of large services by multiplexing shared resources rather than statically overprovisioning each service for its worst case.

The primary contribution of this work is to demonstrate the potential of model-based resource provisioning (MBRP) for resource management in a Web hosting utility. Our approach derives from established models of Web service behavior and storage service performance. We show how MBRP policies can adapt to resource availability, request traffic, and service quality targets. These policies leverage the models to predict the performance effects of candidate resource allotments, creating a basis for informed, policy-driven resource allocation. MBRP is a powerful way to deal with complex resource management challenges, but it is only as good as the models. The model-based approach is appropriate for utilities running a small set of large-scale server applications whose resource demands are a function of load and stable, observable characteristics.

Acknowledgments

Darrell Anderson and Sara Sprenkle contributed to the ideas in this paper and assisted with some experiments. We thank Mike Spreitzer, Laura Grit, and the anonymous reviewers for their comments, which helped to improve the paper.

References

[1] T. F. Abdelzaher, K. G. Shin, and N. Bhatti. Performance guarantees for Web server end-systems: A control-theoretical approach. IEEE Transactions on Parallel and Distributed Systems, June 2001.

[2] D. C. Anderson and J. S. Chase. Fstress: A flexible network file service benchmark. Technical Report CS-2002-01, Duke University, Department of Computer Science, January 2002.

[3] E. Anderson, M. Hobbs, K. Keeton, S. Spence, M. Uysal, and A. Veitch. Hippodrome: Running circles around storage administration. In Proceedings of the First Usenix Conference on File and Storage Technologies (FAST), January 2002.

[4] A. Andrzejak, M. Arlitt, and J. Rolia. Bounding the resource savings of utility computing models. Technical Report HPL-2002-339, HP Labs, November 2002.

[5] K. Appleby, S. Fakhouri, L. Fong, G. Goldszmidt, M. Kalantar, S. Krishnakumar, D. Pazel, J. Pershing, and B. Rochwerger. Oceano - SLA based management of a computing utility. In Proceedings of the 7th IFIP/IEEE International Symposium on Integrated Network Management, May 2001.

[6] M. Aron. Differentiated and Predictable Quality of Service in Web Server Systems. PhD thesis, Department of Computer Science, Rice University, October 2000.

[7] M. Aron, P. Druschel, and W. Zwaenepoel. Cluster reserves: A mechanism for resource management in cluster-based network servers. In Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems (ACM SIGMETRICS 2000), pages 90–101, June 2000.

[8] O. M. Asad. Dash: A direct-access Web server with dynamic resource provisioning. Master's thesis, Duke University, Department of Computer Science, December 2002.

[9] G. Banga, P. Druschel, and J. C. Mogul. Resource Containers: A new facility for resource management in server systems. In Proceedings of the Third Symposium on Operating Systems Design and Implementation (OSDI), February 1999.

[10] P. Barford and M. E. Crovella. Generating representative Web workloads for network and server performance evaluation. In Proceedings of Performance '98/ACM SIGMETRICS '98, pages 151–160, June 1998.

[11] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker. Web caching and Zipf-like distributions: Evidence and implications. In Proceedings of IEEE Infocom '99, March 1999.

[12] J. S. Chase, D. C. Anderson, P. N. Thakar, A. M. Vahdat, and R. P. Doyle. Managing energy and server resources in hosting centers. In Proceedings of the 18th ACM Symposium on Operating System Principles (SOSP), pages 103–116, October 2001.

[13] M. DeBergalis, P. Corbett, S. Kleiman, A. Lent, D. Noveck, T. Talpey, and M. Wittle. The Direct Access File System. In Proceedings of the 2nd USENIX Conference on File and Storage Technologies (FAST), March 2003.

[14] R. P. Doyle, J. S. Chase, S. Gadde, and A. M. Vahdat. The trickle-down effect: Web caching and server request distribution. Computer Communications: Selected Papers from the Sixth International Workshop on Web Caching and Content Delivery (WCW), 25(4):345–356, March 2002.

[15] C. Faloutsos, R. T. Ng, and T. K. Sellis. Flexible and adaptable buffer management techniques for database management systems. IEEE Transactions on Computers, 44(4):546–560, April 1995.

[16] B. C. Forney, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Storage-aware caching: Revisiting caching for heterogeneous storage systems. In Proceedings of the First Usenix Conference on File and Storage Technologies (FAST), January 2002.

[17] S. D. Gribble, E. A. Brewer, J. M. Hellerstein, and D. Culler. Scalable, distributed data structures for Internet service construction. In Proceedings of the Fourth Symposium on Operating Systems Design and Implementation (OSDI), October 2000.

[18] W. Jin, J. S. Chase, and J. Kaur. Proportional sharing for a storage utility. Technical Report CS-2003-02, Duke University, Department of Computer Science, January 2003.

[19] M. F. Kaashoek, D. R. Engler, G. R. Ganger, H. M. Briceno, R. Hunt, D. Mazieres, T. Pinckney, R. Grimm, J. Jannotti, and K. Mackenzie. Application performance and flexibility on Exokernel systems. In Proceedings of the 16th ACM Symposium on Operating Systems Principles, October 1997.

[20] T. P. Kelly, S. Jamin, and J. K. MacKie-Mason. Variable QoS from shared Web caches: User-centered design and value-sensitive replacement. In Proceedings of the MIT Workshop on Internet Service Quality Economics (ISQE 99), December 1999.

[21] R. Levy, J. Nagarajarao, G. Pacifici, M. Spreitzer, A. Tantawi, and A. Youssef. Performance management for cluster-based Web services. In Proceedings of the 8th International Symposium on Integrated Network Management (IM 2003), March 2003.

[22] Z. Liu, M. S. Squillante, and J. L. Wolf. On maximizing service-level-agreement profits. In Proceedings of the ACM Conference on Electronic Commerce (EC'01), October 2001.

[23] C. R. Lumb, A. Merchant, and G. A. Alvarez. Facade: Virtual storage devices with performance guarantees. In Proceedings of the 2nd Usenix Conference on File and Storage Technologies (FAST), March 2003.

[24] K. Magoutis, S. Addetia, A. Fedorova, M. Seltzer, J. Chase, R. Kisley, A. Gallatin, R. Wickremisinghe, and E. Gabber. Structure and performance of the Direct Access File System. In USENIX Technical Conference, pages 1–14, June 2002.

[25] J. Moore, D. Irwin, L. Grit, S. Sprenkle, and J. Chase. Managing mixed-use clusters with Cluster-on-Demand. Technical report, Duke University, Department of Computer Science, November 2002.

[26] V. S. Pai, M. Aron, G. Banga, M. Svendsen, P. Druschel, W. Zwaenepoel, and E. Nahum. Locality-aware request distribution in cluster-based network servers. In Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VIII), October 1998.

[27] V. S. Pai, P. Druschel, and W. Zwaenepoel. Flash: An efficient and portable Web server. In Proceedings of the 1999 Annual Usenix Technical Conference, June 1999.

[28] R. H. Patterson, G. A. Gibson, E. Ginting, D. Stodolsky, and J. Zelenka. Informed prefetching and caching. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles, pages 79–95, Copper Mountain, CO, December 1995.

[29] J. Reumann, A. Mehra, K. G. Shin, and D. Kandlur. Virtual Services: A new abstraction for server consolidation. In Proceedings of the Usenix 2000 Technical Conference, June 2000.

[30] J. Rolia, S. Singhal, and R. Friedrich. Adaptive Internet data centers. In Proceedings of the International Conference on Advances in Infrastructure for Electronic Business, Science, and Education on the Internet (SSGRR '00), July 2000.

[31] Y. Saito, B. N. Bershad, and H. M. Levy. Manageability, availability and performance in Porcupine: A highly scalable cluster-based mail service. In Proceedings of the 17th ACM Symposium on Operating Systems Principles (SOSP), pages 1–15, Kiawah Island, December 1999.

[32] K. Shen, H. Tang, T. Yang, and L. Chu. Integrated resource management for cluster-based Internet services. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI '02), December 2002.

[33] P. Shenoy and H. M. Vin. Cello: A disk scheduling framework for next-generation operating systems. In Proceedings of the International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), June 1998.

[34] B. Urgaonkar, P. Shenoy, and T. Roscoe. Resource overbooking and application profiling in shared hosting platforms. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI '02), December 2002.

[35] A. Venkataramani, P. Yalagandula, R. Kokku, S. Sharif, and M. Dahlin. The potential costs and benefits of long-term prefetching for content distribution. Computer Communications: Selected Papers from the Sixth International Workshop on Web Caching and Content Delivery (WCW), 25(4):367–375, March 2002.

[36] C. A. Waldspurger. Memory resource management in VMware ESX Server. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI '02), December 2002.

[37] M. Welsh, D. Culler, and E. Brewer. SEDA: An architecture for well-conditioned, scalable Internet services. In Proceedings of the Eighteenth Symposium on Operating Systems Principles (SOSP-18), Banff, Canada, October 2001.

[38] A. Wolman, G. Voelker, N. Sharma, N. Cardwell, A. Karlin, and H. Levy. On the scale and performance of cooperative Web proxy caching. In Proceedings of the 17th ACM Symposium on Operating Systems Principles, December 1999.

[39] H. Zhu, H. Tang, and T. Yang. Demand-driven service differentiation in cluster-based network servers. In Proceedings of IEEE Infocom 2001, April 2001.