utility driven optimization of real time data …iray/research/papers/asoc12.pdfutility driven...

23
Utility Driven Optimization of Real Time Data Broadcast Schedules Rinku Dewri a,* , Indrakshi Ray b , Indrajit Ray b , Darrell Whitley b a Department of Computer Science, University of Denver, Denver, CO 80208, USA b Department of Computer Science, Colorado State University, Fort Collins, CO 80523, USA Abstract Data dissemination in wireless environments is often accomplished by on-demand broadcasting. The time critical nature of the data requests plays an important role in scheduling these broadcasts. Most research in on-demand broadcast scheduling has focused on the timely servicing of requests so as to minimize the number of missed deadlines. However, there exists many environments where the utility of the received data is an equally important criterion as its timeliness. Missing the deadline may reduce the utility of the data but does not necessarily make it zero. In this work, we address the problem of scheduling real time data broadcasts with such soft deadlines. We investigate search based optimization techniques to develop broadcast schedulers that make explicit attempts to maximize the utility of data requests as well as service as many requests as possible within an acceptable time limit. Our analysis shows that heuristic driven methods for such problems can be improved by hybridizing them with local search algorithms. We further investigate the option of employing a dynamic optimization technique to facilitate utility gain, thereby eliminating the requirement of a heuristic in the process. An evolution strategy based stochastic hill-climber is investigated in this context. Keywords: broadcast scheduling, data utility, local search, evolution strategy, quality of service 1. Introduction Wireless computing involves a network of portable computing devices thoroughly embedded in our day-to-day work and personal life. The de- vices interact with each other and with other com- puting nodes by exchanging rapid and continuous streams of data. To facilitate almost impercepti- ble human-computer interaction, data access times in such environments must be maintained within a specified quality-of-service (QoS) level. Challenges in doing so arise from the fact that wireless band- width is typically a limited resource, and thus it is not always possible to meet the quality require- ments of every device. This constraint not only makes wireless data access a challenging problem, but also identifies “optimal resource allocation” as one of the major research problems in this domain. * Corresponding author: Tel. +1-303-871-4189; Fax. +1- 303-871-3010 Email addresses: [email protected] (Rinku Dewri), [email protected] (Indrakshi Ray), [email protected] (Indrajit Ray), [email protected] (Darrell Whitley) A wireless environment encompasses both peer- to-peer and client-server modes of data dissemina- tion. For example, a typical pervasive health care system may involve multiple sensor nodes dissem- inating data on the monitored vital signs of a pa- tient to a personal digital assistant carried by the attending health care personnel. Data communi- cation follows a peer-to-peer architecture in such a setting. On the other hand, a wireless environment designed to serve queries on flight information in an airport is based on a client-server mode of com- munication. Flight information usually reside in a database server from where data is disseminated based on the incoming queries. For an environment like an airport, it is appropriate to assume that the database will be queried more frequently for cer- tain types of data. Similar situations can be imag- ined in a stock trading center, a pervasive traffic management system, or a pervasive supermarket. Such scenarios open up possibilities of adopting a broadcast based architecture to distribute data in a way that multiple queries for the same data item get served by a single broadcast. The focus of this Preprint submitted to Elsevier May 6, 2011

Upload: tranthuan

Post on 13-Mar-2018

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Utility Driven Optimization of Real Time Data …iray/research/papers/asoc12.pdfUtility Driven Optimization of Real Time Data Broadcast Schedules ... The system gathers real time traffic

Utility Driven Optimization of Real Time Data Broadcast Schedules

Rinku Dewria,!, Indrakshi Rayb, Indrajit Rayb, Darrell Whitleyb

aDepartment of Computer Science, University of Denver, Denver, CO 80208, USAbDepartment of Computer Science, Colorado State University, Fort Collins, CO 80523, USA

Abstract

Data dissemination in wireless environments is often accomplished by on-demand broadcasting. The timecritical nature of the data requests plays an important role in scheduling these broadcasts. Most researchin on-demand broadcast scheduling has focused on the timely servicing of requests so as to minimize thenumber of missed deadlines. However, there exists many environments where the utility of the receiveddata is an equally important criterion as its timeliness. Missing the deadline may reduce the utility of thedata but does not necessarily make it zero. In this work, we address the problem of scheduling real timedata broadcasts with such soft deadlines. We investigate search based optimization techniques to developbroadcast schedulers that make explicit attempts to maximize the utility of data requests as well as service asmany requests as possible within an acceptable time limit. Our analysis shows that heuristic driven methodsfor such problems can be improved by hybridizing them with local search algorithms. We further investigatethe option of employing a dynamic optimization technique to facilitate utility gain, thereby eliminating therequirement of a heuristic in the process. An evolution strategy based stochastic hill-climber is investigatedin this context.

Keywords: broadcast scheduling, data utility, local search, evolution strategy, quality of service

1. Introduction

Wireless computing involves a network ofportable computing devices thoroughly embeddedin our day-to-day work and personal life. The de-vices interact with each other and with other com-puting nodes by exchanging rapid and continuousstreams of data. To facilitate almost impercepti-ble human-computer interaction, data access timesin such environments must be maintained within aspecified quality-of-service (QoS) level. Challengesin doing so arise from the fact that wireless band-width is typically a limited resource, and thus itis not always possible to meet the quality require-ments of every device. This constraint not onlymakes wireless data access a challenging problem,but also identifies “optimal resource allocation” asone of the major research problems in this domain.

!Corresponding author: Tel. +1-303-871-4189; Fax. +1-303-871-3010

Email addresses: [email protected] (Rinku Dewri),[email protected] (Indrakshi Ray),[email protected] (Indrajit Ray),[email protected] (Darrell Whitley)

A wireless environment encompasses both peer-to-peer and client-server modes of data dissemina-tion. For example, a typical pervasive health caresystem may involve multiple sensor nodes dissem-inating data on the monitored vital signs of a pa-tient to a personal digital assistant carried by theattending health care personnel. Data communi-cation follows a peer-to-peer architecture in such asetting. On the other hand, a wireless environmentdesigned to serve queries on flight information inan airport is based on a client-server mode of com-munication. Flight information usually reside in adatabase server from where data is disseminatedbased on the incoming queries. For an environmentlike an airport, it is appropriate to assume that thedatabase will be queried more frequently for cer-tain types of data. Similar situations can be imag-ined in a stock trading center, a pervasive tra!cmanagement system, or a pervasive supermarket.Such scenarios open up possibilities of adopting abroadcast based architecture to distribute data ina way that multiple queries for the same data itemget served by a single broadcast. The focus of this

Preprint submitted to Elsevier May 6, 2011

Page 2: Utility Driven Optimization of Real Time Data …iray/research/papers/asoc12.pdfUtility Driven Optimization of Real Time Data Broadcast Schedules ... The system gathers real time traffic

paper is directed towards a combinatorial problemthat arises in this approach.

1.1. An Example

Quality of service (QoS) is an important facet inwireless data access. Consider the following exam-ple application - a tra!c management system for abig city. The city government implements a traf-fic pricing policy for all vehicles on all roads basedon factors such as the distance traveled, the typeof road traveled on, the time of day, vehicle cat-egory and customer type (for example, special feepaying, traveling salesman, paramedic on-call, etc.)The system gathers real time tra!c data via thou-sands of sensors, such as tra!c cameras, magneto-resistive tra!c sensors and radars spread through-out the city. To help drivers optimize their travelcosts, the tra!c management system provides vari-ous routing services to drivers to avoid roadblocks,construction delays, congestion, accidents etc1.

A driver requests and gets routing services usingsmart GPS equipped devices as follows. A deviceperiodically uploads the vehicle’s current location(based on GPS information), intended destination,time willing to spend for traveling to destination,a prioritized list of routing restrictions (for exam-ple, waypoints that the driver would like to visitif possible, rest stops, scenic byways, etc.), vehiclecategory and customer type, and current speed. Inresponse, the server replies with a set of route infor-mation. Each route information contains, amongother things, information about road segments totravel on this route and tra!c patterns on theseroad segments. The GPS equipped device uses thisinformation and uses other parameters set by thedriver to compute a good route to take. Note thatsince the response to many drivers will no doubtcontain overlapping route segments and tra!c in-formation, it makes sense broadcasting this data.

The requests arrive at the server with variouspriority levels and soft deadlines. For example, ahigher fee paying customer gets a higher prioritythan a lesser fee paying customer. A VIP’s convoymay get a still higher priority. Emergency respon-ders get the highest priority. A routing request thatcomes associated with a set of routing restrictions,for example, a list of waypoints that the driver

1Note that such an application is not very far-fetched.The Dutch government is in the process of introducing a sim-ilar electronic tra!c management system and pricing model.See [1].

would like to visit en route, automatically gets as-sociated with a set of soft deadlines based on thespeed of the driver. The requests from di"erentdrivers may also get re-prioritized at the server soas to meet a broader goal of reducing congestionand enabling smoother tra!c flows.

In this example, the di"erent data that the serverneeds to serve is associated with di"erent utility val-ues. Owing to the dynamic nature of the utility ofresponses to queries, the time criticality factor can-not be ignored altogether when disseminating data.The server would like to satisfy as many queries in atimely manner as possible. However, some queriesmay be delayed beyond their specified deadlines (forexample, the query from the VIP’s convoy). Theusers, who hardly realize the broader goals of thetra!c management system and various bottlenecksin the information infrastructure, would like to havetheir requests served at the earliest; however, it isreasonable to assume that delayed data still pro-vide some utility even if received after a specifieddeadline. For example, a delayed route informationmay prevent a driver from visiting a particular way-point en route or may require the driver to use thenext gas station. Nonetheless, it may still allow thedriver to choose a good route to the destination.An assumption of reduced data utility in the faceof missed deadline enables data broadcasts to betailored in such a way that total utility associatedwith a broadcast is maximized. This helps maintaina certain QoS level in the underlying infrastructure.

Note that the specific problem we are trying toaddress is by no means restricted only to the ex-ample application outlined above. Similar scenar-ios are witnessed in environments such as stock ex-changes. Here stock brokers on the floor seek andreceive periodic market data on wireless hand-helddevices and notebooks and analyze them to deter-mine which stocks are rewarding. They also buyand sell stocks using such devices. Popularity ofstocks changes throughout the day, and it is im-portant to analyze such trends along multiple di-mensions in order to buy and sell stocks. Thus,although queries from brokers are implicitly associ-ated with deadlines, these can be considered soft.The overall goal of the stock exchange still remainsto satisfy as many requests as possible in a timelymanner. To wrap up the scope of the problem do-main we would like to point out that in order tomake useful decisions, the stock broker may makea request for a group of data from the stock ex-change (for example, stock prices of three di"erent

2

Page 3: Utility Driven Optimization of Real Time Data …iray/research/papers/asoc12.pdfUtility Driven Optimization of Real Time Data Broadcast Schedules ... The system gathers real time traffic

oil companies). A typical constraint on such a re-quest is that all these data items must be received(not necessarily in any particular order) before thestock broker can perform the relevant local anal-ysis of the market trends. Such a request can beconsidered as a transactional request (or simply atransaction). For scheduling in such cases, addi-tional constraints must be placed for ensuring thevalidity of data received.

1.2. Contributions

Wireless broadcast mechanisms have been exten-sively investigated earlier. However, very few ofthe proposed approaches give attention to the e"ec-tive utility involved in the timely servicing of a re-quest. Time criticality has been earlier addressed ina number of contexts with the assumption of a harddeadline [2, 3, 4]. Broadcast scheduling in theseworks mostly focus on the timely servicing of a re-quest to minimize the number of missed deadlines.When the assumption of a soft deadline is appro-priate, a broadcast schedule should not only try toserve as many requests as possible, but also makea “best e"ort” in serving them with higher utilityvalues. Often, heuristics are employed in a dy-namic setting to determine these schedules. How-ever, their designs do not involve the QoS criteriaexplicitly. Heuristic based methods make local de-cisions w.r.t. a request or a broadcast, and oftenfail to capture the sought global QoS requirement.Much of this is due to the fact that real time de-cision making cannot span beyond an acceptabletime limit, thereby restricting the usage of “fulllength” optimization techniques to the domain. Itdoes become imperative to design hybrid strategiesthat can combine the fast real time performance ofheuristics and the better solution qualities obtainedfrom search based optimization.

Our contributions in this paper are summarizedin the following points.

1. We explore data broadcasting as a combinato-rial problem defined with an objective of utilitymaximization. This utility driven optimizationof broadcasts adds more value to a wireless sys-tem since the energy cycles used by the deviceswhile retrieving a data item will presumablynot be wasteful.

2. We provide an extensive analysis of the searchspace involved in the problem and report theperformance of multiple heuristics on a set ofstatistically generated synthetic requests. We

argue that traditional heuristics usually gener-ate solutions in a sub-optimal part of this spacewith respect to a given global utility measure-ment.

3. We empirically demonstrate that “local search”can be used in real-time to boost the perfor-mance of naive heuristics such as EDF, withperformance levels often at par with otherelaborate heuristics. In addition, we pro-pose an evolution strategy based light-weightstochastic hill-climber that explicitly searchesthe space of schedules to maximize utility.Results demonstrate that, in certain problemtypes, this explicit search strategy can gener-ate higher objective values than a traditionalheuristic based approach.

4. We perform the analysis indicated in the abovepoints in the context of two problem types,namely at the data item level and at the trans-action level. In the former type, requests in-volve a single data item, while multiple un-ordered data items are involved in a requestof the latter type. Besides the local searchand evolution strategy based approaches, wepresent the performance of heuristics suchas EDF, HUF, RxW, SIN-!, NPRDS andASETS* on one or more of these problemtypes. Transaction level scheduling induces anexponentially large search space and has beenobserved to demonstrate very di"erent dynam-ics than a data item level problem.

The rest of the paper is organized as follows. Sec-tion 2 summarizes the related work in this domain.The broadcast model and the scheduling problemare discussed in Section 3. The explored solutionmethods are described in Section 4. Section 5 dis-cusses the scheduling problem at the transactionlevel. The experimental setup followed by resultsand discussions are summarized in Section 6 andSection 7, respectively. Finally, Section 8 concludesthe paper.

2. Related Work

Data broadcasting has been extensively studiedin the context of wireless communication systems.Su and Tassiulas [5] study the problem in the con-text of access latency and formulate a determinis-tic dynamic optimization problem, the solution towhich provides a minimum access latency schedule.Acharya and Muthukrishnan propose the stretch

3

Page 4: Utility Driven Optimization of Real Time Data …iray/research/papers/asoc12.pdfUtility Driven Optimization of Real Time Data Broadcast Schedules ... The system gathers real time traffic

metric [6] to account for di"erences in service timesarising in the case of variable sized data items.They propose the MAX heuristic to optimize theworst case stretch of individual requests. Aksoyand Franklin’s RxW heuristic [7] is an attempt tocombine the benefits of the MRF and FCFS heuris-tics, each known to give preference to popular andlong standing data items, respectively. Hameed andVaidya adapt a packet fair queuing algorithm tothis domain [8]. Lee et al. provide a survey of thesetechniques [9] and their applicability in the area ofpervasive data access.

The above mentioned algorithms ignore the timecriticality factor while serving data requests. Earlywork on time constrained data requests is presentedby Xuan et al. [10]. Jiang and Vaidya address theissue by considering broadcast environments whereclients do not wait indefinitely to get their requestsserved [2]. Lam et al. look at the time criticalityfactor from the perspective of temporal data va-lidity [11]. Fernandez and Ramamritham proposea hybrid broadcasting approach to minimize theoverall number of deadlines missed [12]. Tempo-ral constraints on data are revisited by Wu and Leewith the added complexity of request deadlines [13].Their RDDS algorithm assigns a priority level toeach requested data item and broadcasts the itemwith the highest priority. Xu et al. propose theSIN -! algorithm to minimize the request drop rate[4]. However, their approach does not take variabledata sizes into consideration. This motivates Lee etal. to propose the PRDS algorithm that takes intoaccount the urgency, data size and access frequen-cies of various data items [3].

Theoretical analysis on the hardness of the prob-lem has led to a number of conclusions in the recentfew years. In particular, Chang et al. show thatthe o#ine problem of minimizing the maximum re-sponse time is NP-complete [14]. In the real-timecase, the authors show that no deterministic algo-rithm can be better than 2-competitive. Similarly,broadcast scheduling with the objective of minimiz-ing the total response time, or with deadlines, hasalso been proved to be NP-complete.

Several shortcomings of using a strict deadlinebased system are discussed by Ravindran et al. [15]in the context of real-time scheduling and resourcemanagement. Jensen et al. point out that real-timesystems are usually associated with a value basedmodel which can be expressed as a function of time[16]. They introduce the idea of time-utility func-tions to capture the semantics of soft time con-

straints that are particularly useful in specifyingutility as a function of completion time. An at-tempt to understand the benefit of utility valuesin hard deadline scheduling algorithms is under-taken by Buttazzo et al. [17]. Wu et al. study atask scheduling problem where utility is considereda function of the start time of the task [18]. Utilitybased scheduling in data broadcasting is addressedby Dewri et al. in the context of pervasive envi-ronments [19]. However, the focus of the work islimited to requests for single data items only. Thework presented here provides extensions in the formof formulations catering to the scheduling problemwhen requests are made at the transaction level.

While the notion of a transaction have been ex-plored in the database community, its treatment inthe form of a collection of data items is rare in thebroadcasting domain. Chehadeh et al. take intoconsideration object graphs to cluster related ob-jects close to one another in the broadcast channel[20]. Hurson et al. propose an extension for multi-ple broadcast channels [21]. Huang et al. show thatseveral cases of the problem of broadcasting relateddata is NP-hard and propose a genetic algorithmto address the problem [22]. Guirguis et al. pro-pose the ASETS* algorithm to schedule workflowsthat is conceptually equivalent to scheduling mul-tiple data items with an ordering constraint [23].This work caters to a di"erent category of appli-cations where such precedence constraints are notalways available.

Multiple issues related to data broadcasting, suchas hard deadlines, request drop rate, data item size,and access frequencies, have been addressed in thiscollection of works. We also consider variable datasizes and their access frequencies in the simulationstudy, but do not measure the performance of analgorithm using the request drop rate. Instead, atime-utility function is used for this purpose.

3. Broadcast Scheduling

Wireless data broadcasting is an e!cient ap-proach to address data requests, particularly whensimilar requests are received from a large user com-munity. Broadcasting in such cases alleviate therequirement for repeated communication betweenthe server and the di"erent clients interested in thesame data item. Push-based architectures broad-cast commonly accessed data at regular intervals.Contrary to this, on-demand architectures allow

4

Page 5: Utility Driven Optimization of Real Time Data …iray/research/papers/asoc12.pdfUtility Driven Optimization of Real Time Data Broadcast Schedules ... The system gathers real time traffic

Data

Source

Request Queue

Schedule

OptimizerBroadcast

QueueBroadcast channel

Uplink channel

Data Access Point

Clients

Figure 1: Typical on-demand broadcast architecture in a wireless environment.

the clients to send their requests to the server. How-ever, access to the data item is facilitated througha broadcast which, for more frequently requesteddata, serves multiple clients at a time.

Data access in wireless environments can be mod-eled with an on-demand broadcast architecturewhere particular emphasis has to be paid to thetime criticality and utility of a served request. Thetime criticality factor emphasizes that the requesteddata is expected within a specified time window;failure to do so would result in an utility loss. Abroadcast schedule in such environments has tocater to the added requirement of maintaining ahigh utility value for a majority of the requests.

3.1. Broadcast Model

Fig. 1 shows a typical on-demand data broad-cast architecture in a wireless environment. Di"er-ent client devices use an uplink channel to a dataprovider to request various data items served by theprovider. The data items are assumed to reside lo-cally with the data provider. Each request Qj takesthe form of a tuple !Dj , Rj , Pj", where Rj is the re-sponse time within which the requesting client ex-pects the data item Dj and asserts a priority levelPj on the request. Although expected responsetimes and priorities can be explicitly specified bythe client, it is not a strict restriction. For exam-ple, the broadcast server can assign priorities andresponse time requirements to the scheduler basedon the current status (say a geographic location) ofthe client making the request. Note that a single

request involves only one data item. A client re-quiring multiple data items sends multiple requestsfor each data item separately. Requests from thesame client can be served in any order. The dataprovider reads the requests from a queue and in-vokes a scheduler to determine the order in whichthe requests are to be served. It is important to notethat new requests arrive frequently into the queuewhich makes the task of the scheduler rather dy-namic in nature. The scheduler needs to re-examinethe previously generated schedule to accommodatethe time critical requirements of any new request.Data dissemination is carried out through a singlechannel data access point. Clients listen to thischannel and consider a request to be served as soonas the broadcast for the corresponding item begins.

The underlying scheduler is invoked every timea new request is received. At each invocation, thescheduler first determines the requests that are be-ing currently served and removes them from therequest queue. The data items required to servethe remaining requests are then determined and aschedule is generated to serve them. The schedulertries to make a “best e"ort” at generating a sched-ule that respects the response time requirements ofthe requesting devices.

3.2. Utility Metric

Real-time systems are typically known to uti-lize the abstraction of hard deadlines to specifytime constraints during resource management. Aspointed out in [15], such abstractions result in vari-

5

Page 6: Utility Driven Optimization of Real Time Data …iray/research/papers/asoc12.pdfUtility Driven Optimization of Real Time Data Broadcast Schedules ... The system gathers real time traffic

ous drawbacks while making a scheduling decision.The most prominent of all is the domino e"ect dur-ing overload conditions [24] where a hard deadlinebased scheduler would start favoring jobs that arenear to missing their deadlines. Depending on thecompletion time of such favored jobs, a situationcan arise where the scheduler is no longer (or fora majority) able to schedule jobs that satisfy theirdeadlines. Further, hard deadlines are usually bi-nary in nature – they are either met or not met.This prohibits any distinction to be made on thequality of schedules that di"er primarily on theamount of delay introduced in meeting the timeconstraints. Note that such distinctions are im-portant for a scheduler incorporating some form ofsearch to find better schedules. Hard deadlines alsofail to capture any non-linear time constraint wherethe utility of completing a job is related to the timeof completion, as has been pointed out earlier in themotivating example.

Utility functions overcome these problems byadapting their behavior in times of overload sce-narios. In underload conditions, utility functionsdrive schedules towards urgency satisfaction (meet-ing deadlines) as the highest utility is typically as-sociated with the time instances within the dead-line. When an overload occurs (deadlines are di!-cult to meet), a summed utility value satisfying anoptimality criteria o"ers a schedule that favors jobsthat are more important to be completed over thosewhich are more urgent, in that they can generatehigher utility. In fact, strict deadline based schedul-ing incorporates a very restricted formulation of anutility function.

The utility of a data item broadcast is measuredfrom the response time and priority level of the re-quests served by it. The response time rj of a re-quest Qj arriving at time tj,arr and served at timetj,ser is given by (tj,ser # tj,arr). As in [16], we as-sume that the utility of a data item received by aclient decreases exponentially if not received withinthe expected response time (Fig. 2). For a given re-quest Qj , the utility generated by serving it withina response time rj is given as,

uj =

!

Pj , rj $ Rj

Pje"!j(rj"Rj) , rj > Rj

(1)

The utility of broadcasting a data item d is thengiven as,

Ud ="

j|d serves Qj

uj (2)

P

R 2R

j

j j

0.5Pj

3Rj

0.25Pj

Response time

Utility

Figure 2: Utility of serving a request.

For a given schedule S that broadcasts the dataitems D1, D2, . . . , Dk in order, the utility of theschedule is given as,

US =Dk"

d=D1

Ud (3)

In this work, we assume that the utility of a dataitem for a client decays by half for every factor ofincrease in response time, i.e. !j = ln 0.5/Rj .

3.3. Problem Statement

A data source D is a set of N data items,{D1, D2, . . . , DN}, with respective sizes d1, d2, . . . ,dN . A request queue at any instance is a dynamicqueue Q with entries Qj of the form !Dj , Rj , Pj",where Dj % D, and Rj , Pj & 0. At an instancetcurr, let Q1, Q2, . . . , QM be the entries in Q. Aschedule S at an instance is a total ordering of theelements of the set

#

j=1,...,M{Dj}.

In the context of the broadcast scheduling prob-lem, the request queue Q corresponds to the queueleft after all requests currently being served are re-moved. Note that two di"erent entries Qj and Qk

in Q may be interested in the same data item. How-ever, it is important to realize that the current in-stance of the scheduler only needs to schedule a sin-gle broadcast for the data item. The arrival timeof all requests in Q is at most equal to the currenttime, tcurr, and the scheduled broadcast time t forthe data item in Q will be tcurr at the earliest. Aschedule is thus a total ordering of the unique ele-ments in all data items requested.

The time instance at which a particular data itemfrom the schedule starts to be broadcast depends onthe bandwidth of the broadcast channel. A broad-cast channel of bandwidth b can transmit b dataunits per time unit. If tready is the ready time of the

6

Page 7: Utility Driven Optimization of Real Time Data …iray/research/papers/asoc12.pdfUtility Driven Optimization of Real Time Data Broadcast Schedules ... The system gathers real time traffic

channel (maximum of tcurr and the end time of cur-rent broadcast), then for the schedule D1 < D2 <. . . < Dk, the data item Di starts to be broadcastat time instance tDi

= tready +$i"1

j=1(dj/b). All re-quests in Q for the data item Di is then assumed tobe served, i.e. tj,ser for such requests is set to tDi

.We explicitly mention this computation to point outthat the utility metric involves the access time, andnot the tuning time, of a request. The access timeis the time elapsed from the moment a client issuesa request to the moment when it starts to receivethe requested data item. The tuning time is thetime the client has to actively scan the broadcastchannel to receive the data item.

While Eq. (3) can be used to measure the in-stantaneous utility of a schedule, it is not suitablein determining the performance level of a solutionmethod in a dynamic setting. We thus use the util-ity generated from the queries already served as theyardstick to compare performance. In other words,the performance of a solution methodology at an in-stance where queries Q1, . . . , QK are already servedis measured by

$Kj=1 uj . In e"ect, we are interested

in the global utility generated by the method. Theaforementioned QoS criteria could be a specificationin terms of this global utility. The objective behindthe design of a solution methodology is to maximizethis global utility measured at any instance.

4. Solution Methodology

An implicit constraint in real time data broad-casting is the time factor involved in making ascheduling decision. Data requests usually arrivemore frequently than they can be served, whichin turn leads to the generation of a long requestqueue. Any scheduling method must be fast enoughto generate a good schedule without adding muchto the access time of requests. Latencies inducedbetween broadcasts because of the scheduling timeis also a detrimental factor to resource utilization.Heuristics are often used as fast decision makers,but may result in degraded solution quality. Hy-brid approaches in this context can provide a suit-able trade-o" between solution quality and decisiontime.

Operating systems employ a range of heuristicsfor various resource scheduling tasks. However,these scheduling tasks have some inherent di"er-ences with data broadcast scheduling. For instance,schedulers used for CPU scheduling assume that ascheduled entity will have exclusive access to the

processor. This assumption is not valid in databroadcasting since a scheduled data item can verywell serve multiple requests. Most schedulers alsodo not incorporate time-criticality constraints intheir design. Algorithms such as C-SCAN ensuresthat all I/O requests are eventually served, but doesnot consider the priority of a request. The operat-ing system has to be policy driven to enforce thepriority requirements.

4.1. Heuristics

For the purpose of this study, we use two tradi-tional heuristics – Earliest Deadline First (EDF)and Highest Utility First (HUF) – which takes intoaccount the time critical nature of a data request.Further, we experiment with three state-of-the-artheuristics proposed for deadline based scheduling.

4.1.1. EDF and HUFEDF starts by first scheduling the data item

that corresponds to a request with the minimumtcurr#(tarr+Rj). All requests in Q that get servedby this broadcast are removed (all requests for thescheduled data item) and the process is repeated onthe remaining requests. For multiple channels, theheuristic can be combined with best local earlinessto map the data item to a channel that becomesready at the earliest. EDF gives preference to dataitems that have long been awaited by some request,thereby having some level of impact on the utilitythat can be generated by the requests. However, itdoes not take into account the actual utility gener-ated. HUF alleviates this problem by first consid-ering the data item that can generate the highestamount of utility. The strategy adopted by HUFmay seem like a good approach particularly whenthe overall utility is the performance metric. How-ever, HUF generated schedules may not be flexi-ble enough in a dynamic environment. For exam-ple, if the most requested data item in the currentrequest queue generates the highest utility, HUFwould schedule the item as the next broadcast. Ifthis data item requires a high broadcast time, notonly will the subsequent broadcasts in the schedulesu"er in utility, but new requests will also have towait for a long time before getting served.

4.1.2. RxWThe RxW heuristic [7] schedules data items in de-

creasing order of their R'W values, where R is thenumber of pending requests for a data item and W

7

Page 8: Utility Driven Optimization of Real Time Data …iray/research/papers/asoc12.pdfUtility Driven Optimization of Real Time Data Broadcast Schedules ... The system gathers real time traffic

is the time for which the oldest pending request forthe data item has been waiting. Such a broadcastmechanism gives preference to data items which areeither frequently requested or has not been broad-cast in a long time (with at least one request waitingfor it). This approach aims at balancing popularityand accumulation of requests for unpopular items.Hence, although the heuristic does not have any im-plicit factor that considers the deadline of requests,deadlines can be met by not keeping a request toolong in the request queue.

4.1.3. SIN-!

The SIN-! (Slack time Inverse Number of pend-ing requests) heuristic [4] uses the following two ar-guments.

• Given two items with the same number of pend-ing requests, the one with a closer deadlineshould be broadcast first to reduce request droprate;

• Given two items with the same deadline, theone with more pending requests should bebroadcast first to reduce request drop rate.

Based on these two arguments, the sin.! value fora data item (requested for by at least one client) isgiven as

sin.! =slack

num!=

1stDeadline# clock

num!

where slack represents the urgency factor, given asthe duration from the current time (clock) to theabsolute deadline of the most urgent outstandingrequest (1stDeadline) for the data item, and num(& 1) represents the productivity factor, given asthe number of pending requests for the data item.With sin.! values assigned to the data items, theschedule is created in increasing order of the sin.!values. The ! value in our experiments is set at 2based on overall performance assessment presentedin the original work. We shall henceforth refer tothis heuristic as SIN2.

4.1.4. NPRDS

Apart from the two arguments provided for SIN-!, the PRDS (Preemptive Request count, Deadline,Size) heuristic [3] incorporates a third factor basedon data sizes.

• Given two data items with the same deadlineand number of pending requests, the one witha smaller data size should be broadcast first.

We modify PRDS into a non-preemptive versionand call it NPRDS. The heuristic works by first as-signing a priority value to each data item (with atleast one request for it), given as R

dln#s, where R is

the number of pending requests for the data item,dln is the earliest feasible absolute deadline (thebroadcast of the item can serve the request beforeits deadline) of outstanding requests for the dataitem, and s is the size of the item. The NPRDSschedule is generated in decreasing order of the pri-ority values. The heuristic aims at providing a fairtreatment to di"erent data sizes, access frequencyof data items and the deadline of requests.

4.2. Heuristics with Local Search

As mentioned earlier, one goal of this work isto understand the performance of heuristics whencoupled with local search techniques. We thereforeintroduce some amount of local search to improvethe schedules generated by the heuristics. The no-tion of local search in this context involves changingthe generated schedules by a small amount and ac-cept it as the new schedule if an improvement inthe utility of the schedule is obtained. The pro-cess is repeated for a pre-specified number of iter-ations. Such a “hill-climbing” approach is expectedto improve the utility of the schedule generated bya heuristic. We employ the 2-exchange operatorto search a neighborhood of the current schedule.The operator randomly selects two data items andswaps their positions in the schedule (Fig. 3). Thenotations EDF/LS and HUF/LS denote EDF andHUF coupled with local search, respectively. Intu-itively, these hybrid approaches should provide suf-ficient improvements over the heuristic schedules,w.r.t. the performance metric, as the local searchwould enable the evaluation of the overall sched-ule utility, often ignored when using the heuristicsalone.

Old Schedule : a b c d e f g h i j

Swap Pts : * *

New Schedule : a e c d b f g h i j

Figure 3: 2-exchange operator example.

8

Page 9: Utility Driven Optimization of Real Time Data …iray/research/papers/asoc12.pdfUtility Driven Optimization of Real Time Data Broadcast Schedules ... The system gathers real time traffic

4.3. (2 + 1)-ES

Evolution Strategies (ES) [25, 26] are a class ofstochastic search methods based on computationalmodels of adaptation and evolution. They are typ-ically expressed by the µ and " parameters signi-fying the parent and the child population, respec-tively. Whereas the algorithmic formulation of evo-lution strategies remains the same as that of a ge-netic algorithm [27], two basic forms have been de-fined for them. In the (µ + ")-ES, µ best of thecombined parent and o"spring generations are re-tained using truncation selection. In the (µ,")-ESvariant, the µ best of the " o"spring replace theparents. These definitions are analogous to thatof the steady-state and generational variants of ge-netic algorithms. The steady-state variant of ge-netic algorithms explicitly maintains the best so-lution found so far, while the generational variantblindly replaces the current population with the o"-spring population generated.

For our experimentation, we employ the (µ+ ")-ES variant with µ = 2 and " = 1. This simple formof the ES is chosen to keep the dynamic schedul-ing time within an acceptable limit without sacri-ficing on the solution quality. Also, a (2 + 1)-EScan be seen as a type of greedy stochastic localsearch. It is stochastic because there is no fixedneighborhood and therefore the neighborhood doesnot define a fixed set of local optima. Otherwise,the method is very much a local search technique– sample the neighborhood and move to the bestpoint. The pseudo code for the algorithm is givenbelow.

Step 1: Generate two initial solutions x and y andevaluate them.

Step 2: Recombine x and y to generate an o"-spring.

Step 3: Mutate the o"spring with probability p.

Step 4: Evaluate the o"spring.

Step 5: Replace x and y by the two best solutionsfrom x, y and the o"spring.

Step 6: Goto Step 2 until termination criteria ismet.

4.3.1. Solution encoding and evaluationFor the data broadcasting problem mentioned in

the previous section, a typical schedule can be rep-resented by a permutation of the unique data item

numbers in Q. Thus, for n unique data items, thesearch spans over a space of n! points. The eval-uation of a solution involves finding the utility ofthe represented schedule as given by Eq. (3). Thehigher the utility, the better is the solution.

4.3.2. Syswerda’s recombination operatorRecombination operators for permutation prob-

lems di"er from usual crossover operators in theirability to maintain the uniqueness of entries inthe o"spring produced. For a schedule encodedas a permutation, it is desired that recombina-tion of two schedules does not result in an invalidschedule. A number of permutation based opera-tors have been proposed in this context [28]. Weemploy Syswerda’s recombination operator in thisstudy. The operator is particularly useful in con-texts where maintaining the relative ordering be-tween entries is more critical than their adjacency.

x : a b c d e f g h i j

y : c f a j h d i g b e

Key Pos. : * * * *

Offspring: a j c d e f g h i b

Figure 4: Syswerda’s recombination operator.

The operator randomly selects several key posi-tions and the order of appearance of the elementsin these positions are imposed from one solution tothe other. In Fig. 4, the entries at the four keypositions from y, {a, j, i, b}, are rearranged in x tomatch the order of occurrence in y. The o"springis x with the rearranged entries.

4.3.3. Mutation using insertionAn elementary mutation operator defines a cer-

tain neighborhood around a solution which in turndictates the number of states which can be reachedfrom the parent state in one step [25]. Insertionbased mutation selects two random positions in asequence and the element at the first chosen posi-tion is migrated to appear after the element in thesecond chosen position (Fig. 5).

Parent : a b c d e f g h i j

Mutate Pts: * *

Offspring : a c d e b f g h i j

Figure 5: Mutation using the insertion operator.

9

Page 10: Utility Driven Optimization of Real Time Data …iray/research/papers/asoc12.pdfUtility Driven Optimization of Real Time Data Broadcast Schedules ... The system gathers real time traffic

4.3.4. Initialization and termination

The initial solutions determine the startingpoints in the permutation space where the searchbegins. Thus, a good solution produced by EDF orHUF could be a choice for the purpose. However,we do not want to add the complexity of determin-ing an EDF or HUF schedule to the search process,and hence randomly generate the initial solutionsx and y. The ES algorithm is terminated after afixed number of iterations. If schedules generatedby other heuristics are taken as initial solutions, thetermination criteria could as well be specified as thepoint where a particular level of improvement hasbeen obtained over the starting schedules. It is im-portant that an alternative is also suggested sinceimprovement based termination may never get re-alized.

5. Transaction Scheduling

The aforementioned problem assumes that re-quests are made at the data item level. A clientinterested in multiple data items would make in-dependent requests for the individual items. As anextension of this problem, we next consider the caseof transaction level requests. As mentioned in Sec-tion 1, a transaction level request di"ers from adata item level request in the sense that a requestinvolves more than one data item. The order ofretrieval of the data items is not important, butprocessing at the client end cannot start until alldata items are received.

The formulation of this next problem is similarto the one suggested in Section 3.3, with di"erencesappearing in the definition of a query Qj . An en-try Qj in the request queue now takes the form!Dj , Rj , Pj", where Dj = {Dj1 , . . . , Djn} ( D andRj , Pj & 0. The request is considered to be servedat tj,served, which is the time instance when the lastremaining data item in Dj is broadcast. The utilitygenerated by serving the request then follows fromEq. (1). Note that, with this formulation, schedul-ing at the transaction level is not a simple extensionof the problem of scheduling an aggregation of mul-tiple requests from the same client. The followingfactors highlight the di"erences in data item andtransaction level scheduling.

1. Multiple data item level requests made fromthe same client can have di"erent deadlines as-sociated with di"erent data items. However,

the deadline specified in a transaction level re-quest is specific to the transaction and not theindividual data items required to complete it.

2. Each request made at the data item level wouldaccrue some utility, irrespective of whether itcame from the same client or not. The notionof dependency between the data items (to com-plete a transaction) is not embedded in the re-quests. A request at the transaction level spec-ifies this dependency by collecting the requireddata items into a set and making a request forthe entire set. Utility is accrued in this caseonly when all items in the set are served.

3. When requests are made independently, thescheduler is not required to maintain low lev-els of latency between broadcasts of data itemsrequested by the same client. However, at thetransaction level, the schedules generated musttake the latency into consideration since theutility will keep reducing until all requesteddata items are retrieved by the client.

5.1. Heuristics

The two traditional heuristics for data item levelscheduling – EDF and HUF – are slightly modifiedto perform transaction level scheduling. The EDFheuristic for transaction level scheduling, denotedby T-EDF, sequentially schedules all data items ina request by giving preference to the request hav-ing the earliest deadline. The HUF heuristic fortransaction level scheduling, denoted by T-HUF,sequentially schedules all data items in a requestwith preference to the one which can generate thehighest amount of utility. The modifications enablethe heuristics to make scheduling decisions based onthe deadline, or utility, of serving a request, ratherthan the data items contained in it.

One might argue that T-EDF and T-HUF canbe represented by EDF and HUF, respectively byconsidering each request to be for a single dataitem whose size is given by the total size of thedata items requested. This is a viable representa-tion since T-EDF and T-HUF do not take the dataitems contained in a request into account. How-ever, note that the schedule generated is always atthe data item level. Hence, there is room to ex-ploit any existing overlaps in the data items of twodi"erent requests. For example, consider the re-quests A for data items {D1, D2, D3, D4} and Bfor data items {D2, D4}, with every data item hav-ing the same size. Representing A as a request for

10

Page 11: Utility Driven Optimization of Real Time Data …iray/research/papers/asoc12.pdfUtility Driven Optimization of Real Time Data Broadcast Schedules ... The system gathers real time traffic

a data item DA (whose size is the sum of the sizesof D1, D2, D3 and D4) and B as a request for adata item DB loses the fact that the data itemsin B are a subset of those in A. Further, exploit-ing the existence of such subsets is also not trivial.This is specifically true for T-EDF which makesdecisions based solely on the deadline. If A hasan earlier deadline than B then the schedule gen-erated will be D1 < D2 < D3 < D4; otherwiseD2 < D4 < D1 < D3. Observe that, in the formercase, both requests A and B will be served at thesame time instance (the instance when D4 startsbroadcasting), while in the latter case, B will getserved earlier. Hence, irrespective of the deadlines,the latter schedule is always equal to or better thanthe former. It is possible that even T-HUF fails toexploit the relationship. If serving A first generateshigher utility than serving B first, T-HUF wouldgenerate the former schedule, thereby pushing thefinish time of B further in the time line. It missesthe observation that by serving B first it can stillmaintain the same utility for A. The situation isfurther complicated when the data items have vary-ing sizes. Given such observations, augmenting theheuristics with local search would allow for an in-teresting exploration at the data item level. Recallthat the local search would attempt to improve onthe heuristic generated schedule by swapping dataitems at di"erent positions. The corresponding lo-cal search variants for T-EDF and T-HUF are de-noted by T-EDF/LS and T-HUF/LS, respectively.

5.2. (2 + 1)-ES

The (2 + 1)-ES does not make a distinction be-tween data item level and transaction level requests.It always operates at the data item level irrespec-tive of how the request was made. Similar to thedata item level case, the (2+1)-ES here first deter-mines the set of data items to be scheduled in orderto serve all requests in the queue –

#

Dj . It thengenerates a schedule with these items to maximizethe utility. Since requests are served only whenall requested data items are received by a client,random initial solutions are likely to disperse de-pendent data items (belonging to the same trans-action). Hence, we modified the initialization strat-egy to direct the search from a favorable region inthe search space. The modification creates the ini-tial solutions by choosing random requests from thequeue and sequentially laying out the data itemscontained in the requests on the schedule. For ex-ample, if D1 = {D1, D2, D3} and D2 = {D3, D2}

are two transactions on the request queue, andtransaction 1 is randomly chosen under this process,then data items D1, D2 and D3 will be consecutiveon the initial schedule. The process generates a fa-vorable arrangement of the data items for some ofthe requests. The remaining functionality of the ESis maintained as described in Section 4.3.

5.3. ASETS*

In addition to T-EDF and T-HUF, we ex-periment with another heuristic called ASETS*for transaction level schedules. ASETS* is aparameter-free adaptive policy that attempts tocombine the benefits of EDF and the SRPT (Short-est Remaining Processing Time) heuristics by adap-tively choosing the right algorithm depending onthe load in the system [23]. The algorithm is basedon the argument that EDF is often useful in min-imizing the tardiness (amount of delay after thedeadline) of deadline based requests in a lightlyloaded system, while SRPT proves to be a betterchoice under high loads. Under ASETS*, two sep-arate lists are maintained, one with requests thatcan still make their deadlines (and ordered by theirdeadlines) and another with those that have alreadymissed their deadlines (ordered by their remainingprocessing time). The heuristic then chooses thetop request from one of the two lists, based on thetotal negative impact on the total tardiness of thesystem. Note that the idea of web transactions usedin the original work is fundamentally di"erent fromthat of a transaction in this work. Nonetheless,adaptive switching between EDF and SRPT can bepotentially useful when transactions have low/highoverlaps in the data items they request.

6. Experimental Setup

The data sets used in our experiments are gen-erated using various popular distributions that areknown to capture the dynamics of a public dataaccess system. The di"erent parameters of the ex-periment are tabulated in Table 1 and discussed be-low. We generate two di"erent data sets with theseparameter settings, one involving data item levelrequests and another involving transaction level re-quests. Experiments are run independently on thetwo data sets using the corresponding heuristics andthe (2 + 1)-ES.

Each data set contains 10, 000 requests generatedusing a Poisson distribution with an arrival rate of

11

Page 12: Utility Driven Optimization of Real Time Data …iray/research/papers/asoc12.pdfUtility Driven Optimization of Real Time Data Broadcast Schedules ... The system gathers real time traffic

Table 1: Experiment parameters.

Parameter Value Comment

N 300 Number of data itemsdmin 5KB Minimum data item sizedmax 1000KB Maximum data item sizeb 120 KB/s Broadcast channel bandwidthm 60s/180s Request response time mean

for data item/transaction level requests# 20s/40s Request response time standard deviation

for data item/transaction level requestsr 5 Request arrival raterT 5 Transaction size Poisson distribution rates 0.8 Zipf’s law characterizing exponentP Low(1), Medium(2), High(4) Priority levelsp 0.5 Mutation probability

Gen 1000 Number of iterations for local search and ES

r requests per second. Each request consists of anarrival time, data item number (or set of data itemnumbers for transaction level requests), an expectedresponse time, and a priority level. For transactionlevel requests, the number of data items containedin a transaction is drawn from another Poisson dis-tribution with rate rT items per transaction.

We would like to point here that the number ofdata items (N) to be handled by a scheduler neednot be very big for a particular service. Each broad-cast station would typically be responsible for cater-ing requests involving a sub-domain of the entiresetup, in which case its data domain would also becomparatively smaller. The data items requestedare assumed to follow the commonly used Zipf -likedistribution [29] with the characterizing exponentof 0.8. Under this assumption, the first data itembecomes the most requested item, while the last oneis the least requested. Broadcast schedules can beheavily a"ected by the size of the most requesteddata item. Hence, we consider two di"erent assign-ments of sizes adopted from earlier works [3, 8]. TheINC distribution makes the most requested dataitem the smallest in size, while the DEC distribu-tion makes it the largest in size.

INC : di = dmin +(i# 1)(dmax # dmin + 1)

N

DEC : di = dmax #(i# 1)(dmax # dmin + 1)

N

Expected response times are assigned from a nor-mal distribution with mean m and standard devi-ation #. The particular settings of these parame-

ters in our data item level experiment results in ex-pected response times to be generated in the rangeof [0, 120]s with a probability of 0.997. For trans-action level requests, the mean and standard devi-ation are changed so that the interval changes to[60, 300]s. Any value generated outside the rangeof the intervals is changed to the lower bound ofthe corresponding interval.

We use three di"erent priority levels for the re-quests – low, medium, and high. Numeric values areassigned to these levels such that the significance ofa level is twice that of the previous one. Since thetotal utility is related to the priority of the requests,we make assignments from these levels in such a waythat the maximum utility obtainable by serving re-quests from each priority level is probabilisticallyequal. To do so, we use a roulette-wheel selection[27] mechanism which e"ectively sets the probabil-ity of selecting a priority level as: P (Low) = 4

7 ,P (Medium) = 2

7 , and P (High) = 17 . This en-

sures that no priority value is over-represented interms of its ability to contribute to the total utilityobtainable. This characteristic is required so thatheuristics like HUF cannot ignore low priority re-quests in an attempt to maximize the total utility.

Workloads on the scheduler can be varied by ei-ther changing the request arrival rate, or the chan-nel bandwidth. We use the latter approach andspecify the bandwidth used wherever di"erent fromthe default.

The local search for EDF/LS, HUF/LS, T-EDF/LS and T-HUF/LS are run for Gen numberof iterations. For (2 + 1)-ES, the same number of

12

Page 13: Utility Driven Optimization of Real Time Data …iray/research/papers/asoc12.pdfUtility Driven Optimization of Real Time Data Broadcast Schedules ... The system gathers real time traffic

Table 2: Percentage of maximum total utility obtained by the solution methods on data item level scheduling.

EDF EDF/LS HUF HUF/LS RxW SIN2 NPRDS (2+1)-ES

INC 83.52 88.21 96.32 97.88 73.71 73.87 77.01 97.26DEC 20.72 40.24 42.13 55.51 42.04 42.46 42.87 69.07

iterations is chosen as the termination criteria. Thenumber of iterations has been fixed by experimenta-tion such that the wall-clock scheduling time is notmore than 0.01s on a 2.4 GHz Pentium 4 machinewith 512 MB memory and running Fedora Core 7.This is critical so that the runtime of the iterativealgorithms do not impose high latencies in the de-cision making process. However, the non-iterativeheuristics have a much lower runtime – in the or-der of 2 - 5 milliseconds. The performance of eachmethod is measured at the time instance when allthe requests get served. In other words, the perfor-mance of each method is given by the sum of theutilities generated by serving each request in thedata set.

7. Results and Discussion

We first present the overall performance resultsobtained for the di"erent solution methodologies onthe data item level data set. Although, the dataitem level scheduling problem can be viewed as aspecial case of the transaction level scheduling, weobserved that a method’s performance can be quitedi"erent in these two problem classes.

7.1. Data Item Level Scheduling

Table 2 shows the performance in terms of thepercentage of maximum total utility returned by us-ing each of the methods. The maximum total utilityis obtained when every request is served within itsexpected response time, in which case it attains anutility equal to its priority level. Thus, summingup the priorities of all requests gives the maximumtotal utility that can be obtained.

For the INC data size distribution, HUF,HUF/LS and ES have similar performance. Al-though, EDF and EDF/LS have a slightly lowerperformance, both methods do reasonably well. Amajor di"erence is observed in the amount of im-provement gained in EDF by using local search, ascompared to that of HUF. The other three heuris-tics demonstrate a comparatively lower achieve-ment in total utility.

7.1.1. EDF vs. HUF

Di"erences arising in the performance of EDFand HUF can be explained using Fig. 6. The toprow in the figure shows the utility obtained by serv-ing a request. Clearly, the accumulation of pointsis mostly concentrated in the [0, 0.5] range for EDF(left). For HUF (right), three distinct bands showup near the points 4, 2, and 1 on the y-axis. Ahigh concentration of points in these regions indi-cate that a good fraction of the requests are servedwithin their response time requirements. Moreover,even if the response time requirement could not bemet, HUF manages to serve them with a good util-ity value. The figure confirms this point since theband near 0 utility in HUF is not as dense as inEDF.

The bottom row in Fig. 6 plots the utility of therequests served during a particular broadcast. Wenotice the presence of vertical bands in EDF (left)which shows that a good number of requests getserved by a single broadcast. This is also validatedby the fact that EDF does almost half the numberof broadcasts as done by HUF (right). For an in-tuitive understanding of this observation, considerthe instance when a broadcast is ongoing. Multi-ple new requests come in and accumulate in thequeue until the current broadcast ends. Most ofthese requests would be for data items that aremore frequently requested. When EDF generates aschedule for the outstanding requests, it gives pref-erence to the request that is closest to the deadline,or has crossed the deadline by the largest amount.Since the queue will mostly be populated with re-quests for frequently requested items, chances arehigh that EDF selects one of such requests. Thus,when a broadcast for such an item occurs, it servesall of the corresponding requests. This explana-tion is invalid for HUF since preference is first givento a data item that can generate the most utility.Since the data item with highest utility may not bethe most requested one, more broadcasts may beneeded for HUF.

Further, when the request queue gets long, EDF’spreference to requests waiting for a long time to beserved essentially results in almost no utility from

13

Page 14: Utility Driven Optimization of Real Time Data …iray/research/papers/asoc12.pdfUtility Driven Optimization of Real Time Data Broadcast Schedules ... The system gathers real time traffic

Broadcast

Request Request

Broadcast

Utility

Maximum utility: + = 4, * = 2, = 1 .

EDF

EDF

HUF

HUF

Figure 6: Utility derived by using EDF and HUF with INC data size distribution. Top: Utility obtained from each of therequest for EDF (left) and HUF (right). Bottom: Utility of requests served during a broadcast for EDF (left) and HUF (right).

serving that request. If we extend this observa-tion into the scheduling decisions taken over a longtime, EDF will repeatedly schedule older and olderrequests. If the queue size continues to grow, thisessentially results in EDF failing to generate anysubstantial utility after a certain point of time, asseen in Fig. 6 (bottom).

7.1.2. Impact of local searchLocal search improves the EDF and HUF results

for the DEC distribution up to almost between 75%and 100%. Recall that the DEC distribution as-signs the maximum size to the most requested dataitem. If a schedule is not carefully built in this situ-ation, there could be heavy losses in utility becauseother requests are waiting while the most requesteddata item is being broadcast. It is important thatthe scheduler does not incorporate the broadcastof heavy data items too frequently into its sched-ule and instead find a suitable trade-o" with theinduced utility loss. Unfortunately EDF and HUFgenerated schedules fail to maintain this sought bal-ance. To illustrate what happens when local searchis added, we refer to Fig. 7.

To generate Fig. 7, we enabled the local searchmechanism when the 3000th scheduling instancewith EDF is reached. At this point, the requestqueue contained requests for 127 unique data items.The local search mechanism uses the 2-exchangeoperator to generate a new schedule. To beginwith, the 2-exchange operator is applied to the EDFschedule to generate a new one. If the utility im-proves by more than 20 units, the new schedule

0 1000

1.0

2.5

Iteration

Utility improvement factor

Figure 7: Improvements obtained by doing 50000 iterationsof the 2-exchange operator for the EDF schedule generatedduring the 3000th scheduling instance. The DEC data sizedistribution is used here. A newly obtained solution is con-sidered better only if it exceeds the utility of the currentsolution by at least 20 units.

is considered for any further 2-exchange operation.The process is repeated for 50000 iterations. Theplot shows the factor of improvement obtained fromthe EDF schedule at each iteration of the localsearch. The solid line joins the points where a gen-erated schedule had an utility improvement of 20units, or more, over the current one.

The horizontal bands in the figure illustrate thefact that the schedule space is mostly flat in struc-ture. For a given schedule, most of its neighbor-ing schedules (obtained by the 2-exchange opera-

14

Page 15: Utility Driven Optimization of Real Time Data …iray/research/papers/asoc12.pdfUtility Driven Optimization of Real Time Data Broadcast Schedules ... The system gathers real time traffic

% success rate

Empirical CDF

0.13

0.97

0.92

0.07

HUFEDF

Figure 8: Empirical cumulative density function of the %success rates of performing 100 independent iterations of the2-exchange operation in each scheduling instance with EDF(left) and HUF (right). A success means the operation re-sulted in a better schedule.

tor) have, more or less, equal amounts of utility as-sociated with them. An interesting observation isthat the schedule utility improves to a factor of 2.5within the first 1000 iterations of the local search(inset Fig. 7), after which the progress slows down.This implies that as the schedules become betterand better, the local search mechanism finds it dif-ficult to generate a still better schedule (observethe long horizontal band stretching from around the15000 to the 50000 iteration).

Since the results indicate that EDF and HUFboth gain substantial improvement with localsearch, it can be inferred that their schedules havemuch room for improvement when the utility mea-surement is the metric of choice. This is evidencedin Fig. 8. To generate the plots, the space nearan EDF (HUF) generated schedule is sampled 100times. Each sample is obtained by applying the 2-exchange operator to the heuristic generated sched-ule. A success is noted if there is an increase in util-ity of the schedule. With the success rate (numberof success/number of samples) obtained in all thescheduling instances for the 10000 requests, an em-pirical cumulative density function is constructed.A point (x, y) in the plot evaluates to saying thatin y fraction of the scheduling instances, a betterschedule is obtained 0 to x times out of the 100independent samples taken. For EDF, about 84%(97% # 13%) of the schedules have been improved40 to 60 times. This high success rate for a majorityof the schedules generated indicate that EDF is nota good heuristic to consider in the context of utility.HUF displays a similar increase in utility, but with

Table 3: (2+1)-ES generated total raw utility value shown asmean/standard deviation from 30 runs. The maximum totalutility that can be generated is: 17230 and 16961 units in thedata item level and transaction level data sets, respectively.

Data Item Transaction

INC 16761.07/51.54 12657.14/83.62DEC 11728.61/167.35 6910.68/113.81

a relatively lower success rate. HUF schedules gen-erate higher utilities than EDF schedules and hencethe success rate is low.

The observations till this point help us make thefollowing conclusions. The nature of the searchspace tells us that significant improvements can beobtained by using local search on heuristic sched-ules, specially when the schedule utilities are sub-stantially lower than what can be achieved. Weobserve that EDF and HUF in fact generate sched-ules that are not di!cult to improve with a singleswap of the data item positions. Therefore, com-bining local search to both the heuristics enable usto at least “climb up” the easily reachable points inthe search space.

7.1.3. HUF/LS vs. (2 + 1)-ESOur justification as to why local search with an

operator like 2-exchange fails after a certain pointis based on the fact that these operators are lim-ited by the number of points they can sample – theneighborhood. As schedules become better, muchvariation in them is required to obtain further im-provement. Mutation based operators are designedto limit this variation while recombination can sam-ple the search space more diversely [30, 31]. Thisis the primary motivation behind using a recombi-native ES to get improved schedules. In addition,our initial experiments showed that a higher muta-tion rate (0.5 used here) is useful in better samplinga flat search space, specially when working with arestricted population size. Table 3 lists few disper-sion measures to statistically validate the resultsfrom the ES. Standard deviation measures derivedfrom 30 independent runs of the algorithm are lessthan 1% in all the scenarios.

To analyze the performance di"erences inHUF/LS and ES, we make the problem di!cult byreducing the bandwidth to 80 KB/s. The DECdata size distribution is used and the broadcast fre-quencies for the 300 data items are noted. Fig. 9(top) shows the frequency distribution. A clear dis-tinction is observed in the frequencies for average

15

Page 16: Utility Driven Optimization of Real Time Data …iray/research/papers/asoc12.pdfUtility Driven Optimization of Real Time Data Broadcast Schedules ... The system gathers real time traffic

Figure 9: Top: Broadcast frequency of the N data items forES (left) and HUF/LS (right). Bottom: Fraction of max-imum utility obtained from the di"erent broadcasts for ES(left) and HUF/LS (right). The DEC distribution is usedwith a 80 KB/s bandwidth.

sized data items. Recall that HUF first schedulesthe data item with the highest utility. However,it fails to take into account the impact of broad-casting that item on the utility that can be gener-ated from the remaining requests. Since a majorityof the requests are for the larger data items, it ishighly likely that such items get scheduled morefrequently. As a result, most of the bandwidth isused up transmitting heavy data items. The im-pact of this is not felt on small data items as theyare not requested often. However, for average dataitems that do have a substantial presence in the re-quests, utility can su"er. The di"erence betweenthe HUF/LS schedule and ES schedule appears atthis point.

HUF schedules broadcast average sized items tooinfrequently, which implies that most requests forthem wait for a long time before getting served. ESschedules have a comparatively higher frequency ofbroadcast for such data items, thereby maintaininga trade-o" between the utility loss from not servic-ing frequently requested items faster and the utilitygain from servicing average data items in a uniformmanner. As can be seen from Fig. 9 (bottom), thepitfalls of the absence of this balance in HUF/LS

is observed after a majority of the broadcasts hasbeen done. HUF/LS schedules do perform better inmaintaining a good fraction of the maximum utilityduring the initial broadcasts (notice that majorityof the points are above the 0.2 limit on the y-axisprior to the 600th broadcast). Much of the di"er-ence in performance arises because of the utilitylosses resulting after that. In contrast to that, ESschedules consistently balance losses and gains toperform well almost till the end of the last broad-cast.

7.1.4. Other heuristicsThe remaining three heuristics, namely RxW,

SIN2 and NPRDS, employ elaborate strategies tocontrol the size of the request queue. However,these strategies seem to have a minimal impact, ifnot adverse, on the performance of the heuristics.All of these heuristics achieve a comparatively lowerutility level on the INC distribution, which appearsto be a relatively easier problem for the other meth-ods. Nonetheless, considering factors such as num-ber of pending requests and data item size playsa vital role when most data items to be broadcastare comparatively larger in size. The three heuris-tics account for these features and achieve equallevels of utility as achieved by HUF or EDF/LS.There exists correlation between how fast the re-quest queue can be cleared and the utility achievedin the process. Unlike EDF, these heuristics havea strategic bias towards broadcasts that can cleara larger number of requests from the queue. Lo-cal search remains a viable supplement to furtherimprove the schedules.

7.2. Transaction Level Scheduling

Table 4 shows the performance of the solutionmethods on the transaction level scheduling prob-lem in terms of the percentage of maximum utilityattained. Similar to the data item level scheduling,the problem is not di!cult to solve for the INCtype data size distribution. Hybrid methods involv-ing the use of local search appear to be particularlye"ective. The problem is comparatively di!cult tosolve with the DEC type distribution. Nonethe-less, local search provides substantial improvementon the quality of the heuristic generated schedulesin this case as well.

7.2.1. Performance on INCThe INC type distribution assigns the smallest

size to the most requested data item. With this

16

Page 17: Utility Driven Optimization of Real Time Data …iray/research/papers/asoc12.pdfUtility Driven Optimization of Real Time Data Broadcast Schedules ... The system gathers real time traffic

Table 4: Percentage of maximum total utility obtained by the solution methods for transaction level scheduling.

T-EDF T-EDF/LS T-HUF T-HUF/LS ASETS* (2+1)-ES

INC 66.69 87.78 70.92 87.68 80.09 74.18DEC 33.47 54.82 27.14 50.88 14.95 40.77

1 2 4 1 2 4

1 2 4 1 2 4

228.9 232.3 230.3

112.7 105.0 104.1

196.8 182.7146.8

113.7 105.2 96.9

T-EDF T-EDF/LS

T-HUF T-HUF/LS

Request priority

Response time (s)

Figure 10: Whisker plots of response time for earliestdeadline and highest utility heuristics on transaction levelscheduling with the INC distribution. T-EDF fails to dis-tinguish between requests of di"erent priority levels. T-HUFmakes the distinction on the average but response times areoften high. Local search suitably modifies the schedules inboth cases.

distribution, most requests have smaller data itemsin the requested set. Typical bandwidth values, asused in these experiments, thus allow multiple (anddi"erent) requests to be served in a relatively shortspan of time. Most di!culties in scheduling arisewhen sporadic broadcasts of bigger data items aremade and the request queue grows in this duration.

Fig. 10 shows whisker plots of the response timesfor the earliest deadline and the highest utilityheuristics, along with their local search variants,when applied on the data set with the INC dis-tribution. The response time of a request is thedi"erence in time between its arrival and the broad-cast of the last remaining data item in the requestedset. Note that, despite the high variance in responsetime, T-HUF manages a higher utility than T-EDF.The only visible factor better in T-HUF is the me-dian value of these response times. We thus focusour attention to this statistic. Using the medianresponse time as the metric, we see that T-EDFmaintains similar values across requests of di"erentpriorities. It fails to identify that high priority re-

quests can generate more utility and hence the dataitems in such requests should receive some level ofpreference over low priority requests. The observa-tion is not surprising since T-EDF’s e"orts are re-stricted to the deadline requirement only. In e"ect,there is no distinction between two requests withthe same amount of time remaining to their dead-lines, but with di"erent priorities. T-HUF makesthe distinction clear and further di"erentiates be-tween such requests with the added advantage ofbeing able to identify broadcasts that might servemultiple low priority requests instead of just a sin-gle high priority request.

Although T-EDF and T-HUF’s di"erences comefrom the ability to distinguish between priority lev-els of requests, it does not seem to be the only factora"ecting the utility. Local search on these methodsresult in substantial improvement. As describedin the hypothetical example in Section 5.1, bothT-EDF and T-HUF do not take into account thee"ect of intermingling data items on the scheduleutility. By swapping data items across the sched-ule, local search e"ectuates better exploitation ofthe fact that commonly requested data items arepresent in a significant fraction of the requests inthe queue.

7.2.2. Performance on DEC

A major di"erence between data item levelscheduling and transaction level scheduling is in thenumber of data items that has to be broadcast toserve the requests in the data set. With an aver-age transaction size of 5 data items in each request,transaction level scheduling has to make 5 di"erentbroadcasts before a single request is served. Thus,heuristics in this problem must be able to exploitany overlaps between requests to be e"ective in util-ity. The problem is further complicated with theDEC type distribution. All requests in this casewill contain at least one or more big data items asa result of the Zipf distribution. If broadcast ofsuch items (taking a long time to broadcast) servesa very small fraction of the requests, then utilitywill most likely su"er in the long run. Further, av-erage or smaller sized data items will be requested

17

Page 18: Utility Driven Optimization of Real Time Data …iray/research/papers/asoc12.pdfUtility Driven Optimization of Real Time Data Broadcast Schedules ... The system gathers real time traffic

Data item number

Broadcast frequency T-EDF T-EDF/LS T-HUF T-HUF/LS ES

Figure 11: Broadcast frequency of the N data items for transaction level scheduling in the DEC distribution. Distribution ofaverage and below-average sized data items reaches a uniform trend for high utility solution methods.

with a less frequency, often by requests separatedfar apart in the time line. Once again, utility willsu"er if a heuristic waits for a long time to accu-mulate requests for such data items with the objec-tive of serving multiple such requests with a singlebroadcast. Thus, a heuristic must be able to bal-ance these two aspects of broadcasting when work-ing on transaction level scheduling with the DECdistribution.

Fig. 11 shows the broadcast frequency of the dif-ferent data items on this problem. The frequencydistribution is very dissimilar than in the case ofdata item level scheduling. T-EDF, which performsbetter than T-HUF here, manages to maintain analmost uniform distribution. In fact, if the first20% of the bigger data items are not considered, allmethods show a similar trend. The uniform distri-bution is an extreme form of the balance we seekas indicated above. T-HUF encounters a problemwhen seeking this balance and results in a marginaldrop in performance as compared to T-EDF. Intransaction level scheduling, requests which over-lap significantly with parts of other requests are theones that can generate the highest utility. The over-lap mostly occurs in the frequently requested dataitems. T-HUF schedules such requests first withouttaking into account that delaying a broadcast for acommonly requested item can actually help accu-mulate more requests interested in the same dataitem. Unless a new request has better overlap fea-tures than the ones already existing in the queue,the T-HUF schedule undergoes very small changesand virtually remains consistent, for a certain pe-riod of time. What disrupts the consistency is theaccumulation of enough new requests to change theoverlap features altogether. However, multiple dataitems (often the bigger ones) have already beenbroadcast by this time and T-HUF is needed to

schedule another broadcast for them. In the caseof T-EDF, this e"ect is overpowered to some ex-tent by the deadline requirement. As and when anew request with a smaller deadline, and preferablywith a lesser overlap with the existing requests ar-rive, T-EDF immediately gives preference to it. Asa result, broadcast times of data items in the ex-isting schedule is delayed, which helps serve morerequests now. The frequency of broadcast of com-monly requested data items in T-EDF and T-HUFcorroborates our justification.

Performance improvements in these heuristicswhen aided by local search is mostly attributableto the higher frequency of broadcasts for averageand smaller sized data items. Recall that a re-quest is served only when all data items of inter-est are received by it, irrespective of the order ofretrieval. Consider a request interested in the dataitems D1, D2, and D3, with D1 being a commonlyrequested item under the DEC distribution. Next,consider the two schedules – D1 < D2 < D3 andD3 < D2 < D1. In both cases, the client makingthe request completes retrieval of the data items atthe same time instance. However, the latter sched-ule has the advantage of delaying the broadcast ofD1, which in e"ect improves the chances of it serv-ing newly accumulated requests as well. Exploitingsuch strategies is not possible by T-EDF and T-HUF unless aided by an exchange operator of thekind present in the local search variant. The side ef-fect of this is that infrequently requested data itemswill be more often broadcast. This is clearly visiblein T-EDF/LS and T-HUF/LS.

7.2.3. Dynamics of local searchThe ability to exploit the overlap features of re-

quests makes local search a promising aid to theheuristic generated schedules in both data item

18

Page 19: Utility Driven Optimization of Real Time Data …iray/research/papers/asoc12.pdfUtility Driven Optimization of Real Time Data Broadcast Schedules ... The system gathers real time traffic

INC

DEC

Local search iteration

Utility improvement factor

Figure 12: Improvements obtained by local search in T-EDF/LS during the 5000th scheduling instance. Improve-ments are minor when the T-EDF schedule is already “good"(INC distribution). Utility shows an increasing trend at theend of the 1000th iteration of the local search for the DECdistribution.

level and transaction level scheduling. Recall fromFig. 7 that the local search is most e"ective whenschedules can be easily improved by adopting ahill-climbing approach. The search space becomesmore and more flat as better schedules are discov-ered. Often, reaching this flat region is not di!cultin data item level scheduling, both for the INCand DEC distribution. However, the dynamics aresomewhat di"erent in transaction level scheduling.

Fig. 12 shows the improvement gained over a T-EDF generated schedule during the 5000th schedul-ing instance in T-EDF/LS. The plot depicts theprogress of local search in its search for a betterschedule. The T-EDF generated schedule is closerto a flat region for the INC distribution than forthe DEC distribution. Hence, improvements areoften slow in the INC distribution. However, ob-serve that for the DEC distribution, at the endof the 1000th iteration of local search, the improve-ments are still showing an increasing trend. This in-dicates that, given more number of iterations, localsearch can continue to easily improve the schedulesbefore hitting the flat regions. Given the diversenumber of factors determining the utility of sched-ules in the DEC distribution, it is not surprisingto see that T-EDF generated schedules are far frombeing optimal. Besides, a hill-climbing strategy likelocal search can e"ectively exploit the dynamics ofthe search space to adjust these schedules to bring

them closer to the optimal. The only hindrancewe face in doing so is the right adjustment of thenumber of iterations without imposing a bottleneckin the running time of the scheduler. We providesome suggestions on how this can be done in a latersubsection.

7.2.4. Performance of (2 + 1)-ESOur primary motivation behind using (2+ 1)-ES

in data item level scheduling is to enable a diversesampling of the search space when a schedule’s util-ity no longer shows quick improvements. However,for transaction level scheduling, other factors startto dominate the performance of this method.

Transaction level scheduling leads to an explo-sion in the size of the search space. For data itemlevel scheduling, a request queue of size n with re-quests for distinct items has a schedule search spaceof size n!, whereas in transaction level schedulingwith k items per transaction, the search space canbecome as big as (kn)!. A bigger search space notonly requires more exploration, but can also injecta higher possibility of prematurely converging to alocal optima. Fig. 13 depicts the exploration withlocal search and the (2+1)-ES. In order to generatethe plots, we ran T-EDF until the 5000th requestarrives. A schedule for the requests in the queueat this point is then generated by using T-EDF/LSand (2 + 1)-ES ran for 5000 iterations each. Thisguarantees that both methods are operating on thesame queue (same search space) and for su!cientnumber of iterations. Utility improvements by theES starts stagnating after the 2000th iteration, sug-gesting convergence towards a local optima. Localsearch, on the other hand, has a better rate of im-provement before starting to stagnate. This is vis-ible for both INC and DEC distributions.

The premature convergence in (2 + 1)-ES ismostly due to the loss in genetic diversity of thesolutions. As more and more recombination takesplace, the two parents involved in the method startsbecoming more and more similar, and finally areunable to generate a diverse o"spring. This phe-nomenon is what typically identifies the conver-gence of the method. However, given the biggersearch space and the small population (two) in-volved in exploring it, the phenomenon results inthe method getting trapped in the local optimapresent across the space. It is suggestive from thebroadcast frequency of di"erent data items (Fig.11) that, on the overall, the method failed to bal-ance the broadcast of di"erent sized data items for

19

Page 20: Utility Driven Optimization of Real Time Data …iray/research/papers/asoc12.pdfUtility Driven Optimization of Real Time Data Broadcast Schedules ... The system gathers real time traffic

Iteration number

Utility improvement factor

Local search Local search

ESES

INC DEC

Figure 13: Improvements obtained by local search and ES in the schedule generated by T-EDF during the 5000th schedulinginstance. For transaction level scheduling, local search performs better in avoiding local optima compared to (2 + 1)-ES.

the DEC distribution. Changing the µ and " pa-rameters, as well as the mutation probability ofthe method usually helps resolve such prematureconvergence. However, one should keep the realtime runtime constraints into consideration whileexperimenting with the parameters. Local search,on the other hand, does not face such a problem.Ideally, given enough number of iterations, a hill-climbing approach can be made resilient againstgetting trapped in a local optimum using randomrestarts.

7.2.5. Performance of ASETS*

The ability to adaptively choose between EDFand SRPT appears to have a positive impact on thetotal utility achievable by a heuristic. The strategyworks considerably well on the INC data size dis-tribution, with global utility levels surpassing thatof a naive heuristic like T-EDF or T-HUF. However,the same strategy has a much negative impact whenthe data size distribution is reversed. In this case,ASETS* fails to generate almost any utility in thebroadcast data. Note that ASETS* does not distin-guish between the di"erent data items that form atransaction, but only considers the time required tobroadcast all the remaining items in a transaction.This disables the algorithm from identifying exist-ing overlaps between transactions. Exploiting theseoverlaps becomes crucial in a broadcast scenariowhere most items will occupy the broadcast band-width for a relatively longer extent of time. Thisresult further corroborates the claim that schedul-ing for multiple data items is not a simple extensionof the data item level counterpart.

7.3. Scheduling Time

The number of generations allowed to localsearch, or ES, can a"ect the quality of solutionsobtained and the time required to make schedulingdecisions. In our experiments, this value is set sothat an average request queue can be handled in asmall amount of time. However, the average queuesize will greatly vary from problem to problem, of-ten depending on the total number of data itemsserved by the data source. In such situations, itmay seem di!cult to determine what a good valuefor the number of iterations should be. Further, ina dynamic environment, the average queue lengthitself may be a varying quantity. Nonetheless, oneshould keep in mind that scheduling decisions neednot always be made instantaneously. The broadcasttime of data items vary considerably from one to theother. The broadcast scheduled immediately nextcannot start until the current one finishes. Thislatency can be used by a scheduler to continue itssearch for better solutions, specially with iterativemethods like a local search or an ES.

8. Conclusions

In this paper, we address the problem of timecritical data access in wireless environments wherethe time criticality can be associated with a QoSrequirement. To this end, we formulate an util-ity metric to evaluate the performance of di"er-ent scheduling methods. The earliest deadline first(EDF) and highest utility first (HUF) heuristicsare used in two problem domains – data item levelscheduling and transaction level scheduling. In thedata item level domain, our initial observation ontheir performance conforms to the speculation that

20

Page 21: Utility Driven Optimization of Real Time Data …iray/research/papers/asoc12.pdfUtility Driven Optimization of Real Time Data Broadcast Schedules ... The system gathers real time traffic

HUF performs better since it takes into consider-ation the utility of requests while making schedul-ing decisions. Also, strategies adopted in elaborateheuristics such as RxW, SIN-! and NPRDS have apositive impact on certain classes of problem only.Further analysis of the nature of the schedulingproblem shows that local search can be a promisingsupplement to a heuristic.

The observations drawn from this understand-ing of the behavior of local search aided heuristicsenable us to propose an evolution strategy basedsearch technique that provides more variance to asimple local search. The utility induced by such atechnique surpasses that of both EDF and HUF,and their local search variants. This result alsoshows that search based optimization techniquesare a viable option in real time broadcast schedulingproblems.

The transaction level domain appear to be a rel-atively harder problem to solve. Balancing thebroadcast frequency of data items is important hereand we observe that T-EDF performs better in do-ing so as compared to T-HUF. Local search stillremains a promising candidate to boost the perfor-mance of these heuristics. For certain data size dis-tributions, local search improvements can continueif allowed to run outside the runtime restrictionsset in our experiments. Better strategies to triggerthe scheduler is thus required. Transaction levelscheduling also has di"erent search space dynam-ics, often with an explosive size and the presenceof local optima. The evolution strategy method isa"ected by this, often leading to lower utility sched-ules than local search. Further experimentation isrequired in this direction to see how the parametersof the ES can be changed to avoid susceptibility to-wards such situations.

From a utility standpoint, we need to explore theoption of designing heuristics that pay special at-tention to factors like broadcast frequency and loss-gain trade-o" during scheduling decisions. The va-lidity of a broadcast is to be taken into accountwhen requests involve multiple data items whichmay undergo regular updates. Timely delivery of adata item then has to consider a validity deadline aswell. Application usage characteristics can also beincorporated while making utility based schedulingdecisions. This is particularly useful since di"erentapplications may have di"erent time criticality re-quirements on the data, which in turn implies dif-ferent time-utility functions. The overall QoS inthe system will then depend on how the broadcast

schedule incorporates knowledge about the di"erentutility functions.

Acknowledgments

This work was partially supported by the U.S.Air Force O!ce of Scientific Research under con-tract FA9550-07-1-0042. The views and conclusionscontained in this document are those of the authorsand should not be interpreted as representing o!-cial policies, either expressed or implied, of the U.S.Air Force or other federal government agencies.

References

[1] W. de Jonge, Kilometerhe!ng op basis van elektronis-che aangifte, Informatie (2003) 22–27.

[2] S. Jiang, N. H. Vaidya, Scheduling Data Broadcasts to“Impatient" Users, in: Proceedings of the 1st ACM In-ternational Workshop on Data Engineering for Wirelessand Mobile Access, 1999, pp. 52–59.

[3] V. C. Lee, X. Wu, J. K. Ng, Scheduling Real-Time Re-quests in On-Demand Data Broadcast Environments,Real-Time Systems 34 (2) (2006) 83–99.

[4] J. Xu, X. Tang, W.-C. Lee, Time-Critical On-DemandData Broadcast: Algorithms, Analysis and Perfor-mance Evaluation, IEEE Transactions on Parallel andDistributed Systems 17 (1) (2006) 3–14.

[5] C. Su, L. Tassiulas, Broadcast Scheduling for Informa-tion Distribution, in: Proceedings of the INFOCOM’97, 1997, pp. 109–117.

[6] S. Acharya, S. Muthukrishnan, Scheduling On-DemandBroadcasts: New Metrics and Algorithms, in: Proceed-ings of the 4th Annual ACM/IEEE International Con-ference on Mobile Computing and Networking, 1998,pp. 43–54.

[7] D. Aksoy, M. Franklin, RxW: A Scheduling Ap-proach for Large-Scale On-Demand Data Broadcast,IEEE/ACM Transactions on Networking 7 (6) (1999)846–860.

[8] S. Hameed, N. H. Vaidya, E!cient Algorithms forScheduling Data Broadcast, Wireless Networks 5 (3)(1999) 183–193.

[9] K. C. K. Lee, W. Lee, S. Madria, Pervasive DataAccess in Wireless and Mobile Computing Environ-ments, Wireless Communications and Mobile Comput-ing (2008) 25–44.

[10] P. Xuan, S. Sen, O. Gonzalez, J. Fernandez, K. Ramam-ritham, Broadcast on Demand: E!cient and TimelyDissemination of Data in Mobile Environments, in:Proceedings of the 3rd IEEE Real-Time Technology andApplications Symposium, 1997, pp. 38–48.

[11] K. Lam, E. Chan, J. C. Yuen, Approaches for Broad-casting Temporal Data in Mobile Computing Systems,Journal of Systems and Software 51 (3) (2000) 175–189.

[12] J. Fernandez, K. Ramamritham, Adaptive Dissemina-tion of Data in Time-Critical Asymmetric Communica-tion Environments, Mobile Networks and Applications9 (5) (2004) 491–505.

21

Page 22: Utility Driven Optimization of Real Time Data …iray/research/papers/asoc12.pdfUtility Driven Optimization of Real Time Data Broadcast Schedules ... The system gathers real time traffic

[13] X. Wu, V. C. Lee, Wireless Real-Time On-DemandData Broadcast Scheduling with Dual Deadlines, Jour-nal of Parallel and Distributed Computing 65 (6) (2005)714–728.

[14] J. Chang, T. Erlebach, R. Gailis, S. Khuller, Broad-cast Scheduling: Algorithms and Complexity, in: Pro-ceedings of the 19th Annual ACM-SIAM Symposiumon Discrete Algorithms, 2008, pp. 473–482.

[15] B. Ravindran, E. D. Jensen, P. Li, On Recent Ad-vances in Time/Utility Function Real-Time Schedul-ing and Resource Management, in: Proceedings of the8th IEEE International Symposium on Object-OrientedReal-Time Distributed Computing, 2005, pp. 55–60.

[16] E. Jensen, C. Locke, H. Tokuda, A Time DrivenScheduling Model for Real-Time Operating Systems,in: Proceedings of the 6th IEEE Real-Time SystemsSymposium, 1985, pp. 112–122.

[17] G. Buttazzo, M. Spuri, F. Sensini, Value vs. DeadlineScheduling in Overload Conditions, in: Proceedings ofthe 16th IEEE Real-Time Systems Symposium, 1995,pp. 90–99.

[18] H. Wu, U. Balli, B. Ravindran, E. D. Jensen, Utility Ac-crual Real-Time Scheduling Under Variable Cost Func-tions, in: Proceedings of the 11th IEEE Conference onEmbedded and Real-Time Computing Systems and Ap-plications, 2005, pp. 213–219.

[19] R. Dewri, I. Ray, I. Ray, D. Whitley, Optimizing On-Demand Data Broadcast Scheduling in Pervasive En-vironments, in: Proceedings of the 11th InternationalConference on Extending Database Technology, 2008,pp. 559–569.

[20] Y. Chehadeh, A. Hurson, M. Kavehrad, Object Orga-nization on a Single Broadcast Channel in the MobileComputing Environment, Multimedia Tools and Appli-cations 9 (1) (1999) 69–94.

[21] A. R. Hurson, Y. C. Chehadeh, J. Hannan, Object Or-ganization on Parallel Broadcast Channels in a GlobalInformation Sharing Environment, in: Proceedings ofthe 19th International Performance, Computing, andCommunications Conference, 2000, pp. 347–353.

[22] J.-L. Huang, M.-S. Chen, Dependent Data Broadcast-ing for Unordered Queries in a Multiple Channel MobileEnvironment, IEEE Transactions on Knowledge andData Engineering 16 (9) (2004) 1143–1156.

[23] S. Guirguis, M. A. Sharaf, P. K. Chrysanthis,A. Labrinidis, K. Pruhs, Adaptive Scheduling of WebTransactions, in: Proceedings of the 25th InternationalConference on Data Engineering, 2009, pp. 357–368.

[24] C. D. Locke, Best-E"ort Decision Making for Real-TimeScheduling, Ph.D. thesis, Carnegie Mellon University(1986).

[25] H. G. Beyer, H. P. Schwefel, Evolution Strategies:A Comprehensive Introduction, Natural Computing 1(2002) 3–52.

[26] I. Rechenberg, Evolutionsstrategie: Optimierung tech-nischer Systemenach Prinzipien der biologischen Evo-lution, Ph.D. thesis, Technical University of Berlin(1970).

[27] D. E. Goldberg, Genetic Algorithms in Search, Op-timization, and Machine Learning, Addison-Wesley,1989.

[28] T. Starkweather, S. McDaniel, C. Whitley, K. Mathias,D. Whitley, A Comparison of Genetic Sequencing Op-erators, in: Proceedings of the 4th International Con-ference on Genetic Algorithms, 1991, pp. 69–76.

[29] L. Breslau, P. Cao, L. Fan, G. Phillips, S. Shenker, WebCaching and Zipf-Like Distributions: Evidence and Im-plications, in: Proceedings of the IEEE INFOCOM ’99,1999, pp. 126–134.

[30] H. G. Beyer, An Alternative Explanation for the Man-ner in which Genetic Algorithms Opearate, BioSystems41 (1997) 1–15.

[31] J. Holland, Hidden Order: How Adaptation Build Com-plexity, Basic Books, 1995.

22

Page 23: Utility Driven Optimization of Real Time Data …iray/research/papers/asoc12.pdfUtility Driven Optimization of Real Time Data Broadcast Schedules ... The system gathers real time traffic

Vitae

Rinku Dewri is an Assistant Professor in theComputer Science Department at University ofDenver. He obtained his Ph.D. in Computer Sci-ence from Colorado State University. His researchinterests are in the area of information security andprivacy, risk management, data management andmulti-criteria decision making. He is a member ofthe IEEE and the ACM.

Indrakshi Ray is an Associate Professor in theComputer Science Department at Colorado StateUniversity. She obtained her Ph.D. from GeorgeMason University. Her research interests includesecurity and privacy, database systems, e-commerceand formal methods in software engineering. Sheserved as the Program Chair for SACMAT 2006 andIFIP WG 11.3 DBSEC 2003. She has also been amember of several program committees such as forEDBT, SACMAT, ACM CCS and EC-Web. She isa member of the IEEE and the ACM.

Indrajit Ray is an Associate Professor in theComputer Science Department at Colorado StateUniversity. His main research interests are in the

areas of computer and network security, databasesecurity, security and trust models, privacy andcomputer forensics. He is on the editorial boardof several journals, and has served or is serving onthe program committees of a number of interna-tional conferences. He is a member of IEEE, IEEECS, ACM, ACM SIGSAC, IFIP WG 11.3 and IFIPWG 11.9.

Darrell Whitley is Professor and Chair of theComputer Science Department at Colorado StateUniversity. He also currently serves as Chair ofACM SIGEVO. He was chair of the governingboard for International Society for Genetic Algo-rithm from 1993 to 1997, and was Editor-in-Chiefof the MIT Press journal Evolutionary Computa-tion from 1997 to 2002.

23