This is a pre-print, author's version of the paper to appear in the IEEE International Symposium on Performance Analysis of Systems and Software, 2018.


DORA: Optimizing Smartphone Energy Efficiency and Web Browser Performance under Interference

Davesh Shingari∗∓, Akhil Arunkumar†∓, Benjamin Gaudette†, Sarma Vrudhula†, Carole-Jean Wu†

∗School of Electrical, Computer and Energy Engineering

†School of Computing, Informatics, and Decision Systems Engineering

Arizona State University

Email: {dshingar,akhil.arunkumar,bgaudett,vrudhula,carole-jean.wu}@asu.edu

Abstract: This paper proposes DORA, a dynamic frequency controller that maximizes the energy efficiency of smartphones subject to user satisfaction demands in the presence of memory interference stemming from background processes and co-scheduled applications. The proposed algorithm predicts the optimal energy-efficient frequency setting at runtime using statically-trained performance, dynamic power, and leakage power models. The parameters of the models represent web page characteristics and dynamically varying architecture and system conditions. The algorithm is designed, implemented, and extensively evaluated on a Google Nexus 5 smartphone using a variety of mobile web browsing workloads. The results show high prediction accuracies of 97.5% and 96% for the performance and power models, respectively. Overall, DORA improves the smartphone’s energy efficiency by an average of 16% compared to the default Android frequency governor, interactive, while maintaining the desired levels of user satisfaction (web page load time).

I. INTRODUCTION

Today’s smartphone is a high-performance, parallel processing computer, with a general-purpose chip multiprocessor, graphics processing units, digital signal processing units, and custom hardware accelerators. The increasing degree of parallelism and heterogeneity offered by the general-purpose programmable cores means that a significant number of applications (or Apps) can now be executed simultaneously, leading to an ever-increasing demand on the shared resources.

With the rapid improvement in cellular technologies from 2G/3G to LTE/5G, the network round-trip latency has decreased significantly in the past decade, from a few hundred milliseconds to tens of milliseconds [1], [2]. Thus, beyond the 2G/3G cellular technologies, and with advancements in the browser software protocol stack, e.g., SPDY [3], the performance bottleneck of mobile Apps shifts from network performance to the performance of mobile processors. Moreover, as the technology and system management techniques for handheld displays [4]–[9] and communication fabrics [10] offer higher performance while consuming less power, the performance and energy efficiency of processors on mobile SoCs become more critical to the overall smartphone energy efficiency [1], [11], [12].

The advent of new execution paradigms on today’s smartphones also has a significant impact on their performance and energy efficiency. Starting from Android 4.2.2 and iOS 9, mobile operating systems (OS) support multiprogramming features, such as screen sharing between multiple Apps. For instance, when users utilize a social networking App, such as Facebook, in the foreground, image recognition algorithms could be running in the background as part of the Facebook App, while a video conferencing App can be running simultaneously. Such execution scenarios give rise to increased contention in the memory subsystem which, if not properly managed, results in significant interference and therefore performance degradation. This performance loss can lead to quality of service (QoS) violations, which are particularly problematic for real-time, interactive applications, as they directly manifest as lower user satisfaction. For certain applications, some amount of performance degradation can be tolerated without sacrificing user satisfaction; however, there are many applications for which missing an absolute QoS target is intolerable.

∓The first and the second authors contributed equally to this work.

The web browser is a key smartphone application whose performance has a direct impact on user satisfaction. As the computational complexity and memory requirements of web pages continue to increase, the need to improve web browser performance and to ensure user satisfaction has come to the forefront. While a number of prior works [13]–[18] have proposed solutions to address this need, they have all been limited to scenarios where the web browser runs in isolation. Such solutions could lead to sub-optimal performance and user satisfaction, as realistic workloads and user scenarios generally include background tasks and other co-scheduled applications.

We observe that web browsers can be highly susceptible to interference caused by multiprogrammed execution. To illustrate this, we conduct an experiment to study the impact of interference on the web browser QoS (web page load time). Figure 1 shows the real system measurements¹ of the web page load time for Reddit at different frequencies, when it is co-scheduled with applications with different memory intensities. The vertical bars and dots at each processor frequency show the variations in load time depending on the memory intensity of the interfering application. The horizontal dotted lines represent a 2-, 3-, or 4-second deadline, corresponding to different levels of expected QoS or user satisfaction. The results indicate significant variation in the load times subject to the degree of memory interference. This variation, owing to the state of the co-scheduled processes, can result in violations of QoS requirements. For example, with a 3-second load time deadline, the frequency setting of 0.9 GHz allows the web page to meet the QoS target when the memory interference is low. However, the web page would miss its deadline with greater interference. This demonstrates the need for an effective approach that maximizes smartphone energy efficiency while guaranteeing that the performance QoS of user-facing web browsing is met in the presence of interference.

¹The data is collected with a Google Nexus 5 smartphone by rendering Reddit concurrently with the other interfering applications in a multiprogrammed manner.

[Figure 1: web page load time (seconds, 0 to 6) versus core frequency (GHz, 0.7 to 2.2) for Reddit, with deadline lines.]

Fig. 1. Impact of memory interference on web page load time, at different frequencies for Reddit. The High-Low bars and the dots indicate the range of web page load times experienced at each frequency under the presence of different degrees of memory interference from co-scheduled processes.

To tackle this optimization problem, this paper proposes to use the dynamic voltage and frequency scaling (DVFS) feature to simultaneously manage the contention in the memory subsystem and the smartphone energy efficiency. We use a model-based approach to predict the web browser performance and the smartphone energy efficiency in the presence of memory interference. The paper designs a Dynamic quality Of service, memoRy interference-Aware frequency governor, named DORA, to make online frequency setting predictions using statically-trained performance, dynamic power, and leakage power models. The models utilize web page characteristics and dynamically-varying architecture and system conditions to capture web page complexity and the impact of memory interference due to co-scheduled applications. DORA explores the web page load time and smartphone power consumption at different frequency settings and selects the setting that allows web browsing to meet a specified deadline while simultaneously maximizing the energy efficiency of the smartphone.

DORA is implemented and evaluated on a real smartphone, a Google Nexus 5. The evaluation results show that DORA’s prediction models are highly accurate. DORA is able to predict performance and power consumption in the presence of memory interference with an accuracy of 97.5% and 96%, respectively. Furthermore, direct measurements on a real platform show that DORA can effectively select the frequency setting so that the web browser meets the specified load time deadline 82% of the time, while maximizing the total smartphone energy efficiency. For the other 18% of the workloads, the web browser cannot meet the deadline even when running at the highest possible frequency setting. For these workloads, DORA performs the same as the baseline interactive governor. Across all workloads, DORA increases the overall smartphone energy efficiency by as much as 35% and by an average of 16% compared to the existing interactive governor.

The key contributions of this work are as follows:

• This work offers an effective system architecture solution to provide performance guarantees for mobile web browsing in the presence of interference, by using the DVFS feature to simultaneously modulate memory interference and maximize overall smartphone energy efficiency.

• The key parameters for performance and energy efficiency prediction concerning web browsers subject to memory interference are identified.

• The proposed frequency controller, DORA, is implemented as a light-weight user space frequency governor within the Android OS and is evaluated on a real system platform. The insights and the DORA frequency governor proposed in this paper are also applicable to other smartphone platforms with re-parametrization.

• The effect of memory interference on smartphone energy efficiency and QoS has not received much attention. We hope the insights and results presented in this paper can advance the state-of-the-art understanding and inspire additional innovative solutions.

II. BACKGROUND AND MOTIVATION

In this section, we discuss important characteristics of web pages which determine the page load time (Section II-A) and show that the performance degradation and energy consumption of web browsing can be significantly increased by co-scheduled workloads (Section II-B), leading to deadline violations. Our characterization results in Section II-C demonstrate that an optimal energy-efficient operating mode exists which maximizes device energy efficiency while meeting the QoS requirement of web browsing.

A. Web Page Execution Flow

The execution flow of a web browser can be abstracted into two components: networking and rendering. The networking component provides the necessary communication and security requirements to fetch a web page’s content from the internet, while the rendering engine evaluates the fetched content and provides a viewable representation of the web page for the user. Since the networking component’s efficiency is dependent on network latencies, and thus out of the system architect’s control, we focus on studying the performance of the rendering engine in this work.

The rendering engine parses a web page’s HTML document. An HTML page provides the blueprint of a web page by specifying two important components: tags and attributes. The tags are used by the rendering engine to determine the outline of the various blocks of a web page. The attributes are associated with the tags and describe the characteristics of the blocks. These tags and attributes are used to create a hierarchical structure called the DOM tree, which defines the rendering order of the different blocks of a web page. The DOM tree, with the CSS attributes (which determine the visual properties and style information), completes the render tree. This render tree goes through a layout and a final paint stage to complete the load process.

[Figure 2: (a) load time (seconds) versus interfering application memory intensity (Low, Medium, High) for AliExpress, Hao123, ESPN, and Imgur, with the deadline marked; (b) additional energy cost compared to the browser running alone (%), for Low and High memory intensity.]

Fig. 2. (a) The measured load time of web pages (y-axis) increases with increasing memory intensity of the co-scheduled application (x-axis). (b) The measured energy consumption (y-axis) increases with increasing memory intensity of the co-scheduled application (blue diagonal vs. black bars).

Prior studies [18], [19] have shown that web page load time is a function of the complexity of a web page and is dominated by important web page features, such as the number of tags, attributes, and the amount of metadata utilized by the web page. Since these properties of web pages are available before a page is rendered, the web page load time can be pre-computed fairly accurately. However, existing approaches cannot accurately estimate the web page load time in the presence of other system dynamics.

B. Impact of Memory Interference on Web Browsing

To quantify the performance degradation and the additional energy cost that co-scheduled applications impose on web browsing, we design experiments to evaluate the performance and energy impact.

Increased web page load time: Figure 2(a) shows the measured web page load times for four common web pages when co-scheduled with an interfering application with varying memory intensities². The x-axis represents the memory intensity of the co-scheduled application (categorized as low, medium, and high), and the y-axis represents the web page load time.

Depending on the web page complexity and the co-scheduled application’s memory intensity, some web pages meet the hypothetical deadline of 3 seconds while others do not³. For example, ESPN was able to meet the deadline regardless of the degree of interference, while others, such as AliExpress, were not able to meet the deadline at any memory intensity. For web pages such as Hao123 and Imgur, as the memory intensity of co-run processes increases, the web page load time increases, which manifests as QoS deadline violations.

²Experimental methodology including memory intensity classification is described in detail in Section IV. The web page results shown here are obtained at the 2.2GHz processor frequency setting.

³The 3-second deadline is not an absolute performance target and varies with user responsiveness, but the notion of the performance deadline holds. We use the 3-second mark here based on a recent user survey [20] as the performance target for web browsing and evaluate the performance of the proposed design subject to different performance targets as well (Section V-G).

[Figure 3: load time (seconds) and energy efficiency (PPW) versus core frequency (GHz) for MSN and ESPN, with the deadline and the frequencies f_E, f_D, and f_opt marked; the panels are annotated f_D > f_E and f_D < f_E, with PPW gains of +28% and +17%.]

Fig. 3. The most energy-efficient frequency setting (f_opt) switches between the deadline-meeting frequency setting (f_D) and the deadline-oblivious, optimal energy efficiency frequency setting (f_E). An intelligent memory interference aware frequency scheduler could provide 17% and 28% device energy efficiency improvement for ESPN and MSN, respectively.

Increased device energy consumption: In addition to the increase in the web page load time, when web browsing is co-run with other workloads, additional energy overhead is incurred. Figure 2(b) shows measured values of the additional energy cost when a web browser and other applications run concurrently versus when they run separately. That is, if E_B and E_O denote the energy consumption of the web browser and the application running separately, then the energy consumption in the co-run case is E_B + E_O + E_Δ, where E_Δ is the additional energy due to running them simultaneously. Part of the energy overhead E_Δ is due to the longer web page load time, while the rest is due to the additional movement of data that could be cached in the memory hierarchy but is evicted early due to interference. In Figure 2(b), the x-axis represents four different web pages while the y-axis represents the percentage increase (E_Δ/(E_B + E_O + E_Δ)) in energy consumption. The bars represent the intensity of the co-run application. This additional energy cost is high, corresponding to as much as a 29% increase in energy consumption.
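The overhead ratio E_Δ/(E_B + E_O + E_Δ) can be computed from three energy measurements, as in the minimal sketch below. The function name and the joule values are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
def corun_overhead_fraction(e_browser, e_other, e_corun_total):
    """Return E_delta / (E_B + E_O + E_delta).

    e_browser     -- E_B, energy of the browser running alone
    e_other       -- E_O, energy of the other application running alone
    e_corun_total -- E_B + E_O + E_delta, energy measured when both co-run
    """
    e_delta = e_corun_total - (e_browser + e_other)
    return e_delta / e_corun_total


# Hypothetical energies in joules, for illustration only.
share = corun_overhead_fraction(e_browser=6.0, e_other=4.0, e_corun_total=12.5)
print(f"{share:.1%}")
```

Here E_Δ is obtained by subtraction, so the three inputs must cover the same workload and the same measurement boundary.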

C. Optimal Operating Mode for Browsing with Interference

Modern processors are often equipped with dynamic voltage and frequency scaling (DVFS) to allow processors to operate at different performance and power regions. The availability of the DVFS control knob not only enables designs that predict and operate processors at an optimal energy-efficient setting [21]–[31] but also allows designs to control the degree of memory interference experienced in a multiprogrammed setting [32]–[35].

We measure the web page load time (performance) and evaluate the performance-per-watt (PPW), which represents the energy efficiency, for two workload combinations with varying processor frequency settings. Figure 3 shows the performance (web page load time) and the corresponding PPW of the web pages co-run with an interfering application from 729MHz to the maximum 2.2GHz settings. There is a frequency f_E that maximizes the PPW. This is the unconstrained (i.e., deadline-oblivious) frequency setting that results in the maximum battery lifetime. Now suppose that a deadline is imposed, and let the minimum frequency setting that ensures the web page meets the deadline be f_D (unknown). Then the optimal frequency setting f_opt is given by

$$f_{opt} = \begin{cases} f_E, & f_D \le f_E \\ f_D, & f_D > f_E \end{cases} \tag{1}$$

From Figure 3 we can see that for a web page like ESPN, the most energy-efficient frequency setting f_E would result in a large violation of the target load time. Therefore, in this case, f_opt should be equal to f_D. In contrast, for a web page like MSN, the f_D that allows the web page to meet its QoS target would result in a significant energy cost. f_E, on the other hand, results in the optimal energy efficiency while allowing the web page to meet its QoS target. Scheduling the web page load process at the highest possible frequency is certainly an option that guarantees browsing QoS, but it results in drastically lower energy efficiency, leading to 17% and 28% lower PPW than the optimal PPW achieved with f_opt for ESPN and MSN, respectively.
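The case analysis in Equation (1) reduces to taking the larger of the two frequencies. The sketch below illustrates this; the function name and the GHz values are hypothetical and are not read off Figure 3.

```python
def select_fopt(f_e, f_d):
    """Equation (1): use f_E when it already meets the deadline, otherwise f_D.

    f_e -- deadline-oblivious frequency that maximizes PPW
    f_d -- minimum frequency that meets the load time deadline
    """
    return f_e if f_d <= f_e else f_d


# Hypothetical frequencies in GHz, for illustration only.
print(select_fopt(f_e=1.5, f_d=1.1))  # deadline already met at f_E
print(select_fopt(f_e=1.1, f_d=1.9))  # f_E would miss the deadline
```

The result is the same as max(f_e, f_d). The explicit branch is kept so the code mirrors the two cases of the equation.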

Furthermore, as we show in Section V-E, this optimal frequency setting could vary depending on the intensity of co-run applications. These experiments demonstrate the importance of an optimal smartphone frequency controller that can make dynamic predictions on the optimal frequency setting f_opt by considering the memory interference, energy efficiency, and QoS targets simultaneously.

Summary: The real-system results presented so far have demonstrated that processor DVFS is an effective control knob for managing memory interference, allowing web pages to meet their performance QoS deadlines. It can also be used to significantly improve smartphone energy efficiency. With these insights, this paper aims to design an intelligent dynamic frequency prediction algorithm that, in the presence of background processes and co-scheduled workloads, predicts the f_opt frequency setting at runtime to maximize the smartphone energy efficiency while enabling a QoS-aware, satisfactory user web browsing experience.

III. A NEW FREQUENCY GOVERNOR TO OPTIMIZE MOBILE WEB BROWSING WITH INTERFERENCE

We propose a new frequency governor to dynamically set the PPW-optimal frequency subject to satisfying a given deadline for smartphones running a web browser. The algorithm is referred to as DORA, which stands for Dynamic quality Of service, memoRy interference-Aware frequency governor. The objective of DORA is to provide a high-quality web browsing experience for users while maximizing battery lifetime.

[Figure 4: block diagram. The inputs (web page complexity, core utilization, core temperature, L2 MPKI) feed models of web page load time (T_Load), power consumption (P_dyn + P_lkg), and PPW energy efficiency across the frequency range. The DORA frequency decision selects the optimal frequency f_opt that meets the deadline and applies it through a modified CPU frequency governor on a platform with CPU0 to CPU3, a shared L2 cache, an AXI interface, and main memory. An inset plots the MPKI of an example co-scheduled process over time.]

Fig. 4. DORA overview. DORA periodically monitors the core utilization, core temperature and L2 cache MPKI to select the most energy efficient frequency setting that allows the web page to meet its QoS target load time.

The pseudo-code of the DORA implementation is presented in Algorithm 1. The first task is to accurately predict the range of core frequencies, f_i...n, that ensure the web pages complete rendering within the given deadline (lines 4-6 in Algorithm 1). The second task is to accurately identify a core frequency (f_opt) within f_i...n such that the energy efficiency of the smartphone device is maximized (lines 7-15 in Algorithm 1).

The main components of DORA include models to predict the web browser performance and to predict the power consumption of the smartphone. The performance (web page load time) model includes the complexity of web pages, the degree of memory interference introduced by background processes and co-scheduled applications, and the core operating frequencies (Section III-A). The power model accounts for the dynamic power as a function of the core frequency and the leakage power as a function of core temperature (Section III-B). These models are used to estimate the device energy efficiency, performance-per-watt (PPW), at each frequency. Then, the PPW at each setting is used to set the frequency, f_opt, that maximizes the PPW.

To take into account the dynamic nature of interference from co-scheduled applications, DORA monitors the intensity of memory interference, determines f_opt for the current time period, and periodically adjusts the core operating frequency to f_opt. Figure 4 shows this iterative process, as DORA executes in the background.

Algorithm 1 DORA pseudo-code.
 1: function DORA(QoS_Target, Page_Complexity, Core_Utilization, Core_Temperature, L2_MPKI)    ▷ select an energy-efficient, QoS-aware frequency setting
 2:     max_PPW ← 0
 3:     optimal_freq ← 0
 4:     for F in AllFrequencies do
 5:         pred_time ← PredictLoadTime(F)
 6:         if pred_time <= QoS_Target then    ▷ QoS target is met at this frequency
 7:             pred_power ← PredictTotalPower(F)
 8:             pred_PPW ← 1 / (pred_time ∗ pred_power)
 9:             if pred_PPW > max_PPW then
10:                 max_PPW ← pred_PPW
11:                 optimal_freq ← F
12:             end if
13:         end if
14:     end for
15:     SetCoreFrequency(optimal_freq)
16: end function

A. Web Page Load Time Prediction

We construct the web page load time and device power models using regression. A regression model is a hypothesized parametric relationship between the response or dependent variable, y, and a set of N independent variables X_1, X_2, ..., X_N. The unknown model parameters are the coefficients of the polynomial combinations of X_i, which are estimated by minimizing the mean-square error between a set of observed values and model-predicted values. Similar to Zhu et al. [18], we observe that five important parameters of web pages best represent web page complexity and hence the impact on web page load time: the number of DOM tree nodes, class and href attributes, and a and div tags. Therefore, we include these parameters in the web page load time prediction model.
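The five page complexity parameters can be approximated by counting start tags and attributes in the HTML source. The sketch below uses Python's standard html.parser module. It is an assumption-laden illustration rather than the extraction method used in the paper: the class and function names are hypothetical, and only element nodes are counted as DOM tree nodes (text and comment nodes are ignored).

```python
from html.parser import HTMLParser


class PageComplexityCounter(HTMLParser):
    """Count element nodes, class/href attributes, and a/div tags."""

    def __init__(self):
        super().__init__()
        self.counts = {
            "dom_nodes": 0,
            "class_attrs": 0,
            "href_attrs": 0,
            "a_tags": 0,
            "div_tags": 0,
        }

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        self.counts["dom_nodes"] += 1
        if tag == "a":
            self.counts["a_tags"] += 1
        elif tag == "div":
            self.counts["div_tags"] += 1
        for name, _value in attrs:
            if name == "class":
                self.counts["class_attrs"] += 1
            elif name == "href":
                self.counts["href_attrs"] += 1


def page_complexity(html: str) -> dict:
    counter = PageComplexityCounter()
    counter.feed(html)
    counter.close()
    return counter.counts


sample = (
    '<html><body><div class="nav"><a href="/x" class="link">x</a></div>'
    '<div><a href="/y">y</a></div></body></html>'
)
print(page_complexity(sample))
```

A browser builds the DOM with error recovery and script execution, so counts taken from the raw source are only a static approximation of the rendered page.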

Next, we identify the runtime architectural parameters that influence the web page load time. In order to account for the effect of memory interference on the performance of web browsing, the degree of interference in the shared memory is considered, i.e., the access rate in the shared L2 cache and DRAM.

Finally, the core and the memory bus frequencies also have a pronounced impact on the web page load time. Specifically, we note that on a typical SoC, a set of core frequencies maps to a particular memory bus frequency. Therefore, we build piece-wise models for each set of core frequencies that share a single memory bus frequency. With the above insights, we construct a set of independent variables which demonstrate a strong correlation with web page load time (Table I).

TABLE I: List of Independent Variables

X_i    Events
X_1    Number of DOM tree nodes
X_2    Number of class attributes
X_3    Number of href attributes
X_4    Number of “a” tags
X_5    Number of “div” tags
X_6    Shared L2 cache MPKI
X_7    Core frequency
X_8    Memory bus frequency
X_9    Core utilization of co-scheduled task

We evaluate three typical response surfaces: the linear, quadratic, and interaction models (Equations (2)–(4)).

$$L = c_0 + \sum_{i=1}^{N} c_i X_i \tag{2}$$

$$L = c_0 + \sum_{i=1}^{N} c_i X_i + \sum_{i,j \in (1 \ldots N)} c_{i,j} X_i X_j \tag{3}$$

$$L = c_0 + \sum_{i=1}^{N} c_i X_i + \sum_{i,j \in (1 \ldots N),\, i \neq j} c_{i,j} X_i X_j \tag{4}$$

where L is the web page load time, X_i represents the independent variables, and the c_i’s and c_{i,j}’s are the coefficient parameters to be determined. We provide additional details about the web page load time model in Section IV.
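The three response surfaces differ only in which product terms they add to the linear model. The sketch below expands one sample into the terms of each surface and evaluates a prediction for given coefficients. The function names are hypothetical, symmetric products X_i X_j and X_j X_i are merged into a single term (an assumption about how the double sum is parametrized), and fitting the coefficients is left to any least-squares solver.

```python
from itertools import combinations, combinations_with_replacement


def expand_features(x, model):
    """Return the terms [1, X_1..X_N, products...] for one sample x."""
    n = len(x)
    terms = [1.0] + list(x)  # intercept c_0 and linear terms
    if model == "quadratic":
        # all products X_i * X_j, including the squares (i == j)
        pairs = combinations_with_replacement(range(n), 2)
    elif model == "interaction":
        # cross products only (i != j)
        pairs = combinations(range(n), 2)
    elif model == "linear":
        pairs = ()
    else:
        raise ValueError(f"unknown model: {model}")
    terms += [x[i] * x[j] for i, j in pairs]
    return terms


def predict(coefficients, x, model):
    """Evaluate the response surface as a dot product of coefficients and terms."""
    terms = expand_features(x, model)
    if len(coefficients) != len(terms):
        raise ValueError("coefficient count does not match the model")
    return sum(c * t for c, t in zip(coefficients, terms))


print(expand_features([2.0, 3.0], "quadratic"))
```

With the nine variables of Table I, this expansion gives 10 terms for the linear model, 46 for the interaction model, and 55 for the quadratic model.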

B. Dynamic and Leakage Power Prediction

DORA’s power model includes the dynamic power P_dyn and the leakage power P_lkg. P_dyn depends on the compute and memory resource utilization of the cores and the corresponding processor voltage and frequency settings, and P_lkg depends on the operating voltage and temperature.

Dynamic Power Model: Similar to the timing model, web page complexity is a good predictor for the dynamic power consumption. In addition, the degree of memory interference is also a key factor that significantly contributes to the smartphone power consumption. As interference at the L2 cache increases, additional data movement is required to fetch data into the L2 cache upon demand. Thus, the parameter of L2 cache MPKI is included in the dynamic power model. To consider the dynamic power contribution from the cores running background processes or co-scheduled applications, core utilization is used. Core utilization has been shown to have a linear relationship with P_dyn for general compute-bound workloads. Finally, the core operating frequency has a direct effect on P_dyn. We again utilize the same three response surface models as in Equations (2)–(4); however, the dependent variable is now the dynamic power consumption.

Leakage Power Model: Due to the lack of cooling elements in most smartphones, high thermal levels often contribute to a significant portion of the total device power budget in the form of leakage power. Therefore, when predicting the total smartphone power consumption, we must include the effect of leakage power consumption such that the energy-efficient setting prediction for f_opt considers this important, dynamic system runtime condition. We utilize the empirical model of [36] to capture the non-linear power dependence on temperature and voltage as exhibited by CMOS-based technologies:

$$P_{lkg} = k_1 v T^2 e^{\frac{\alpha v + \beta}{T}} + k_2 e^{(\gamma v + \delta)} \quad (5)$$

| Component | Specification |
|---|---|
| Device | Google Nexus 5 |
| Operating System | Android KitKat 4.4 |
| Chipset | MSM8974 Snapdragon 800 |
| Application Processor | Quad-core Krait |
| ISA | ARMv7 |
| L1 I/D Caches | Private 16KB per core |
| L2 Unified Cache | Shared 2MB |
| GPU | Adreno 330 |
| DSP | Hexagon DSP |
| Memory | LPDDR3 2GB |

TABLE II: Device Specification

where k1, k2, α, β, γ, and δ are parameters that depend on circuit topology, and v and T are the operational voltage and temperature of the SoC, respectively. The parameters of the leakage model are determined using non-linear numerical solutions and mean square error minimization. For SoC chipsets that implement multiple voltage and temperature domains, e.g., per-core thermal sensors, we construct a leakage model for each computational unit to increase the model accuracy.
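As an illustration of how the leakage model could be evaluated at runtime, the following minimal sketch computes Plkg for a given voltage and temperature. It assumes Equation (5) has the form k1·v·T²·exp((αv+β)/T) + k2·exp(γv+δ) with T in Kelvin; the coefficient values are hypothetical placeholders chosen only so that the example runs, not the fitted parameters of the Nexus 5 model.

```python
import math


def leakage_power(v, T, k1, k2, alpha, beta, gamma, delta):
    """Evaluate the assumed form of Equation (5).

    v: operating voltage (V), T: temperature (K).
    Returns k1*v*T^2*exp((alpha*v + beta)/T) + k2*exp(gamma*v + delta).
    """
    thermal_term = k1 * v * T ** 2 * math.exp((alpha * v + beta) / T)
    voltage_term = k2 * math.exp(gamma * v + delta)
    return thermal_term + voltage_term


# Hypothetical coefficients (NOT from the paper); real values come from
# fitting measured power with mean square error minimization.
COEFFS = dict(k1=1e-4, k2=0.1, alpha=500.0, beta=-2000.0, gamma=1.0, delta=-2.0)

# 58 C and 65 C expressed in Kelvin, at an assumed 1.0 V operating point.
p_cool = leakage_power(1.0, 331.15, **COEFFS)
p_hot = leakage_power(1.0, 338.15, **COEFFS)
print(f"leakage at 58C: {p_cool:.3f} W, at 65C: {p_hot:.3f} W")
```

With these placeholder coefficients the leakage estimate grows with both temperature and voltage, which is the qualitative behavior the model is meant to capture.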

IV. METHODOLOGY

A. Real Device Measurement Infrastructure

We perform all experiments on a Google Nexus 5 smartphone which has a Qualcomm MSM8974 Snapdragon 800 chipset [37]. The MSM8974 includes four Krait cores, each of which is equipped with private L1 instruction and data caches (16KB each). All four cores share a 2MB L2 last-level cache. The SoC consists of a 2GB Low Power DDR (LPDDR) memory which is shared between the application processor and the various accelerators. The chipset has 14 different frequency settings available, ranging from 300MHz to 2265MHz. Table II summarizes the device specifications. The device runs the rooted Android OS. We also configure the kernel to enable performance profiling using perf [38]. We used a National Instruments Data Acquisition Unit (DAQ) to measure smartphone power consumption. Note that our smartphone power measurement and energy efficiency results include the power consumption of the entire smartphone (the display, the application processors, SSD, and all other active components on the device). Thus, the energy efficiency improvement results directly translate to battery life improvement.

DORA is compared with the existing Android frequency governors, interactive and performance⁴. The performance governor always operates the cores at the highest available frequency of 2.2GHz. The interactive governor, on the other hand, chooses a frequency setting based on the processor utilization. We use interactive as the baseline for our studies as it is the default option on most smartphones to date.

B. Workload Characteristics

We use the 18 most visited web pages reported on “Alexa top 500 websites” [39] that load completely on an Android smartphone. These pages represent a wide variety of domains such as online shopping, sports, entertainment, news, and social media. They also vary widely in complexity, resulting in load times ranging from hundreds of milliseconds to 4 seconds when running alone. The Firefox mobile web browser is used to load the pages. The source code of the web pages is instrumented to enable the load time measurement. All web pages are stored in memory, eliminating any non-deterministic network fluctuation. Table III shows the web pages used in this paper and their classification. A similar experimental methodology is used in recent works [17], [18], [24], [34], [40]–[43].

⁴ We do not consider powersave as an effective governor as it results in unreasonably long load times (7 - 26 seconds) for all workloads while also being extremely energy inefficient.

| Intensity | Load Time | Web Pages |
|---|---|---|
| Low | < 2 Sec | Amazon, Twitter, Youtube, 360, MSN, BBC, CNN, Reddit, Alibaba, eBay, Alipay, Instagram |
| High | > 2 Sec | IMDB, ESPN, Hao123, Imgur, Aliexpress, Firefox |

| Intensity | L2 MPKI | Co-run Applications |
|---|---|---|
| Low | < 1 | Image processing (srad, heart-wall), clustering analysis (kmeans), temperature management (hot-spot) |
| Medium | 1 - 7 | Image processing (srad2), tree and graph traversal (bfs, b+tree) |
| High | > 7 | Sensor data analysis (back-propagation), bioinformatics (needleman-wunsch) |

TABLE III: Web Page and Co-run Application Classification

We use a diverse set of workloads to serve as the co-run applications. The algorithms behind these workloads form the basic building blocks of current and future smartphone workloads, representing sensor data analysis, image processing, thermal prediction and management, video games, and medical applications [44]. We classify the algorithms based on their memory intensity (Table III). All the co-run applications are cross-compiled using the ARM-Android NDK toolchains and are statically assigned to a specific core. The applications are pushed to the device and launched using the Android debug bridge (adb) terminal.

We construct workloads to mimic multiprogrammed execution scenarios. Specifically, the Firefox browser is executed on two cores⁵ while a co-run application is executed on the third core of the application processor⁶. We create workloads by combining a web page with an application from each memory intensity category shown in Table III. This results in a total of 54 workload combinations, i.e., 18 web pages, each co-scheduled with an application from the low, medium, and high intensity categories. 14 of the 18 web pages have been used to construct the models and thus the multiprogrammed workloads formed with the 14 web pages are considered as the training set, resulting in 42 Webpage-Inclusive combinations⁷. The remaining 12 workloads form the test set and are referred to as the set of Webpage-Neutral workloads.

⁵ A recent characterization study has shown that the average thread-level parallelism for mobile Apps hovers around 2 [45]. Performance scaling with respect to the number of cores plateaus at 2 cores. Furthermore, as an increasing number of cores are integrated into SoCs from one generation to the next, we expect core-level computation resource contention is less of a performance problem as compared to the highly shared and contended memory subsystem.

⁶ The fourth core was switched off for all our experiments.

Fig. 5. The cumulative distribution of prediction errors for performance and power models. (a) Web page load time model; (b) web page power model. (Plots: percentage of web pages versus error.)
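The workload construction described in this subsection can be sketched as follows. The page names come from Table III; which 14 pages form the training set is not stated in the text, so the split below (the first 14 pages in list order) is purely a hypothetical choice for illustration.

```python
from itertools import product

# Web pages from Table III (12 low-complexity, 6 high-complexity).
WEB_PAGES = [
    "Amazon", "Twitter", "Youtube", "360", "MSN", "BBC", "CNN", "Reddit",
    "Alibaba", "eBay", "Alipay", "Instagram",
    "IMDB", "ESPN", "Hao123", "Imgur", "Aliexpress", "Firefox",
]
# One co-run application is drawn from each memory intensity category.
INTENSITIES = ["low", "medium", "high"]

# Every web page is paired with one application per intensity category.
workloads = list(product(WEB_PAGES, INTENSITIES))

# Hypothetical split: the paper does not name the 14 training pages.
training_pages = set(WEB_PAGES[:14])
webpage_inclusive = [w for w in workloads if w[0] in training_pages]
webpage_neutral = [w for w in workloads if w[0] not in training_pages]

print(len(workloads), len(webpage_inclusive), len(webpage_neutral))
```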

C. Model Parameters and DORA Configuration

Over 300 measurements of power and web page load times are taken by executing multiple workload combinations at different frequency settings, using the setup described in Section IV-A. The observations are used to determine the coefficients of the power and performance models using mean square error minimization.
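To make the coefficient fitting step concrete, the sketch below fits a linear model by ordinary least squares (which minimizes the mean square error) using only the Python standard library. It is an illustration under stated assumptions rather than the authors' tooling: the two features (core frequency and L2 MPKI), the synthetic observations, and the function names `solve` and `fit_least_squares` are all hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_least_squares(rows, targets):
    """Return [c0, c1, ...] minimizing the squared error of c0 + sum(ci * xi)."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(k)]
    return solve(XtX, Xty)  # normal equations: (X^T X) c = X^T y


# Synthetic, noise-free observations (hypothetical values):
# (core frequency in GHz, L2 MPKI) -> power in W.
rows = [(0.7, 0.5), (0.9, 3.0), (1.2, 8.0), (1.5, 1.0), (1.9, 6.0), (2.2, 2.0)]
true_c = [0.5, 1.2, 0.05]
targets = [true_c[0] + true_c[1] * f + true_c[2] * m for f, m in rows]

coeffs = fit_least_squares(rows, targets)
print([round(c, 4) for c in coeffs])
```

The interaction and quadratic variants can reuse the same routine by appending product or squared terms to each feature row before fitting.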

For DORA’s decision making granularity, we evaluate three decision intervals of 50ms, 100ms, and 250ms. We observe that while 250ms is too slow to capture web page phases, 50ms and 100ms decision intervals perform similarly. Therefore, we choose the less intrusive 100ms decision interval for DORA.

V. EVALUATION RESULTS AND ANALYSIS

A. Performance and Power Model Evaluation

We evaluate three regression models for the performance and power models: simple linear regression, linear regression with cross product terms (interaction), and quadratic.

Performance Model Accuracy: We observe that the interaction and quadratic models achieve the highest accuracy for web page load time prediction. Due to the relative simplicity of the interaction model, we choose it to model the web page load time. The average error rate for this web page load time model is 2.5%. Figure 5(a) shows the cumulative distribution of prediction errors. Each (x, y) shows the fraction of web pages (y) which have errors lower than or equal to (x). About 87.5% of the web pages have less than 5% error with a maximum error of 10%.

Power Model Accuracy: In the case of power consumption estimation, we observe that all three models (linear, interaction and quadratic) achieve a similar prediction accuracy. Since a linear model is simpler than the other two, we adopt it to predict the power consumption of the web page load process in the presence of interference. The average error rate for the power model is 4%. Figure 5(b) shows the cumulative distribution of prediction errors in the power model. For 75% of web pages, the model gives less than 5% error, and for 90% of web pages, it gives less than 10% error.

⁷ Webpage-Inclusive workloads are constructed by co-scheduling one training set web page with one distinct interfering application, since the most commonly-visited web pages are relatively stable while co-scheduled applications or background processes vary more frequently.

Fig. 6. PPW of different frequencies for Youtube when it is co-scheduled with a high memory intensity application. (Plot: PPW versus core frequency in GHz, with fopt−1, fopt, and fopt+1 marked; annotations: Δt = +20.3%, ΔP = −13.3% at fopt−1 and Δt = −20.8%, ΔP = +34.8% at fopt+1.)

B. Sensitivity of fopt to Model Errors

The value of the optimal frequency, fopt, is not highly sensitive to model errors. This is because the available processor frequencies are discretized (typically in steps of a few hundred MHz). Therefore, as long as the error rate of the estimated PPW for fopt is lower than the PPW delta between fopt and its adjacent frequency settings, fopt−1 and fopt+1, DORA’s prediction for fopt would still be correct. Figure 6 illustrates this with an example of the PPW variation for loading a Youtube web page, co-run with a memory-intensive application. For this workload, the highest energy efficiency is attained at fopt = 1.2GHz. As noted in Figure 6, the neighboring frequency fopt−1 would result in a load time difference of ∆t = +20.3% and a power difference of ∆P = −13.3%, compared to those at fopt. Similarly, the neighboring frequency fopt+1 would result in ∆t = −20.8% and ∆P = +34.8%. Let te and Pe be the percent error in the performance and power models. Then the PPW in the presence of error is

$$PPW = \frac{1}{P \cdot t \cdot (1 + P_e) \cdot (1 + t_e)} \quad (6)$$

where P and t are the actual power and web page load time values. DORA would select the optimal energy-efficient frequency setting (fopt = 1.2GHz) as long as the errors in the performance and power models are small enough such that DORA does not mistakenly choose either of the neighboring frequencies as the optimal setting. Analysis of this workload shows an error of +0.26% in power prediction and a 1.32% error in load time prediction, which can be easily tolerated by the discretization of frequency settings. This means DORA’s selection of fopt is accurate in the presence of small errors in power and load time prediction. This holds true for most other workloads as well.
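The tolerance argument above can be checked with a few lines of arithmetic. The sketch below normalizes the PPW at fopt to 1 and applies the reported deltas and model errors through Equation (6). Two simplifying assumptions are made here and are not taken from the paper: the load time error is treated as an overestimate (+1.32%), and the PPW estimates of the neighboring frequencies are treated as error-free.

```python
def relative_ppw(delta_t, delta_p):
    """PPW relative to the true PPW at fopt, which is normalized to 1.

    delta_t and delta_p are fractional changes (or model errors) in load
    time and power, following the form of Equation (6).
    """
    return 1.0 / ((1.0 + delta_p) * (1.0 + delta_t))


# Neighboring frequency settings, using the deltas reported for Figure 6.
ppw_lower = relative_ppw(delta_t=0.203, delta_p=-0.133)   # fopt-1
ppw_higher = relative_ppw(delta_t=-0.208, delta_p=0.348)  # fopt+1

# Estimated PPW at fopt when both models overestimate (assumed signs).
ppw_est_fopt = relative_ppw(delta_t=0.0132, delta_p=0.0026)

print(f"fopt-1: {ppw_lower:.3f}, fopt+1: {ppw_higher:.3f}, "
      f"fopt (with model error): {ppw_est_fopt:.3f}")
```

Under these assumptions the estimated PPW at fopt stays roughly 2.5 percentage points above the better neighbor, so the model errors do not change the selected frequency.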

Fig. 7. Average energy efficiency and web page load time comparison of DORA with other governors across all evaluated workloads. (a) Energy efficiency (PPW) normalized to interactive for performance, DL, EE, and DORA, over the Webpage-Inclusive, Webpage-Neutral, and all workloads. (b) Fraction of web pages versus load time (seconds) for interactive, performance, DL, EE, and DORA, with the deadline marked.

C. Performance and Energy Efficiency Trends

Figures 7(a) and 7(b) show the average energy efficiency improvement and the distribution of web page load time achieved by DORA and other governors. On average, DORA improves the smartphone energy efficiency by 16% compared to the baseline interactive governor (18% and 10% for the Webpage-Inclusive and the Webpage-Neutral workloads, respectively). Frequency settings based on the load time predictions meet the QoS target of the 3-second deadline whenever possible with the available frequency settings. Although performance and interactive generally achieve faster web page load time than DORA (Figure 7(b)), this comes at the cost of lower energy efficiency (Figure 7(a)). Furthermore, DORA performs as well as a static offline optimal configuration, Offlineopt,⁸ and matches the energy efficiency improvement brought by Offlineopt.

DORA is also compared with two hypothetical governors, Deadline (DL) and Energy Efficient (EE). Deadline is a governor which ensures that the web page load time target is met while disregarding energy efficiency, and Energy Efficient is a governor which ensures that energy efficiency is maximized while disregarding any QoS constraint. On average, EE results in a 19% improvement in energy efficiency when compared to interactive. Although EE results in a greater energy efficiency than DORA, it also results in 21% of the workloads missing the QoS targets by a large margin, as seen in Figure 7(b). EE could result in web page load times as large as 6 seconds, which are simply unacceptable from a user satisfaction standpoint. Using the DL governor, the 3-second load time deadline is satisfied as long as it is feasible for a given workload combination. While this allows for more web pages to meet the QoS targets (compared to EE), it results in sub-optimal energy efficiency, as seen in Figure 7(a). The sub-optimal energy efficiency of DL and the large QoS violations of EE highlight the importance of simultaneously considering both the web page load time deadline and the dynamic and leakage power consumption.

⁸ Offlineopt represents the single frequency setting that maximizes the energy efficiency achieved while loading the web page within the 3-second deadline. We obtained the PPW results for Offlineopt by enumerating all possible frequency settings for ten randomly chosen workloads from the workloads constructed in this paper because the time taken to generate the PPW results for all possible frequency settings for all available workloads is prohibitively high.

Fig. 8. Energy efficiency comparison of DORA with other governors for all evaluated workloads. Each point on the x-axis represents a different workload whose energy efficiency improvement for different governors is plotted along the y-axis. The workloads are sorted in the order of energy efficiency improvement achieved by DORA. (Plot: energy efficiency (PPW) normalized to interactive versus workload number for interactive, performance, DL (fD), EE (fE), and DORA.)

Importance of the Consideration of Memory Interference and Other Dynamic Conditions: The fopt setting depends on the QoS deadline specified by users or by the OS (Section V-G), the specific web page being loaded, and the dynamic system conditions. In particular, we observe that fopt changes significantly (often by more than one frequency setting) when considering the varying degree of memory intensity of co-scheduled and background processes and the operating temperature of the smartphone (Section V-F). If memory interference is not considered in the performance prediction, a sub-optimal fopt is reached, leading to missed deadlines for more than 64% of the multitasking workloads and to energy efficiency degradation.

D. The Adaptive Nature of DORA

DORA is designed to dynamically predict the web page load time and energy efficiency in order to determine fopt given in Equation 1. It does so by estimating fE, the most energy-efficient frequency setting, and fD, the lowest frequency that meets the deadline. Figure 8 shows the improvement in energy efficiency achieved by the different governors, for all evaluated workloads. For workloads 20 and beyond, DORA closely follows the energy efficiency trend of the EE governor. Thus, both DORA and EE result in an average improvement in energy efficiency of 24% when compared to the interactive governor. It is important to note that for these workloads, DORA’s predictions result in meeting the 3-second load time deadline while achieving the maximum possible energy efficiency.

For workloads 1 through 19, the EE governor violates the 3-second web page load time deadline. We denote these workloads as “fE < fD”. In such scenarios, since EE ignores the QoS, it continues to operate at lower but energy efficient frequency settings even when these settings do not meet the performance target. EE therefore results in large violations in QoS.

For workloads where EE is not able to meet the web page load time QoS target, DORA correctly identifies the frequency range that allows the web page to meet its QoS target, and then selects the most energy efficient frequency setting within that range. This frequency often coincides with the frequency chosen by the DL governor for these workloads. Figure 8 shows that DORA follows DL closely for these workloads. Generally, DORA satisfies the 3-second deadline as long as the deadline is met by the performance governor. For workloads where the web pages cannot load within the 3-second deadline even at the highest frequency setting, DORA prioritizes QoS and chooses the highest frequency setting to ensure that the web pages are loaded as fast as possible. This resiliency to variations highlights the fact that an optimal frequency governor must make its frequency decision considering runtime factors, such as memory interference, in order to guarantee the quality of web browsing in the most energy efficient way.
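The selection behavior described in this subsection can be sketched as a small function. This is a simplified illustration and not the authors' implementation: the function name `select_fopt`, the dictionary-based inputs, and all numeric values below are hypothetical, and the predicted load times and PPW values would in practice come from the trained models.

```python
def select_fopt(freqs, predicted_load_time, predicted_ppw, deadline_s=3.0):
    """Pick the most energy-efficient frequency that meets the deadline.

    freqs: available frequency settings (GHz).
    predicted_load_time / predicted_ppw: dicts keyed by frequency.
    If no setting meets the deadline, fall back to the highest frequency
    so that the page loads as fast as possible.
    """
    feasible = [f for f in freqs if predicted_load_time[f] <= deadline_s]
    if not feasible:
        return max(freqs)
    return max(feasible, key=lambda f: predicted_ppw[f])


# Hypothetical predictions for one workload (illustrative values only).
FREQS = [0.7, 0.9, 1.2, 1.5, 1.9, 2.2]
LOAD_TIME = {0.7: 4.1, 0.9: 3.4, 1.2: 2.8, 1.5: 2.4, 1.9: 2.0, 2.2: 1.8}
PPW = {0.7: 0.26, 0.9: 0.29, 1.2: 0.28, 1.5: 0.25, 1.9: 0.22, 2.2: 0.20}

print(select_fopt(FREQS, LOAD_TIME, PPW, deadline_s=3.0))  # fE misses the deadline
print(select_fopt(FREQS, LOAD_TIME, PPW, deadline_s=5.0))  # every setting is feasible
print(select_fopt(FREQS, LOAD_TIME, PPW, deadline_s=1.0))  # no setting is feasible
```

In this example fE is 0.9 GHz and, for the 3-second deadline, fD is 1.2 GHz, so the function returns fD when fE is infeasible and fE once the deadline is relaxed.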

E. DORA with Different Web Page Complexity and Memory Interference Intensity

The complexity of web pages varies and so does the intensity of co-scheduled applications. To explore the behavior of DORA in such varied conditions, we take a closer look at one low complexity web page (Amazon) and one high complexity web page (IMDB). Figure 9 shows the behavior of DORA and the other governors for these two web pages when they are co-scheduled with low, medium and high intensity applications. The improvement in the PPW compared to the interactive governor is shown on the primary y-axis. The secondary y-axis shows the web page load time. Three groups of plots are shown, corresponding to low, medium and high memory intensities. The abscissa represents the frequencies determined by the governors performance, DL, EE, and DORA. To avoid confusion, note that the vertical bars represent the PPW and the line graph represents the web page load time.

Fig. 9. Interaction of DORA with web page complexity and interfering application intensity. (Panels: IMDB and Amazon, each with low, medium, and high interference; bars: PPW normalized to interactive; line: web page load time in seconds. Frequencies in GHz for fD (DL), fE (EE), and DORA are, for IMDB: 1.9, 0.9, 1.9 (low); 1.9, 0.9, 1.9 (medium); 2.2, 1.7, 2.2 (high); and for Amazon: 0.7, 1.9, 1.9 (low); 0.7, 1.7, 1.7 (medium); 0.8, 0.9, 0.9 (high).)

Frequency of operation: Because the low complexity web pages are relatively simple to load, fD typically hovers around 0.7-0.9 GHz. For example, Amazon in Figure 9 has its fD at 0.7 GHz. These low frequency points are often not the most energy efficient frequency settings. The fE point for Amazon occurs at 1.9 GHz when it is co-scheduled with a low intensity application. Therefore, the low complexity web pages typically fall under the “fE > fD” category. For this category of workloads, as discussed earlier, DORA’s behavior is similar to that of EE. DORA and EE both select the same operating frequencies for Amazon. As the complexity of the web pages increases, fD shifts to a higher frequency range. For instance, IMDB, a high complexity web page, has its fD at 1.9 GHz or 2.2 GHz, depending on the intensity of the co-scheduled application. As a result, these high complexity web pages typically fall under the “fE ≤ fD” category and DORA follows fD’s behavior. DORA and DL choose the same frequency for IMDB.

Energy efficiency improvement: DORA improves the smartphone energy efficiency by as much as 27% for Amazon. On the other hand, for workloads in the situation where “fE ≤ fD”, it is expected that fopt should be closer to the highest frequency setting. This in turn means that the energy efficiency improvement achieved by DORA is more modest when compared to the interactive governor. DORA achieves a modest 1%-10% energy efficiency improvement for IMDB, in order to meet the web page load time deadline.

Impact of memory interference: The presence of memory interference influences the behavior of the smartphone SoC in multiple ways. First, as memory interference changes, the frequency of operation might change. For example, Figure 9 shows that for Amazon, fD changes from 0.7 GHz to 0.8 GHz as we move from a medium to a high intensity interfering application. We observe a similar behavior for IMDB as well, where fD changes from 1.9 GHz to 2.2 GHz. Next, as expected, the web page load time is often degraded as memory interference increases. This is evident in the case of Amazon and IMDB. This web page load time increase could eventually result in web pages missing the QoS targets.

Fig. 10. Impact of leakage power. (a) Energy efficiency comparison for DORA and DORA without the leakage power estimation (energy efficiency (PPW) normalized to interactive). (b) Power consumption (W) at different core frequencies (GHz) under two ambient conditions, room temperature and low ambient temperature, with fopt marked for each.

F. Impact of Leakage Power

An important feature of DORA is its consideration of temperature as it influences the selection of fopt. Prior work, such as [17], does not consider the leakage power component when optimizing for energy efficiency. This is likely to lead to a sub-optimal frequency setting. In order to evaluate the impact of leakage power on energy efficiency, we compare DORA with a configuration that does not take leakage power into account, DORA_no_lkg. That is, DORA_no_lkg makes the frequency selection decision using the dynamic power consumption component only. Figure 10(a) shows the energy efficiency of DORA and DORA_no_lkg for the web page Amazon when it is co-scheduled with a medium memory intensity application. DORA achieves 10% higher energy efficiency compared to the configuration that ignores the dynamic operating temperature of the smartphone.

We further explore the impact of temperature (and consequently leakage power) by evaluating the smartphone power consumption at different frequencies and fopt under room temperature and cooler ambient temperature. Ignoring the processor operating temperature when determining fopt leads to a sub-optimal solution in terms of energy efficiency, with 10% lower battery lifetime (Figure 10(a)). Figure 10(b) shows that there is a significant increase in the power consumption at higher frequencies at room temperature, compared to a lower ambient temperature condition. This increase in power consumption can be attributed to the additional leakage power due to the increase in device temperature. We observe that the maximum device temperature increases from 58°C to 65°C when operating at 1.9 GHz under the ambient room temperature. The temperature rise is significantly higher at higher frequencies than at lower frequencies, making leakage power a significant contributor to device power at high frequencies. This increase in temperature and consequently leakage power results in the optimal operating point, fopt, shifting from 1.9 to 1.7 GHz for this workload. DORA is able to predict this significant additional leakage power as shown in Equation 5 and identify fopt accurately, leading to higher energy efficiency.

Fig. 11. DORA frequency selection for different deadlines. (Plot: core frequency in GHz versus web page load time deadline in seconds, from 1 to 10, with the regions fopt = fD and fopt = fE marked.)

G. DORA with Varying Performance QoS Deadline

DORA is designed to predict web page load time and smartphone power in order to operate a smartphone at the most energy efficient condition, fopt. We have already seen that DORA chooses either fE, the most energy-efficient frequency setting, or fD, the lowest frequency that meets the browser deadline. To highlight the behavior of DORA for varying deadlines, we look at the fopt chosen by DORA when MSN is being loaded with a high memory intensity application. Note that the models used by DORA do not need to be re-parameterized for a different QoS deadline. Figure 11 shows fopt for different deadlines, from 1 to 10 seconds. For a demanding performance deadline, e.g., 1 to 2 seconds, DORA chooses the highest frequency in order to meet the QoS target. When the performance target is relaxed to 3 seconds, fopt becomes 1.7GHz, which is the most energy-efficient, deadline-meeting setting. When the performance target is further relaxed, fopt switches from fD to fE, which is 1.19GHz for this workload.

H. Overhead

From the implementation standpoint, DORA includes three key operations, namely, (1) periodically assessing hardware performance counters, (2) computing the optimal frequency point, fopt, and (3) switching the core frequency to fopt if the newly computed fopt is different from the current setting.

DORA is a lightweight controller. Its time and power overhead coming from the first two tasks mentioned above is less than 1%. This is because the first two steps are non-intrusive to the web page load process and occur in the background. Although most prior works assume a relatively small overhead for the frequency scaling operation mentioned above, we observe that this overhead is slightly higher than that of the first two tasks, with a maximum of 3% of execution time. DORA monitors the variation in the runtime system performance conditions and decides to change the frequency setting only when the system performance conditions have changed significantly enough to alter fopt. As a result, the overhead of DORA depends on the number of times the frequency is scaled during the web page load process. This overhead is negligible for workloads which enjoy a relatively stable phase behavior. For other workloads where DORA scales the processor frequency often, the energy efficiency improvement brought by DORA is high; thus, the overhead associated with the needed frequency scaling is considered to be worthwhile. Overall, the performance and energy efficiency results presented in this paper include the overhead incurred and still show a significant 18% energy efficiency improvement over the interactive governor.
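The three operations listed in this subsection can be arranged into a periodic control loop, sketched below. This is only an illustration under stated assumptions: the callables `read_counters`, `compute_fopt`, and `set_frequency` are hypothetical stand-ins for the counter sampling, model evaluation, and frequency switching mechanisms, which the text does not describe at the code level. The 100ms interval follows Section IV-C.

```python
import time


def dora_loop(read_counters, compute_fopt, set_frequency, current_freq,
              interval_s=0.1, steps=10, sleep=time.sleep):
    """Run a fixed number of decision intervals and count frequency switches.

    (1) sample the counters, (2) compute fopt, (3) switch only if fopt differs
    from the current setting, so stable phases incur no switching overhead.
    """
    switches = 0
    for _ in range(steps):
        counters = read_counters()        # (1) assess performance counters
        fopt = compute_fopt(counters)     # (2) compute the optimal frequency
        if fopt != current_freq:          # (3) switch only when fopt changes
            set_frequency(fopt)
            current_freq = fopt
            switches += 1
        sleep(interval_s)
    return current_freq, switches


# Usage with stand-in callables (all values hypothetical).
samples = iter([0.5, 0.6, 8.0, 8.2, 0.4])        # fake L2 MPKI readings
applied = []                                      # records requested switches
final_freq, n_switches = dora_loop(
    read_counters=lambda: next(samples),
    compute_fopt=lambda mpki: 1.9 if mpki > 7 else 1.2,
    set_frequency=applied.append,
    current_freq=1.2,
    steps=5,
    sleep=lambda s: None,                         # skip real waiting in the demo
)
print(final_freq, n_switches, applied)
```

Because the frequency is written only when the computed fopt changes, the number of switches, and hence the scaling overhead, tracks how often the workload changes phase.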

VI. RELATED WORK

The study of smartphone web browser performance has been the subject of many recent works as it is one of the most widely used applications on mobile devices. Web browser performance optimization can be achieved through enhancements at multiple levels of the hardware-software stack. Many of the early works on browser optimization focused on improving browser specific tasks through software techniques such as task parallelization, browser rendering, and smarter browser caching, such as [13], [14], [16]. Butkiewicz et al. [19] related the web page complexity with important web page primitives and characterized the impact of web page complexities on performance. Thiagarajan et al. [15] presented a detailed breakdown of web page energy consumption based on the different web page primitives and proposed a set of web page design recommendations to minimize energy consumption. Similarly, Bui et al. [13] used web page primitives to design energy efficient web browsers.

Prior works have also suggested that micro-architectural techniques such as branch predictors and advanced cache and prefetcher management techniques can improve browser performance and, therefore, reduce its energy consumption significantly [40], [46]. Zhu et al. proposed hardware specializations to improve the performance and energy efficiency of mobile web browsing [47]. Another recent work by Fan et al. [48] demonstrated improved browser efficiency with asymmetric multiprocessors sharing the cache. These software and micro-architecture level works are orthogonal to our proposed DORA, which performs energy efficiency optimization at the system level. Therefore, we expect the performance and energy efficiency gains from DORA to be additive.

Many other system level designs aimed at improving the energy efficiency and QoS of mobile browsers have been reported. Lo et al. [42] considered the response time and the limits of human perception together to find opportunities to throttle frequencies while executing interactive applications on an Odroid SoC board. While their design is QoS-aware, it is not necessarily energy optimal, as we demonstrate with the DL governor in this paper. Zhu et al. [18] developed models for web page load time and energy consumption to design a deadline- and energy-aware governor, but the effect of memory interference from background or co-scheduled processes is not considered. Another recent work by Gaudette et al. [24] developed probabilistic models to account for non-determinism in web page load time and demonstrated improved QoS for the web browser.

Although many of the above works optimize for QoS and energy efficiency for mobile web browsing, none of them explicitly considers the effect of memory interference. As we have shown in this paper, memory interference plays a key role and impacts both web page load time and energy efficiency significantly for modern smartphones. This is the first work that designs an effective solution to provide a performance QoS guarantee for mobile web browsing in the presence of background processes and other co-scheduled applications.

VII. CONCLUSION

We develop a framework capable of characterizing, profiling, and predicting the execution time and power consumption of a given web page as the foreground application, subject to memory interference from co-scheduled applications and smartphone operating temperature. The proposed framework performs offline training for the performance and power models. This training is conducted on the most viewed web pages co-scheduled with interfering applications commonly seen on mobile devices. The models are shown to be sufficiently accurate in predicting web page load times and power consumption, with 2.5% and 4.0% average error, respectively. Once the models are parameterized, DORA performs runtime, energy efficiency optimal control of the application processor to ensure that web pages are loaded within a targeted QoS level, by modulating the processor core frequencies to simultaneously control memory interference and performance/power states of the smartphone. This is done by reading the web page characteristics along with the dynamically changing states of the co-scheduled processes and the smartphone: the memory access intensity, core utilization, and temperature. The proposed design is implemented and evaluated on a Google Nexus 5 smartphone and shown to be capable of providing satisfactory performance for mobile web browsing despite the interference from memory intensive co-scheduled applications.

ACKNOWLEDGEMENTS

The authors would like to thank the anonymous reviewers for their useful feedback. This work is supported in part by the National Science Foundation under grants for the I/UCRC Center for Embedded Systems #1361926, CCF #1525462, and CCF #1652132.

REFERENCES

[1] Y. Zhu, M. Halpern, and V. J. Reddi, “The role of the CPU in energy-efficient mobile web browsing,” IEEE Micro, vol. 35, no. 1, 2015.

[2] Z. Wang, F. X. Lin, L. Zhong, and M. Chishtie, “Why are web browsersslow on smartphones?” in Proceedings of the 12th Workshop on MobileComputing Systems and Applications, 2011.

[3] “SPDY: an experimental protocol for a faster web,” https://www.chromium.org/spdy/spdy-whitepaper, Chromium.

[4] M. Dong and L. Zhong, “Chameleon: A color-adaptive web browser for mobile OLED displays,” in International Conference on Mobile Systems, Applications, and Services, 2011, pp. 85–98.

[5] J. Flinn and M. Satyanarayanan, “Energy-aware adaptation for mobile applications,” in Proceedings of the Symposium on Operating Systems Principles, 1999.

[6] V. G. Moshnyaga and E. Morikawa, “LCD display energy reduction by user monitoring,” in International Conference on Computer Design, 2005.

[7] M. Schuchhardt, S. Jha, R. Ayoub, M. Kishinevsky, and G. Memik, “CAPED: Context-aware personalized display brightness for mobile devices,” in International Conference on Compilers, Architecture and Synthesis for Embedded Systems, 2014, pp. 19:1–19:10.

[8] D. Shin, Y. Kim, N. Chang, and M. Pedram, “Dynamic voltage scaling of OLED displays,” in Design Automation Conference, 2011.

[9] K. Yan, X. Zhang, J. Tan, and X. Fu, “Redefining QoS and customizing the power management policy to satisfy individual mobile users,” in International Symposium on Microarchitecture, 2016.

[10] A. Rahmati and L. Zhong, “Context-for-wireless: Context-sensitive energy-efficient wireless data transfer,” in International Conference on Mobile Systems, Applications and Services, 2007, pp. 165–178.

[11] M. Halpern, Y. Zhu, and V. J. Reddi, “Mobile CPU’s rise to power: Quantifying the impact of generational mobile CPU design trends on performance, energy, and user satisfaction,” in International Symposium on High Performance Computer Architecture, 2016, pp. 64–76.

[12] C. G. Jones, R. Liu, L. Meyerovich, K. Asanovic, and R. Bodik, “Parallelizing the web browser,” in USENIX HotPar, 2009.

[13] D. H. Bui, Y. Liu, H. Kim, I. Shin, and F. Zhao, “Rethinking energy-performance trade-off in mobile web page loading,” in Proceedings of the International Conference on Mobile Computing and Networking, 2015.

[14] L. A. Meyerovich and R. Bodik, “Fast and parallel webpage layout,” in Proceedings of the International Conference on World Wide Web, 2010.

[15] N. Thiagarajan, G. Aggarwal, A. Nicoara, D. Boneh, and J. P. Singh, “Who killed my battery?: Analyzing mobile browser energy consumption,” in Proceedings of the International Conference on World Wide Web, 2012.

[16] K. Zhang, L. Wang, A. Pan, and B. B. Zhu, “Smart caching for web browsers,” in Proceedings of the International Conference on World Wide Web, 2010.

[17] Y. Zhu, M. Halpern, and V. Reddi, “Event-based scheduling for energy-efficient QoS (eQoS) in mobile web applications,” in Proceedings of the International Symposium on High Performance Computer Architecture, 2015.

[18] Y. Zhu and V. J. Reddi, “High-performance and energy-efficient mobile web browsing on big/LITTLE systems,” in Proceedings of the International Symposium on High Performance Computer Architecture, 2013.

[19] M. Butkiewicz, H. V. Madhyastha, and V. Sekar, “Understanding website complexity: Measurements, metrics, and implications,” in Proceedings of the ACM Conference on Internet Measurement Conference, 2011.

[20] “How loading time affects your bottom line,” https://blog.kissmetrics.com/loading-time/.

[21] P. Bogdan, R. Marculescu, S. Jain, and R. T. Gavila, “An optimal control approach to power management for multi-voltage and frequency islands multiprocessor platforms under highly variable workloads,” in Proceedings of the International Symposium on Networks on Chip, 2012.

[22] G. Dhiman and T. S. Rosing, “Dynamic voltage frequency scaling for multi-tasking systems using online learning,” in Proceedings of the International Symposium on Low Power Electronics and Design, 2007.

[23] J. Donald and M. Martonosi, “Techniques for multicore thermal management: Classification and new exploration,” in Proceedings of the International Symposium on Computer Architecture, 2006.

[24] B. Gaudette, C.-J. Wu, and S. Vrudhula, “Improving smartphone user experience by balancing performance and energy with probabilistic QoS guarantee,” in Proceedings of the International Symposium on High Performance Computer Architecture, 2016.

[25] V. Hanumaiah, D. Desai, B. Gaudette, C. Wu, and S. B. K. Vrudhula, “STEAM: A smart temperature and energy aware multicore controller,” ACM Trans. Embedded Comput. Syst., vol. 13, no. 5s, 2014.

[26] V. Hanumaiah and S. B. K. Vrudhula, “Energy-efficient operation of multicore processors by DVFS, task migration, and active cooling,” IEEE Trans. Computers, vol. 63, no. 2, pp. 349–360, 2014.

[27] C. Isci, A. Buyuktosunoglu, and M. Martonosi, “Long-term workload phases: Duration predictions and applications to DVFS,” IEEE Micro, vol. 25, no. 5, pp. 39–51, 2005.

[28] C. Isci, G. Contreras, and M. Martonosi, “Live, runtime phase monitoring and prediction on real systems with application to dynamic power management,” in Proceedings of the International Symposium on Microarchitecture, 2006.

[29] K. Kang, J. Kim, S. Yoo, and C. M. Kyung, “Temperature-aware integrated DVFS and power gating for executing tasks with runtime distribution,” IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, vol. 29, no. 9, pp. 1381–1394, 2010.

[30] J. S. Lee, K. Skadron, and S. W. Chung, “Predictive temperature-aware DVFS,” IEEE Transactions on Computers, vol. 59, no. 1, 2010.

[31] P. Pillai and K. G. Shin, “Real-time dynamic voltage scaling for low-power embedded operating systems,” in Proceedings of the Symposium on Operating Systems Principles, 2001.

[32] E. Ebrahimi, C. J. Lee, O. Mutlu, and Y. N. Patt, “Fairness via source throttling: A configurable and high-performance fairness substrate for multi-core memory systems,” in Proceedings of the International Symposium on Architectural Support for Programming Languages and Operating Systems, 2010.

[33] A. Herdrich, R. Illikkal, R. Iyer, D. Newell, V. Chadha, and J. Moses, “Rate-based QoS techniques for cache/memory in CMP platforms,” in Proceedings of the International Conference on Supercomputing, 2009.

[34] D. Shingari, A. Arunkumar, and C. J. Wu, “Characterization and throttling-based mitigation of memory interference for heterogeneous smartphones,” in Proceedings of the International Symposium on Workload Characterization, 2015.

[35] X. Zhang, R. Zhong, S. Dwarkadas, and K. Shen, “A flexible framework for throttling-enabled multicore management (TEMM),” in International Conference on Parallel Processing, 2012.

[36] W. Liao, L. He, and K. M. Lepak, “Temperature and supply voltage aware performance and power modeling at microarchitecture level,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 7, pp. 1042–1053, 2005.

[37] “Snapdragon 800,” https://www.qualcomm.com/products/snapdragon/processors/800.

[38] “Linux profiling with performance counters,” https://perf.wiki.kernel.org/index.php/Main_Page.

[39] “Alexa Top 500 Global Websites,” http://www.alexa.com/topsites.

[40] A. Gutierrez, R. G. Dreslinski, T. F. Wenisch, T. Mudge, A. Saidi, C. Emmons, and N. Paver, “Full-system analysis and characterization of interactive smartphone applications,” in Proceedings of the International Symposium on Workload Characterization, 2011.

[41] D. Pandiyan and C.-J. Wu, “Quantifying the energy cost of data movement for emerging smart phone workloads on mobile platforms,” in Proceedings of the International Symposium on Workload Characterization, 2014.

[42] D. Lo, T. Song, and G. E. Suh, “Prediction-guided performance-energy trade-off for interactive applications,” in Proceedings of the 48th International Symposium on Microarchitecture, 2015.

[43] K. Rao, J. Wang, S. Yalamanchili, Y. Wardi, and H. Ye, “Application-specific performance-aware energy optimization on Android mobile devices,” in Proceedings of the International Symposium on High Performance Computer Architecture, 2017.

[44] S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, and K. Skadron, “Rodinia: A benchmark suite for heterogeneous computing,” in Proceedings of the International Symposium on Workload Characterization, 2009.

[45] C. Gao, A. Gutierrez, M. Rajan, R. Dreslinski, T. Mudge, and C.-J. Wu, “A study of mobile device utilization,” in Proceedings of the International Symposium on Performance Analysis of Systems and Software, 2015.

[46] D. Pandiyan, S. Y. Lee, and C. J. Wu, “Performance, energy characterizations and architectural implications of an emerging mobile platform benchmark suite - MobileBench,” in Proceedings of the International Symposium on Workload Characterization, 2013.

[47] Y. Zhu and V. J. Reddi, “WebCore: Architectural support for mobile web browsing,” in Proceedings of the International Symposium on Computer Architecture, 2014.

[48] S. Fan and B. C. Lee, “Evaluating asymmetric multiprocessing for mobile applications,” in Proceedings of the International Symposium on Performance Analysis of Systems and Software, 2016.