

Source: mcn.cse.psu.edu/paper/yi-yang/icccn-yiyang17.pdf

Context-Aware Task Offloading for Wearable Devices

Yi Yang, Yeli Geng, Li Qiu, Wenjie Hu and Guohong Cao
Department of Computer Science and Engineering

The Pennsylvania State University
Email: {yzy123, yzg5086, lyq5023, wwh5068, gcao}@cse.psu.edu

Abstract—Wearable devices such as smartwatches do not have enough power and computation capability to process computationally intensive tasks. One viable solution is to offload these tasks to the connected smartphone. Existing Android smartphones allocate CPU resources to a task according to its performance requirement, which is determined by the context of the task. However, due to lack of context information, smartphones cannot properly allocate resources to tasks offloaded from wearable devices. Allocating too few resources to urgent tasks (related to user interaction) may cause high interaction latency on wearable devices, while allocating too many resources to unimportant tasks (unrelated to user interaction) may lead to energy waste on the smartphone. To solve this problem, we propose a context-aware task offloading (CATO) framework, in which offloaded tasks can be properly executed on the smartphone or further offloaded to the cloud based on their context, aiming to achieve a balance between good user experience on wearable devices and energy saving on the smartphone. To validate our design, we have implemented CATO on the Android platform and developed two applications on top of it. Experimental results show that CATO can significantly reduce latency for urgent tasks and save energy for other unimportant tasks.

I. INTRODUCTION

Wearable devices like smartwatches or Google Glass are becoming increasingly popular, accompanied by a wide range of novel wearable apps, such as health monitoring [1], [2], augmented reality [3], [4], [5], and gesture recognition [6], [7]. Most of these emerging apps involve computationally intensive tasks. However, due to the size limitation, wearable devices do not have enough power and computation capability to process these tasks.

As a solution, wearable devices can offload computationally intensive tasks to the connected smartphone. Previous research has investigated how to make offload decisions on wearable devices [4], [8], [9], but none of it considers how to execute offloaded tasks on the smartphone. The existing Android system allocates CPU resources to a task according to its performance requirement, which is determined by the context of the task, i.e., whether the task is related to user interaction. A task related to user interaction has a high performance requirement and should be processed as fast as possible. Thus, Android executes such a task in a foreground process, which can obtain all CPU resources and thus runs faster. In contrast, a task unrelated to user interaction has a low performance requirement.

This work was supported in part by the National Science Foundation (NSF) under grants CNS-1526425 and CNS-1421578.

Thus, Android executes such a task in a background process, which is restricted to limited CPU resources and thus runs slower. However, due to the lack of task context information, tasks offloaded from wearable devices cannot be properly executed on smartphones. Current Android smartphones simply execute all offloaded tasks in background processes. As a result, tasks related to user interaction cannot be processed promptly, resulting in high interaction latency on wearable devices. To avoid this problem, smartphones can execute all offloaded tasks in foreground processes, but resources may then be wasted on unimportant tasks.

This problem becomes more severe for modern smartphones equipped with the ARM big.LITTLE architecture [10], where high-performance and high-power big CPU cores are combined with energy-efficient but low-performance little CPU cores. On such smartphones, foreground processes can run on big cores to accelerate the processing of tasks related to user interaction, while background processes only run on a little core to save energy. Since these two types of CPU cores differ significantly in power consumption and computation performance, executing all offloaded tasks in background processes will incur significant interaction latency on wearable devices, while executing them all in foreground processes will waste a large amount of energy on the smartphone.

To solve this problem, we propose a Context-Aware Task Offloading (CATO) framework, where offloaded tasks can be executed based on their context. Specifically, tasks related to user interaction are executed in foreground processes to reduce the interaction latency on wearable devices, while those unrelated to user interaction are executed in background processes to save energy on the smartphone. CATO has three salient features. First, CATO does not require any modification to the underlying Android OS. Second, CATO is backward compatible with traditional context-unaware offloading solutions to avoid unexpected failures when offloading tasks to a CATO-disabled smartphone. That is, CATO enables wearable devices to detect whether the connected smartphone supports CATO, and if not, to switch back to traditional ways of offloading tasks. Third, CATO explores opportunities to further offload tasks from the smartphone to the cloud. Different from prior approaches that use offline profiling [11], [12], [13], [14], we use online profiling to estimate the latency and energy cost of executing a task, and consider the task context when making offload decisions.


Fig. 1. Android executes all tasks (components) of the same app in the same process. (Diagram: the Text Translation App's Activity and Service share one process.)

Fig. 2. If a foreground app depends on another app, both apps will be run in foreground processes. (Diagram: the Text Translation App's process and the Text-to-speech App's process are both foreground.)

We have implemented CATO on Android platforms. To evaluate its performance, we have developed two wearable apps: a speech recognition app with active user interaction, and an activity and health monitoring app without user interaction. Both apps use CATO to offload tasks to the smartphone. To measure the power consumption of the smartphone, we designed and fabricated a Flex PCB based battery interceptor to connect the smartphone to an external power monitor. Experimental results show that CATO can reduce latency by at least one third for the task related to user interaction (compared to executing the task in a background process on the smartphone), and reduce energy by more than half for the task unrelated to user interaction (compared to executing the task in a foreground process on the smartphone). Our contributions are as follows.

• We find that smartphones cannot properly execute tasks offloaded from wearable devices due to the lack of task context.

• We design a context-aware task offloading framework, in which offloaded tasks can be properly executed on the smartphone or further offloaded to the cloud according to their context, aiming to achieve a balance between good user experience on wearable devices and energy saving on the smartphone.

• We implement the CATO framework on the Android platform and develop two applications to demonstrate its effectiveness.

The rest of this paper is organized as follows. Section II presents some preliminaries. Section III provides the motivation for context-aware task offloading. We present the design and implementation of the task offloading framework in Section IV and Section V, respectively. The performance of CATO is evaluated in Section VI. Section VII discusses related work, and Section VIII concludes the paper.

II. PRELIMINARIES

In this section, we first provide some background on how tasks are executed in Android, and then introduce the emerging ARM big.LITTLE architecture and its support in Android.

A. Task Execution in Android

In Android, a component is a unique building block that performs a specific task. The most common components are the activity and the service, where an activity provides a screen of user interface (UI), and a service is used to perform a long-running task. An Android app usually consists of multiple components to perform various tasks. For example, in a text translation app, an activity is used to collect the user input and output the result, and a service is responsible for the real task of translating the text. Since tasks of the same app usually depend on each other, Android assumes these tasks have the same performance requirement and executes them in the same process (Fig. 1). For a foreground app (i.e., an app with an active activity that the user interacts with), its tasks are related to user interaction and have a high performance requirement. Thus, Android executes them in a foreground process, which can obtain all system resources and thus runs faster. For a background app, its tasks are unrelated to user interaction and have a low performance requirement. Thus, Android executes them in a background process, which is restricted to limited system resources and runs slower. Sometimes a foreground app may call another app. For example, to convert the translated text into speech, the text translation app may ask the text-to-speech app for help (Fig. 2). Then the text-to-speech app should also run in a foreground process to reduce the interaction latency. To handle this, Android keeps track of the relationships among running apps and manages their processes accordingly.

B. big.LITTLE Architecture on Smartphones

The ARM big.LITTLE architecture [10] is becoming increasingly popular on modern smartphones, where big cores are designed for maximum computation performance while little cores are designed for maximum energy efficiency. With this architecture, a task can be dynamically allocated to a big or little core according to its performance requirement. By allocating urgent tasks (related to user interaction) to big cores, the smartphone can accelerate the processing and thus reduce the interaction latency. By allocating other unimportant tasks to little cores, the smartphone can save energy. The most popular commercial implementations of big.LITTLE on smartphones include Samsung Exynos 5/7/8 and Qualcomm Snapdragon 808/810. For example, the LG Nexus 5X smartphone uses the Qualcomm Snapdragon 808 with a quad-core Cortex-A53 (little) and a dual-core Cortex-A57 (big). The little and big cores can operate in frequency ranges from 384 MHz to 1.44 GHz, and from 384 MHz to 1.82 GHz, respectively.

To understand the difference between big and little cores in terms of computation performance and energy consumption, we ran the same workload on a little core and a big core at different frequencies on a Nexus 5X smartphone. The workload is a number of iterations that can generate 100% CPU load. The CPU frequency is manually controlled by writing files in the /sys/devices/system/cpu/[cpu#]/cpufreq virtual file system with root privilege. To specify the CPU core to run the workload, the thread affinity is set by calling __NR_sched_setaffinity through the Java Native Interface (JNI) in Android. Fig. 3 shows the measured results. The average power of the little core (554 mW) at its highest frequency of 1.44 GHz is only one fifth of the power of the big core (2468 mW) at its highest frequency of 1.82 GHz. Compared to the little core at the same CPU frequency,
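As a concrete illustration of the cpufreq interface mentioned above, the small sketch below (our own, not from the paper) parses the space-separated kHz values that the kernel reports in a core's scaling_available_frequencies file; the sample string is illustrative of a Snapdragon 808 little cluster, not a verbatim dump.

```python
def parse_available_freqs_khz(text: str) -> list[int]:
    """Parse the space-separated kHz values from a core's
    scaling_available_frequencies file into a sorted list."""
    return sorted(int(tok) for tok in text.split())

def max_freq_mhz(text: str) -> float:
    """Highest supported frequency of the core, in MHz."""
    return parse_available_freqs_khz(text)[-1] / 1000.0

# Illustrative contents of scaling_available_frequencies (values in kHz):
sample = "384000 460800 600000 672000 787200 960000 1248000 1440000"
print(max_freq_mhz(sample))  # 1440.0
```

On a rooted device, the same file would be read from /sys/devices/system/cpu/cpu0/cpufreq/; writing scaling_setspeed (with the userspace governor) pins the frequency as described above.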

although the big core can reduce the latency (execution time) by one third (Fig. 3b), due to its high power level, the big core increases the total energy consumption by a factor of two (Fig. 3c). In summary, big cores are much more powerful than little cores, but also consume significantly more energy.

Fig. 3. Average power, latency and total energy consumption of a little core and a big core running the same workload at different frequencies on the Nexus 5X. (a) Average power; (b) Latency; (c) Energy consumption.

Fig. 4. Wearable app A (foreground) and B (background) offload tasks to the smartphone by invoking service A and B, respectively. Since the smartphone does not know the context of these two tasks, it cannot properly run service A in a foreground process and service B in a background process.
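The trade-off can be sanity-checked with the relative numbers reported above: if the big core cuts execution time to two thirds while doubling total energy, its average power must be roughly three times that of the little core, since energy = power × time. A quick check:

```python
# Energy = power * time. Using the relative numbers reported in the text:
t_little = 1.0           # normalized execution time on the little core
t_big = t_little * 2 / 3 # big core reduces latency by one third
e_ratio = 2.0            # big core doubles the total energy consumption

# Implied ratio of average power (big / little):
p_ratio = e_ratio * t_little / t_big
print(round(p_ratio, 6))  # 3.0 -> the big core draws about 3x the power
```

This is consistent with the measured gap between the 554 mW little core and the 2468 mW big core being even larger at their respective top frequencies.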

C. big.LITTLE Support in Android

Old Android versions before 6.0 do not differentiate between big and little cores, so there is no guarantee that important tasks can be executed on big cores when necessary. As big.LITTLE becomes increasingly prevalent on smartphones, Android 6.0 Marshmallow, the latest Android version released in October 2015 [15], brings better support for big.LITTLE using the cpuset. A cpuset defines a list of CPU cores on which a process can execute. By moving processes into different cpusets, Android can control the usage of big and little cores. For example, if processes are moved to a cpuset with a little core, they can only run on that little core. The cpuset is configured in the device-specific init.rc file and set up in the init process during system startup. On most devices such as the Nexus 5X or Nexus 6P, two cpusets are created: one for foreground processes, which contains all CPU cores, and another for background processes, which only contains a little core, i.e., cpu0. As introduced above, in Android, foreground processes execute tasks related to user interaction, and background processes execute other unimportant tasks. Thus, by separating foreground and background processes using these two cpusets, Android is able to allocate urgent tasks onto big cores to reduce latency, and unimportant tasks onto little cores to extend battery life.

III. THE MOTIVATION FOR CONTEXT-AWARE TASK OFFLOADING

The smartphone knows the context of local tasks, so it can allocate them onto different CPU cores according to their performance requirements. However, for a task offloaded from wearable devices, the smartphone is unaware of its context. Executing offloaded tasks without any context information will either waste more energy on the smartphone or cause extra delay on wearable devices. For example, Fig. 4 shows two wearable apps that offload tasks to the smartphone. App A is a foreground app with active user interaction, while app B is a background app without user interaction. App A and app B offload task A and task B to the smartphone by remotely invoking service A and service B, respectively. If the smartphone knew the context of each task, task (service) A would be executed in a foreground process, since it is related to user interaction; similarly, task (service) B would be executed in a background process, since it does not involve user interaction. However, current Android devices offload tasks without context information. That is, the invocation of the remote service is implemented by simply sending a message using the Google MessageApi, and that message contains no information about whether the offloading app is a foreground app or a background app. Current Android smartphones simply execute all offloaded tasks in background processes. Then important tasks like task A are executed on a little core, resulting in significant interaction latency on wearable devices. To avoid this problem, smartphones can execute all offloaded tasks in foreground processes, but energy will be wasted if unimportant tasks like task B are executed on big cores. Thus, to better utilize big.LITTLE cores on the smartphone, it is necessary to inform the smartphone about the context of offloaded tasks. Then the smartphone can execute them on proper CPU cores according to their performance requirements.

Smartphones can further offload tasks to the cloud. Most existing work considers only a single objective when offloading tasks to the cloud. For example, in [9], [16], the goal is to minimize latency in order to provide a good user experience. However, reducing latency may come at the cost of consuming more energy. For example, the LTE interface on the smartphone consumes a large amount of energy (i.e., tail energy) after a data transmission. Thus, although offloading a task through LTE can sometimes reduce latency, it may consume more energy than running the task on the smartphone. Such energy waste is unnecessary for tasks unrelated to user interaction, which do not need to be processed quickly. To achieve a balance between good user experience and energy saving, we should consider the task context when making offload decisions. That is, tasks related to user interaction should be offloaded if latency can be reduced, and tasks unrelated to user interaction should be offloaded if energy can be saved.

Fig. 5. Overview of the CATO architecture.
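The context-dependent policy just described reduces to a simple decision rule. The sketch below is our illustration, not the paper's implementation; the latency and energy estimates are assumed to come from a solver such as the one described later.

```python
def should_offload(interactive: bool,
                   local_latency: float, cloud_latency: float,
                   local_energy: float, cloud_energy: float) -> bool:
    """Context-aware offload decision:
    - interactive (UI-related) tasks: offload iff it reduces latency;
    - non-interactive tasks: offload iff it saves energy."""
    if interactive:
        return cloud_latency < local_latency
    return cloud_energy < local_energy

# An interactive task is offloaded when the cloud is faster, even if it
# costs more energy (e.g., due to LTE tail energy):
print(should_offload(True, local_latency=2.0, cloud_latency=1.2,
                     local_energy=100.0, cloud_energy=250.0))   # True
# The same estimates keep a background task local, since offloading
# would waste energy:
print(should_offload(False, local_latency=2.0, cloud_latency=1.2,
                     local_energy=100.0, cloud_energy=250.0))   # False
```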

IV. CONTEXT-AWARE TASK OFFLOADING (CATO)

In this section we introduce the design of our offloading framework CATO. We first give an overview of CATO and then describe its major components.

A. CATO Overview

CATO enables tasks to keep their context when offloaded from wearable devices to the smartphone. Based on the context of each task, CATO achieves a balance between good user experience on wearable devices and energy saving on the smartphone by minimizing the latency of UI-related tasks and maximizing the energy saving of other unimportant tasks.

The architecture of CATO is shown in Fig. 5. A client proxy runs on the wearable device and is responsible for discovering whether the connected smartphone is CATO-enabled, attaching context information to offloaded tasks, and offloading tasks to the smartphone. On the smartphone, a server proxy is responsible for receiving offloaded tasks. Based on measurements from past task executions collected by a profiler, a solver estimates the latency and energy cost of executing a new task, and then makes the offload decision.

B. Client and Server Proxies

It is important to provide a mechanism for the wearable device to discover whether the connected smartphone has CATO enabled. If CATO is not supported, tasks can be offloaded using traditional context-unaware ways to avoid unexpected failures of wearable apps. The CATO discovery service is implemented using the node capability mechanism supported in Android, which enables the wearable device to find connected nodes with a specific capability. In the server proxy, we declare the CATO capability for the smartphone as

<string-array name="android_wear_capabilities">
    <item>CATO_capability</item>
</string-array>

The client proxy can detect and connect to a smartphone with CATO capability. If no such smartphone is found, it will immediately notify the wearable app to use traditional ways to offload tasks.

Once a CATO-enabled smartphone is found, the client proxy attaches context information to tasks and sends them to the server proxy. To obtain the context of a task, we inquire the

TABLE I
POWER CONSUMPTION OF VARIOUS NETWORK INTERFACES

Interface  State      Power (mW)       Duration (s)
-          IDLE*      979.5 ± 23.3     -
WiFi       Sending    2529.6 ± 53.1    -
WiFi       Receiving  2247.4 ± 32.9    -
LTE        Sending    2643.2 ± 49.5    -
LTE        Receiving  2286.4 ± 54.3    -
LTE        Promotion  1586.3 ± 27.6    0.5 ± 0.1
LTE        Tail       1265.3 ± 38.2    11.3 ± 0.8

*: IDLE indicates the power consumption of the smartphone when all network interfaces are in the IDLE state and the screen is on.

activity manager in Android to check the running status of the app that offloads the task. If the app is a foreground app (i.e., it runs in a foreground process), the offloaded task is considered to be related to user interaction and has a high performance requirement. Otherwise, the task is unrelated to user interaction and has a low performance requirement. Note that the running status of an app changes over time. For example, a foreground app could become background if the user opens another app. Therefore, the context information of an offloaded task is retrieved in real time.

C. Profiler

The profiler collects two types of data: CPU profiling data and network profiling data. CPU profiling focuses on the latency and CPU energy consumption of task executions on the smartphone, while network profiling focuses on the network characteristics of the wireless environment, which are used to calculate the latency and energy cost of offloading a task to the cloud.

1) CPU Profiling: Prior work [11], [12], [16] only considers specific tasks, and uses offline or instrumentation-based methods to profile the CPU energy consumption on the smartphone. Different from existing work, we adopt online profiling, which can be used for any task without modifying the source code. To simplify the profiling, we assume that the service that executes the task runs in its own process. If the app's process includes other components besides the service, we can use the service's android:process attribute in the app manifest to let the service run in a separate process. Then the goal is to profile the CPU energy consumption of the process hosting the service, and we have the following two approaches.
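For illustration, a manifest entry of the following form moves a service into its own process; the service and process names here are placeholders, not the paper's actual identifiers:

```xml
<!-- In AndroidManifest.xml: a leading ':' makes the process private
     to the app, so the service no longer shares a process with the
     app's other components and can be profiled in isolation. -->
<service
    android:name=".OffloadService"
    android:process=":offload" />
```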

The first approach utilizes the battery fuel gauge on smartphones, which provides whole-phone power readings (in the form of battery voltage and discharge current) through files in /sys/class/power_supply/battery. We can use the power reading before running a process as the baseline to estimate the process's power consumption during its execution. However, on most smartphones, the fuel gauge does not provide instantaneous power readings. Instead, it reports the average battery power over a long time interval. For example, the time interval on the Nexus 5X smartphone is around thirty seconds. Thus, the power readings are too coarse-grained to accurately profile the process in real time.
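The power_supply sysfs files conventionally report voltage in microvolts and current in microamps, so instantaneous power follows from P = V × I. A minimal sketch of the conversion (the unit convention is the common one; exact file names and units vary by device):

```python
def battery_power_mw(voltage_uv: int, current_ua: int) -> float:
    """Whole-phone power in mW from fuel-gauge readings.
    uV * uA gives picowatts, and 1 mW = 1e9 pW."""
    return voltage_uv * current_ua / 1e9

# e.g., 3.8 V battery voltage and a 500 mA discharge current:
print(battery_power_mw(3_800_000, 500_000))  # 1900.0
```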

The second approach (adopted by our profiler) builds power models based on the hardware components of the smartphone, and then estimates the energy consumption of a process based on the utilization of each hardware component. The


TABLE II
FEATURES OF EXECUTING A SPEECH RECOGNITION TASK

Category                 Features
Task input               Speech length; Language; Audio sample rate
Execution environment    Foreground / background

CPU power model is inspired by [17]. We first measured the power consumption of different CPU cores at 100% utilization under different frequencies using an external power monitor, as shown in Fig. 3a. We then built a utilization-based CPU power model, in which the CPU power consumption is linear in the CPU utilization for a given frequency. To use this model, we log the process's CPU utilization (from /proc/[pid]/stat) and the frequencies of the CPU cores (from /sys/devices/system/cpu/[cpu#]/cpufreq) every half second during the process's execution. Then we estimate the process's CPU energy over each logging interval, and sum up the energy to obtain the total CPU energy consumption of the process.
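The bookkeeping described above can be sketched as follows. The per-frequency model coefficients below are made up for illustration (the real coefficients must be fitted from power-monitor measurements such as Fig. 3a); only the top little-core entry is chosen so that 100% utilization reproduces the 554 mW figure reported earlier.

```python
# Hypothetical linear power model: P(f, u) = idle(f) + slope(f) * u,
# where u is the CPU utilization in [0, 1]. Coefficients are
# illustrative, not the paper's fitted values.
POWER_MODEL_MW = {
    # freq_khz: (idle_mw, slope_mw)
    384000:  (30.0, 120.0),
    1440000: (80.0, 474.0),    # little core at its top frequency
    1824000: (200.0, 2268.0),  # big core at its top frequency
}

def cpu_power_mw(freq_khz: int, utilization: float) -> float:
    idle, slope = POWER_MODEL_MW[freq_khz]
    return idle + slope * utilization

def cpu_energy_mj(samples, interval_s: float = 0.5) -> float:
    """Sum energy over logging intervals: E = sum of P(f, u) * dt.
    `samples` is a list of (freq_khz, utilization) pairs, one per
    half-second logging interval."""
    return sum(cpu_power_mw(f, u) * interval_s for f, u in samples)

# Two seconds at 100% load on the little core's top frequency:
print(cpu_energy_mj([(1440000, 1.0)] * 4))  # 1108.0 (mJ) = 554 mW * 2 s
```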

The accuracy of CPU profiling is evaluated by comparing the estimated CPU energy consumption of executing a task on the smartphone with the actual energy measured by a power monitor. The mean error is 7% and the standard deviation is 5.8%. The profiling overhead is negligible, since it only runs during the task execution and reads system files periodically.

2) Network Profiling: To calculate the latency and energy cost of offloading a task, we need to know the network characteristics of the wireless environment, such as the bandwidth and power consumption. Each time CATO offloads a task to the cloud, the profiler keeps track of the amount of data transmitted, from /proc/uid_stat/[uid#], and the transfer duration to calculate the bandwidth. The current bandwidth is estimated as the average of the five most recent measurements. To determine the power consumption of the wireless interfaces, we use a Monsoon power monitor as the power supplier for our Nexus 5X smartphone to measure the average instant power. The results are summarized in Table I, where the power consumption is measured with the screen on.
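The sliding-window bandwidth estimate described above can be kept with a fixed-size buffer; a small sketch (class and method names are ours):

```python
from collections import deque

class BandwidthEstimator:
    """Average of the five most recent bandwidth measurements."""

    def __init__(self, window: int = 5):
        # Old samples fall out automatically once the window is full.
        self.samples = deque(maxlen=window)

    def record_transfer(self, bytes_moved: int, duration_s: float) -> None:
        self.samples.append(bytes_moved / duration_s)  # bytes per second

    def bandwidth_bps(self) -> float:
        if not self.samples:
            raise ValueError("no measurements yet")
        return sum(self.samples) / len(self.samples)

est = BandwidthEstimator()
for size, dur in [(1_000_000, 1.0), (2_000_000, 1.0), (3_000_000, 1.0)]:
    est.record_transfer(size, dur)
print(est.bandwidth_bps())  # 2000000.0
```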

D. Solver

When a new task arrives, the solver first estimates the latency and energy cost of executing the task based on the profiling data, and then makes the offload decision.

1) Estimating Latency and Energy Cost: The latency and energy cost of executing a task on the smartphone may be quite different with different inputs and execution environments. For example, it may take different amounts of time and energy to process a text translation task with input text of different lengths or in different languages. Based on the CPU profiling data, CATO constructs a prediction model that considers the task inputs as well as the execution environment to predict the latency and energy consumption of executing a new task on the smartphone. We compared the following three predictors, and chose the one with the smallest prediction error in terms of the mean error (mean absolute error divided by the mean) and the standard deviation of errors.

TABLE III
ERROR OF PREDICTING THE LATENCY OF EXECUTING A SPEECH RECOGNITION TASK ON THE SMARTPHONE

Predictor   Mean error (%)   Standard deviation (%)
NAIVE       40.5             26.6
KNN         4.5              8.1
RTREE       6.4              11.8

[Figure: measured and predicted latency (sec) over 100 tests]
Fig. 6. Measured and KNN predicted latency

• NAIVE returns the mean value of the latency and energy consumption measured from past executions of the task.

• KNN, k-nearest neighbors (k = 5), finds the k past task executions (neighbors) that are closest in distance (measuring the similarity of features) to the new execution. The prediction is a weighted average over these k neighbors, where the weight of each neighbor is based on its distance to the new execution.

• RTREE constructs a regression tree based on the task features for the prediction.

In the experiment, we ran a speech recognition task two hundred times with different inputs on the smartphone and measured the accuracy of each predictor in predicting the task's execution latency. Features used by KNN and RTREE are summarized in Table II. Every time a test finishes, its result is added to the training dataset to retrain the predictors. Table III shows the prediction error for each predictor. The high prediction error of NAIVE (40.5% mean error, 26.6% standard deviation) indicates that the latency of executing the same task varies significantly. KNN outperforms RTREE with a smaller prediction error (4.5% vs. 6.4% mean error, 8.1% vs. 11.8% standard deviation). Fig. 6 shows the measured and KNN-predicted latency for each test. The predicted latency closely follows the measured latency after the initial 20 tests. Based on these results, we chose KNN as the predictor.
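The KNN predictor described above can be sketched as a minimal distance-weighted regressor. The paper does not specify the exact weighting function, so the inverse-distance weights and Euclidean metric below are illustrative assumptions; feature scaling and tie handling are omitted.

```python
import math

def knn_predict(train, query, k=5):
    """Inverse-distance-weighted k-NN prediction (illustrative sketch).
    `train` is a list of (feature_vector, measured_latency) pairs from
    past executions; `query` is the feature vector of the new task."""
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # The k past executions closest to the query in feature space.
    neighbors = sorted(train, key=lambda t: dist(t[0], query))[:k]
    num = den = 0.0
    for feats, latency in neighbors:
        d = dist(feats, query)
        if d == 0:
            # Exact feature match: return its measured latency directly.
            return latency
        w = 1.0 / d          # closer neighbors get larger weights
        num += w * latency
        den += w
    return num / den
```

Retraining after each test, as in the experiment, amounts to appending the new (features, latency) pair to `train`.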

For each type of task, a predictor P is trained using the task's CPU profiling data. When a new task T_i arrives, we use P(f^i_1, f^i_2, ..., f^i_n) to predict the latency (L^i_local) and energy cost (E^i_local) of executing T_i on the smartphone, where f^i_1, f^i_2, ..., f^i_n are T_i's input features and execution environment features (i.e., foreground / background).

To estimate the cost of offloading T_i to the cloud, we first estimate the current uplink bandwidth (B_up) and downlink bandwidth (B_down) according to the network profiling. Then the latency of sending T_i's input data d^i_in is L^i_send = d^i_in / B_up, and the energy consumption is E^i_send = P_up × L^i_send, where P_up is the power consumption of sending data through the wireless interface. Similarly, the latency of receiving T_i's output data d^i_out is L^i_rcv = d^i_out / B_down, and the energy consumption is E^i_rcv = P_down × L^i_rcv, where P_down is the power consumption for downloading data. In LTE networks, we also consider the promotion delay [18] when calculating the latency and the tail energy [19] when calculating the energy consumption. Finally, the latency of offloading T_i to the cloud is

    L^i_off = L^i_send + L^i_rcv + L^i_cloud,    (1)

where L^i_cloud is the estimated execution time of T_i on the cloud. The energy consumption of offloading T_i is

    E^i_off = E^i_send + E^i_rcv.    (2)

Input: a task T_i with features f^i_1, f^i_2, ..., f^i_n
Result: how to execute the task
 1  predict L^i_local and E^i_local using P(f^i_1, f^i_2, ..., f^i_n);
 2  calculate L^i_send, E^i_send, L^i_rcv, and E^i_rcv based on current bandwidth;
 3  L^i_off ← L^i_send + L^i_rcv + L^i_cloud;
 4  E^i_off ← E^i_send + E^i_rcv;
 5  if T_i is related to user interaction then    // T_i is offloaded by a foreground app
 6      if L^i_off < L^i_local then
 7          offload T_i to the cloud;
 8      else
 9          execute T_i in a foreground process;
10      end
11  else
12      if E^i_off < E^i_local then
13          offload T_i to the cloud;
14      else
15          execute T_i in a background process;
16      end
17  end

Fig. 7. Algorithm to make offload decisions

2) Offload Decision: The offload decision engine decides how to execute a task T_i according to its context. If T_i is related to user interaction (from a foreground app), we aim to minimize its latency in order to provide a better user experience. We can execute T_i in a foreground process on the smartphone, or offload it to the cloud if latency can be further reduced (i.e., L^i_off < L^i_local). If T_i is unrelated to user interaction (from a background app), we aim to minimize the smartphone's energy consumption. We can execute T_i in a background process, or offload it to the cloud if energy can be saved (i.e., E^i_off < E^i_local). The algorithm to make offload decisions is described in Fig. 7. In this algorithm, we only consider tasks originating from wearable devices, but it can also be used for tasks generated on the smartphone.
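This decision logic can be sketched in a few lines; the dictionary keys and function names below are illustrative, standing in for the predicted local costs and the estimated offload costs defined earlier.

```python
def decide(task, cost):
    """Sketch of the offload decision: minimize latency for interactive
    tasks, minimize smartphone energy for non-interactive ones.
    `cost` bundles the predicted local cost and the estimated
    send/receive/cloud costs (illustrative names)."""
    # Offload cost, as in equations (1) and (2).
    L_off = cost["L_send"] + cost["L_rcv"] + cost["L_cloud"]
    E_off = cost["E_send"] + cost["E_rcv"]
    if task["interactive"]:
        # Task offloaded by a foreground app: compare latencies.
        if L_off < cost["L_local"]:
            return "offload to cloud"
        return "execute in foreground process"
    else:
        # Task from a background app: compare energy costs.
        if E_off < cost["E_local"]:
            return "offload to cloud"
        return "execute in background process"
```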

Once the decision is made, CATO invokes the corresponding service to execute the task on the smartphone or offload it to the cloud. Since CATO runs in the background, the invoked service runs in a background process by default. We need to move the background process into the foreground when a foreground process is required to execute the task. If CATO has the root privilege, this can be easily done by adding the process id into the file /dev/cpuset/foreground/tasks. However, most smartphones are unrooted and users may have privacy concerns about the root privilege. As an alternative approach, we find that when a service issues a notification by calling the method startForeground(Notification), its hosting process becomes a foreground process. This approach does not require the root privilege, but it brings a little interference (i.e., a notification) to the user. We have implemented the second approach in CATO. In the future, we plan to integrate CATO into the Android system and use the first approach.

V. IMPLEMENTATION

In this section, we introduce the application programming interface (API) of CATO and two apps built on CATO.

A. CATO API

CATO is implemented as an Android library that lets developers prepare a task, together with its context, in a wearable app and offload it to the smartphone. Its API, shown in Table IV, is designed similarly to the Google MessageApi for ease of use.

1) Preparing a Task: CATO provides the CATOTask class for developers to construct a task. Although CATO can extract input features, such as data size, from the task input to predict the latency and energy cost of executing the task (see Section IV-D), it is very hard to obtain all useful features (e.g., audio sample rate) automatically. Thus, we provide the addFeature() method for developers to optionally add features to the task.

2) Offloading a Task: As introduced in Section IV-B, CATO runs a client proxy on the wearable device to offload tasks to the smartphone. The client proxy exposes interfaces, which are defined in the Android Interface Definition Language (AIDL), for other apps to use. To simplify the usage of these interfaces, CATO provides the OffloadApi class, in which the isConnected() method returns whether CATO is enabled (CATO discovery), the offload() method offloads a task from the wearable device to the smartphone, and the setOffloadResultCallback() method allows developers to register a callback function to process the offload result.
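To illustrate the call pattern of this API: the real CATO library is Java/Android, so the following Python mock only mirrors the method names from Table IV, and its local "echo" result is a stand-in for the actual wearable-to-smartphone transfer.

```python
class CATOTask:
    """Mock of the CATOTask class (illustrative; the real API is an
    Android library)."""
    def __init__(self, task_input):
        self.input = task_input
        self.features = {}

    def addFeature(self, key, value):
        # Optional developer-supplied features (e.g., audio sample rate)
        # used when predicting the task's latency and energy cost.
        self.features[key] = value


class OffloadApi:
    """Mock offload client: offload() hands a task to the (simulated)
    client proxy, and the registered callback receives the result."""
    def __init__(self):
        self._callback = None

    def isConnected(self):
        return True  # assume CATO discovery succeeded

    def setOffloadResultCallback(self, callback):
        self._callback = callback

    def offload(self, task):
        # Stand-in for the real transfer and remote execution:
        # echo a fake result back to the registered callback.
        result = "processed:" + str(task.input)
        if self._callback:
            self._callback(result)


# Usage mirroring the wearable-side flow described above:
api = OffloadApi()
results = []
api.setOffloadResultCallback(results.append)
if api.isConnected():
    task = CATOTask("speech.wav")
    task.addFeature("sample_rate", 8000)
    api.offload(task)
```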

B. Applications

We develop two apps on top of CATO: a speech recognition app, and an activity and health monitoring app (smart alarm).

1) Speech Recognition: Most wearable devices, such as smartwatches, do not have keyboards due to their very limited screen size. The speech recognition app provides an alternative way to get the user's input: it converts the user's speech into text. It collects the user's speech on the wearable device, and then uses CATO to offload the speech recognition task to the smartphone (or further to the cloud), where the task is processed using the open-source Pocketsphinx library [20]. Other apps can invoke the speech recognition app to obtain the voice typing capability. For example, an SMS app can ask users to speak the reply message via this app. Since this app is either in the foreground or related to a foreground app, CATO executes its task in a foreground process on the smartphone to reduce interaction latency.

2) Smart Alarm: Wearable sensors are extremely useful in providing accurate and reliable information on a user's activities and health conditions. We use the ideas reviewed in [21] to detect the user's abnormal situations in order to prompt rapid assistance. For example, we can detect injurious falls and


TABLE IV
CATO API USED FOR PREPARING A TASK AND OFFLOADING THE TASK TO THE SMARTPHONE

Class       Method                              Description
CATOTask    CATOTask(Input)                     Constructs a CATO task.
            addFeature(Key, Value)              Adds features for the prediction of the latency and energy cost of executing the task.
OffloadApi  isConnected()                       Returns whether CATO is enabled.
            offload(CATOTask)                   Offloads a task to CATO.
            setOffloadResultCallback(Callback)  Sets a callback to be called when the offload result is ready.

TABLE V
TASKS OF THE SMART ALARM APP WITH DIFFERENT CONFIGURATIONS

        Sensor set  Interval (sec)  Data size (KB)
Task 1  full        60              240
Task 2  key         60              95
Task 3  full        30              120
Task 4  key         30              48
Task 5  full        15              60
Task 6  key         15              24

Fig. 8. Flex PCB based battery interceptor for the Nexus 5X smartphone to be connected with the Monsoon Power Monitor

automatically alert preassigned people, such as family members or friends. The smart alarm app does not have a UI. It continuously collects sensor data on the wearable device, and periodically offloads the data processing task to the smartphone (or further to the cloud) via CATO. This app is a background app, so CATO executes its task in a background process to save energy on the smartphone.

VI. PERFORMANCE EVALUATION

In this section, we run experiments to evaluate the benefits of CATO in terms of latency reduction for foreground apps and energy saving for background apps.

A. Experimental Setup

We run the speech recognition app and the smart alarm app on a Sony SmartWatch 3 paired with a Nexus 5X smartphone, both running Android 6.0.1. The cloud server is a desktop with a 2.83 GHz dual-core CPU running Windows 7. For the speech recognition app, we generate input speeches of the same language (English) but different lengths, varying from 1 second to 20 seconds. These speeches are sampled at 8 kHz and quantized to 16 bits. We test the smart alarm app with different configurations. Two different sets of wearable sensors are used: the full set contains all available sensors on the smartwatch, and the key set contains five key sensors (i.e., accelerometer, magnetometer, gyroscope, gravity sensor, and pedometer). We also consider different intervals at which to offload sensor data. With a larger interval, there

will be more data to process in one offload (task). Six different tasks are generated using different configurations, as shown in Table V.

We test CATO under various scenarios. In the evaluation, we use "CATO-local" to indicate that CATO cannot offload tasks from the smartphone to the cloud, so all tasks are executed on the smartphone. "CATO-WiFi" and "CATO-LTE" indicate that WiFi and LTE, respectively, are available for CATO to offload tasks from the smartphone to the cloud.

To measure the power consumption, we modify the smartphone's battery connection and use an external power monitor to supply power to the smartphone, which eliminates the measurement error introduced by the phone battery. Different from older phone models, the battery connector of modern smartphones like the Nexus 5X is very tiny and cannot be directly connected to an external power monitor. To solve this problem, we design a battery interceptor based on flex PCB, a very thin circuit board that can be easily bent or flexed. The interceptor is connected to the smartphone's mainboard and the battery through the corresponding battery connector (Hirose BM22-4 for the Nexus 5X), and uses a customized circuit to modify the battery connection. As shown in Fig. 8, we use this interceptor to connect the Nexus 5X smartphone to the Monsoon Power Monitor to measure the power consumption.

B. Speech Recognition

The speech recognition app should run in a foreground process to reduce latency. However, due to the lack of context information, it would be run in a background process when offloaded from wearable devices. Thus, we compare CATO with the "background" strategy, which runs tasks offloaded from wearable devices in background processes.

Compared to running in a background process, executing the task in a foreground process (i.e., CATO-local) reduces the latency by one third (Fig. 9). Since the goal here is to reduce the delay and provide a better user experience, CATO-local achieves this goal at the cost of consuming more energy. This performance difference is due to the usage of different CPU cores. As shown in Fig. 10, a background process only uses a little core (i.e., cpu0), while a foreground process can use the big cores. We also observe that the speech recognition library, Pocketsphinx, cannot fully utilize the multi-core capability of modern smartphones, since only one CPU runs at 100% load at any given time (Fig. 10b). This is probably because Pocketsphinx puts all computationally heavy jobs on a single thread. If those jobs were distributed across multiple threads, Pocketsphinx could better utilize the multiple cores and further accelerate speech recognition.


[Figure: (a) latency (sec) and (b) energy consumption (uAh) vs. speech length (1-20 sec) for Background, CATO-local, CATO-LTE, and CATO-WiFi]
Fig. 9. Latency and energy consumption of the speech recognition task with speech input of different lengths. CATO executes the task in a foreground process. Compared to executing in a background process, CATO-local can reduce latency by one third. To further reduce latency, according to the algorithm described in Section IV-D2, CATO-LTE and CATO-WiFi offload tasks to the cloud when the speech length is longer than 3 seconds and 1 second, respectively. Offloading through LTE consumes much more energy than through WiFi, since the LTE interface consumes lots of energy (i.e., tail energy) after a data transmission.

[Figure: CPU load (%) over time (sec) on cpu0-cpu3 (little) and cpu4-cpu5 (big); (a) background: a little core at 100% load; (b) foreground (CATO): big cores at 100% load]
Fig. 10. The CPU load of executing the speech recognition task in a background process or a foreground process. A background process only uses a little core (i.e., cpu0), while a foreground process can use the big core to accelerate the task processing.

When LTE or WiFi is available, CATO can significantly reduce latency by offloading the task to the cloud (Fig. 9a). According to the algorithm described in Section IV-D2, the task is offloaded when the speech length is longer than 3 seconds for LTE and 1 second for WiFi. Offloading through WiFi takes much less time than through LTE, because 1) the network speed of WiFi is faster than that of LTE, and 2) the LTE interface takes some time (0.5 seconds) to promote from the low-power state to the high-power state before it is ready for the required data transmission, known as the promotion delay. Although energy is not considered when making the offload decision for this task, in most cases offloading the task also saves energy due to the relatively high CPU energy needed to compute the task on the smartphone (Fig. 9b). LTE consumes much more energy than WiFi. The main reason is that the LTE interface remains in the high-power state for several seconds before switching back to the low-power state, to avoid unnecessary promotion delays [22]. As a result, a large amount of energy (also referred to as the tail energy) is wasted after a data transmission.

C. Smart Alarm

CATO executes the task of the smart alarm app in a background process in order to save energy. However, due to the lack of context information, it may be run in a foreground process when offloaded from wearable devices. Thus, we compare CATO with the "foreground" strategy, which runs tasks offloaded from wearable devices in foreground processes.

Compared to running in a foreground process, executing the task in a background process (i.e., CATO-local) can reduce the energy consumption by half (Fig. 11). CATO offloads the task to the cloud if energy can be saved. Since the goal here is to save energy, CATO-local achieves this goal at the cost of increasing delay. Since LTE consumes a large amount of energy (the tail energy is counted) to transmit small pieces of data, CATO does not offload the task through LTE. This is why CATO-LTE and CATO-local have the same energy consumption and delay. When WiFi is available, task 1 and task 2 are offloaded, while the other tasks are still executed on the smartphone because of the high energy efficiency of the little core.

D. CATO Overhead

Running CATO may incur extra delay, and we measure this overhead as the time from when the offload request is received by CATO to when the task starts to be executed. We found that CATO takes 5 ms on average to make the offload decision and invoke the corresponding service to process the task. This latency overhead is mainly caused by training the prediction model. Since the latency is very short, we can infer that the energy overhead is also very small.

VII. RELATED WORK

In the past several years, task offloading from the smartphone to the cloud has been extensively studied. Much research has focused on which tasks should be offloaded in order to save energy or reduce delay [11], [12], [13], [14], [16], [23]. MAUI [11] decides which methods should be offloaded, driven by a runtime optimizer that uses integer linear programming to maximize energy savings on the smartphone. Odessa [16] adaptively makes offload decisions and adjusts the level of data or pipeline parallelism for mobile interactive perception apps to improve their performance. Geng et al. [12] further consider the characteristics of cellular networks in making offload decisions. Gao et al. [13] investigate the dynamic executions of apps, i.e., apps may have different execution paths at runtime, and solve the offload decision problem based on probabilistic


[Figure: (a) energy consumption (uAh) and (b) latency (sec) of tasks 1-6 for Foreground, CATO-local, CATO-LTE, and CATO-WiFi]
Fig. 11. Energy consumption and latency of different tasks from the smart alarm app. CATO executes the task in a background process. Compared to running in a foreground process, CATO-local can reduce energy by half. CATO-LTE never offloads the task to the cloud, so CATO-LTE and CATO-local have the same energy consumption and delay. CATO-WiFi offloads task 1 and task 2 to the cloud to further save energy.

analysis. All these works rely on offline profiling or require specific programming environments (e.g., Odessa is built on Sprout, a distributed stream processing environment). Different from them, we use online profiling and make no assumption about the programming environment on the smartphone.

As wearable devices become increasingly popular, task offloading from wearable devices to the smartphone has received some attention [4], [8], [9]. In [4], a case study is conducted based on three augmented reality apps developed on Google Glass. Experimental results show that all computationally intensive tasks should be offloaded to the nearby smartphone in order to reduce the app's latency and save energy for wearable devices. Cheng et al. [9] introduce a three-layer offload framework, i.e., from wearable devices in the first layer to the smartphone in the second layer and further to the cloud in the third layer. Based on this framework, they propose an offload algorithm to minimize the delay on wearable devices. All these existing works focus on how to make offload decisions, but ignore the smartphone-side problem of how to execute the offloaded tasks, which is the focus of this paper. We find that the smartphone cannot properly execute offloaded tasks due to the lack of context, and thus propose a context-aware task offloading framework to solve this problem.

VIII. CONCLUSION

In this paper, we found that smartphones based on the big.LITTLE architecture cannot execute tasks offloaded from wearable devices on the proper CPU cores due to the lack of task context, resulting in either energy waste if unimportant tasks are executed on big cores, or high interaction latency if urgent tasks are executed on little cores. Based on this finding, we proposed a task offloading framework, CATO, which keeps the context of offloaded tasks so that they can be executed properly on the smartphone according to their performance requirements. CATO also explores opportunities to further offload tasks to the cloud, aiming to further save energy for unimportant tasks and reduce delay for urgent tasks. To validate our design, we have implemented CATO on the Android platform and developed two applications on top of it. Experimental results show that CATO can reduce latency by at least one third for tasks related to user interaction (compared to executing them in a background process), and reduce energy by more than half for tasks unrelated to user interaction (compared to executing them in a foreground process).

REFERENCES

[1] A. Parate, M.-C. Chiu, C. Chadowitz, D. Ganesan, and E. Kalogerakis, "Risq: Recognizing smoking gestures with inertial sensors on a wristband," in ACM MobiSys, 2014.

[2] A. R. Chowdhury, B. Falchuk, and A. Misra, "Medially: A provenance-aware remote health monitoring middleware," in IEEE PerCom, 2010.

[3] K. Ha, Z. Chen, W. Hu, W. Richter, P. Pillai, and M. Satyanarayanan, "Towards wearable cognitive assistance," in ACM MobiSys, 2014.

[4] B. Shi, J. Yang, Z. Huang, and P. Hui, "Offloading guidelines for augmented reality applications on wearable devices," in ACM MM, 2015.

[5] A. I. Anam, S. Alam, and M. Yeasin, "Expression: A dyadic conversation aid using Google Glass for people who are blind or visually impaired," in ACM MobiCASE, 2014.

[6] H. Wang, T. T.-T. Lai, and R. Roy Choudhury, "Mole: Motion leaks through smartwatch sensors," in ACM MobiCom, 2015.

[7] S. Shen, H. Wang, and R. Roy Choudhury, "I am a smartwatch and I can track my user's arm," in ACM MobiSys, 2016.

[8] D. Huang, L. Yang, and S. Zhang, "Dust: Real-time code offloading system for wearable computing," in IEEE GLOBECOM, 2015.

[9] Z. Cheng, P. Li, J. Wang, and S. Guo, "Just-in-time code offloading for wearable computing," IEEE Transactions on Emerging Topics in Computing, vol. 3, no. 1, pp. 74-83, March 2015.

[10] "ARM big.LITTLE technology," http://www.thinkbiglittle.com.

[11] E. Cuervo, A. Balasubramanian, D.-k. Cho, A. Wolman, S. Saroiu, R. Chandra, and P. Bahl, "Maui: Making smartphones last longer with code offload," in ACM MobiSys, 2010.

[12] Y. Geng, W. Hu, Y. Yang, W. Gao, and G. Cao, "Energy-efficient computation offloading in cellular networks," in IEEE ICNP, 2015.

[13] W. Gao, Y. Li, H. Lu, T. Wang, and C. Liu, "On exploiting dynamic execution patterns for workload offloading in mobile cloud applications," in IEEE ICNP, 2014.

[14] S. Guo, B. Xiao, Y. Yang, and Y. Yang, "Energy-efficient dynamic offloading and resource scheduling in mobile cloud computing," in IEEE INFOCOM, 2016.

[15] "Android 6.0 Marshmallow," https://www.android.com/versions/marshmallow-6-0.

[16] M.-R. Ra, A. Sheth, L. Mummert, P. Pillai, D. Wetherall, and R. Govindan, "Odessa: Enabling interactive perception applications on mobile devices," in ACM MobiSys, 2011.

[17] X. Chen, N. Ding, A. Jindal, Y. C. Hu, M. Gupta, and R. Vannithamby, "Smartphone energy drain in the wild: Analysis and implications," in ACM SIGMETRICS, 2015.

[18] W. Hu and G. Cao, "Energy optimization through traffic aggregation in wireless networks," in IEEE INFOCOM, 2014.

[19] ——, "Quality-aware traffic offloading in wireless networks," in ACM MobiHoc, 2014.

[20] D. Huggins-Daines, M. Kumar, A. Chan, A. W. Black, M. Ravishankar, and A. I. Rudnicky, "Pocketsphinx: A free, real-time continuous speech recognition system for hand-held devices," in IEEE ICASSP, 2006.

[21] S. C. Mukhopadhyay, "Wearable sensors for human activity monitoring: A review," IEEE Sensors Journal, vol. 15, no. 3, pp. 1321-1330, March 2015.

[22] W. Hu and G. Cao, "Energy-aware video streaming on smartphones," in IEEE INFOCOM, 2015.

[23] L. Xiang, S. Ye, Y. Feng, B. Li, and B. Li, "Ready, set, go: Coalesced offloading from mobile devices to the cloud," in IEEE INFOCOM, 2014.