MARS: Adaptive Remote Execution Scheduler for Multithreaded Mobile Devices
Asaf Cidon*, Tomer M. London*, Sachin Katti, Christos Kozyrakis, Mendel Rosenblum
*Equal contributors
Stanford University
New Class of Mobile Applications
October 23, 2011 Slide 2
Augmented Reality
Computer Vision
Motion Sensing
Mobile Client Trends
• Mobile CPU performance increasing
  – Hitting ‘energy wall’
• Can we improve performance and reduce energy consumption?
• Opportunity: network bandwidth is increasing; utilize the cloud
[Figure: Evolution of Wi-Fi Bandwidth. Y-axis: maximum bandwidth (Mb/s), with ticks at 1, 10, 100, and 1000. X-axis: 802.11 legacy mode, 802.11b, 802.11a, 802.11g, 802.11n (40 MHz), 802.11ac (80 MHz, projection).]
Static Client-Server Partitioning Doesn’t Work
• Dynamic resources:
  – Network bandwidth and latency
  – Available CPU, memory
• Same code, different platforms:
  – Smartphones (single-core, multi-core)
  – Tablets
MARS: Adaptive Remote Execution
• Opportunistically offload computations to remote server
  – Enhance computational capabilities
  – Decrease energy consumption
• Make dynamic decisions
  – Adapt to network and CPU variability
[Figure labels: Mobile Device, Data Center]
Agenda
1. Design of MARS
2. Simulator Results and Analysis
3. Conclusions
Existing Remote Execution Systems
[Table: systems classified along two axes]
The Unit of Remote Execution: RPC, VM
Target of Performance Optimization: single-threaded application, multi-threaded application, system
Systems compared:
• CloneCloud [Kirsch et al., ‘11]
• Cloudlets [Satyanarayanan et al., ‘09]
• MAUI [Cuervo et al. ‘10]
• Chroma [Balan et al. ‘03]
• Odessa [Ra et al. ‘11]
• MARS “Cloud-on-Chip”
Previous Systems: Application Partitioning
[Figure: RPC 1 through RPC 5 of Process 1, split between Local Execution and Remote Execution]

MARS “Cloud-on-Chip”: System Scheduling
[Figure: an RPC Queue holding RPCs from several processes (RPC 1 and RPC 2 of Process 1, RPC 1 of Process 2, RPC 1 and RPC 2 of Process 3), feeding Local Cores and Remote Cores]
Greedy Algorithm
Higher POR: better performance gain from offloading
Higher EOR: better energy saving from offloading
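\[
\mathrm{POR}(\mathrm{RPC}) = \frac{\mathrm{LocalExTime}(\mathrm{RPC})}{\mathrm{RemoteExTime}(\mathrm{RPC}) + \mathrm{NetDelay}(\mathrm{RPC})}
\]
\[
\mathrm{EOR}(\mathrm{RPC}) = \frac{\mathrm{LocalExTime}(\mathrm{RPC}) \cdot \mathrm{LocalPower}}{\mathrm{NetworkEnergy}(\mathrm{RPC})}
\]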
EOR ≥ ? → Remote Server
EOR < ? → Local Core
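
The two ranks and the threshold test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `RpcEstimate`, the function `place`, and the parameter `eor_threshold` are hypothetical names, the slide does not give the threshold value, and the sum in the POR denominator and the product in the EOR numerator are read from the slide's garbled formulas.

```python
from dataclasses import dataclass


@dataclass
class RpcEstimate:
    """Per-RPC estimates; field names mirror the terms on the slide."""
    local_ex_time: float    # seconds to run the RPC on a local core
    remote_ex_time: float   # seconds to run the RPC on the remote server
    net_delay: float        # seconds spent moving data over the network
    network_energy: float   # joules spent by the radio to offload the RPC


def por(rpc: RpcEstimate) -> float:
    """Performance Offload Rank: local time over total remote time."""
    return rpc.local_ex_time / (rpc.remote_ex_time + rpc.net_delay)


def eor(rpc: RpcEstimate, local_power: float) -> float:
    """Energy Offload Rank: local energy (time * power) over network energy."""
    return rpc.local_ex_time * local_power / rpc.network_energy


def place(rpc: RpcEstimate, local_power: float, eor_threshold: float) -> str:
    """Hypothetical placement rule: offload when EOR meets the threshold."""
    return "remote" if eor(rpc, local_power) >= eor_threshold else "local"
```

For instance, an RPC that takes 2.0 s locally, 0.5 s remotely, and 0.5 s of network delay has a POR of 2.0: offloading would halve its completion time under these made-up numbers.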
Controller Algorithm
Priority Queue, sorted by Performance Offload Rank (POR)
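Check EOR Threshold
G (Greediness) trades off utilization and energy efficiency
[Figure: priority queue holding RPC 2 (POR 0.4), RPC 4 (POR 1.3), RPC 6 (POR 1.8), RPC 5 (POR 1.9), and RPC 3 (POR 2.5), with RPC 6 (POR 1.8) shown a second time; two resources marked "Available"; an EOR axis with regions labeled Local, Remote, and Both and marks at 1/G and G]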
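
One possible reading of the controller slide is sketched below. Everything beyond the slide's labels is an assumption: that RPCs with EOR below 1/G run only locally, RPCs with EOR above G run only remotely, and those in between may run on either; that a free remote slot scans the queue from the highest POR down while a free local core scans from the lowest POR up; and the names `eligible` and `pick`. The sketch also assumes G ≥ 1.

```python
def eligible(eor_value, g):
    """Assumed reading of the EOR axis with marks at 1/G and G (G >= 1)."""
    if eor_value < 1.0 / g:
        return {"local"}            # offloading costs too much energy
    if eor_value > g:
        return {"remote"}           # offloading saves a lot of energy
    return {"local", "remote"}      # the "Both" region


def pick(queue, resource, g):
    """Pick the next RPC for a newly available resource.

    queue is a list of (name, por, eor) tuples and is modified in place.
    resource is "local" or "remote".
    Returns the chosen RPC name, or None if nothing in the queue is eligible.
    """
    # Remote slots prefer high POR; local cores prefer low POR (assumption).
    ordered = sorted(queue, key=lambda rpc: rpc[1], reverse=(resource == "remote"))
    for rpc in ordered:
        if resource in eligible(rpc[2], g):
            queue.remove(rpc)
            return rpc[0]
    return None
```

Under this reading, a larger G widens the "Both" region, so idle resources find work more often (higher utilization) at the cost of placements that are less energy-efficient.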
Remote Execution Applications
[Figure: two application pipelines, each drawn several times. Augmented Reality: stages labeled Pic, Barcode, Rendering. Face Recognition: stages labeled Pic, Detection, Recognition.]
Simulator Methodology
• Trace-driven simulation
• Clients:
  – Nokia N900 (single core)
  – NVIDIA Tegra 250 (multicore)
• Server:
  – Amazon EC2 Opteron 2007
• Networks:
  – Outdoors Wi-Fi
  – Indoors Wi-Fi
  – 3G
MARS vs. Static Policies
Nokia N900 Power Consumption
• Wi-Fi: performance and energy are highly correlated
• 3G: trade-off between performance and energy

                                 Wi-Fi          3G
Idle Network Power               1.31 Watts     0.66 Watts
Upload Network Power             1.464 Watts    2.36 Watts
Download Network Power           1.39 Watts     2.26 Watts
Upload Network Power Overhead    10.51%         72.03%
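
The overhead figures appear consistent with expressing the extra power drawn while uploading as a fraction of total upload power. The slide does not state the formula, so the definition below is an assumption, and `upload_overhead` is a hypothetical helper name.

```python
def upload_overhead(upload_watts, idle_watts):
    """Share of upload power that goes beyond the idle network power.

    Assumed definition: (upload - idle) / upload.
    """
    return (upload_watts - idle_watts) / upload_watts


# Values from the Nokia N900 measurements on the slide.
wifi = upload_overhead(1.464, 1.31)   # about 0.105
cell = upload_overhead(2.36, 0.66)    # about 0.720
```

With this definition, uploading over 3G costs roughly seven times more relative overhead than over Wi-Fi, which fits the slide's point that 3G forces a trade-off between performance and energy.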
Same Application, Different Networks
Remote Execution with Multicore
Conclusions
1. Can’t always be greedy
   – Performance and energy trade-off
2. MARS is optimized for multiple parallel applications and cores
3. MARS “Cloud-on-Chip”: validation of system-level remote execution scheduling
   – 57% performance increase, 33% energy savings