xpds13: performance evaluation of live migration based on xen arm pvh - jaeyong yoo, samsung
DESCRIPTION
The electricity charge for operating data centers accounts for approximately 27% of the total operation cost. For this reason, ARM servers have been getting more attention for future energy-efficient data centers, and the performance of ARM processors keeps increasing (clock speeds are approaching 3GHz). To utilize ARM cores efficiently, ARM PVH was introduced in Xen 4.3; based on it, we have implemented a live migration feature and evaluated it on a dual-core ARM board. More specifically, we choose a multimedia streaming workload, measure the maximum number of concurrent clients, and calculate clients per watt (CPW) as the performance metric. We found that even a dual-core ARM processor (with virtualization) gives a higher CPW (7 CPW) than the x86 case (6 CPW). In addition, server consolidation reduced energy consumption by around 70% (4-to-1 consolidation of low-loaded servers).
TRANSCRIPT
Software Center
Performance Evaluation of Live Migration based on Xen ARM PVH for Energy-efficient ARM Server
2013-10-24
Jaeyong Yoo, Sangdok Mo, Sung-Min Lee, ChanJu Park, Ivan Bludov, Nikolay Martyanov
Software R&D Center
Samsung Electronics
Contents
• Motivation
• Live Migration in Xen ARM PVH – Design and Implementation
• Performance Evaluation
1. Streaming service with ARM vs. x86
2. Streaming server consolidation with live migration
3. Streaming service with quad-core ARM board
• Concluding Remark
Motivation
Energy Problem in Datacenters
• Datacenters consume a significant amount of electricity
Ref: Jaroslav Rajić, "Evolving Toward the Green Data Center," http://stack.nil.si/ipcorner/GreenDC/#chapter2
Datacenter operation cost (pie chart):
– Electricity (27%)
– Engineering & Installation (19%)
– Power Equipment (17%)
– Space (17%)
– Service (13%)
– Cooling Equipment (6%)
– Racks (3%)
ARM Servers for Future Green Data Center
• Economical choice
– Significant advantage in compute/watt
• Vendors of ARM Server SoC
– AMD: Seattle (64-bit ARM server processor, 2H 2014)
– Calxeda: ECX-1000
– Applied Micro: X-Gene
• OS for ARM Servers
– Linaro LEG
– Red Hat deploys ARM-based servers for Fedora Project
[Images: Calxeda EnergyCore ECX-1000; AMD Seattle, 64-bit ARM server; Applied Micro X-Gene]
Further energy efficiency maximization:
Server consolidation by virtualization
Design and Implementation of Live Migration in Xen ARM PVH
Overall Architecture
• Components for Live Migration in Xen ARM PVH
[Architecture diagram; labels recovered from the figure:]
– Dom0: xl, libxl, libxc, libvirt (perform-migrate), Kernel
– DomU: streaming server, apache, mysql, Kernel
– Hypervisor
– Migration functions: ARM-migrate, memory map get/set, memory data save/restore, dirty-page detecting, get dirty-bitmap, HVM context save/restore, VCPU save/restore, suspend/resume
– Hardware (Arndale): Cortex-A15 dual-core 1.7 GHz, 2GB memory, SATA3, USB3.0
– Legend: newly implemented, modified module, existing module
Sequence of Live Migration
[Sequence diagram between migration source and migration destination]
– Source side: xl, xc, memory get map, memory save, dirty detection, dirty bitmap, HVM save, VCPU save, DomU suspend
– Destination side: xl, xc, memory set map, memory restore, HVM restore, VCPU restore, DomU resume
Steps:
1. migrate-receive; domain-save / domain-restore
2. get/set memory map
3. start dirty-paging
4. loop until stop-condition: store dirty pages, get dirty bitmap, save/restore memory contents
5. suspend DomU
6. save/restore last dirty pages
7. save/restore HVM
8. save/restore VCPU
9. resume DomU
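The pre-copy loop in this sequence can be illustrated with a small simulation. This is only a sketch under stated assumptions: `ToyGuest` and `live_migrate` are hypothetical names, the guest is a plain list of page contents, and HVM/VCPU context transfer is left out. The default `max_iter` and `min_dirty` values are the ones quoted for the stop-condition later in the slides.

```python
import random


class ToyGuest:
    """Hypothetical stand-in for a DomU: page contents plus a dirty set."""

    def __init__(self, num_pages, seed=0):
        self.pages = [0] * num_pages
        self.dirty = set()
        self.tracking = False
        self.running = True
        self._rng = random.Random(seed)

    def run_for_a_while(self, writes):
        """Simulate the live guest writing to `writes` random pages."""
        if not self.running:
            return
        for _ in range(writes):
            pfn = self._rng.randrange(len(self.pages))
            self.pages[pfn] += 1
            if self.tracking:
                self.dirty.add(pfn)

    def get_and_clear_dirty(self):
        """Return the pages dirtied since the last call and reset the set."""
        dirty, self.dirty = self.dirty, set()
        return dirty


def live_migrate(src, max_iter=29, min_dirty=50, writes_per_round=200):
    """Copy src.pages to a new list while the guest keeps running."""
    dst_pages = [None] * len(src.pages)      # get/set memory map
    src.tracking = True                      # start dirty-paging
    to_send = set(range(len(src.pages)))     # first round sends every page
    iterations = 0
    while True:
        for pfn in to_send:                  # save/restore memory contents
            dst_pages[pfn] = src.pages[pfn]
        iterations += 1
        src.run_for_a_while(writes_per_round)
        to_send = src.get_and_clear_dirty()  # get dirty bitmap
        if iterations >= max_iter or len(to_send) < min_dirty:
            break                            # stop-condition reached
    src.running = False                      # suspend DomU
    for pfn in to_send:                      # last dirty pages
        dst_pages[pfn] = src.pages[pfn]
    return dst_pages, iterations             # HVM/VCPU state omitted here
```

With a lightly loaded guest the loop ends early because few pages are dirtied per round; with a write-heavy guest it runs until the iteration limit and the remaining pages are sent while the guest is suspended.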
Major Hypercalls for Live Migration
Implemented Hypercalls for Enabling Live Migration Feature in Xen ARM PVH
Functions | Hypercalls | Description
Memory Migration | XENMEM_get/set_memory_map | Save/restore physical memory map of DomU
Memory Migration | XEN_DOMCTL_shadow_op | Enable dirty-page detection; get dirty-page bitmap
Memory Migration | XENMEM_add_to_physmap_range | Access the DomU's memory from Dom0
VCPU Migration | XEN_DOMCTL_get/setvcpucontext | Save/restore the VCPU registers
HVM Migration | XEN_DOMCTL_get/sethvmcontext | Save/restore the HVM contexts (e.g., timer, interrupt controller)
Dirty-page Tracing: Get-dirty Bitmap
[Diagram: the toolstack (libxc, ARM-migrate) issues XEN_DOMCTL_shadow_op (peek dirty pages) with a dirty-page bitmap as the hypercall parameter. Inside Xen: dirty-page detecting, temporary dirty-page storing, then filling up the dirty-page bitmap (get dirty-page bitmap).]
Temporary dirty-page storing, candidates:
1. Embedded in page table (use unused bits in PTE)
2. Linked list of PFNs
3. Bitmap of PFNs
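To make the third candidate concrete, here is a minimal sketch of a bitmap of PFNs with one bit per guest page. The class and method names are hypothetical and are not taken from the Xen sources; the slides do not say which candidate was finally chosen.

```python
class DirtyBitmap:
    """One bit per guest page frame; bit set means the page was written."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.bits = bytearray((num_pages + 7) // 8)

    def mark(self, pfn):
        """Record that page `pfn` has been dirtied."""
        self.bits[pfn // 8] |= 1 << (pfn % 8)

    def is_dirty(self, pfn):
        return bool(self.bits[pfn // 8] & (1 << (pfn % 8)))

    def peek(self):
        """Return a copy of the bitmap without clearing it."""
        return bytes(self.bits)

    def clean(self):
        """Return a copy of the bitmap and reset all bits."""
        snapshot = bytes(self.bits)
        self.bits = bytearray(len(self.bits))
        return snapshot

    def dirty_pfns(self):
        """List the PFNs whose bit is currently set."""
        return [pfn for pfn in range(self.num_pages) if self.is_dirty(pfn)]


# A 256MB guest with 4KB pages has 65536 pages, so the bitmap takes 8KB.
bitmap = DirtyBitmap((256 << 20) >> 12)
bitmap.mark(3)
bitmap.mark(4097)
```

A bitmap has a fixed, small memory cost and matches the format handed back to the toolstack, whereas a linked list grows with the number of dirtied pages.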
Dirty-page Tracing: Dirty-page Detection
[Diagram: three page tables, each with Level 1, Level 2, and Level 3]
– Guest page table (DomU kernel): guest VA → IPA
– p2m, physical-to-machine page table (Xen-side for DomU): IPA → MA
– Xen page table (Xen-side for Xen itself)
Dirty-page Tracing: Dirty-page Detection
[Diagram: the same three page tables (guest page table, Xen-side table for DomU, Xen page table), each with Level 1 to Level 3, translating guest VA → IPA → MA. A PTE is highlighted with its write bit (0/1) cleared: w=0.]
Dirty-page Tracing: Dirty-page Detection
[Diagram: the same three page tables, translating guest VA → IPA → MA. A write request that reaches a PTE with w=0 (write bit 0/1) causes a fault that is trapped by Xen.]
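The detection mechanism shown in these diagrams (clear the write bit, then catch the resulting fault) can be mimicked in a few lines. This is a simplified sketch, not the Xen implementation: `ToyP2M` is a hypothetical flat table with one write bit per page, and re-enabling the write bit after the first fault is an assumption made here so that each page faults only once per round.

```python
class ToyP2M:
    """Hypothetical flat stage-2 table: one write-permission bit per page."""

    def __init__(self, num_pages):
        self.memory = [0] * num_pages
        self.writable = [True] * num_pages
        self.dirty = set()
        self.faults = 0

    def enable_dirty_tracking(self):
        """Write-protect every page (w=0) so the next write traps."""
        self.writable = [False] * len(self.memory)
        self.dirty.clear()

    def guest_write(self, pfn, value):
        """A guest store; a write to a w=0 page first traps to the handler."""
        if not self.writable[pfn]:
            self._permission_fault(pfn)
        self.memory[pfn] = value

    def _permission_fault(self, pfn):
        """Fault handler: record the page as dirty and restore w=1."""
        self.faults += 1
        self.dirty.add(pfn)
        self.writable[pfn] = True


p2m = ToyP2M(16)
p2m.enable_dirty_tracking()
p2m.guest_write(3, 1)
p2m.guest_write(3, 2)   # second write to the same page does not fault
p2m.guest_write(7, 9)
```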
Implementation Choice
• Manual walking of p2m table
• Virtual-linear page table
Manual Walking of p2m Table
[Diagram: Xen page table (Xen-side for Xen itself) and the Xen-side table for DomU (IPA → MA), each with Level 1 to Level 3, above physical memory (a.k.a. machine memory)]
• Create a mapping to Xen for each table visited (3 times)
• Superpage checking
• w bit modification
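The manual walk can be sketched as follows. The layout is an assumption for illustration: 4KB pages and 9 index bits per level, with tables modelled as dictionaries. `Entry` and `write_protect` are hypothetical names, and treating a superpage as the leaf whose w bit is changed is one possible policy, not necessarily what the implementation does.

```python
LEVEL_SHIFTS = (30, 21, 12)   # assumed: 4KB pages, 9 index bits per level
INDEX_MASK = 0x1FF


class Entry:
    """A table entry: points to a next-level table, or is a leaf mapping."""

    def __init__(self, next_table=None, writable=True):
        self.next_table = next_table   # dict for a table, None for a leaf
        self.writable = writable


def write_protect(root, ipa):
    """Walk the tables for `ipa` and clear the w bit of the leaf entry.

    Returns (level of the leaf or None if unmapped, number of tables mapped).
    """
    table = root
    mappings = 0
    for level, shift in enumerate(LEVEL_SHIFTS, start=1):
        mappings += 1                  # stands in for mapping the table to Xen
        entry = table.get((ipa >> shift) & INDEX_MASK)
        if entry is None:
            return None, mappings      # nothing mapped at this IPA
        if entry.next_table is None:   # leaf: a page, or a superpage if level < 3
            entry.writable = False     # w bit modification
            return level, mappings
        table = entry.next_table
    return None, mappings


page = Entry()
superpage = Entry()
root = {0: Entry(next_table={1: Entry(next_table={5: page}), 2: superpage})}
result_page = write_protect(root, (1 << 21) | (5 << 12))
result_super = write_protect(root, 2 << 21)
```

The cost that motivates the alternative on the next slide is visible in `mappings`: every lookup of a regular page touches three tables.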
Virtual-linear Page Table
• Consider the third-level page table as a contiguous memory block in virtual address space
• For a given IPA, with some arithmetic, calculate the Xen VA and just read it!
[Diagram: the guest's third-level page tables (3lvl PT #1, 3lvl PT #2, 3lvl PT #5) are scattered in physical memory (a.k.a. machine memory); the Xen page table (Level 1, Level 2, Level 3) maps them into a virtually contiguous third-level page table in virtual memory.]
※ An 8GB DomU requires a 16MB third-level page table
ref: http://www.technovelty.org/linux/virtual-linear-page-table.html
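The "some arithmetic" can be written out as below. The sketch assumes 4KB pages and 8-byte page table entries, which is consistent with the 8GB / 16MB figure on the slide; `VLPT_BASE` is a made-up Xen virtual address used only for illustration.

```python
PAGE_SHIFT = 12            # assumed 4KB pages
PTE_SIZE = 8               # assumed 8-byte third-level entries
VLPT_BASE = 0x1000_0000    # hypothetical Xen VA where the linear table starts


def vlpt_size(guest_bytes):
    """Bytes of third-level page table needed to cover the guest memory."""
    return (guest_bytes >> PAGE_SHIFT) * PTE_SIZE


def pte_va(ipa, guest_base_ipa=0):
    """Xen VA of the third-level PTE that maps `ipa`."""
    page_index = (ipa - guest_base_ipa) >> PAGE_SHIFT
    return VLPT_BASE + page_index * PTE_SIZE


# 8GB / 4KB = 2,097,152 pages; 2,097,152 * 8 bytes = 16MB
size_for_8gb = vlpt_size(8 << 30)
```

Compared with manual walking, a lookup becomes a single address computation and one memory read, at the price of reserving a region of Xen's virtual address space.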
Evaluation
Experiment Environment (Hardware/Software)
[Diagram: a 220 V power source feeds the test hardware through a power meter (Yokogawa WT3000); clients reach the servers through a 1G switch.]
– Exp. Platform 1: x86 HW, Linux, streaming server
– Exp. Platform 2: Arndale board, streaming server on native Linux and on Linux under Xen
• x86 hardware
– 8 logical cores (i7-2600 3.4GHz)
– Intel 1Gbps NIC
– 4GB memory
• ARM
– Arndale board
– 2 cores
– 1Gbps network card (USB 3.0)
– SSD mSATA
– 2GB memory
• Xen source: Xen 4.4 staging
• Domain kernels:
– Dom0: Linaro kernel 3.11
– DomU: Linaro kernel 3.9
• Streaming server:
– ffserver (RTSP streaming)
Note: The main evaluations are performed on a mobile-class ARM board.
The performance evaluation of a server-class ARM board is presented at the end of the slides.
Experiment Environment (Scenarios)
Test case 1: Streaming service with ARM vs. x86
Saturate the streaming server to get the maximum number of streaming clients
– Measurement 1: Maximum number of streaming clients for each test platform
– Measurement 2: Energy-efficiency comparison for each test platform
Test case 2: Streaming server consolidation with live migration
10% of the maximum number of streaming clients
– Measurement 1: Energy-efficiency comparison for each test platform
– Measurement 2: Streaming server consolidation within Xen-virtualized servers
– Measurement 3: Total live migration time, service downtime
– Measurement 4: Dirty-page detection time, dirty-page get-bitmap time, total dirty-page counts
Test case 3: Streaming with quad-core ARM board (in progress)
Maximum clients with varying number of ARM cores
Case 1: Streaming Service ARM vs. x86 (Maximum capacity of ARM virtualized Server)
• Max streaming clients with varying number of VMs
– Dual-core ARM board
– Single VCPU for each VM
Number of VMs | Per-VM Memory | Max Streaming Clients | Power (W)
1 | 512MB | ~110 | 14.8
2 | 512MB | ~80 | 12.6
3 | 256MB | ~90 | 14.5
4 | 256MB | ~80 | 11.8
Finding: ARM cores are the major bottleneck
Case 1: Streaming Service ARM vs. x86 (Energy-efficiency comparison to x86 hardware)
• Compare with the best case of ARM* virtualization
OS | Total memory in server | Max Streaming Clients | Power | Clients/Watt | Required memory
x86 with Linux | 4GB | ~750 | 121.5 W | 6.17 CPW | ~2.4GB
ARM with native Linux | 2GB | ~200 | 11.7 W | 17.09 CPW | ~707MB
ARM with virtualization | 512MB | ~110 | 14.8 W | 7.43 CPW | ~340MB
* Dual-core ARM CPU
Finding: Even dual-core ARM with virtualization shows a higher CPW than x86
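The CPW column is simply the maximum number of clients divided by the measured power. The short sketch below recomputes it from the figures in the table; the function and dictionary names are hypothetical.

```python
def clients_per_watt(max_clients, watts):
    """Clients per watt (CPW): max concurrent clients / measured power."""
    return max_clients / watts


# (max streaming clients, measured power in watts), taken from the table
platforms = {
    "x86 with Linux": (750, 121.5),
    "ARM with native Linux": (200, 11.7),
    "ARM with virtualization": (110, 14.8),
}

cpw = {name: round(clients_per_watt(c, w), 2)
       for name, (c, w) in platforms.items()}
```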
Case 2: Streaming Server Consolidation of ARM virtualized server
• Scenario:
– 4 ARM boards, each running a 256MB VM
– Each VM has 10 clients
– Consolidate all VMs onto one ARM board, and turn off the other 3 ARM boards
Consolidation | Watts before consolidation | Watts after consolidation | Energy saving
2 to 1 | 2 x 8 W = 16 W | 8.6 W | 46%
3 to 1 [extrapolated] | 3 x 8 W = 24 W | 8.9 W | 63%
4 to 1 [extrapolated] | 4 x 8 W = 32 W | 9.4 W | 71%
Finding: Server consolidation can significantly reduce energy consumption
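The saving percentages follow from comparing the power drawn by N lightly loaded boards with the power of the single board that hosts all VMs afterwards. A small check using the table's numbers (the function name is hypothetical):

```python
def energy_saving(boards, watts_per_board, watts_after):
    """Fraction of power saved by consolidating `boards` servers into one."""
    watts_before = boards * watts_per_board
    return 1 - watts_after / watts_before


# (number of boards, watts after consolidation), taken from the table
cases = {2: 8.6, 3: 8.9, 4: 9.4}
savings = {n: round(100 * energy_saving(n, 8.0, after))
           for n, after in cases.items()}
```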
Case 2: Live Migration Performance
• Migrate one VM at a time
– With different DomU memory sizes (128MB, 256MB, 512MB)
• Measurements:
– Live migration time: the whole time taken by live migration
– Total dirty pages: the number of pages dirtied during live migration
Case 2: Live Migration Performance
• Number of dirty pages in each iteration
Configuration for stop-condition:
– max iter: 29
– max_mem_factor: 3
– min_dirty_per_iter: 50
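One plausible reading of these three parameters is sketched below. The semantics are an assumption inferred from the parameter names (iteration cap, cap on total pages sent as a multiple of guest size, and a minimum number of dirty pages worth another round); the slides do not spell them out, and `should_stop` is a hypothetical function.

```python
def should_stop(iteration, dirty_this_iter, total_sent, guest_pages,
                max_iter=29, max_mem_factor=3, min_dirty_per_iter=50):
    """Decide whether the pre-copy loop should end and the guest be suspended.

    Assumed meaning of the parameters:
    - max_iter: never run more than this many copy rounds
    - max_mem_factor: stop once more than factor * guest size has been sent
    - min_dirty_per_iter: stop when a round dirtied fewer pages than this
    """
    if iteration >= max_iter:
        return True
    if dirty_this_iter < min_dirty_per_iter:
        return True
    if total_sent > max_mem_factor * guest_pages:
        return True
    return False


# A 256MB guest with 4KB pages has 65536 pages.
GUEST_PAGES = (256 << 20) >> 12
```

The first two conditions bound the migration time, while the third ends the loop once the remaining dirty set is small enough to send during the suspend phase.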
Case 2: Service downtime due to live migration
• Service downtime
– The time during which the VM does not respond to outside interaction
– Measurement method:
• flood-ping the migrating domain
• take the time difference between packets sent from the migrating domain
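With this method, the downtime shows up as the largest gap between consecutive packets from the migrating domain. A minimal sketch, assuming the packet arrival times have already been captured as a sorted list of seconds (the sample timestamps are invented for illustration, not measured data):

```python
def service_downtime(timestamps):
    """Largest gap, in seconds, between consecutive packet timestamps."""
    gaps = [later - earlier
            for earlier, later in zip(timestamps, timestamps[1:])]
    return max(gaps) if gaps else 0.0


# Illustrative values only: replies every 10 ms, with one pause
# while the domain is suspended and resumed on the destination.
sample = [0.00, 0.01, 0.02, 0.25, 0.26]
downtime = service_downtime(sample)
```

The resolution of the result is limited by the ping interval, which is why a flood-ping is used.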
Case 2: Performance of dirty-page detection
• Measure the elapsed time of two major functions
– dirty-page detection
– dirty-page collection
Case 3: Quad-core ARM board (In-progress)
• ARM board: 4 ARM cores with 8GB memory
Number of VMs | Per-VM Memory | Max Streaming Clients | Power | CPW
1 | 1GB | ~120 | 17.0 W | 7.06 CPW
2 | 1GB | ~250 | 18.5 W | 13.51 CPW
3 | 1GB | ~300 | 18.9 W | 15.87 CPW
• x86 case (see slide 24):
OS | Total memory | Max Streaming Clients | Power | Clients/Watt
x86 with Linux | 4GB | ~750 | 121.5 W | 6.17 CPW
Concluding Remark
• ARM servers are a good candidate for green data centers
– Even ARM mobile processors with virtualization achieve a better CPW than x86
– Virtualization in ARM servers can further improve energy efficiency through server consolidation
• Pass-through to DomU could significantly increase performance