Penn State CSE “Optimizing Network Virtualization in Xen” Aravind Menon, Alan L. Cox, Willy Zwaenepoel Presented by : Arjun R. Nath


Page 1:

Penn State CSE

“Optimizing Network Virtualization in Xen”
Aravind Menon, Alan L. Cox, Willy Zwaenepoel

Presented by: Arjun R. Nath

Page 2:

Introduction

● Paper to appear in USENIX 2006
  – Alan L. Cox, Rice University
  – Aravind Menon and Willy Zwaenepoel, EPFL, Lausanne
● Outline: the paper introduces three optimizations to the existing Xen (2.0) design for improving networking efficiency:
  – An improved virtual network interface
  – A faster I/O channel for exchanging packets between host and guest
  – Virtual memory modifications to reduce TLB misses by guest domains

Page 3:

Xen Network I/O Architecture

Page 4:

1. Improving the Virtual Network Interface

● The front end is the virtualized network driver for Xen guests
  – Simple, low-level interface: allows support for a large number of physical NICs.
  – However, this prevents the virtual driver from using advanced NIC capabilities, such as:
    ● Checksum offload
    ● Scatter/gather DMA support
    ● TCP segmentation offload
  – Improve by making use of some of these NIC capabilities where possible

Page 5:

1. Improving the Virtual Network Interface

● TCP Segmentation Offload (or TCP Large Send): applies when the NIC can accept buffers much larger than the maximum transmission unit (MTU) of the medium. The work of dividing the larger buffer into MTU-sized packets is offloaded to the NIC, reducing CPU processing.

● Scatter/Gather I/O: enables the OS to construct packets directly from file-system buffers without first copying them to a contiguous memory location.

● Checksum Offload: computes the TCP data checksum in NIC hardware rather than in software.
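To make concrete what checksum offload moves off the CPU, here is a minimal Python sketch of the standard Internet ones'-complement checksum used by TCP (illustrative only; with offload enabled, the NIC performs this computation in hardware):

```python
def inet_checksum(data: bytes) -> int:
    """Ones'-complement sum of 16-bit big-endian words, folded and
    complemented -- the computation that checksum offload moves to the NIC."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

A useful property for verification: appending the computed checksum to the data and re-running the sum yields 0.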

Page 6:

Modified Network I/O Architecture

Offload driver in Driver Domain

Page 7:

Results of Network modifications

Offload benefits

Page 8:

2. I/O Channel Improvements

Pre-modification operations

● Packet transfer between the guest and the driver domain is done via a zero-copy page-remapping mechanism.

● The page containing the packet is remapped into the address space of the target domain

● This operation requires each packet to be allocated on a separate page.

● Each packet requires 3 address remaps on receive and 2 on transmit, plus 2 memory alloc/dealloc operations
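A back-of-the-envelope model of why these per-packet operations hurt, and why the copy-based receive path below pays off for MTU-sized packets. The operation counts come from the slide; the cycle costs are invented purely for illustration:

```python
# Hypothetical per-operation costs in CPU cycles (assumed, not measured).
REMAP_COST = 2000        # page remap incl. page-table and TLB maintenance
ALLOC_COST = 500         # page alloc or dealloc
COPY_COST_PER_BYTE = 1   # simple memcpy model

def receive_cost_remap() -> int:
    """Original path: 3 remaps + 2 alloc/dealloc per received packet."""
    return 3 * REMAP_COST + 2 * ALLOC_COST

def receive_cost_copy(pkt_len: int) -> int:
    """Optimized path: copy into a pre-shared page; no remap, no per-packet
    page alloc/dealloc."""
    return pkt_len * COPY_COST_PER_BYTE
```

Under these assumed costs, copying a 1500-byte packet (~1500 cycles) is far cheaper than the remap path (~7000 cycles), which is the intuition behind the receive optimization on the next slide.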

Page 9:

2. I/O Channel Improvements

Optimizations
● Transmit:
  – Let the driver domain examine the MAC header of the packet to check whether the destination is the driver domain or a broadcast.
  – Then construct the network packet from the packet header and (possibly unmapped) packet fragments and send it over the bridge. (Needs gather DMA support in the NIC.)
● Receive:
  – Reduce small-packet overheads by doing a data copy instead of a page remap.
  – Implemented using shared pages between Dom0 and the guest.
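The transmit-path check above can be sketched as follows. The function name and MAC values are hypothetical; in reality this decision is made inside the driver domain's bridging code:

```python
BROADCAST_MAC = b"\xff" * 6

def must_map_full_packet(dest_mac: bytes, driver_domain_mac: bytes) -> bool:
    """Transmit-path decision from the slide: the driver domain needs the
    full packet mapped only when it is itself the destination or the frame
    is a broadcast. Otherwise the header alone suffices and the payload
    fragments can stay unmapped, relying on the NIC's gather-DMA support."""
    return dest_mac == driver_domain_mac or dest_mac == BROADCAST_MAC
```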

Page 10:

Results of I/O channel changes

I/O channel optimization benefit (transmit)

Page 11:

3. Virtual Memory Improvements

● Observations
  – High number of TLB misses for guest domains in Xen during network operations, compared to native Linux
  – Possibly due to an increase in working set size
  – Absence of support for virtual memory primitives such as superpage mappings and global page mappings

Page 12:

3. Virtual Memory Improvements

Optimizations

● Modified the guest OS to use a superpage mapping for a virtual address range only if the associated physical pages are contiguous

● Modified the memory allocator in the guest OS to try to group memory pages into a contiguous range (within a superpage)

● Modified the VMM to allow guests to use global page mappings in their address space (limited benefit)

● Avoid a TLB flush when switching from a busy domain to an idle domain and back
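A sketch of the contiguity test behind the first optimization, assuming 4 KB base pages and 4 MB superpages (1024 pages); this is illustrative, not the authors' actual code:

```python
SUPERPAGE_PAGES = 1024  # 4 MB superpage / 4 KB base pages (assumed x86 sizes)

def can_map_as_superpage(mfns: list) -> bool:
    """A virtual range may be mapped by one superpage only when the machine
    frames backing it are superpage-aligned and physically contiguous;
    otherwise the guest must fall back to ordinary 4 KB mappings."""
    if len(mfns) != SUPERPAGE_PAGES or mfns[0] % SUPERPAGE_PAGES != 0:
        return False
    return all(mfns[i] == mfns[0] + i for i in range(SUPERPAGE_PAGES))
```

This is why the modified allocator tries to keep a guest's pages grouped: one non-contiguous frame anywhere in the range forces the fallback.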

Page 13:

Results for virtual memory changes

TLB measurements using Xenoprof

Page 14:

Results for virtual memory changes

Data TLB measurements

Page 15:

Virtual Memory Issues

● Transparent page sharing between VMs breaks the contiguity of physical page frames, which is bad for superpages. (Currently not implemented in Xen.)

● Use of the ballooning driver also breaks the contiguity of physical pages. (Solution: do coarse-grained ballooning in units of the superpage size.)
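The proposed fix can be sketched as rounding every balloon request up to whole superpages, so reclaimed memory never punches holes inside a superpage (superpage size assumed to be 1024 base pages; function name hypothetical):

```python
SUPERPAGE_PAGES = 1024  # base pages per superpage (assumed 4 MB / 4 KB)

def balloon_pages(requested_pages: int) -> int:
    """Coarse-grained ballooning sketch: round a balloon inflate/deflate
    request up to a whole number of superpages, preserving superpage
    contiguity at the cost of over-reclaiming slightly."""
    return -(-requested_pages // SUPERPAGE_PAGES) * SUPERPAGE_PAGES
```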

Page 16:

Results: Transmit (Overall)

Transmit throughput measurements

Page 17:

Results: Receive (Overall)

Receive Throughput measurements

Page 18:

Results overall

● Transmit throughput of guests improved by more than 300%

● Receive throughput improved by 30%

● Receive performance is still a bottleneck

Page 19:

Questions to the Authors

● Why did they use Xen 2.0 rather than 3.x?
  – They started this work when Xen was at 2.0.6 and continued working on it even after Xen 3.0 was announced.

● Will these optimizations be included in the Xen 3.x codebase?
  – Yes, they are looking to do that (it is not done yet).

● Any more?

Page 20:

Thanks!