vSnoop: Improving TCP Throughput in Virtualized Environments via Acknowledgement Offload Ardalan Kangarlou, Sahan Gamage, Ramana Kompella, Dongyan Xu Department of Computer Science Purdue University


Post on 31-Dec-2015


vSnoop: Improving TCP Throughput in Virtualized Environments via Acknowledgement Offload

Ardalan Kangarlou, Sahan Gamage, Ramana Kompella, Dongyan Xu

Department of Computer Science, Purdue University

Page 2: Cloud Computing and HPC

Page 3: Background and Motivation

Virtualization: A key enabler of cloud computing
  Amazon EC2, Eucalyptus

Increasingly adopted in other real systems:
  High performance computing: NERSC’s Magellan system
  Grid/cyberinfrastructure computing: In-VIGO, Nimbus, Virtuoso

Page 4: VM Consolidation: A Common Practice

Multiple VMs hosted by one physical host
Multiple VMs sharing the same core
Flexibility, scalability, and economy

[Figure: VM 1, VM 2, VM 3, and VM 4 running on a virtualization layer above the hardware]

Key Observation: VM consolidation negatively impacts network performance!

Page 5: Investigating the Problem

[Figure: a client (the sender) connected to a server that hosts VM 1, VM 2, and VM 3 on a virtualization layer above the hardware]

Page 6: Q1: How Does CPU Sharing Affect RTT?

[Figure: RTT (ms), 40 to 180, versus number of VMs, 2 to 5; labels on the plot: US East – West, US East – Europe, US West – Australia]

RTT increases in proportion to the VM scheduling slice (30ms)
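The proportionality noted on this slide can be illustrated with a toy model. The sketch below assumes strict round-robin scheduling with a fixed 30 ms slice, with every VM always using its full slice; the function names and these scheduling assumptions are illustrative and are not taken from the talk.

```python
def scheduling_delay_ms(arrival_ms: float, n_vms: int,
                        target_vm: int = 0, slice_ms: float = 30.0) -> float:
    """Time a packet waits in the buffer until `target_vm` runs next.

    Assumption: VM k runs during [k*slice, (k+1)*slice) of every
    scheduling cycle of length n_vms * slice_ms (strict round-robin).
    """
    cycle = n_vms * slice_ms
    t = arrival_ms % cycle              # position inside the current cycle
    start = target_vm * slice_ms
    if start <= t < start + slice_ms:
        return 0.0                      # target VM is running right now
    return (start - t) % cycle          # wait for its next slice to begin


def worst_case_delay_ms(n_vms: int, slice_ms: float = 30.0) -> float:
    """Longest wait: the packet arrives just as the target VM is descheduled."""
    return (n_vms - 1) * slice_ms


if __name__ == "__main__":
    for n in range(2, 6):
        print(n, "VMs: worst-case added delay", worst_case_delay_ms(n), "ms")
```

Under these assumptions the worst-case added delay grows by one slice for each additional VM sharing the core, which is one simple way to read the trend on the slide.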

Page 7: Q2: What Is the Cause of the RTT Increase?

[Figure: a sender transmits to a host whose driver domain (dom0) device driver places packets in per-VM buffers (buf) for VM 1, VM 2, and VM 3; annotated with 30ms scheduling slices]

[Plot: CDF of RTT increase; legend: + dom0 processing, x wait time in buffer]

VM scheduling latency dominates virtualization overhead!

Page 8: Q3: What Is the Impact on TCP Throughput?

[Plot legend: + dom0, x VM]

Connection to the VM is much slower than to dom0!
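One way to see why a larger RTT lowers throughput: a TCP sender can have at most one window of unacknowledged data in flight per round trip, so throughput is bounded by window size divided by RTT. The sketch below applies that standard bound; the 64 KB window and the RTT values are illustrative assumptions, not measurements from the talk.

```python
def max_throughput_mb_per_s(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on TCP throughput: one window per round trip."""
    return window_bytes / (rtt_ms / 1000.0) / 1e6


if __name__ == "__main__":
    window = 64 * 1024  # assumed window size in bytes
    for rtt in (1.0, 31.0, 61.0):  # assumed RTTs in milliseconds
        print(f"RTT {rtt:5.1f} ms -> at most "
              f"{max_throughput_mb_per_s(window, rtt):.2f} MB/s")
```

With the window held fixed, adding tens of milliseconds of scheduling delay to a LAN-scale RTT cuts the bound by more than an order of magnitude.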

Page 9: Our Solution: vSnoop

Alleviates the negative effect of VM scheduling on TCP throughput
Implemented within the driver domain to accelerate TCP connections
Does not require any modifications to the VM
Does not violate end-to-end TCP semantics
Applicable across a wide range of VMMs: Xen, VMware, KVM, etc.

Page 10: TCP Connection to a VM

[Figure: timeline with columns Sender, Driver Domain, VM1 buffer, and the scheduled VM (rotating through VM1, VM2, VM3); the SYN and SYN,ACK are exchanged through the VM1 buffer, and the VM scheduling latency is added to each RTT]

Sender establishes a TCP connection to VM1

Page 11: Key Idea: Acknowledgement Offload

[Figure: the same timeline w/ vSnoop, with columns Sender, Driver Domain, VM shared buffer, and the scheduled VM (rotating through VM1, VM2, VM3)]

Faster progress during TCP slow start

Page 12: vSnoop’s Impact on TCP Flows

TCP slow start
  Early acknowledgements help connections progress faster
  Most significant benefit for short transfers, which are more prevalent in data centers [Kandula IMC’09], [Benson WREN’09]

TCP congestion avoidance and fast retransmit
  Large flows in the steady state can also benefit from vSnoop
  Benefit is not as large as for slow start
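A rough way to see why short transfers gain the most: during slow start the congestion window roughly doubles every round trip, so a short transfer finishes in a handful of RTTs and its completion time is almost entirely RTT. The sketch below counts those rounds. It assumes an initial window of 2 segments, a 1460-byte MSS, no loss, no receive-window limit, and it ignores the handshake; all of these are illustrative choices rather than details from the talk.

```python
import math


def slow_start_rounds(transfer_bytes: int, mss: int = 1460,
                      init_cwnd: int = 2) -> int:
    """Round trips needed to send `transfer_bytes` with a doubling window."""
    segments = math.ceil(transfer_bytes / mss)
    cwnd, sent, rounds = init_cwnd, 0, 0
    while sent < segments:
        sent += cwnd      # one window of segments per round trip
        cwnd *= 2         # idealized slow-start doubling
        rounds += 1
    return rounds


def transfer_time_ms(transfer_bytes: int, rtt_ms: float) -> float:
    """Completion time when every round costs one full RTT."""
    return slow_start_rounds(transfer_bytes) * rtt_ms


if __name__ == "__main__":
    size = 100 * 1024  # a 100KB transfer
    for rtt in (1.0, 60.0):  # assumed RTTs in milliseconds
        print(f"RTT {rtt} ms -> about {transfer_time_ms(size, rtt)} ms")
```

Because the number of rounds is fixed by the transfer size, any reduction in the RTT seen by the sender shortens the transfer by the same factor in this model.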

Page 13: Challenges

Challenge 1: Out-of-order/special packets (SYN, FIN packets)
Solution: Let the VM handle these packets

Challenge 2: Packet loss after vSnoop
Solution: Let vSnoop acknowledge only if there is room in the buffer

Challenge 3: ACKs generated by the VM
Solution: Suppress/rewrite ACKs already generated by vSnoop

Challenge 4: Throttle receive window to keep vSnoop online
Solution: Adjusted according to the buffer size
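Challenges 3 and 4 can be sketched as two small decisions. The helpers below are hypothetical: the names, the choice of taking the minimum of the free buffer space and the VM's own window, and the rule for when a VM-generated ACK is redundant are assumptions made for illustration, and sequence-number wraparound is ignored.

```python
def advertised_window(free_buffer_bytes: int, vm_window_bytes: int) -> int:
    """Receive window to place in an early ACK.

    Assumption: never advertise more than the shared buffer can still hold,
    and never more than the VM itself would advertise.
    """
    return max(0, min(free_buffer_bytes, vm_window_bytes))


def should_forward_vm_ack(vm_ack_no: int, highest_early_ack_no: int,
                          has_payload: bool = False) -> bool:
    """Decide whether an ACK produced by the VM still needs to go out.

    Assumption: a pure ACK that covers nothing beyond what was already
    acknowledged early is redundant and can be suppressed; segments that
    carry data are always forwarded.
    """
    return has_payload or vm_ack_no > highest_early_ack_no


if __name__ == "__main__":
    print(advertised_window(free_buffer_bytes=4096, vm_window_bytes=65535))
    print(should_forward_vm_ack(vm_ack_no=1000, highest_early_ack_no=2000))
    print(should_forward_vm_ack(vm_ack_no=3000, highest_early_ack_no=2000))
```

Tying the advertised window to the free buffer space keeps the sender from outrunning the buffer, which is what allows early acknowledgements to continue.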

Page 14: State Machine Maintained Per-Flow

[Figure: state diagram]
States: Start, Active (online), No buffer (offline), Unexpected Sequence
Transition labels: Packet recv; In-order pkt, buffer space available; In-order pkt, no buffer; Out-of-order packet
Annotations:
  Active (online): early acknowledgements for in-order packets
  No buffer (offline): don’t acknowledge
  Unexpected Sequence: pass out-of-order pkts to VM
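The per-flow state machine can be sketched roughly as follows. The exact arrows of the diagram are not recoverable from this transcript, so the transition rules here are an assumed reading of the labels: an in-order packet with buffer space leads to Active, an in-order packet without space leads to No buffer, and any out-of-order packet leads to Unexpected Sequence. The class and method names are hypothetical, and sequence-number wraparound is ignored.

```python
from enum import Enum


class FlowState(Enum):
    START = "start"
    ACTIVE = "active (online)"
    NO_BUFFER = "no buffer (offline)"
    UNEXPECTED_SEQ = "unexpected sequence"


class FlowTracker:
    """Hypothetical per-flow tracker; not the authors' implementation."""

    def __init__(self, next_seq: int, buffer_free: int):
        self.state = FlowState.START
        self.next_seq = next_seq        # next in-order sequence number
        self.buffer_free = buffer_free  # free bytes in the VM's buffer

    def on_packet(self, seq: int, length: int) -> str:
        """Classify one incoming data packet and return the action taken."""
        if seq != self.next_seq:
            # Out-of-order packet: leave it to the VM, no early ACK.
            self.state = FlowState.UNEXPECTED_SEQ
            return "pass_to_vm"
        if length > self.buffer_free:
            # In-order but no room: an early ACK could hide a later drop.
            self.state = FlowState.NO_BUFFER
            return "no_ack"
        # In-order and buffered: acknowledge on the VM's behalf.
        self.buffer_free -= length
        self.next_seq += length
        self.state = FlowState.ACTIVE
        return "early_ack"

    def vm_consumed(self, nbytes: int) -> None:
        """The VM drained `nbytes` from the buffer, freeing space."""
        self.buffer_free += nbytes


if __name__ == "__main__":
    flow = FlowTracker(next_seq=1000, buffer_free=3000)
    for seq in (1000, 5000, 2460, 3920):
        print(seq, flow.on_packet(seq, 1460), flow.state.name)
```

Keeping the state per flow means one connection that falls offline does not stop early acknowledgements for other connections to the same VM.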

Page 15: vSnoop Implementation in Xen

[Figure: the driver domain (dom0) contains the bridge, vSnoop, and one Netback with a buffer (buf) per VM; VM1, VM2, and VM3 each run a Netfront]

Tuning Netfront

Page 16: Evaluation

Overheads of vSnoop
TCP throughput speedup
Application speedup
  Multi-tier web service (RUBiS)
  MPI benchmarks (Intel, High-Performance Linpack)

Page 17: Evaluation – Setup

VM hosts
  3.06GHz Intel Xeon CPUs, 4GB RAM
  Only one core/CPU enabled
  Xen 3.3 with Linux 2.6.18 for the driver domain (dom0) and the guest VMs

Client machine
  2.4GHz Intel Core 2 Quad CPU, 2GB RAM
  Linux 2.6.19

Gigabit Ethernet switch

Page 18: vSnoop Overhead

Profiling per-packet vSnoop overhead using Xenoprof [Menon VEE’05]

                         Single Stream       Multiple Streams
vSnoop Routines          Cycles   CPU %      Cycles   CPU %
vSnoop_ingress()         509      3.03       516      3.05
vSnoop_lookup_hash()     74       0.44       91       0.51
vSnoop_build_ack()       52       0.32       52       0.32
vSnoop_egress()          104      0.61       104      0.61

Per-packet CPU overhead for vSnoop routines in dom0

Minimal aggregate CPU overhead

Page 19: TCP Throughput Improvement

3 VMs consolidated, 1000 transfers of a 100KB file
Vanilla Xen, Xen+tuning, Xen+tuning+vSnoop

[Plot legend: + Vanilla Xen, x Xen+tuning, * Xen+tuning+vSnoop]
Median throughputs marked on the plot: 0.192MB/s, 0.778MB/s, 6.003MB/s

30x Improvement
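The 30x figure can be checked against the medians on this slide. The sketch below assumes the three medians correspond, in order, to Vanilla Xen, Xen+tuning, and Xen+tuning+vSnoop; the transcript does not state that mapping explicitly, so treat it as an assumption.

```python
# Median throughputs in MB/s, with the configuration mapping assumed
# from the order in which the values and the legend appear.
medians_mb_s = {
    "Vanilla Xen": 0.192,
    "Xen+tuning": 0.778,
    "Xen+tuning+vSnoop": 6.003,
}

baseline = medians_mb_s["Vanilla Xen"]
speedups = {name: value / baseline for name, value in medians_mb_s.items()}

if __name__ == "__main__":
    for name, factor in speedups.items():
        print(f"{name}: {factor:.1f}x over Vanilla Xen")
```

Under that mapping the ratio of the largest to the smallest median is a little over 31, consistent with the rounded 30x claim.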

Page 20: TCP Throughput: 1 VM/Core

[Figure: normalized throughput (0.00 to 1.00) versus transfer size (100MB, 10MB, 1MB, 500KB, 250KB, 100KB, 50KB); series: Xen+tuning+vSnoop, Xen+tuning, Xen]

Page 21: TCP Throughput: 2 VMs/Core

[Figure: normalized throughput (0.00 to 1.00) versus transfer size (100MB, 10MB, 1MB, 500KB, 250KB, 100KB, 50KB); series: Xen+tuning+vSnoop, Xen+tuning, Xen]

Page 22: TCP Throughput: 3 VMs/Core

[Figure: normalized throughput (0.00 to 1.00) versus transfer size (100MB, 10MB, 1MB, 500KB, 250KB, 100KB, 50KB); series: Xen+tuning+vSnoop, Xen+tuning, Xen]

Page 23: TCP Throughput: 5 VMs/Core

[Figure: normalized throughput (0.00 to 1.00) versus transfer size (100MB, 10MB, 1MB, 500KB, 250KB, 100KB, 50KB); series: Xen+tuning+vSnoop, Xen+tuning, Xen]

vSnoop’s benefit rises with higher VM consolidation

Page 24: TCP Throughput: Other Setup Parameters

CPU load for VMs
Number of TCP connections to VM
Driver domain on separate core
Sender being a VM

vSnoop consistently achieves significant TCP throughput improvement

Page 25: Application-Level Performance: RUBiS

[Figure: RUBiS clients (client threads) on a client machine; Server1 and Server2 each run dom0 with vSnoop plus guest domains dom1 and dom2, hosting Apache and MySQL]

Page 26: RUBiS Results

RUBiS Operation          Count w/o vSnoop   Count w/ vSnoop   % Gain
Browse                   421                505               19.9%
BrowseCategories         288                357               23.9%
SearchItemsInCategory    3498               4747              35.7%
BrowseRegions            128                141               10.1%
ViewItem                 2892               3776              30.5%
ViewUserInfo             732                846               15.6%
ViewBidHistory           339                398               17.4%
Others                   3939               4815              22.2%
Total                    12237              15585             27.4%
Average Throughput       29 req/s           37 req/s          27.5%
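The gain column can be recomputed from the two count columns as (with - without) / without. The sketch below does that for each operation and for the totals; the variable and function names are illustrative. The recomputed values agree with the slide to within about 0.1 percentage point, which suggests the slide mixes rounding and truncation.

```python
# (count without vSnoop, count with vSnoop), copied from the results table.
counts = {
    "Browse": (421, 505),
    "BrowseCategories": (288, 357),
    "SearchItemsInCategory": (3498, 4747),
    "BrowseRegions": (128, 141),
    "ViewItem": (2892, 3776),
    "ViewUserInfo": (732, 846),
    "ViewBidHistory": (339, 398),
    "Others": (3939, 4815),
}


def gain_percent(before: int, after: int) -> float:
    """Relative increase in completed operations, in percent."""
    return (after - before) / before * 100.0


total_before = sum(before for before, _ in counts.values())
total_after = sum(after for _, after in counts.values())

if __name__ == "__main__":
    for op, (before, after) in counts.items():
        print(f"{op:24s}{gain_percent(before, after):6.2f}%")
    print(f"{'Total':24s}{gain_percent(total_before, total_after):6.2f}%")
```

The per-operation sums reproduce the Total row exactly (12237 and 15585), so the table is internally consistent.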

Page 27: Application-level Performance – MPI Benchmarks

Intel MPI Benchmark: Network intensive
High-performance Linpack: CPU intensive

[Figure: four servers (Server1 to Server4), each with dom0 running vSnoop and guest domains dom1 and dom2; the guest domains are the MPI nodes]

Page 28: Intel MPI Benchmark Results: Broadcast

[Figure: normalized execution time (0.00 to 1.00) versus message size (8MB, 4MB, 2MB, 1MB, 512KB, 256KB, 128KB, 64KB); series: Xen+tuning+vSnoop, Xen+tuning, Xen]

40% Improvement

Page 29: Intel MPI Benchmark Results: All-to-All

[Figure: normalized execution time (0.00 to 1.00) versus message size (8MB, 4MB, 2MB, 1MB, 512KB, 256KB, 128KB, 64KB); series: Xen+tuning+vSnoop, Xen+tuning, Xen]

Page 30: HPL Benchmark Results

[Figure: Gflops (0.000 to 1.800) versus problem size and block size (N,NB): (8K,16), (8K,8), (8K,4), (8K,2), (6K,16), (6K,8), (6K,4), (6K,2), (4K,16), (4K,8), (4K,4), (4K,2); series: Xen+tuning+vSnoop, Xen; annotation on the plot: 40%]

Page 31: Related Work

Optimizing virtualized I/O path
  Menon et al. [USENIX ATC’06,’08; ASPLOS’09]

Improving intra-host VM communications
  XenSocket [Middleware’07], XenLoop [HPDC’08], Fido [USENIX ATC’09], XWAY [VEE’08], IVC [SC’07]

I/O-aware VM scheduling
  Govindan et al. [VEE’07], DVT [SoCC’10]

Page 32: Conclusions

Problem: VM consolidation degrades TCP throughput

Solution: vSnoop
  Leverages acknowledgment offloading
  Does not violate end-to-end TCP semantics
  Is transparent to applications and OS in VMs
  Is generically applicable to many VMMs

Results:
  30x improvement in median TCP throughput
  About 30% improvement in RUBiS benchmark
  40-50% reduction in execution time for Intel MPI benchmark

Page 33: Thank you.

For more information:

http://friends.cs.purdue.edu/dokuwiki/doku.php?id=vsnoop

Or Google “vSnoop Purdue”

Page 34: TCP Benchmarks cont.

Testing different scenarios:
  a) 10 concurrent connections
  b) Sender also subject to VM scheduling
  c) Driver domain on a separate core

[Figures: one plot for each of scenarios a), b), and c)]

Page 35: TCP Benchmarks cont.

Varying CPU load for 3 consolidated VMs:

[Figures: one plot each for 40%, 60%, and 80% CPU load]