AIX-VUG Demystifying 10 Gb Ethernet Performance
Alexander Paul, [email protected]
ETS - Enhanced Technical Support

Page 1:

AIX-VUG Demystifying 10 Gb Ethernet Performance

Alexander Paul, [email protected], Enhanced Technical Support

Page 2:

Tools: How to benchmark?

Method 1: AIX FTP client and server

ftp> put "| dd if=/dev/zero bs=1M count=1000" /dev/null
1048576000 bytes sent in 5.991 seconds (1.709e+05 Kbytes/s)

Warning: the ftp client and the ftpd server are single-threaded processes!

Pid      Command  Inuse  Pin   Pgsp  Virtual  64-bit  Mthrd  16MB
9240598  ftpd     24497  9668  0     23940    N       N      N
8782016  ftp      23847  9668  0     23766    N       N      N

Run multiple ftp client sessions in parallel to get the desired overall throughput (~1.3 Gbit/s in this example):

# vi .netrc
machine 10gbench2 login root password foo
macdef init
put "| dd if=/dev/zero bs=1M count=1000" /dev/null
bye
<insert a blank line here>

chmod 400 .netrc

for i in 1 2 3 4 5 6 7 8
> do
> ftp 10gbench2 | grep seconds &
> done

Page 3:

Tools: How to benchmark?

Method 2: iperf
• Open-source network benchmarking tool, written in C
• Easy to use for determining network throughput
• TCP and UDP benchmarks possible
• Multithreaded program
• The iperf binary can be started in client or server mode
• Available precompiled for AIX at Perzl.org: http://www.oss4aix.org/download/RPMS/iperf/

Server side:
# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------

Client side:
# iperf -c localhost -t 60 -P 8
------------------------------------------------------------
Client connecting to localhost, TCP port 5001
TCP window size: 132 KByte (default)
[ ID] Interval       Transfer     Bandwidth
[ 10] 0.0-60.0 sec   40.7 GBytes  5.82 Gbits/sec
[  3] 0.0-60.0 sec   40.6 GBytes  5.81 Gbits/sec
[  4] 0.0-60.0 sec   40.4 GBytes  5.79 Gbits/sec
[  5] 0.0-60.0 sec   40.5 GBytes  5.80 Gbits/sec
[  6] 0.0-60.0 sec   40.9 GBytes  5.86 Gbits/sec
[  7] 0.0-60.0 sec   40.7 GBytes  5.83 Gbits/sec
[  8] 0.0-60.0 sec   40.6 GBytes  5.82 Gbits/sec
[  9] 0.0-60.0 sec   40.7 GBytes  5.82 Gbits/sec
[SUM] 0.0-60.0 sec   325 GBytes   46.5 Gbits/sec

Page 4:

Tools: How to benchmark?

Method 2.1: jperf
• Graphical Java front end for iperf
• Requires iperf to be preinstalled on the sender and receiver side
• The GUI launches iperf with the chosen options in client or server mode
• The corresponding side can run iperf in GUI or CLI mode

[Screenshot: jperf GUI – control panel with TCP settings (buffer length, window size, MSS, no delay), the generated command line (iperf -c trade3 -P 4 -i 1 -p 5001 -f k -t 10), saving and loading of benchmark runs, graphical output, and CLI output.]

Page 5:

Tools: How to benchmark?

Method 3: netperf
A more advanced open-source server and client toolset, written in C.

Installing netperf:
• Get the netperf source code from ftp://ftp.netperf.org/netperf
• gunzip netperf-2.5.0.tar.gz
• tar -xvf netperf-2.5.0.tar
• cd netperf-2.5.0
• ./configure CFLAGS="-Wl,-bnoobjreorder -lperfstat" --enable-burst
• make ; make install
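netperf measures against a netserver daemon, so the listener has to run on the remote host first. A minimal usage sketch (12865 is the default netperf control port; the host below is an example):

remote# netserver -p 12865

The client-side invocations below then point at that host with -H.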

Throughput benchmark (throughput in Mbit/s):

# netperf -H 192.168.50.1 -t TCP_STREAM -v 0 -f m -i 3
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.50.1 (192.168.50.1) port 0 AF_INET : +/-2.500% @ 99% conf.
1724.45

TCP RTT benchmark (TCP round-trip time in µs/transaction):

# netperf -H 192.168.50.1 -t TCP_RR -v 50
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.50.1 (192.168.50.1) port 0 AF_INET : first burst 0
Alignment      Offset         RoundTrip  Trans      Throughput
Local  Remote  Local  Remote  Latency    Rate       10^6bits/s
Send   Recv    Send   Recv    usec/Tran  per sec    Outbound  Inbound
8      0       0      0       66.075     15134.359  0.121     0.12

Page 6:

Let's test the performance of a #5287 adapter in a Power 720 system…

Throughput test

Adapter:
• #5287 PCIe2 2-port 10GbE SR
• Part number: 74Y2094
• Emulex chipset

Benchmark system:
• Power 720 (8202-E4C)
• 8 POWER7 cores, 3024 MHz
• FW AL740_100

Description                                         IBM Feature Code
PCIe2 (Gen2) Low Profile 2-Port 10GbE SR            FC 5284
PCIe2 (Gen2) Low Profile 2-Port 10GbE SFP+ Copper   FC 5286
PCIe2 (Gen2) Full Height 2-Port 10GbE SR            FC 5287
PCIe2 (Gen2) Full Height 2-Port 10GbE SFP+ Copper   FC 5288

Page 7:

Benchmark environment

[Diagram: AIX LPAR 1 on a Power 750 8408-E8D with a 10GbE SR adapter, connected over the #5287-based 10 GbE network to a Power 720 8202-E4C, where a Virtual I/O Server bridges an Etherchannel of two 10GbE SR ports through a SEA and the hypervisor vSwitch (PVID 10) to AIX LPAR 2. Measured throughput between AIX LPAR 1 and AIX LPAR 2: ~9 Gbit/s on one path, but only ~3 Gbit/s on the other.]

Where is the problem?

Page 8:

Ethernet Switching on IBM Power Systems

• Switching is a hypervisor objective and includes the following major layer 2 tasks:
– Frame forwarding is performed as a memory transfer (memcpy), initiated with an H_SEND_LOGICAL_LAN call at the sending side
– Source MAC address learning from incoming Ethernet frames
– Broadcast and multicast forwarding
– Frame queuing and forwarding in two directions:
  • Incoming: frames received by the hypervisor
  • Outgoing: frames delivered to a Virtual Ethernet Adapter
– Processing of header information for IEEE 802.1Q tagged frames (VLAN / CoS)

[Diagram: sending direction from client LPAR level through hypervisor level to client LPAR level; forwarding is a memcpy.]

Page 9:

Throughput baseline Virtual Ethernet - EC=0.4, capped

• Benchmark with 8 parallel TCP sessions
• Configuration:
– Client LPAR: Power 770 9117-MMB, AIX 6.1 TL6 SP3, EC=0.4 units, capped, 2 VPs, Virtual Ethernet Adapter, MTU 1500
– Server LPAR: Power 770 9117-MMB (same as client), AIX 6.1 TL6 SP3, EC=3.0 units, uncapped, 4 VPs, Virtual Ethernet Adapter, MTU 1500

[Diagram: capped Client LPAR and uncapped Server LPAR, each with a Virtual Ethernet Adapter, connected through the PHYP switch on VLAN 1 (PVID 1); traffic flows from client to server.]

What do you think is the resulting throughput?

Page 10:

Throughput baseline Virtual Ethernet - EC=0.4, capped

MTU 1500:

Client connecting to 192.168.2.3, TCP port 5001

TCP window size: 256 KByte (default)

------------------------------------------------------------

[ ID] Interval Transfer Bandwidth

[ 4] 0.0-300.0 sec 1.97 GBytes 56.5 Mbits/sec

[ 8] 0.0-300.0 sec 2.12 GBytes 60.6 Mbits/sec

[ 5] 0.0-300.0 sec 1.94 GBytes 55.6 Mbits/sec

[ 10] 0.0-300.0 sec 2.00 GBytes 57.4 Mbits/sec

[ 3] 0.0-300.0 sec 1.98 GBytes 56.8 Mbits/sec

[ 9] 0.0-300.0 sec 1.87 GBytes 53.5 Mbits/sec

[ 6] 0.0-300.0 sec 1.93 GBytes 55.2 Mbits/sec

[ 7] 0.0-300.0 sec 1.95 GBytes 55.8 Mbits/sec

[SUM] 0.0-300.0 sec 15.8 GBytes 451 Mbits/sec

Page 11:

Throughput as a function of CPU time for Virtual Ethernet

[Chart: Throughput Virtual Ethernet, MTU 1500 – throughput (Gbps, 0 to 1.2) vs. CPU units (0 to 1.6). Maximum throughput ~1.25 Gbit/s; baseline at 0.4 CPU units: 451 Mbit/s.]

• Benchmark with 8 parallel TCP sessions
• Configuration:
– Client LPAR: Power 770 9117-MMB, AIX 6.1 TL6 SP3, capped, 2 VPs, Virtual Ethernet Adapter, MTU 1500
– Server LPAR: Power 770 9117-MMB (same as client), AIX 6.1 TL6 SP3, uncapped, EC=3.0 units, 4 VPs, Virtual Ethernet Adapter, MTU 1500

Page 12:

Virtual Processor dispatching

mpstat -s at physc=0.66:
Proc0 65.91%                                            Proc4 0.01%
cpu0 32.81%  cpu1 16.36%  cpu2 8.24%  cpu3 8.50%        cpu4 0.00%  cpu5 0.00%  cpu6 0.00%  cpu7 0.01%

mpstat -s at physc=0.68:
Proc0 52.82%                                            Proc4 14.43%
cpu0 26.09%  cpu1 12.18%  cpu2 7.21%  cpu3 7.34%        cpu4 5.33%  cpu5 3.27%  cpu6 3.02%  cpu7 2.82%

[Chart: TP (Gbps, 0.68 to 0.82) vs. CPU units (0.59 to 0.79) – between physc=0.66 and physc=0.68 the second virtual processor starts to be dispatched, and throughput drops by 19 Mbit/s.]

Page 13:

Throughput as a function of CPU time for Virtual Ethernet

[Chart: Throughput Virtual Ethernet, MTU 1500 – throughput (Gbps, 0 to 1.6) vs. CPU units (0 to 2.0) for the 9117-MMB (default) and the 8202-E4C (default). Maximum throughput ~1.6 Gbit/s; baseline on the E4C with 0.4 CPU units: 991 Mbit/s.]

• Benchmark with 8 parallel TCP sessions
• Configuration:
– Client LPAR: Power 720 8202-E4C, AIX 7.1 TL1 SP3, EC=0.4 units, capped, 2 VPs, Virtual Ethernet Adapter, MTU 1500
– Server LPAR: Power 720 8202-E4C (same as client), AIX 7.1 TL1 SP3, EC=3.0 units, uncapped, 4 VPs, Virtual Ethernet Adapter, MTU 1500

Page 14:

Virtual Processor dispatching

mpstat -s at physc=0.45:
Proc0 44.98%                                            Proc4 0.01%
cpu0 22.39%  cpu1 11.16%  cpu2 5.62%  cpu3 5.80%        cpu4 0.00%  cpu5 0.00%  cpu6 0.00%  cpu7 0.01%

mpstat -s at physc=0.50:
Proc0 39.22%                                            Proc4 10.72%
cpu0 19.37%  cpu1 9.04%  cpu2 5.35%  cpu3 5.45%         cpu4 3.96%  cpu5 2.43%  cpu6 2.24%  cpu7 2.09%

[Chart: TP (Gbps, 0.8 to 1.2) vs. CPU units (0.36 to 0.76) – between physc=0.45 and physc=0.50 the second virtual processor starts to be dispatched, and throughput drops by 210 Mbit/s.]

Page 15:

What is the reason for the limited maximum throughput?

Starting with a TCP trace analysis:

                     a->b                b->a
total packets:       8081                3639
ack pkts sent:       8081                3639
pure acks sent:      0                   3639
sack pkts sent:      0                   0
dsack pkts sent:     0                   0
max sack blks/ack:   0                   0
unique bytes sent:   11698392            0
actual data pkts:    8081                0
actual data bytes:   11701288            0
rexmt data pkts:     2                   0
rexmt data bytes:    2896                0
zwnd probe pkts:     0                   0
zwnd probe bytes:    0                   0
outoforder pkts:     2                   0
pushed data pkts:    5                   0
SYN/FIN pkts sent:   0/0                 0/0
req 1323 ws/ts:      N/Y                 N/Y
urgent data pkts:    0 pkts              0 pkts
urgent data bytes:   0 bytes             0 bytes
mss requested:       0 bytes             0 bytes
max segm size:       1448 bytes          0 bytes
min segm size:       1448 bytes          0 bytes
avg segm size:       1447 bytes          0 bytes
max win adv:         32761 bytes         65522 bytes
min win adv:         32761 bytes         38734 bytes
zero win adv:        0 times             0 times
avg win adv:         32761 bytes         65509 bytes
initial window:      1448 bytes          0 bytes
initial window:      1 pkts              0 pkts
ttl stream length:   NA                  NA
missed data:         NA                  NA
truncated data:      11588154 bytes      0 bytes
truncated packets:   8081 pkts           0 pkts
data xmit time:      3.929 secs          0.000 secs
idletime max:        211.6 ms            211.6 ms
throughput:          2977123 Bps         0 Bps

RTT samples:         2706                0
RTT min:             11.7 ms             0.0 ms
RTT max:             46.9 ms             0.0 ms
RTT avg:             25.0 ms             0.0 ms
RTT stdev:           6.9 ms              0.0 ms

Observations:
• Less than 0.03 % retransmissions
• Sufficient TCP buffer space on the receiving side
• Gaps with idle time and no use of SACK result in relatively high segment round-trip times (RTT)

Page 16:

What is the reason for the limited throughput?

• Kernel trace curt report from the client with 0.4 CPU units, capped:

Hypervisor Calls Summary
------------------------
Count   Total Time  % sys  Avg Time  Min Time  Max Time  Tot ETime  Avg ETime  Min ETime  Max ETime  HCALL (Caller Address)
        (msec)      time   (msec)    (msec)    (msec)    (msec)     (msec)     (msec)     (msec)
======  ==========  =====  ========  ========  ========  =========  =========  =========  =========  ======================
30419   157.0195    0.63%  0.0052    0.0004    0.0301    363.9006   0.0120     0.0019     7.5224     H_SEND_LOGICAL_LAN ((unknown) 41977e8)
18173   26.2612     0.11%  0.0014    0.0005    0.0162    38.4263    0.0021     0.0007     6.0742     H_ADD_LOGICAL_LAN_BUFFER ((unknown) 4191d04)
3189    2.7187      0.01%  0.0009    0.0005    0.0050    2.7187     0.0009     0.0005     0.0050     H_PROD ((unknown) 6ffb8)
693     1.1146      0.00%  0.0016    0.0010    0.0035    1.1146     0.0016     0.0010     0.0035     H_XIRR ((unknown) 41187cc)
689     0.7688      0.00%  0.0011    0.0005    0.0026    0.7688     0.0011     0.0005     0.0026     H_EOI ((unknown) 41149b8)
689     0.3535      0.00%  0.0005    0.0003    0.0046    0.3535     0.0005     0.0003     0.0046     H_CPPR ((unknown) 4112b08)

• Kernel trace curt report from the client with 1.3 CPU units:

Hypervisor Calls Summary
------------------------
Count   Total Time  % sys  Avg Time  Min Time  Max Time  Tot ETime  Avg ETime  Min ETime  Max ETime  HCALL (Caller Address)
        (msec)      time   (msec)    (msec)    (msec)    (msec)     (msec)     (msec)     (msec)
======  ==========  =====  ========  ========  ========  =========  =========  =========  =========  ======================
27187   133.6836    4.39%  0.0049    0.0005    0.0221    133.9216   0.0049     0.0020     0.0221     H_SEND_LOGICAL_LAN ((unknown) 41977e8)
13489   16.9196     0.56%  0.0013    0.0008    0.0129    16.9269    0.0013     0.0008     0.0129     H_ADD_LOGICAL_LAN_BUFFER ((unknown) 4191d04)
2104    3.4490      0.11%  0.0016    0.0006    0.0081    3.4490     0.0016     0.0006     0.0081     H_PROD ((unknown) 6ffb8)
502     0.7127      0.02%  0.0014    0.0009    0.0026    0.7127     0.0014     0.0009     0.0026     H_XIRR ((unknown) 41187cc)
501     0.5384      0.02%  0.0011    0.0007    0.0157    0.5384     0.0011     0.0007     0.0157     H_EOI ((unknown) 41149b8)
501     0.1983      0.01%  0.0004    0.0003    0.0009    0.1983     0.0004     0.0003     0.0009     H_CPPR ((unknown) 4112b08)

Note the maximum elapsed time of a single H_SEND_LOGICAL_LAN call: 7.5224 ms with 0.4 capped CPU units versus 0.0221 ms with 1.3 units.
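The curt reports above are produced from an AIX kernel trace. A minimal collection sketch (file names are examples, not taken from the slides):

# trace -a -o /tmp/net.trc        (start the kernel trace asynchronously)
# sleep 10; trcstop               (let the benchmark run, then stop tracing)
# curt -i /tmp/net.trc -o /tmp/curt.out

curt condenses the raw trace into per-process CPU usage and the Hypervisor Calls Summary shown above.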

Page 17:

CPU consumption from an overall perspective

[Diagram: Client LPAR with a Virtual Ethernet adapter on a Power 770 9117-MMB, bridged by its Virtual I/O Server (SEA over an Etherchannel of two 10GbE SR ports, vSwitch, PVID 1) across the 10 GbE network to a Power 720 8202-E4C, where a second Virtual I/O Server (SEA over Etherchannel, vSwitch, PVID 1) bridges to the Server LPAR.]

Page 18:

Anatomy of seaproc

• seaproc is a 64-bit, multithreaded kernel process
• Each active Shared Ethernet Adapter runs a dedicated seaproc instance
• seaproc needs CPU cycles for bridging activity
• The efficiency of a particular Shared Ethernet Adapter depends on how the corresponding seaproc threads can perform

# ps -alk | grep seaproc
40303 A    0 3080304       1   0  37 --  86c0bb190  1024          * -   0:00 seaproc
40303 A   10 3801156       1   0  37 --  87cc7f190  1024          * -  22:47 seaproc
40303 A    0 3866764       1   0  37 --  82c0cb190  1024          * - 126:14 seaproc

# while true; do ps -lm -p 13369558 -o THREAD; sleep 2; done
USER  PID       PPID  TID       ST  CP  PRI  SC  WCHAN             F      TT  BND  COMMAND
root  13369558  1     -         A   71  37   7   *                 40303  -   -    seaproc
-     -         -     5963991   S   0   37   1   f1000a001c3c1318  1400   -   -    -
-     -         -     6160625   S   0   37   1   f1000a001bd00c78  1400   -   -    -
-     -         -     15663143  S   0   37   1   f1000a001c060fc8  1400   -   -    -
-     -         -     18153487  S   0   37   1   f1000a001beb0e20  1400   -   -    -
-     -         -     18350131  R   36  37   1   -                 1000   -   -    -
-     -         -     22413341  S   0   37   1   f1000a001c5714c0  1400   -   -    -
-     -         -     24117287  R   35  37   1   -                 1400   -   -    -

Page 19:

Sizing diagram for shared CPU units in a SEA environment

[Chart: stacked CPU utilization (in CPU units, 0 to 3) for a benchmark at 930 Mbit/s throughput – the sending VIOS, receiving VIOS, server LPAR and client LPAR contribute 0.4, 0.95, 0.73 and 0.74 units, 2.82 units overall. It answers two questions: How many CPU units are needed for a particular throughput? Which instances are involved in network activity from an overall perspective? The numbers depend on the Power Systems model and hardware configuration.]

Page 20:

Overall CPU consumption in a 10GbE PCIe2 environment with SEA

[Chart: overall CPU consumption vs. TP (Gb/s); the curve tops out at a maximum of ~1.6 Gbit/s.]

Page 21:

Jumbo Frames

• The term Jumbo Frame specifies a payload size of more than 1500 bytes and up to 9000 bytes encapsulated within one Ethernet frame
• Jumbo Frames can significantly reduce the CPU time needed for data forwarding
• Using Jumbo Frames has no effect for data packets with less than 1500 bytes of payload
• Jumbo Frames must be implemented on an end-to-end basis
• Networking equipment (physical and virtual) on all potential paths between sender and receiver must be configured for Jumbo Frames:
– Internally on Power Systems (examples follow on the next slides):
  • Virtual Ethernet Adapters in client partitions
  • Shared Ethernet Adapters in VIO-Servers
  • Etherchannel devices
  • Physical network adapters
– At data center level:
  • Access layer switches
  • Aggregation and core multilayer switches and routers
  • Security devices like firewalls and intrusion detection systems
• Layer 3 devices like routers and firewalls can fragment Jumbo Frames into smaller MTU-sized data packets, but with an impact on performance (cf. Andrew S. Tanenbaum: Computer Networks)

Page 22:

Using Jumbo Frames within a Shared Ethernet Adapter setup

• Benchmark with 8 parallel TCP sessions
• Configuration:
– Managed system: Power 720 8202-E4C
– Client LPAR: AIX 7.1 TL1 SP3, capped, weight 128, 2 VPs; Virtual Ethernet Adapter, MTU 9000
– Server LPAR: AIX 7.1 TL1 SP3, uncapped, weight 128, 4 VPs; Virtual Ethernet Adapter, MTU 9000
– Virtual I/O Servers: EC=2.0 units, uncapped, weight 255; PCIe2 2-port 10GbE SR adapter

[Diagram: Client LPAR – vSwitch1 – Virtual I/O Server 1 (SEA, 10GbE SR) – 10 GbE network – Virtual I/O Server 2 (SEA, 10GbE SR) – vSwitch2 – Server LPAR, all on PVID 1 within the Power 720 8202-E4C; every device on this path has to be tuned for MTU 9000.]

Page 23:

Using Virtual Ethernet Adapters with MTU 9000

[Diagram: Client LPAR and Server LPAR, each with a Virtual Ethernet Adapter, connected through the PHYP switch on VLAN 1 (PVID 1).]

• The MTU size of a Virtual Ethernet Adapter can be dynamically changed from 1500 bytes (default) to 9000 bytes:

# chdev -l en0 -a mtu=9000

# lsattr -El en0
alias4
alias6
arp        on
authority
broadcast
mtu        9000
netaddr    10.31.203.194
netaddr6
netmask    255.255.255.0
prefixlen
remmtu     576
rfc1323
[…]
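To double-check that the new MTU is effective on the interface, the netstat interface listing can be used (a quick sketch; en0 as above):

# netstat -in | grep en0

The Mtu column should now report 9000.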

Page 24:

Client and VIOS setup with Jumbo Frame support

• Checklist for configuring MTU 9000:

1. @VIOS: Bring the SEA down:
   vios$ rmdev -dev <SEA> -ucfg
   entS Defined

2. @VIOS: Enable Jumbo Frame support on the real adapter:
   vios$ chdev -dev <Real> -attr jumbo_frames=yes
   entR changed

3. @VIOS: Enable Jumbo Frame support on the Shared Ethernet Adapter:
   vios$ chdev -dev <SEA> -attr jumbo_frames=yes
   entS changed

4. @Clients: Change the MTU size of the Virtual Ethernet (vent) interfaces to 9000 bytes:
   client# chdev -l en0 -a mtu=9000
   en0 changed

5. @VIOS: Reactivate the SEA:
   vios$ cfgdev -dev <SEA>
   vios$ lsdev | grep <SEA>
   entS Available Shared Ethernet Adapter

Page 25:

Throughput baseline Virtual Ethernet - EC=0.4, capped

MTU 9000:

Client connecting to 192.168.2.3, TCP port 5001

TCP window size: 262 KByte (default)

------------------------------------------------------------

[ ID] Interval Transfer Bandwidth

[ 10] 0.0-300.0 sec 7.21 GBytes 207 Mbits/sec

[ 3] 0.0-300.0 sec 7.30 GBytes 209 Mbits/sec

[ 4] 0.0-300.0 sec 7.27 GBytes 208 Mbits/sec

[ 5] 0.0-300.0 sec 7.22 GBytes 207 Mbits/sec

[ 9] 0.0-300.0 sec 7.12 GBytes 204 Mbits/sec

[ 7] 0.0-300.0 sec 7.23 GBytes 207 Mbits/sec

[ 8] 0.0-300.0 sec 7.19 GBytes 206 Mbits/sec

[ 6] 0.0-300.0 sec 7.25 GBytes 208 Mbits/sec

[SUM] 0.0-300.0 sec 57.8 GBytes 1.65 Gbits/sec

Page 26:

Throughput as a function of CPU time for Virtual Ethernet

Significant throughput scaling, by a factor of ~3.

• Benchmark with 8 parallel TCP sessions
• Configuration:
– Client LPAR: Power 770 9117-MMB, AIX 6.1 TL6 SP3, capped, 2 VPs, Virtual Ethernet Adapter, MTU 9000
– Server LPAR: Power 770 9117-MMB (same as client), AIX 6.1 TL6 SP3, uncapped, EC=3.0 units, 4 VPs, Virtual Ethernet Adapter, MTU 9000

Page 27:

Throughput baseline Virtual Ethernet - EC=0.4, capped

• Benchmark with 8 parallel TCP sessions
• Configuration:
– Client LPAR: Power 720 8202-E4C, AIX 7.1 TL1 SP3, EC=0.4 units, capped, 2 VPs, Virtual Ethernet Adapter, MTU 1500
– Server LPAR: Power 720 8202-E4C (same as client), AIX 7.1 TL1 SP3, EC=3.0 units, uncapped, 4 VPs, Virtual Ethernet Adapter, MTU 1500

[Diagram: capped Client LPAR and uncapped Server LPAR, each with a Virtual Ethernet Adapter, connected through the PHYP switch on VLAN 1 (PVID 1); traffic flows from client to server.]

Page 28:

Throughput as a function of CPU time for Virtual Ethernet

• Benchmark with 8 parallel TCP sessions
• Configuration:
– Client LPAR: Power 720 8202-E4C, AIX 7.1 TL1 SP3, EC=0.4 units, capped, 2 VPs, Virtual Ethernet Adapter, MTU 9000
– Server LPAR: Power 720 8202-E4C (same as client), AIX 7.1 TL1 SP3, EC=3.0 units, uncapped, 4 VPs, Virtual Ethernet Adapter, MTU 9000

[Chart: Throughput Virtual Ethernet, MTU 9000 – throughput (Gbps, 0 to 7) vs. CPU units (0 to 2.0) for the 8202-E4C with MTU 9000 vs. MTU 1500. Average scaling ~x3.3.]

Page 29:

Overall CPU consumption in a 10GbE PCIe2 environment with SEA

[Chart: overall CPU consumption vs. TP (Gb/s) with Jumbo Frames; a marker shows where the limit with MTU 1500 used to be.]

Page 30:

Virtual Processor dispatching

mpstat -s at physc=0.45:
Proc0 44.96%                                            Proc4 0.00%
cpu0 22.38%  cpu1 11.16%  cpu2 5.62%  cpu3 5.80%        cpu4 0.00%  cpu5 0.00%  cpu6 0.00%  cpu7 0.00%

mpstat -s at physc=0.60:
Proc0 47.04%                                            Proc4 12.86%
cpu0 23.23%  cpu1 10.85%  cpu2 6.42%  cpu3 6.54%        cpu4 4.75%  cpu5 2.91%  cpu6 2.69%  cpu7 2.51%

[Chart: TP (Gbps, 2.5 to 5.0) vs. CPU units (0.38 to 0.78) – between physc=0.45 and physc=0.60 the second virtual processor starts to be dispatched, and throughput drops by 750 Mbit/s.]

Page 31:

Throughput baseline Virtual Ethernet MTU 65390 - EC=0.4, capped

• Benchmark with 8 parallel TCP sessions
• Configuration:
– Client LPAR: Power 770 9117-MMD, AIX 7.1 TL2 SP2, EC=0.4 units, capped, 4 VPs, Virtual Ethernet Adapter, MTU 65390
– Server LPAR: Power 770 9117-MMD (same as client), AIX 7.1 TL2 SP2, EC=3.0 units, uncapped, 4 VPs, Virtual Ethernet Adapter, MTU 65390

[Diagram: capped Client LPAR and uncapped Server LPAR, each with a Virtual Ethernet Adapter, connected through the PHYP switch on VLAN 1 (PVID 1); traffic flows from client to server.]

Page 32:

Throughput baseline Virtual Ethernet MTU 65390 - EC=0.4, capped

MTU 65390:

Client connecting to 192.168.10.81, TCP port 5001

TCP window size: 319 KByte (default)

------------------------------------------------------------

[ ID] Interval Transfer Bandwidth

[ 3] 0.0-300.0 sec 13.7 GBytes 391 Mbits/sec

[ 5] 0.0-300.0 sec 12.4 GBytes 356 Mbits/sec

[ 8] 0.0-300.0 sec 13.3 GBytes 382 Mbits/sec

[ 7] 0.0-300.0 sec 14.3 GBytes 410 Mbits/sec

[ 10] 0.0-300.0 sec 14.2 GBytes 407 Mbits/sec

[ 6] 0.0-300.0 sec 15.0 GBytes 430 Mbits/sec

[ 9] 0.0-300.0 sec 12.9 GBytes 368 Mbits/sec

[ 4] 0.0-300.0 sec 11.9 GBytes 342 Mbits/sec

[SUM] 0.0-300.0 sec 108 GBytes 3.09 Gbits/sec

Page 33:

Throughput baseline Virtual Ethernet MTU 65390 - uncapped

MTU 65390:

Client connecting to 192.168.10.81, TCP port 5001

TCP window size: 319 KByte (default)

------------------------------------------------------------

[ ID] Interval Transfer Bandwidth

[ 10] 0.0-300.0 sec 86.2 GBytes 2.47 Gbits/sec

[ 3] 0.0-300.0 sec 86.1 GBytes 2.47 Gbits/sec

[ 4] 0.0-300.0 sec 86.3 GBytes 2.47 Gbits/sec

[ 5] 0.0-300.0 sec 86.2 GBytes 2.47 Gbits/sec

[ 6] 0.0-300.0 sec 86.1 GBytes 2.47 Gbits/sec

[ 7] 0.0-300.0 sec 85.8 GBytes 2.46 Gbits/sec

[ 8] 0.0-300.0 sec 86.2 GBytes 2.47 Gbits/sec

[ 9] 0.0-300.0 sec 86.1 GBytes 2.47 Gbits/sec

[SUM] 0.0-300.0 sec 689 GBytes 19.7 Gbits/sec

Page 34:

Segmentation offload / aggregation within a Shared Ethernet Adapter setup

• Benchmark with 8 parallel TCP sessions
• Configuration:
– Managed system: Power 720 8202-E4C
– Client LPAR: AIX 7.1 TL1 SP3, capped, weight 128, 2 VPs; Virtual Ethernet Adapter, MTU 1500
– Server LPAR: AIX 7.1 TL1 SP3, uncapped, weight 128, 4 VPs; Virtual Ethernet Adapter, MTU 1500
– Virtual I/O Servers: EC=2.0 units, uncapped, weight 255; PCIe2 2-port 10GbE SR adapter

[Diagram: capped Client LPAR – vSwitch1 – Virtual I/O Server 1 (SEA, 10GbE SR) – 10 GbE network – Virtual I/O Server 2 (SEA, 10GbE SR) – vSwitch2 – uncapped Server LPAR, all on PVID 1 within the Power 720 8202-E4C; this is the path to be tuned.]

Page 35:

Segmentation offload (largesend)

• The task of segmenting data into frames of an appropriate MTU size is offloaded from the operating system to the physical network adapter
• Benefits of segmentation offload:
– Significantly reduces the CPU consumption on client partitions and VIO-Servers for packet delivery in the sending direction
– Increases the effective VIO-Server outbound throughput for high-speed network connections
• Segmentation offload allows the client partition to send 64 kilobytes of data through a Virtual Ethernet Adapter
• Segmentation offload needs configuration on the Power Systems client and VIOS partitions only! It does not affect the configuration of physical network equipment
• The following configuration steps are needed (example configuration follows on the next slides):
– Client partition: enable largesend at the interface level
– VIO-Server partition:
  • Configure largesend=1 for the Shared Ethernet Adapter
  • Ensure that the large_send=yes option is set for the physical adapters
  • Etherchannel devices don't need further configuration
• Checksum offload is implicitly active in the sending direction when segmentation offload is enabled

Page 36:

Segment aggregation (large receive)

• Segment aggregation buffers multiple Ethernet frames at VIO-Server level and passes 64-kilobyte data chunks to the client Virtual Ethernet Adapters
• Benefits of large receive:
– Significantly reduces the CPU consumption on client partitions and VIO-Servers for packet delivery in the receiving direction
– Increases the effective inbound throughput from the VIO-Server to the client partitions
– Reduces the number of interrupts on VIO-Servers and client partitions
• The following configuration steps need to be done (example configuration follows on the next slides):
– VIO-Server partition:
  • Configure large_receive=yes for the Shared Ethernet Adapter
  • Ensure that the large_receive=yes option is set for the physical adapters
  • Etherchannel devices don't need further configuration

Page 37:

Enable segmentation offload and aggregation

Virtual I/O Server tuning:

Enable TCP segmentation offload by setting the largesend attribute on the SEA:
• chdev -dev <SEA> -attr largesend=1

Enable TCP receive segment aggregation on the SEA:
• chdev -dev <SEA> -attr large_receive=yes

LPAR tuning:

Activate TCP segmentation offload towards the physical adapter HW:
• ifconfig <enX> largesend

Or, since AIX 7.1 TL1 / AIX 6.1 TL7 (lsattr -El en0):
mtu_bypass off Enable/Disable largesend for virtual Ethernet True
• chdev -l <enX> -a mtu_bypass=on

[Diagram: VIOS (SEA plus physical adapter) – PHYP switch, VLAN 1 (PVID 1) – Client LPAR with Virtual Ethernet; largesend on the physical 10 GbE SR adapter is enabled by default.]
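To verify that the offloads are active on the physical adapter, the entstat device statistics can be checked (a sketch; ent5 is an example device name):

# entstat -d ent5 | grep -E "segmentation offload|segment aggregation"

With everything enabled this should report "Transmit TCP segmentation offload: Enabled" and "Receive TCP segment aggregation: Enabled" (cf. the flow control slide later in this deck).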

Page 38:

Overall CPU consumption in a 10GbE PCIe2 environment with SEA

[Chart: overall CPU consumption vs. TP (Gb/s) with largesend and large receive; a marker shows where the limit with default settings used to be.]

Page 39:

Overall CPU consumption in a 10GbE PCIe2 environment with SEA

[Chart: overall CPU consumption vs. TP (Gb/s).]

Page 40:

Fixes To Known Issues

• IV07193: "AIX SEA THREAD LOCK CONTENTION PREVENTS SCALE UP"
• IV08263: LARGE SEND PACKETS CAUSES TX TIME OUT.
  – U842917 devices.pciex.e4145616e4140518.rte 7.1.1.15
• IV12776: ENTSTAT DISPLAYS INCORRECT HEA PACKETS DISCARDED COUNT
  – U843503 devices.chrp.IBM.lhea.rte 7.1.1.15
• IV12784: WHEN JUMBO FRAMES IS ENABLED THE PORTS MIGHT DROP ALL THE
  – U842917 devices.pciex.e4145616e4140518.rte 7.1.1.15
  – U849664 devices.pciex.e4145616e4140518.rte 7.1.1.4
• IV13811: ADAPTERS USING GOENTDD MAY STOP TRANSMITTING
  – U840901 devices.pci.14106902.rte 7.1.1.15
• IV13813: LARGESEND FLAG ON SMALL PACKETS CAN CAUSE ADAPTER TO MISBEHAVE
  – U840901 devices.pci.14106902.rte 7.1.1.15

Page 41:

Fixes To Known Issues Cont.

• IV15406: SACK - EXPONENTIAL TCP RETRANSMIT ENDS IN RESET "RST" CONNECTION
  – U843468 bos.net.tcp.client 7.1.1.15
• IV17613: LARGESEND DOESNT WORK FROM VIOC TO VIOS WHEN IPSEC IS ENABLED
  – U843468 bos.net.tcp.client 7.1.1.15
• IV17616: UNNECESSARY LARGESEND THROTTLING RESULTING IN POOR PERFORMANCE
  – U843468 bos.net.tcp.client 7.1.1.15
• IV17666: ALLOCATE NEW LOCK IDS FOR MASON DEVICE DRIVER
• IV18708: HEA CREATES PACKET STORM WITH LARGESEND AND 0 MSS
  – U843503 devices.chrp.IBM.lhea.rte 7.1.1.15
• IV18714: ISNO VALUES NOT SETUP FOR 10GIGE
  – U843061 devices.pciex.a21910071410d003.rte 7.1.1.15

Page 42:

Sizing diagram for shared CPU units in a SEA environment

[Chart: stacked CPU utilization (in CPU units, 0 to 3) for a benchmark with an effective throughput of 930 Mbit/s – the sending VIOS, receiving VIOS, server LPAR and client LPAR contribute 0.4, 0.95, 0.73 and 0.74 units, 2.82 units overall; one contribution is improved to 0.73 with appropriate tuning. It answers two questions: How many CPU units are necessary to reach the desired throughput? Which instances are involved in network activity from an overall perspective? The results depend on the Power Systems model and hardware configuration.]

Page 43:

Sudden very high latency events in a bottleneck situation

[Chart: packet trace over a 10-second time frame showing three ~400 ms traffic stops; the highlighted gap measures 405 ms.]

Page 44:

Most important rule for Shared Processor LPARs

• Plan sufficient Entitled Capacity (EC) on:
– the client LPAR working as network client (provides sufficient CPU time for the hypervisor to process H_SEND_LOGICAL_LAN)
– the client LPAR working as network server (reassembling data)
– the VIO-Server on the outgoing side (efficient packet pickup from the hypervisor switch and hand-over to the Etherchannel or physical adapter device drivers)
– the VIO-Server on the incoming side

• Look for high Virtual Context Switch (vcsw) rates:

# lparstat 2
%user  %sys  %wait  %idle  physc  %entc  lbusy   vcsw  phint
-----  ----  -----  -----  -----  -----  -----  -----  -----
  1.2  54.6    0.0   44.2   0.72   82.3    8.6   9178   1326
  1.5  52.8    0.0   45.6   0.81   92.6   17.4  10458   1296
  1.4  48.7    0.0   49.9   0.96  110.6   18.4   9532   1312

Page 45:

Client LPAR with insufficient entitlement

• Client with EC=0.5, uncapped, weight=128
• Client with network load ~8.3 Gbit/s

Curt report:
Total Physical CPU time (msec) = 750.95
Physical CPU percentage        = 66.24
Physical processor affinity    = 0.171925
Dispatch Histogram for processor (PHYSICAL CPUid : times_dispatched):
  PHYSICAL CPU 0  : 193
  PHYSICAL CPU 4  : 185
  PHYSICAL CPU 8  : 228
  PHYSICAL CPU 12 : 199
  PHYSICAL CPU 16 : 171
  PHYSICAL CPU 20 : 223
  PHYSICAL CPU 24 : 170
  PHYSICAL CPU 28 : 184
Total number of preemptions = 1553
Total number of H_CEDE      = 0 with preemption = 0
Total number of H_CONFER    = 0 with preemption = 0

Page 46:

Client LPAR with sufficient entitlement

• Client with EC=1.0, uncapped, weight=128
• Client with network load ~12.4 Gbit/s

Curt report:
Total Physical CPU time (msec) = 957.54
Physical CPU percentage        = 87.56
Physical processor affinity    = 0.176370
Dispatch Histogram for processor (PHYSICAL CPUid : times_dispatched):
  PHYSICAL CPU 0  : 78
  PHYSICAL CPU 4  : 59
  PHYSICAL CPU 8  : 85
  PHYSICAL CPU 12 : 62
  PHYSICAL CPU 16 : 68
  PHYSICAL CPU 20 : 77
  PHYSICAL CPU 24 : 65
  PHYSICAL CPU 28 : 90
Total number of preemptions = 584
Total number of H_CEDE      = 38946822 with preemption = 0
Total number of H_CONFER    = 0 with preemption = 0

Page 47:

Thread dispatch time and Network Virtualization

• Client with sufficient entitlement:

Virtual CPU preemption/dispatch data
  Preempt: Timeout, Dispatch: Timeslice  vProcIndex=001F
  rtrdelta=0.000 us  enqdelta=0.000 us  exdelta=13.718 us
  start wait=12.105394 ms  end wait=12.119112 ms
  SRR0=000000000009AC3C  SRR1=8000000000001032  dist: local  srad=0  assoc=0

• Client with insufficient entitlement:

Virtual CPU preemption/dispatch data
  Preempt: Timeout, Dispatch: Timeslice  vProcIndex=0005
  rtrdelta=0.000 us  enqdelta=500.164 us  exdelta=57.324 us
  start wait=0.000000 ms  end wait=0.112652 ms
  SRR0=000000000000A4C8  SRR1=8000000000009032  dist: local  srad=0  assoc=0

Reading the deltas:
• TB delta until enqueued: wait on the frozen queue (entitled capacity had expired)
• TB delta until ready to run: number of ticks the VP had nothing to do (after h_cede or h_confer)
• TB delta until running: wait on the dispatcher for a physical CPU

Page 48:

Flow diagram with bottleneck removed

Page 49:

No Resource Errors on Virtual Ethernet Adapters

ETHERNET STATISTICS (ent8):
Device Type: Shared Ethernet Adapter
Hardware Address: 00:21:5e:e2:27:22
Elapsed Time: 111 days 19 hours 15 minutes 32 seconds

Transmit Statistics:                        Receive Statistics:
--------------------                        -------------------
Packets: 140542106901                       Packets: 148017042933
Bytes: 136466349514017                      Bytes: 141743288103445
Interrupts: 0                               Interrupts: 68921391370
Transmit Errors: 0                          Receive Errors: 0
Packets Dropped: 0                          Packets Dropped: 235745
                                            Bad Packets: 0
Max Packets on S/W Transmit Queue: 321
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 1

Elapsed Time: 0 days 0 hours 0 minutes 0 seconds
Broadcast Packets: 107560097                Broadcast Packets: 215156995
Multicast Packets: 118240081                Multicast Packets: 252467976
No Carrier Sense: 0                         CRC Errors: 0
DMA Underrun: 0                             DMA Overrun: 0
Lost CTS Errors: 0                          Alignment Errors: 0
Max Collision Errors: 0                     No Resource Errors: 235745
Late Collision Errors: 0                    Receive Collision Errors: 0
Deferred: 0                                 Packet Too Short Errors: 0
SQE Test: 0                                 Packet Too Long Errors: 0
Timeout Errors: 0                           Packets Discarded by Adapter: 0
Single Collision Count: 0                   Receiver Start Count: 0
[…]
Hypervisor Receive Failures: 235745

"No Resource Errors" can occur when memory cannot be added quickly enough to the Virtual Ethernet (vent) buffer space. This has mainly two reasons: too much workload, or too little access to CPU time.
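A quick way to spot this condition on a VIOS (a sketch; ent8 as in the statistics above):

$ entstat -all ent8 | grep -iE "no resource|hypervisor receive"

Steadily growing counters here point to a receive buffer shortage on the virtual adapter; the following slides show how to read and tune the buffer pools.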

Page 50:

No Resource Errors on Virtual Ethernet Adapters

• entstat -d (-all on the ioscli) provides statistics about the preallocated buffer space of Virtual Ethernet Adapters
• "Min Buffers" is the number of pre-allocated buffers, "Max Buffers" is the absolute maximum
• "Max Allocated" represents the maximum number of buffers ever allocated
• The number of buffers is dynamically adjusted between "Min Buffers" and "Max Buffers"
• Constantly running at the maximum ("Max Allocated" = "Max Buffers") is not a good idea and normally hints at a serious bottleneck for latency and throughput
• Buffer post-allocation ("Max Allocated" > "Min Buffers") also takes time and can negatively affect response time for high-speed workloads

$ entstat -all <SEA>

Move down to the Virtual (Trunk) Adapter statistics…

Receive Information
  Receive Buffers
  Buffer Type          Tiny  Small  Medium  Large  Huge
  Min Buffers           512    512     128     24    24
  Max Buffers          2048   2048     256     64    64
  Allocated             512    512     128     24    24
  Registered            512    512     128     24    24
  History
    Max Allocated       512   1750     128     24    24
    Lowest Registered   508    502     128     24    24

Page 51:

No Resource Errors with high throughputs

Receive Information
  Receive Buffers
  Buffer Type          Tiny  Small  Medium  Large  Huge
  Min Buffers           512    512     128     24    24
  Max Buffers          2048   2048     256     64    64
  Allocated             512    512     128     24    24
  Registered            512    512     128     24    24
  History
    Max Allocated       512    523     138     39    64
    Lowest Registered   509    502     123     19    18

Max Allocated per buffer type:
  Tiny:   = min buf, < max buf
  Small:  > min buf, < max buf
  Medium: > min buf, < max buf
  Large:  > min buf, < max buf
  Huge:   > min buf, = max buf

Page 52:

No Resource Errors on the SEA's Virtual Adapter

• Tuning the Virtual Ethernet Adapter buffers:
– On all bridging Virtual Ethernet Adapters (VENT) configured for the SEA
– A reboot is required; use the -P option of chdev if the SEA is in use

chdev -l <VENT> -a max_buf_huge=128 -P
chdev -l <VENT> -a min_buf_huge=64 -P
chdev -l <VENT> -a max_buf_large=128 -P
chdev -l <VENT> -a min_buf_large=64 -P
chdev -l <VENT> -a max_buf_medium=512 -P
chdev -l <VENT> -a min_buf_medium=256 -P
chdev -l <VENT> -a max_buf_small=4096 -P
chdev -l <VENT> -a min_buf_small=2048 -P
chdev -l <VENT> -a max_buf_tiny=4096 -P
chdev -l <VENT> -a min_buf_tiny=2048 -P

[Diagram: VIOS device stack – ent0 (physical), ent1 (VENT), ent2 (SEA).]

Page 53:

No Resource Errors disappeared after the buffer adjustment

Receive Information
  Receive Buffers
  Buffer Type          Tiny  Small  Medium  Large  Huge
  Min Buffers          1024   1024     256     48    48
  Max Buffers          4096   4096     512    128   128
  Allocated            1024   1024     256     48    48
  Registered           1024   1024     256     48    48
  History
    Max Allocated      1024   1024     256     48    48
    Lowest Registered  1023   1024     256     48    48

Max Allocated per buffer type:
  all five types: = min buf, < max buf

Page 54:

Flow control

# entstat -d ent5
PCIe2 2-port 10GbE SR Adapter (a21910071410d003) Specific Statistics:
---------------------------------------------------------------------
Link Status: Up
Media Speed Running: 10 Gbps Full Duplex
PCI Mode: PCI-Express X8
Relaxed Ordering: Disabled
TLP Size: 512
MRR Size: 4096
PCIe Link Speed: 5.0 Gbps
Firmware Operating Mode: Legacy
Jumbo Frames: Enabled
Transmit TCP segmentation offload: Enabled
Receive TCP segment aggregation: Enabled
Transmit and receive flow control status: Enabled
Number of XOFF packets transmitted: 0
Number of XON packets transmitted: 0
Number of XOFF packets received: 0
Number of XON packets received: 0

Page 55:

Latency

AIX localhost in a SPP:
Alignment      Offset         RoundTrip  Trans      Throughput
Local  Remote  Local  Remote  Latency    Rate       10^6bits/s
Send   Recv    Send   Recv    usec/Tran  per sec    Outbound  Inbound
8      0       0      0       185.065    5403.498   0.043     0.043

Virt. Adapter to Virt. Adapter through the same vSwitch:
8      0       0      0       2930.229   341.270    0.003     0.003

Cross-system through VIOS / SEA with some load:
8      0       0      0       57.785     17305.403  0.138     0.138

Cross-system through VIOS / SEA with very high TP load:
8      0       0      0       30.769     32500.306  0.260     0.260

Page 56:

Throughput as a function of tcp_recvspace in a 10 GbE VIOS environment

180 µs latency / 10 GbE VIOS environment:

tcp_recvspace (bytes):   25000  50000  75000  100000  150000  262144
Throughput [Gbit/s]:      5.61   6.94   7.82    8.42    8.96    9.39

Page 57:

Throughput as a function of tcp_recvspace and latency

tcp_recvspace (bytes):                 25000  50000  75000  100000  150000  262144
Throughput [Gbit/s], 180 µs latency:    5.61   6.94   7.82    8.42    8.96    9.39
Throughput [Gbit/s], 50 µs latency:     7.01   9.09   9.23    9.47   10.3    15
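The curves follow the bandwidth-delay product: a single TCP session can move at most one receive window per round trip, so its throughput is bounded by tcp_recvspace / RTT. With 262144 bytes and 180 µs that is roughly 262144 * 8 / 0.00018 ≈ 11.6 Gbit/s per session, while 25000 bytes allow only ~1.1 Gbit/s. A minimal tuning sketch (values from the chart; rfc1323 window scaling is required for windows above 64 KB, and the tcp_sendspace counterpart is included as an assumption):

# no -p -o tcp_recvspace=262144
# no -p -o tcp_sendspace=262144
# no -p -o rfc1323=1

Verify the effective per-interface values with lsattr -El en0, since interface-specific network options (ISNO) take precedence when use_isno is enabled.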

Page 58:

Latency and Transaction Rate for Virtual I/O Servers

Sending/receiving VIOS with dedicated donating CPUs:
Alignment      Offset         RoundTrip  Trans
Local  Remote  Local  Remote  Latency    Rate
Send   Recv    Send   Recv    usec/Tran  per sec
8      0       0      0       146.819    6811.107

Sending/receiving VIOS with shared CPU units:
8      0       0      0       162.928    6137.665

Dedicated donating CPUs deliver an ~11 % better transaction rate.

Page 59:

Shared Ethernet Adapter thread mode

• Deactivating the thread operation mode for the SEA on the receiving VIO-Server can significantly boost throughput:
– No congestion: reduces bridging latency by 9 %
– Congestion (8 TCP sessions with 7.5 Gbps load): reduces latency for transaction-oriented traffic by 25 %

Receiving SEA in thread mode:
Alignment      Offset         RoundTrip  Trans
Local  Remote  Local  Remote  Latency    Rate
Send   Recv    Send   Recv    usec/Tran  per sec
8      0       0      0       159.369    6274.746

Receiving SEA in non-thread mode:
8      0       0      0       145.862    6855.812

Non-thread mode shows a 9.2 % better latency.

Page 60:

Planning for latency-sensitive workloads

• Latency-sensitive here means a guaranteed packet delivery within a certain amount of time
• Operational examples: transaction-oriented workload against application servers, database back ends
• Plan dedicated donating CPUs for systems with high transaction rates (reduces the forwarding time by avoiding VP dispatch latency)
• Deactivate the thread operation mode for the SEA on the receiving VIO-Server (see the sketch below):
– Can significantly boost throughput
– Can reduce bridging latency by up to 25 %
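A minimal sketch for switching a SEA to non-thread (interrupt) mode on the receiving VIO-Server (the device name is a placeholder; thread=1 restores the default threaded mode):

vios$ chdev -dev <SEA> -attr thread=0

Trade-off: in non-thread mode, SEA bridging takes precedence over all other VIOS tasks (cf. the tuning overview table), so this suits VIO-Servers that do little besides network bridging.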

Page 61:

RDMA and RoCEE

• RDMA stands for Remote Direct Memory Access. RDMA enables an application to write directly to physical memory on a remote system: data moves from the memory of one system into another system's memory without operating system involvement such as data copies from the network stack into the application memory area. Eliminating this involvement promotes high-throughput, low-latency communication.
• RoCEE (RDMA over Converged Enhanced Ethernet) is a protocol that implements RDMA over 10 Gigabit Ethernet networks.

Source: Introduction to Ethernet Latency; QLogic
http://www.qlogic.com/Resources/Documents/TechnologyBriefs/Adapters/Tech_Brief_Introduction_to_Ethernet_Latency.pdf

Page 62:

PCIe2 10GbE RoCE Converged Host Bus Adapter

• Supported with AIX 6.1 TL08, AIX 7.1 TL02 and VIOS 2.2.2.1
• lsdev shows the hba and roce devices but no ent adapters:

# lsdev
hba0   Available 09-00  PCIe2 10GbE RoCE Converged Host Bus Adapter (b315506714101604)
roce0  Available        PCIe2 10GbE RoCE Converged Network Adapter

The PCIe2 10 GbE RoCE adapter is preconfigured to operate in the RDMA configuration mode. This can be changed with the following procedure:

# rmdev -dl roce0
# rmdev -dl hba0
# cfgmgr
# rmdev -dl roce0
# chdev -l hba0 -a stack_type=ofed

# lsdev | grep RoCE
ent7   Available 09-00-01  RoCE Converged Network Adapter
ent8   Available 09-00-02  RoCE Converged Network Adapter
hba0   Available 09-00     PCIe2 10GbE RoCE Converged Host Bus Adapter (b315506714101604)

Page 63:

Network performance tuning overview

Options compared by throughput gain, latency gain, and ease of implementation and/or risk:
#1: VIOS design
#2: Jumbo Frames
#3: Large Send
#4: Large Receive
#5: EC hashing
#6: CPU entitlement
#7: Ded. CPUs
#8: TCP buffer
#9: SEA thread mode
#10: RDMA / RoCEE

[Matrix: each option is rated HIGH / MED / GOOD / POOR on the three criteria, with annotations: "++ load distribution" and "+ packet distribution"; "+++ ~x3 throughput gain" for Jumbo Frames, whose support must be ensured for all devices on the path; "+++ ~x3-4 throughput gain for outgoing packets" for Large Send and "+++ ~x3-4 throughput gain for incoming packets" for Large Receive, each with at most a slight latency impact; "++ sufficient ENTC" for the entitlement-related options; and, for SEA non-thread mode, "gives precedence over all other VIOS tasks".]

Page 64:

Do you remember this problem?

[Diagram: AIX LPAR 1 on a Power 750 8408-E8D with a 10GbE SR adapter, connected over the #5287-based 10 GbE network to a Power 720 8202-E4C, where a Virtual I/O Server (SEA over Etherchannel, vSwitch, PVID 10) bridges to AIX LPAR 2. Measured throughput between AIX LPAR 1 and AIX LPAR 2: ~9 Gbit/s vs. ~3 Gbit/s.]

Page 65:

Don't let this be you!

"I'll never get 10 Gig…"

• Wrong expectation: operating systems and applications simply adopt the line speed of a 10 GbE adapter.

Fact is…
• A 10 GbE adapter provides a physical line speed of up to 10 Gbit/s
• The network performance of an OS or an application depends on:
– available CPU power for the application and the OS network stack
– Maximum Transmission Unit size
– distance between sender and receiver
– offloading features
– coalescing and aggregation features
– TCP configuration …

Page 66:

THANK YOU!

VIELEN DANK!

Alexander Paul

[email protected]

Enhanced Technical Support (ETS)

Meet you at Enterprise2014
