End-user systems: NICs, MotherBoards, Disks, TCP Stacks & Applications


Page 1: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications

http://grid.ucl.ac.uk/nfnn.html

20th-21st June 2005, NeSC Edinburgh

End-user systems: NICs, MotherBoards, Disks, TCP Stacks & Applications

Richard Hughes-Jones

Work reported is from many Networking Collaborations

MB - NG

Page 2: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Network Performance Issues

End System Issues: Network Interface Card and driver and their configuration; processor speed; motherboard configuration, bus speed and capability; disk system; TCP and its configuration; Operating System and its configuration.

Network Infrastructure Issues: obsolete network equipment; configured bandwidth restrictions; topology; security restrictions (e.g., firewalls); sub-optimal routing; transport protocols.

Network Capacity and the influence of others: congestion on group, campus and access links; many, many TCP connections; mice and elephants on the path.

Page 3: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Methodology used in testing NICs & Motherboards

Page 4: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Latency Measurements

UDP/IP packets are sent between back-to-back systems; they are processed in a similar manner to TCP/IP but are not subject to the flow control & congestion avoidance algorithms. Used the UDPmon test program.

Latency: round-trip times measured using request-response UDP frames; latency measured as a function of frame size.

The slope is the sum of the per-byte transfer times over the data paths: mem-mem copy(s) + PCI + Gig Ethernet + PCI + mem-mem copy(s), i.e. s = Σ (dt/db) over the data paths.

The intercept indicates processing times + HW latencies. Histograms of 'singleton' measurements tell us about: the behaviour of the IP stack, the way the HW operates, and interrupt coalescence.
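A minimal sketch of this request-response latency measurement (illustrative only; UDPmon itself is a C tool, and the address, port and frame sizes below are assumptions; the remote end must echo each frame back):

```python
# Request-response UDP latency probe in the spirit of the UDPmon latency test.
import socket, statistics, time

def rtt_samples(host, port, size, count=100):
    """Send 'count' UDP frames of 'size' bytes and return RTTs in microseconds."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(1.0)
    payload = b"x" * size
    rtts = []
    for _ in range(count):
        t0 = time.perf_counter()
        s.sendto(payload, (host, port))
        s.recvfrom(65535)                      # remote end echoes the frame back
        rtts.append((time.perf_counter() - t0) * 1e6)
    return rtts

if __name__ == "__main__":
    # The slope of median RTT vs frame size gives the per-byte cost
    # (memory copies + PCI + GigE); the intercept the fixed overheads.
    for size in (64, 512, 1024, 1400):
        med = statistics.median(rtt_samples("192.168.0.2", 5001, size))
        print(f"{size:5d} bytes  median RTT {med:8.1f} us")
```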

Page 5: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Throughput Measurements

UDP Throughput: send a controlled stream of n-byte UDP frames spaced at regular intervals (a chosen wait time between frames, for a given number of packets).

Test sequence between Sender and Receiver: zero the remote statistics (Zero stats / OK done); send the data frames at regular intervals; signal the end of test (OK done); then get the remote statistics. Statistics sent back: number received, number lost + loss pattern, number out-of-order, CPU load & number of interrupts, 1-way delay, time to send / time to receive, and the inter-packet time histogram.
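A sketch of the sender side of this throughput test (illustrative only; UDPmon is a C tool and also gathers the receiver-side statistics listed above; the address, port and header-size estimate are assumptions):

```python
# Paced UDP sender: fixed-size frames with a fixed inter-packet wait,
# reporting the achieved send rate.
import socket, time

def send_stream(host, port, size=1472, count=10000, spacing_us=20):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = bytearray(size)
    spacing = spacing_us / 1e6
    t_start = time.perf_counter()
    t_next = t_start
    for seq in range(count):
        payload[0:4] = seq.to_bytes(4, "big")   # sequence number so a receiver
        sock.sendto(payload, (host, port))      # can spot loss / reordering
        t_next += spacing
        while time.perf_counter() < t_next:     # busy-wait to pace the frames
            pass
    elapsed = time.perf_counter() - t_start
    sent_bits = count * (size + 28) * 8         # + rough UDP/IP header estimate
    print(f"sent {count} x {size} B at {spacing_us} us spacing: "
          f"{sent_bits / elapsed / 1e6:.0f} Mbit/s")

if __name__ == "__main__":
    send_stream("192.168.0.2", 5001)            # receiver address is assumed
```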

Page 6: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


PCI Bus & Gigabit Ethernet Activity

PCI Activity: logic analyser with PCI probe cards in the sending PC, a Gigabit Ethernet fibre probe card, and PCI probe cards in the receiving PC.

Diagram: sending PC (CPU - memory - chipset - PCI bus - NIC) and receiving PC (NIC - PCI bus - chipset - memory - CPU) linked by Gigabit Ethernet, with the Gigabit Ethernet probe and both PCI probes feeding the logic analyser display.

Possible Bottlenecks

Page 7: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


SuperMicro P4DP8-2G (P4DP6): Dual Xeon, 400/533 MHz front side bus.

6 PCI / PCI-X slots on 4 independent PCI buses: 64 bit 66 MHz PCI, 100 MHz PCI-X, 133 MHz PCI-X.

Dual Gigabit Ethernet. Adaptec AIC-7899W dual-channel SCSI. UDMA/100 bus master EIDE channels with data transfer rates of 100 MB/sec burst.

“Server Quality” Motherboards

Page 8: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


“Server Quality” Motherboards

Boston/Supermicro H8DAR: two Dual Core Opterons; 200 MHz DDR memory (theory BW: 6.4 Gbit); HyperTransport; 2 independent PCI buses, 133 MHz PCI-X; 2 x Gigabit Ethernet; SATA; (PCI-e).

Page 9: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


NIC & Motherboard Evaluations

Page 10: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


SuperMicro 370DLE: Latency: SysKonnect

PCI: 32 bit 33 MHz. Latency small (62 µs) & well behaved. Latency slope 0.0286 µs/byte; expect 0.0232 µs/byte (PCI 0.00758 + GigE 0.008 + PCI 0.00758 µs/byte).

PCI: 64 bit 66 MHz. Latency small (56 µs) & well behaved. Latency slope 0.0231 µs/byte; expect 0.0118 µs/byte (PCI 0.00188 + GigE 0.008 + PCI 0.00188 µs/byte). Possible extra data moves? (A quick check of this arithmetic follows the plots below.)

Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; RedHat 7.1, Kernel 2.4.14.

Plots: latency (µs, 0-160) vs message length (bytes, 0-3000). SysKonnect 32 bit 33 MHz: fits y = 0.0286x + 61.79 and y = 0.0188x + 89.756. SysKonnect 64 bit 66 MHz: fits y = 0.0231x + 56.088 and y = 0.0142x + 81.975.
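The "expected" slopes quoted above are just the sum of the per-byte times for each bus the data crosses; a quick arithmetic check (nominal bus widths and clock rates assumed):

```python
# Per-byte transfer time (us/byte) for each bus the packet crosses, using
# nominal bandwidths: PCI = width_bytes * clock, GigE = 125 Mbyte/s.
def us_per_byte(bytes_per_s):
    return 1e6 / bytes_per_s

pci_32_33 = us_per_byte(4 * 33e6)     # ~0.0076 us/byte
pci_64_66 = us_per_byte(8 * 66e6)     # ~0.0019 us/byte
gige      = us_per_byte(125e6)        #  0.0080 us/byte

print(f"32bit/33MHz: {pci_32_33 + gige + pci_32_33:.4f} us/byte (slide: 0.0232)")
print(f"64bit/66MHz: {pci_64_66 + gige + pci_64_66:.4f} us/byte (slide: 0.0118)")
```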

Page 11: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


SuperMicro 370DLE: Throughput: SysKonnect

PCI: 32 bit 33 MHz: max throughput 584 Mbit/s; no packet loss for spacing >18 µs.

PCI: 64 bit 66 MHz: max throughput 720 Mbit/s; no packet loss for spacing >17 µs; packet loss during the BW drop; 95-100% kernel-mode CPU.

Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; RedHat 7.1, Kernel 2.4.14.

Plots: received wire rate (Mbit/s, 0-1000) vs transmit time per frame (µs, 0-40) for SysKonnect 32 bit 33 MHz and 64 bit 66 MHz, and % CPU kernel mode on the receiver vs spacing between frames, for packet sizes of 50 to 1472 bytes.

Page 12: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


SuperMicro 370DLE: PCI: SysKonnect
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14.

Logic analyser traces: send setup, send PCI transfer, packet on the Ethernet fibre, receive PCI transfer.

1400 bytes sent, wait 100 µs: ~8 µs for send or receive on the PCI bus; stack & application overhead ~10 µs per node; ~36 µs marked on the trace.

Page 13: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Signals on the PCI bus: 1472 byte packets every 15 µs, Intel Pro/1000. Logic analyser traces (send setup, send PCI, data transfers, receive PCI, receive transfers) for two bus speeds: PCI 64 bit 33 MHz, 82% usage; PCI 64 bit 66 MHz, 65% usage, with the data transfers half as long.

Page 14: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


SuperMicro 370DLE: PCI: SysKonnect

1400 bytes sent, wait 20 µs; 1400 bytes sent, wait 10 µs.

Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14.

Frames on the Ethernet fibre at 20 µs spacing; at the shorter spacing the frames are back-to-back: the 800 MHz CPU can drive at line speed and cannot go any faster!

Page 15: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


SuperMicro 370DLE: Throughput: Intel Pro/1000

Max throughput 910 Mbit/s; no packet loss for spacing >12 µs; packet loss during the BW drop; CPU load 65-90% for spacing < 13 µs.

Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14.

Plots (Intel Pro/1000, 64 bit 66 MHz on the 370DLE): received wire rate (Mbit/s, 0-1000) and % packet loss vs transmit time per frame (µs, 0-40), for packet sizes of 50 to 1472 bytes.

Page 16: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


SuperMicro 370DLE: PCI: Intel Pro/1000
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14.

Logic analyser trace of a request-response exchange on the send and receive PCI buses: send 64 byte request; interrupt delay; request received; receive interrupt processing; 1400 byte response; send interrupt processing. The request-response pattern demonstrates interrupt coalescence: there is no processing directly after each transfer.

Page 17: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


SuperMicro P4DP6: Latency Intel Pro/1000

Some steps in the latency curve: slope 0.009 µs/byte; slope of the flat sections 0.0146 µs/byte; expect 0.0118 µs/byte.

Latency histograms: no variation with packet size, FWHM 1.5 µs, which confirms the timing is reliable.

Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.2 GHz; PCI 64 bit 66 MHz; RedHat 7.2, Kernel 2.4.19.

Plots (Intel, 64 bit 66 MHz): latency (µs) vs message length (bytes, 0-3000) with fits y = 0.0093x + 194.67 and y = 0.0149x + 201.75, and latency histograms N(t) for 64, 512, 1024 and 1400 byte packets.

Page 18: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


SuperMicro P4DP6: Throughput Intel Pro/1000

Max throughput 950 Mbit/s; no packet loss. Averages are misleading: CPU utilisation on the receiving PC was ~25% for packets larger than 1000 bytes and 30-40% for smaller packets.

Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.2 GHz; PCI 64 bit 66 MHz; RedHat 7.2, Kernel 2.4.19.

Plots (gig6-7, Intel, PCI 66 MHz, 27 Nov 02): received wire rate (Mbit/s, 0-1000) and % CPU idle on the receiver vs transmit time per frame (µs, 0-40), for packet sizes of 50 to 1472 bytes.

Page 19: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


SuperMicro P4DP6: PCI Intel Pro/1000

1400 bytes sent, wait 12 µs: ~5.14 µs on the send PCI bus, ~68% PCI bus occupancy; ~3 µs on the PCI bus for the data receive.

CSR access inserts PCI STOPs: the NIC takes ~1 µs per CSR access; the CPU is faster than the NIC! A similar effect is seen with the SysKonnect NIC.

Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.2 GHz; PCI 64 bit 66 MHz; RedHat 7.2, Kernel 2.4.19.

Page 20: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


SuperMicro P4DP8-G2: Throughput Intel onboard

Max throughput 995 Mbit/s; no packet loss. Averages are misleading: ~20% CPU utilisation on the receiver for packets larger than 1000 bytes, ~30% for smaller packets.

Motherboard: SuperMicro P4DP8-G2; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.4 GHz; PCI-X: 64 bit; RedHat 7.3, Kernel 2.4.19.

Plots (Intel onboard): received wire rate (Mbit/s, 0-1000) and % CPU idle on the receiver vs transmit time per frame (µs, 0-40), for packet sizes of 50 to 1472 bytes.

Page 21: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Interrupt Coalescence: Throughput

Intel Pro 1000 on 370DLE

Plots: received wire rate (Mbit/s) vs delay between transmit packets (µs, 0-40) for 1472 byte and 1000 byte packets, for interrupt coalescence settings coa5, coa10, coa20, coa40, coa64 and coa100.

Page 22: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Interrupt Coalescence Investigations. Set kernel parameters for socket buffer size = RTT x BW. TCP mem-mem, lon2-man1:

Tx 64, Tx-abs 64, Rx 0, Rx-abs 128: 820-980 Mbit/s +- 50 Mbit/s
Tx 64, Tx-abs 64, Rx 20, Rx-abs 128: 937-940 Mbit/s +- 1.5 Mbit/s
Tx 64, Tx-abs 64, Rx 80, Rx-abs 128: 937-939 Mbit/s +- 1 Mbit/s
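The rule "socket buffer size = RTT x BW" above is the bandwidth-delay product; a quick sketch for the MB-NG (6 ms) and DataTAG (120 ms) paths used later in the talk, assuming a 1 Gbit/s link:

```python
# Bandwidth-delay product: the socket buffer needed to keep the pipe full.
def socket_buffer_bytes(rtt_ms, rate_gbit):
    return int(rtt_ms / 1e3 * rate_gbit * 1e9 / 8)

for name, rtt_ms, rate in [("MB-NG, rtt 6 ms", 6, 1.0),
                           ("DataTAG, rtt 120 ms", 120, 1.0)]:
    buf = socket_buffer_bytes(rtt_ms, rate)
    print(f"{name} at {rate} Gbit/s: buffer ~ {buf / 1e6:.2f} Mbyte")
```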

Page 23: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Supermarket Motherboards and other "Challenges"

Page 24: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Tyan Tiger S2466N

Motherboard: Tyan Tiger S2466N PCI: 1 64bit 66 MHz CPU: Athlon MP2000+ Chipset: AMD-760 MPX

3Ware forces PCI bus to 33 MHz

Tyan to MB-NG SuperMicro: network mem-mem 619 Mbit/s.

Page 25: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


IBM das: Throughput: Intel Pro/1000

Max throughput 930 Mbit/s; no packet loss for spacing >12 µs; clean behaviour; packet loss during the BW drop.

Plot (UDP, Intel Pro/1000, IBM das, 64 bit 33 MHz): received wire rate (Mbit/s, 0-1000) vs transmit time per frame (µs, 0-40) for packet sizes of 50 to 1472 bytes.

Motherboard: IBM das; Chipset: ServerWorks CNB20LE; CPU: Dual PIII 1 GHz; PCI: 64 bit 33 MHz; RedHat 7.1, Kernel 2.4.14.

1400 bytes sent at 11 µs spacing: signals clean; ~9.3 µs on the send PCI bus, ~82% PCI bus occupancy; ~5.9 µs on the PCI bus for the data receive.

Page 26: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Network switch limits behaviour: end-to-end UDP packets from udpmon. Only 700 Mbit/s throughput; lots of packet loss; the packet-loss distribution shows that the throughput is limited.

Plots (w05gva-gig6, 29 May 04, UDP): received wire rate (Mbit/s, 0-1000) and % packet loss vs spacing between frames (µs, 0-40) for 50 to 1472 byte packets; 1-way delay (µs) vs packet number for a 12 µs wait, full trace and zoom on packets 500-550.

Page 27: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


10 Gigabit Ethernet

Page 28: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


10 Gigabit Ethernet: UDP Throughput. A 1500 byte MTU gives ~2 Gbit/s; used a 16144 byte MTU, max user length 16080 bytes.

DataTAG Supermicro PCs: Dual 2.2 GHz Xeon CPU, FSB 400 MHz, PCI-X mmrbc 512 bytes: wire rate throughput of 2.9 Gbit/s.
CERN OpenLab HP Itanium PCs: Dual 1.0 GHz 64 bit Itanium CPU, FSB 400 MHz, PCI-X mmrbc 4096 bytes: wire rate of 5.7 Gbit/s.
SLAC Dell PCs: Dual 3.0 GHz Xeon CPU, FSB 533 MHz, PCI-X mmrbc 4096 bytes: wire rate of 5.4 Gbit/s.

Plot (an-al 10GE, Xsum, 512k buffer, MTU 16114, 27 Oct 03): received wire rate (Mbit/s, 0-6000) vs spacing between frames (µs, 0-40) for packet sizes of 1472 to 16080 bytes.

Page 29: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


10 Gigabit Ethernet: Tuning PCI-X

16080 byte packets every 200 µs; Intel PRO/10GbE LR adapter; PCI-X bus occupancy vs mmrbc. Measured times, and times based on PCI-X timing from the logic analyser: expected throughput ~7 Gbit/s, measured 5.7 Gbit/s. (A back-of-envelope bandwidth check follows the plots below.)

Logic analyser traces for mmrbc of 512, 1024, 2048 and 4096 bytes (5.7 Gbit/s at 4096 bytes), showing the CSR access, PCI-X sequence, data transfer, and interrupt & CSR update phases. Kernel 2.6.1 #17, HP Itanium, Intel 10GE, Feb 04.

Plots: PCI-X transfer time (µs) vs max memory read byte count (0-5000), with the measured rate (Gbit/s), the rate from the expected time (Gbit/s) and the max PCI-X throughput, for the HP Itanium and DataTAG Xeon 2.2 GHz hosts.
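For scale, the raw 64 bit 133 MHz PCI-X bandwidth and the bus efficiency implied by the expected and measured throughputs above (a back-of-envelope sketch; the real occupancy depends on mmrbc, CSR accesses and the PCI-X sequence overheads shown in the traces):

```python
# Raw PCI-X bandwidth: 8 bytes per clock at 133 MHz, ~8.5 Gbit/s.
bus_gbit = 133e6 * 8 * 8 / 1e9
for label, rate_gbit in [("expected", 7.0), ("measured", 5.7)]:
    print(f"raw PCI-X {bus_gbit:.1f} Gbit/s; {label} {rate_gbit} Gbit/s "
          f"= {100 * rate_gbit / bus_gbit:.0f}% of the raw bus bandwidth")
```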

Page 30: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Different TCP Stacks

Page 31: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Investigation of new TCP Stacks

The AIMD algorithm - Standard TCP (Reno) (see the sketch below):
For each ACK in an RTT without loss: cwnd -> cwnd + a/cwnd (Additive Increase, a = 1)
For each window experiencing loss: cwnd -> cwnd - b*cwnd (Multiplicative Decrease, b = 1/2)

High Speed TCP: a and b vary depending on the current cwnd, using a table. a increases more rapidly with larger cwnd, so cwnd returns to the 'optimal' size for the network path sooner; b decreases less aggressively and, as a consequence, so does cwnd, so the drop in throughput after a loss is smaller.

Scalable TCP: a and b are fixed adjustments for the increase and decrease of cwnd. a = 1/100, so the increase is greater than TCP Reno; b = 1/8, so the decrease on loss is less than TCP Reno. Scalable over any link speed.

Fast TCP: uses round-trip time as well as packet loss to indicate congestion, with rapid convergence to a fair equilibrium for throughput.
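A schematic sketch of these congestion-window control laws, in segments (illustrative only, not a TCP implementation; real stacks work in bytes and also handle slow start, timeouts, etc., and HighSpeed TCP's table of a(cwnd) and b(cwnd) is omitted):

```python
# Reno vs Scalable TCP window updates: after a loss Reno needs roughly
# cwnd/2 RTTs to climb back, Scalable only a handful.
def reno_ack(cwnd, a=1.0):        # additive increase: ~ +a segments per RTT
    return cwnd + a / cwnd

def reno_loss(cwnd, b=0.5):       # multiplicative decrease on loss
    return cwnd * (1 - b)

def scalable_ack(cwnd, a=0.01):   # Scalable: fixed per-ACK increment
    return cwnd + a

def scalable_loss(cwnd, b=0.125): # decrease by 1/8 on loss
    return cwnd * (1 - b)

TARGET = 1000.0                   # cwnd (segments) just before the loss
for name, ack, loss in [("Reno", reno_ack, reno_loss),
                        ("Scalable", scalable_ack, scalable_loss)]:
    cwnd = loss(TARGET)
    rtts = 0
    while cwnd < TARGET:
        for _ in range(int(cwnd)):       # roughly one ACK per segment per RTT
            cwnd = ack(cwnd)
        rtts += 1
    print(f"{name}: ~{rtts} RTTs to climb back to {TARGET:.0f} segments")
```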

Page 32: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Comparison of TCP Stacks: the TCP response function. Throughput vs loss rate; further to the right means faster recovery. Packets are dropped in the kernel. MB-NG rtt 6 ms; DataTAG rtt 120 ms.

Page 33: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


High Throughput Demonstrations

Diagram: Manchester (Geneva) host man03 and London (Chicago) host lon01, dual Xeon 2.2 GHz PCs, each connected by 1 GEth to a Cisco 7609, then via Cisco GSRs across the 2.5 Gbit SDH MB-NG core. Send data with TCP, drop packets, and monitor TCP with Web100.

Page 34: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


High Performance TCP - MB-NG. Drop 1 in 25,000; rtt 6.2 ms; recover in 1.6 s. Plots for the Standard, HighSpeed and Scalable stacks.

Page 35: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


High Performance TCP - DataTAG. Different TCP stacks tested on the DataTAG network: rtt 128 ms, drop 1 in 10^6.

High-Speed: rapid recovery. Scalable: very fast recovery. Standard: recovery would take ~20 mins.

Page 36: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Disk and RAID Sub-Systems

Based on a talk at NFNN 2004 by Andrew Sansum, Tier 1 Manager at RAL

Page 37: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Reliability and Operation

The performance of a system that is broken is 0.0 MB/s! When staff are fixing broken servers they cannot spend their time optimising performance. With a lot of systems, anything that can break will break.

Typical ATA failure rate is 2-3% per annum at RAL, CERN and CASPUR. That's 30-45 drives per annum on the RAL Tier1, i.e. one per week; one failure for every 10^15 bits read; MTBF 400K hours.

RAID arrays may not handle block re-maps satisfactorily, or take so long that the system has given up. RAID5 is not perfect protection: there is a finite chance of 2 disks failing!

SCSI interconnects corrupt data or give CRC errors. Filesystems (ext2) corrupt for no apparent cause (EXT2 errors). Silent data corruption happens (checksums are vital).

Fixing data corruptions can take a huge amount of time (> 1 week). Data loss possible.

Page 38: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


You need to Benchmark – but how?

Andrew's golden rule: Benchmark, Benchmark, Benchmark. Choose a benchmark that matches your application (we use Bonnie++ and IOZONE): single stream of sequential I/O, random disk accesses, or multi-stream (30 threads is more typical).

Watch out for caching effects (use large files and small memory).

Use a standard protocol that gives reproducible results, and document what you did for future reference. Stick with the same benchmark suite/protocol as long as possible to gain familiarity; it is much easier to spot significant changes.
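A crude sequential-read timing in the spirit of this advice, reading a file several times larger than RAM so the page cache cannot hide the disk (illustrative only; Bonnie++ and IOZONE do this properly, with write, rewrite, random and multi-threaded phases; the file path is an assumption):

```python
# Time a sequential read of a large file in fixed-size blocks and report MB/s.
import sys, time

def sequential_read_mb_s(path, block_bytes=64 * 1024):
    total = 0
    t0 = time.perf_counter()
    with open(path, "rb", buffering=0) as f:     # unbuffered reads
        while True:
            chunk = f.read(block_bytes)
            if not chunk:
                break
            total += len(chunk)
    return total / (time.perf_counter() - t0) / 1e6

if __name__ == "__main__":
    # e.g. python seqread.py /raid0/testfile-8G
    # (the file must already exist and be much larger than physical memory)
    print(f"{sequential_read_mb_s(sys.argv[1]):.1f} MB/s")
```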

Page 39: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Hard Disk Performance

Factors affecting performance: seek time (rotation speed is a big component), transfer rate, cache size, retry count (vibration).

Watch out for unusual disk behaviours: write/verify (the drive verifies writes for the first <n> writes), drive spin-down, etc.

StorageReview is often worth reading: http://www.storagereview.com/

Page 40: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Impact of Drive Transfer Rate

Plot: read rate (MB/s, 0-80) vs number of threads (1-32) for 7200 rpm and 5400 rpm drives. A slower drive may not cause significant sequential performance degradation in a RAID array; this 5400 rpm drive was 25% slower than the 7200 rpm one.

Page 41: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Disk Parameters

From: www.storagereview.com

Distribution of seek time (ms): mean 14.1 ms; the published seek time of 9 ms has the rotational latency taken off.

Head position & transfer rate: the outside of the disk has more blocks per track, so it is faster; 58 MB/s falling to 35 MB/s across the position on the disk, a ~40% effect.

Page 42: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Drive Interfaces

http://www.maxtor.com/

Page 43: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


RAID levels

Choose an appropriate RAID level. RAID 0 (stripe) is fastest but has no redundancy. RAID 1 (mirror) gives single-disk performance. RAID 2, 3 and 4 are not very common/supported, but sometimes worth testing with your application. RAID 5: read is almost as fast as RAID 0, write substantially slower. RAID 6: extra parity info, good for unreliable drives but could have a considerable impact on performance; rare but potentially very useful for large IDE disk arrays. RAID 50 & RAID 10: RAID 0 across 2 (or more) controllers.

Page 44: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Kernel Versions

Kernel versions can make an enormous difference; it is not unusual for I/O to be badly broken in Linux. In the 2.4 series there were Virtual Memory subsystem problems: only 2.4.5, some 2.4.9, and >2.4.17 are OK. RedHat Enterprise 3 seems badly broken (although the most recent RHE kernel may be better). The 2.6 kernel contains new functionality and may be better (although first tests show little difference). Always check after an upgrade that performance remains good.

Page 45: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Kernel Tuning

Tunable kernel parameters can improve I/O performance, but it depends what you are trying to achieve. IRQ balancing: there are issues with IRQ handling on some Xeon chipset/kernel combinations (all IRQs handled on CPU zero). Hyperthreading. I/O schedulers can be tuned for latency by minimising the seeks: /sbin/elvtune (number of sectors of writes before reads are allowed); choice of scheduler: elevator=xx orders I/O to minimise seeks and merges adjacent requests.

VM tuning on 2.4: bdflush and readahead (a single thread helped):
sysctl -w vm.max-readahead=512
sysctl -w vm.min-readahead=512
sysctl -w vm.bdflush="10 500 0 0 3000 10 20 0"

On 2.6 the readahead command is (default 256):
blockdev --setra 16384 /dev/sda
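A small sketch of applying and verifying the 2.6 readahead setting quoted above (needs root; the device path is an example):

```python
# Set the block-device readahead with blockdev and read it back.
import subprocess

dev = "/dev/sda"                                   # example device
subprocess.run(["blockdev", "--setra", "16384", dev], check=True)
ra = subprocess.run(["blockdev", "--getra", dev],
                    capture_output=True, text=True, check=True)
print(f"{dev} readahead now {ra.stdout.strip()} sectors")
```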

Page 46: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Example Effect of VM Tuneup

Plot: RedHat 7.3 read performance, MB/s (0-200) vs number of threads (1-32), default vs tuned VM settings.

Page 47: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


IOZONE Tests

Plots: read performance (MB/s) vs threads (1-32) for ext2 vs ext3, a ~40% effect due to journal behaviour; ext3 write and ext3 read (MB/s) vs threads for the writeback, ordered and journal modes (journal mode writes the data plus the journal).

Page 48: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


RAID Controller Performance. RAID5 (striped with redundancy). Controllers: 3Ware 7506 parallel 66 MHz; 3Ware 7505 parallel 33 MHz; 3Ware 8506 Serial ATA 66 MHz; ICP Serial ATA 33/66 MHz. Tested on a dual 2.2 GHz Xeon Supermicro P4DP8-G2 motherboard. Disk: Maxtor 160 GB 7200 rpm, 8 MB cache. Read-ahead kernel tuning: /proc/sys/vm/max-readahead = 512.

Rates for the same PC with RAID0 (striped): read 1040 Mbit/s, write 800 Mbit/s.

Disk-memory read speeds and memory-disk write speeds.

Stephen Dallison

Page 49: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


SC2004 RAID Controller Performance. Supermicro X5DPE-G2 motherboards loaned from Boston Ltd; dual 2.8 GHz Xeon CPUs with 512 kbyte cache and 1 Mbyte memory; 3Ware 8506-8 controller on a 133 MHz PCI-X bus, configured as RAID0 with a 64 kbyte stripe size; six 74.3 GByte Western Digital Raptor WD740 SATA disks (75 Mbyte/s disk-buffer, 150 Mbyte/s buffer-memory). Scientific Linux with 2.6.6 kernel + altAIMD patch (Yee) + packet loss patch. Read-ahead kernel tuning: /sbin/blockdev --setra 16384 /dev/sda.

RAID0 (striped), 2 GByte file: read 1500 Mbit/s, write 1725 Mbit/s.

Disk-memory read speeds and memory-disk write speeds. Plots (RAID0, 6 disks, 3w8506-8): read and write throughput (Mbit/s) vs buffer size (kbytes, 0-200) for 0.5, 1 and 2 GByte files.

Page 50: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Applications

Throughput for Real Users

Page 51: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Topology of the MB – NG Network

Diagram (key: Gigabit Ethernet, 2.5 Gbit POS access, MPLS, admin domains): Manchester domain (man01, man02, man03), UCL domain (lon01, lon02, lon03) and RAL domain (ral01, ral02), with HW RAID at the end hosts, each behind edge/boundary Cisco 7609 routers and interconnected across the UKERNA development network.

Page 52: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Gridftp throughput + Web100. RAID0 disks: 960 Mbit/s read, 800 Mbit/s write. Throughput (Mbit/s): see the alternation between 600/800 Mbit/s and zero; data rate 520 Mbit/s. Cwnd is smooth; no duplicate ACKs / send stalls / timeouts.

Page 53: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


HTTP data transfers, HighSpeed TCP. Same hardware; bulk data moved by web servers: Apache web server out of the box, a prototype client using the curl http library, 1 Mbyte TCP buffers, 2 Gbyte file. Throughput ~720 Mbit/s; Cwnd shows some variation; no duplicate ACKs / send stalls / timeouts.

Page 54: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


bbftp: what else is going on? Scalable TCP. Plots for BaBar + SuperJANET and SuperMicro + SuperJANET: congestion window and duplicate ACKs. The variation is not TCP related? Candidates: disk speed / bus transfer / application.

Page 55: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


SC2004 UKLIGHT Topology

Diagram of the SC2004 UKLight topology: MB-NG 7600 OSR in Manchester, ULCC UKLight, UCL HEP and the UCL network; UKLight 10G (four 1GE channels) to Chicago Starlight; Surfnet/EuroLink 10G (two 1GE channels) via Amsterdam; NLR Lambda NLR-PITT-STAR-10GE-16 to the SC2004 show floor, with the Caltech booth (UltraLight IP, Caltech 7600) and the SLAC booth (Cisco 6509); K2 and Ci switches along the path.

Page 56: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


SC2004 disk-disk bbftp. The bbftp file transfer program uses TCP/IP. UKLight path: London-Chicago-London. PCs: Supermicro + 3Ware RAID0. MTU 1500 bytes; socket size 22 Mbytes; rtt 177 ms; SACK off. Move a 2 Gbyte file; Web100 plots:

Standard TCP: average 825 Mbit/s (bbcp: 670 Mbit/s)
Scalable TCP: average 875 Mbit/s (bbcp: 701 Mbit/s, ~4.5 s of overhead)

Disk-TCP-Disk at 1Gbit/s

Plots: TCP achieved rate (Mbit/s, 0-2500) and Cwnd vs time (ms, 0-20000), showing the instantaneous BW, average BW and CurCwnd for the two stacks.

Page 57: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Network & Disk Interactions (work in progress). Hosts: Supermicro X5DPE-G2 motherboards, dual 2.8 GHz Xeon CPUs with 512 kbyte cache and 1 Mbyte memory, 3Ware 8506-8 controller on a 133 MHz PCI-X bus configured as RAID0, six 74.3 GByte Western Digital Raptor WD740 SATA disks, 64 kbyte stripe size. Measure memory to RAID0 transfer rates with & without UDP traffic.

Plots (RAID0, 6 disks, 1 GByte write, 64k, 3w8506-8): write throughput (Mbit/s) vs trial number without UDP, with 1500 MTU UDP and with 9000 MTU UDP; and % CPU system mode L3+4 vs % CPU system mode L1+2 for 8k and 64k block sizes, with fits close to y = 178 - 1.05x (y = -1.017x + 178.32 and y = -1.0479x + 174.44).

Results (% CPU is kernel mode): disk write 1735 Mbit/s; disk write + 1500 MTU UDP: 1218 Mbit/s, a drop of 30%; disk write + 9000 MTU UDP: 1400 Mbit/s, a drop of 19%.

Page 58: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Remote Processing Farms: Manc-CERN

Plots: Web100 DataBytesOut and DataBytesIn (delta) vs time, and TCP achieved rate (Mbit/s) with Cwnd vs time (ms).

Round trip time 20 ms. 64 byte request (green), 1 Mbyte response (blue). TCP is in slow start: the first event takes 19 rtt, or ~380 ms.

The TCP congestion window gets re-set on each request, a TCP stack implementation detail that reduces Cwnd after inactivity. Even after 10 s, each response takes 13 rtt, or ~260 ms. Transfer achievable throughput: 120 Mbit/s.

Page 59: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


The host is critical: motherboards, NICs, RAID controllers and disks matter. The NICs should be well designed: the NIC should use 64 bit 133 MHz PCI-X (66 MHz PCI can be OK); NIC/drivers need efficient CSR access, clean buffer management and good interrupt handling. Worry about the CPU-memory bandwidth as well as the PCI bandwidth: data crosses the memory bus at least 3 times. Separate the data transfers: use motherboards with multiple 64 bit PCI-X buses.

Test disk systems with a representative load. Choose a modern high-throughput RAID controller; consider SW RAID0 of RAID5 HW controllers. Plenty of CPU power is needed for sustained 1 Gbit/s transfers and disk access.

Packet loss is a killer: check campus links & equipment, and the access links to backbones. The new stacks are stable and give better response & performance, but you still need to set the TCP buffer sizes & other kernel settings, e.g. window-scale. Application architecture & implementation is important. The interaction between HW, protocol processing and the disk sub-system is complex.

Summary, Conclusions & Thanks

MB - NG

Page 60: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


More Information - Some URLs

UKLight web site: http://www.uklight.ac.uk
DataTAG project web site: http://www.datatag.org/
UDPmon / TCPmon kit + writeup: http://www.hep.man.ac.uk/~rich/ (Software & Tools)
Motherboard and NIC tests: http://www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt and http://datatag.web.cern.ch/datatag/pfldnet2003/
"Performance of 1 and 10 Gigabit Ethernet Cards with Server Quality Motherboards", FGCS Special Issue 2004: http://www.hep.man.ac.uk/~rich/ (Publications)
TCP tuning information: http://www.ncne.nlanr.net/documentation/faq/performance.html and http://www.psc.edu/networking/perf_tune.html
TCP stack comparisons: "Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks", Journal of Grid Computing 2004: http://www.hep.man.ac.uk/~rich/ (Publications)
PFLDnet: http://www.ens-lyon.fr/LIP/RESO/pfldnet2005/
Dante PERT: http://www.geant2.net/server/show/nav.00d00h002
Real-Time Remote Farm site: http://csr.phys.ualberta.ca/real-time
Disk information: http://www.storagereview.com/

Page 61: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Any Questions?

Page 62: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Backup Slides

Page 63: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


TCP (Reno) - Details. The time for TCP to recover its throughput after 1 lost packet is given by

    tau = C * RTT^2 / (2 * MSS)

where C is the link capacity. The plot below shows this recovery time for rtts up to ~200 ms (evaluated in the sketch after the plot).

Plot: time to recover (s, log scale, 0.0001 to 100000) vs rtt (ms, 0-200) for 10 Mbit, 100 Mbit, 1 Gbit, 2.5 Gbit and 10 Gbit links; typical rtts marked: UK 6 ms, Europe 20 ms, USA 150 ms.
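Evaluating the recovery-time formula above for the rtts marked on the plot (a sketch; an MSS of 1500 bytes and the listed link speeds are assumed):

```python
# tau = C * RTT^2 / (2 * MSS): Reno recovery time after a single loss.
MSS_BITS = 1500 * 8
for place, rtt_ms in [("UK", 6), ("Europe", 20), ("USA", 150)]:
    for gbit in (0.1, 1.0, 10.0):
        tau = gbit * 1e9 * (rtt_ms / 1e3) ** 2 / (2 * MSS_BITS)
        print(f"{place:6s} rtt {rtt_ms:3d} ms, {gbit:4.1f} Gbit/s: "
              f"~{tau:9.1f} s to recover")
```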

Page 64: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Packet Loss and new TCP Stacks: the TCP response function. UKLight London-Chicago-London, rtt 180 ms, 2.6.6 kernel. Agreement with theory is good.

Plots (sculcc1-chi-2, iperf, 13 Jan 05): TCP achievable throughput (Mbit/s) vs packet drop rate (1 in n, 100 to 10^8), on log and linear scales, for A0 1500 (standard), A1 HSTCP, A2 Scalable, A3 HTCP, A5 BICTCP, A8 Westwood and A7 Vegas, with the standard and Scalable theory curves.

Page 65: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Test of TCP Sharing: Methodology (1 Gbit/s). Chose 3 paths from SLAC (California): Caltech (10 ms), Univ Florida (80 ms), CERN (180 ms). Used iperf/TCP and UDT/UDP to generate traffic. Each run was 16 minutes, in 7 regions (2 mins / 4 mins).

Diagram: ICMP/ping traffic (ping 1/s) and iperf or UDT flows from SLAC through the TCP/UDP bottleneck to Caltech/UFL/CERN.

Les Cottrell, PFLDnet 2005.

Page 66: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


TCP Reno single stream: low performance on fast long-distance paths. AIMD (add a = 1 packet to cwnd per RTT; decrease cwnd by the factor b = 0.5 on congestion). Net effect: it recovers slowly and does not effectively use the available bandwidth, so throughput is poor and sharing is unequal.

SLAC to CERN: congestion has a dramatic effect; recovery is slow; the RTT increases when the flow achieves its best throughput; the remaining flows do not take up the slack when a flow is removed. Increase the recovery rate.

Les Cottrell, PFLDnet 2005.

Page 67: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Average Transfer Rates (Mbit/s)

App      TCP Stack   SuperMicro on MB-NG   SuperMicro on SuperJANET4   BaBar on SuperJANET4   SC2004 on UKLight
Iperf    Standard    940                   350-370                     425                    940
         HighSpeed   940                   510                         570                    940
         Scalable    940                   580-650                     605                    940
bbcp     Standard    434                   290-310                     290
         HighSpeed   435                   385                         360
         Scalable    432                   400-430                     380
bbftp    Standard    400-410               325                         320                    825
         HighSpeed   370-390               380
         Scalable    430                   345-532                     380                    875
apache   Standard    425                   260                         300-360
         HighSpeed   430                   370                         315
         Scalable    428                   400                         317
Gridftp  Standard    405                   240
         HighSpeed   320
         Scalable    335

New stacks give more throughput. Rate decreases.

Page 68: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


iperf throughput + Web100. SuperMicro on the MB-NG network, HighSpeed TCP: line speed 940 Mbit/s; DupACKs? <10 (expect ~400). BaBar on the production network, Standard TCP: 425 Mbit/s; DupACKs 350-400, i.e. re-transmits.

Page 69: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


Applications: Throughput (Mbit/s). HighSpeed TCP, 2 GByte file, RAID5, SuperMicro + SuperJANET. Plots for bbcp, bbftp, Apache and Gridftp. Previous work used RAID0 (not disk limited).

Page 70: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


bbcp & GridFTP throughput. 2 Gbyte file transferred, RAID5 (4 disks), Manchester - RAL. bbcp: mean 710 Mbit/s. GridFTP: see many zeros; means ~710 and ~620 Mbit/s. DataTAG altAIMD kernel in use in BaBar & ATLAS.

Page 71: End-user systems: NICs, MotherBoards, Disks,  TCP Stacks & Applications


tcpdump of the T/DAQ dataflow at the SFI (1): CERN-Manchester, 1.0 Mbyte event. The remote EFD requests an event from the SFI: incoming event request, followed by an ACK. The SFI sends the event as N 1448 byte packets, limited by the TCP receive buffer; when the TCP ACKs arrive, more data is sent. Time 115 ms (~4 events/s).