GGF4 Toronto Feb 2002 R. Hughes-Jones Manchester

Initial Performance Measurements
Gigabit Ethernet NICs
64 bit PCI Motherboards
(Work in progress Mar 02)

Collaboration:
Boston Ltd. (Watford) – SuperMicro Motherboards, CPUs, Intel GE NICs
Brunel University – Peter Van Santen
University of Manchester – Richard Hughes-Jones

www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt

The Measurements (1)

Latency
Round-trip times measured using request-response UDP frames
Latency as a function of frame size
Slope (us/byte) gives the sum of the individual per-byte data transfer times end-to-end: mem copy + PCI + Gigabit Ethernet + PCI + mem copy
Histograms of individual measurements
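The expected slope quoted on the later latency slides can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, the memory-copy terms are left out (as they are in the slides' "Expect" figures), and the PCI bus is assumed to move one full-width word per clock with no protocol overhead.

```python
# Rough estimate of the expected latency slope (us/byte) for one pass
# through PCI -> Gigabit Ethernet -> PCI. Memory copies are ignored here.

GIGE_US_PER_BYTE = 8 / 1000.0  # 1 Gbit/s is 1000 bits per microsecond


def pci_us_per_byte(width_bits: int, clock_mhz: float) -> float:
    """Ideal per-byte transfer time of a PCI bus, in microseconds."""
    bytes_per_us = (width_bits / 8) * clock_mhz  # bytes per clock * clocks per us
    return 1.0 / bytes_per_us


def expected_slope(width_bits: int, clock_mhz: float) -> float:
    """Sum of the per-byte times: sending PCI + Gigabit Ethernet + receiving PCI."""
    pci = pci_us_per_byte(width_bits, clock_mhz)
    return pci + GIGE_US_PER_BYTE + pci


if __name__ == "__main__":
    # 64 bit 66 MHz PCI at both ends
    print(round(expected_slope(64, 66), 4))  # 0.0118
```

A measured slope well above this estimate suggests that something other than the raw bus and link rates (memory copies, driver or chipset behaviour) contributes to the per-byte cost.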

The Measurements (2)

UDP Throughput
Send a burst of UDP frames spaced at regular intervals
Vary the frame size and the frame transmit spacing

Record:
The time to send and the time to receive the frames
The number received, the number lost, the number out of order
The received inter-packet spacing
CPU load, number of interrupts
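As an illustration of the received, lost and out-of-order counts, here is a minimal sketch. It assumes each frame carries a sequence number starting at 0 and that the receiver logs them in arrival order; the function name and the exact definition of "out of order" (a frame arriving after a higher-numbered one) are assumptions, not taken from the slides.

```python
def summarise_sequence(received, n_sent):
    """Count received, lost and out-of-order frames.

    received: sequence numbers in arrival order (assumed to start at 0)
    n_sent:   number of frames the sender transmitted
    """
    seen = set()
    out_of_order = 0
    highest = -1
    for seq in received:
        if seq in seen:
            continue  # ignore duplicate frames
        seen.add(seq)
        if seq < highest:
            out_of_order += 1  # arrived after a later frame
        else:
            highest = seq
    return {
        "received": len(seen),
        "lost": n_sent - len(seen),
        "out_of_order": out_of_order,
    }


if __name__ == "__main__":
    # Frame 4 never arrives; frame 2 arrives after frame 3.
    print(summarise_sequence([0, 1, 3, 2, 5], n_sent=6))
```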

[Diagram: test sequence between sender and receiver. Zero stats, answered by OK done; send data frames at regular intervals; get remote statistics, answered by send statistics. Marked intervals: time to send, time to receive, inter-packet time.]
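The link between frame size, transmit spacing and received wire rate can be sketched as below. This is an assumption-laden illustration: the per-frame overhead counts UDP, IPv4 (no options) and Ethernet headers, CRC, preamble and inter-frame gap, and the slides do not state whether their "wire rate" uses the same accounting. The function names are hypothetical.

```python
# Per-frame overhead on the wire for a UDP payload (assumed accounting):
# UDP header 8 + IPv4 header 20 + Ethernet header 14 + CRC 4
# + preamble 8 + inter-frame gap 12 = 66 bytes.
# Padding of very small frames (payload under 18 bytes) is ignored.
OVERHEAD_BYTES = 8 + 20 + 14 + 4 + 8 + 12


def wire_bits(payload_bytes: int) -> int:
    """Bits occupied on the wire by one UDP frame with this payload."""
    return (payload_bytes + OVERHEAD_BYTES) * 8


def min_frame_time_us(payload_bytes: int, link_mbit: float = 1000.0) -> float:
    """Shortest possible spacing between frames; Mbit/s equals bits per us."""
    return wire_bits(payload_bytes) / link_mbit


def recv_wire_rate_mbit(payload_bytes: int, n_received: int, time_to_receive_us: float) -> float:
    """Received wire rate in Mbit/s from the frame count and the time to receive."""
    return wire_bits(payload_bytes) * n_received / time_to_receive_us


if __name__ == "__main__":
    print(min_frame_time_us(1472))  # about 12.3 us per 1472 byte frame at 1 Gbit/s
```

Under these assumptions a 1472 byte frame cannot be sent more often than roughly every 12.3 us, so requesting a shorter transmit spacing cannot raise the wire rate any further.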

The Measurements (3)

PCI Activity
Logic Analyser with:
PCI Probe cards in sending PC
Gigabit Ethernet Fiber Probe Card
PCI Probe cards in receiving PC

[Diagram: two PCs, each with CPU, mem, chipset and NIC, linked by Gigabit Ethernet through the probe; the probes feed the Logic Analyser display.]

Latency: Alteon AceNIC
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14

[Plot: PC=PC Alteon, latency (us) vs message length (bytes), 0 to 3000 bytes]
[Plot: PC=PC Alteon, latency (us) vs message length (bytes), 0 to 16000 bytes]

UDP Throughput: Alteon AceNIC
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14

[Plot: UDP Alteon, received wire rate (Mbit/s) vs transmit time per frame (us), 0 to 40 us; one curve per frame size: 50, 100, 200, 400, 600, 800, 1000, 1200, 1400, 1472 bytes]

PCI: Alteon AceNIC
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit; RedHat 7.1, Kernel 2.4.14

[Logic analyser trace ALT33102: PCI 33 MHz, 1400 bytes sent, wait 16 us]
[Logic analyser trace ALT66101: 66 MHz, 1400 bytes sent, wait 16 us; panels: Send PCI, Receive PCI]
NIC cannot sustain 66 MHz

Latency: SysKonnect SK-9843
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 33 MHz; RedHat 7.1, Kernel 2.4.14

[Plot: PC=PC UDP SysKonnect, latency (us) vs message length (bytes), 0 to 3000 bytes; fits y = 0.0252x + 46.887 and y = 0.0127x + 64.183]
[Plot: ave time, latency (us) vs message length (bytes), 0 to 16000 bytes]

Latency low, good
Latency well behaved
Slope 0.0252 us/byte
Expect: PCI 0.00758, GigE 0.008, PCI 0.00758; total 0.0236 us/byte
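The slopes and intercepts quoted on the latency plots are straight-line fits of latency against message length. A minimal least-squares sketch is shown below; the data points are synthetic, generated from the fitted line y = 0.0252x + 46.887 purely to exercise the code, and are not measurements from the slides. The function name is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Synthetic points lying exactly on the quoted fit (not measured data).
    sizes = [64, 256, 512, 1024, 1400]               # message length, bytes
    latencies = [0.0252 * s + 46.887 for s in sizes]  # latency, us
    slope, intercept = fit_line(sizes, latencies)
    print(f"slope {slope:.4f} us/byte, intercept {intercept:.3f} us")
```

The slope is then compared with the expected per-byte cost, and the intercept gives the fixed latency that does not depend on message size.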

UDP Throughput: SysKonnect SK-9843
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 33 MHz; RedHat 7.1, Kernel 2.4.14

[Plot: UDP SysKonnect, received wire rate (Mbit/s) vs transmit time per frame (us), 0 to 40 us; one curve per frame size: 50, 100, 200, 400, 600, 800, 1000, 1200, 1400, 1472 bytes]
Max throughput 690 Mbit/s
No packet loss

Packet loss during drop
[Plot: UDP SysKonnect : 370DLE 64bit 33MHz, % packet loss vs transmit time per frame (us), 0 to 40 us; same frame sizes]

PCI: SysKonnect SK-9843
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14

[Logic analyser trace SK300: 1400 bytes sent, wait 100 us; the Gigabit Ethernet frame is marked]
~8 us for send or receive

PCI: SysKonnect SK-9843
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14

[Logic analyser trace SK301: 1400 bytes sent, wait 20 us]
[Logic analyser trace SK303: 1400 bytes sent, wait 10 us; Gigabit Ethernet frames back to back]
Frames are back-to-back
Cannot go any faster!

Latency: Intel Pro/1000
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14

[Plot: Intel Pro/1000 ave time, latency (us) vs message length (bytes), 0 to 1400 bytes; fit y = 0.0187x + 167.86]
[Plot: Intel Pro/1000 ave time, latency (us) vs message length (bytes), 0 to 16000 bytes]

Latency high
Latency well behaved
Slope 0.0187 us/byte
Expect: PCI 0.00188, GigE 0.008, PCI 0.00188; total 0.0118 us/byte

PCI: Intel Pro/1000
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14

[Logic analyser trace IT66M200: 64 bytes sent; label: 1400 response]
CSR time: 1.75 us
Data time: 0.25 us
Interrupt delay: ~70 us

Throughput: Intel Pro/1000
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14

[Plot: UDP Intel Pro/1000 board: 370DLE, received wire rate (Mbit/s) vs transmit time per frame (us), 0 to 40 us; one curve per frame size: 50, 100, 200, 400, 600, 800, 1000, 1200, 1400, 1472 bytes]
Max throughput 910 Mbit/s
No packet loss

Packet loss during drop
[Plot: UDP Intel Pro/1000 : 370DLE 64bit 66MHz, % packet loss vs transmit time per frame (us), 0 to 40 us; same frame sizes]

Throughput: Intel Pro/1000
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14

Losses occur in groups: ~50 pkts every 140

[Plot: 1400 bytes Intel Pro/1000 : 370DLE 64bit 66MHz, receive time (us) vs frame sequence number, 0 to 600; series: wait 8, wait 10]

PCI: Intel Pro/1000
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14

[Logic analyser trace IT66M212: 1400 bytes sent, wait 11 us]
~4.7 us on send PCI bus
PCI bus ~45% occupancy
~3.25 us on PCI for data recv

[Logic analyser trace IT66M212: 1400 bytes sent, wait 11 us]
Packets lost
Action of pause packet?
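For orientation, the PCI figures above can be set against an ideal bus. The sketch below is a rough cross-check only: it assumes the bus moves one full-width word per clock with no arbitration or protocol overhead, and it treats the quoted wait time as the frame interval, which the slides do not confirm. The function names are hypothetical.

```python
def ideal_pci_time_us(n_bytes: int, width_bits: int = 64, clock_mhz: float = 66.0) -> float:
    """Time to move n_bytes over an ideal PCI bus (no overhead), in microseconds."""
    bytes_per_us = (width_bits / 8) * clock_mhz
    return n_bytes / bytes_per_us


def bus_occupancy(busy_us: float, frame_interval_us: float) -> float:
    """Fraction of each frame interval during which the bus is busy."""
    return busy_us / frame_interval_us


if __name__ == "__main__":
    # 1400 bytes on 64 bit 66 MHz PCI: about 2.65 us in the ideal case
    print(round(ideal_pci_time_us(1400), 2))
    # ~4.7 us busy per frame, assuming one frame every 11 us
    print(round(bus_occupancy(4.7, 11.0), 2))
```

The ideal time is a lower bound, so observed data times of a few microseconds for 1400 bytes are in the expected range; the simple ratio lands near, but not exactly on, the occupancy read from the analyser.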

Latency: Intel Pro/1000 on P4CD6+
Motherboard: SuperMicro P4CD6+; Chipset: Intel i860; CPU: Dual Xeon Prestonia (2cpu/die) 2.2 GHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.2

[Plot: ave time, latency (us) vs message length (bytes), 0 to 3000 bytes; fit y = 0.0195x + 186.01]
[Plot: ave time, latency (us) vs message length (bytes), 0 to 16000 bytes]

Latency high
Slope 0.0195 us/byte
Expect: PCI 0.00188, GigE 0.008, PCI 0.00188; total 0.0118 us/byte

Throughput: Intel Pro/1000 on P4CD6+
Motherboard: SuperMicro P4CD6+; Chipset: Intel i860; CPU: Dual Xeon Prestonia (2cpu/die) 2.2 GHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.2

[Plot: UDP Intel Pro/1000 board, received wire rate (Mbit/s) vs transmit time per frame (us), 0 to 40 us; one curve per frame size: 50, 100, 200, 400, 600, 800, 1000, 1200, 1400, 1472 bytes]
[Plot: UDP Intel Pro/1000 : P4CD6+ 64bit 66MHz, % packet loss vs transmit time per frame (us), 0 to 40 us; same frame sizes]

Max throughput 950 Mbit/s
No packet loss
Negligible packet loss

PCI: Intel Pro/1000 on P4CD6+
Motherboard: SuperMicro P4CD6+; Chipset: Intel i860; CPU: Dual Xeon Prestonia (2cpu/die) 2.2 GHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.2

[Logic analyser trace IT66M220: 1400 bytes sent, wait 1000 us]
CSR time: 12.25 us
Data time: 5.0 us
Interrupt delay: ~79 us

[Logic analyser trace IT66M224: 1400 bytes sent, wait 100 us, detail]
Chipset limits PCI transfers with STOPs
Try i870 Chipset

PCI: Intel Pro/1000 on P4CD6+
Motherboard: SuperMicro P4CD6+; Chipset: Intel i860; CPU: Dual Xeon Prestonia (2cpu/die) 2.2 GHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.2

[Logic analyser trace IT66M221: 1400 bytes sent, wait 11 us]

Latency: Intel Pro/1000 on IBM board
Motherboard: IBM das; Chipset: ServerWorks CNB20LE; CPU: Dual PIII 1GHz; PCI: 64 bit 33 MHz; RedHat 7.1, Kernel 2.4.14

[Plots: Intel pro1000 IBMdas 64bit 33 MHz, latency (us) vs message length (bytes); one over 0 to 16000 bytes, one over 0 to 1400 bytes with fit y = 0.0206x + 181.47]

Latency high
Latency well behaved
Slope 0.0206 us/byte
Expect: PCI 0.00376, GigE 0.008, PCI 0.00376; total 0.0155 us/byte

Throughput: Intel Pro/1000 on IBM board
Motherboard: IBM das; Chipset: ServerWorks CNB20LE; CPU: Dual PIII 1GHz; PCI: 64 bit 33 MHz; RedHat 7.1, Kernel 2.4.14

[Plot: UDP Intel pro1000 IBMdas 64bit 33 MHz, vertical scale 0 to 1 (axis labelled Recv Wire rate Mbits/s) vs transmit time per frame (us), 0 to 40 us; one curve per frame size: 50, 100, 200, 400, 600, 800, 1000, 1200, 1400, 1472 bytes]

Max throughput 930 Mbit/s
No packet loss
Packet loss during drop

[Plot: UDP Intel pro1000 IBMdas 64bit 33 MHz, received wire rate (Mbit/s), scale 0 to 1000, vs transmit time per frame (us), 0 to 40 us; same frame sizes]

PCI: Intel Pro/1000 on IBM board
Motherboard: IBM das; Chipset: ServerWorks CNB20LE; CPU: Dual PIII 1GHz; PCI: 64 bit 33 MHz; RedHat 7.1, Kernel 2.4.14

[Logic analyser trace uva64m02: 1400 bytes sent, wait 11 us]
~9.3 us on send PCI bus
PCI bus ~82% occupancy
~5.9 us on PCI for data recv

Latency: Intel Pro/1000 on P4DP6
Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia (2cpu/die) 2.2 GHz; Slot 4: PCI, 64 bit, 66 MHz; RedHat 7.2, Kernel 2.4.14

[Plot: IntelPro1000 : P4DP6 64bit 66MHz PCI slot4, latency (us) vs message length (bytes), 0 to 16000 bytes]
[Plot: IntelPro1000 : P4DP6 64bit 66MHz PCI slot4, latency (us) vs message length (bytes), 0 to 3000 bytes; fits y = 0.0135x + 174.05 and y = 0.0121x + 178.31]

Latency high but smooth
Indicates interrupt coalescence
Slope 0.0136 us/byte
Expect: PCI 0.00188, GigE 0.008, PCI 0.00188; total 0.0118 us/byte

Throughput: Intel Pro/1000 on P4DP6
Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia (2cpu/die) 2.2 GHz; Slot 4: PCI, 64 bit, 66 MHz; RedHat 7.2, Kernel 2.4.14

[Plot: UDP IntelPro1000 : P4DP6 64bit 66MHz PCI slot4, received wire rate (Mbit/s) vs transmit time per frame (us), 0 to 40 us; one curve per frame size: 50, 100, 200, 400, 600, 800, 1000, 1200, 1400, 1472 bytes]
Max throughput 950 Mbit/s
Some throughput drop for packets >1000 bytes

Packet loss small: 800 – 1000 byte packets
[Plot: UDP IntelPro1000 : P4DP6 64bit 66MHz PCI slot4, % packet loss (scale 0 to 100) vs transmit time per frame (us), 0 to 40 us; same frame sizes]

PCI: Intel Pro/1000 on P4DP6
Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia (2cpu/die) 2.2 GHz; Slots: PCI, 64 bit, 66 MHz; RedHat 7.2, Kernel 2.4.14

[Logic analyser trace ITP4007: 1400 bytes sent, wait 1000 us; slot 3 to slot 5]
Send: CSR time 2.0 us
Send: Data time 3.25 us
Recv: Data time 2.2 us

[Logic analyser trace ITP4001: detail of 1400 bytes sent; slot 4 to slot 4]
CSR time 2.2 us
Data time 3.2 us

Small differences between slots

PCI: Intel Pro/1000 on P4DP6
Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia (2cpu/die) 2.2 GHz; Slot 3-5: PCI, 64 bit, 66 MHz; RedHat 7.2, Kernel 2.4.14

[Logic analyser trace ITP4010: 1400 bytes sent, wait 8 us]
~5.14 us on send PCI bus
PCI bus ~68% occupancy
~2 us on PCI for data recv