GGF4 Toronto Feb 2002, R. Hughes-Jones, Manchester
Initial Performance Measurements: Gigabit Ethernet NICs, 64 bit PCI Motherboards
(Work in progress Mar 02)
Collaboration:
Boston Ltd. (Watford) – SuperMicro Motherboards, CPUs, Intel GE NICs
Brunel University – Peter Van Santen
University of Manchester – Richard Hughes-Jones
www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt
The Measurements (1)
Latency
Round-trip times measured using request-response UDP frames
Latency as a function of frame size
Slope is the sum of the inverses of the individual data transfer rates end-to-end: mem copy + PCI + Gigabit Ethernet + PCI + mem copy
Histograms of individual measurements
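The slope relation can be checked numerically. The following is a minimal sketch, assuming nominal peak rates of 132, 266 and 532 Mbyte/s for 32 bit 33 MHz, 64 bit 33 MHz and 64 bit 66 MHz PCI, and 125 Mbyte/s for Gigabit Ethernet, and ignoring the memory copies. The table and function names are illustrative and do not come from the talk.

```python
# Nominal peak transfer rates in Mbyte/s (assumed values, not measurements).
# 1 Mbyte/s equals 1 byte/us, so the inverse of a rate is a time per byte in us.
NOMINAL_RATE_MBYTE_S = {
    "pci_32bit_33mhz": 132.0,
    "pci_64bit_33mhz": 266.0,
    "pci_64bit_66mhz": 532.0,
    "gigabit_ethernet": 125.0,
}


def expected_slope_us_per_byte(stages):
    """Sum the per-byte transfer time of every stage on the path."""
    return sum(1.0 / NOMINAL_RATE_MBYTE_S[stage] for stage in stages)


# Sending PCI bus, the Gigabit Ethernet link, then the receiving PCI bus.
path = ["pci_64bit_66mhz", "gigabit_ethernet", "pci_64bit_66mhz"]
print(f"expected slope: {expected_slope_us_per_byte(path):.4f} us/byte")
```

A measured slope well above this figure would suggest that something other than the buses and the link, for example the memory copies left out here, contributes to the per-byte cost.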
The Measurements (2)
UDP Throughput
Send a burst of UDP frames spaced at regular intervals
Vary the frame size and the frame transmit spacing
Record:
The time to send and the time to receive the frames
The number received, the number lost, the number out of order
The received inter-packet spacing
CPU load, number of interrupts
[Diagram: message sequence between sender and receiver. "Zero stats" is answered by "OK done"; data frames are sent at regular intervals, with time to send, time to receive and inter-packet time marked; "Get remote statistics" is answered by "Send statistics"]
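The quantities recorded for each burst can be derived from the sequence numbers and arrival times seen by the receiver. The following is a minimal sketch under stated assumptions: each frame carries a sequence number, arrival times are in microseconds, and no frame is duplicated. The function name and the example values are hypothetical and are not taken from the measurement program used for the talk.

```python
def burst_statistics(n_sent, arrivals, payload_bytes):
    """Summarise one burst.

    arrivals: list of (sequence_number, receive_time_us) in arrival order.
    """
    received = len(arrivals)
    lost = n_sent - received

    # A frame counts as out of order if a higher sequence number arrived first.
    out_of_order = 0
    highest = -1
    for seq, _ in arrivals:
        if seq < highest:
            out_of_order += 1
        else:
            highest = seq

    times = [t for _, t in arrivals]
    spacings = [b - a for a, b in zip(times, times[1:])]
    mean_spacing = sum(spacings) / len(spacings) if spacings else 0.0

    # User-data rate between first and last arrival; bits per us equals Mbit/s.
    elapsed = times[-1] - times[0] if received > 1 else 0.0
    rate = 8.0 * payload_bytes * (received - 1) / elapsed if elapsed > 0 else 0.0

    return {
        "received": received,
        "lost": lost,
        "out_of_order": out_of_order,
        "mean_spacing_us": mean_spacing,
        "user_rate_mbit_s": rate,
    }


# Made-up example: 5 frames sent, frame 4 lost, frame 2 arrives after frame 3.
example = [(0, 0.0), (1, 10.0), (3, 30.0), (2, 35.0)]
print(burst_statistics(5, example, payload_bytes=1000))
```

The rate computed here counts user data only; a wire rate would also include the per-frame protocol overhead.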
The Measurements (3)
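PCI Activity
Logic analyser with:
PCI probe cards in sending PC
Gigabit Ethernet fiber probe card
PCI probe cards in receiving PC
[Diagram: sending PC (CPU, mem, chipset, NIC) connected over Gigabit Ethernet, via the Gigabit Ethernet probe, to the receiving PC (CPU, mem, chipset, NIC); the probes feed the logic analyser display]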
Latency: Alteon AceNIC
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14
[Figure: PC=PC Alteon, latency (us) vs message length (bytes), 0 to 3000 bytes]
[Figure: PC=PC Alteon, latency (us) vs message length (bytes), 0 to 16000 bytes]
UDP Throughput: Alteon AceNIC
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14
[Figure: UDP Alteon, received wire rate (Mbit/s) vs transmit time per frame (us), for frame sizes 50, 100, 200, 400, 600, 800, 1000, 1200, 1400 and 1472 bytes]
PCI: Alteon AceNIC
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit; RedHat 7.1, Kernel 2.4.14
ALT33102: PCI 33 MHz, 1400 bytes sent, wait 16 us
ALT66101: PCI 66 MHz, 1400 bytes sent, wait 16 us. NIC cannot sustain 66 MHz.
[Logic analyser traces: send PCI, receive PCI]
Latency: SysKonnect SK-9843
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 33 MHz; RedHat 7.1, Kernel 2.4.14
[Figure: PC=PC UDP SysKonnect, latency (us) vs message length (bytes), 0 to 3000 bytes, with fitted lines y = 0.0252x + 46.887 and y = 0.0127x + 64.183]
[Figure: ave time, latency (us) vs message length (bytes), 0 to 16000 bytes]
Latency low: good
Latency well behaved
Slope 0.0252 us/byte
Expect: PCI 0.00758 + GigE 0.008 + PCI 0.00758 = 0.0232 us/byte
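Fits such as y = 0.0252x + 46.887 are straight lines through the latency points. The slides do not say how the lines were obtained; the following is a minimal sketch assuming an ordinary least-squares fit. The data points are synthetic, generated from a known line, and are not measurements from the talk.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic example: latency of 50 us plus 0.02 us per byte (made-up numbers).
sizes_bytes = [64, 256, 512, 1024, 1400]
latencies_us = [50.0 + 0.02 * size for size in sizes_bytes]

slope, intercept = fit_line(sizes_bytes, latencies_us)
print(f"slope {slope:.4f} us/byte, intercept {intercept:.2f} us")
```

In such a fit the slope is the per-byte cost to compare with the expected figure, and the intercept is the fixed latency per message.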
UDP Throughput: SysKonnect SK-9843
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 33 MHz; RedHat 7.1, Kernel 2.4.14
[Figure: UDP SysKonnect, received wire rate (Mbit/s) vs transmit time per frame (us), for frame sizes 50, 100, 200, 400, 600, 800, 1000, 1200, 1400 and 1472 bytes]
Max throughput 690 Mbit/s
No packet loss
Packet loss during drop
[Figure: UDP SysKonnect: 370DLE 64bit 33MHz, % packet loss vs transmit time per frame (us), same frame sizes]
PCI: SysKonnect SK-9843
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14
SK300: 1400 bytes sent, wait 100 us. ~8 us for send or receive.
[Logic analyser trace, with the Gigabit Ethernet frame marked]
PCI: SysKonnect SK-9843
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14
SK301: 1400 bytes sent, wait 20 us
SK303: 1400 bytes sent, wait 10 us. Frames are back-to-back; cannot go any faster!
[Logic analyser trace: Gigabit Ethernet frames back to back]
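A wait of 10 us is shorter than the time a 1400 byte frame occupies a 1 Gbit/s link, which is consistent with the frames appearing back-to-back. The following is a minimal sketch of that arithmetic, assuming UDP over IPv4 over untagged Ethernet with standard header sizes, and that "1400 bytes" is the UDP payload. It ignores the padding of very small frames, and the function names are illustrative.

```python
# Bytes added to each UDP payload on the wire (standard sizes, no VLAN tag):
# UDP header 8, IPv4 header 20, Ethernet header 14, CRC 4,
# preamble and start delimiter 8, inter-frame gap 12.
OVERHEAD_BYTES = 8 + 20 + 14 + 4 + 8 + 12


def wire_time_us(payload_bytes, link_mbit_s=1000.0):
    """Time one frame occupies the link; bits divided by Mbit/s gives us."""
    return 8.0 * (payload_bytes + OVERHEAD_BYTES) / link_mbit_s


def max_user_rate_mbit_s(payload_bytes, link_mbit_s=1000.0):
    """User-data rate when frames are sent back-to-back."""
    return 8.0 * payload_bytes / wire_time_us(payload_bytes, link_mbit_s)


for size in (1400, 1472):
    print(f"{size} bytes: {wire_time_us(size):.3f} us per frame, "
          f"at most {max_user_rate_mbit_s(size):.1f} Mbit/s of user data")
```

Under these assumptions any requested spacing below about 11.7 us for 1400 byte frames cannot be honoured by the link itself.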
Latency: Intel Pro/1000
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14
[Figure: Intel Pro/1000 ave time, latency (us) vs message length (bytes), 0 to 1400 bytes, with fitted line y = 0.0187x + 167.86]
[Figure: Intel Pro/1000 ave time, latency (us) vs message length (bytes), 0 to 16000 bytes]
Latency high
Latency well behaved
Slope 0.0187 us/byte
Expect: PCI 0.00188 + GigE 0.008 + PCI 0.00188 = 0.0118 us/byte
PCI: Intel Pro/1000
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14
IT66M200: 64 bytes sent. CSR time: 1.75 us; data time: 0.25 us; interrupt delay: ~70 us.
[Logic analyser trace, with the 1400 response marked]
Throughput: Intel Pro/1000
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14
[Figure: UDP Intel Pro/1000 board: 370DLE, received wire rate (Mbit/s) vs transmit time per frame (us), for frame sizes 50, 100, 200, 400, 600, 800, 1000, 1200, 1400 and 1472 bytes]
Max throughput 910 Mbit/s
No packet loss
Packet loss during drop
[Figure: UDP Intel Pro/1000: 370DLE 64bit 66MHz, % packet loss vs transmit time per frame (us), same frame sizes]
Throughput: Intel Pro/1000
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14
Losses occur in groups: ~50 pkts every 140
[Figure: 1400 bytes Intel Pro/1000: 370DLE 64bit 66MHz, receive time (us) vs frame sequence number, for wait 8 and wait 10]
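Grouped losses of this kind show up as runs of missing sequence numbers at the receiver. The following is a minimal sketch of how such runs could be extracted, assuming frames are numbered from 0 and the receiver logs the sequence numbers it saw. The function name and the example data are hypothetical and do not reproduce the measurement in the plot.

```python
def loss_groups(received_seqs, n_sent):
    """Return (first_lost_seq, count) for each run of consecutive lost frames."""
    present = set(received_seqs)
    groups = []
    start = None
    for seq in range(n_sent):
        if seq not in present:
            if start is None:
                start = seq          # a new run of losses begins here
        elif start is not None:
            groups.append((start, seq - start))
            start = None
    if start is not None:            # the burst ended while frames were being lost
        groups.append((start, n_sent - start))
    return groups


# Made-up example: 12 frames sent; frames 3-5, 8 and 10-11 never arrived.
print(loss_groups([0, 1, 2, 6, 7, 9], n_sent=12))
```

The distance between successive group starts gives the period of the loss pattern, and the counts give the group sizes.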
PCI: Intel Pro/1000
Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14
IT66M212: 1400 bytes sent, wait 11 us. ~4.7 us on send PCI bus; PCI bus ~45% occupancy; ~3.25 us on PCI for data recv.
IT66M212: 1400 bytes sent, wait 11 us. Packets lost. Action of pause packet?
Latency: Intel Pro/1000 on P4CD6+
Motherboard: SuperMicro P4CD6+; Chipset: Intel i860; CPU: Dual Xeon Prestonia (2cpu/die) 2.2 GHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.2
[Figure: ave time, latency (us) vs message length (bytes), 0 to 3000 bytes, with fitted line y = 0.0195x + 186.01]
[Figure: ave time, latency (us) vs message length (bytes), 0 to 16000 bytes]
Latency high
Slope 0.0195 us/byte
Expect: PCI 0.00188 + GigE 0.008 + PCI 0.00188 = 0.0118 us/byte
Throughput: Intel Pro/1000 on P4CD6+
Motherboard: SuperMicro P4CD6+; Chipset: Intel i860; CPU: Dual Xeon Prestonia (2cpu/die) 2.2 GHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.2
[Figure: UDP Intel Pro/1000 board, received wire rate (Mbit/s) vs transmit time per frame (us), for frame sizes 50, 100, 200, 400, 600, 800, 1000, 1200, 1400 and 1472 bytes]
[Figure: UDP Intel Pro/1000: P4CD6+ 64bit 66MHz, % packet loss vs transmit time per frame (us), same frame sizes]
Max throughput 950 Mbit/s
No packet loss
Negligible packet loss
PCI: Intel Pro/1000 on P4CD6+
Motherboard: SuperMicro P4CD6+; Chipset: Intel i860; CPU: Dual Xeon Prestonia (2cpu/die) 2.2 GHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.2
IT66M220: 1400 bytes sent, wait 1000 us. CSR time: 12.25 us; data time: 5.0 us; interrupt delay: ~79 us.
IT66M224: 1400 bytes sent, wait 100 us. Detail: chipset limits PCI transfers with STOPs. Try i870 chipset.
PCI: Intel Pro/1000 on P4CD6+
Motherboard: SuperMicro P4CD6+; Chipset: Intel i860; CPU: Dual Xeon Prestonia (2cpu/die) 2.2 GHz; PCI: 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.2
IT66M221: 1400 bytes sent, wait 11 us
Latency: Intel Pro/1000 on IBM board
Motherboard: IBM das; Chipset: ServerWorks CNB20LE; CPU: Dual PIII 1 GHz; PCI: 64 bit 33 MHz; RedHat 7.1, Kernel 2.4.14
[Figure: Intel pro1000 IBMdas 64bit 33 MHz, latency (us) vs message length (bytes), 0 to 16000 bytes]
[Figure: latency (us) vs message length (bytes), 0 to 1400 bytes, with fitted line y = 0.0206x + 181.47]
Latency high
Latency well behaved
Slope 0.0206 us/byte
Expect: PCI 0.00376 + GigE 0.008 + PCI 0.00376 = 0.0155 us/byte
Throughput: Intel Pro/1000 on IBM board
Motherboard: IBM das; Chipset: ServerWorks CNB20LE; CPU: Dual PIII 1 GHz; PCI: 64 bit 33 MHz; RedHat 7.1, Kernel 2.4.14
[Figure: UDP Intel pro1000 IBMdas 64bit 33 MHz, y-axis 0 to 1 (labelled Recv wire rate Mbit/s in the source) vs transmit time per frame (us), for frame sizes 50, 100, 200, 400, 600, 800, 1000, 1200, 1400 and 1472 bytes]
Max throughput 930 Mbit/s
No packet loss
Packet loss during drop
[Figure: UDP Intel pro1000 IBMdas 64bit 33 MHz, received wire rate (Mbit/s) vs transmit time per frame (us), same frame sizes]
PCI: Intel Pro/1000 on IBM board
Motherboard: IBM das; Chipset: ServerWorks CNB20LE; CPU: Dual PIII 1 GHz; PCI: 64 bit 33 MHz; RedHat 7.1, Kernel 2.4.14
uva64m02: 1400 bytes sent, wait 11 us. ~9.3 us on send PCI bus; PCI bus ~82% occupancy; ~5.9 us on PCI for data recv.
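The ~5.9 us seen on the receiving PCI bus can be set against the ideal time to move the same data across a 64 bit 33 MHz bus. The following is a minimal sketch, assuming one full-width data transfer per clock and a nominal 33 MHz clock. It ignores arbitration, address phases, wait states and protocol headers, so it is only a lower bound, and the function name is illustrative.

```python
import math


def pci_transfer_time_us(n_bytes, bus_width_bits, clock_mhz):
    """Ideal time to move n_bytes with one full-width transfer per clock."""
    bytes_per_clock = bus_width_bits / 8
    clocks = math.ceil(n_bytes / bytes_per_clock)
    return clocks / clock_mhz        # clocks divided by MHz gives microseconds


ideal = pci_transfer_time_us(1400, bus_width_bits=64, clock_mhz=33.0)
print(f"ideal time for 1400 bytes on 64 bit 33 MHz PCI: {ideal:.2f} us")
```

Any excess of the measured time over this bound would be bus overhead or extra bytes carried with the payload; the sketch does not attempt to separate the two.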
Latency: Intel Pro/1000 on P4DP6
Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia (2cpu/die) 2.2 GHz; Slot 4: PCI, 64 bit, 66 MHz; RedHat 7.2, Kernel 2.4.14
[Figure: IntelPro1000: P4DP6 64bit 66MHz PCI slot4, latency (us) vs message length (bytes), 0 to 16000 bytes]
[Figure: IntelPro1000: P4DP6 64bit 66MHz PCI slot4, latency (us) vs message length (bytes), 0 to 3000 bytes, with fitted lines y = 0.0135x + 174.05 and y = 0.0121x + 178.31]
Latency high but smooth
Indicates interrupt coalescence
Slope 0.0136 us/byte
Expect: PCI 0.00188 + GigE 0.008 + PCI 0.00188 = 0.0118 us/byte
Throughput: Intel Pro/1000 on P4DP6
Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia (2cpu/die) 2.2 GHz; Slot 4: PCI, 64 bit, 66 MHz; RedHat 7.2, Kernel 2.4.14
[Figure: UDP IntelPro1000: P4DP6 64bit 66MHz PCI slot4, received wire rate (Mbit/s) vs transmit time per frame (us), for frame sizes 50, 100, 200, 400, 600, 800, 1000, 1200, 1400 and 1472 bytes]
Max throughput 950 Mbit/s
Some throughput drop for packets >1000 bytes
Packet loss small, 800 – 1000 byte packets
[Figure: UDP IntelPro1000: P4DP6 64bit 66MHz PCI slot4, % packet loss (0 to 100) vs transmit time per frame (us), same frame sizes]
PCI: Intel Pro/1000 on P4DP6
Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia (2cpu/die) 2.2 GHz; Slots: PCI, 64 bit, 66 MHz; RedHat 7.2, Kernel 2.4.14
ITP4007: 1400 bytes sent, wait 1000 us. Send: CSR time 2.0 us; Send: data time 3.25 us; Recv: data time 2.2 us. Slot 3 to slot 5.
ITP4001: detail of 1400 bytes sent. CSR time 2.2 us; data time 3.2 us. Slot 4 to slot 4.
Small differences between slots
PCI: Intel Pro/1000 on P4DP6
Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia (2cpu/die) 2.2 GHz; Slot 3-5: PCI, 64 bit, 66 MHz; RedHat 7.2, Kernel 2.4.14
ITP4010: 1400 bytes sent, wait 8 us. ~5.14 us on send PCI bus; PCI bus ~68% occupancy; ~2 us on PCI for data recv.