udp flow charts for linux-2.6.36.2 and different cpus...
TRANSCRIPT
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 1
Table of Contents
UDP PROCESS TIMING TEST INFORMATION ........................................................................................................... 3
SYSTEM INFORMATION FOR TESTS ..................................................................................................................................... 3
Linux Information .................................................................................................................................................. 3
Network Interface Information ............................................................................................................................. 3
Test for: Westmere ............................................................................................................................................... 3 netperf Side ........................................................................................................................................................................ 3 netserver Side: GC-NH5 test system .................................................................................................................................. 3
Test for: Jaketown B0 to B0 ................................................................................... Error! Bookmark not defined. netperf Side .......................................................................................................................... Error! Bookmark not defined. netserver Side ...................................................................................................................... Error! Bookmark not defined.
Test for: DELL Westmere ....................................................................................................................................... 4 netperf Side ........................................................................................................................................................................ 4 netserver Side: GC-NH5 test system .................................................................................................................................. 4
Test for: Single Core DELL Westmere .................................................................................................................... 4 netperf Side ........................................................................................................................................................................ 4 netserver Side: GC-NH5 test system .................................................................................................................................. 4
Test for: 2S Nehalem ............................................................................................................................................. 4 netperf Side ........................................................................................................................................................................ 5 netserver Side: GC-NH5 test system .................................................................................................................................. 5
UDP PROCESS TIMING RANGE ACROSS MULTIPLE PROCESSORS ............................................................................ 6
UDP PROCESS TIMING RANGE FOR WESTMERE PROCESSOR ............................................................................... 11
UDP PROCESS TIMING RANGE FOR JAKETOWN B0 TO JAKETOWN B0 ................ ERROR! BOOKMARK NOT DEFINED.
UDP PROCESS TIMING FOR NONE-SMP SINGLE CORE WESTMERE CPU ................................................................ 16
UDP PROCESS TIMING RANGE FOR DELL PRODUCTION WESTMERE .................................................................... 22
UDP PROCESS TIMING RANGE FOR NEHALEM ..................................................................................................... 28
UDP PROCESS TIMING RANGE FOR NEHALEM WITH 1 SOCKETED CPU................................................................. 33
Table of Figures
FIGURE 1: MULTIPLE SYSTEMS SMP SOCKET & UDP TX PROCESSING ABOVE IP STACK ................................................................... 6
FIGURE 2: MULTIPLE SYSTEMS SMP UDP TX PROCESSING BELOW IP STACK ................................................................................ 7
FIGURE 3: MULTIPLE SYSTEMS SMP SOCKET & UDP RX PROCESSING ABOVE IP STACK .................................................................. 8
FIGURE 4: MULTIPLE SYSTEMS SMP IXGBE INTERRUPT DRIVEN UDP RX PROCESSING BELOW IP STACK ........................................... 9
FIGURE 5: MULTIPLE SYSTEMS SMP RX PROTOCOL STACK PROCESSING OF UDP ABOVE IP STACK .................................................. 10
FIGURE 6: MULTIPLE SYSTEMS SMP IXGBE_LOW_LATENCY_RECV() DRIVER RX FLUSH PROCESSING .... ERROR! BOOKMARK NOT DEFINED.
FIGURE 7: WESTMERE SMP SOCKET & UDP TX PROCESSING ABOVE IP STACK............................................................................ 11
FIGURE 8: WESTMERE SMP UDP TX PROCESSING BELOW IP STACK ......................................................................................... 12
FIGURE 9: WESTMERE SMP SOCKET & UDP RX PROCESSING ABOVE IP STACK ........................................................................... 13
FIGURE 10: WESTMERE SMP IXGBE INTERRUPT DRIVEN UDP RX PROCESSING BELOW IP STACK .................................................. 14
FIGURE 11: WESTMERE SMP RX PROTOCOL STACK PROCESSING OF UDP ABOVE IP STACK ........................................................... 15
FIGURE 12: WESTMERE SMP IXGBE_LOW_LATENCY_RECV() DRIVER RX FLUSH PROCESSING ............. ERROR! BOOKMARK NOT DEFINED.
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 2
FIGURE 13: JAKETOWN B0 TO B0 SMP SOCKET & UDP TX PROCESSING ABOVE IP STACK ................ ERROR! BOOKMARK NOT DEFINED.
FIGURE 14: JAKETOWN B0 TO B0 SMP UDP TX PROCESSING BELOW IP STACK .............................. ERROR! BOOKMARK NOT DEFINED.
FIGURE 15: JAKETOWN B0 TO B0 SMP SOCKET & UDP RX PROCESSING ABOVE IP STACK ................ ERROR! BOOKMARK NOT DEFINED.
FIGURE 16: JAKETOWN B0 TO B0 SMP IXGBE INTERRUPT DRIVEN UDP RX PROCESSING BELOW IP STACK...... ERROR! BOOKMARK NOT
DEFINED.
FIGURE 17: JAKETOWN B0 TO B0 SMP RX PROTOCOL STACK PROCESSING OF UDP ABOVE IP STACK .. ERROR! BOOKMARK NOT DEFINED.
FIGURE 18: JAKETOWN B0 TO B0 SMP IXGBE_LOW_LATENCY_RECV() DRIVER RX FLUSH PROCESSING ERROR! BOOKMARK NOT DEFINED.
FIGURE 19: SINGLE CORE DELL WESTMERE SOCKET & UDP TX PROCESSING ABOVE IP STACK ........................................................ 16
FIGURE 20: SINGLE CORE DELL WESTMERE UDP TX PROCESSING BELOW IP STACK ..................................................................... 17
FIGURE 21: SINGLE CORE DELL WESTMERE SOCKET & UDP RX PROCESSING ABOVE IP STACK ....................................................... 18
FIGURE 22: SINGLE CORE DELL WESTMERE IXGBE INTERRUPT DRIVEN UDP RX PROCESSING BELOW IP STACK ................................ 19
FIGURE 23: SINGLE CORE DELL WESTMERE RX PROTOCOL STACK PROCESSING OF UDP ABOVE IP STACK ......................................... 20
FIGURE 24: SINGLE CORE DELL WESTMERE IXGBE_LOW_LATENCY_RECV() DRIVER RX FLUSH PROCESSING ....................................... 21
FIGURE 25: DELL WESTMERE SMP SOCKET & UDP TX PROCESSING ABOVE IP STACK .................................................................. 22
FIGURE 26: DELL WESTMERE SMP UDP TX PROCESSING BELOW IP STACK ................................................................................ 23
FIGURE 27: DELL WESTMERE SMP SOCKET & UDP RX PROCESSING ABOVE IP STACK .................................................................. 24
FIGURE 28: DELL WESTMERE SMP IXGBE INTERRUPT DRIVEN UDP RX PROCESSING BELOW IP STACK........................................... 25
FIGURE 29: DELL WESTMERE SMP RX PROTOCOL STACK PROCESSING OF UDP ABOVE IP STACK.................................................... 26
FIGURE 30: DELL WESTMERE SMP IXGBE_LOW_LATENCY_RECV() DRIVER RX FLUSH PROCESSING.................................................. 27
FIGURE 31: 2S NEHALEM SMP SOCKET & UDP TX PROCESSING ABOVE IP STACK ....................................................................... 28
FIGURE 32: 2S NEHALEM SMP UDP TX PROCESSING BELOW IP STACK ..................................................................................... 29
FIGURE 33: 2S NEHALEM SMP SOCKET & UDP RX PROCESSING ABOVE IP STACK ....................................................................... 30
FIGURE 34: 2S NEHALEM SMP IXGBE INTERRUPT DRIVEN UDP RX PROCESSING BELOW IP STACK ............................................... 31
FIGURE 35: 2S NEHALEM SMP RX PROTOCOL STACK PROCESSING OF UDP ABOVE IP STACK ........................................................ 32
FIGURE 36: 2S NEHALEM SMP IXGBE_LOW_LATENCY_RECV() DRIVER RX FLUSH PROCESSING .......... ERROR! BOOKMARK NOT DEFINED.
FIGURE 37: NEHALEM, 1 CPU, SMP SOCKET & UDP TX PROCESSING ABOVE IP STACK................................................................ 33
FIGURE 38: NEHALEM, 1 CPU, SMP UDP TX PROCESSING BELOW IP STACK ............................................................................. 34
FIGURE 39: NEHALEM, 1 CPU, SMP SOCKET & UDP RX PROCESSING ABOVE IP STACK ............................................................... 35
FIGURE 40: NEHALEM, 1 CPU, SMP IXGBE INTERRUPT DRIVEN UDP RX PROCESSING BELOW IP STACK ........................................ 36
FIGURE 41: NEHALEM, 1 CPU, SMP RX PROTOCOL STACK PROCESSING OF UDP ABOVE IP STACK ................................................. 37
FIGURE 42: NEHALEM, 1 CPU, SMP IXGBE_LOW_LATENCY_RECV() DRIVER RX FLUSH PROCESSING ... ERROR! BOOKMARK NOT DEFINED.
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 3
UDP Process Timing Test Information The tests used netperf to netserver UDP_RR data transfer to provide dataflow for collecting network
process timing data. Timing test data was collected on the side running netperf.
The timing data was collected by logging the current processor clock cycles of a moderate number of
locations using Linux kernel get_cycles() function then calculating time and averaging over 4096 packets,
then printing results and writing results to the kernel log. After the test the relevant kernel log data is
saved and put into a spread sheet to calculate the average over the whole test. Many tests with
measurements at different kernel locations with a complete kernel build and reboot were used to
collect the full timing data set.
System Information for Tests
Linux Information
The Linux release kernel 2.6.36.2 was used as the base kernel with and flush kernel change patch. An
IXGBE device driver patch to change the IXGBE device driver to the Intel low latency IXGBE test driver
version LL84P (Low Latency ixgbe-2.0.84.9P+LL-tx+tx-Fdir) for the base test kernel. Each test used the
base kernel with a test kernel patch to create the test kernel for each test.
For the test case of low latency polling disabled which is the base interrupt based test, the Low Latency
Receive Flush, Low Latency TX Flow Change and Low Latency TCP Receive Flush are disabled in the
Linux kernel configuration.
Network Interface Information
NIANTIC interface with interrupts affinitized on both sides.
Test for: Westmere
netperf Side
3.3 GHz Westmere 6 core, CPU Family 6, Model 44-X5680, Step 2 (B1 206C2) with 2 CPUs socketed,
12288KB cache, QPI 6.4 GT/s rate, Hyperthreading enabled.
1333 MHz memory bus, 12 GB memory (6 GB using 3 X 2GB memory modules per socket)
CPU Socket 0 core was used for test.
netserver Side: GC-NH5 test system
2.932 GHz Nehalem 4 core, CPU Family 6, Model 26 Nehalem B0/B1, Step 2 (106A2), with only 1 CPU
socketed. 8192 KB cache, QPI 5.866 GT/s rate, Hyperthreading enabled.
1067 MHz memory bus, 6 GB memory (6 GB using 3 X 2GB memory modules for socket 0).
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 4
Test for: DELL Westmere
System is a DELL PowerEdge 710 production server.
netperf Side
2.93 GHz Westmere 6 core, CPU Family 6, Model 44-X5670, Step 2 with 2 CPUs socketed, 12288KB
cache, Hyperthreading enabled.
12 GB memory installed.
No CPU Socket core was set for test.
netserver Side: GC-NH5 test system
2.932 GHz Nehalem 4 core, CPU Family 6, Model 26 Nehalem B0/B1, Step 2 (106A2), with only 1 CPU
socketed. 8192 KB cache, QPI 5.866 GT/s rate, Hyperthreading enabled.
1067 MHz memory bus, 6 GB memory (6 GB using 3 X 2GB memory modules for socket 0).
Test for: Single Core DELL Westmere
System is a DELL PowerEdge 710 production server.
System is same as DELL Westmere, but with SMP disabled in kernel build for single core, non-SMP timing
data. This test provides for comparing non-SMP network data timing with SMP data timing.
netperf Side
2.93 GHz Westmere 6 core, CPU Family 6, Model 44-X5670, Step 2 with 2 CPUs socketed, 12288KB
cache, single core non-SMP kernel build.
12 GB memory installed.
No CPU Socket core was set for test.
netserver Side: GC-NH5 test system
2.932 GHz Nehalem 4 core, CPU Family 6, Model 26 Nehalem B0/B1, Step 2 (106A2), with only 1 CPU
socketed. 8192 KB cache, QPI 5.866 GT/s rate, Hyperthreading enabled.
1067 MHz memory bus, 6 GB memory (6 GB using 3 X 2GB memory modules for socket 0).
Test for: 2S Nehalem
2S is for two CPU socketed in system. Data, data for same system with one CPU pulled out was collected
(but not included in the diagrams as of this time).
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 5
netperf Side
3.07 GHz Nehalem 4 core, CPU Family 6, Model 26, Step 2 with 2 CPUs socketed, 8192 KB cache,
Hyperthreading enabled.
Netperf was not affinitized to any CPU Socket.
netserver Side: GC-NH5 test system
2.932 GHz Nehalem 4 core, CPU Family 6, Model 26 Nehalem B0/B1, Step 2 (106A2), with only 1 CPU
socketed. 8192 KB cache, QPI 5.866 GT/s rate, Hyperthreading enabled.
1067 MHz memory bus, 6 GB memory (6 GB using 3 X 2GB memory modules for socket 0).
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 6
UDP Process Timing Range across Multiple Processors
Figure 1: Multiple Systems SMP Socket & UDP TX Processing above IP Stack
Application{}
sock_alloc_send_skb()
ip_local_out()
(Delivers to Protocol Stack)
udp_push_pending_frames ()
126-238
241-339
56-179
66-123
255-444 (size 1)
262-440 (size 1400)
25-52
373-519
44-193
sendto()
recvfrom()
ip_append_data()
ip_push_pending_frames ()
161-263 (size 1)
168-267 (size 1400)
123-269 (size 1)
382-830 (size 1400)
2648-4071 (size 1)
3028-4762 (size 1400)
24-62
58-84
1582-2271 (size 1)
1694-2517 (size 1400)
sock_sendmsg()
sock->ops->sendmsg()
udp_sendmsg()
__sock_sendmsg()
45-329
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 7
Figure 2: Multiple Systems SMP UDP TX Processing Below IP Stack
__dev_xmit_skb()
ixgbe_tx_queue()
ip_local_out()
(Delivers to Protocol Stack)
227-626
656-950
51-100
978-1312
54-103
33
dev_hard_start_xmit()
+dev()
dev_queue_xmit()
25-35
46-64
1582-2271 (size 1)
1694-2517 (size 1400)
sch_direct_xmit()
op->ndo_start_xmit()
ixgbe_xmit_frame()
33-72
33
152-247
33
24-31
49-430
24-346
34-46
45-329
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 8
Figure 3: Multiple Systems SMP Socket & UDP RX Processing above IP Stack
Application{}
sock->ops->recvmsg()
udp_recvmsg()
wait_for_packet()
44-138
24-29
24-89
255-444 (size 1)
262-440 (size 1400)
94-186
326-554
sendto()
recvfrom()
__skb_recv_datagram()
161-263 (size 1)
168-267 (size 1400)
145-239 (size 1)
403-788 (size 1400)
sock_recvmsg()
__sock_recvmsg()
__sock_recvmsg_nosec()
skb_free_datagram_locked()
De-Q Loop
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 9
Figure 4: Multiple Systems SMP IXGBE Interrupt Driven UDP RX Processing Below IP Stack
ixgbe_msix_clean_many()
ixgbe_clean_tx_irq()
net_rx_action()
(napi[] by n->poll())
napi_gro_receive()
__netif_receive_skb()
(Delivers to Protocol Stack)
Interrupt Based IXGBE 84 Network Packet Processing
159-251
76-111
98-142
174-210
3063-4204
(Async Wake)
1658-1911
(Async-Wake)
1312-1484 to
Dequeue
Note: RPS not shown
396-890
HW ISR SW ISR __do_softirq()
ixgbe_clean_rxtx_many()
ixgbe_clean_rx_irq()
ixgbe_receive_skb()
__napi_gro_receive()
dev_gro_receive()
napi_skb_finish()
netif_receive_skb()
441-849
42-78
132-158
24-27
32-46
27-51
24-27
45-52
41-55
575-687
68-101
24-38
RX Packet DeQed
122-161
337-739
1928-2285
(Async Wake) 24-36
ixgbe_alloc_rx_buff
ers ()
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 10
Figure 5: Multiple Systems SMP RX Protocol Stack Processing of UDP above IP Stack
__netif_receive_skb()
pt_prev->func()
Protocol Stack
Processing
__udp4_lib_rcv()
Queue SKB
In Socket Queue
393-576
24-39
60-134
134-221
56-78 (No-Wake)
714-895 (Async Wake)
24-56
101-130
24-27
udp_queue_rcv_skb()
ip_queue_rcv_skb()
sock_queue_rcv_skb()
sk->sk_data_ready()
(sock_def_readable())
__udp_queue_rcv_skb() skb_add_backlog()
or
wake_up_interruptible()
(Wake for Poll wait)
sk_wake_async()
(Wake for Asynchronous wait)
24-32
24-54
752-884 (No-Wake)
1658-1911 (Async Wake)
(for netif_receive_skb())
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 11
UDP Process Timing Range for Westmere Processor
Figure 6: Westmere SMP Socket & UDP TX Processing above IP Stack
Application{}
sock_alloc_send_skb()
ip_local_out()
(Delivers to Protocol Stack)
udp_push_pending_frames ()
151-174
256-336
57-179
100-108
291-347 (size 1)
289-343 (size 1400)
26-47
394-413
44-193
sendto()
recvfrom()
ip_append_data()
ip_push_pending_frames ()
172-251 (size 1)
178-251 (size 1400)
160-241 (size 1)
752-830 (size 1400)
3393-3723 (size 1)
4163-4461 (size 1400)
29-44
62-78
1996-2271 (size 1)
2370-2517 (size 1400)
sock_sendmsg()
sock->ops->sendmsg()
udp_sendmsg()
__sock_sendmsg()
70-88
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 12
Figure 7: Westmere SMP UDP TX Processing Below IP Stack
__dev_xmit_skb()
ixgbe_tx_queue()
ip_local_out()
(Delivers to Protocol Stack)
297-527
880-950
56-68
1224-1278
65-86
33
dev_hard_start_xmit()
+dev()
dev_queue_xmit()
25-35
52-64
1996-2271 (size 1)
2370-2517 (size 1400)
sch_direct_xmit()
op->ndo_start_xmit()
ixgbe_xmit_frame()
54-68
33
160-211
33
24-31
422-430
24-26
37-45
70-88
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 13
Figure 8: Westmere SMP Socket & UDP RX Processing above IP Stack
Application{}
sock->ops->recvmsg()
udp_recvmsg()
wait_for_packet()
50-105
25-85
291-347 (size 1)
289-343 (size 1400)
95-165
465-554
sendto()
recvfrom()
__skb_recv_datagram()
172-251 (size 1)
178-251 (size 1400)
181-206 (size 1)
774-788 (size 1400)
sock_recvmsg()
__sock_recvmsg()
__sock_recvmsg_nosec()
skb_free_datagram_locked()
De-Q Loop
24
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 14
Figure 9: Westmere SMP IXGBE Interrupt Driven UDP RX Processing Below IP Stack
ixgbe_msix_clean_many()
ixgbe_clean_tx_irq()
net_rx_action()
(napi[] by n->poll())
napi_gro_receive()
__netif_receive_skb()
(Delivers to Protocol Stack)
Interrupt Based IXGBE 84 Network Packet Processing
199-200
111
141-142
190-201
4131-4204
(Async Wake)
1818-1828
(Async-Wake)
1443-1484 to
Dequeue
Note: RPS not shown
885-890
HW ISR SW ISR __do_softirq()
ixgbe_clean_rxtx_many()
ixgbe_clean_rx_irq()
ixgbe_receive_skb()
__napi_gro_receive()
dev_gro_receive()
napi_skb_finish()
netif_receive_skb()
667-717
42-53
132-134
24
39
29
24
50-51
55
622-670
78-80
24
RX Packet DeQed
156-161
530-541
2200-2285
(Async Wake) 24
ixgbe_alloc_rx_buff
ers ()
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 15
Figure 10: Westmere SMP RX Protocol Stack Processing of UDP above IP Stack
__netif_receive_skb()
pt_prev->func()
Protocol Stack
Processing
__udp4_lib_rcv()
Queue SKB
In Socket Queue
464-576
25-34
65-134
170-197
57 (No-Wake)
891-895 (Async Wake)
26-32
109-113
24
udp_queue_rcv_skb()
ip_queue_rcv_skb()
sock_queue_rcv_skb()
sk->sk_data_ready()
(sock_def_readable())
__udp_queue_rcv_skb() skb_add_backlog()
or
wake_up_interruptible()
(Wake for Poll wait)
sk_wake_async()
(Wake for Asynchronous wait)
24-31
49
809-819 (No-Wake)
1818-1828 (Async Wake)
(for netif_receive_skb())
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 16
UDP Process Timing for Non-SMP Single Core Westmere CPU
Figure 11: Single Core Dell Westmere Socket & UDP TX Processing above IP Stack
Application{}
sock_alloc_send_skb()
ip_local_out()
(Delivers to Protocol Stack)
udp_push_pending_frames ()
114-120
228-363
54-71
97-101
239-301 (size 1)
240-292 (size 1400)
119-236
326-357
49-261
sendto()
recvfrom()
ip_append_data()
ip_push_pending_frames ()
163-251 (size 1)
159-251 (size 1400)
161-177 (size 1)
653-671 (size 1400)
2057-2379 (size 1)
2771-3087 (size 1400)
37-230
69-85
1020-1315 (size 1)
1081-1206 (size 1400)
sock_sendmsg()
sock->ops->sendmsg()
udp_sendmsg()
__sock_sendmsg()
58-68
Note: Swap in Delay
Across Data Sets
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 17
Figure 12: Single Core Dell Westmere UDP TX Processing Below IP Stack
__dev_xmit_skb()
ixgbe_tx_queue()
ip_local_out()
(Delivers to Protocol Stack)
249-354
240-335
94-139
514-632
36-57
33
dev_hard_start_xmit()
+dev()
dev_queue_xmit()
24-27
37-45
1020-1315 (size 1)
1081-1206 (size 1400)
sch_direct_xmit()
op->ndo_start_xmit()
ixgbe_xmit_frame()
35-41
33
142-187
33
24-27
36-39
24-26
37-42
58-68
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 18
Figure 13: Single Core Dell Westmere Socket & UDP RX Processing above IP Stack
Application{}
sock->ops->recvmsg()
udp_recvmsg()
wait_for_packet()
57-105
24
37-79
239-301 (size 1)
240-292 (size 1400)
93-170
266-291
sendto()
recvfrom()
__skb_recv_datagram()
163-251 (size 1)
159-251 (size 1400)
210-829 (size 1)
729-1405 (size 1400)
76-81
sock_recvmsg()
__sock_recvmsg()
__sock_recvmsg_nosec()
skb_free_datagram_locked()
De-Q Loop
88
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 19
Note: Non-SMP Interrupt Processing different interrupt and NAPI subroutines then SMP
processing.
Figure 14: Single Core Dell Westmere IXGBE Interrupt Driven UDP RX Processing Below IP
Stack
ixgbe_intr()
ixgbe_clean_tx_irq()
ixgbe_alloc_rx_buff
ers ()
net_rx_action()
(napi[] by n->poll())
napi_gro_receive()
__netif_receive_skb()
(Delivers to Protocol Stack)
Interrupt Based IXGBE 84 Network Packet Processing
406
2078
2610
(Async Wake)
1189-1216
(Async-Wake)
1481-1720 to
Dequeue
Note: RPS not shown
471-472
HW ISR SW ISR __do_softirq()
Ixgbe_poll()
ixgbe_clean_rx_irq()
ixgbe_receive_skb()
__napi_gro_receive()
dev_gro_receive()
napi_skb_finish()
netif_receive_skb()
469-470
33-35
137-148
46-48
45
26-29
24
39
48-49
732-733
86-87
24
RX Packet DeQed
86-87
391-393
1668-1669
(Async Wake) 24
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 20
Figure 15: Single Core Dell Westmere RX Protocol Stack Processing of UDP above IP Stack
__netif_receive_skb()
pt_prev->func()
Protocol Stack
Processing
__udp4_lib_rcv()
Queue SKB
In Socket Queue
433-463
25
51-68
165-187
24 (No-Wake)
428-444 (Async Wake)
24-34
62-68
24
udp_queue_rcv_skb()
ip_queue_rcv_skb()
sock_queue_rcv_skb()
sk->sk_data_ready()
(sock_def_readable())
__udp_queue_rcv_skb() skb_add_backlog()
or
wake_up_interruptible()
(Wake for Poll wait)
sk_wake_async()
(Wake for Asynchronous wait)
24-27
24-25
753-769 (No-Wake)
1189-1216 (Async Wake)
(for netif_receive_skb())
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 21
Figure 16: Single Core Dell Westmere ixgbe_low_latency_recv() Driver RX Flush Processing
netif_rx_ni()
netif_ni() __do_softirq()
net_rx_action()
(backlog in napi[] list,
by n->poll() call)
process_backlog()
__netif_receive_skb()
(Delivers to Protocol Stack)
49-50
94-97
75-82
71
30-35
846-853
(No-Wake)
753-769
(No-Wake)
75-80
48-55
40
Ixgbe_low_latency_recv()
Packet Detected
ixgbe_alloc_rx_buffer()
Ownership Released
62-123
25
300-304
24 402-410
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 22
UDP Process Timing Range for Dell Production Westmere
Figure 17: Dell Westmere SMP Socket & UDP TX Processing above IP Stack
Application{}
sock_alloc_send_skb()
ip_local_out()
(Delivers to Protocol Stack)
udp_push_pending_frames ()
126-172
285-328
62-138
82-100
255-337 (size 1)
262-333 (size 1400)
27-52
396-519
103-180
sendto()
recvfrom()
ip_append_data()
ip_push_pending_frames ()
161-240 (size 1)
168-241 (size 1400)
175-238 (size 1)
693-714 (size 1400)
3187-4071 (size 1)
3895-4762 (size 1400)
28-39
58-75
1986-2194 (size 1)
2130-2261 (size 1400)
sock_sendmsg()
sock->ops->sendmsg()
udp_sendmsg()
__sock_sendmsg()
61-121
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 23
Figure 18: Dell Westmere SMP UDP TX Processing Below IP Stack
__dev_xmit_skb()
ixgbe_tx_queue()
ip_local_out()
(Delivers to Protocol Stack)
269-626
812-950
53-60
1145-1268
54-103
33
dev_hard_start_xmit()
+dev()
dev_queue_xmit()
25-28
56-63
1986-2194 (size 1)
2130-2261 (size 1400)
sch_direct_xmit()
op->ndo_start_xmit()
ixgbe_xmit_frame()
50-56
33
152-242
33
24
393-424
24
37-46
61-121
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 24
Figure 19: Dell Westmere SMP Socket & UDP RX Processing above IP Stack
Application{}
sock->ops->recvmsg()
udp_recvmsg()
wait_for_packet()
75-88
24
42-75
255-337 (size 1)
262-333 (size 1400)
100-186
364-497
sendto()
recvfrom()
__skb_recv_datagram()
161-240 (size 1)
168-241 (size 1400)
158-221 (size 1)
685-741 (size 1400)
801-827
sock_recvmsg()
__sock_recvmsg()
__sock_recvmsg_nosec()
skb_free_datagram_locked()
De-Q Loop
108
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 25
Figure 20: Dell Westmere SMP IXGBE Interrupt Driven UDP RX Processing Below IP Stack
ixgbe_msix_clean_many()
ixgbe_clean_tx_irq()
net_rx_action()
(napi[] by n->poll())
napi_gro_receive()
__netif_receive_skb()
(Delivers to Protocol Stack)
Interrupt Based IXGBE 84 Network Packet Processing
251
101-106
101-107
177-196
3439-3544
(Async Wake)
1844-1911
(Async-Wake)
1325-1331 to
Dequeue
Note: RPS not shown
533-543
HW ISR SW ISR __do_softirq()
ixgbe_clean_rxtx_many()
ixgbe_clean_rx_irq()
ixgbe_receive_skb()
__napi_gro_receive()
dev_gro_receive()
napi_skb_finish()
netif_receive_skb()
610-651
55-56
153-158
24-25
44
46-51
26-27
45-52
41-42
592-635
79-81
33-38
RX Packet DeQed
122-123
720-739
2195-2270
(Async Wake) 33-36
ixgbe_alloc_rx_buff
ers ()
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 26
Figure 21: Dell Westmere SMP RX Protocol Stack Processing of UDP above IP Stack
__netif_receive_skb()
pt_prev->func()
Protocol Stack
Processing
__udp4_lib_rcv()
Queue SKB
In Socket Queue
397-560
24-25
60-110
156-208
57 (No-Wake)
838-874 (Async Wake)
874-884 (No-Wake)
24-39
112-115
24-26
udp_queue_rcv_skb()
ip_queue_rcv_skb()
sock_queue_rcv_skb()
sk->sk_data_ready()
(sock_def_readable())
__udp_queue_rcv_skb() skb_add_backlog()
or
wake_up_interruptible()
(Wake for Poll wait)
sk_wake_async()
(Wake for Asynchronous wait)
24-26
48-50
1844-1911 (Async Wake)
(for netif_receive_skb())
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 27
Figure 22: Dell Westmere SMP ixgbe_low_latency_recv() Driver RX Flush Processing
netif_rx_ni()
netif_ni() __do_softirq()
net_rx_action()
(backlog in napi[] list,
by n->poll() call)
process_backlog()
__netif_receive_skb()
(Delivers to Protocol Stack)
39-47
157-171
91-96
91-92
18-22
958-863
(No-Wake)
874-884
(No-Wake)
70-75
65-70
40
Ixgbe_low_latency_recv()
Packet Detected
ixgbe_alloc_rx_buffer()
Ownership Released
29-32
415-417
293-384
24 402-428
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 28
UDP Process Timing Range for Nehalem
Figure 23: 2S Nehalem SMP Socket & UDP TX Processing above IP Stack
Application{}
sock_alloc_send_skb()
ip_local_out()
(Delivers to Protocol Stack)
udp_push_pending_frames ()
153-238
246-339
57-143
97-123
295-329 (size 1)
295-329 (size 1400)
25-48
373-414
102-115
sendto()
recvfrom()
ip_append_data()
ip_push_pending_frames ()
208-222 (size 1)
206-221 (size 1400)
176-269 (size 1)
701-815 (size 1400)
3210-3500 (size 1)
4028-4249 (size 1400)
24-62
74-84
2010-2097 (size 1)
2115-2287 (size 1400)
sock_sendmsg()
sock->ops->sendmsg()
udp_sendmsg()
__sock_sendmsg()
259-329
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 29
Figure 24: 2S Nehalem SMP UDP TX Processing Below IP Stack
__dev_xmit_skb()
ixgbe_tx_queue()
ip_local_out()
(Delivers to Protocol Stack)
248-498
878-914
57-69
1125-1294
63-89
33
dev_hard_start_xmit()
+dev()
dev_queue_xmit()
25-27
51-53
2010-2097 (size 1)
2115-2287 (size 1400)
sch_direct_xmit()
op->ndo_start_xmit()
ixgbe_xmit_frame()
33-68
33
158-198
33
24
51-93
295-346
34-46
259-329
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 30
Figure 25: 2S Nehalem SMP Socket & UDP RX Processing above IP Stack
Application{}
sock->ops->recvmsg()
udp_recvmsg()
wait_for_packet()
53-126
24
26-80
295-329 (size 1)
295-329 (size 1400)
98-159
415-464
sendto()
recvfrom()
__skb_recv_datagram()
208-222 (size 1)
206-221 (size 1400)
193-205 (size 1)
460-744 (size 1400)
794-810
sock_recvmsg()
__sock_recvmsg()
__sock_recvmsg_nosec()
skb_free_datagram_locked()
De-Q Loop
106-107
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 31
Figure 26: 2S Nehalem SMP IXGBE Interrupt Driven UDP RX Processing Below IP Stack
ixgbe_msix_clean_many()
ixgbe_clean_tx_irq()
net_rx_action()
(napi[] by n->poll())
napi_gro_receive()
__netif_receive_skb()
(Delivers to Protocol Stack)
Interrupt Based IXGBE 84 Network Packet Processing
199-201
95-96
132
174-177
3492-3623
(Async Wake)
1392-1400 to
Dequeue
Note: RPS not shown
808-810
HW ISR SW ISR __do_softirq()
ixgbe_clean_rxtx_many()
ixgbe_clean_rx_irq()
ixgbe_receive_skb()
__napi_gro_receive()
dev_gro_receive()
napi_skb_finish()
netif_receive_skb()
762-849
43-50
143-144
24
41
27-28
24
47-48
49-50
676-679
68-69
24
RX Packet DeQed
139
536-551
2200-2278
(Async Wake) 24
1748-1780
(Async-Wake)
ixgbe_alloc_rx_buff
ers ()
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 32
Figure 27: 2S Nehalem SMP RX Protocol Stack Processing of UDP above IP Stack
__netif_receive_skb()
pt_prev->func()
Protocol Stack
Processing
__udp4_lib_rcv()
Queue SKB
In Socket Queue
472-561
25-33
72-121
162-201
56-57 (No-Wake)
817-830 (Async Wake)
36-56
102-108
24
udp_queue_rcv_skb()
ip_queue_rcv_skb()
sock_queue_rcv_skb()
sk->sk_data_ready()
(sock_def_readable())
__udp_queue_rcv_skb() skb_add_backlog()
or
wake_up_interruptible()
(Wake for Poll wait)
sk_wake_async()
(Wake for Asynchronous wait)
24-32
24-25
849-863 (No-Wake)
1748-1780 (Async Wake)
(for netif_receive_skb())
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 33
UDP Process Timing Range for Nehalem with 1 Socketed CPU
Figure 28: Nehalem, 1 CPU, SMP Socket & UDP TX Processing above IP Stack
Application{}
sock_alloc_send_skb()
ip_local_out()
(Delivers to Protocol Stack)
udp_push_pending_frames ()
141-152
276-339
56-114
101-117
289-345 (size 1)
284-350 (size 1400)
28-45
381-392
111-120
sendto()
recvfrom()
ip_append_data()
ip_push_pending_frames ()
176-249 (size 1)
176-243 (size 1400)
179-257 (size 1)
543-627 (size 1400)
3150-3498 (size 1)
3752-4137 (size 1400)
24-51
71-78
1970-2067 (size 1)
1994-2102 (size 1400)
sock_sendmsg()
sock->ops->sendmsg()
udp_sendmsg()
__sock_sendmsg()
273-303
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 34
Figure 29: Nehalem, 1 CPU, SMP UDP TX Processing Below IP Stack
__dev_xmit_skb()
ixgbe_tx_queue()
ip_local_out()
(Delivers to Protocol Stack)
288-419
884-926
60-76
1143-1312
62-81
33
dev_hard_start_xmit()
+dev()
dev_queue_xmit()
25-27
51-63
1970-2067 (size 1)
1994-2102 (size 1400)
sch_direct_xmit()
op->ndo_start_xmit()
ixgbe_xmit_frame()
33-65
33
160-191
33
24-25
49-84
296-343
35-42
273-303
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 35
Figure 30: Nehalem, 1 CPU, SMP Socket & UDP RX Processing above IP Stack
Application{}
sock->ops->recvmsg()
udp_recvmsg()
wait_for_packet()
47-138
24
24-78
289-345 (size 1)
284-350 (size 1400)
94-162
374-383
sendto()
recvfrom()
__skb_recv_datagram()
176-249 (size 1)
176-243 (size 1400)
197-239 (size 1)
463-765 (size 1400)
816
sock_recvmsg()
__sock_recvmsg()
__sock_recvmsg_nosec()
skb_free_datagram_locked()
De-Q Loop
106
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 36
Figure 31: Nehalem, 1 CPU, SMP IXGBE Interrupt Driven UDP RX Processing Below IP Stack
ixgbe_msix_clean_many()
ixgbe_clean_tx_irq()
net_rx_action()
(napi[] by n->poll())
napi_gro_receive()
__netif_receive_skb()
(Delivers to Protocol Stack)
Interrupt Based IXGBE 84 Network Packet Processing
202-203
98
136-137
178-181
3465-3471
(Async Wake)
1336-1367 to
Dequeue
Note: RPS not shown
841-853
HW ISR SW ISR __do_softirq()
ixgbe_clean_rxtx_many()
ixgbe_clean_rx_irq()
ixgbe_receive_skb()
__napi_gro_receive()
dev_gro_receive()
napi_skb_finish()
netif_receive_skb()
732-765
69-78
136-137
24
45-46
27
24-25
47-50
49-52
648-687
72-75
24
RX Packet DeQed
145-147
587-604
2235-2254
(Async Wake) 24
1737-1759
(Async-Wake)
ixgbe_alloc_rx_buff
ers ()
UDP Flow Charts for linux-2.6.36.2 and different CPUs
Page 37
Figure 32: Nehalem, 1 CPU, SMP RX Protocol Stack Processing of UDP above IP Stack
__netif_receive_skb()
pt_prev->func()
Protocol Stack
Processing
__udp4_lib_rcv()
Queue SKB
In Socket Queue
393-563
25-33
70-134
134-193
56-57 (No-Wake)
826-848 (Async Wake)
37-52
101-103
24-26
udp_queue_rcv_skb()
ip_queue_rcv_skb()
sock_queue_rcv_skb()
sk->sk_data_ready()
(sock_def_readable())
__udp_queue_rcv_skb() skb_add_backlog()
or
wake_up_interruptible()
(Wake for Poll wait)
sk_wake_async()
(Wake for Asynchronous wait)
24-30
24-25
752-766 (No-Wake)
1737-1759 (Async Wake)
(for netif_receive_skb())