
TIBCO, HP and Mellanox High Performance Extreme Low Latency Messaging

Executive Summary:

With the recent release of TIBCO FTL(TM), TIBCO is once again changing the game in high performance messaging middleware. Many solutions have emerged that try to provide next-generation systems with extreme low latency, but they do so by sacrificing the traditional features and functions that mission-critical middleware solutions require. TIBCO's approach is to offer a middleware solution with extreme low latency without sacrifice, scaling not only to meet the demands of low latency data distribution but also to meet the demands of an application as it grows from a few instances to thousands of instances.

In this report, produced with the assistance of HP and Mellanox/Voltaire, TIBCO provides benchmarks using TIBCO FTL 1.0 across a number of physical transports to show the average latency for each transport. In addition, TIBCO shows how message size has varying impact on the latency metrics depending on which transport is being used. The goal of this report is not to cover every use case, as other benchmarks provide those types of reports. For this report TIBCO, HP and Mellanox/Voltaire wanted to show the performance benefits of the end-to-end solution and give a general overview of how infrastructure and data distribution decisions can impact overall latency.

The final results of these tests show that in all categories, using TIBCO FTL, HP DL380 G7 systems and network equipment from Mellanox/Voltaire, customers can get the lowest latency data distribution. For more information about testing methodology and system configuration please contact TIBCO Software at http://forms2.tibco.com/contactus_us


Test Setup:

The purpose of these tests is to show the relative latency performance of a given architectural setup and to compare latency for each distribution transport across a number of message sizes. For these tests TIBCO, HP and Voltaire used a simple setup of two DL380 G7 servers, each with two Intel Xeon X5687 processors (4 cores each) and at least 48 GB of memory per machine operating at 1333 MHz. The DL380s ran RHEL 5.5 and OFED 1.5.2, with BIOS release date 01/30/2011 and iLO firmware version 1.16.

Figure 1 (System Setup)

Providing the network layer connectivity, the two DL380 G7 systems were connected by a Voltaire 10 Gigabit Ethernet switch (Vantage 6024) and by a Voltaire QDR InfiniBand switch (Voltaire 4036).

NIC interfaces: Mellanox ConnectX-2 InfiniBand/10GbE PCIe adapters, Mellanox Technologies MT26428 (firmware 2.8.000) and Mellanox Technologies MT26448 (firmware 2.8.000). The 10GbE Mellanox/Voltaire switch is model "Vantage 6024."


BIOS Settings:

BIOS parameter                              Value
Intel_Hyperthreading                        Disabled
HP_Power_Regulator                          HP_Static_High_Performance_Mode
CPU_Virtualization                          Disabled
Intel_Processor_Turbo_Mode                  Disabled
Intel_VT-d2                                 Disabled
HP_Power_Profile                            Maximum_Performance
Intel_QPI_Link_Power_Management             Disabled
Intel_Minimum_Processor_Idle_Power_State    No_C-States
Collaborative_Power_Control                 Disabled
Intel_Turbo_Boost_Optimization              Optimized_for_Performance
PowerMonitoring                             Disabled
DisableMemoryPrefailureNotification         Yes

All tests were conducted using the sample latency tools provided with TIBCO FTL 1.0; the two sample programs used were the C implementations of tibping and tibpong. The Shared Memory tests were all run on the DL380 G7 system that had 96 GB of memory. All tests using TCP and Reliable Multicast were run over the 10 Gigabit Ethernet switch. RDMA tests were conducted using both the 10 Gigabit Ethernet switch and the QDR InfiniBand switch. The two DL380 G7 systems used Mellanox ConnectX-2 10 Gigabit Ethernet/InfiniBand adapters for interconnectivity.
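The latency figures in the following tables come from ping-pong round trips: tibping sends a request, tibpong echoes it back, and one-way latency is the total elapsed time divided by twice the number of exchanges. The report does not state the exchange counts; as a sketch, an assumed count of 200 million exchanges reproduces the shared-memory figures to within rounding:

```python
def one_way_latency_ns(total_time_s: float, exchanges: int) -> float:
    """Ping-pong measurement: each exchange crosses the transport twice."""
    return total_time_s / (2 * exchanges) * 1e9

# Assumed exchange count (not stated in the report); with 200 million
# exchanges the published 142.52 s total yields the published 356.3 ns.
print(round(one_way_latency_ns(142.52, 200_000_000), 1))  # 356.3
```

The per-run exchange count evidently differs by transport (the TCP totals imply a much smaller count), so the constant above is illustrative only.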


Test Results: Below are the individual results for each unique transport. Transports tested were Shared Memory, RDMA over InfiniBand, RDMA over 10 Gigabit Ethernet, TCP over 10 Gigabit Ethernet and Reliable Multicast over 10 Gigabit Ethernet. These tests did not use technology such as Voltaire VMA kernel bypass for transports like TCP and Reliable Multicast, as we wanted to show the raw performance of TIBCO FTL operating on the native transport without such assistance.

Shared Memory Transport:

Variable Message Size Latency for Shared Memory Transport

Message Size  Test  Total Time  Avg. Total Time  One-Way Latency  Avg. One-Way Latency
(Bytes)       No.   (Seconds)   (Seconds)        (Nanoseconds)    (Nanoseconds)
16            1     142.52                       356.30
              2     142.74      142.64           356.86           356.60
              3     142.66                       356.65
32            1     142.87                       357.18
              2     142.49      143.13           356.21           357.83
              3     144.03                       360.09
64            1     146.38                       365.96
              2     144.95      145.48           362.38           363.70
              3     145.10                       362.76
128           1     153.25                       383.14
              2     152.64      153.44           381.61           383.62
              3     154.44                       386.11
256           1     162.78                       406.94
              2     161.91      162.41           404.78           406.02
              3     162.54                       406.35
512           1     182.31                       455.77
              2     182.68      182.37           456.69           455.91
              3     182.11                       455.27
1024          1     230.06                       575.15
              2     229.83      229.92           574.57           574.81
              3     229.88                       574.70


The Shared Memory Transport for TIBCO FTL allows extremely high performance, ultra low latency message distribution for components operating in a single host environment. With TIBCO FTL's multi-transport send functionality, these components can send a message once and have the message delivered to local components via shared memory and to distributed components via a network transport like RDMA, TCP or Reliable UDP.
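FTL's shared-memory implementation is proprietary and not described in the report. As an illustrative analogue only, the sketch below measures a ping-pong over one shared byte between two Python processes; the flag protocol and all names here are my own, and the absolute numbers will be far higher than FTL's tuned C path:

```python
import time
from multiprocessing import Process, Value

def responder(flag, n: int) -> None:
    """Echo side: wait for each ping (flag 1) and answer with a pong (flag 2)."""
    for _ in range(n):
        while flag.value != 1:   # spin until the ping arrives
            pass
        flag.value = 2           # pong

def measure(n: int = 10_000) -> float:
    """Average one-way latency in nanoseconds over n shared-memory exchanges."""
    flag = Value("b", 0, lock=False)  # one shared byte, no lock: plain memory ops
    proc = Process(target=responder, args=(flag, n))
    proc.start()
    start = time.perf_counter()
    for _ in range(n):
        flag.value = 1           # ping
        while flag.value != 2:   # spin until the pong arrives
            pass
    elapsed = time.perf_counter() - start
    proc.join()
    return elapsed / (2 * n) * 1e9

if __name__ == "__main__":
    print(f"average one-way latency: {measure():.0f} ns")
```

The busy-wait loops mirror how low-latency transports avoid sleeping in the kernel at the cost of burning a core per endpoint.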

[Figure: Latency for Shared Memory Transport with Variable Message Size. Y-axis: latency in nanoseconds (0-750); X-axis: message size in bytes (16-1024).]


RDMA Transport over InfiniBand:

Variable Message Size Latency for RDMA Transport over InfiniBand

Message Size  Test  Total Time  Avg. Total Time  One-Way Latency  Avg. One-Way Latency
(Bytes)       No.   (Seconds)   (Seconds)        (Microseconds)   (Microseconds)
16            1     130.31                       2.17
              2     129.52      130.07           2.16             2.17
              3     130.38                       2.17
32            1     194.74                       3.25
              2     195.35      194.90           3.26             3.25
              3     194.60                       3.24
64            1     201.47                       3.36
              2     201.01      201.32           3.35             3.36
              3     201.48                       3.36
128           1     206.47                       3.44
              2     206.53      206.45           3.44             3.44
              3     206.35                       3.44
256           1     215.29                       3.59
              2     214.99      215.19           3.58             3.59
              3     215.28                       3.59
512           1     232.91                       3.88
              2     232.82      232.80           3.88             3.88
              3     232.67                       3.88
1024          1     266.08                       4.43
              2     266.36      266.18           4.44             4.44
              3     266.11                       4.44


For network infrastructures that support RDMA, RDMA provides the lowest latency data distribution of any of the network transports available. Some latency gains can be had by using InfiniBand instead of 10 Gigabit Ethernet; a comparison between these two physical distribution layers is available later in this document.

[Figure: Latency for RDMA (InfiniBand) Transport with Variable Message Size. Y-axis: latency in nanoseconds (0-5625); X-axis: message size in bytes (16-1024).]


RDMA Transport over 10 Gigabit Ethernet (RoCE):

Variable Message Size Latency for RDMA Transport over 10 Gigabit Ethernet

Message Size  Test  Total Time  Avg. Total Time  One-Way Latency  Avg. One-Way Latency
(Bytes)       No.   (Seconds)   (Seconds)        (Microseconds)   (Microseconds)
16            1     170.30                       2.84
              2     170.73      170.53           2.85             2.84
              3     170.55                       2.84
32            1     234.50                       3.91
              2     235.23      234.75           3.92             3.91
              3     234.53                       3.91
64            1     242.99                       4.05
              2     243.17      243.18           4.05             4.05
              3     243.38                       4.06
128           1     251.39                       4.19
              2     251.33      251.32           4.19             4.19
              3     251.24                       4.19
256           1     264.42                       4.41
              2     265.36      264.96           4.42             4.42
              3     265.09                       4.42
512           1     291.58                       4.86
              2     290.83      291.08           4.85             4.85
              3     290.83                       4.85
1024          1     358.26                       5.97
              2     358.09      358.25           5.97             5.97
              3     358.40                       5.97


While InfiniBand provides some minor (~1 microsecond) latency gains over 10 Gigabit Ethernet, InfiniBand is not as pervasive as Ethernet-based deployments. Because of this, Mellanox's 10 Gigabit Ethernet support using RoCE allows for all the benefits of RDMA over an existing 10 Gigabit infrastructure.

[Figure: Latency for RDMA (10 GigE) Transport with Variable Message Size. Y-axis: latency in nanoseconds (0-7500); X-axis: message size in bytes (16-1024).]


TCP Transport over 10 Gigabit Ethernet:

Variable Message Size Latency for TCP Transport over 10 Gigabit Ethernet

Message Size  Test  Total Time  Avg. Total Time  One-Way Latency  Avg. One-Way Latency
(Bytes)       No.   (Seconds)   (Seconds)        (Microseconds)   (Microseconds)
16            1     119.22                       9.94
              2     119.73      119.40           9.98             9.95
              3     119.24                       9.94
32            1     119.90                       9.99
              2     119.07      119.73           9.92             9.98
              3     120.23                       10.02
64            1     121.55                       10.13
              2     121.27      121.71           10.11            10.14
              3     122.31                       10.19
128           1     122.99                       10.25
              2     124.06      124.03           10.34            10.34
              3     125.04                       10.42
256           1     130.92                       10.91
              2     130.94      130.77           10.91            10.90
              3     130.45                       10.87
512           1     138.54                       11.55
              2     138.59      138.51           11.55            11.54
              3     138.40                       11.53
1024          1     149.88                       12.49
              2     150.45      149.83           12.54            12.49
              3     149.17                       12.43


If latency is a significant priority, RDMA over either 10 Gigabit Ethernet or InfiniBand is clearly the superior choice for data distribution. However, many applications still need to distribute data to application endpoints that either don't have RDMA support or don't require the extreme low latency that RDMA can provide. For these applications, TIBCO FTL's TCP transport can provide low latency distribution without requiring support for new networking paradigms.
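The report does not describe FTL's socket configuration, but a common tuning step for latency-sensitive TCP messaging is disabling Nagle's algorithm, so small messages go on the wire immediately instead of being coalesced with later writes. A minimal sketch:

```python
import socket

# Illustrative tuning only; not taken from the report's methodology.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # disable Nagle

# A nonzero value confirms the option took effect.
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)  # True
sock.close()
```

Without this, the kernel may hold back sub-MSS writes waiting for an ACK, adding delay that would dwarf the ~10 microsecond figures in the table above.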

[Figure: Latency for TCP (10 GigE) Transport with Variable Message Size. Y-axis: latency in nanoseconds (5000-15000); X-axis: message size in bytes (16-1024).]


Reliable UDP Transport over 10 Gigabit Ethernet:

Variable Message Size Latency for Reliable Multicast Transport over 10 Gigabit Ethernet

Message Size  Test  Total Time  Avg. Total Time  One-Way Latency  Avg. One-Way Latency
(Bytes)       No.   (Seconds)   (Seconds)        (Microseconds)   (Microseconds)
16            1     119.08                       10.83
              2     118.94      118.79           10.81            10.80
              3     118.35                       10.76
32            1     120.68                       10.97
              2     120.31      120.27           10.94            10.93
              3     119.81                       10.89
64            1     119.80                       10.89
              2     119.90      119.46           10.90            10.86
              3     118.68                       10.79
128           1     121.28                       11.03
              2     121.73      121.47           11.07            11.05
              3     121.39                       11.04
256           1     121.03                       11.00
              2     121.27      120.96           11.02            10.99
              3     120.58                       10.96
512           1     127.88                       11.63
              2     126.65      127.08           11.51            11.55
              3     126.70                       11.52
1024          1     137.70                       12.52
              2     137.36      137.28           12.49            12.48
              3     136.78                       12.43


Even though the race to extremely low latency is encouraging the adoption of new network distribution technology, there is still a requirement for an extremely scalable, low latency distribution pattern for high-fanout situations. TIBCO FTL's Reliable UDP transport serves applications that require high speed message distribution to multiple nodes within the infrastructure.
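The fan-out property comes from IP multicast itself: the sender transmits one datagram per message regardless of how many receivers have joined the group. A minimal sender-side sketch (group and port are hypothetical, not from the report):

```python
import socket

GROUP, PORT = "239.192.0.1", 30001  # hypothetical multicast group and port

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)   # stay on the local segment
sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)  # deliver to local joiners too
# sock.sendto(b"tick", (GROUP, PORT))  # one send reaches every group subscriber
sock.close()
```

Raw UDP multicast is unreliable; FTL's "Reliable Multicast" layers sequence tracking and retransmission on top, which plain sockets like the one above do not provide.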

[Figure: Latency for Reliable Multicast (10 GigE) Transport with Variable Message Size. Y-axis: latency in nanoseconds (5000-15000); X-axis: message size in bytes (16-1024).]


Transport Comparisons: TIBCO FTL provides the flexibility to dynamically change which transport the application uses without requiring any code changes. This gives the application administrator a simplified model for adopting new data distribution paradigms as they are introduced into the application environment. Because of this flexibility, it becomes necessary to evaluate what benefit a given transport has over another. In addition to the individual transport test results reported above, a number of comparisons can be made. Below are comparisons for RDMA over InfiniBand versus 10 Gigabit Ethernet, for TCP versus Reliable Multicast, and finally a chart showing the latency for each transport.

RDMA over InfiniBand and 10 Gigabit Ethernet with Variable Message Size

Message Size (Bytes)              16    32    64    128   256   512   1024
InfiniBand Latency (Nanoseconds)  2170  3250  3360  3440  3590  3880  4440
10 GigE Latency (Nanoseconds)     2840  3910  4050  4190  4420  4850  5970
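The per-size gap between the two fabrics can be read directly off the table; a quick check of the averaged difference backs up the "~1 microsecond" characterization used elsewhere in this report:

```python
ib    = [2170, 3250, 3360, 3440, 3590, 3880, 4440]  # ns, InfiniBand row above
ten_g = [2840, 3910, 4050, 4190, 4420, 4850, 5970]  # ns, 10 GigE row above

# InfiniBand's advantage over 10 GigE RDMA at each message size, in nanoseconds.
deltas = [t - i for i, t in zip(ib, ten_g)]
print(deltas)                            # [670, 660, 690, 750, 830, 970, 1530]
print(round(sum(deltas) / len(deltas)))  # 871
```

The gap widens with message size, from roughly 0.7 microseconds at 16 bytes to about 1.5 microseconds at 1024 bytes.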

[Figure: RDMA Transport, InfiniBand versus 10 GigE. Y-axis: latency in nanoseconds (0-7500); X-axis: message size in bytes (16-1024).]


TCP versus Reliable Multicast with Variable Message Size

Message Size (Bytes)             16     32     64     128    256    512    1024
TCP Latency (Nanoseconds)        9950   9980   10140  10340  10900  11540  12490
Multicast Latency (Nanoseconds)  10800  10930  10860  11050  10990  11550  12480


[Figure: TCP Transport versus Reliable Multicast over 10 GigE. Y-axis: latency in nanoseconds (0-15000); X-axis: message size in bytes (16-1024).]


Latency Comparison between All Transports with Variable Message Size

Message Size (Bytes)                 16     32     64     128    256    512    1024
Shared Memory Latency (Nanoseconds)  357    358    364    384    406    456    575
InfiniBand Latency (Nanoseconds)     2170   3250   3360   3440   3590   3880   4440
10 GigE Latency (Nanoseconds)        2840   3910   4050   4190   4420   4850   5970
TCP Latency (Nanoseconds)            9950   9980   10140  10340  10900  11540  12490
Multicast Latency (Nanoseconds)      10800  10930  10860  11050  10990  11550  12480

[Figure: Latency comparison between all transports (Shared Memory, RDMA over InfiniBand, RDMA over 10 GigE, TCP) with variable message size. Y-axis: latency in nanoseconds (0-15000); X-axis: message size in bytes (16-1024).]


Conclusions:

The Shared Memory Transport for TIBCO FTL allows extremely high performance, ultra low latency message distribution for components operating in a single host environment. The average one-way latency for the shared memory transport is 356.60 nanoseconds for a 16-byte message.

For network infrastructures that support RDMA, RDMA provides the lowest latency data distribution of any of the network transports available, with InfiniBand providing some minor (~1 microsecond) latency gains over 10 Gigabit Ethernet.

If latency is a significant priority, RDMA over either 10 Gigabit Ethernet or InfiniBand is clearly the superior choice for data distribution. However, many applications still need to distribute data to application endpoints that either don't have RDMA support or don't require the extreme low latency that RDMA can provide. TIBCO FTL's TCP transport can provide low latency distribution without requiring support for new networking paradigms.

Even though the race to extremely low latency is encouraging the adoption of new network distribution technology, there is still a requirement for an extremely scalable, low latency distribution pattern for high-fanout situations. TIBCO FTL's Reliable UDP transport serves applications that require high speed message distribution to multiple nodes within the infrastructure.

With TIBCO FTL's multi-transport send functionality, these components can send a message once and have it delivered to local components via shared memory and to distributed components via a network transport like RDMA, TCP or Reliable UDP.

Another set of tests was performed to determine how FTL 1.0 performance changes with processor clock speed. Using the above test environment, testing was repeated with Intel X5680 3.33 GHz processors.

Average One-Way Latency Comparison for X5680 (3.33 GHz) and X5687 (3.60 GHz)

Message Size  Shared Memory (Nanoseconds)  RDMA IB (Microseconds)  RDMA 10GbE (Microseconds)  TCP (Microseconds)  MCAST (Microseconds)
(Bytes)       3.33 GHz   3.60 GHz          3.33 GHz   3.60 GHz     3.33 GHz   3.60 GHz        3.33 GHz  3.60 GHz  3.33 GHz  3.60 GHz
16            385.50     356.60            2.24       2.17         2.92       2.84            10.56     9.95      10.67     10.80
32            390.28     357.83            3.35       3.25         4.02       3.91            10.60     9.98      10.78     10.93
64            396.26     363.70            3.48       3.36         4.18       4.05            10.70     10.14     10.62     10.86
128           411.42     383.62            3.59       3.44         4.34       4.19            10.96     10.34     11.17     11.05
256           435.36     406.02            3.79       3.59         4.63       4.42            11.60     10.90     10.81     10.99
512           492.30     455.91            4.19       3.88         5.18       4.85            12.35     11.54     11.75     11.55
1024          621.23     574.81            5.05       4.44         6.51       5.97            13.61     12.49     13.10     12.48
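The improvement percentages quoted below can be recomputed from the table; for the shared memory columns, averaging the per-size relative improvement:

```python
# Shared-memory one-way latency (ns) at the two clock speeds, from the table above.
slow = [385.5, 390.28, 396.26, 411.42, 435.36, 492.3, 621.23]  # X5680, 3.33 GHz
fast = [356.6, 357.83, 363.7, 383.62, 406.02, 455.91, 574.81]  # X5687, 3.60 GHz

# Percentage latency reduction at each message size, then the average.
gains = [(s - f) / s * 100 for s, f in zip(slow, fast)]
print(round(sum(gains) / len(gains), 1))  # 7.5
```

The same calculation over the other column pairs yields the per-transport figures quoted in the conclusion.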


As indicated in the table above, the average performance improvement for the shared memory transport on the 3.60 GHz processor is about 7.5% compared to the 3.33 GHz processor. Similar improvements were observed for the remaining transports: RDMA over IB (~5.5%), RDMA over 10GbE (~4.5%) and TCP (~6%). As newer processors with higher clock speeds are introduced, further FTL performance improvement is expected.