
Scaling the Performance of Apache* Web Server on Intel®-Based Platforms

Today’s web server core counts are dramatically increasing in data centers, allowing processing platforms to support large numbers of simultaneous processes/threads of Apache* Web Server. However, the default 1:1 socket-to-port assignment of Linux* prior to version 3.9 limits the ability of Apache to effectively use increasing core counts and scale to handle large amounts of traffic. The introduction of the SO_REUSEPORT option in Linux kernel 3.9 allows multiple listening sockets to bind to a single port on a host. Using SO_REUSEPORT, a now-available patch for Apache, developed by Intel engineers, can improve throughput by nearly 2X to nearly 4X on large, multi-core Intel®-based platforms. This paper looks at the new patch and performance testing results.

Authors: Yingqi Lu System Technology Optimization, Intel Corporation [email protected]

Rob Shiveley Open Source Software Marketing, Intel Corporation [email protected]

Jeff Ruby Datacenter Software Planning and Strategy, Intel Corporation [email protected]

White Paper
Apache* Web Server
Open Source Software Solutions


Table of Contents

Linux,* Sockets, and Scalability
Linux* Improves Scalability with SO_REUSEPORT
Enabling Apache* Scalability
What the Patch Does
Dynamic Workload Testing
Up to 1.9X Performance Gains on Apache Pre-Fork MPM
Up to 1.7X Improvement on Threaded MPMs
Static Workload Testing—Up to 3.9X Improvement
Conclusion
Appendix A – Processor Utilization and Frequency Measurements (Figures 2a–7b)
Appendix B – Testing Configuration
Appendix C – Acknowledgements

List of Figures

Figure 1. Workload Performance Testing Configuration
Figure 2. Pre-Fork MPM Patch Results, Single Listen Statement
Figure 3. Pre-Fork MPM Patch Results, Two Listen Statements
Figure 4. Event MPM Patch Results, Single Listen Statement
Figure 5. Event MPM Patch Results, Two Listen Statements
Figure 6. Worker MPM Patch Results, Single Listen Statement
Figure 7. Worker MPM Patch Results, Two Listen Statements
Figure 8. Apache Bench Static Workload Results – Red Hat Enterprise Linux
Figure 9. Apache Bench Static Workload Results – Ubuntu

Linux,* Sockets, and Scalability

A recent addition in version 3.9 of the Linux* kernel enables significant scalability for the Apache server. Prior to kernel 3.9, Linux could bind only a single socket to a single port for server applications. The default configuration for Apache* Web Server sets up this binding with Listen statements (Listen IP:port). Using multiple Listen IP:port combinations, the default kernel binds one listening socket per combination, allowing Apache to scale modestly to handle additional incoming traffic.

# One Listen statement per IP:port
# binds to one listen socket each.
# The following configuration supports
# 2 network interfaces and 4 sockets.
#
Listen 192.168.1.10:80
Listen 192.168.1.12:80
Listen 192.168.1.10:443
Listen 192.168.1.12:443

Apache has several multi-processing modules (MPMs), which are designed to scale moderately well to a small number of server cores.

• The Pre-Fork MPM is non-threaded; it launches multiple httpd processes according to directives in the Apache configuration files. When only a single listen statement is used, Apache relies on the kernel to manage request accepts (non-serialized). With multiple listen statements, Apache uses a single mutex to manage request accepts.

• The Event MPM is threaded; it scales with additional listen statements for a limited number of cores.

• The Worker MPM is also threaded; it manages request accepts similarly to the Pre-Fork MPM for single and multiple listen statements.

With the default Apache configuration, when an idle web server process listening on a socket accepts a request, further access to that socket is locked, either by the kernel or by the accept mutex (when one is used), blocking additional requests until the active request is handled and the lock is released. For large multi-core servers with vast hardware resources, this blocking on a single listening socket creates a bottleneck.
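For reference, the mutex discussed here is httpd's accept mutex, which Apache 2.4 exposes through the Mutex directive. A hedged httpd.conf fragment (sysvsem is just one example mechanism) makes the serialization point explicit:

Mutex sysvsem mpm-accept

The SO_REUSEPORT patch described below replaces this single point of serialization with one mutex per bucket.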


Linux* Improves Scalability with SO_REUSEPORT

Today’s server processors can offer up to 18 cores/36 threads (and more) per processor socket. With this amount of processing power, dual-socket web servers can support a larger number of simultaneous httpd server processes or threads. But with limited listening sockets available and the kernel or a mutex locking access, server resources go unused, user request response times suffer, and throughput does not scale easily.

In Linux kernel 3.9, a new socket option, called SO_REUSEPORT, was added to the code. The new option allows binding multiple sockets on a host to the same port, allowing software to utilize hardware resources and enabling greater scalability and performance for multi-threaded/multi-process server applications running on large multi-core systems.

SO_REUSEPORT must be set on each socket that is to share the port, so that an application can take advantage of multi-socket binding to a single port. More details are provided at http://lwn.net/Articles/542629/.
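As a concrete illustration of the option itself (not code from the Apache patch), the following minimal C sketch creates two TCP listeners bound to the same port; on a pre-3.9 kernel, the second bind() would fail with EADDRINUSE:

/* Minimal sketch: two TCP listeners sharing port 8080 via
 * SO_REUSEPORT. Requires Linux 3.9 or later; error handling
 * is abbreviated for brevity. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

static int make_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int one = 1;
    struct sockaddr_in addr;

    /* The option must be set on every socket before bind(); the
     * kernel then spreads incoming connections across listeners. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return -1;
    }
    listen(fd, 128);
    return fd;
}

int main(void)
{
    /* Both binds succeed on the same port, which is impossible
     * without SO_REUSEPORT. A real server would fork a worker
     * per listener. */
    printf("listener 1: fd %d\n", make_listener(8080));
    printf("listener 2: fd %d\n", make_listener(8080));
    return 0;
}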

Enabling Apache* Scalability

To take advantage of SO_REUSEPORT, Intel developers, with the help of the Apache community, created, tested, and committed a patch for Apache that improves throughput performance by up to 1.9X to nearly 4X. The patch was accepted into the trunk source code and is currently available at apache.org; it is under consideration to become part of an upcoming release.

What the Patch Does

When installed with Apache, the patch intelligently and automatically sets up multiple listening sockets per IP:port combination based on the number of threads available to the Apache server. Multiple httpd servers are assigned to a “bucket,” to which a listen socket is also assigned. The patch also creates multiple mutexes to manage requests for multiple buckets, thus enabling more granular scalability. The patch completes the following tasks:

1. Checks for support of the SO_REUSEPORT option in the Linux kernel.

2. Checks the number of active threads on the server running Apache.

3. Based on the number of active threads, internally calculates the number of listen sockets to assign as

   listen_sockets = total_number_active_threads / ListenCoresBucketsRatio

   where ListenCoresBucketsRatio is a configurable value (see the sketch after this list).

4. Makes sure each bucket has at least one httpd child process (server) and one listening socket assigned to it. If not enough httpd startup servers are defined in the CONF file, the patch starts the additional servers required.

5. Assigns a listening socket to each bucket. Where a mutex is used, the patch divides the single mutex into multiple mutexes and assigns one mutex to each bucket. This avoids a “thundering herd” event, while enabling granular scalability.

6. Assigns a number of httpd servers to each bucket.

7. Launches the servers.
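The bucket arithmetic in step 3 and the round-robin child assignment in step 6 can be illustrated with a short C sketch. The values below (72 active threads, a ratio of 8, 16 startup servers) are assumptions chosen for illustration, not parameters from the tested configuration, and the code is not taken from the actual patch:

/* Illustrative sketch of the bucket sizing described above; not
 * the actual httpd code. Values assume a 2-socket system with
 * 18 cores/36 threads per socket. */
#include <stdio.h>

int main(void)
{
    int total_active_threads = 72;       /* assumed: 2 x 36 HW threads */
    int listen_cores_buckets_ratio = 8;  /* the configurable ratio     */

    int buckets = total_active_threads / listen_cores_buckets_ratio;
    if (buckets < 1)
        buckets = 1;  /* each bucket needs a socket and a child */

    /* One listen socket and (where used) one accept mutex per
     * bucket; httpd children are spread across buckets round-robin. */
    int start_servers = 16;
    for (int child = 0; child < start_servers; child++)
        printf("child %2d -> bucket %d\n", child, child % buckets);

    return 0;
}

With these assumed values the sketch yields nine buckets, each with its own listening socket and at least one child process, which is the granularity the patch exploits.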

As Apache receives requests, the kernel balances them across the buckets, so that servers are not blocked in one bucket while another bucket has an available socket.

In summary, for large server core counts, the patch gives Apache more granular control over request accepts than is available with the default configuration. This allows Apache to take advantage of idle resources within the platform. The patch is designed to increase scalability, while avoiding a thundering herd event.

To evaluate the patch’s impact on performance, the development team used two methodologies: a dynamic workload and a static workload.

Dynamic Workload Testing

Dynamic testing was completed with an open source three-tier web server workload using Apache, PHP,* Memcached,* and MySQL,* running on a large multi-core web server system (Figure 1).

The workload simulates an open source, highly concurrent social event application running on a complex infrastructure using Web 2.0 technology. The test system configuration is listed in Appendix B.

During the tests, response time (throughput for scalability), processor utilization, and operating frequency were collected as requests were increased until the system became saturated, causing performance to degrade dramatically. Tests were conducted for a single listen statement and for two listen statements. Throughput results are summarized below, while processor utilization and operating frequency measurements are presented in Appendix A.


[Figure 1 diagram: a master client and N slave clients drive the system under test through a 10Gb switch; the web server is backed by a Memcached server, a master database, and six slave databases.]

Figure 1. Workload Performance Testing Configuration

Up to 1.9X Performance Gains on Apache Pre-Fork MPM

Adding the patch to Apache nearly doubles the throughput on the Pre-Fork MPM with two listen statements. With a single listen statement, throughput scales to 1.7X.

Figure 2 charts the throughput results for a single listen statement on the Pre-Fork MPM, showing the scalability with the patch. (Processor utilization and frequency measurements are presented in Appendix A.) The sudden rise in response time shows when the resources become saturated and the servers cannot handle more requests. With the single listen statement, the new patch increased throughput up to 1.7X, from around 13,000 to around 23,000 responses.

Simply by installing the trunk version of Apache with the integrated patch, companies can service more customers on a given configuration, or significantly improve IT savings by reducing the number of servers required in the data center to support a given load.


With two listen statements, the system scales even further to about 1.9X before saturation begins to drive up response time (Figure 3).

Up to 1.7X Improvement on Threaded MPMs

Apache’s Pre-Fork MPM is a multi-process (non-threaded) module; multiple httpd server processes are launched at startup. Apache’s Event and Worker MPMs are already threaded, but the new patch can offer performance improvements here, too. With the Event MPM and a single listen statement, throughput scales up to 1.4X using the patch, due to multiple listening sockets now being available (Figure 4).


[Figure 2 chart: response time (ms) vs. throughput for the Pre-Fork MPM, single Listen statement, with and without SO_REUSEPORT; up to 1.7x throughput with the patch.]

Figure 2. Pre-Fork MPM, Patch Throughput Results, Single Listen Statement

[Figure 3 chart: response time (ms) vs. throughput for the Pre-Fork MPM, two Listen statements, with and without SO_REUSEPORT; up to 1.9x throughput with the patch.]

Figure 3. Pre-Fork MPM, Patch Throughput Results, Two Listen Statements


The two listen statement case, however, does not produce the same results (Figure 5). This is because the Event MPM is already designed to scale well with two listen statements, more listening sockets, and more cores; it is already taking advantage of the available resources. However, as server core counts increase in the future, the patch is expected to improve performance beyond the default implementation, because it will enable even more listening sockets.

[Figure 4 chart: response time (ms) vs. throughput for the Event MPM, single Listen statement, with and without SO_REUSEPORT; up to 1.4x throughput with the patch.]

Figure 4. Event MPM, Patch Throughput Results, Single Listen Statement

[Figure 5 chart: response time (ms) vs. throughput for the Event MPM, two Listen statements; throughput is comparable with and without the patch.]

Figure 5. Event MPM, Patch Throughput Results, Two Listen Statements


The multi-threaded Worker MPM does show increases in throughput similar to the non-threaded Pre-fork MPM for both single (Figure 6) and two listen statements (Figure 7). As already mentioned, the Pre-Fork and Worker MPMs manage request accepts in a similar manner, and the patch uses multiple mutexes to enable scalability.

The patch on a single listen statement configuration improves throughput by up to 1.3X, while the two listen statement case boosts throughput to nearly 1.7X.

[Figure 6 chart: response time (ms) vs. throughput for the Worker MPM, single Listen statement, with and without SO_REUSEPORT; up to 1.3x throughput with the patch.]

Figure 6. Worker MPM, Patch Throughput Results, Single Listen Statement

[Figure 7 chart: response time (ms) vs. throughput for the Worker MPM, two Listen statements, with and without SO_REUSEPORT; up to 1.7x throughput with the patch.]

Figure 7. Worker MPM, Patch Throughput Results, Two Listen Statements


Static Workload Testing—Up to 3.9X Improvement

Static workload testing was completed using Apache Bench. A 48-thread client system launches 48 instances of Apache Bench with no binding/affinity to cores. Each request dataset covers four sizes of static file: 100 bytes, 1,000 bytes, 4,000 bytes, and 10,000 bytes. Keep-Alive is disabled for patch testing in order to measure the real connections-per-second capacity that Apache can handle. This method also ensures the randomness of the concurrent requests.

The workload was run for servers on both Red Hat Enterprise Linux 6.5 and Ubuntu* 14.04, each with Linux kernel version 3.13. The server configuration is shown in Appendix B.

The results for Apache Bench performance on Red Hat Enterprise Linux are shown in Figure 8. The patched version of Apache performs significantly better than the non-patched version for all sizes of requests.

[Figure 8 charts: requests per second vs. concurrent requests (240 to 2,400) for 100-, 1,000-, 4,000-, and 10,000-byte files, with and without SO_REUSEPORT; higher is better. Gains range from roughly 2.4x to 3.7x across the four request sizes.]

Figure 8. Apache* Bench Static Workload Results—Red Hat Enterprise Linux*

The results for Apache Bench performance on Ubuntu are shown in Figure 9. Here, too, the patched version of Apache performs significantly better than the non-patched version for all sizes of requests.


[Figure 9 charts: requests per second vs. concurrent requests (240 to 2,400) for the same four file sizes, with and without SO_REUSEPORT; higher is better. Gains range from roughly 2.3x to 3.9x.]

Figure 9. Apache* Bench Static Workload Results—Ubuntu*

Conclusion

With the increasing availability of large multi-core web server platforms, the Apache web server needs to be able to take advantage of all the hardware resources in the system. Prior to Linux 3.9, Apache was limited to a one-to-one assignment between ports and listening sockets, making Apache a bottleneck for web throughput. The SO_REUSEPORT option in Linux 3.9 and later expands port-to-socket assignment, allowing software like Apache to scale easily to larger capacities by utilizing more of the platform resources.

A new Apache patch uses SO_REUSEPORT to allow the software to scale with large multi-core servers. In testing of the Apache patch, Intel was able to show throughput improvements of as much as 1.9X on the Apache Pre-Fork MPM with two listen statements. Similar results were achieved for the multi-threaded MPMs with single and two listen statements. Even larger improvements were achieved in the static workload tests: as much as 3.9X on Ubuntu.

By scaling throughput for Apache, companies can achieve significant improvements in the number of customers being serviced on a given configuration, or in lowering cost of the required hardware to service a given workload.

The Apache patch is available in the trunk of the Apache software:

For Git: https://github.com/apache/httpd

For SVN: svn checkout http://svn.apache.org/repos/asf/httpd/httpd/trunk httpd-trunk



Appendix A – Processor Utilization and Frequency Measurements

Evaluating the operating frequency and processor utilization for the Pre-Fork MPM single listen statement case shows the reason for the throughput improvements. Without the patch, blocked accepts prevent the software from taking advantage of the resources available in the platform. In the single listen statement case, no mutex is used; the gating factor is the single listen socket. In the two listen statement case, the gating factors are both the limited listen sockets (only two, a minor factor) and the single big mutex (the major factor).

[Figures 2a and 2b charts: processor utilization (%) and operating frequency (GHz) vs. throughput for the Pre-Fork MPM, single Listen statement, with and without SO_REUSEPORT.]

Figure 2a and 2b. Pre-Fork MPM, Patch Utilization and Frequency Results, Single Listen Statement


Processor utilization never rises above 80 percent without the patch, so the system cannot take advantage of the available resource headroom. With the patch, processor utilization jumps to nearly 100 percent, and the processor speed increases from 1.9 GHz to 2.6 GHz. Resources become fully available with the patch.

With two listen statements, Pre-Fork MPM utilization and processor frequency behave similarly to the single listen statement case, with markedly improved throughput up to saturation (Figures 3a and 3b).

[Figures 3a and 3b charts: processor utilization (%) and operating frequency (GHz) vs. throughput for the Pre-Fork MPM, two Listen statements, with and without SO_REUSEPORT.]

Figure 3a and 3b. Pre-Fork MPM, Patch Utilization and Frequency Results, Two Listen Statements


With access to previously unused processor resources, utilization rose from about 85 percent to nearly 100 percent, and processor speed from 2.2 to 2.6 GHz.

[Figures 4a and 4b charts: processor utilization (%) and operating frequency (GHz) vs. throughput for the Event MPM, single Listen statement, with and without SO_REUSEPORT.]

Figure 4a and 4b. Event MPM, Patch Utilization and Frequency Results, Single Listen Statement


[Figures 5a and 5b charts: processor utilization (%) and operating frequency (GHz) vs. throughput for the Event MPM, two Listen statements, with and without SO_REUSEPORT.]

Figure 5a and 5b. Event MPM, Patch Utilization and Frequency Results, Two Listen Statements


[Figures 6a and 6b charts: processor utilization (%) and operating frequency (GHz) vs. throughput for the Worker MPM, single Listen statement, with and without SO_REUSEPORT.]

Figure 6a and 6b. Worker MPM, Patch Utilization and Frequency Results, Single Listen Statement


[Figures 7a and 7b charts: processor utilization (%) and operating frequency (GHz) vs. throughput for the Worker MPM, two Listen statements, with and without SO_REUSEPORT.]

Figure 7a and 7b. Worker MPM, Patch Utilization and Frequency Results, Two Listen Statements


Appendix B – Testing Configuration

Table 1. Configuration of Server Hardware for Intel Internal Measurements of Dynamic Workload Testing

Processors: 2 x Intel® Xeon® processor E5 v3 family
Cores: 18 cores per processor
Memory: 4 x 8 GB 1Rx4 DDR4-2133
BIOS: 36R05; HT, Turbo, Prefetchers, NUMA enabled
Linux* OS: Red Hat Enterprise Linux,* with updated kernel 3.13.9
NIC: 10GbE Niantic
Apache: Httpd trunk rev. 1600656
PHP: 5.6.0, with Memcache extension 3.0.8
MySQL: 5.6.13

Table 2. Configuration of Server Hardware for Intel Internal Measurements of Static Apache* Bench Workload Testing

Processors: 2 x Intel® Xeon® processor E5 v3 family
Cores: 18 cores per processor
Memory: 2 x 8 GB 1Rx4 DDR4-2133
BIOS: 36R05; HT, Turbo, Prefetchers, NUMA enabled
Linux OS: Red Hat Enterprise Linux with updated kernel 3.13, and Ubuntu* 14.04.1 (default kernel 3.13)
NIC: 10GbE Niantic
Apache: Httpd trunk rev. 1636195 with Worker MPM and 1 listen statement
Apache* Bench (running on clients): Version 2.3, Revision 1628388

Appendix C – Acknowledgements

The following individuals assisted with many aspects of developing and evaluating the patch.

Code reviewers: Graham Leggett, Mike Rumph, Tim Bannister, William A. Rowe Jr., Rüdiger Plüm, and Arkadiusz Miśkiewicz

Restart bug discovery: Kaspar Brand

Apache committers who helped review, modify, and commit the patch: Jeff Trawick, Jim Jagielski, Yann Ylavic

Prototype/data analysis and review: Vish Viswanathan, Andi Kleen

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A “Mission Critical Application” is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL’S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS’ FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined”. Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copyright © 2015 Intel Corporation. All rights reserved. Intel, the Intel logo, and Intel Xeon are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. 0115/RS/HBD/PDF 331818-001US