modeling and simulation of performance analysis for a cluster-based web server

www.elsevier.com/locate/simpat

Simulation Modelling Practice and Theory 14 (2006) 188–200

Modeling and simulation of performance analysis for a cluster-based Web server

Jianhua Yang a,*, Di Jin b, Ye Li b, Kai-Steffen Hielscher b, Reinhard German b

a College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China

b Department of Computer Science, Erlangen-Nuremberg University, 91058 Erlangen, Germany

Received 3 January 2004; received in revised form 14 March 2005; accepted 29 April 2005

Available online 17 June 2005

Abstract

Higher scalability and availability of Web servers are required as the traffic on the Internet has been increasing dramatically over the last few years. This paper focuses on modeling and simulation of the performance analysis for the test system of a cluster-based Web server consisting of five real servers. Three ways of load balancing are introduced, namely, network address translation, IP tunneling and direct routing. Calculation of packet delay is discussed for each part of the running system according to the transferred data in the system. The input model and the system model to measure and simulate the system performance are derived, and the probability distributions of the delay data are specified for using random inputs in the model according to the Q–Q plot and the cumulative distribution function. After the solution of performance tuning problems is evaluated, the maximum process capability of the system is found and a possible performance bottleneck is analyzed.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Cluster-based Web server; System simulation; Modeling; Load balancing; Performance; Probability distribution

1569-190X/$ - see front matter © 2005 Elsevier B.V. All rights reserved.

doi:10.1016/j.simpat.2005.04.004

* Corresponding author.

E-mail address: [email protected] (J. Yang).



1. Introduction

The Internet has grown at an exponential rate for several years with the popularity of the World Wide Web. Whether it is for internal or external use, the Web enables individuals, researchers, and companies to communicate with each other, and to share memory space, software and information, in an easy-to-use and cost-effective manner. The workload on Web servers is increasing so rapidly that the servers, especially popular ones, will easily be overloaded in the near future. There are two solutions to the overloading problem. One is the single-server solution, i.e. to upgrade the server to a higher-performance one; however, it will probably be overloaded again soon as requests increase, so that it has to be upgraded again. Normally the upgrading process is complex and the cost is high. The other is the multi-server solution, i.e. to build a scalable server system on a cluster of servers. When the load increases, one or more new servers can simply be added to the system to reduce the response time.

Industrial Internet sites often have certain minimal performance requirements, which are sometimes characterized by throughput. It is advisable to measure the performance within a test environment before the systems are introduced to their end customers.

The schemes of the cluster systems may be classified into dispatcher-based [1–3], which is the scheme modeled and simulated in this paper, and DNS (Domain Name Server)-based [4]. The main disadvantage of the DNS-based scheme is that only part of the incoming requests can be controlled in the system, and DNS caching introduces a skewed load on a clustered server by an average of ±40% of the total load [5].

Intuitively, the greater the number of computers in a cluster-based Web server, the better the performance of the server and the higher its price. The goal of this paper is to model and simulate the performance characteristics of a cluster Web server according to the packets that the server deals with and the number of real computers in the server. The representative configurations of the server are presented, namely, a virtual server via network address translation, a virtual server via IP (Internet Protocol) tunneling, and a virtual server via direct routing. A model for the system performance analysis is constructed. The packet delay, which depends on the data transferred to the server, is simulated and calculated for each part of the system. In order to use random inputs in the model, probability distributions are specified using quantile–quantile (Q–Q) plots and cumulative distribution functions (CDF). The method and the process of applying some simulation tools, e.g. Automod [6] and R [7], are discussed. After the solution of the performance tuning problems is evaluated through studying some performance characteristics and particular segments of the system, the maximum process capability of the system is found and a possible performance bottleneck is analyzed. The input modeling of the system is analyzed in detail.

2. Configuration of the cluster-based Web server with a dispatcher

A cluster-based Web server system is built on a cluster of real servers. All the real servers in a cluster must have the same application deployment configuration. The architecture of the cluster is transparent to end-users, who only see a single virtual server. The front-end of the real servers is a load balancer (or a dispatcher), which schedules requests to the different real servers and makes the parallel services of the cluster appear as a virtual service on a single IP address, called the virtual IP address. The virtual server is implemented in three ways that depend on the load balancing techniques:

• Virtual Server via Network Address Translation (VS/NAT),

• Virtual Server via IP Tunneling (VS/IPT),

• Virtual Server via Direct Routing (VS/DR).

NAT is a feature by which IP addresses are mapped from one group to another. Fig. 1 demonstrates how it works. When a client accesses the service provided by the cluster server, the request packet destined for the virtual-IP-address arrives at the load balancer. The load balancer examines the packet's destination address and port number. If they match a virtual server service, a real server is chosen from the cluster by a scheduling algorithm. Then, the destination address and the port of the packet are rewritten as those of the chosen real server, and the packet is forwarded to the real server. When the reply packets come back, the load balancer rewrites the source address and port of the packets as those of the virtual service so that the source addresses always point to the virtual-IP-address. With this technique, both request and response packets need to pass through the load balancer.
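The rewriting steps of VS/NAT can be illustrated with a minimal Python sketch. The class and field names, the example addresses, the round-robin scheduler and the per-client connection table are assumptions made for illustration; the text above only states that a scheduling algorithm chooses a real server.

```python
from dataclasses import dataclass, replace
from itertools import cycle


@dataclass(frozen=True)
class Packet:
    src: tuple  # (IP address, port)
    dst: tuple  # (IP address, port)


class NatBalancer:
    """Hypothetical VS/NAT load balancer that rewrites packet headers."""

    def __init__(self, virtual, real_servers):
        self.virtual = virtual                   # endpoint of the virtual service
        self._next_server = cycle(real_servers)  # round robin (assumed scheduler)
        self._connections = {}                   # client endpoint -> real server

    def forward_request(self, pkt):
        """Rewrite the destination of a request to the chosen real server."""
        if pkt.dst != self.virtual:
            return None                          # not addressed to the virtual service
        if pkt.src not in self._connections:
            self._connections[pkt.src] = next(self._next_server)
        return replace(pkt, dst=self._connections[pkt.src])

    def forward_reply(self, pkt):
        """Rewrite the source of a reply to the virtual service endpoint."""
        return replace(pkt, src=self.virtual)


lb = NatBalancer(("10.0.0.1", 80), [("192.168.1.1", 80), ("192.168.1.2", 80)])
request = Packet(src=("203.0.113.7", 53295), dst=("10.0.0.1", 80))
to_server = lb.forward_request(request)
reply = lb.forward_reply(Packet(src=to_server.dst, dst=request.src))
```

Because both directions are rewritten by the same object, every packet of a connection crosses the load balancer, which is the property noted at the end of the paragraph above.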

The most obvious difference between VS/IPT and VS/NAT is that the load balancer sends requests to real servers through an IP tunnel in VS/IPT. The load balancer encapsulates the packet within an IP datagram and forwards it to the chosen server. When the encapsulated packet arrives, the real server decapsulates it, processes the request, and returns the result directly to the user.
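The encapsulation step can be sketched in a few lines of Python. All names and addresses are hypothetical, and the assumption that the reply carries the virtual IP address as its source is an illustration rather than something stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IpPacket:
    src: str
    dst: str
    payload: object


def encapsulate(request, balancer_ip, real_server_ip):
    """Load balancer: wrap the unchanged request in an outer IP datagram."""
    return IpPacket(src=balancer_ip, dst=real_server_ip, payload=request)


def handle_on_real_server(tunnelled, virtual_ip):
    """Real server: decapsulate, process, and reply directly to the client."""
    request = tunnelled.payload
    # Assumption: the reply is sent with the virtual IP address as its source.
    return IpPacket(src=virtual_ip, dst=request.src,
                    payload="response to " + str(request.payload))


request = IpPacket(src="203.0.113.7", dst="10.0.0.1", payload="GET /")
tunnelled = encapsulate(request, "192.168.1.254", "192.168.1.1")
reply = handle_on_real_server(tunnelled, "10.0.0.1")
```

Unlike in VS/NAT, the reply in this sketch never passes through the load balancer.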

Fig. 1. Virtual server via NAT. (Labels in the diagram: Internet, Virtual IP Address, Load Balancer, Real Servers 1–4; steps: (1) Requests, (2) Scheduling & Rewriting Requests, (3) Processing Requests, (4) Rewriting Replies, (5) Replies.)


In VS/DR, the virtual-IP-address is shared by the real servers and the load balancer. The load balancer also has an interface configured with the virtual-IP-address, which is used to accept request packets, and it directly routes the packets to the chosen servers. All the real servers have their interfaces configured with the virtual-IP-address or redirect packets destined for the address to a local socket, so that the real servers can process the packets locally. The load balancer and the real servers must have one of their interfaces physically linked by a hub/switch. The load balancer simply changes the Medium Access Control (MAC) address of the data frame to that of the chosen server and retransmits it on the Local Area Network (LAN).
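A minimal sketch of the direct-routing step, assuming a simplified frame with hypothetical field names and addresses: only the destination MAC address changes, while the IP header, which still carries the virtual-IP-address, is left untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    dst_mac: str
    dst_ip: str
    payload: str


def direct_route(frame, server_mac):
    """Retransmit the frame on the LAN with the chosen server's MAC address."""
    return replace(frame, dst_mac=server_mac)


incoming = Frame(dst_mac="00:00:5e:00:53:01", dst_ip="10.0.0.1", payload="GET /")
routed = direct_route(incoming, "00:00:5e:00:53:11")
```

Since the destination IP address is unchanged, the real server must accept packets for the virtual-IP-address, as required in the paragraph above.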

3. Simulated environment and project definition

Suppose there exists a running system of five real servers with VS/NAT [8], which is set up on a Linux/Cluster platform [9] and serves as the environment to be simulated. All the data that the simulation model needs are obtained from this system, where the load balancer works as a default gateway for the real servers, which process the actual incoming requests and can run any operating system without modification.

In order to select an empirical random distribution for each of the delays that will be used to set up the models in Automod, the following delays should be calculated from the data collected from the real system:

• The transport delay from the clients to the load balancer.

• The load balancer response delay (for requests).

• The transport delay from the load balancer to the real servers.

• The real server response delay.

• The transport delay from the real servers to the load balancer.

• The load balancer response delay (for replies).

• The transport delay from the load balancer to the clients.

The system can generate log files that record the time stamps for individual packets. These log files indirectly describe the packet delay for the client (i.e. the load generator), the load balancer, and each real server, respectively. The files are text files whose records have the following format:

IP address      Port    Time stamp in seconds   Type   Sequence number
192.168.3.202   53295   1022834955.914398316    1      2072754558
192.168.3.202   53295   1022834955.914424877    4      2081793697
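As an illustration, the records can be parsed as follows. This is a sketch that assumes each record occupies one line with whitespace-separated fields; the names LogRecord and parse_record are hypothetical. The time stamps carry nine fractional digits, so they are kept as Decimal values: a double-precision float near 1e9 resolves only about 1e-7 s, which is of the same order as the smallest delays measured in this system.

```python
from decimal import Decimal
from typing import NamedTuple


class LogRecord(NamedTuple):
    ip: str
    port: int
    timestamp: Decimal  # seconds; Decimal keeps all nine fractional digits
    ptype: int          # packet type code
    seq: int            # sequence number


def parse_record(line):
    """Parse one log line (assumed layout: five whitespace-separated fields)."""
    ip, port, timestamp, ptype, seq = line.split()
    return LogRecord(ip, int(port), Decimal(timestamp), int(ptype), int(seq))


SAMPLE = """\
192.168.3.202 53295 1022834955.914398316 1 2072754558
192.168.3.202 53295 1022834955.914424877 4 2081793697
"""
records = [parse_record(line) for line in SAMPLE.splitlines()]
```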


Whenever a request arrives in the system, one SYN (synchronize) packet that has a unique sequence number is generated and transferred to a certain real server; next, the server handles the SYN packet and gives an answer. After the client receives the answer, a connection is built, and there is a unique port number for each connection. The client and the real server communicate through this connection, and ACK (acknowledgment) packets are delivered until a FIN (finish) packet is transferred. The packet type and the direction of the transfer are defined as an enumerated type:

enum {IS1, IA1, IF1, OS1, OA1, OF1, IS2, IA2, IF2, OS2, OA2, OF2}

where IS means incoming SYN, IA incoming ACK, IF incoming FIN, OS outgoing SYN, OA outgoing ACK, and OF outgoing FIN. The index indicates whether the packet is rewritten or not. Fig. 2 shows how these types correspond.

Therefore, after a packet is found in the different log files, by matching records that have the same port, type and sequence number, the corresponding time stamps are subtracted, and the delay of the packet between points in the system can be calculated. This is done by a C++ program.
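The matching step can be sketched as follows. This is a Python illustration, not the authors' C++ program; the function names and the second log are hypothetical, and the record layout is assumed to be five whitespace-separated fields per line.

```python
from decimal import Decimal


def index_log(lines):
    """Map (port, type, sequence number) to the time stamp, for one log file."""
    table = {}
    for line in lines:
        _ip, port, timestamp, ptype, seq = line.split()
        table[(int(port), int(ptype), int(seq))] = Decimal(timestamp)
    return table


def delays_between(upstream_lines, downstream_lines):
    """Delay of each packet seen in both logs (downstream minus upstream)."""
    upstream = index_log(upstream_lines)
    downstream = index_log(downstream_lines)
    return {key: downstream[key] - upstream[key]
            for key in upstream.keys() & downstream.keys()}


# Hypothetical logs: the first record is the sample shown earlier, the
# time stamps in the second log are invented for the example.
client_log = ["192.168.3.202 53295 1022834955.914398316 1 2072754558"]
balancer_log = [
    "192.168.3.202 53295 1022834955.914598316 1 2072754558",
    "192.168.3.202 53296 1022834956.000000000 1 99",
]
delays = delays_between(client_log, balancer_log)
```

Records that appear in only one of the two logs, such as the last line of the hypothetical balancer log, are ignored.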

4. Modeling

4.1. Conceptual model

The simulated system includes one load balancer and five real servers. The conceptual model is shown in Fig. 3.

4.2. Input modeling

In order to carry out the simulation of the system by using random inputs, their

probability distributions must be specified first.

4.2.1. Graphs of the delays

The data for the delays are shown in Figs. 4–6.

All the distributions of the delays in real servers 2 through 5 are similar to the one in real server 1, shown in Fig. 5. The distributions of the delays in the other channels are similar to the one in the channel from real server 1 to the load balancer, shown in Fig. 6.

Fig. 2. Correspondence of the enumerated types. (Diagram of the clients, the load balancer and the real servers, with the packet types IS1, IA1, IF1, IS2, IA2, IF2, OS1, OA1, OF1, OS2, OA2 and OF2 labeling the paths between them.)

Fig. 3. The conceptual model of the real system. (Diagram: the clients, the load balancer (LB) and the real servers (RS), connected by the channels from clients to LB, from LB to RS, from RS to LB, and from LB to clients; the LB is labeled "Scheduling & Rewriting Requests" and "Rewriting Replies".)

Fig. 4. Delays in the load balancer.

Fig. 5. Delays in real server 1.


Fig. 6. Delays in the channel from real server 1 to the load balancer.


4.2.2. Summary of the delays

The minimum values, maximum values, medians, means and variances of the delays can be calculated; see the rows of Table 1, where LB stands for Load Balancer and RS for Real Server. The table clearly shows that the mean and the median of the delays differ noticeably in each group, so the distributions are not symmetric. It should be pointed out that the computed variances are underestimated because the autocorrelations of the observed delays are ignored. Since the coefficient of variation is not equal to 1, the distribution is not exponential; possibly, it is a Gamma distribution.
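The coefficient of variation is the standard deviation divided by the mean; it equals 1 for an exponential distribution and 1/sqrt(k) for a Gamma distribution with shape parameter k. The following short Python sketch recomputes it from the means and variances listed in Table 1; the variable and function names are illustrative.

```python
import math

# (mean, variance) pairs copied from Table 1.
TABLE_1 = {
    "Channel from RS 1 to LB":    (5.764e-04, 9.373e-08),
    "Channel from clients to LB": (4.461e-04, 2.703e-07),
    "Channel from LB to RS":      (1.429e-04, 1.783e-08),
    "Channel from LB to clients": (3.439e-04, 1.030e-07),
    "LB":                         (9.635e-07, 1.195e-12),
    "RS 1":                       (8.880e-02, 0.228),
}


def coefficient_of_variation(mean, variance):
    """Standard deviation divided by the mean."""
    return math.sqrt(variance) / mean


cv = {name: coefficient_of_variation(mean, variance)
      for name, (mean, variance) in TABLE_1.items()}
for name, value in cv.items():
    print(f"{name:28s} CV = {value:.3f}")
```

For the channel from RS 1 to LB, the value of about 0.53 corresponds to a moment-based Gamma shape of 1/CV² ≈ 3.5, which is of the same order as the fitted shape parameters reported later in Table 2.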

4.2.3. Histogram of delays

The histograms of some delay data are shown in Figs. 7–9.

4.2.4. Probability distributions

None of the three histograms in Section 4.2.3 is symmetric, despite the fact that symmetric normal input distributions are widely used in many simulation applications. The distribution appropriate for each channel seems to be the Gamma or the Lognormal distribution; see Fig. 7. However, there are no well-fitting distributions [10] for the load balancer and the real servers; see Figs. 8 and 9 and Table 1.

Table 1

Some statistics for the delays

                             Min.        Median      Mean        Max.        Variance    Coefficient of variation
Channel from RS 1 to LB      6.869e-05   5.075e-04   5.764e-04   2.361e-03   9.373e-08   5.31e-01
Channel from clients to LB   9.839e-06   1.979e-04   4.461e-04   2.555e-03   2.703e-07   1.17e+00
Channel from LB to RS        1.570e-07   9.347e-05   1.429e-04   9.126e-04   1.783e-08   9.34e-01
Channel from LB to clients   9.867e-05   2.497e-04   3.439e-04   2.522e-03   1.030e-07   9.33e-01
LB                           5.580e-07   9.400e-07   9.635e-07   6.436e-05   1.195e-12   1.13e+00
RS 1                         3.372e-06   2.252e-04   8.880e-02   2.999e+00   0.228       5.38e+00

Fig. 7. Histogram of the delays in the channel from RS 1 to LB.

Fig. 8. Histogram of the delays in LB.

Fig. 9. Histogram of the delays in RS 1.


Both the Gamma and the Lognormal distributions require two arguments, namely, the shape parameter and the scale parameter. The relationship among the mean, the variance, the shape parameter, and the scale parameter is nonlinear. For the channel from real server 1 to the load balancer, two methods can be used to get the parameters of the distributions, namely, the Maximum Likelihood Estimator (MLE) [10] and the Steepest Descent Method [11], which always gives the solution of the nonlinear equations and is not as dependent on initial values as the Newton–Raphson iterative method. These parameters are shown in Table 2.
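For the Gamma distribution, the maximum likelihood equations reduce to ln(k) − ψ(k) = ln(mean) − mean(ln x) for the shape k, where ψ is the digamma function, and scale = mean / k. The Python sketch below solves the shape equation by bisection, which is a simple substitute chosen for illustration and is neither the steepest descent method nor necessarily the procedure used for Table 2. Since the measured delays are not reproduced here, the example fits synthetic samples drawn with the MLE parameters of Table 2; all function names are hypothetical.

```python
import math
import random


def digamma(x):
    """Digamma function for x > 0: recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252)))


def fit_gamma_mle(samples):
    """Return (shape, scale) maximizing the Gamma likelihood.

    Requires positive samples that are not all equal.
    """
    n = len(samples)
    mean = sum(samples) / n
    s = math.log(mean) - sum(math.log(v) for v in samples) / n  # positive

    def f(k):
        return math.log(k) - digamma(k) - s  # decreasing in k

    lo, hi = 1e-3, 1.0
    while f(hi) > 0:      # grow the bracket until it encloses the root
        hi *= 2.0
    for _ in range(200):  # bisection
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    return shape, mean / shape


# Synthetic data only: drawn from a Gamma distribution with the MLE
# parameters listed in Table 2 (shape 3.96, scale 1.456e-04).
rng = random.Random(42)
samples = [rng.gammavariate(3.96, 1.456e-04) for _ in range(20000)]
shape, scale = fit_gamma_mle(samples)
```

Bisection needs no initial guess because ln(k) − ψ(k) decreases monotonically from infinity to zero, so the root is unique.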

Some plots with the shape and the scale parameters of Table 2, such as the density plot, the Q–Q plot and the CDF, can be formed and used to determine which distribution (either the Gamma or the Lognormal distribution) best fits the empirical data for the channel from real server 1 to the load balancer. For example, the Gamma CDF with the parameters of Table 2 and the empirical distribution are plotted in Fig. 10 using the R tool. The Q–Q plots of the Gamma and Lognormal distributions with the parameters of Table 2 are shown in Figs. 11 and 12, respectively.

It is clear from these plots that the Gamma distribution is the appropriate one for the channels, so the Lognormal distribution is rejected.
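The visual comparison of a fitted CDF with the empirical distribution can be complemented by a number, namely the largest vertical gap between the two curves (the Kolmogorov–Smirnov distance). The Python sketch below is an assumption-laden illustration and not the R procedure used for Figs. 10–12: the data are synthetic Gamma samples generated with the MLE parameters of Table 2, so the outcome only demonstrates the procedure and does not reproduce the measured result. The Lognormal parameters are estimated here from the mean and standard deviation of the logarithms of the samples.

```python
import math
import random
from statistics import NormalDist


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the power series of the regularized incomplete gamma."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = total = 1.0
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (shape + n)
        total += term
    return total * math.exp(shape * math.log(z) - z - math.lgamma(shape + 1.0))


def max_cdf_gap(samples, cdf):
    """Largest vertical distance between the empirical CDF and a fitted CDF."""
    data = sorted(samples)
    n = len(data)
    gap = 0.0
    for i, x in enumerate(data):
        value = cdf(x)
        gap = max(gap, abs(value - i / n), abs(value - (i + 1) / n))
    return gap


# Synthetic data only (Gamma, shape 3.96, scale 1.456e-04 as in Table 2).
rng = random.Random(7)
samples = [rng.gammavariate(3.96, 1.456e-04) for _ in range(20000)]

logs = [math.log(v) for v in samples]
mu = sum(logs) / len(logs)
sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
lognormal = NormalDist(mu, sigma)

gamma_gap = max_cdf_gap(samples, lambda x: gamma_cdf(x, 3.96, 1.456e-04))
lognormal_gap = max_cdf_gap(samples, lambda x: lognormal.cdf(math.log(x)))
```

A smaller gap indicates a closer fit; with measured delays, the same two numbers could be computed from the parameters of each candidate distribution.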

Table 2

Estimated parameters of the distribution for the channel from RS 1 to LB

                             MLE               Steepest Descent Method
Gamma      Shape parameter   3.96              3.85259
           Scale parameter   1.456e-04         1.49618576e-004
Lognormal  Shape parameter   1.06525717e-007   6.44682462e-006
           Scale parameter   -7.58917376       -6.93553198

Fig. 10. Plot of the Gamma CDF and the empirical distribution.

Fig. 11. Q–Q plot of the Gamma distribution.

Fig. 12. Q–Q plot of the Lognormal distribution.


5. Simulation with Automod and validation

After the input modeling and the system's conceptual modeling are finished, the simulation model is programmed in Automod, as shown in Fig. 13, where R stands for Resource, Q for Queue, and LG for Load Generator (i.e. clients).

In Automod, the uniform distribution is selected for the load generator in the process procedure of "channel_LG_LB". The load generator generates the packets, which suffer from delays before arriving at the load balancer. In the queue of the load balancer, traffic in both directions is allowed by using an attribute variable. The Round Robin strategy is chosen for delivering packets to one of the real servers in the load balancer.

Fig. 13. Model in Automod. (Diagram of the queue and resource pairs Q_Channel_LG_LB and R_Channel_LG_LB, Q_LB and R_LB, Q_Channel_LB_RS and R_Channel_LB_RS, Q_RS(1)–Q_RS(5) and R_RS(1)–R_RS(5), Q_Channel_RS_LB and R_Channel_RS_LB, Q_Channel_LB_LG and R_Channel_LB_LG.)

The validation of the simulation model is carried out by comparing the running system and the model for the following outputs:

• The current packets in the queues of the real servers, the load balancer and the channels.

• The CPU utilization of the real servers.

For example, using the observed real delay data and setting a lower resource capacity for the real servers in Automod, most of the loads are found to be blocked in the queues of the real servers; the related plot of the current packets in the queues of the five real servers is shown in Fig. 14. This means that the process capability of the simulated real servers is much lower than that of the real running system, where almost no loads are blocked. If the resource capacity is increased, which means that more loads can be processed in parallel and there is higher utilization in the real servers, no loads will be blocked in the queues. Therefore, the capacity should be adjusted according to the real cluster system.
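The effect of the resource capacity on the queues can be reproduced with a small event-driven sketch in Python. It is not the Automod model: it covers only the real servers, the service times are assumed to be exponential, and all parameter values are hypothetical. The uniform load generator and the Round Robin dispatching follow the description above.

```python
import heapq
import random


def max_queue_length(capacity, n_packets=2000, n_servers=5,
                     mean_interarrival=1.0, mean_service=8.0, seed=1):
    """Largest number of packets waiting at any one real server."""
    rng = random.Random(seed)
    events = []  # heap of (time, tie-breaker, kind, server)
    clock = 0.0
    for i in range(n_packets):
        # Uniform load generator; packets are dispatched round robin.
        clock += rng.uniform(0.0, 2.0 * mean_interarrival)
        heapq.heappush(events, (clock, i, "arrive", i % n_servers))
    busy = [0] * n_servers     # occupied resource slots per server
    waiting = [0] * n_servers  # packets blocked in the queue per server
    tie = n_packets
    longest = 0
    while events:
        now, _, kind, server = heapq.heappop(events)
        if kind == "arrive" and busy[server] >= capacity:
            waiting[server] += 1  # all slots taken: the packet is blocked
            longest = max(longest, waiting[server])
            continue
        if kind == "arrive":
            busy[server] += 1     # the packet takes a free slot
        elif waiting[server] > 0:
            waiting[server] -= 1  # a queued packet takes over the freed slot
        else:
            busy[server] -= 1     # the freed slot stays idle
            continue
        tie += 1
        # Assumed exponential service time for the packet entering service.
        finish = now + rng.expovariate(1.0 / mean_service)
        heapq.heappush(events, (finish, tie, "depart", server))
    return longest


low_capacity = max_queue_length(capacity=1)
high_capacity = max_queue_length(capacity=4)
```

With these hypothetical values, a capacity of 1 overloads every server, so the queues keep growing, whereas a capacity of 4 leaves the servers mostly idle; this mirrors the adjustment of the capacity described above.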

Fig. 14. Current packets in the queues of five real servers with a lower resource capacity.

Fig. 15. Average utilizations of five real servers with half a mean.


When the load generation rate is increased, the maximum process capability and the bottleneck of the system can be found by inspecting the current packets in the queues. For example, if the rate is doubled (i.e. the mean of the uniform distribution is half the original one), the system can still process all the arriving packets. However, the average utilizations of the real servers are almost 80%, as shown in Fig. 15. In this case, one new real server should possibly be added.
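The relation between load and utilization can be written down with the utilization law: as long as the servers are not saturated, the average utilization equals the arrival rate times the mean service time, divided by the total number of resource slots. The Python sketch below uses hypothetical numbers chosen so that five servers reach 80%; the arrival rate, the service time and the target of 70% are not taken from the measurements.

```python
def utilization(arrival_rate, mean_service_time, n_servers, capacity):
    """Average fraction of busy resource slots (valid while it is below 1)."""
    return arrival_rate * mean_service_time / (n_servers * capacity)


def servers_needed(arrival_rate, mean_service_time, capacity, target):
    """Smallest number of servers keeping the utilization at or below target."""
    n = 1
    while utilization(arrival_rate, mean_service_time, n, capacity) > target:
        n += 1
    return n


# Hypothetical values: a doubled rate of 4 packets per time unit and a
# mean service time of 1 time unit on servers with one slot each.
with_five = utilization(4.0, 1.0, n_servers=5, capacity=1)
with_six = utilization(4.0, 1.0, n_servers=6, capacity=1)
needed = servers_needed(4.0, 1.0, capacity=1, target=0.7)
```

Under these assumptions a sixth server lowers the utilization from 80% to about 67%, which is consistent with the suggestion to add one real server.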

6. Conclusions and future work

The experimental platform is based on a cluster Web server consisting of five real servers. The service times of the system, including the dispatcher and the servers, are measured and used to estimate the input data for the simulation. The conceptual models of the load balancer, the real servers and the transport channels are derived. The performance simulation of the cluster Web server is done step by step using the input modeling technique, random distribution theory and output analysis, which is a necessary aid for cost-effectiveness in Internet development. The maximum process capability and the bottleneck of the system can be found by increasing the load generation rate and inspecting the current packets in the queues. The simulation result shows that the number of real servers possibly affects the performance of the real system when more packets are dealt with. These methods can also be applied to simulation and analysis in other fields.

Future refinements of the simulation could provide more choices for the scheduling and balancing strategies in the load balancer and compare the models for the different virtual servers. The autocorrelation of the observed delays in the system may be dealt with by making some replications of the simulation.

References

[1] T. Schroeder, S. Goddard, B. Ramamurthy, Scalable Web server clustering technologies, IEEE Network 14 (3) (2000) 38–45.

[2] A. Iyengar, J. Challenger, P. Dantzig, High-performance Web site design techniques, IEEE Internet Computing 4 (2) (2000) 17–26.

[3] M.T. Yong, R. Ayani, Comparison of load balancing strategies on cluster-based Web servers, Simulation 77 (5–6) (2001) 185–195.

[4] V. Cardellini, M. Colajanni, P.S. Yu, Dynamic load balancing on Web-server systems, IEEE Internet Computing 3 (3) (1999) 28–39.

[5] H. Bryhni, E. Klovning, O. Kure, A comparison of load balancing on Web-server systems, IEEE Network 14 (4) (2000) 58–63.

[6] J. Nikoukaran, Software selection for simulation in manufacturing: a review, Simulation Practice and Theory 7 (1) (1999) 1–14.

[7] R Development Core Team, R Manual (Jan. 2002). <http://www.r-project.org/>.

[8] A. Chepurko, Instrumenting a Cluster-based Web Server for Performance Measuring, Master Thesis, Erlangen-Nuremberg University, 2002.

[9] D.D. Bovet, M. Cesati, Understanding the Linux Kernel, O'Reilly & Associates, Inc., 2000.

[10] A.M. Law, W.D. Kelton, Simulation Modeling and Analysis, third ed., McGraw-Hill Inc., 2000.

[11] R.L. Burden, J.D. Faires, Numerical Analysis, seventh ed., Addison-Wesley Inc., 2000.
