www.elsevier.com/locate/simpat
Simulation Modelling Practice and Theory 14 (2006) 188–200
Modeling and simulation of performance analysis for a cluster-based Web server
Jianhua Yang a,*, Di Jin b, Ye Li b, Kai-Steffen Hielscher b, Reinhard German b
a College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
b Department of Computer Science, Erlangen-Nuremberg University, 91058 Erlangen, Germany
Received 3 January 2004; received in revised form 14 March 2005; accepted 29 April 2005
Available online 17 June 2005
Abstract
Higher scalability and availability of Web servers are required as the traffic on the Internet
has been increasing dramatically over the last few years. This paper focuses on modeling and
simulation of the performance analysis for the test system of a cluster-based Web server
consisting of five real servers. Three ways of load balancing are introduced, namely, network
address translation, IP tunneling and direct routing. Calculation of packet delay is discussed
for each part of the running system according to the transferred data in the system. The input
model and the system model to measure and simulate the system performance are derived, and
the probability distributions of the delay data are specified for using random inputs in the
model according to the Q–Q plot and the cumulative distribution function. After the solution
of performance tuning problems is evaluated, the maximum process capability of the system is
found and a possible performance bottleneck is analyzed.
© 2005 Elsevier B.V. All rights reserved.
Keywords: Cluster-based Web server; System simulation; Modeling; Load balancing; Performance; Probability distribution
1569-190X/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.simpat.2005.04.004
* Corresponding author.
E-mail address: [email protected] (J. Yang).
J. Yang et al. / Simulation Modelling Practice and Theory 14 (2006) 188–200 189
1. Introduction
For several years the Internet has grown exponentially with the popularity of the World
Wide Web. Whether for internal or external use, the Web enables individuals, researchers,
and companies to communicate with each other and to share memory space, software and
information in an easy-to-use and cost-effective manner. The workload on Web servers is
increasing so rapidly that servers, especially popular ones, will easily become overloaded
in the near future. There are two solutions to the overloading problem. One is the
single-server solution, i.e. upgrading the server to a higher-performance machine; but it
may soon be overloaded again as requests increase, so that it has to be upgraded once more.
The upgrading process is normally complex and costly. The other is the multi-server
solution, i.e. building a scalable server system on a cluster of servers. When the load
increases, one or more new servers can simply be added to the system to reduce the
response time.
For industrial Internet sites, minimal performance requirements are often specified,
sometimes in terms of throughput. It is therefore advisable to measure the performance in
a test environment before such systems are introduced to their end customers.
Cluster schemes may be classified as dispatcher-based [1–3], the type modeled and
simulated in this paper, and DNS (Domain Name System)-based [4]. The main disadvantage of
the DNS-based scheme is that the system controls only part of the incoming requests, and
DNS caching skews the load on a clustered server by an average of ±40% of the total load [5].
Intuitively, the more computers a cluster-based Web server contains, the better its
performance and the higher its price. The goal of this paper is to model and simulate the
performance characteristics of a cluster Web server as a function of the packets that the
server deals with and the number of real computers in the server. The representative
configurations of the server are presented, namely, a virtual server via network address
translation, a virtual server via IP (Internet Protocol) tunneling, and a virtual server
via direct routing. A model for the system performance analysis is constructed. The packet
delay, which depends on the data transferred to the server, is calculated and simulated
for each part of the system. In order to use random inputs in the model, probability
distributions are specified using quantile–quantile (Q–Q) plots and cumulative
distribution functions (CDF). The method and the process of applying simulation tools,
e.g. Automod [6] and R [7], are discussed. After the solution of the performance tuning
problems is evaluated by studying some performance characteristics and particular segments
of the system, the maximum process capability of the system is found and a possible
performance bottleneck is analyzed. The input modeling of the system is analyzed in detail.
2. Configuration of the cluster-based Web server with a dispatcher
A cluster-based Web server system is built on a cluster of real servers. All the real
servers in a cluster must have the same application deployment configuration. The
architecture of the cluster is transparent to end-users, who only see a single virtual
server. The front-end of the real servers is a load balancer (or dispatcher), which
schedules requests to the different real servers and makes the parallel services of the
cluster appear as a virtual service on a single IP address, called the virtual IP address.
The virtual server is implemented in one of three ways, depending on the load balancing
technique:
• Virtual Server via Network Address Translation (VS/NAT),
• Virtual Server via IP Tunneling (VS/IPT),
• Virtual Server via Direct Routing (VS/DR).
NAT is a feature by which IP addresses are mapped from one group to another.
Fig. 1 shows how it works. When a client accesses the service provided by the cluster
server, the request packet destined for the virtual IP address arrives at the load
balancer. The load balancer examines the packet's destination address and port number. If
they match a virtual server service, a real server is chosen from the cluster by a
scheduling algorithm. Then the destination address and port of the packet are rewritten to
those of the chosen real server, and the packet is forwarded to that server. When the
reply packets come back, the load balancer rewrites their source address and port to those
of the virtual service, so that the source address always points to the virtual IP
address. With this technique, both request and response packets must pass through the
load balancer.
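The rewriting steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the measured system: the `Packet` tuple, all addresses and ports, the round-robin stand-in for the scheduling algorithm and the per-client connection table are hypothetical.

```python
from collections import namedtuple
from itertools import cycle

# Hypothetical packet model: only the fields that NAT rewrites are kept.
Packet = namedtuple("Packet", "src_ip src_port dst_ip dst_port")

VIRTUAL = ("192.0.2.1", 80)                  # assumed virtual IP address and port
REAL_SERVERS = [("10.0.0.11", 8080),         # assumed real-server addresses
                ("10.0.0.12", 8080)]
_next_server = cycle(REAL_SERVERS)           # stand-in for the scheduling algorithm
_connections = {}                            # client (ip, port) -> chosen real server


def rewrite_request(pkt):
    """Match the virtual service, choose a real server, rewrite the destination."""
    if (pkt.dst_ip, pkt.dst_port) != VIRTUAL:
        return pkt                           # not for the virtual service: unchanged
    client = (pkt.src_ip, pkt.src_port)
    if client not in _connections:           # schedule only once per connection
        _connections[client] = next(_next_server)
    ip, port = _connections[client]
    return pkt._replace(dst_ip=ip, dst_port=port)


def rewrite_reply(pkt):
    """Make a reply from a real server appear to come from the virtual service."""
    return pkt._replace(src_ip=VIRTUAL[0], src_port=VIRTUAL[1])
```

Because both functions run on the load balancer, every request and every reply crosses it, which is the property the paragraph above points out.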
The most obvious difference between VS/IPT and VS/NAT is that the load balancer sends
requests to real servers through an IP tunnel in VS/IPT. The load balancer encapsulates
the packet within an IP datagram and forwards it to the chosen server. When the
encapsulated packet arrives, the real server decapsulates it, processes the request, and
returns the result directly to the user.
Fig. 1. Virtual server via NAT. [Diagram: (1) requests from the Internet reach the load balancer at the virtual IP address; (2) the load balancer schedules and rewrites the requests; (3) real servers 1–4 process the requests; (4) the load balancer rewrites the replies; (5) the replies return to the clients.]
In VS/DR, the virtual IP address is shared by the real servers and the load balancer. The
load balancer also has an interface configured with the virtual IP address, which it uses
to accept request packets, and it routes the packets directly to the chosen servers. All
the real servers either have an interface configured with the virtual IP address or
redirect packets destined for that address to a local socket, so that they can process the
packets locally. The load balancer and the real servers must each have one interface
physically linked by a hub or switch. The load balancer simply changes the Medium Access
Control (MAC) address of the data frame to that of the chosen server and retransmits the
frame on the Local Area Network (LAN).
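The contrast with VS/NAT can be made concrete with a small Python sketch. It assumes a hypothetical `Frame` type and made-up MAC and IP addresses; the only point is that direct routing changes the link-layer destination and leaves the IP packet, including the virtual IP address, untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    """Hypothetical Ethernet frame reduced to the fields relevant here."""
    dst_mac: str
    dst_ip: str
    payload: bytes


def route_direct(frame: Frame, server_mac: str) -> Frame:
    """VS/DR: rewrite only the destination MAC; the IP packet is left as it is."""
    return replace(frame, dst_mac=server_mac)


request = Frame(dst_mac="02:00:00:00:00:01",    # assumed MAC of the load balancer
                dst_ip="192.0.2.1",             # assumed virtual IP address
                payload=b"GET / HTTP/1.0\r\n\r\n")
forwarded = route_direct(request, "02:00:00:00:00:11")  # assumed MAC of the chosen server
```

Since the destination IP address is still the virtual one, the chosen server must accept packets for that address, which is why the real servers are configured with it.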
3. Simulated environment and project definition
Suppose there exists a running system of five real servers with VS/NAT [8], which is set
up on a Linux/Cluster platform [9] and serves as the environment to be simulated. All the
data that the simulation model needs are obtained from this system, in which the load
balancer works as the default gateway for the real servers; the real servers process the
actual incoming requests and can run any operating system without modification.
In order to select an empirical random distribution for each of the delays used to set up
the models in Automod, the following quantities are calculated from the data collected
from the real system:
• The transport delay from the clients to the load balancer.
• The load balancer response delay (for requests).
• The transport delay from the load balancer to the real servers.
• The real server response delay.
• The transport delay from the real servers to the load balancer.
• The load balancer response delay (for replies).
• The transport delay from the load balancer to the clients.
The system can generate log files that record the time stamps of individual packets.
These log files indirectly describe the packet delay for the client (i.e. the load
generator), the load balancer, and each real server. The files are text files whose
records have the following format:
IP address      Port    Time stamp in seconds   Type   Sequence number
192.168.3.202   53295   1022834955.914398316    1      2072754558
192.168.3.202   53295   1022834955.914424877    4      2081793697
Whenever a request arrives at the system, one SYN (SYNchronize) packet with a unique
sequence number is generated and transferred to a certain real server; the server then
handles the SYN packet and answers. After the client receives the answer, a connection is
established, and each connection has a unique port number. The client and the real server
communicate through this connection, exchanging ACK (ACKnowledge) packets until a FIN
(FINish) packet is transferred. The packet type and the direction of transfer are defined
as an enumerated type:
    enum {IS1, IA1, IF1, OS1, OA1, OF1, IS2, IA2, IF2, OS2, OA2, OF2}
where IS means incoming SYN, IA incoming ACK, IF incoming FIN, OS outgoing SYN, OA
outgoing ACK, and OF outgoing FIN. The index indicates whether the packet is rewritten or
not. Fig. 2 shows how these types correspond.
Therefore, once the records of a packet have been found in the different log files, i.e.
records with the same port, type and sequence number, the corresponding time stamps are
subtracted to obtain the delay of the packet between two points in the system. This
calculation is done by a C++ program.
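The C++ program itself is not reproduced in the paper. The matching step can nevertheless be sketched in Python, assuming whitespace-separated records in the field order shown above (IP address, port, time stamp, type, sequence number); the function names are hypothetical.

```python
from decimal import Decimal


def parse_log(lines):
    """Map (port, type, sequence number) -> time stamp for one log file."""
    stamps = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue                      # skip blank or malformed lines
        _ip, port, stamp, ptype, seq = fields
        # Decimal keeps all nanosecond digits; a 64-bit float would round them away.
        stamps[(int(port), int(ptype), int(seq))] = Decimal(stamp)
    return stamps


def delays(log_a, log_b):
    """Delay of every packet recorded at point A and again at point B."""
    a, b = parse_log(log_a), parse_log(log_b)
    return {key: b[key] - a[key] for key in a.keys() & b.keys()}
```

The choice of `Decimal` matters here: the time stamps carry 19 significant digits, while a double-precision float near 10^9 resolves only about 10^-7 s, which is the same order as the smallest delays being measured.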
4. Modeling
4.1. Conceptual model
The simulated system includes one load balancer and five real servers. The con-
ceptual model is shown in Fig. 3.
4.2. Input modeling
In order to carry out the simulation of the system by using random inputs, their
probability distributions must be specified first.
4.2.1. Graphs of the delays
The data for the delays are shown in Figs. 4–6.
The distributions of the delays in real servers 2 through 5 are all similar to that in
real server 1, shown in Fig. 5. The distributions of the delays in the other channels are
similar to that in the channel from real server 1 to the load balancer, shown in Fig. 6.
Fig. 2. Correspondence of the enumerated types. [Diagram: clients, load balancer and real servers connected by links labelled IS1, IA1, IF1; IS2, IA2, IF2; OS1, OA1, OF1; and OS2, OA2, OF2.]
Fig. 3. The conceptual model of the real system. [Diagram: clients, load balancer (LB) and real servers (RS) connected by four channels (from clients to LB, from LB to RS, from RS to LB, from LB to clients); the LB schedules and rewrites requests and rewrites replies.]
Fig. 4. Delays in the load balancer.
Fig. 5. Delays in real server 1.
Fig. 6. Delays in the channel from real server 1 to the load balancer.
4.2.2. Summary of the delays
The minimum, maximum, median, mean and variance of the delays can be calculated; see the
rows of Table 1, where LB stands for Load Balancer and RS for Real Server. The table shows
that the mean and the median of each group of delays differ noticeably, so the
distributions are not symmetric. It should be pointed out that the computed variances are
underestimated because the autocorrelations of the observed delays are ignored. Since the
coefficients of variation (standard deviation divided by mean) are not equal to 1, the
distributions are not exponential; possibly they are Gamma distributions.
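The last column of Table 1 can be recomputed from the mean and variance columns. The short Python check below copies those two columns from Table 1; an exponential distribution would give a ratio of exactly 1.

```python
import math

# (mean, variance) pairs copied from Table 1.
TABLE_1 = {
    "Channel from RS 1 to LB":    (5.764e-04, 9.373e-08),
    "Channel from clients to LB": (4.461e-04, 2.703e-07),
    "Channel from LB to RS":      (1.429e-04, 1.783e-08),
    "Channel from LB to clients": (3.439e-04, 1.030e-07),
    "LB":                         (9.635e-07, 1.195e-12),
    "RS 1":                       (8.880e-02, 0.228),
}


def coefficient_of_variation(mean, variance):
    """Standard deviation divided by mean."""
    return math.sqrt(variance) / mean


for name, (mean, var) in TABLE_1.items():
    print(f"{name:28s} {coefficient_of_variation(mean, var):.3f}")
```

The values agree with the tabulated ones to the printed precision, which confirms that the column is the standard deviation divided by the mean.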
4.2.3. Histogram of delays
The histograms of some delay data are shown in Figs. 7–9.
4.2.4. Probability distributions
None of the three histograms in Section 4.2.3 is symmetric, even though symmetric normal
input distributions are widely used in simulation applications. The distribution
appropriate for each channel seems to be the Gamma or the Lognormal distribution; see
Fig. 7. However, no well-fitting distributions [10] were found for the load balancer and
the real servers; see Figs. 8 and 9 and Table 1.
Table 1
Some statistics for the delays
                             Min.       Median     Mean       Max.       Variance   Coefficient of variation
Channel from RS 1 to LB      6.869e-05  5.075e-04  5.764e-04  2.361e-03  9.373e-08  5.31e-01
Channel from clients to LB   9.839e-06  1.979e-04  4.461e-04  2.555e-03  2.703e-07  1.17e+00
Channel from LB to RS        1.570e-07  9.347e-05  1.429e-04  9.126e-04  1.783e-08  9.34e-01
Channel from LB to clients   9.867e-05  2.497e-04  3.439e-04  2.522e-03  1.030e-07  9.33e-01
LB                           5.580e-07  9.400e-07  9.635e-07  6.436e-05  1.195e-12  1.13e+00
RS 1                         3.372e-06  2.252e-04  8.880e-02  2.999e+00  0.228      5.38e+00
Fig. 7. Histogram of the delays in the channel from RS 1 to LB.
Fig. 8. Histogram of the delays in LB.
Fig. 9. Histogram of the delays in RS 1.
Both the Gamma and the Lognormal distributions require two parameters, namely, the shape
parameter and the scale parameter. The relationship among the mean, the variance, the
shape parameter and the scale parameter is nonlinear. For the channel from real server 1
to the load balancer, two methods can be used to obtain the parameters of the
distributions, namely, the Maximum Likelihood Estimator (MLE) [10] and the Steepest
Descent Method [11], which is used to solve the nonlinear equations and is less dependent
on initial values than the Newton–Raphson iterative method. The parameters are shown in
Table 2.
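For the Gamma distribution the nonlinear relationship is mean = shape × scale and variance = shape × scale². Inverting it gives simple moment-based estimates, sketched below in Python. This is the method of moments, not the MLE or Steepest Descent procedure used in the paper; such estimates are often used only as starting values for an iterative fit. The input numbers are the mean and variance of the channel from RS 1 to LB in Table 1.

```python
def gamma_moment_estimates(mean, variance):
    """Invert mean = shape * scale and variance = shape * scale**2."""
    shape = mean ** 2 / variance
    scale = variance / mean
    return shape, scale


shape, scale = gamma_moment_estimates(5.764e-04, 9.373e-08)  # RS 1 to LB, Table 1
print(f"shape = {shape:.2f}, scale = {scale:.3e}")
```

The result is a shape of roughly 3.5 and a scale of roughly 1.6e-04, the same order as the fitted Gamma parameters reported in Table 2.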
Plots based on the shape and scale parameters of Table 2, such as the density plot, the
Q–Q plot and the CDF, can be used to determine which distribution (Gamma or Lognormal)
best fits the empirical data for the channel from real server 1 to the load balancer. For
example, the Gamma CDF with the parameters of Table 2 and the empirical distribution are
plotted in Fig. 10 using the R tool. The Q–Q plots of the Gamma and Lognormal
distributions with the parameters of Table 2 are shown in Figs. 11 and 12, respectively.
The plots indicate that the Gamma distribution fits the channel delays better, so
the Lognormal distribution is rejected.
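The construction behind a Q–Q plot can be sketched with the Python standard library. The paper's plots were produced with R; the sketch below is a separate illustration that pairs each ordered observation with the model quantile at the plotting position (i − 0.5)/n. The Lognormal case is used only because its quantile function is available through `statistics.NormalDist`; a Gamma Q–Q plot needs a numerical inverse CDF from a statistics package. The sample is synthetic and the values of `mu` and `sigma` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def lognormal_qq_points(sample, mu, sigma):
    """Pair each ordered observation with the matching Lognormal(mu, sigma) quantile."""
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()
    points = []
    for i, observed in enumerate(ordered, start=1):
        p = (i - 0.5) / n                       # plotting position, strictly in (0, 1)
        theoretical = math.exp(mu + sigma * std_normal.inv_cdf(p))
        points.append((theoretical, observed))
    return points


# Synthetic delays, NOT the measured data: they are drawn from the candidate model
# itself, so the points should lie close to the line y = x.
rng = random.Random(1)
sample = [rng.lognormvariate(-7.6, 0.5) for _ in range(1000)]
points = lognormal_qq_points(sample, mu=-7.6, sigma=0.5)
```

When the candidate distribution fits, the points fall near the diagonal; systematic curvature, especially in the tails, is the visual evidence for rejecting it.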
Table 2
Estimated parameters of the distribution for the channel from RS 1 to LB
                              MLE               Steepest Descent Method
Gamma      Shape parameter    3.96              3.85259
           Scale parameter    1.456e-04         1.49618576e-004
Lognormal  Shape parameter    1.06525717e-007   6.44682462e-006
           Scale parameter    -7.58917376       -6.93553198
Fig. 10. Plot of the Gamma CDF and the empirical distribution.
Fig. 11. Q–Q plot of the Gamma distribution.
Fig. 12. Q–Q plot of the Lognormal distribution.
5. Simulation with Automod and validation
After the input modeling and the system's conceptual modeling are finished, the
simulation model is programmed in Automod, as shown in Fig. 13, where R stands
for Resource, Q for Queue, and LG for Load Generator (i.e. clients).
Fig. 13. Model in Automod. [Diagram: queue/resource pairs Q_Channel_LG_LB/R_Channel_LG_LB, Q_LB/R_LB, Q_Channel_LB_RS/R_Channel_LB_RS, Q_RS(1)–Q_RS(5)/R_RS(1)–R_RS(5), Q_Channel_RS_LB/R_Channel_RS_LB and Q_Channel_LB_LG/R_Channel_LB_LG.]
In Automod, the uniform distribution is selected for the load generator in the process
procedure "channel_LG_LB". The load generator generates packets, which are delayed before
arriving at the load balancer. In the queue of the load balancer, traffic in both
directions is allowed by means of an attribute variable. The load balancer uses the Round
Robin strategy to deliver packets to one of the real servers. The validation of the
simulation model is carried out by comparing the running system and the model for the
following outputs:
• The current packets in the queues of the real servers, the load balancer and the channels.
• The CPU utilization of the real servers.
For example, when the observed real delay data are used and a low resource capacity is
set for the real servers in Automod, most of the loads are found to be blocked in the
queues of the real servers; the corresponding plot of the current packets in the queues of
the five real servers is shown in Fig. 14. This means that the process capability of the
simulated real servers is much lower than that of the real running system, where almost no
loads are blocked. If the resource capacity is increased, so that more loads can be
processed in parallel and the real servers are better utilized, no loads are blocked in
the queues. Therefore, the capacity should be adjusted according to the real cluster system.
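The effect of the resource capacity can be reproduced qualitatively with a very small Python simulation. It is not the Automod model: it covers only the real-server stage, and the uniform inter-arrival range, the Gamma service-time parameters and the function name are assumptions chosen for illustration. Each server is a first-in, first-out queue with `capacity` parallel slots, and packets are assigned by Round Robin.

```python
import heapq
import random


def simulate(capacity, n_packets=20000, n_servers=5, seed=0):
    """Fraction of packets that must queue at a real server (FIFO, `capacity` slots each)."""
    rng = random.Random(seed)
    # One min-heap per server holding the times at which its slots become free.
    free_at = [[0.0] * capacity for _ in range(n_servers)]
    now, queued = 0.0, 0
    for k in range(n_packets):
        now += rng.uniform(0.0, 2.0e-3)          # assumed uniform inter-arrival time
        slots = free_at[k % n_servers]           # Round Robin choice of real server
        start = max(now, heapq.heappop(slots))   # wait until the earliest slot is free
        queued += start > now                    # counts packets that had to wait
        service = rng.gammavariate(2.0, 2.0e-3)  # assumed Gamma service time, mean 4 ms
        heapq.heappush(slots, start + service)
    return queued / n_packets


for capacity in (1, 4):
    print(f"capacity {capacity}: {simulate(capacity):.1%} of packets queued")
```

With the same arrivals and service times, the low-capacity run queues a large share of the packets while the higher-capacity run queues almost none, which mirrors the behaviour described for Fig. 14.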
Fig. 14. Current packets in the queues of five real servers with a lower resource capacity.
Fig. 15. Average utilizations of five real servers with half a mean.
When the load generation rate is increased, the maximum process capability and the
bottleneck of the system can be found by inspecting the current packets in the queues.
For example, if the rate is doubled (i.e. the mean of the uniform distribution is halved),
the system can still process all the arriving packets. However, the average utilization of
the real servers is then almost 80%, as shown in Fig. 15. In this case, adding one new
real server should be considered.
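A back-of-the-envelope check of this reasoning uses the utilization law: utilization equals arrival rate times mean service time, divided by the number of parallel slots. The Python sketch below assumes that Round Robin shares the load equally among the servers; the rate and service time are hypothetical numbers chosen so that five servers sit at 80%, as reported for Fig. 15.

```python
def per_server_utilization(total_rate, mean_service_time, n_servers, capacity=1):
    """Utilization law for n_servers identical servers sharing the load equally.

    total_rate        -- packets per second arriving at the cluster
    mean_service_time -- seconds of service per packet
    capacity          -- packets a single server can process in parallel
    """
    return (total_rate / n_servers) * mean_service_time / capacity


# Hypothetical numbers chosen so that five servers are 80 % utilized.
rate, service = 1000.0, 4.0e-3
for n in (5, 6):
    print(n, "servers:", round(per_server_utilization(rate, service, n), 3))
```

Under these assumptions a sixth server lowers the utilization from about 0.80 to about 0.67, and any further doubling of the rate with five servers would push the utilization above 1, i.e. beyond what the cluster can sustain.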
6. Conclusions and future work
The experimental platform is based on a cluster Web server consisting of five real
servers. The service times of the system, including the dispatcher and the servers, are
measured and used to estimate the input data for the simulation. The conceptual models of
the load balancer, the real servers and the transport channels are derived. The
performance simulation of the cluster Web server is done step by step using the input
modeling technique, random distribution theory and output analysis, which is a necessary
aid to cost-effective Internet development. The maximum process capability and the
bottleneck of the system can be found by increasing the load generation rate and
inspecting the current packets in the queues. The simulation results suggest that the
number of real servers affects the performance of the real system when more packets have
to be handled. These methods can also be applied to simulation and analysis in other
fields.
Future refinements of the simulation may provide more choices for the scheduling and
balancing strategies in the load balancer and compare the models for the different
virtual servers. The autocorrelation of the observed delays in the system may be dealt
with by making several replications of the simulation.
References
[1] T. Schroeder, S. Goddard, B. Ramamurthy, Scalable Web server clustering technologies, IEEE
Network 14 (3) (2000) 38–45.
[2] A. Iyengar, J. Challenger, P. Dantzig, High-performance Web site design techniques, IEEE Internet
Computing 4 (2) (2000) 17–26.
[3] M.T. Yong, R. Ayani, Comparison of load balancing strategies on cluster-based Web servers,
Simulation 77 (5–6) (2001) 185–195.
[4] V. Cardellini, M. Colajanni, P.S. Yu, Dynamic load balancing on Web-server systems, IEEE Internet
Computing 3 (3) (1999) 28–39.
[5] H. Bryhni, E. Klovning, O. Kure, A comparison of load balancing on Web-server systems, IEEE
Network 14 (4) (2000) 58–63.
[6] J. Nikoukaran, Software selection for simulation in manufacturing: a review, Simulation Practice and
Theory 7 (1) (1999) 1–14.
[7] R Development Core Team, R Manual (Jan. 2002). <http://www.r-project.org/>.
[8] A. Chepurko, Instrumenting a Cluster-based Web Server for Performance Measuring, Master Thesis,
Erlangen-Nuremberg University, 2002.
[9] D.P. Bovet, M. Cesati, Understanding the Linux Kernel, O'Reilly & Associates, Inc., 2000.
[10] A.M. Law, W.D. Kelton, Simulation Modeling and Analysis, third ed., McGraw-Hill Inc., 2000.
[11] R.L. Burden, J.D. Faires, Numerical Analysis, seventh ed., Addison-Wesley Inc., 2000.