JOURNAL OF COMPUTING, VOLUME 3, ISSUE 2, FEBRUARY 2011, ISSN 2151-9617
HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG 1
ALBL, ALBL/HSC algorithms: Towards more scalable, more adaptive and fully utilized balancing systems

Sotirios Kontogiannis, Stavros Valsamidis, and Alexandros Karakos
Abstract: This paper presents the performance characteristics of a non-content aware load balancing algorithm, ALBL, for cluster-based web systems. Based on ALBL potential, a new content aware load balancing algorithm is also introduced, called ALBL/HSC. This algorithm maintains classes of HTTP traffic based on the content of HTTP requests. It then uses the ALBL algorithm to load balance requests per class accordingly. That is, ALBL/HSC maintains separate ALBL processes per HTTP service class for balancing traffic among the web servers assigned to each class.
Performance and scalability gains are shown from tests of ALBL against known balancing algorithms used by web farms, such as Round Robin, Least Connections and Least Loaded. Then, the ALBL/HSC algorithm is put to the test against ALBL over a cluster-based load balancing system. Moreover, CPU performance tests performed at the web switch and performance tests at the web servers indicate both positive features and drawbacks of the content aware ALBL/HSC and the adaptive, content-blind ALBL. This paper also presents new features that can be added to content aware algorithms: clustering prediction, bandwidth and web farm utilization estimation, and sharing mechanisms among classes of HTTP traffic.
Index Terms: Distributed web systems, load balancing, load balancing algorithms, web clusters.
1 INTRODUCTION

The growth of web based applications and the adoption of services over HTTP lead to the building of more efficient web servers and the implementation of different request scheduling policies. The emerging web 2.0 technologies and tools, followed by the Google development toolkit, APIs and many social (Facebook, Twitter), e-health, e-learning (Learning Management Systems) and e-government (electronic elections, citizen services) web services, are mature indications that the WWW, and in turn HTTP, is definitely becoming the most extensive form of traffic for routers, and shall geometrically increase web server processing efforts over the following years.
Focusing on the augmentation of web server performance, distributed web architectures consisting of web server farms were introduced as a unified structure, and load balancing architectures were deployed. Such distributed web architectures are classified into: 1. Distributed web systems, where a number of web servers that sustain different types of web content or services interact with clients. 2. Virtual or geographically located web clusters, where a number of web servers periodically exchange a virtual IP address that the clients use to connect to, or service, in a static manner, different types of clients based on the geographical location of their IP address, and 3. Cluster-based web systems (or web clusters), where a centralized point of connection, called the web switch, exists for serving incoming HTTP requests [1]. In this paper we focus on cluster-based web systems. This does not mean that the algorithms presented in this paper cannot be used by other distributed architectures. A short analysis of cluster-based web systems follows.
A cluster-based web system is comprised of a farm of web servers joined together as a single unit. These servers are interconnected and present a single system image. A cluster-based web system is advertised with a single site name and a virtual IP address (VIP). The front end node of a cluster-based web system is the web switch. The web switch receives all in-bound connections that clients send to the VIP address and routes them to the web server nodes. A cluster-based web system design is comprised of two basic components: the balancing process, which selects the best suited target servers to respond to requests, and the routing mechanism, which redirects clients to the appropriate target servers [1, 2].
Cluster-based systems' routing mechanisms are divided into content aware (layer 7) and non-content aware (layer 4). An analysis of layer 7 and layer 4 routing mechanisms follows.
S. Kontogiannis is with the Electrical and Computer Eng. Department, Democritus University of Thrace, University Campus Kimeria, Xanthi, 67100, Greece.
S. Valsamidis is with the Electrical and Computer Eng. Department, Democritus University of Thrace, University Campus Kimeria, Xanthi, 67100, Greece.
A. Karakos is with the Electrical and Computer Eng. Department, Democritus University of Thrace, University Campus Kimeria, Xanthi, 67100, Greece.
Balancing systems that use layer 7 routing are mainly proxy servers or application gateway systems [3, 4, 5]. Such systems redirect requests at the application layer, causing requests to traverse the entire protocol stack and thus limiting their performance potential as the number of requests increases. Usually such systems try to moderate the degradation of cluster performance by limiting the number of users that request service, or with the use of caching techniques at the web switch. Other implementations of layer 7 routing use kernel space request dispatching techniques instead of user space request dispatching at the web switch. Kernel dispatching mechanisms are: TCP splicing [6, 7, 8], TCP handoff [9], TCP binding [10] and TCP rebuilding, a TCP connection transfer mechanism [11]. In conclusion, there is still a lack of content management mechanisms that would enable efficient content request routing.
Layer 4 routing mechanisms redirect connections based on less sophisticated balancing algorithms, which are unaware of session or application layer attributes. Routing mechanisms used in a non-content aware web switch are the following: Distributed packet rewriting (Direct Routing) [14], IP network address translation, packet tunnelling and link layer packet forwarding (also referred to as MAC address translation) [15, 2].
2 LOAD BALANCING ALGORITHMS FOR CLUSTER-BASED SYSTEMS
We categorize load balancing algorithms used at a web switch into five distinct categories: 1. stateless non-adaptive, 2. stateful non-adaptive, 3. stateless adaptive, 4. stateful adaptive and 5. content aware. As stateful/stateless algorithms, we characterize those algorithms that do or do not keep track of client connection requests. As adaptive/non-adaptive, we characterize those algorithms that take into account web server status metric feedback and adapt their behaviour based on metric transitions accordingly, while content aware algorithms extend adaptive algorithm potential by also investigating HTTP request header information and HTTP request payload size (content length) for balancing decisions.
Stateless non-adaptive algorithms do not consider any kind of system state information. Typical examples of such algorithms are Random and Round Robin. Both Random and Round Robin policies can be easily extended to treat web servers of heterogeneous capacities that remain constant through time [15, 2, 1]. In addition, Weighted Round Robin (WRR) can be used as a stateless balancing algorithm on web servers with different but known processing capacities.
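As a minimal sketch (not the paper's implementation), a naive weighted round-robin selector for servers with static, known capacities can be written as follows; the server names and weights are illustrative:

```python
from itertools import islice

# Naive weighted round-robin sketch (hypothetical, not the paper's code):
# each server is yielded in proportion to its static capacity weight.
def weighted_round_robin(servers):
    """servers: list of (name, weight) pairs with positive integer weights.
    Yields server names in a repeating, weight-proportional sequence."""
    while True:
        for name, weight in servers:
            for _ in range(weight):
                yield name

# Example: server A has twice the capacity of server B.
seq = list(islice(weighted_round_robin([("A", 2), ("B", 1)]), 6))
# seq == ["A", "A", "B", "A", "A", "B"]
```

Production WRR schedulers (e.g. in the Linux kernel) interleave picks more smoothly, but the weight-proportional share of requests per server is the same.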
Stateful non-adaptive algorithms keep track of client connections at the web switch. The main representatives of this category are the Least Connections and Weighted Least Connections (LC-WLC) algorithms [16, 17]. Moreover, the Shortest Expected Delay (SED) and Never Queue scheduling algorithms use a similar approach to the LC algorithm and assign client connections to the web server with the shortest expected delay [18].
Stateless adaptive balancing algorithms take into account web server state metrics but do not keep track of connection state information. These algorithms use monitor agents running on either the web switch or the web servers. The information retrieved from agent lookups is taken into account in order to determine balancing weights. Some commonly used metrics are: CPU load, memory usage, disk usage, current process number and ICMP request-reply time. An example policy algorithm called CLBVM (Central Load Balancing for Virtual Machines) is presented in [19]. In some other cases, web server metric values are stored by SNMP agents at the web servers and retrieved by the web switch SNMP manager [20]. Another load balancing algorithm-protocol is the OpenFlow protocol, which utilizes OpenFlow capable switches and NOX controllers. OpenFlow uses a new algorithm called LOBUS (LOad Balancing over UnStructured networks), presented in [21], that keeps track of statistical information from OpenFlow capable switches in order to allocate and commit network resources for each web server in an unstructured web balancing network.
Adaptive algorithms' weight calculation formulas are based on aggregation of web server metric values, followed by a normalization process. That is, a linear aggregate, in terms of a summation of metric values, gives a load value: $Agg_i^{load}=\sum_{k=1}^{n} K_k\, SM_i^k$, where $SM_i^k$ are the web server metrics used, and $W_i = W_0\left(1+\frac{Agg_i^{load}}{\sum_{i=1}^{n} Agg_i^{load}}\right)$ is the aggregate-load calculated weight per web server [2, 1, 16]. Moreover, non-linear weight calculation processes may also be applicable, such as the following formula, used by the Linux load balancer to calculate web server weights: $W_i = W_0\,\frac{1}{\sqrt[3]{Agg_i^{load}}}$ [22, 23, 17].
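The linear aggregate-and-normalize scheme described above can be sketched directly; the coefficients $K_k$ and the metric samples below are illustrative values, not measurements from the paper:

```python
# Sketch of the linear aggregate-and-normalize weight calculation;
# the coefficients K_k and the metric samples SM_i^k are illustrative.
def aggregate_load(metrics, coeffs):
    """Agg_i = sum_k K_k * SM_i^k for one server's metric vector."""
    return sum(k * m for k, m in zip(coeffs, metrics))

def adaptive_weights(server_metrics, coeffs, w0=1.0):
    """W_i = W_0 * (1 + Agg_i / sum_j Agg_j), per the formula in the text."""
    aggs = [aggregate_load(m, coeffs) for m in server_metrics]
    total = sum(aggs)
    return [w0 * (1 + a / total) for a in aggs]

# Two servers, two metrics each (e.g. CPU load and memory usage):
weights = adaptive_weights([[0.5, 0.2], [0.1, 0.1]], coeffs=[1.0, 2.0])
# weights == [1.75, 1.25]
```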
System components that offer load predictions based on system (web server) resource metrics are called load trackers, and systems that try to predict web server behaviour based on load tracker results are called load predictors. Usually, trackers and predictors do not both operate on the same system, due to performance issues. Load trackers are divided into: 1. Linear, such as simple moving average trackers, $SMA(S_i(t_n))=\frac{\sum_{j=n-M+1}^{n} S_i(t_j)}{M}$, the exponential moving average, $EMA(S_i(t_n))=a\,S_n+(1-a)\,EMA(S_i(t_{n-1}))$, or the simple moving median. Autoregressive models (AR) are also considered linear tracker functions, since they use
linear functions to calculate weights from metric values. The AR model is a linear combination of the past k resource metric values, represented as a vector. The AR load tracker weight value over time is calculated as: $AR(S_i(t))=a_1 S_{t-1}+\ldots+a_k S_{t-k}+e(t)$, where e(t) is a distribution sequence of the difference or deviation of metric values, called the residuals sequence. In addition, ARIMA models (Autoregressive Integrated Moving Average) [24, 25] are obtained from the AR model and the moving average model as a linear combination of the past metric values and q noise terms of the calculated metric values. Based on tracked data measurements over a time period, residual and differential measured data values are calculated and the weight value is predicted. 2. Non-linear load trackers, such as the cubic spline function and the two sided Quartile-Weighted Median (QWM) [24, 25], as well as the previously mentioned Linux local director weight calculation formula [22], can also be used by web servers for load prediction, and in some cases perform better than linear trackers do (more responsive to load incidents). According to [25], AR and ARIMA are inadequate to support run-time decision systems in cases of highly variable workload scenarios.
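As an illustration of a linear load tracker, the exponential moving average $EMA(S(t_n)) = a\,S_n + (1-a)\,EMA(S(t_{n-1}))$ can be sketched as follows; the seed choice (first sample) and the smoothing factor are assumptions:

```python
# Sketch of an exponential moving average load tracker,
# EMA(S(t_n)) = a*S_n + (1-a)*EMA(S(t_{n-1})), seeded with the first sample.
def ema_track(samples, a=0.3):
    """Return the EMA series for a sequence of resource metric samples."""
    out = []
    prev = samples[0]              # seed with the first observation
    for s in samples:
        prev = a * s + (1 - a) * prev
        out.append(prev)
    return out

series = ema_track([10.0, 10.0, 40.0, 10.0], a=0.5)
# series == [10.0, 10.0, 25.0, 17.5] -- the spike at t=2 is smoothed
```

A larger smoothing factor a makes the tracker more responsive to load incidents at the cost of more noise, which mirrors the responsiveness trade-off discussed above.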
Stateful adaptive algorithms use both adaptive algorithm metrics and client flow state information, such as: the number of connections, or the ratio of connections that a server has received to the average connections received at a specific time interval [23, 17], or the source or destination IP address (locality aware based [26, 16]). The predictive probabilistic load balancing algorithm (PPLB) uses adaptive weights based on a utility function that follows the difference and deviation in predicted average and measured web server response time, $F(S(t),\bar{S})$, and the available processing capacity (remaining utilization capability) $U_i$ of each web server: $W_i=U_i(t)(1+F(t))$. The PPLB algorithm then uses a scheduling policy called Probabilistic Preferred Short Job (PPSJ) that uses several classes of web traffic at the web switch. This policy gives precedence to HTTP requests that belong to a class with a relatively shorter service time and a large number of requests waiting for service in its queue [27]. Other stateful adaptive implementations include MALD, which uses agents that instantiate at the web server in order to inform the system about the web server's load, based on an index that takes web switch connections per web server into account [28]. Finally, the simulated annealing load balancing algorithm (SA) uses an energy function that scores each web server based on the following metrics: request rate, request processing rate, web server processing capability and the average waiting time of each request in the web server queue. This algorithm also uses penalty thresholds and penalty drops [29].
Content aware load balancing algorithms are used both by layer-7 cache clusters and layer-4 web switches. Layer-7 switching was implemented in the kernel of operating systems (Linux layer-7 filtering [12, 13]), thus improving the overall performance of previous layer-7 balancing algorithms. The OS kernel layer-7 filtering capability led to the development of new balancing algorithms based on packet content, and of routing based on priority of service. Balancing algorithms based on layer-7 filtering analyze the HTTP headers of requests from clients and adopt blind or cached dispatching policies [30, 31]. Characteristic examples of content aware dispatching policies follow.
The Workload Aware Request Distribution policy (WARD) [32] assigns the most common HTTP requests to the same server, while partitioning the rest of the requests to the same web servers for the same types of requests. Moreover, LARD (Locality Aware Request Distribution) improves the cache hit rate in back end servers by serving the same requests to the same servers [33, 34], while the Client-Aware Policy (CAP) tries to reach load balance by providing multiple classes of service at the web switch [35]. The CWARD/CR and CWARD/FR [33] policies also take into account web server workload and provide content based prioritized classification for HTTP requests.
The combination of content-aware load balancing and HTTP service classification was introduced to meet the demands of complex HTTP requests that combine dynamic web pages, database transactions, multimedia and real-time services. Such mechanisms provide more efficient load balancing for web services, not only per flow but also per service request. LARD/RC distributes requests based on a requests table incorporated at the web switch, which assigns requests of the same type to one cluster of web servers responsible for serving each request type. It also uses a WLC scheduling policy for the assignment of requests [36]. The GCAP policy groups requests based on CAP (content aware request distribution policies) and then assigns requests of the same CAP class to each web server in a WRR fashion [36]. Moreover, the PPLB algorithm classifies web requests into different classes based on their service demands for resources at the web server [27].
3 PROPOSED ALGORITHMS

We designed and implemented a stateful adaptive load balancing algorithm, called Adopt Load BaLancer (ALBL). ALBL tries to predict congestive network conditions and web server load incidents in order to perform its balancing decisions. Preliminary versions of the algorithm were presented in [37, 38]. We also present a new content aware load balancing algorithm, ALBL/HSC (Adopt Load BaLancer with Hierarchical Service Classes), that maintains a classification discipline per HTTP service type. Then it
uses the ALBL algorithm for balancing HTTP requests per service class.
3.1 ALBL Algorithm

The ALBL algorithm uses agents that periodically probe the web servers of a web server farm. Metric values derived from this probing process are used for weight calculation and therefore for balancing decisions at the web switch. We implemented a cluster-based web system that uses the Linux netfilter kernel API [13] for marking and routing client HTTP requests to a pool of web servers. A Linux kernel WRR scheduler uses the calculated weights in order to forward forthcoming requests at the web switch, using Direct Routing or NAT, to the selected web server. Then the web server responds to the request and communicates directly with the client (Direct Routing case) or via the web switch (NAT case). Direct Routing capability was implemented in the latest version of our algorithm, which until recently supported only NAT routing capabilities (the whole HTTP flow passed through the web switch). We also altered the algorithm's behavior towards HTTP flows that are under service. Such flows are marked as serviced flows and are excluded from being forwarded to another web server, if the selected balancing web server changes.
Two metrics are used by ALBL agents to adjust web server weights. These metrics are the HTTP response time and the network delay metric. Metric values are periodically updated by the agents at a predefined interval period Tp. The HTTP response time metric is calculated as follows: the web switch sends an HTTP request for a predefined object or process to each one of the web servers and waits for a reply. Then the time that the request was sent and the time that the web server FINs the request are recorded. The time difference between the request and the FIN reply is equal to the metric value. The HTTP response time metric is equal to the sum of the network propagation delay, the network queuing delay and the web server processing delay. That is: $HTTP_{resp}=Q_{Prop}+Q_n+Q_{Proc}$, where the propagation delay $Q_{Prop}$ is assumed to be equal for all web servers, the queuing delay $Q_n$ is the sum of all the delays encountered by a packet from the time of its insertion into the network until its delivery to its destination, and the processing delay $Q_{Proc}$ is the time needed for a web server scheduled thread to process a request and construct an appropriate reply message. Processing delay depends on the web server CPU load index and the web service response capability.
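The probe described above can be sketched in user space as follows; this is an approximation of the agents' behaviour, not their implementation, and it times the request until the reply is fully read rather than until the server's FIN (the URL is an assumption):

```python
import time
import urllib.request

# Hedged sketch of the HTTP response time probe: time a full request/reply
# cycle for a fixed object on a web server. The paper's agents time the
# request up to the server's FIN; reading the full reply is a close proxy.
def http_response_time(url, timeout=2.0):
    """Seconds from sending the request until the reply is fully read;
    approximates HTTPresp = Q_Prop + Q_n + Q_Proc from the text."""
    t0 = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()                # wait until the server finishes the reply
    return time.monotonic() - t0
```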
Network delay metric calculation is achieved by a minimum size TCP SYN packet constructed at the web switch, with the IP address of the web switch as source and the IP address of each one of the web servers as destination. Agents responsible for network delay calculate the response time between the TCP SYN request and the SYN|ACK reply (TCP half connection). This time difference is an approximation of the propagation and queuing delay of the network path between the web switch and the web server. The network delay metric is a good approximation of the network delay of a small size packet that traverses the network path from the web switch to the web server, if the web server is not overloaded. That is: $Ndelay_i=Q_{Prop}+Q_n=RTT_i$, where $RTT_i$ is the round trip time of the SYN-SYN|ACK PDU. An appropriate probing timeout is set for SYN queuing. The timeout value can be modified according to the network topology and the maximum expected latency of the system. If a timeout occurs, then that web server is removed from the balancing process until the next balancing period Tp.
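A user-space approximation of this probe can be sketched as follows. Note the assumption: the paper times a raw SYN to SYN|ACK half connection, while this sketch times a full connect() handshake, which takes roughly one RTT and gives a comparable estimate without raw sockets:

```python
import socket
import time

# Sketch only: timing connect() (one full TCP handshake, ~one RTT)
# approximates the paper's SYN -> SYN|ACK half-connection measurement.
def network_delay(host, port, timeout=2.0):
    """Approximate Ndelay_i = RTT_i in seconds. None signals that the
    probing timeout (or a connection error) fired, in which case ALBL
    would drop the server from balancing until the next period T_p."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    t0 = time.monotonic()
    try:
        s.connect((host, port))
        return time.monotonic() - t0
    except OSError:
        return None
    finally:
        s.close()
```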
In order to discriminate between web server CPU load and network congestion incidents, a static threshold value is used by the algorithm as follows: if at least one web server has an HTTP response time ($HTTPresp_i$) less than the fixed threshold value, the algorithm concludes that this is due to transient web server load, and network congestion incidents are set to be of less significance. So weight calculation is performed using the product of the HTTP response time ($HTTPresp_i$) with the network delay metric ($ndelay_i$), as in (1), where $k_i$ is a parameter that depends on the requested object size over the network MSS value. If the network delay metric value is smaller than the HTTP response time metric value, (1) does not provide adequate responsiveness towards network congestion incidents, but can fairly estimate web server load status.

$$W_i=\frac{1/\left(HTTPresp_i\, k_i\, ndelay_i\right)}{\sum_{i=1}^{n} 1/\left(HTTPresp_i\, k_i\, ndelay_i\right)} \qquad (1)$$
If all web servers have an HTTP response time metric value greater than the threshold value, then the algorithm assumes that such a bottleneck is mainly caused by network congestion incidents due to burst client requests, followed by persistent load occurrences at the web server. So a more sensitive approach to RTT variations, a non-linear (exponential) approach, is used for weight calculation (use of (2)).

$$W_i=\frac{1/\left(ndelay_i^2\,(HTTPresp_i-l_i\, ndelay_i)\right)}{\sum_{i=1}^{n} 1/\left(ndelay_i^2\,(HTTPresp_i-l_i\, ndelay_i)\right)} \qquad (2)$$
As we can see from (2), the processing delay metric for each web server is estimated by the subtraction of $l_i$
times the network delay metric from the HTTP response time metric, where $l_i$ is the number of packets transmitted and received by the HTTP response time agent (for downloading a specific object from the web server). Then the product of the processing delay metric with the network delay metric is used for weight calculation. The weight value derived from (2) increases exponentially in proportion to the network delay deviation and can spot congestion incidents, because the web server processing time metric values ($T_{Proc_i}$) are closer to the network delay metric values than the HTTP response time metric values are (close to the real processing time). That indicates that (2) is more responsive for conditions where a web server has less computational load than link network delays, or where link network delays forebode persistent load incidents at the end nodes.
Concluding, (1) does not provide adequate responsiveness towards network conditions, but can fairly estimate load conditions. Alternatively, from (2), a processing delay metric value for each web server request is extracted from the HTTP response time metric. The product of the processing delay metric with the square of the network delay metric is more responsive to network delay variations and can spot congestion incidents. ALBL algorithm smoothness or responsiveness is proportional to the periodic probing frequency Tp. If $T_p \le 2T_g$ (where 50ms $\le T_g \le$ 100ms is the web-switch clock granularity), then web-switch computational effort increases dramatically, while if $T_p \ge 20T_g$, then ALBL cannot spot or compensate for short duration congestion or load incidents.
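The two weight rules and the threshold switch described in this section can be sketched together; the threshold and the per-server parameters $k_i$, $l_i$ below are illustrative assumptions, not values from the paper:

```python
# Sketch of the ALBL weight rules (1) and (2) as described in the text.
def albl_weights(http_resp, ndelay, k, l, threshold):
    """http_resp[i], ndelay[i]: metric values per server; k[i], l[i]:
    object-size dependent parameters. Rule (1) applies when at least one
    server's HTTP response time is below the threshold (transient load);
    otherwise the congestion-sensitive rule (2) applies."""
    n = len(http_resp)
    if any(h < threshold for h in http_resp):
        # (1): w_i ~ 1 / (HTTPresp_i * k_i * ndelay_i)
        raw = [1.0 / (http_resp[i] * k[i] * ndelay[i]) for i in range(n)]
    else:
        # (2): w_i ~ 1 / (ndelay_i^2 * (HTTPresp_i - l_i * ndelay_i))
        raw = [1.0 / (ndelay[i] ** 2 * (http_resp[i] - l[i] * ndelay[i]))
               for i in range(n)]
    total = sum(raw)
    return [r / total for r in raw]   # normalized weights, sum to 1

w = albl_weights([0.05, 0.20], [0.01, 0.01], k=[1, 1], l=[1, 1], threshold=0.1)
# server 0 (the faster HTTP response) gets the larger share: w ~ [0.8, 0.2]
```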
3.2 ALBL/HSC Algorithm

The ALBL/HSC algorithm is a content aware extension of the adaptive ALBL algorithm. This algorithm runs separate ALBL algorithm processes for five classes of HTTP requests of different service type, as mentioned in section 3.3. The classes maintained by ALBL/HSC include the following: class 1: static and lightly dynamic HTTP traffic (normal HTTP traffic), class 2: NCQ HTTP traffic, class 3: max throughput HTTP traffic, class 4: multimedia traffic and class 5: SSL traffic. HTTP requests are classified into each one of the previous classes based on the HTTP request URI extension (for multimedia traffic, normal traffic), the HTTP reply content length (NCQ traffic, max throughput traffic) and the request protocol (SSL traffic). A short analysis of the classification and discipline methodology used per HTTP service class follows.
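A hedged sketch of this classification logic follows; the extension sets and the 32 KB max-throughput cut-off are illustrative assumptions (the paper does not list them), while the class numbering and the NCQ threshold default follow the text:

```python
# Hedged sketch of the ALBL/HSC classifier: URI extension, reply content
# length and protocol select the class. Extension sets and the 32 KB
# max-throughput cut-off are illustrative assumptions.
MULTIMEDIA_EXT = {".mp4", ".avi", ".mp3", ".flv"}
STATIC_EXT = {".html", ".htm", ".css", ".js", ".png", ".jpg", ".gif"}

def classify(uri, content_length=None, is_ssl=False, ncq_threshold=450):
    """Return the ALBL/HSC service class (1-5) for one HTTP request."""
    if is_ssl:
        return 5                                  # class 5: SSL traffic
    ext = ("." + uri.rsplit(".", 1)[-1].lower()) if "." in uri else ""
    if ext in MULTIMEDIA_EXT:
        return 4                                  # class 4: multimedia
    if ext in STATIC_EXT:
        return 1                                  # class 1: normal HTTP
    if content_length is not None:
        if content_length <= ncq_threshold:
            return 2                              # class 2: NCQ traffic
        if content_length > 32 * 1024:            # assumed cut-off
            return 3                              # class 3: max throughput
    return 1                                      # default: normal HTTP
```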
3.3 ALBL/HSC service classes

Web services maintain different characteristics in terms of bandwidth usage and web server utilization. Web services include real-time, interactive and multimedia services, services that require secure communication channels, video on demand, file transfer services, gaming, conference broadcast services, informatory services (blogs, forums, wikis, rings, etc.) and other types of service.
The HTTP traffic pattern is based on a request-reply mechanism, where the request is usually a small packet of no more than 512 bytes and the reply is a set of PUSH packets whose size depends on the web service that is requested (multimedia data, file transfer, news HTML report, a script that executes either on the client or the server). Acknowledgments follow after the PUSH TCP packets at the end of the reply (if the reply is of 2RTO time, or at k RTT time intervals). There is of course the case of burst UDP packet ACK replies (for multimedia types of traffic). In conclusion, the response time for small size HTTP packets is constrained by the TCP slow start mechanism and large packets, and thus presents bigger throughput deviation (RTT deviation), while large TCP packets are favoured more by the TCP congestion avoidance mechanism and achieve a much more constant throughput (smaller RTT deviations as packet size increases) [39].
Distinct patterns of web service utilization efforts (CPU-memory or BW resources) lead to the demand for web service classification, performed either at the endpoint web server or by the intermediate routers. Web service classification attempts are presented in [35, 40, 27, 30], with formalised categories such as web publishing (static and lightly dynamic services), web transaction (disk bounded services), web commerce and web multimedia services. Another web service classification attempt is also presented in [41], with issuance, affair, dynamic security and multimedia types of service.
Taking into account the previous web service categories, we take the assumptions one step further. In order to preserve and cope with the distinct characteristics of web services such as real-time interactive and multimedia, to provide a separate channel for web cryptographic services (close to an anonymous channel), and by taking into consideration the aforementioned studies, ALBL/HSC disciplines HTTP traffic into the following categories (service classes):
Normal HTTP Traffic: This type of traffic is provided either by (a) static or (b) lightly dynamic content information requested by clients. Discrimination between static and dynamic content that may consume web server resources is left to web server metrics. Dynamic content data may over-utilize web server resources more than static content data, but both categories (a) and (b) present similar characteristics from the network's point of view, though not for the web server. Further classification may be required and is set as a future study.
Non Congestive Real-time traffic: This type of traffic includes flows that use small packets in terms of size, whose priority is predetermined by a static NCQ threshold rate [42, 43]. These HTTP flows must experience minimum network delay and high priority precedence. TCP acknowledgments that do not belong to the aforementioned HTTP flows are excluded from this type of traffic, so as not to drain the NCQ mechanism of instant service for small packets (the NCQ threshold is set to more than 75 and less than 450 bytes).
Maximum Throughput Traffic: Maximum throughput, or
congestive interactive, HTTP traffic flows contain packet sizes near the network MTU size, and usually such flows operate in bursts. Typical examples of this category of HTTP traffic are: HTTP download chunks, P2P application traffic over HTTP and huge data size up-link HTTP POSTs.
Congestive Multimedia Traffic: Multimedia traffic is HTTP or non-HTTP streaming video and audio traffic, instantiated by HTTP requests. Non-HTTP multimedia traffic is mostly carried by UDP flows or UDP encapsulated packets over HTTP. Such traffic must be taken into consideration, as it can degrade web server performance due to its persistent nature.
Secure HTTP Traffic: This type of HTTP traffic is provided by the SSL-TLS protocol suite for secure web services such as web commerce. This type of traffic makes intensive use of web server CPU resources. In our case, a separate service class was assigned for SSL traffic in order to maintain anonymous channel attributes.
3.4 ALBL/HSC algorithm and classification schema

The ALBL/HSC discipline of the five distinct service classes is accomplished with a set of layer-7 filter rules [12] per class at the web switch. In addition, for each class a farm of at least two web servers is used for serving requests, as depicted in Fig. 1. We implemented the ALBL/HSC mechanism in the Linux kernel by using the Linux kernel layer-7 [13, 12] filter mechanism for the packet filtering and marking process and the Linux traffic control mechanism for the classification and queuing process [44].
Each packet entering the web switch is marked by a layer-7 filter and led to one of the WWW traffic classes, where a queuing discipline is maintained that supports multiple drop precedence levels. We mark HTTP traffic following the AF (Assured Forwarding) marking schema used by CISCO routers. Assured Forwarding (AF) provides forwarding of IP packets in N independent classes. Within each AF class, an IP packet is assigned one of M different levels of drop precedence. An IP packet that belongs to an AF class i and has drop precedence j is marked with the AF codepoint $AF_{ij}$, where $1 \le i \le N$ and $1 \le j \le M$.
Currently, AF supports only four classes (N=4) with three levels of drop precedence in each class (M=3) for general use, where AFx1 yields the lowest loss probability and AFx2 and AFx3 yield higher loss probabilities respectively. Each IP packet enters a specific AFxy class based on the IP packet DS (Differentiated Services) field, or DSCP value (Differentiated Services Code Point). That is, the 6 leftmost bits of the IP ToS field, excluding the two rightmost bits (LSB) used by the ECN (Explicit Congestion Notification) mechanism (see Table 1).
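The $AF_{ij}$ codepoints follow the standard Assured Forwarding layout (RFC 2597): three class bits followed by the drop precedence bits. As a small worked example:

```python
# DSCP value of an AF_ij codepoint per the standard Assured Forwarding
# layout (RFC 2597): three class bits, then the drop precedence.
def af_dscp(i, j):
    """DSCP value of codepoint AF_ij, for class 1 <= i <= 4 and drop
    precedence 1 <= j <= 3."""
    if not (1 <= i <= 4 and 1 <= j <= 3):
        raise ValueError("AF defines classes 1-4 with drop precedences 1-3")
    return (i << 3) | (j << 1)   # e.g. AF11 -> 10, AF41 -> 34
```

Since the DSCP occupies the 6 leftmost bits of the ToS byte (the two rightmost being the ECN bits, as noted above), the ToS byte value is `dscp << 2`.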
For each service class we use GRED [45], a generalized RED [46, 47] queuing discipline that supports multiple drop priorities (16 virtual queues with independent or prioritized drop probability parameters), as required for Assured Forwarding [44, 48]. GRED functionality is similar to the RED queuing discipline, where the average queue value qave sets each queue to a non-drop state, to a probabilistic drop state if qave exceeds threshold_min, and to a full drop state if qave exceeds threshold_max [46]. GRED also offers a priority mechanism for its 16 virtual queues, called the GRIO mechanism, that operates similarly to the PRIO discipline. That is, for VQ1: qave = qave_VQ1, for VQ2: qave = qave_VQ1 + qave_VQ2, and so forth for all 16 queues. The ALBL/HSC implementation uses the non-GRIO version of the GRED discipline. GRED marks the 4 least significant bits of a field called tc_index that is attached to the packet's sk_buff buffer as it traverses the web switch (for each packet entering or leaving the web switch, a buffer space called the sk_buff structure is allocated and initialized to the header and data values of the packet). The sk_buff structure also includes an additional field for operations required by the GRED, DSMARK and INGRESS queuing disciplines, the tc_index field. Another queuing discipline, DSMARK, is responsible for copying the DS field from the packet's IP ToS header to the tc_index value.
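The per-virtual-queue behaviour described above follows standard RED logic. A minimal sketch (illustrative function and parameter names; max_p is an assumed maximum drop probability, as in Table 1):

```python
import random

def gred_action(qave, th_min, th_max, max_p):
    # RED-style decision used per GRED virtual queue: no drop below th_min,
    # probabilistic drop between th_min and th_max, full drop above th_max.
    if qave < th_min:
        return "enqueue"
    if qave >= th_max:
        return "drop"
    # drop probability grows linearly from 0 to max_p across the band
    p = max_p * (qave - th_min) / (th_max - th_min)
    return "drop" if random.random() < p else "enqueue"
```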
ALBL/HSC classification functionality is as follows: first, each packet entering the web switch is marked with a set of layer-7 rules, using generic AF marking AF1-AF4 (see Table 1), where AF1 is for multimedia class traffic, AF2 for NCQ traffic, AF3 for normal HTTP traffic, AF4 for max throughput traffic and EF for SSL traffic. The EF class does not use a RED queue, but a TBF queuing discipline with an administrator-specified maximum latency value and a higher priority than the entire GRED discipline. That is, the bandwidth delay product of SSL traffic is assured by provisions made by network administrators.

Fig. 1. ALBL/HSC high level architecture design.

In order
to initialize each one of the remaining four classes, some parameters need to be set by the network administrator. These correspond to the expected bandwidth share (%BW, a fraction of the link bandwidth BW) that each class utilizes, and the maximum desired latency that a flow may suffer passing through each class. These values need to be set by the administrator so that the threshold_min and threshold_max values can be calculated from equations (3) and (4) [49, 45, 50]:

threshold_max = (0.01 · %BW · BW · latency_max) / (avpkt · 8 bit/byte · 1000 ms/sec)   (3)

threshold_min = (1/3) · threshold_max   (4)

where avpkt is the average packet length as calculated from the HTTP traffic that traverses the web switch.
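Equations (3) and (4) can be sketched as follows, assuming BW in bit/s, latency in ms and avpkt in bytes (function and parameter names are illustrative, not taken from the implementation):

```python
def gred_thresholds(bw_bps, share_pct, latency_ms, avpkt_bytes):
    # eq. (3): the class's share of the bandwidth-delay product, in packets
    th_max = 0.01 * share_pct * bw_bps * latency_ms / (avpkt_bytes * 8 * 1000)
    # eq. (4): minimum threshold is one third of the maximum
    th_min = th_max / 3.0
    return th_min, th_max

# e.g. 50% share of a 100 Mbit/s link, 250 ms latency, 1000-byte avg packets
print(gred_thresholds(100e6, 50, 250, 1000))
```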
For each one of the 4 classes we have three levels of drop precedence (see Table 1). Initially, all incoming connections for a specific class enter via the first level of drop precedence. Then, for each class, based on the ALBL algorithm metric values, an additional exponentially weighted moving average (EWMA) of the network delay metric values is calculated for all web servers belonging to the class, using the following equation:

NDP = a · max(ndelay_i) + (1 - a) · (1/n) · Σ_{i=1..n} ndelay_i   (5)

where i = 1..n is the number of web servers assigned to each service class. This weighted value is called the Network Delay Product (NDP) of all web servers in one class. The parameter a is based on network delay measurements and is set from experimental results to a value of a = 0.2. If responsiveness towards the transient network delays that packets suffer traversing the network (a network with lossy links) needs to be increased, then this value is better set to a = 0.7. Based on the number of active connections for each class at the web switch, the ALBL/HSC algorithm calculates the active Class Bandwidth Delay (CBD) product metric value per class (see Eq. (6)).
CBD_i = C_i · ((1 - b) · Thr_i/C_i + b · Thr_avg,i/C_i) · NDP_i   (6)

where i = 1..4 (the 4 service classes), NDP_i is the Network Delay Product value per class, Thr_i and Thr_avg,i are the current and average throughput for each class, and C_i is the number of active connections currently sustained by each class. b = (2/3)·a is a coefficient parameter for the CBD metric value calculation. Based on the CBD metric value, HTTP flows that belong to classes 1-4 may fall into each one of the class probabilistic drop phases according to the following criteria:
1. Fall to probabilistic drop of AFx1, if qave is below threshold_min and the CBD value is less than: min((1/3) · 0.01 · %BW · BW · latency / C_i, ndelay_i)
2. Fall to probabilistic drop of AFx2, if qave exceeds threshold_min and the CBD value is less than: min((2/3) · 0.01 · %BW · BW · latency / C_i, ndelay_i)
3. Fall to probabilistic drop of AFx3, if qave exceeds threshold_min and the CBD value is more than: min((2/3) · 0.01 · %BW · BW · latency / C_i, ndelay_i)
3.5 ALBL/HSC prediction of scalability and BW sharing
The ALBL/HSC algorithm also has the capability to predict scalability enhancements for each class, based on the average connections per class, and the expected bandwidth share increase or decrease for each class (these mechanisms were not adapted into GRED but left as prediction metrics for the administrator; their adaptation is considered future work). We describe these prediction mechanisms separately in the following sections.
TABLE 1
ASSURED FORWARDING PHB GROUP DSCP VALUES

Group Class | Drop Class / Drop Probability | DSCP
AF1 | 1 / 0.02  | 0x28
AF1 | 2 / 0.03  | 0x30
AF1 | 3 / 0.07  | 0x38
AF2 | 1 / 0.005 | 0x48
AF2 | 2 / 0.01  | 0x50
AF2 | 3 / 0.015 | 0x58
AF3 | 1 / 0.01  | 0x68
AF3 | 2 / 0.15  | 0x70
AF3 | 3 / 0.02  | 0x78
AF4 | 1 / 0.01  | 0x88
AF4 | 2 / 0.02  | 0x90
AF4 | 3 / 0.03  | 0x98
EF  | 0         | 0xb8
3.5.1 ALBL/HSC scalability prediction
We compare the scalability of an M/M/1 queuing system of 1 web server with an M/M/m system of m web servers. Both systems use FIFO queues and utilize open queuing networks with random inter-arrival and service times (following a non heavy-tailed exponential distribution). We denote as λ, μ and λ', μ' the average inter-arrival and service rates for the M/M/1 and M/M/m systems accordingly, of infinite queuing length. A Poisson process and exponential distribution are assumed and usually used for the simulation of request and servicing rates of HTTP traffic (moreover, heavy-tailed distributions imprint HTTP traffic characteristics more efficiently and may also be applicable). This is the case for HTTP traffic created from independent sessions that correspond to bursty flows with large request spacing inside a session (inter-flow interval) [39]. For the M/M/1 system we have the following:
N_q = ρ/(1 - ρ) and W_q = 1/(μ(1 - ρ)), where N_q is the number of requests in the queue, W_q is the mean response time in the queue and ρ = λ/μ = 1 - p_0 is the utilization, the fraction of time the server is busy. Additionally, for the M/M/m system, with ρ' = λ'/(mμ'), we have:

N'_q = D_m(ρ') · ρ'/(1 - ρ'), where D_m(ρ) = ((mρ)^m / (m! · (1 - ρ))) · p_0

is the probability that all servers are busy (no server available), and W'_q = D_m(ρ') · ρ'/(λ'(1 - ρ')).
We make the assumption that for both load balancing systems, of 1 and m web servers, the request and service rates follow Poisson process distributions and converge to the same average rate values: λ' = λ and μ' = μ (so that ρ' = ρ/m). Based on this assumption, for the M/M/1 and M/M/m queuing systems we see that there is a relationship between the number of servicing web servers in a cluster based web system and the rate of incoming HTTP requests at the web switch. That is, depending on the number and rate of incoming requests, the scalability of a cluster based system must converge to a specific number of available web servers. In other words, if the HTTP request rate decreases, then clusters scaled up to a large number of nodes (highly scalable) are not the best balancing solution in terms of performance. If we set C_k = 1/W_q, then C_k = μ(1 - ρ) and C'_k = 1/W'_q = λ'(1 - ρ')/(D_m(ρ') · ρ'). There is a relationship between C_k and C'_k:

1/C'_k = (1/C_k) · (1 - ρ) · D_m(ρ') / (m - ρ)   (7)

From (7) we conclude that the waiting time in the queue for an M/M/m system can be calculated in terms of the waiting time in the queue of an M/M/1 system as follows:

W'_q = W_q · (1 - ρ) · D_m(ρ') / (m - ρ)   (8)
From (8) it is obvious that as the number of web servers increases in a cluster based web system (meaning scalability increases), the waiting time in the queue of such a system decreases in comparison to a single web server system. What is also obvious from (8) is that the HTTP waiting time in the queue of an M/M/m system is also highly dependent on the system utilization factor ρ. Meaning, as the utilization of such a system decreases, we may achieve the same waiting times for HTTP requests with fewer web servers m, and as utilization increases, we can keep constant waiting times by increasing the number of web servers the system utilizes accordingly. This is in fact true for a highly scalable balancing algorithm that can scale up to a large number of web servers without adding significant performance drawbacks to the whole system.
Now let us assume that all m web servers of an M/M/m system are fully busy, with a factor D_m(ρ') → 1, D_m(ρ') = 0.999, and set f_q = 1/W_q. We then have:

f'²_q = 0.999 · f⁴_q · (m - 1) · (1 - ρ)/ρ

Setting f'_q = f_q and solving for m leads to the following equation:

m = (1/f²_q) · (1/(1 - ρ) - 1) · (1/0.999) + 1 ≈ (1/f²_q) · (1/(1 - ρ) - 1) + 1   (9)

From (9), it is obvious that there is a correlation between the scalability of a multi-server system, the queue waiting frequency f_q of an M/M/1 system and the parameter ρ. That is, the scalability of an M/M/m system depends on the clients' request rate and the servers' response rate. Both request and response rates can be calculated at the web switch.
Prediction of an increase or decrease in scalability per class of HTTP service is based on (9), the scalability prediction equation. According to (9), the minimum number of web servers needed to service requests for a class, which arrive at the web switch at a rate λ and receive service at a rate μ, can be predicted if we calculate ρ. Our prediction mechanism uses a λ calculated as λ = dC_active/dt, where C_active is the number of active connections for each class, and a μ calculated as μ = dC_TIME_WAIT/dt, where C_TIME_WAIT is the number of active connections that are left to the TCP TIME_WAIT state. Based on the previous equations, and if ρ = λ/μ < 1, then from (9) we can calculate the minimum number of web servers m needed to deliver HTTP requests efficiently, as well as a deviation parameter, according to the following formula: u_i = ln(1 + m_i/m̄), where m̄ is the average of the m_i values and i = 1..n are the dt intervals over which each m_i value is calculated. Then a prediction deviation parameter σ²_n of the number of web servers to scale is calculated as σ²_n = (1/n) · Σ_{i=1..n} u²_i, and the average deviation factor σ² is calculated using an EWMA with parameter β = 0.94, over a series of k intervals of total duration T, where T = k_c · Σ_{i=1..n} dt_i and k_c is a frequency parameter value set by the administrator:

σ²_j = β · σ²_{j-1} + (1 - β) · u²_j, for j = 1..k   (10)
3.5.2 ALBL/HSC bandwidth sharing estimation
Bandwidth increase/decrease prediction estimation per traffic class is based on the total number of connections serviced in each one of the three probabilistic drop precedence states of each class, and on a ranking vector S = (r1, r2, r3), where r1, r2, r3 are rankings set by the administrator that correspond to the class drop precedences x1, x2, x3 (with r1 + r2 + r3 = 1 and r1: 0.7..0.9, r2: 0.25..0.07, r3: 0.05..0.03), and the vector Iv = (1, 0, 0). Based on the previous, a bandwidth increase is predicted for a class, at a BW fraction of ((r2·C2 + r3·C3)/(C2 + C3)) · Thr_i of the expected class bandwidth, if:

(r1, r2, r3) · (C1/C, C2/C, C3/C) ≥ Iv · (C1/C, C2/C, C3/C), where C = C1 + C2 + C3   (11)

and the BW fraction actually given to a class (1..4) shall be min(BW_available, BW_increase_fraction), the minimum of the available BW and the BW increase fraction derived from (11).
One class may be in a bandwidth decrease phase only if all connections for that class exist only in probabilistic state AFx1 and at least one of the other classes asks for a bandwidth increase. The BW decrease of a class, if all its connections are in probabilistic state AFx1 for a period Tp, follows a multiplicative decrease schema: BW'_class = (2/3) · BW_class. Finally, if at least one class asks for a bandwidth increase, then the class with the most connections in probabilistic state AFx1 will be selected to perform an outbound BW decrease that corresponds to the BW increase fraction.
4 TESTBED SCENARIOS
For the experimental scenarios, we used a web cluster of 2 and 5 web servers of equivalent processing power and web content, connected to a 100Mbit web switch. Apache version 2.2 is the web server software, and the default prefork MPM is used by Apache as the process memory management model for HTTP requests. We performed the cluster tests with the httperf tool [51]. The web switch CPU operates at 2GHz with 1Gb of available memory. The operating system of the web switch is a custom Linux OS that uses Linux netfilter [13, 12] and the Traffic Control tool (TC) for the queueing discipline process [44].
The experimental scenarios were performed with a cluster of 24 Quad Core Pentium clients that send HTTP requests (client farm) to the web servers that constitute the cluster's web server farm. All web servers have the same web content and no content replication occurs. The generated HTTP requests are controlled by a central client node, assigned with the task of equally distributing the HTTP request workload among the client nodes. Each one of these requests tries to obtain random HTTP objects from the web servers. The submission rate step (contention increase step) of
the requests varies for each experiment accordingly. That
is, for each scenario, a fixed set of request rates per second
is maintained until a total number of 24,000 requests are
received and for each rate the following performance
metrics of the web cluster system are calculated:
-Average Throughput, in terms of the average web cluster response rate caused by an aggregate of HTTP client requests. That is, the average throughput achieved by bundles of a fixed k requests per second, until n requests are received:

Thr_avg = (1/(n/k)) · Σ_{i=1..n/k} Thr_i
-Average Response time: We calculate response time as the
total response time of an aggregate of k HTTP requests.
This value is then averaged until n HTTP requests are
serviced by the web-switch.
-Scalability Factor (SF): We define as Scalability Factor (SF) the percentage of HTTP traffic, in terms of throughput, that our cluster gained as we increased the number of web servers balancing the clients from 2 to 5. It is calculated per request rate experiment using the following equation:

SF = (Thr_{s=5} - Thr_{s=2}) / min(max(Thr_{s=2}), max(Thr_{s=5}))

where Thr_{s=5} and Thr_{s=2} are the average throughputs per request rate experiment, using 5 and 2 web servers accordingly.
-CPU usage: This is the average CPU load per minute of each load balancing algorithm's operation, measured at the web switch as a percentage of system utilization during the respective period.
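The first two metrics can be sketched as follows (illustrative helper names; the throughput samples are the per-bundle averages described above):

```python
def avg_throughput(thr_samples):
    # average of the per-bundle throughput samples (n/k bundles in total)
    return sum(thr_samples) / len(thr_samples)

def scalability_factor(thr_s5, thr_s2):
    # SF: throughput gained by scaling from 2 to 5 web servers, normalised
    # by the smaller of the two peak throughputs of the experiment
    return ((avg_throughput(thr_s5) - avg_throughput(thr_s2))
            / min(max(thr_s2), max(thr_s5)))
```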
In experimental scenarios I and II we investigate the performance and scalability of the following balancing algorithms: Round Robin (RR), Least Connections (LC) and Least Loaded (LL). Then we compare these results with measurements of the ALBL implementation. In experimental scenario III, we investigate the performance of ALBL/HSC and ALBL, while also testing the CPU processing effort required by both algorithms at the web switch.
4.1 ALBL performance scenario I
In this scenario we test the performance of the RR, LC, LLoad (LL) and ALBL balancing algorithms in the case of a web server with limited network resources due to congested network link paths. In order to achieve a
congested link, we shape the traffic that passes through
the first web server, using a Token Bucket Filter (TBF
queuing discipline) [48, 52, 13]. This queuing discipline
limits only the down-link bandwidth of the web server, at a rate of 2 Mbit/s, with the use of a queue whose size is expressed as a 250ms average latency and a burst value of 125 Kbytes. The other web servers are connected to a
100Mbit link. HTTP clients request 10Kb random objects from the web cluster. This operation is initially performed at a rate of 20 HTTP requests per second and then
the same experiment is repeated, each time with a contention increase of 20 requests per second, until a total number of 24,000 HTTP requests is reached. We also set the client request timeout value from 10 to 5 seconds in order to ensure that the bottleneck for the first web server will mainly be its network link to the web switch and not a transient web server load. We test the ALBL algorithm (ALBLND in Fig. 2) with a low threshold value that causes the network delay metric to be of importance for the weight calculation process (use of (2)).
From Fig. 2, it is obvious that for low client request rate
values all algorithms perform the same in terms of
throughput (from 20 till 120 req/sec), with the exception
of LC algorithm that outperforms all tested algorithms at
80, 100 req/sec and LLoad (LL) algorithm that fluctuates
at 80 and 120 req/sec with an average deviation value of 20 Kbytes/sec per bunch of 20 requests. RR shows the least average throughput per bunch of requests. In contrast, the ALBLND algorithm presents a linear performance which increases steadily for request rates below 120. The fluctuations of ALBLND above 120 req/sec provide a positive gain in terms of throughput and do not fall below the expected linear behaviour of the algorithm. The ALBLND average performance gain in terms of throughput above 180 req/sec is shown in Table 2.
This is proof that ALBLND successfully handles congestion incidents at the first web server, due to its high responsiveness towards changes in link network delay or in the web server's processing time of a request. This responsive behavior pinpoints ALBLND's aggressive characteristics towards network congestion or web server load change. The aforementioned algorithms, in contrast, do not have the sensitivity to detect network link changes, and if they do detect a load change, they may use conservative mechanisms to mediate the problem (send one request to the least loaded server and each time check for the one that has the least connections). On the other hand, ALBLND can spot network congestion incidents using a fine grained probing mechanism (max 10ms granularity on a 100Mbit network, max 1ms granularity on a Gbit network) and deal with them using a more aggressive approach: send a burst of requests to the least delayed web server and then probe again. Such an aggressive mechanism is not successful when it comes to transient or random network errors, and in these cases ALBLND may lead the system to unbalanced conditions (that is why in such cases we use the less responsive weight calculation of the ALBL algorithm, derived from equation (1)).
4.2 ALBL scalability scenario II

Fig. 2. Scenario I, throughput in Kb/sec over low contention increase of clients' Req/sec, over a total of 24,000 HTTP connections. ALBLND is the ALBL algorithm that uses equation (2) for weight calculation.

TABLE 2
AVERAGE THROUGHPUT GAIN (KB/SEC) PER 100 HTTP REQUESTS OF ALBLND OVER THE LC AND LLOAD ALGORITHMS

Req. Rate | LC Thr Gain | LL Thr Gain
150-250   | 87.25       | 42
250-350   | 99.75       | 62.2
350-400   | 101.5       | 142.75

In this scenario we test the scalability of the LC, LLoad, ALBL (use of (1)) and ALBLND (use of (2)) algorithms
with the use of the SF metric. In order to calculate SF values, we retrieve a 10Kb object from the web cluster. This operation is performed at rates that escalate from 20 till 400 HTTP requests per second, using two web servers behind the web switch and 1Kb object requests. Then we change the number of web servers at the web cluster from 2 to 5 and repeat the whole experiment. The web servers used are of equivalent processing power and are not loaded.
The SF metric shows the percentage of throughput gain in the average throughput achieved by the previous algorithms if we increase the number of web servers in the web farm from 2 to 5. As we can see in Fig. 3, the LC algorithm presents the highest throughput gain as we scale from 2 to 5 web servers. Adaptive algorithms like LLoad and ALBLND follow, with an average difference of 0.08% from the LC algorithm. Finally, ALBL presents the worst scalability factor, averaging nearly 0.31% less than the SF value of the LC algorithm.
LC's best scalability is due to the algorithm's fast decisions at the web switch, which do not depend on complex calculations or feedback information from the web servers. As the number of web server nodes increases, the LLoad algorithm depends on more feedback load information from the web servers. Increasing the probing frequency of LLoad brings its SF performance near the performance line of LC, but it still cannot outperform it. The ALBLND algorithm (which uses equation (2) for the WC process) is also out-scaled by LC, due to the algorithm's agent performance at the web switch. As probing agent frequency and responsiveness decrease, the ALBL algorithm's scaling performance decreases to a maximum of 0.5% below that of the LC algorithm's scaling performance. This is of course the case for a balancing system where all network links are of equivalent network capacities and all web servers have equal loads.
4.3 ALBL/HSC performance scenario III
In this scenario we examine the performance of ALBL against ALBL/HSC in a cluster based environment of 4 web servers, where half the clients request static HTTP content of NCQ size (200-250 Byte random size objects), while the other half of the clients request HTTP content of max throughput size (download of a large 100MB file over HTTP). This operation is performed at rates that escalate from 40 to 400 HTTP requests per second (20 req/s for the first class and 20 for the second, until 200 req/s for the first class and 200 for the second). This contention increase mechanism ends when a total number of 24,000 HTTP requests have been successfully delivered service. First, we perform the test using the ALBL algorithm with weight calculation derived from (1). Then we perform the same experiment using ALBL/HSC and assign 2
web servers for servicing the NCQ class and 2 web servers for servicing the Max throughput class. We also send packets arriving for other classes to the Max throughput class. We adjusted the expected BW parameter for both the NCQ and Max throughput classes to half of the available link BW (100Mbit/s), that is 6250 Kb/s, and set the maximum expected latency for the Max throughput class, based on an MTU packet length, to 0.3ms. Accordingly, we maintain an average 250 Byte packet size for the NCQ class and a latency of 0.05ms. Because ACK packets, due to their small size, affect ALBL/HSC performance, we pass ACKs through both classes (ACKs for a Max throughput packet via the Max throughput class, and ACKs for an NCQ packet via the NCQ class).
Fig. 3. Scenario II, % clustering algorithms scaling factor SF over clients' Req/sec.

Fig. 4. Scenario III(a), ALBL and ALBL/HSC average HTTP response time over client requests for NCQ and Max Throughput traffic accordingly.

We calculate the average response time per aggregated bunch of requests for the ALBL and ALBL/HSC algorithms, for the NCQ and Max Throughput HTTP classes. In Fig. 4, the ALBL/HSC NCQ class shows a lower response time than the NCQ traffic of the classless ALBL algorithm. This is of course not the case for the ALBL/HSC Max throughput class, which increases its average response time over the Max throughput class of the ALBL algorithm. This can be
explained by the functionality of the class mechanism of
ALBL/HSC that favors small NCQ HTTP flows and limits
burstiness and aggressiveness of larger packet size HTTP
flows by setting an extra queuing delay to those packets
via the GRED mechanism (packet drops/ TCP retransmis-
sion and resetting of the TCP window size).
Favouring small packets over large ones is usually
beneficial for the overall system in terms of throughput.
Fig. 5 depicts the average aggregated throughput of the number of requests (equal to the request rate) over request rate, for both the ALBL and ALBL/HSC algorithms. For this scenario's results, the ALBL/HSC algorithm outperforms ALBL by providing an average of 0.55 Kb/s more throughput per HTTP request. Since the average throughput per HTTP request over all request rates using the ALBL algorithm is 4.5 Kb/s, ALBL/HSC performs in general 12.2% better than ALBL in terms of throughput.
4.3.1 ALBL/HSC and ALBL web switch CPU effort, scenario III(c)
We also calculated, for the whole duration of each experiment, the average % CPU uptime value at the web switch. % CPU uptime values range from 0.01 to 1.0 (it is possible to have values greater than 1.0). The web switch system enters a CPU overloaded phase at an absolute uptime value above 0.9, while a value above 0.7 is considered critical and affects the network delay of incoming HTTP traffic. The % CPU processing time over the number of simultaneous incoming HTTP requests at the web switch is depicted in Fig. 6. The number of HTTP requests reached for every request rate experiment is a total of 24,000.
It is obvious from Fig. 6 that ALBL/HSC utilizes 2.5 to 3 times more web switch CPU processing power than ALBL, due to its complexity, class mechanism and layer-7 marking mechanism efforts. Also, the ALBL algorithm increases web switch CPU utilization exponentially (Fig. 6, ALBL, 280-400 req/sec), proportionally to the increase of the request rate. This makes the request rate a more critical factor for the web switch forwarding capability than the number of maintained connections. The paradox in this case is that the ALBL/HSC algorithm presents a more logarithmic increase in CPU processing power over request rate. The conclusion that can be clearly drawn from Fig. 6 is that content aware algorithms with classes of service need at least 60-200% more CPU processing effort than adaptive content blind algorithms that use probing agents or the number of connections to balance among a number of web servers. The profit of such performance gains in our case is presented in Table 3. In conclusion, for a throughput gain of 120 KB/s provided by the content aware algorithm, 126% more CPU effort is required than for the content blind load balancing algorithm. That is, each 1% of additional CPU effort of ALBL/HSC may lead to 1 KB/s of throughput gain over ALBL.
5 CONCLUSION
In this paper we present the ALBL balancing algorithm for cluster based web systems. ALBL's main advantage is that it takes into account the web servers' CPU load as well as network conditions for the balancing process.
Fig. 5. Scenario III(b), ALBL and ALBL/HSC performance in terms of throughput over client requests for both NCQ and Max Throughput classes.

Fig. 6. Scenario III(c), web switch % CPU performance of the ALBL and ALBL/HSC algorithms over the incoming connection request rate at the web switch.
TABLE 3
AVERAGE % ADDITIONAL CPU EFFORT OF THE ALBL/HSC ALGORITHM OVER ALBL, AND THROUGHPUT GAIN PER 100 HTTP REQUESTS OF ALBL/HSC OVER ALBL IN KB/SEC

Req. Rate | ALBL/HSC % more CPU effort | ALBL/HSC Thr Gain (Kb/s)
40-60     | 60                         | 46.2
200-280   | 190                        | 140.07
320-400   | 130                        | 202.01
We also present ALBL/HSC, a content aware balancing algorithm that uses the ALBL mechanism for the balancing process among web servers that service requests for the same service class. The ALBL/HSC algorithm is capable of differentiating HTTP requests into service classes and provides different drop probabilities for each class based on the NDP and CBD metric values. It also has the ability to predict the need for a scalability or bandwidth increase/decrease per service class. Incorporation of the prediction metrics into the algorithm is considered future work.
We compare the performance of the ALBL algorithm against known stateless, stateful and adaptive algorithms, and the performance of ALBL/HSC against ALBL. From the experimental results we show that ALBL matches or even exceeds the performance of conventional balancing algorithms. In particular, ALBL balances HTTP traffic efficiently under unbalanced conditions that change dynamically: (a) due to utilized network conditions and (b) due to limited web server computational resources while adequate network resources exist. We also confirm the ALBL algorithm's scalability potential (it is used as a base balancing algorithm by ALBL/HSC).
Finally, with performance measurements of ALBL/HSC against ALBL, we show the significant performance gains of content aware balancing strategies over non content aware ones, as well as the processing efforts at the web switch that emerge. Further reduction of the ALBL/HSC processing overhead at the web switch, and incorporation of the bandwidth and scalability estimation metrics into ALBL/HSC, are set for investigation and future work.
ACKNOWLEDGMENT
We would like to thank the Democritus University of Thrace, Dept. of Electrical and Computer Eng., Data Analysis Laboratory, for permission to use their laboratory equipment to perform our tests. We would also like to thank Elec. Eng. Panagiotis Nestoras (pnestora at ee.duth.gr) for his technical advice and assistance in conducting the experimental scenarios.
REFERENCES
[1] V. Cardellini, E. Casalicchio, M. Colajanni, and P. S. Yu, The State of the Art in Locally Distributed Web-Server Systems, ACM Computing Surveys, vol. 34, no. 2, pp. 263-311, 2002.
[2] V. Cardellini, M. Colajanni, and P. S. Yu, Dynamic load balancing on web server systems, IEEE Internet Computing, vol. 3, no. 3, pp. 28-39, 1999.
[3] CISCO, Distributed Director, http://-
www.cisco.com/warp/public/cc/pd/cxsr/dd/, 2004.
[4] SQUID, web proxy cache, http://www.squid-cache.org, 1995.
[5] D. Wessels and K. Claffy, Internet Cache Protocol
(ICP) version 2, RFC 2186, 1997.
[6] D. Maltz and P. Bhagwat, TCP splice application layer proxy performance, High Speed Networks, vol. 8, no. 3, pp. 225-240, 1999.
[7] S. Adhya, Asymmetric TCP Splice: A Kernel Mechanism to Increase the Flexibility of TCP Splice, Master Thesis, http://www.cse.iitk.ac.in/research/mtech1999/9911134.ps.gz, 2001.
[8] S. Purkayastha, Symmetric TCP Splice: A Kernel
Mechanism for High Performance Relaying, Dept. of C.S.
Indian Institute of Technology, Tech. Rep., 2001. [Online].
Available: http://www.cse.iitk.ac.in/research/mtech1999/-
9911140.ps.gz
[9] L. Wang, Design and Implementation of TCPHA -
Draft release, http://dragon.linux-vs.org/dragonfly/, 2005.
[10] M.-Y. Luo and C.-S. Yang, Efficient support for con-
tent-based routing in web server clusters, in Proc. of 2nd
USENIX Symposium on Internet Technologies and Systems, 1998.
[11] H.-H. Liu, M.-L. Chiang, and M.-C. Wu, Efficient support for content-aware request distribution and persistent connection in web clusters, Software Practise & Experience, vol. 37, no. 11, pp. 1215-1241, 2007.
[12] Layer7filter, Linux Layer7 filter Team: Application
Layer Packet Classifier for Linux, http://l7-
filter.sourceforge.net/, 2004.
[13] Netfilter, Linux Netfilter Project, http://-
www.netfilter.org, 2000.
[14] L. Aversa and A. Bestavros, Load Balancing a Cluster of Web servers using Distributed Packet Rewriting, in Proc. of IEEE International Performance, Computing and Communications Conference, 2000, pp. 24-29.
[15] M. Colajanni, P. S. Yu, and D. M. Dias, Analysis of task assignment policies in scalable distributed Web-server systems, IEEE Trans. on Parallel and Distributed Systems, vol. 9, pp. 585-600, 1998.
[16] CISCO, CISCO Services Modules - Understanding
CSM Load Balancing Algorithms, http://www.cisco.com/-
warp/public/117/csm/lb_algorithms.pdf, 2007.
[17] W. Zhang, Build highly-scalable and highly-available network services at low cost, Linux Magazine, vol. 3, pp. 23-31, 2003.
[18] A. Weinrib and S. Shenker, Greed is not enough: Adaptive load sharing in large heterogeneous systems, in Proc. of IEEE INFOCOM, 1988, pp. 986-994.
[19] B. Abhay and C. Sanjay, Performance evaluation of
web servers using central load balancing policy over virtual
machines on cloud, in Proc. of COMPUTE: The Third Annual
ACM Bangalore Conference. New York, NY, USA: ACM, 2010,
pp. 14.
[20] J. Batheja and M. Parashar, A framework for Adap-
tive Cluster Computing Using Javaspaces, Cluster Computing,
vol. 6-3, no. 3, pp. 201213, 2003.[21] N. Handigol, S. Seetharaman, N. McKeown, and
R. Johari, Plug-n-serve: Load-Balancing Web Traffic using
OpenFlow, in Proc. of ACM SIGCOMM, 2009, pp. 8895.
[22] D. Maltz and P. Bhagwat, "Linux Director: a connection director for scalable network services," Computer Science and Technology, vol. 15, no. 6, pp. 560–571, 2000.
[23] P. O'Rourke and M. Keefe, "Performance Evaluation of Linux Virtual Server," in Proc. of the 15th LISA System Administration Conference, 2001, pp. 79–92.
[24] M. Andreolini and S. Casolari, "Load prediction models in web-based systems," in Proc. of the 1st International Conference on Performance Evaluation Methodologies and Tools, ACM, 2006, p. 27.
[25] M. Andreolini, S. Casolari, and M. Colajanni, "Models and Framework for Supporting Runtime Decisions in Web-Based Systems," ACM Transactions on the Web, vol. 2, no. 3, pp. 17–43, 2008.
[26] V. Cardellini, M. Colajanni, and P. S. Yu, "Geographic load balancing for scalable distributed Web systems," in Proc. of the 8th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 2000, pp. 20–28.
[27] S. Sharifian, S. Motamedi, and M. Akbari, "A predictive and probabilistic load-balancing algorithm for cluster-based web servers," Applied Soft Computing, vol. 5, no. 1, pp. 174–186, 2010.
[28] J. Cao, Y. Sun, X. Wang, and S. K. Das, "Scalable load balancing on distributed web servers using mobile agents," Parallel and Distributed Computing, vol. 63, no. 10, pp. 996–1005, 2003.
[29] B. Boone, S. Van Hoecke, G. Van Seghbroeck, N. Joncheere, V. Jonckers, F. De Turck, C. Develder, and B. Dhoedt, "SALSA: QoS-aware load balancing for autonomous service brokering," Systems & Software, vol. 83, no. 3, pp. 446–456, 2010.
[30] E. Casalicchio and M. Colajanni, "A client-aware dispatching algorithm for web clusters providing multiple services," in Proc. of the 10th International Conference on World Wide Web, ACM, 2001, pp. 535–544.
[31] M.-Y. Luo, C.-S. Yang, and C.-W. Tseng, "Content management on server farm with Layer-7 routing," in Proc. of the ACM Symposium on Applied Computing, ACM, 2002, pp. 1134–1139.
[32] L. Cherkasova and M. Karlsson, "Scalable Web Server Cluster Design with Workload-Aware Request Distribution Strategy WARD," in Proc. of the 3rd International Workshop on Advanced Issues of E-Commerce and Web-Based Information Systems, Society Press, 2001, pp. 212–221.
[33] M.-L. Chiang, Y.-C. Lin, and L.-F. Guo, "Design and implementation of an efficient web cluster with content-based request distribution and file caching," Systems and Software, vol. 81, no. 11, pp. 2044–2058, 2008.
[34] V. S. Pai, M. Aron, G. Banga, M. Svendsen, P. Druschel, W. Zwaenepoel, and E. Nahum, "Locality-Aware Request Distribution in Cluster-based Network Servers," in Proc. of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, 1998.
[35] E. Casalicchio and M. Colajanni, "A Client-Aware Dispatching Algorithm for Web Clusters providing multiple services," in Proc. of the 10th WWW Conference, 2002.
[36] M.-L. Chiang, C.-H. Wu, Y.-J. Liao, and Y.-F. Chen, "New content-aware request distribution policies in web clusters providing multiple services," in Proc. of the 2009 ACM Symposium on Applied Computing, ACM, 2009, pp. 79–83.
[37] A. Karakos, D. Patsas, A. Bornea, and S. Kontogiannis, "Balancing HTTP traffic using dynamically updated weights, an implementation approach," in Proc. of the 10th Panhellenic Conference on Informatics, 2005, pp. 873–878.
[38] S. Kontogiannis, S. Valsamidis, P. Efraimidis, and A. Karakos, "Probing based load balancing for web server farms," in Proc. of the 13th Panhellenic Conference on Informatics, 2009, pp. 175–180.
[39] S. B. Fred, T. Bonald, A. Proutiere, G. Régnié, and J. W. Roberts, "Statistical bandwidth sharing: a study of congestion at flow level," ACM SIGCOMM Computer Communication Review, vol. 31, no. 4, pp. 111–122, 2001.
[40] M. Andreolini, E. Casalicchio, M. Colajanni, and M. Mambelli, "A Cluster-Based Web System Providing Differentiated and Guaranteed Services," Cluster Computing, vol. 7, no. 1, pp. 7–19, 2004.
[41] Z. Lin, L. Xiao-ping, and S. Yuan, "A content-based Dynamic Load-Balancing Algorithm for Heterogeneous Web Server Cluster," Computer Science and Information Systems, vol. 7, no. 1, pp. 153–162, 2010.
[42] L. Mamatas and V. Tsaoussidis, "A new approach to Service Differentiation: Non-Congestive Queueing," in Proc. of the International Workshop on Convergence of Heterogeneous Wireless Networks, 2005, pp. 78–83.
[43] L. Mamatas and V. Tsaoussidis, "Differentiating Services with Non-Congestive Queuing (NCQ)," IEEE Transactions on Computers, vol. 58, no. 5, pp. 591–604, 2009.
[44] TC, "Linux Traffic Control Project," http://lartc.org/, 2001.
[45] W. Almesberger, J. H. Salim, and A. Kuznetsov, "Generalized Random Early Drop queueing discipline," http://www.opalsoft.net, 2006.
[46] S. Floyd and V. Jacobson, "Random early detection gateways for congestion avoidance," IEEE/ACM Transactions on Networking, vol. 1, no. 4, pp. 397–413, 1993.
[47] M. Christiansen, K. Jeffay, D. Ott, and F. D. Smith, "Tuning RED for Web Traffic," in Proc. of ACM SIGCOMM, 2000, pp. 139–150.
[48] B. Hubert, G. Maxwell, R. Van Mook, M. Van Oosterhout, P. Schroeder, J. J. Spaans, and P. Larroy, "Linux Advanced Routing and Traffic Control HOWTO," http://www.tldp.org/HOWTO/Adv-Routing-HOWTO, 2004.
[49] S. Floyd, "Recommendations on using the gentle variant of RED," http://www.icir.org/floyd/red/gentle.html, 2000.
[50] W. Chen and S.-H. Yang, "The mechanism of adapting RED parameters to TCP traffic," Computer Communications, vol. 32, no. 13-14, pp. 1525–1530, 2009.
[51] D. Mosberger and T. Jin, "A Tool for Measuring Web Server Performance," in Proc. of the ACM Workshop on Internet Server Performance, ACM, 1998, pp. 59–67.
[52] E. Magana, E. Izkue, and J. Villadangos, "Review of traffic scheduler features on general purpose platforms," in Proc. of the SIGCOMM Workshop on Data Communication in Latin America and the Caribbean, 2001, pp. 50–79.
S. Kontogiannis is a PhD candidate at the Dept. of Electrical and Computer Eng., Democritus University of Thrace, Xanthi, Greece. He received a five-year Eng. diploma and an MSc in Software Eng. from the Department of Electrical and Computer Eng., Democritus University of Thrace. His research interests are in the areas of distributed systems, computer networks, middleware protocol design, network modelling and computer network performance evaluation. His e-mail is: skontog at ee.duth.gr.

S. Valsamidis is a PhD candidate at the Dept. of Electrical and Computer Eng., Democritus University of Thrace, Xanthi, Greece. He received a five-year Electrical Eng. diploma from the Department of Electrical Eng., University of Thessaloniki, Greece, and an MSc in Computer Science from the University of London, UK. He is an Applications Professor in the Dept. of Information Technology, TEI of Kavala, Greece. His research interests are in the areas of distributed systems, computer networks, database architectures, data analysis and evaluation, and data mining. His e-mail is: svalsam at ee.duth.gr.

A. Karakos received the Degree of Mathematician from the Department of Mathematics of the Aristotle University of Thessaloniki, Greece, and the Maîtrise d'Informatique from the University Pierre et Marie Curie, Paris. He completed his PhD studies at the University Pierre et Marie Curie. He is an Assistant Professor at the Dept. of Electrical and Computer Eng., Democritus University of Thrace, Greece. His research interests are in the areas of distributed systems, data analysis and programming languages. His e-mail is: karakos at ee.duth.gr.