QOS IN WS-BPEL PROCESSES
A DISSERTATION SUBMITTED TO
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
AT THE INDIAN INSTITUTE OF TECHNOLOGY DELHI
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF MASTER OF TECHNOLOGY
Debdoot Mukherjee
May 2008
© Copyright by Indian Institute of Technology Delhi 2008
All Rights Reserved
CERTIFICATE
This is to certify that the thesis titled “QoS in WS-BPEL Processes” be-
ing submitted by Debdoot Mukherjee to the Indian Institute of Technology
Delhi, for the award of the degree of Master of Technology in Computer Sci-
ence & Engineering, is a record of bona-fide research work carried out by him
under our supervision. The work presented in this thesis has not been submitted to
any other university or institute for the award of any other degree or diploma.
Prof. Pankaj Jalote
Microsoft Chair Professor
Dept. of Computer Science & Engg.
IIT Delhi

Dr. Mangala Gowri Nanda
Research Staff Member
IBM India Research Lab
New Delhi
Abstract
With a large number of web services offering the same functionality, the Quality of
Service (QoS) rendered by a web service becomes a key differentiator. WS-BPEL
has emerged as the de facto industry standard for composing web services. Thus,
determining the QoS of a composite web service expressed in BPEL can be extremely
beneficial. While there has been much work on QoS computation of workflows repre-
sented in custom built Workflow Management Systems (WfMS), there exists no tool
to ascertain QoS for BPEL processes. In this thesis, we illustrate the differences in
expressiveness between BPEL and conventional workflow systems and show that a BPEL
process cannot always be reduced to a composition of series, parallel, conditional or
loop constructs, which is an assumption of the existing QoS computation approaches.
We propose a model for estimating three QoS parameters, namely, Response Time,
Cost and Reliability, of an executable BPEL process with requisite QoS and flow in-
formation available for its constituent activities. We have built a tool to compute QoS
of a WS-BPEL process that accounts for all workflow patterns that may be expressed
by standard WS-BPEL. Moreover, as organizations increasingly support mission critical
applications through web service compositions, very high levels of reliability
and performance may be warranted. We present an approach that utilizes traditional
fault tolerance constructs to improve QoS of BPEL processes and alleviate the risk of
not meeting SLA requirements. We have modeled two constructs each for reliability
improvement and performance improvement using standard WS-BPEL 2.0 elements.
N-Version Programming and Recovery Blocks are the two approaches implemented
to improve the reliability of a service invocation. To increase the chances
of meeting the performance requirements outlined in an SLA, one may use a construct
where the fastest response is returned from a parallel execution of redundant services
or enforce a deadline mechanism to fork off alternate services if a primary service does
not deliver within a certain length of time. Our tool allows the designer of the web
service composition to arbitrarily nest these fault tolerant constructs for any activity
and strive to achieve the desired QoS levels for the composite service.
Acknowledgment
Contents
Abstract
Acknowledgment
1 Introduction
1.1 QoS in Web Services
1.2 Workflows and QoS computation
1.3 Fault Tolerance in Web Services
1.4 Motivation
1.4.1 BPEL - A Case Apart
1.4.2 Fault Tolerance in a WS-World - New Opportunities
2 Related Work
2.1 QoS in Workflows
2.1.1 Cardoso’s QoS Model
2.1.2 Other Efforts
2.2 Fault Tolerance
2.3 WS-BPEL
2.3.1 Evolution
2.3.2 Brief Overview
3 QoS Computation
3.1 Passport Application Service Example
3.2 QoS Model Usage
3.2.1 User Environment
3.2.2 Inputs to the Model
3.3 Model Preliminaries
3.4 Reliability Modeling
3.4.1 Activity-Wise Reliability Computation
3.4.2 Suppression of Join Failures
3.5 Response Time Modeling
3.5.1 Activity Wise Response Time Computation
3.5.2 Operations on Random Variables
3.6 Cost Modeling
3.6.1 Activity-Wise Cost Computation
4 QoS Improvement with Fault Tolerance
4.1 N-Version Programming (NVP)
4.1.1 WS-BPEL implementation
4.1.2 QoS formulation
4.2 Recovery Blocks
4.2.1 WS-BPEL Implementation
4.2.2 QoS Formulation
4.3 Return Fastest Response
4.3.1 WS-BPEL Implementation
4.3.2 QoS Formulation
4.4 Deadline Mechanism
4.4.1 WS-BPEL implementation
4.4.2 QoS Formulation
5 Implementation Details
5.1 BPEL Parser
5.2 QoS Calculator
6 Conclusions
References
List of Figures
2.1 A system that is neither series nor parallel
2.2 Synchronization with links
2.3 Dependability Stack in Web Services
3.1 Passport Office BPEL Process
5.1 Block Diagram of Implementation
Chapter 1
Introduction
ISO 9000 defines quality as the degree to which a set of inherent characteristics fulfills
requirements [29]. It may be regarded as the customer’s perception of the supplier’s
work output or the extent to which the service delivery meets user expectations. Also,
quality is inherently subjective: different people may experience the quality of the
same software differently. Thus, according to Gerald Weinberg, “quality is value to
some person”. Software quality measures how well software is designed as well as
the degree of its conformance to that design. The study of software quality concerns
itself as much with internal quality characteristics, viz. maintainability, portability,
understandability and adherence to coding standards, as it does with external user
requirements such as reliability, security and efficiency [30].
Quality of Service (QoS) is a term that was originally coined in the world of telephony
and traffic engineering to refer to the levels of performance and transmission
characteristics of a communications channel. ITU standard X.902 defines QoS as “a
set of quality requirements on the collective behavior of one or more objects”. QoS has
been heavily studied in the areas of computer networks [14, 31, 32], middleware [64, 22]
and real time systems [13]. However, in computer networking the term QoS is more
often used to refer to resource reservation control mechanisms than to the achieved
service quality. QoS is clearly a multi-dimensional quantity, with its various dimensions
dictated by the context within which it is being studied. For example,
in telephony, different aspects of a connection such as service response time, loss,
signal-to-noise ratio, cross-talk, echo, interrupts, frequency response and loudness levels
contribute to the measurement of QoS. On the other hand, the Quality of
Service of a real time system involves an assessment of factors such as guarantees
on response time and the degree of predictability of delays.
1.1 QoS in Web Services
Quality of a web service or for that matter any service may be modeled only through
its externally measurable characteristics. A web service presents the functionality
being rendered as a black box and does not give the consumer a chance to peruse
the quality processes of the service provider. This is perhaps the reason why we
talk about Quality of Service and not just quality (as in software quality), because
assessing software quality involves inspection of the development process. The Service Level
Agreement (SLA), which lists deliverable limits on the qualitative attributes, forms the
legally binding agreement through which consumers or brokers can track the service provider’s
offerings. Since SLA violations could lead to penalties being incurred by the service
provider, an SLA contains only those attributes of the service that can be monitored by both
parties, and possibly by neutral third parties who may be approached for mediation of
disputes. In the web services world, SLA management drives QoS research; thus, it is
only natural that the dimensions of QoS will have to be unambiguously measurable
so that they can find a place in SLAs.
Quality of Service (QoS) of a web service has been parameterized in terms of its
response time, throughput, reliability, availability, security and other network related
parameters. Interestingly, reputation or fidelity has also been studied [41, 50, 61]
as a potential QoS factor. Various web services consortiums have tried to formalize
taxonomies for QoS dimensions aiming at automated SLA management. Efforts have
been focused on arriving at semantic models for representation of different quality
attributes and their assessment functions, capturing the inter-relationships between
the entities involved in SLA management [55, 10].
Cappiello et al. [10] point out that QoS research in web services revolves around
two basic heads. Firstly, measurement of QoS dimensions such as response time,
throughput, reliability and availability has drawn attention [58, 62]. QoS monitoring
systems have come up that track SOAP messages and attributes of the network
connection set up during an invocation of a web service. These values are averaged
over a number of calls to the web service to derive its expected QoS parameters.
Secondly, QoS aware composition has emerged as a hot topic in recent years. The
problem of optimally composing a web service that invokes several other web services
during the course of its workflow has captured the minds of many researchers. Several
solutions based on diverse techniques such as integer linear programming [61, 60],
genetic algorithms [9], constraint based optimization [2] and use of heuristics in mixed
integer programming [8] have been proposed in the literature.
Knowing the QoS of the web service being composed is crucial during
the process of service orchestration (binding concrete web services to tasks in the
workflow). The integrator of the WS-composition needs to keep track of the QoS of
the composite service whenever he makes modifications to the bindings. Section ??
looks at some of the traditional workflow modeling systems and gives an insight into
the QoS computation technique used in such systems.
Although the use of fault tolerant constructs in web services has been studied in
the literature [17, 19, 51, 23, 39, 18], there has been no research that quantifies the
QoS improvement that may be brought about by these constructs. Our work aims
to provide a framework within which various fault tolerant constructs may be built
into BPEL processes and puts forward models to help the designer measure the QoS
improvement that may take place. Section 1.3 sets out the background of fault tolerance
research in web services.
1.2 Workflows and QoS computation
A workflow is an abstraction of a business process wherein the logical steps performed
in it and the dependencies between them are listed. It also contains the rules for rout-
ing and sharing of information amongst the different participants. Workflow Manage-
ment Systems (WfMSs) have been in vogue to streamline business processes. They
help in increasing efficiency of organizational workflows by automating invocation of
tasks following the pre-defined rules and providing better coordination amongst par-
ticipants wherever manual intervention is required. Moreover they allow the business
processes to be continuously monitored and analyzed providing an opportunity for
improvements to be implemented.
With the advent of web services, business workflows are being implemented as
WS-compositions. Thus, there arises a need for proper QoS management of workflow
instances running as web services. Cardoso [11] presents a model for QoS computation
of workflows. Although his research draws motivation from the web services world,
the QoS model is generic and works the same way irrespective of whether the tasks
are being implemented as web services or not. The formulation repeatedly reduces the
workflow by collapsing constructs such as sequential, parallel, conditional,
loop and fault tolerant systems, computing the QoS parameters at each reduction in order
to arrive at the overall QoS for the workflow. The model provides estimation of
time, cost and reliability and has been implemented on a WfMS called METEOR-S.
Chapter 2 looks at Cardoso’s model and other workflow QoS models in greater detail.
1.3 Fault Tolerance in Web Services
With mission critical web service compositions often being dependent on rented services,
high risk is involved in the process of binding tasks in the workflow to concrete
web services. When reputation is at stake, the web services designer cannot simply
delegate a task in the workflow to a single web service. He must have redundant web
services for every critical task in the composition to account for failures that may
occur. Traditional fault tolerant approaches like N-version programming, recovery
blocks and deadline mechanisms may be used to enhance the reliability and/or performance
of web services. Fault tolerant constructs are based on the simple principle
that “two heads are better than one”. When one is faced with choosing a web
service (liable to failures) as part of a critical web service composition, it is imperative
that one backs it up with sufficient redundant implementations. In order to alleviate
design faults, the redundant implementations must have different designs so that one
can expect that a design fault is not repeated across all designs; this is the key assumption
in all software fault tolerance approaches. In distributed systems, one also finds redundant
implementations of the same design being deployed to improve performance and
availability through better load balancing.
However, web service standards currently do not have any provision in the architec-
tural infrastructure for software fault tolerance. Redundancy is generally incorporated
in a proprietary manner by service developers. Looker and Munro [39] were the first
to propose N-version programming in web services. Sommerville [19] followed with a
container-based approach to fault tolerance, allowing the container to be configured
with a policy which specifies what kind of fault tolerance mechanisms may be ap-
plied to the services it contains. Dobson [17] has used the mechanisms provided by
WS-BPEL to support NVP and recovery blocks.
Our framework provides a platform through which the web services composer can
easily bind redundant web services to a task after knowing the QoS improvement
that may be expected in the overall BPEL process. We present four different fault
tolerance constructs, show how they may be written using WS-BPEL and illustrate
QoS computation methodologies for each of them. These formulations for QoS deter-
mination may be tied to our QoS computation framework for WS-BPEL, to estimate
QoS in presence of redundancies.
1.4 Motivation
The SOA dream culminates in a marketplace for services wherein more services
will be produced by brokers who compose various web services to provide
new functionality rather than coding programs from scratch. As in any competitive
market, where a number of offerings are available for the same functionality, Quality
of Service is slated to be the key differentiator. The consumer shall select a web service
that best suits his QoS requirements and enter into the process of SLA negotiation
with the vendor of that service. This has led to research on Quality of Service gaining
prominence in the area of web services.
The orchestrator has to judiciously choose every web service that he binds to the
composition in order to attain a high level of QoS and meet his SLA requirements.
A tool that can provide accurate estimates of the QoS of the resultant WS-composition,
given the values of QoS parameters for the constituent web services (as specified in their
SLAs), will come in most handy for the integrator. This thesis aims to provide such
a framework for QoS determination in BPEL processes. Further, the designer may be
unable to find a single web service that performs a task to the
desired levels of QoS. The tool therefore also helps in QoS improvement through fault
tolerance constructs and re-computation of the QoS of the BPEL process with redundancies.
Section 1.4.1 argues that the WS-BPEL 2.0 standard compliant work presented here
is significant, illustrating that QoS modeling in BPEL is far more challenging
than it is for traditional workflow systems. Section 1.4.2 sheds light on fresh
motivations for the application of fault tolerance constructs in a web services world.
1.4.1 BPEL - A Case Apart
Business Process Execution Language (BPEL) has emerged as the de facto standard
for representation of industrial workflows either as abstract processes or as executable
processes that can be easily deployed. BPEL (its first standardized version was known
as BPEL4WS 1.1, later ratified as WS-BPEL 2.0 after some modifications) evolved
from two early process modeling languages, namely, IBM’s WSFL (Web Services Flow
Language) and Microsoft’s XLANG. Thus, it combines the best features of a block
structured process language (XLANG) with those of a graph-based workflow language
(WSFL).
The XML based language includes tags for hierarchical control flow constructs
such as 〈sequence〉, 〈if〉 and 〈while〉, and also models graph-like behavior with
the help of links that synchronize various activities
inside a flow activity. BPEL has been shown to be more expressive than most
other workflow modeling languages [59]. It is able to capture some of the workflow
patterns exclusively because of the power it derives from the synchronization links
and the transition and join conditions one can impose on these links. Moreover,
it provides support for fault handling and for backward recovery of long running
transactions through fault handlers and compensation handlers respectively. It also
allows event driven programming, in some sense, with constructs like receive, pick
and event handlers. All of these make BPEL much more powerful than traditional
workflow languages, most of which support only block structured flow constructs.
The reasons cited above necessitate that QoS computation for BPEL processes be
organized differently from that for traditional workflow systems. BPEL workflows
can be made arbitrarily complex and may not always be decomposable into a combination
of simple structured constructs as outlined in Cardoso’s work [11]. Despite
QoS management of web service compositions having been a focal point of research in web
services for the past few years, no attempts have been made to come up with a clean
formulation for determining the QoS of BPEL processes. With WS-BPEL playing a
major role in the realization of the SOA dream, a comprehensive model for the QoS
estimation of BPEL workflows is clearly warranted.
This thesis outlines separate strategies for calculating reliability, response time, and
cost for BPEL processes, given the values of these parameters for the component web
services and a few other estimates about the control flow structure. A dependency
graph structure is built by annotating all the activity nodes in the BPEL process tree with
their dependencies. The QoS parameters are then computed at each activity. Reliability
for an activity is taken to be the probability that it will successfully complete
execution. It is computed as the product of the probability that all its dependencies
have successfully completed and the conditional probability of its success given that all
preconditions have been satisfied. Response time (modeled as a random variable) is
estimated by tracking, for each activity, the cumulative time, measured from the start
of the process, within which it is expected to complete. Expected cost is ascertained
on the lines of the reduction based approach, as the sum of the costs of all activities after
suitably weighting them by their probabilities of execution.
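The reliability and cost rules stated in this paragraph can be written compactly as below. The symbols are introduced here purely for illustration and are not necessarily the notation used in later chapters: $D_a$ is assumed to denote the event that all dependencies of activity $a$ complete successfully, $S_a$ the event that $a$ itself completes successfully, $p_a$ the probability that $a$ is executed at all, and $C(a)$ its cost.

```latex
% Reliability of an activity a (illustrative notation, see lead-in)
R(a) = P(D_a) \cdot P(S_a \mid D_a)

% Expected cost of the process: activity costs weighted by execution probability
E[C] = \sum_{a} p_a \, C(a)
```

For instance, under these assumptions, an activity whose dependencies all succeed with probability 0.95 and which itself succeeds with probability 0.9 once they have would have a reliability of 0.95 × 0.9 = 0.855.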
1.4.2 Fault Tolerance in a WS-World - New Opportunities
Traditionally, software fault tolerance approaches have tried to capitalize on the design
diversity present in the redundant implementations. Although these techniques were
proposed as far back as the 1970s, they were never widely adopted because coding diverse
implementations proved to be very costly. In the web services era, cost is less of an
issue since companies do not have to build the components from scratch; they only
pay a certain fee for them. Moreover, with interfaces for various software modules getting
standardized, a large number of implementations offering the same functionality are
expected to be available, many of them actually diverse in their designs. Finally, fault
tolerant constructs will enable the orchestrator to compare web services against one
another by actually running them together for a length of time, before forming an
opinion about their vendors.
Techniques such as N-version programming and recovery blocks were primarily
devised to make software resistant to design faults. But in a WS world, service
compositions have to guard against node failures too. The fact that the redundant
components in a fault tolerant design are widely spread out, deployed at geographically
separated hosts, protects the service in case a network partition occurs between
the site of the service composition and that of one of its components.
Chapter 2
Related Work
Project management techniques like CPM (Critical Path Method) and PERT (Program
Evaluation and Review Technique) [26] have traditionally been used to track
the performance and cost of business workflows. Orthogonally, techniques like Reliability
Block Diagrams [33] have been used to model the reliability of complex systems. In
recent years, modern workflow management systems have come to provide an integrated
platform to monitor, analyze and improve business processes in terms of the Quality of
Service they deliver. Moreover, the workflow modeling languages now available have much more
expressive power and hence better formulations for QoS computation are necessary.
For example, CPM/PERT does not support conditional or loop structures.
In this chapter, we review the state of the art of QoS determination of workflows,
fault tolerance literature in web services and present an overview of WS-BPEL that
has emerged as the workflow modeling language of choice in the web services world.
2.1 QoS in Workflows
2.1.1 Cardoso’s QoS Model
Cardoso’s thesis [11] is the seminal work in literature to demonstrate the importance
of Quality of Service in workflows and propose a framework for estimation of QoS in
web service processes. The thesis presents some insight into fidelity computation and
a detailed formulation for the following three QoS dimensions.
1. Task Time (T) : Task response time refers to the time taken by a request to
be processed by a task, measured from the inception of the request. The task
response time is the sum of the delay time (queuing delay plus setup
time) and the process time.
2. Task Reliability (R) : For modeling reliability, Cardoso takes into consideration
two kinds of failures: system failures and process failures. Reliability is
defined as:
\[ R(t) = 1 - (\text{SystemFailureRate} + \text{ProcessFailureRate}) \]
3. Task Cost (C) : Task cost is taken to be the cost incurred by the service provider
when a task is executed. It is broken down into two components:
enactment cost and realization cost. The enactment cost is the cost associated
with the management of the workflow system and with the monitoring of workflow
instances. The realization cost is attributed to the runtime execution of
the task.
Cardoso proposes Stochastic Workflow Reduction (SWR) to arrive at QoS estimates
for the overall workflow, provided the QoS values for all tasks in the workflow
are known. The SWR algorithm repeatedly applies a reduction process on various
structured constructs until only one atomic task remains. He introduces reduction
rules for the systems listed below:
1. Sequential system: Two tasks $t_i$ and $t_j$ that are in sequence may be reduced
to a single task $t_{new}$ with the help of the following reduction formulae. Cardoso
notes that for such a reduction to take place $t_i$ must not be a xor/and split
and $t_j$ must not be a xor/and join.
\[ T(t_{new}) = T(t_i) + T(t_j) \]
\[ C(t_{new}) = C(t_i) + C(t_j) \]
\[ R(t_{new}) = R(t_i) \times R(t_j) \]
2. Parallel system (and split/and join) : In a parallel system multiple tasks
($t_1, t_2, \ldots, t_n$) are executed concurrently after an and split task $t_a$ and are merged
with synchronization in an and join task $t_b$. SWR reduces the system to $t_a$,
followed by a new activity $t_{new}$ (whose QoS parameters are defined below)
and $t_b$. All incoming transitions to $t_a$ and outgoing transitions from $t_b$ remain
unaltered.
\[ T(t_{new}) = \max_{i \in \{1, 2, \ldots, n\}} T(t_i) \]
\[ C(t_{new}) = \sum_{1 \le i \le n} C(t_i) \]
\[ R(t_{new}) = \prod_{1 \le i \le n} R(t_i) \]
3. Conditional system (xor split/xor join) : A conditional system is made
up of tasks ($t_1, t_2, \ldots, t_n$), one of which is carried out subject to satisfaction
of the condition associated with it. The probabilities of execution of the
branches are given by ($p_1, p_2, \ldots, p_n$). These conditional tasks emanate from a
XOR split task and merge in a XOR join task. The reduction collapses all the
conditional tasks into one new task ($t_{new}$) that is then sandwiched between the
split and the join tasks. However, the reduction is limited by the restriction
that there cannot be any other outgoing transitions from the split task and
incoming transitions to the join task except for the conditional branches. QoS
estimates of the task $t_{new}$ are given below:
\[ T(t_{new}) = \sum_{1 \le i \le n} p_i \times T(t_i) \]
\[ C(t_{new}) = \sum_{1 \le i \le n} p_i \times C(t_i) \]
\[ R(t_{new}) = \sum_{1 \le i \le n} p_i \times R(t_i) \]
4. Loop System: Cardoso characterizes a loop task ($t_{loop}$) by its probability ($p_{loop}$)
of repeating the loop and the probabilities ($p_{o1}, p_{o2}, \ldots, p_{on}$) of its outgoing
transitions. After removal of the loop, the reduced task ($t_{new}$) only has outgoing
transitions with each of their probabilities relaxed by a factor of $1 - p_{loop}$.
\[ T(t_{new}) = \frac{T(t_{loop})}{1 - p_{loop}} \]
\[ C(t_{new}) = \frac{C(t_{loop})}{1 - p_{loop}} \]
\[ R(t_{new}) = \frac{(1 - p_{loop}) \times R(t_{loop})}{1 - p_{loop} R(t_{loop})} \]
5. Fault Tolerant System: A k-out-of-n system with n tasks ($t_1, t_2, \ldots, t_n$) is
modeled along with an and split task and a XOR join task. The tasks in the n
branches can be replaced by a single task $t_{new}$ and its QoS estimated through the
following formulae:
\[ T(t_{new}) = \operatorname{kthMin}_{i \in \{1, 2, \ldots, n\}} T(t_i) \]
\[ C(t_{new}) = \sum_{1 \le i \le n} C(t_i) \]
\[ R(t_{new}) = \sum_{i_1=0}^{1} \cdots \sum_{i_n=0}^{1} g\Big(\sum_{j=1}^{n} i_j - k\Big) \times \big((1 - i_1) + (2 i_1 - 1) R(t_1)\big) \times \cdots \times \big((1 - i_n) + (2 i_n - 1) R(t_n)\big) \]
where $g(x)$ takes the value 0 for $x < 0$ and 1 for $x \ge 0$.

Figure 2.1: A system that is neither series nor parallel (diagram with nodes A, B, C, D and E; not reproduced)
Figure 2.2: Synchronization with links (diagram showing a XOR split task, conditional tasks and a XOR join task, with a transition crossing between constructs marked “Such a transition is not allowed”; not reproduced)
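The five reduction rules listed above can be sketched in a few lines of Python. This is only an illustration of the formulae as stated here, not Cardoso's implementation: the `QoS` class and all function names are hypothetical, task QoS values are treated as plain numbers, and the k-out-of-n reliability is computed by enumerating all 2^n success/failure outcomes exactly as the nested sums prescribe, which is practical only for small n.

```python
from dataclasses import dataclass
from itertools import product
from math import prod
from typing import Sequence, Tuple


@dataclass(frozen=True)
class QoS:
    """Hypothetical container for a task's time T, cost C and reliability R."""
    time: float
    cost: float
    reliability: float


def sequential(a: QoS, b: QoS) -> QoS:
    # Times and costs add up; both tasks must succeed.
    return QoS(a.time + b.time, a.cost + b.cost, a.reliability * b.reliability)


def parallel(tasks: Sequence[QoS]) -> QoS:
    # The and join waits for the slowest branch; every branch must succeed.
    return QoS(max(t.time for t in tasks),
               sum(t.cost for t in tasks),
               prod(t.reliability for t in tasks))


def conditional(branches: Sequence[Tuple[float, QoS]]) -> QoS:
    # Each branch is a (probability, QoS) pair; probabilities should sum to 1.
    return QoS(sum(p * t.time for p, t in branches),
               sum(p * t.cost for p, t in branches),
               sum(p * t.reliability for p, t in branches))


def loop(task: QoS, p_loop: float) -> QoS:
    if not 0.0 <= p_loop < 1.0:
        raise ValueError("p_loop must lie in [0, 1)")
    r = task.reliability
    return QoS(task.time / (1.0 - p_loop),
               task.cost / (1.0 - p_loop),
               (1.0 - p_loop) * r / (1.0 - p_loop * r))


def k_out_of_n(tasks: Sequence[QoS], k: int) -> QoS:
    # Time is the k-th smallest branch time; all branches are paid for.
    kth_time = sorted(t.time for t in tasks)[k - 1]
    reliability = 0.0
    for outcome in product((0, 1), repeat=len(tasks)):  # i_j = 1 means t_j succeeds
        if sum(outcome) - k >= 0:                       # g(sum of i_j - k) = 1
            reliability += prod((1 - i) + (2 * i - 1) * t.reliability
                                for i, t in zip(outcome, tasks))
    return QoS(kth_time, sum(t.cost for t in tasks), reliability)
```

As a quick check of the last rule, three replicas with reliability 0.9 each in a 2-out-of-3 arrangement give 3 × 0.9² × 0.1 + 0.9³ = 0.972.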
Limitations
Most of the reduction rules given above come with a restrictive rider.
For example, in the sequential system the start task cannot be a split and the end
task cannot be a join. Cardoso’s model adapts the reductions used in standard
reliability theory for computing the reliability of series-parallel systems [33, 27]. But the
model is not capable of handling complex systems such as the one shown in Figure 2.1,
which can neither be reduced to a series nor decomposed as a parallel system. Reliability
modeling of such systems can be performed using various approaches such as path
tracing, state enumeration, the decomposition method and the cut set method. Again, only
structured programming constructs have been considered here. The presence of goto-like
transitions that extend from one structured construct to another, as shown in Figure
2.2, prevents application of the proposed reductions. Cardoso treats QoS for a task as
a deterministic value. But due to the uncertainty that is generally associated with
web services, especially if they are not owned by the integrator’s enterprise, it is more
realistic to model QoS probabilistically.
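State enumeration, one of the approaches named above for systems that are neither series nor parallel, can be sketched as follows. The example uses a generic five-component bridge network described by its minimal path sets; this topology is an assumption made for illustration and is not claimed to match Figure 2.1. The function name and the path-set representation are likewise hypothetical, and component failures are assumed to be independent.

```python
from itertools import product
from typing import Dict, FrozenSet, Sequence


def system_reliability(reliabilities: Dict[str, float],
                       path_sets: Sequence[FrozenSet[str]]) -> float:
    """Sum the probabilities of all component states in which the system works.

    The system is taken to work when every component of at least one
    path set is up. The cost is exponential in the number of components.
    """
    names = list(reliabilities)
    total = 0.0
    for states in product((True, False), repeat=len(names)):
        up = {name for name, is_up in zip(names, states) if is_up}
        if any(path <= up for path in path_sets):
            probability = 1.0
            for name, is_up in zip(names, states):
                r = reliabilities[name]
                probability *= r if is_up else 1.0 - r
            total += probability
    return total


# Assumed bridge: A and C leave the source, B and D enter the sink,
# and E connects the two branches in the middle.
BRIDGE_PATHS = [frozenset("AB"), frozenset("CD"),
                frozenset("AED"), frozenset("CEB")]
```

With every component at reliability 0.9, this bridge evaluates to 0.97848, which agrees with conditioning on E: 0.9 × 0.99² + 0.1 × (1 − 0.19²).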
2.1.2 Other Efforts
Hwang et al. [28] propose a probabilistic framework for QoS computation. They
extend Cardoso’s model to have each QoS parameter for a web service represented by
a discrete random variable having a certain probability mass function (PMF). The
work discusses how to efficiently aggregate PMFs of a number of random variables
over different domains and contrasts greedy and dynamic programming approaches
for sample space reduction in the aggregation problem.
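To make the idea of aggregating probability mass functions concrete, the sketch below combines the response time PMFs of two tasks, by addition for a sequence and by maximum for a parallel block. It is plain exhaustive aggregation under an independence assumption, not the greedy or dynamic programming sample space reduction discussed in [28]; the function names and the dictionary representation of a PMF are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict

PMF = Dict[float, float]  # maps a value to its probability


def combine(a: PMF, b: PMF, op: Callable[[float, float], float]) -> PMF:
    """PMF of op(X, Y) for independent discrete random variables X and Y."""
    out = defaultdict(float)
    for (x, px), (y, py) in product(a.items(), b.items()):
        out[op(x, y)] += px * py
    return dict(out)


def sequence_time(a: PMF, b: PMF) -> PMF:
    # Response times of tasks executed one after the other add up.
    return combine(a, b, lambda x, y: x + y)


def parallel_time(a: PMF, b: PMF) -> PMF:
    # A synchronizing join waits for the slower of the two branches.
    return combine(a, b, max)


def expectation(pmf: PMF) -> float:
    return sum(value * p for value, p in pmf.items())
```

The support of the result can grow with the product of the input supports, which is why reducing the sample space becomes necessary when many tasks are aggregated.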
Canfora et al. [9] apply Cardoso’s QoS model with minor modifications in their
middleware that uses genetic algorithms for QoS aware composition and replanning.
They make use of estimated number of loop iterations in the reduction for a loop
rather than unfolding loops based on probability of repeating the loop. They also
introduce availability as a parameter with exactly same reduction rules as reliability.
Zeng et al. [61] model composite web services as state charts and put forward aggre-
gation functions to ascertain QoS of execution plans. The critical path of all possible
execution paths is determined and the duration of that path is taken to be the dura-
tion of the composite service. Estimation of successful execution rate and availability
taken into consideration only critical tasks based on the assumption that non-critical
tasks can be re-executed successfully without altering the QoS characteristics of the
final response. The model is simplistic since the state charts are allowed to have only
two types of compound states, namely, AND states and OR states, with no provision
for conditional or loop constructs. Menasce [43] presents a formulation to determine
throughput in composite web services from the flow graph that lists the web service
invocations. Jaeger et al. [35] propose aggregation of QoS dimensions on the workflow
patterns listed in Van der Aalst’s seminal work [1]. The approach is elegant,
but the authors do not explain how to identify such workflow patterns in a process
and how to carry out an implementation of the same. Model-driven computation of QoS
has gained attention of late. We briefly look at some of the techniques used in the
section below.
Model Driven QoS Computation
D’Ambrogio and Bocciarelli [16] propose a model driven approach wherein a BPEL
process is described by a UML (Unified Modeling Language) model, extended according
to the UML Profile for Automated Business Processes [3]. The UML model
is then annotated with performance data and an LQN (Layered Queueing Network)
model is obtained, which is solved to predict the performance of the BPEL process. Although
the conversion of models built according to the UML Profile into
BPEL has been thoroughly described in [3], the complex control flow offered by BPEL
has not been exhaustively mapped back onto the UML profile. However, BPEL to
UML transformation is an active research topic [48].
2.2 Fault Tolerance
Traditionally, software fault tolerance research has revolved around two approaches
- N-Version programming formulated by Avizienis [6] and Randell’s Recovery Blocks
[46]. N-version programs run multiple implementations providing the same function-
ality but having diverse designs in parallel. Thereafter, majority voting is done to
obtain the result of the programs. Recovery blocks invoke the redundant alternate
implementations sequentially if the output produced by one cannot pass assertion
checks. These two fault tolerant constructs have been studied in various contexts.
The reliability improvement that may be derived through the use of these constructs
has been analyzed [21, 20, 15, 44]. Other topics that have gained interest in
fault tolerance research include how to provision voting mechanisms [45] and generate
acceptance tests [4].
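The two constructs can be sketched as follows. This is a minimal illustration of our own, assuming that each version is a plain function returning a hashable result; the function names are hypothetical. For brevity the versions run one after another here, whereas an actual N-version system would run them in parallel.

```python
from collections import Counter


def n_version(versions, argument):
    """Run every version and return the result agreed upon by a strict majority."""
    results = [version(argument) for version in versions]
    value, votes = Counter(results).most_common(1)[0]
    if votes * 2 <= len(results):
        raise RuntimeError("no majority among versions")
    return value


def recovery_block(alternates, acceptance_test, argument):
    """Try the alternates in order until one passes the acceptance test."""
    for alternate in alternates:
        try:
            result = alternate(argument)
        except Exception:
            continue  # a crashing alternate counts as a failed attempt
        if acceptance_test(argument, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")
```

N-version programming pays for every version on every call, while a recovery block pays for an alternate only when the preceding ones have failed.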
The notion of dependability has been studied at various levels in the area of web
services. Figure 2.3 lists the different levels where various standards exist to model
behavior of web services. Reliability of message exchanges has been documented in
OASIS standards such as WS-Reliability and WS-ReliableMessaging. A host of security
standards (WS-Security, WS-Trust, WS-SecurityPolicy, etc.) make use of available
encryption techniques to model integrity and confidentiality in SOAP messages.
[Figure: a stack of dependability levels (Network Level, Message Level, Transaction Level, Component Level) with blocks labeled WS-ReliableMessaging, WS-Reliability; WS-Security + HTTPS; WS-Transaction, WS-Coordination; WS-BPEL Fault Handlers, Compensation Handlers; Redundancy in WS components.]
Figure 2.3: Dependability Stack in Web Services
WS-Transaction and WS-Coordination define mechanisms for transactional interoperability
between web services domains and seek to ratify transactional qualities of
service in web services applications. At the level of web service composition, we have
WS-BPEL constructs such as fault handlers and compensation handlers that help the
designer incorporate error handling and backward recovery in case of failures. The
block at the top right corner signifies our focus area of research, i.e., how to make
use of redundant components to improve dependability. No web services standard
exists to date in this area. However, there have been some research efforts that use
redundancy as a tool for dependability enhancement.
2.3 WS-BPEL (Business Process Execution Language)
2.3.1 Evolution
Web services aim at providing an environment that supports flexible integration of
business processes implemented as heterogeneous systems and in diverse platforms
across enterprise boundaries. A standard process integration model is essential to let
business processes and applications carry out complex interactions that are often long
running.
A host of standards have come up in the process integration space, each supported
by a company or a standards body. Microsoft’s XLANG (Web Services for
Business Process Design) [52], one of the earliest process modeling languages, is block-structured
with basic control flow structures such as sequence, switch, while, all (for
parallel routing), and pick (for forking activities based on timing or external triggers).
IBM’s WSFL (Web Services Flow Language), a graph-based language, offers
capabilities to represent control flow as directed graphs that can be nested but must
be acyclic. It derives most of its control flow constructs from the workflow language
of IBM’s MQ Series Workflow. Web Service Choreography Interface (WSCI) is
an XML-based interface description language that describes the flow of messages exchanged
by a web service participating in interactions with other services. WSCI was
conceived and developed by BEA, SAP, Intalio and Sun Microsystems. Intalio promoted
the Business Process Management Initiative (BPMI.org), which came up with
BPML (Business Process Modeling Language). ebXML (Electronic Business using eXtensible
Markup Language) contains BPSS (Business Process Specification Schema),
which is yet another workflow language with similar capabilities. The plethora of
standards, most of which overlap and add no real value, has contributed in
great measure only to WSAH (Web Services Acronym Hell).
Business Process Execution Language (BPEL) combines the capabilities of both
XLANG and WSFL. BPEL 1.0 was jointly developed by IBM, BEA, SAP, Siebel,
and Microsoft in August 2002. In April 2003, BPEL 1.1 [53], which came to be
known as BPEL4WS, was submitted to OASIS for ratification. WS-BPEL 2.0 was
approved as an OASIS standard in April 2007 by a technical committee with representatives
from 37 different organizations. Two other standards, WS-Coordination
and WS-Transaction, strengthen BPEL’s case for supporting long-running transactions.
They provide a framework for distributed processes to interact and allow ACID
transactions to take place between business activities. Also, WS-BPEL utilizes several
XML specifications: WSDL 1.1, XML Schema 1.0, XPath 1.0 and XSLT 1.0. Due to
BPEL’s greater expressive power (see [57] for a comparison of various workflow languages
in terms of the workflow patterns they support) and the patronage
it received from the two giants IBM and Microsoft, it eventually managed to leave
the pack behind and emerge as the preferred web service composition language.
2.3.2 Brief Overview
WS-BPEL is intended for modeling two types of processes: executable and abstract
processes. An abstract process may hide some of the concrete operational details that
are required by an executable artifact and can thereby serve as a process template
capturing the process logic embodying the domain specific best practices. WS-BPEL
lays down a grammar for capturing the behavior of a business process based on interactions
between the process and its partner processes. The notion of 〈partnerLinks〉
is used to model peer-to-peer conversational partner relationships. The actual
partner service may be dynamically determined within the process.
A WS-BPEL process specification is analogous to a flow-chart. Each element in
the process is called an activity. An activity can be of two types: basic or structured.
Basic activities either describe interactions with other partners or model primitive
steps in the process. Structured activities encode control-flow logic and can have
other activities nested in them. WS-BPEL 2.0 has nine different basic activities that
are listed below:
1. invoke: An 〈invoke〉 activity is used to call operations (either one way or
request-response) embodied in web services offered by partners of the business
process being described.
2. receive: A 〈receive〉 activity is used to accept inbound messages from partners
of the service being provided by the business process. A receive activity
annotated with createInstance="yes" denotes the starting point of execution
of the service.
3. reply: A 〈reply〉 activity is used to send a response to a request previously
accepted through an inbound message activity such as receive or pick.
4. assign: An 〈assign〉 activity is used to carry out updates on variables.
5. throw: A 〈throw〉 activity is used to signal an internal fault explicitly.
6. wait: A 〈wait〉 activity forces the process to be delayed for a certain period of
time or wait until a certain deadline is reached.
7. exit: An 〈exit〉 activity immediately ends the business process instance, terminating
all running activities without executing any fault handlers or termination
handlers.
8. rethrow: A 〈rethrow〉 activity is used inside fault handlers to rethrow the fault
they caught, propagating the fault name and the fault data of the original fault.
9. empty: An 〈empty〉 activity does nothing but sometimes finds use as a synchronization
point or for suppression of faults.
Apart from these basic activities, WS-BPEL has a provision for addition of new
activities using the tag 〈extensionActivity〉. WS-BPEL also enumerates seven differ-
ent structured activities.
1. sequence: A 〈sequence〉 activity contains one or more activities that are performed
sequentially, in the order in which they appear within the 〈sequence〉
element. The 〈sequence〉 activity completes when the last activity nested within
the sequence has completed.
2. flow: A 〈flow〉 activity provisions execution of activities concurrently and also
allows for synchronization between the activities contained in it through the
notion of links. The 〈flow〉 completes on completion of all activities nested
directly within the flow. However, skipping execution of activities within a flow
is allowed if their enabling conditions evaluate to false.
3. if: An 〈if〉 activity consists of one or more conditional branches defined by
the 〈if〉 and optional 〈elseif〉 elements, followed by an optional 〈else〉 element.
The first branch whose 〈condition〉 holds is taken, and the activity nested
within it is executed.
4. pick: A 〈pick〉 activity waits for the occurrence of exactly one event from a
set of events and executes the activity contained within that event. The events
can either be receipt of inbound messages (〈onMessage〉) or triggering of timer
based alarms (〈onAlarm〉).
5. while: A 〈while〉 activity is one of the three constructs for provisioning loops.
The activity contained in 〈while〉 is executed until the 〈condition〉 at the start
of the loop evaluates to false.
6. repeatUntil: In a 〈repeatUntil〉 activity, the contained activity is executed
until the given 〈condition〉 becomes true.
7. forEach: A 〈forEach〉 activity provides a loop structure controlled by an
implicit index variable that is initialized to 〈startCounterValue〉 and ends at
〈finalCounterValue〉. The number of iterations can be further limited by specifying
a 〈completionCondition〉, wherein the construct is considered complete once
“at least K-out-of-N” iterations have completed, where K is the unsigned integer value given by the
〈branches〉 expression. The parallel attribute lends a unique feature
whereby the “iterations” of the loop can happen in parallel. If the
parallel attribute is set to “yes”, the nested 〈scope〉 is replicated as many times
as the number of iterations of the loop and the index variable takes up values
from 〈startCounterValue〉 through 〈finalCounterValue〉 across these
branches.
WS-BPEL’s notion of a 〈scope〉 offers the ability to specify a behavioral context
within which an activity may execute. A scope allows definition of variables, partner
links, message exchanges and correlation sets that are visible only within the scope.
Event handlers, fault handlers, a compensation handler, and a termination handler
may also be attached to a scope. A brief description of these handlers is given below.
• Event Handler: It provisions event driven programming to some degree in
WS-BPEL. The activity associated with an event is executed when the corre-
sponding event is fired. Again events may be either incoming messages or timer
based alarms.
• Fault Handler: The 〈catch〉 or 〈catchAll〉 constructs inside a fault handler
help to intercept faults that might occur and specify appropriate measures that
need to be taken to negate their effects.
• Compensation Handler: It allows backward recovery through the compen-
sation logic that it contains.
• Termination Handler: It supports forced termination of a scope by terminating
its primary activity, stopping all running event handler instances and then executing
the activity contained in the termination handler.
Chapter 3
QoS Computation
We have already argued in previous chapters that the workflow QoS models available
in the literature are geared towards structured programming constructs, and illustrated
their inability to cope with the graph based patterns supported through BPEL (see
Section 2.1.1). Our QoS model is specifically designed to deal with the complex graph-like
structures that may be written by tapping WS-BPEL’s greater expressive power.
It also provides mechanisms to handle fault handlers and event driven programming
that may be embedded in WS-BPEL processes.
The QoS dimensions considered in our framework, namely response time, reliability
and cost, are the three most important parameters that all successful companies
must track in striving to remain competitive [25, 24]. In the approach outlined
here, these QoS dimensions are evaluated at each activity en route to QoS computation
for the overall BPEL process.
In a reduction based approach for computation of a QoS parameter of a structured
activity (sequence / parallel / conditional / loop), one independently determines
the values of the parameter for all child activities and then composes them using
an appropriate aggregation function. Since activities in WS-BPEL may be heavily
intertwined with synchronization links, independent computation of QoS parameters
followed by aggregation is not possible. We follow a more direct approach to infer QoS
of an activity by tracking all the dependencies that the activity may have. The term
dependencies of an activity refers to other activities that play a role in determining
when the activity may start. Effectively listing the dependencies of all activities in a
BPEL process is central to our approach and helps us to tackle the challenges posed
by the graph based nature of BPEL.
Reliability of a BPEL activity is typified by the probability that it will successfully
complete execution. Here, the term successful completion encompasses the activity
delivering its desired functionality to the effect that it measures up to the expectations
of all other activities that might be dependent on it. A successful web service
invocation would mean the service being called with appropriate inputs, the service
being available and it performing the required functionality entirely in conformance
with its semantic implications. Reliability, as given in Service Level Agreements,
refers to the conditional probability that the web service will succeed provided it
gets a valid input. Again, network failures may jeopardize arguments on their way
to the service provider’s site and prevent results from reaching the client. Now, the
WS-composition can be assured to supply an appropriate input to a constituent web
service if all other activities on the execution path leading up to the point of invocation
have done their jobs properly. Thus, in order to derive the probability of successful
execution of a web service, one needs to multiply the conditional probability of its
success with the probability that all its dependencies have been successfully com-
pleted. The QoS model computes the probability of successful completion for each
activity in the BPEL process.
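The multiplication described above can be sketched in a few lines. The function name and the numbers are hypothetical, and the probability that all dependencies have completed is taken here as a given input rather than computed.

```python
def probability_of_success(conditional_reliability, prob_dependencies_ok):
    """P(success) = P(success | valid input) * P(all dependencies completed)."""
    for value in (conditional_reliability, prob_dependencies_ok):
        if not 0.0 <= value <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    return conditional_reliability * prob_dependencies_ok


# Hypothetical values: an SLA reliability of 0.99 for the web service and a
# probability of 0.95 that every activity it depends on has completed.
print(probability_of_success(0.99, 0.95))
```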
Response Time is modeled as a random variable characterized by values for mean
and standard deviation. For each activity, we estimate the expected time of completion,
given that the activity completes in a run of the BPEL process. All times are
measured relative to the start of the BPEL process. Response time of the overall
BPEL process can be taken to be the expected completion time of its main activity.
We follow the steps given below to compute the end time, $ET_X$, of an activity X:
1. The expected time $ST_X$ by which all the dependencies of the activity are complete
and the activity is ready to start is determined.
2. The end times of all the child activities (activities nested within X) are estimated.
3. The end times of all the child activities and $ST_X$ are suitably aggregated to
compute $ET_X$.
Cost is computed along the lines of the reduction based approach that is prevalent
in QoS models for traditional workflow systems. For each activity, we compute
the probability $P_C$ that the activity will start execution given that the parent activity
starts. Broadly speaking, the expected cost of an activity is calculated as the sum of
the costs of all its child activities, each weighted by its respective $P_C$.
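A minimal sketch of this weighted sum follows. The Activity class, its field names and the sample values are hypothetical, and an own_cost field is assumed so that leaf activities such as invokes can carry the cost of the service they call.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    name: str
    own_cost: float = 0.0  # assumed: cost charged by the activity itself (e.g. an invoke)
    # (child, P_C) pairs, P_C being the probability that the child starts
    # given that this activity starts
    children: list = field(default_factory=list)


def expected_cost(activity):
    """Sum the expected costs of the children, each weighted by its P_C."""
    total = activity.own_cost
    for child, p_c in activity.children:
        total += p_c * expected_cost(child)
    return total


# Hypothetical if activity whose branches are taken with probabilities 0.7 and 0.3.
cheap = Activity("invokeCheap", own_cost=10.0)
costly = Activity("invokeCostly", own_cost=20.0)
branch = Activity("if", children=[(cheap, 0.7), (costly, 0.3)])
print(expected_cost(branch))
```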
3.1 Passport Application Service - A Running Example
We digress to introduce a running example of an online passport application service.
The passport office workflow is implemented with the help of WS-BPEL and deployed
as a web service. In this section, we briefly look at how the WS-BPEL constructs may
be effectively used to model the passport office workflow that calls several external
web services hosted by other government departments. This example will be cited at
several places throughout this thesis to elucidate various concepts.
Before a passport may be issued, the date of birth and the place of residence need
to be verified. The passport application offers a choice for an age-proof document -
the applicant may either furnish the transcript issued by his/her education board or
his/her birth certificate or both. Depending upon the type of age-proof, its verification
is delegated to either the education board or the municipal office. The
address proof will always be verified by the municipal office. After verification of age
and address, one may proceed to issuing a passport only if the bank payment has
been made by the applicant.
[Figure: a 〈flow〉 containing three 〈invoke〉 activities (EducationBoard DOBVerify, MunicipalBoard DOBVerify, MunicipalBoard AddressVerify) that are the sources of links X, Y and Z, each with the 〈transitionCondition〉 verifyResult = true, and an 〈if〉 with 〈condition〉 $BankPayment = "Paid" enclosing the 〈invoke〉 PassportOffice MakePassport, whose 〈joinCondition〉 is ($X or $Y) and $Z.]
Figure 3.1: Passport Office BPEL Process
In our example, all the verification processes and the passport issue process are
deployed as web services. The passport office workflow composes these web services
through WS-BPEL. A part of the BPEL process is illustrated graphically in Figure
3.1. The workflow logic described above is encapsulated in a flow activity and the
control flow is captured through the notion of synchronization links. The links named
X, Y and Z are used to ensure that the web service to issue passport may be started
only after the web services for verification have successfully completed execution.
Further, the 〈transitionCondition〉s defined at the source of each link guarantee that
the link may be taken (the status of the link evaluates to true) only if the furnished
document has been successfully validated. The boolean expression on the links defined
as the 〈joinCondition〉 at the invoke for passport issue web service examines whether
at least one age-proof is valid and the given address proof has been verified. The if
condition checks whether the bank payment has been done or not. The status of the
bank payment is set by some means that is outside the scope of this example.
It may be noted that the links enter the if activity from outside and control the
execution of an activity nested within it. This is an example of the goto type of
links, illustrated in Figure 2.2, that are discouraged by the structured programming
paradigm.
3.2 QoS Model Usage
This section answers questions such as: “Who is the targeted user?”, “Does the model
find use at design time or run time?”, “Does the model add any value if the dream of a
marketplace is not realized?”, “What infrastructure should be in place for the model
to work?” and “What are the inputs to the model and how would one obtain them?”
3.2.1 User Environment
A tool that implements the QoS model for WS-BPEL will come as a boon to the
population of web service brokers who compose web services in BPEL out of readily
available web services. The web service designer is always striving to improve upon
quality whilst staying within his budget. At the very least, the orchestrator has to
ensure adherence to the QoS limits to be stated in the SLA that he may sign with his
clients. He tries out several combinations of binding concrete services to tasks in the
workflow and wants to have an idea of the QoS rendered by the composite service for
each of these combinations. This process of arriving at the optimum orchestration
may be automated [61, 9], but even then one requires an estimate of the QoS of the
WS-composition. Once the BPEL process has been deployed, its QoS has to be
constantly monitored and re-orchestration might become necessary. Thus, the tool
may come in handy at run-time if a need arises for dynamic reconfiguration of the
deployed BPEL process.
As pointed out in earlier chapters, one of the primary motivations for such a design
lies in the expectation that a marketplace for services will evolve, where several competing
services providing the same functionality will be available. The dream of having
such a marketplace has suffered a setback, with the three giants, IBM, Microsoft and
SAP, having discontinued their public UDDI registries in early 2006. However, even the
internal repositories of large companies can be expected to have duplicate services
for the same functionality, each varying in cost and correspondingly offering a
different degree of QoS. Thus, different web services may have to be chosen for the
same task in different projects depending upon the quality and cost requirements at
hand - our tool will simply facilitate such a selection.
3.2.2 Inputs to the Model
The web service composition in BPEL may include invocations of various other web
services and the service composer may have entered into service level agreements with
the providers of the constituent web services. The WS-composition can be assured
to have the component web services comply with the QoS limits mentioned in their
SLAs. The model presented here expects values of response time, reliability and
cost for all web service invocations of the BPEL process. The limits on these QoS
parameters stated in the SLAs after accounting for the failures and delays introduced
by the network can serve as inputs to the model. The input for reliability of a web
service is a number between 0 and 1. Response time inputs are specified by a 2-tuple
- (Mean, Standard Deviation). Costs are taken as fixed numeric values.
Average waiting times for all activities used to intercept inbound messages to
the business process are required by the model. Waiting time for receive is taken
to be the amount of time elapsed between the completion of all its dependencies
and the completion of the receive activity. It is assumed that if the process has
already received the corresponding incoming message, then the receive activity will
instantaneously complete, otherwise the particular thread of execution will halt and
can resume only after the message is received. Similarly, we require the average
waiting times for the onMessage events inside a pick activity or an event handler.
Waiting time of an onMessage event is taken to be the time between the start of the
parent pick activity or event handler and the instant when the corresponding message
arrives, if the message does arrive. Such average waiting times may be obtained from
logs of past executions of the business process collected by monitoring mechanisms
integrated with the BPEL engine.
The model also assumes certain parameters that characterize the control flow of
the business process to be available as inputs to it. These have been listed below:
• For every branch in an if activity, the probability of a branch being taken.
• For all events (both message based and timer based) attached to pick activities,
the probabilities that they get fired.
• For each transitionCondition, the probability of its success.
• For every catch or catchAll block, the fraction of failures of its parent scope
that it is able to catch.
• For all loops, the average number of iterations taken by them in an execution
of the business process.
• For all message based events in event handlers, the average number of times
they are fired in one execution of the associated scope.
Again, all of these attributes may be extracted from the execution logs of the business
process.
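One possible way to organize these inputs is sketched below. The class names, field names and all numeric values are hypothetical and serve only to show the shape of the data; they do not describe the implementation of our tool.

```python
from dataclasses import dataclass, field


@dataclass
class InvokeQoS:
    reliability: float    # probability in [0, 1]
    response_time: tuple  # (mean, standard deviation)
    cost: float           # fixed numeric value

    def __post_init__(self):
        if not 0.0 <= self.reliability <= 1.0:
            raise ValueError("reliability must lie between 0 and 1")
        mean, std_dev = self.response_time
        if mean < 0 or std_dev < 0:
            raise ValueError("response time parameters must be non-negative")


@dataclass
class ControlFlowInputs:
    branch_prob: dict = field(default_factory=dict)      # if branch -> P(branch taken)
    event_prob: dict = field(default_factory=dict)       # pick event -> P(event fires)
    transition_prob: dict = field(default_factory=dict)  # link -> P(transitionCondition true)
    catch_fraction: dict = field(default_factory=dict)   # catch block -> fraction of scope failures caught
    loop_iterations: dict = field(default_factory=dict)  # loop -> average number of iterations
    handler_events: dict = field(default_factory=dict)   # handler event -> average firings per scope run
    waiting_time: dict = field(default_factory=dict)     # receive / onMessage -> average waiting time


# Hypothetical inputs for part of the passport application example.
services = {"MakePassport": InvokeQoS(0.99, (120.0, 15.0), 2.0)}
inputs = ControlFlowInputs(
    branch_prob={"bankPaymentPaid": 0.95},
    transition_prob={"X": 0.9, "Y": 0.8, "Z": 0.95},
)
```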
Our passport application example (See Section 3.1) will require as inputs the QoS
parameters, Reliability, Response Time and Cost for each of the four component web
services. Moreover, we will have to supply probabilities of the transition conditions
(defined at sources of links X, Y and Z) evaluating to true, and the probability that
the variable bankPayment attains the value “paid”.
3.3 Model Preliminaries
All computations in our QoS model happen at the level of an activity, a scope or a
handler, which are the various units of encapsulation of process logic in WS-BPEL. The
BPEL workflow is represented as a graph where the activities/scopes/handlers are
represented by nodes. Naturally, the model maintains a host of parameters for each
node in the WS-BPEL process. A node, X, in the BPEL process is annotated by: (a)
its child activities, i.e., activities that are directly contained by X (b) its dependencies,
i.e., activities that must necessarily complete before X can start. Additionally, invoke
activities and scopes are attached to catch blocks, fault handlers and compensation
handlers that they may be associated with.
Dependencies and Join Conditions
Here, we formally define dependency of an activity and also categorize the nature of
a dependency. A unit of execution X in WS-BPEL is said to be dependent on another unit
Y if any of the following holds:
• Type A: Y is a structured activity (cannot be a loop) / scope / catch block
/ event and X is any activity / scope / event contained in it. In this case, if
Y is a sequence then X can only be the first activity within it; if Y is a flow
then X (a child of Y ) must not contain any of the links defined under Y as its
incoming links. In other words, X is an activity that marks the beginning of
execution of a sequence or a flow. Loop constructs viz. while, repeatUntil and
forEach are handled differently and thus the activity enclosed by a loop does
not have a Type A dependency.
• Type B: X is the ith child of a sequence and Y is (i − 1)th child of the same
sequence, i.e., an activity is dependent on a prior activity in sequence provided
the dependent is not the first activity in the sequence.
• Type C: Y is the source of a link whose target is X.
It may be noted that Type B and Type C dependencies qualify as control dependencies
because they imply completion of Y as a necessary requirement for X to start. In
Type A dependencies, Y must start before X can but X should complete before Y
does. In the BPEL process for passport application, the 〈invoke〉 for the passport
issue web service has the enclosing 〈if〉 activity as its Type A dependency and the
three other 〈invoke〉 activities as its Type C dependencies. The 〈invoke〉 activities for
verification and the 〈if〉 activity have the 〈flow〉 as their Type A dependency.
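The three rules can be sketched as a traversal of the activity tree. This is a simplified illustration: the Node class, the activity names and the representation of links as a dictionary are hypothetical, and any link target is treated as having an incoming link from its enclosing flow.

```python
from dataclasses import dataclass, field

LOOPS = {"while", "repeatUntil", "forEach"}


@dataclass
class Node:
    name: str
    kind: str  # e.g. "flow", "sequence", "if", "invoke"
    children: list = field(default_factory=list)


def find_dependencies(root, links):
    """Return (X, type, Y) triples; links maps a link name to (source, target)."""
    targets = {target for _, target in links.values()}
    deps = []

    def visit(node):
        if node.kind not in LOOPS:  # loops give no Type A dependency
            for index, child in enumerate(node.children):
                if node.kind == "sequence" and index > 0:
                    continue  # only the first activity of a sequence
                if node.kind == "flow" and child.name in targets:
                    continue  # children with incoming links are excluded
                deps.append((child.name, "A", node.name))
        if node.kind == "sequence":
            for previous, current in zip(node.children, node.children[1:]):
                deps.append((current.name, "B", previous.name))
        for child in node.children:
            visit(child)

    visit(root)
    for source, target in links.values():
        deps.append((target, "C", source))
    return deps


# The passport process of Figure 3.1, with hypothetical activity names.
issue = Node("issuePassport", "invoke")
process = Node("flow", "flow", [
    Node("eduDOB", "invoke"),
    Node("muniDOB", "invoke"),
    Node("muniAddr", "invoke"),
    Node("if", "if", [issue]),
])
links = {
    "X": ("eduDOB", "issuePassport"),
    "Y": ("muniDOB", "issuePassport"),
    "Z": ("muniAddr", "issuePassport"),
}
deps = find_dependencies(process, links)
```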
A join condition for an activity contains a boolean expression defined on the incom-
ing links for the activity. An activity may start only if its join condition evaluates to
true. If no join condition element is present, a disjunction (logical-OR) of all incom-
ing links is taken as default. A transition condition on a link is defined at its source
and refers to the condition defined on variables that must be passed for the link to
attain a true value. The probability that a link assumes a true value is dependent
on the successful completion of the source activity and the transition condition being
evaluated to true.
$$P(link_i = \text{true}) = P(success_{source_i}) \cdot P(transitionCond_i = \text{true}) \qquad (3.1)$$
The suppressJoinFailure attribute is instrumental in defining the behavior of execution
if a joinCondition evaluates to false. If the value of the attribute is yes,
then the activity is skipped when its joinCondition is false, but no
bpel:joinFailure is generated. A false status is assigned to all outgoing links of the
skipped activity. However, the default value of suppressJoinFailure is no. For our
purposes, we assume the suppressJoinFailure attribute for all activities to be set to
no and all statements made (if not explicitly mentioned otherwise) are based on such
an assumption. Section 3.4.2 looks at modeling QoS if the attribute has a yes value.
Evaluation of Join Condition
The QoS model will estimate the probability that an activity starts after successful
completion of all its dependencies. For such an estimation to happen, it becomes
imperative to compute the probability that the join condition for an activity evaluates
to true.
First, the boolean expression of the join condition is converted to a canonical Sum
of Products (SOP) form. A generalized canonical join condition may be written as:
$$(A_{11} \wedge A_{12} \wedge \ldots \wedge A_{1n_1}) \vee (A_{21} \wedge A_{22} \wedge \ldots \wedge A_{2n_2}) \vee \ldots \vee (A_{m1} \wedge A_{m2} \wedge \ldots \wedge A_{mn_m})$$
where $A_{ij}$ represents the status of a link, ∧ stands for logical-AND and ∨ denotes logical-OR.
We assume the events of success of any combination of incoming links to be independent.
Thus, the probabilities of the links may be multiplied to obtain the probability
of an ANDed portion being true.
$$P(A_{i1} \wedge A_{i2} \wedge \ldots \wedge A_{in_i}) = P(A_{i1} \cap A_{i2} \cap \ldots \cap A_{in_i}) = P(A_{i1}) \cdot P(A_{i2}) \cdots P(A_{in_i})$$
Note: Here, $P(A_i)$ is used as a shorthand for $P(A_i = \text{true})$.
Then, we can compute the probability of success of the join condition by applying the
inclusion-exclusion identity given below, which deals with the ORed parts of the canonical
expression.
$$P(A_1 \vee A_2 \vee \ldots \vee A_n) = P(A_1 \cup A_2 \cup \ldots \cup A_n) = \sum_{i=1}^{n} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \ldots + (-1)^{n-1} P(A_1 \cap A_2 \cap \ldots \cap A_n)$$
Again, we assume independence of the status of the incoming links and thus can apply:
$$P\left(\bigcap_{i=1}^{n} A_i\right) = \prod_{i=1}^{n} P(A_i)$$
The invoke activity for the passport issue web service has a join condition: (X∨Y )∧Z.
First we convert it into its SOP form: (X∧Z)∨(Y ∧Z). If the probabilities of the links
evaluating to true are known, we can compute the probability of the join condition
3.3. MODEL PRELIMINARIES 32
attaining a true value as follows:
\[ \begin{aligned} P((X \vee Y) \wedge Z) &= P((X \wedge Z) \vee (Y \wedge Z)) \\ &= P((X \cap Z) \cup (Y \cap Z)) \\ &= P(X) \cdot P(Z) + P(Y) \cdot P(Z) - P(X) \cdot P(Y) \cdot P(Z) \end{aligned} \]
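The join-condition computation can be sketched in Python as follows. This is only an illustrative sketch, not the thesis implementation: the function name, the set-based encoding of the SOP terms and the numeric link probabilities are assumptions made for the example. It applies inclusion-exclusion over the product terms and, because links are assumed independent, multiplies each distinct link probability exactly once per intersection.

```python
from itertools import combinations
from math import prod


def join_condition_probability(sop_terms, link_prob):
    """P(join condition = true) for a condition given in Sum-of-Products form.

    sop_terms: list of sets of link names, one set per ANDed product term.
    link_prob: dict mapping each link name to P(link = true).
    Links are assumed to be mutually independent, as in the text.
    """
    total = 0.0
    for k in range(1, len(sop_terms) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0  # alternating inclusion-exclusion sign
        for subset in combinations(sop_terms, k):
            # The intersection of product terms is the AND over the union of
            # their links, so a shared link (Z below) is counted only once.
            links = set().union(*subset)
            total += sign * prod(link_prob[name] for name in links)
    return total


# (X or Y) and Z in SOP form: (X and Z) or (Y and Z); probabilities are made up.
p_join = join_condition_probability(
    [{"X", "Z"}, {"Y", "Z"}],
    {"X": 0.9, "Y": 0.8, "Z": 0.95},
)
```

With these hypothetical inputs the result agrees with the closed form P(X)P(Z) + P(Y)P(Z) − P(X)P(Y)P(Z).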
For an activity with incoming links (having Type C dependency) to start, the status
of all its incoming links must be determined and then the join condition (implicit or
explicit) must evaluate to true. The scopes and the structured activities also play a
role in deciding when an activity contained in them may be ready for execution. The
parent scope or parent structured activity must start for a child to start execution.
Again, an activity in sequence can only start if the prior activity has completed. Thus,
the probability that an activity X will start, P (startX), is computed as a product of
the probabilities introduced below:
• P (depAX): Probability that the parent activity or scope will start, P (startparent),
if X has Type A dependency and 1 otherwise.
• P (depBX): Probability that the previous activity Y in sequence will successfully
complete, P (successY ) , if X has Type B dependency and 1 otherwise.
• P (depCX): Probability that the joinCondition is true, P (joinCondX = true),
if X has Type C dependency and 1 otherwise.
P (startX) = P (depAX).P (depBX).P (depCX) (3.2)
The passport issue invoke activity in our example has only Type A and Type C
dependencies. Thus, the probability that it may start may be computed as:
\[ P(\text{start}_{\text{passportIssue}}) = P(\text{start}_{\text{if}}) \times P((X \cup Y) \cap Z) \]
The reader should note that P(start_X) refers to the probability that the activity
X will start with all preconditions met, i.e., at the point of execution of X the system
is in a state that is in absolute conformance with the expectation of X. For a more
formal description of how to compute P(start_X), see Algorithm 1.
Algorithm 1 findProbStart(X): Finding the probability that an activity X will start
  probStart ← 1.0
  if X has a Type A dependency then
    if P(start_parent(X)) is not known then
      findProbStart(parent(X))
    end if
    probStart ← probStart × P(start_parent(X))
  end if
  if X has a Type B dependency then
    if P(success_predecessor(X)) is not known then
      findProbSuccess(predecessor(X)) {See Algorithm 2}
    end if
    probStart ← probStart × P(success_predecessor(X))
  end if
  if X has a Type C dependency then
    for all Y such that X has a Type C dependency on Y do
      if P(success_Y) is not known then
        findProbSuccess(Y)
      end if
    end for
    Compute P(joinCond_X = true)
    probStart ← probStart × P(joinCond_X = true)
  end if
  P(start_X) ← probStart
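A minimal Python sketch of the recursion in Algorithm 1 is given below. The `Activity` class, its field names and the numeric values are hypothetical and exist only for illustration. Two simplifications are assumed: `find_prob_success` is a stand-in that treats every activity as a basic non-invoke activity unless a success probability is supplied, and the probability of a link being true is approximated by the success probability of its source (transition conditions are ignored).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Activity:
    # All field names are hypothetical; they mirror the three dependency types.
    name: str
    parent: Optional["Activity"] = None        # Type A: enclosing activity or scope
    predecessor: Optional["Activity"] = None   # Type B: previous activity in a sequence
    link_sources: List["Activity"] = field(default_factory=list)  # Type C
    # Maps {source name: P(success)} to P(joinCondition = true).
    join_prob: Optional[Callable[[Dict[str, float]], float]] = None
    p_start: Optional[float] = None            # memoized or supplied P(start)
    p_success: Optional[float] = None          # memoized or supplied P(success)


def find_prob_success(x: Activity) -> float:
    # Stand-in for the success computation: P(success) = P(start) unless supplied.
    if x.p_success is None:
        x.p_success = find_prob_start(x)
    return x.p_success


def find_prob_start(x: Activity) -> float:
    if x.p_start is not None:
        return x.p_start
    prob = 1.0
    if x.parent is not None:                   # Type A dependency
        prob *= find_prob_start(x.parent)
    if x.predecessor is not None:              # Type B dependency
        prob *= find_prob_success(x.predecessor)
    if x.link_sources:                         # Type C dependency
        if x.join_prob is None:
            raise ValueError("join_prob is required when incoming links exist")
        status = {y.name: find_prob_success(y) for y in x.link_sources}
        prob *= x.join_prob(status)
    x.p_start = prob
    return prob


# Passport-issue invoke: Type A (parent if) and Type C (links X, Y, Z) only.
if_activity = Activity("if", p_start=0.9)
sources = [
    Activity("X", p_success=0.9),
    Activity("Y", p_success=0.8),
    Activity("Z", p_success=0.95),
]
passport_issue = Activity(
    "passportIssue",
    parent=if_activity,
    link_sources=sources,
    join_prob=lambda s: (s["X"] + s["Y"] - s["X"] * s["Y"]) * s["Z"],
)
p_start_passport = find_prob_start(passport_issue)
```

Memoizing `p_start` and `p_success` on the activity plays the role of the "is not known" checks in the pseudocode.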
3.4 Reliability Modeling
In our pursuit to estimate the QoS parameters, we first compute the reliability of
the BPEL process. As mentioned earlier, by reliability we refer to the probability of
successful execution, and we determine this probability, P (successi), for each activity
i, in the BPEL process. Again, successful completion encompasses a behavior that is
both syntactically correct and semantically desirable.
Algorithm 2 shows that in order to compute the probability of success of an activity
X, we first compute P(start_X) and then, if X is a structured activity, determine
P(success) for all its children. However, the computation of P(success) involves
steps that are specific to each activity or scope or handler.
Algorithm 2 findProbSuccess(X): Finding the probability that an activity X will successfully complete
  findProbStart(X)
  for all Z such that Z is a child of X do
    if P(success_Z) is not known then
      findProbSuccess(Z)
    end if
  end for
  · · · Activity-Wise Computation of P(success_X) · · ·
3.4.1 Activity-Wise Reliability Computation
In our QoS model, invocations of external web services are considered to be the only
activities that may be a source of failures. All basic activities except invoke
are assumed to complete successfully provided they do start. Hence, their probability
of successful completion is equal to the probability that they get to execute. Thus,
we have,
\[ P(\text{success}_X) = P(\text{start}_X) \tag{3.3} \]
where X may be one of the following:
(a) receive
(b) reply
(c) assign
(d) wait
(e) throw
(f) rethrow
(g) exit
(h) empty
(i) compensate
(j) compensateScope
Sequence
A sequence is said to be complete when the last activity contained in it completes
successfully. An activity nested inside a sequence can only start if the previous
activity in the sequence has been successful. Thus, if the ith child of a sequence is
executing, then all child activities from the first to the (i − 1)th can be taken to be
complete. Therefore, we can model the success of a sequence by:
\[ P(\text{success}_{\text{sequence}}) = P(\text{success}_{\text{lastChild}}) \tag{3.4} \]
Flow
A flow activity is deemed to complete only if all activities enclosed by it are complete.
If the suppressJoinFailure attribute is set to no, the flow terminates as soon as a
joinCondition evaluates to false, and a bpel:joinFailure is thrown to the enclosing
scope. The notion of activities being skipped and the flow still completing is applicable
only when suppressJoinFailure has the value yes. We track Type C dependencies
and estimate the probability of joinCondition being true, to figure out whether an
activity may start. The synchronization links in effect model the control flow of
execution. Thus, it may be contended (with a similar argument to that posed in case
of sequence) that the completion of all child activities without any outgoing links
would mark the completion of the flow. Therefore, the probability of success of a flow
activity may be written as:
\[ P(\text{success}_{\text{flow}}) = \prod_{\forall i} P(\text{success}_{\text{flowSink}_i}) \tag{3.5} \]
where flowSink_i is a child activity of the flow with no outgoing links, i.e., an activity
where the 〈sources〉 element is absent. In our passport application example, the 〈if〉
activity is the sink since it does not have any outgoing links. Thus,
\[ P(\text{success}_{\text{flow}}) = P(\text{success}_{\text{if}}) \]
If
An if activity is complete when the activity nested in the taken branch completes.
It completes immediately when no condition evaluates to true and no else branch is
specified. In order to elucidate the model, we assume an else branch with an empty
activity to be inserted where no else exists. The probability with which a branch is
taken, P(branchTaken_i), is obtained as an input. The probability of selection of the
else branch is computed as
\[ P(\text{branchTaken}_{\text{else}}) = 1 - \sum_{i=1}^{n-1} P(\text{branchTaken}_i) \]
where n is the total number of branches in an if activity.
The probability of success of an if activity is calculated as a weighted sum of
the probabilities of success of the activities contained in all its branches, the weights
being the probability with which a branch gets selected for execution.
\[ P(\text{success}_{\text{if}}) = \sum_{i=1}^{n} P(\text{branchTaken}_i) \times P(\text{success}_{\text{branchActivity}_i}) \tag{3.6} \]
The 〈if〉 activity in our example does not have an else branch. Thus, we simulate
an else branch, which is selected with a probability that the bank payment is not
done.
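As a rough illustration of the weighted sum for an if activity, the Python sketch below treats the simulated else branch explicitly. The function name and the numbers are assumptions made for the example. Since the simulated else branch holds an empty activity, its success probability is simply the probability that it starts, which the caller supplies.

```python
def success_if(branches, else_success):
    """Weighted sum of branch success probabilities for an if activity.

    branches: list of (P(branchTaken_i), P(success of branch activity i))
              for the explicit condition branches.
    else_success: P(success) of the (possibly simulated) else branch activity.
    The else branch receives whatever probability mass the others leave over.
    """
    taken = sum(p for p, _ in branches)
    if taken < 0.0 or taken > 1.0 + 1e-9:
        raise ValueError("branch probabilities must sum to at most 1")
    p_else = max(0.0, 1.0 - taken)
    return sum(p * s for p, s in branches) + p_else * else_success


# One explicit branch taken 70% of the time; hypothetical success values.
p_success_if = success_if([(0.7, 0.9)], else_success=0.95)
```

The same function could be reused for a pick by passing event probabilities that already add up to 1, in which case the else term contributes nothing.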
Pick
Pick activities are treated in a similar way to if activities, with the probabilities of
selection of each event being taken as input. Since WS-BPEL stipulates that exactly
one event should be executed when a pick is started, the inputs should be validated
to check whether they add up to 1.
\[ P(\text{success}_{\text{pick}}) = \sum_{i=1}^{n} P(\text{eventSelected}_i) \times P(\text{success}_{\text{eventActivity}_i}) \tag{3.7} \]
where n is the total number of events in a pick activity.
Loops
QoS modeling for loops follows a reduction based approach in the sense that com-
putation of QoS parameters for the child activity may be performed independently
without requiring QoS information of its parent activity. This is facilitated by the
WS-BPEL stipulation (see WS-BPEL 2.0 Static Analysis requirement SA00070) that
synchronization links cannot enter into repeatable constructs by crossing their bound-
aries. Note that we have not included loops in the list of structured activities that
can lead to a Type A dependency because they are dealt with differently.
Loops are handled by unfolding them to the number of iterations that they make.
In case of while and repeatUntil, the number of iterations is taken as an input, pre-
sumably obtained from execution logs. In a forEach activity, the number of iterations
is taken to be either of the following:
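\[ \text{numIterations} = \begin{cases} B, \text{ the unsigned int value of completionCondition}, & \text{if it exists,} \\ \langle\text{finalCounterValue}\rangle - \langle\text{startCounterValue}\rangle + 1, & \text{otherwise.} \end{cases} \]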
The various iterations of the loop are assumed to be independent and in effect the loop
construct is captured as a number of copies of the contained child activity running
in sequence. Hence, the reduction applied to compute the reliability of loops can be
formulated as:
\[ P(\text{success}_{\text{loop}}) = P(\text{start}_{\text{loop}}) \times P(\text{success}_{\text{child}})^{\text{numIterations}} \tag{3.8} \]
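The loop reduction can be sketched in a few lines of Python. The function and parameter names are hypothetical, and clamping the forEach count at zero when the start counter exceeds the final counter is an assumption of this sketch rather than something stated in the text.

```python
def num_iterations_for_each(start_counter, final_counter, completion_branches=None):
    """Iteration count assumed for a forEach activity.

    completion_branches: value B of the completionCondition, if one exists.
    """
    if completion_branches is not None:
        return completion_branches
    # Assumption: no iterations are performed when start exceeds final.
    return max(0, final_counter - start_counter + 1)


def success_loop(p_start_loop, p_success_child, num_iterations):
    """Unfolds the loop into num_iterations independent copies of the child."""
    return p_start_loop * p_success_child ** num_iterations


n = num_iterations_for_each(1, 5)      # five iterations, no completionCondition
p_loop = success_loop(0.9, 0.99, n)    # hypothetical probabilities
```

For while and repeatUntil loops the iteration count would be passed in directly, since the text takes it as an input obtained from execution logs.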
Invoke
Invoke activities denote the point of calling external web services that may be prone
to failures. For each invoke activity, the model expects (as input) the reliability of the
web service bound to it. SLAs list the rate of failures that may occur even though the
web service is fed with proper inputs. Apart from failures at the site of the provider,
one also has to take into account network failures. The QoS model demands as input
the conditional probability Rws that an invocation of an external web service will
succeed given that the call is made with proper arguments. At any point of time during
execution, if the business process is running then it is implicit that the system is in
a consistent state. Thus, if the process is active at the point of invocation it may be
assumed that it provides “correct” inputs to the callee web service. Now, probability
of a successful invocation can be formulated as below:
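\[ \begin{aligned} P(\text{success}'_{\text{invoke}}) &= P(\text{success}'_{\text{invoke}} \mid \text{proper inputs supplied}) \times P(\text{proper inputs supplied}) \\ &= R_{ws} \times P(\text{start}_{\text{invoke}}) \end{aligned} \tag{3.9} \]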
However, an invoke activity may have catch blocks and compensation handlers
attached to it, and completion of an invoke activity would encompass their completion
too. We compute the improved probability of success P(success′′) of invoke (see
Equation 3.16) after estimation of the number of faults that may be handled through
catch blocks. We may then write the expression for the probability of success of an
invoke activity after incorporating the same for the attached compensation handler
(see Equation 3.17), if there is any.
\[ P(\text{success}_{\text{invoke}}) = P(\text{success}''_{\text{invoke}}) \times P(\text{success}_{\text{compensationHandler}}) \tag{3.10} \]
Scope
A scope provisions attachment of fault handlers, compensation handlers, event han-
dlers and termination handlers that together set the context within which the primary
activity of the scope may run. Reliability computation in scopes is done in the same
way as it is performed for the invoke activities. In the absence of any handler, the
probability of success of a scope is given by that of its child activity.
\[ P(\text{success}'_{\text{scope}}) = P(\text{success}_{\text{scopeChild}}) \tag{3.11} \]
The catch blocks inside a fault handler may help remove some of the faults, and
thereby improve the reliability of the scope to P (success′′scope). The following equation
models reliability of a scope in presence of fault handlers, compensation handlers and
event handlers.
\[ P(\text{success}_{\text{scope}}) = P(\text{success}''_{\text{scope}}) \times P(\text{success}_{\text{compensationHandler}}) \times P(\text{success}_{\text{eventHandler}}) \tag{3.12} \]
The above equation when applied to the process (that is nothing but a special form
of scope) will give the reliability of the composite web service written in BPEL.
Fault Handler
The model assumes as input the fraction of faults caught by each catch block,
FractionCapture, out of the total number of faults produced by its scope. Note
that even a catchAll block may not have FractionCapture as 1 because there may
be uncaught semantic inconsistencies passed on. Now, the probability that a catch
block is started (after completion of its associated scope, successful or otherwise)
may be given by the probability that the associated invoke/scope fails and a fault
is captured by the catch. Here, we consider the conditional probability of success
of scope/invoke given it starts, in order to capture only those failures that originate
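from the web service invocation or from within the child scope and then multiply it
with P(start_scope/invoke).
\[ \begin{aligned} P(\text{start}_{\text{catch}_i} \mid \text{start}_{\text{invoke}}) &= (1 - R_{ws}) \times \text{FractionCapture}_i \\ P(\text{start}_{\text{catch}_i} \mid \text{start}_{\text{scope}}) &= (1 - P(\text{success}_{\text{scopeChild}} \mid \text{start}_{\text{scope}})) \times \text{FractionCapture}_i \\ &= (1 - P(\text{success}_{\text{scopeChild}}) / P(\text{start}_{\text{scope}})) \times \text{FractionCapture}_i \\ P(\text{start}_{\text{catch}_i}) &= P(\text{start}_{\text{catch}_i} \mid \text{start}_X) \times P(\text{start}_X), \quad \text{where } X = \text{scope/invoke} \end{aligned} \tag{3.13} \]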
The probability of successful completion of a catch block P (successcatch) is equal to
that of the activity nested in it. It may be noted that the child activity of a catch
may only have a Type A dependency (Type C dependencies are forbidden vide WS-
BPEL static analysis requirement SA00071) but it may have outgoing links. Now,
the fraction of faults removed FFR by all catch blocks may be computed as:
\[ \text{FractionFaultRemoval}\ (FFR) = \sum_{\forall i} \big(\text{FractionCapture}_i \times P(\text{success}_{\text{catch}_i})\big) \tag{3.14} \]
The rate of faults thrown up by the web service invocation or the child scope may be
taken to be:
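\[ \text{FaultRate}_X = \begin{cases} 1 - R_{ws}, & \text{if } X \text{ is an invoke,} \\ 1 - P(\text{success}_{\text{scopeChild}} \mid \text{start}_{\text{scope}}), & \text{if } X \text{ is a scope} \end{cases} \tag{3.15} \]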
After taking into account the fraction of faults removed, the improved probability
of the invoke / scope will stand as below:
\[ P(\text{success}''_X) = P(\text{success}'_X) + \text{FaultRate}_X \times FFR \tag{3.16} \]
where, X may be invoke/scope
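The effect of catch blocks on reliability, as expressed by Equations 3.14 and 3.16, can be sketched as follows. The function names and the numbers are illustrative assumptions; the example uses an invoke that is certain to start, so that P(success′) equals Rws.

```python
def fraction_fault_removal(catches):
    """Equation 3.14. catches: list of (FractionCapture_i, P(success_catch_i))."""
    return sum(capture * p_success for capture, p_success in catches)


def improved_success(p_success_prime, fault_rate, catches):
    """Equation 3.16: reliability after accounting for faults removed by catches."""
    return p_success_prime + fault_rate * fraction_fault_removal(catches)


r_ws = 0.9                      # hypothetical reliability of the bound web service
p_prime = r_ws * 1.0            # Equation 3.9 with P(start_invoke) = 1
catch_blocks = [(0.5, 0.8), (0.3, 1.0)]
p_double_prime = improved_success(p_prime, 1.0 - r_ws, catch_blocks)
```

Here the two catch blocks together remove 70% of the faults, so the reliability rises from 0.9 to about 0.97.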
Compensation Handler
A compensation handler is just a wrapper for an activity that is supposed to provide
backward recovery. A compensation handler may be invoked through a compensateScope
or a compensate activity. Again, a compensation handler is only invoked
if the scope or the invoke being compensated has completed, i.e., the fault handlers
have successfully caught all faults that may have occurred. Thus, the probability that
the compensation handler starts may be given by:
\[ P(\text{start}_{CH}) = P(\text{success}_X) \times P(\text{success}_Y) \]
where, X can be either compensateScope or compensate; Y can be either scope
or invoke. If the compensation handler is not invoked, there is no chance of any
failures emanating from it and hence the reliability may be taken to be 1. Now,
P (successcompensationHandler) may be computed as the weighted sum of the reliabili-
ties of the two cases - when it does execute and when it does not.
\[ P(\text{success}_{\text{compensationHandler}}) = P(\text{start}_{CH}) \times P(\text{success}_{\text{child}}) + (1 - P(\text{start}_{CH})) \times 1 \tag{3.17} \]
Again, P(success_child) may be obtained independently of activities outside the
compensation handler because the child activity can have no Type C dependency (vide
WS-BPEL static analysis requirement SA00070).
Event Handler
In order to model QoS of event handlers one requires information about the number
of times an event may occur. For timer based events one may estimate the number
of times the timer goes off and the event completes within the lifetime of the scope
(see computation of start and end times for scopes in Section 3.5). For message based
events this information has to be derived from execution logs. The probability of
success of an event, P(success_event), is the same as that of the scope enclosed
within it. The following equation shows how reliability modeling may be performed
for event handlers.
\[ P(\text{success}_{\text{eventHandler}}) = \prod_{i=1}^{\text{numEvents}} P(\text{success}_{\text{event}_i})^{\text{numOccur}_i} \tag{3.18} \]
where
numEvents = number of events in the event handler,
numOccur_i = number of occurrences of event_i
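To show how the handler terms combine into the reliability of a scope, here is a small Python sketch of Equations 3.12, 3.17 and 3.18. The helper names and the numbers are hypothetical. A scope without a given handler is modelled by passing 1.0 for that factor.

```python
from math import prod


def success_compensation_handler(p_start_ch, p_success_child):
    """Equation 3.17: weighted over 'handler runs' and 'handler never runs'."""
    return p_start_ch * p_success_child + (1.0 - p_start_ch) * 1.0


def success_event_handler(events):
    """Equation 3.18. events: list of (P(success_event_i), numOccur_i)."""
    return prod(p ** occurrences for p, occurrences in events)


def success_scope(p_success_double_prime, p_success_ch=1.0, p_success_eh=1.0):
    """Equation 3.12; absent handlers contribute a factor of 1."""
    return p_success_double_prime * p_success_ch * p_success_eh


p_ch = success_compensation_handler(p_start_ch=0.2, p_success_child=0.9)
p_eh = success_event_handler([(0.99, 2), (0.95, 1)])
p_scope = success_scope(0.97, p_ch, p_eh)
```

Applying `success_scope` to the process itself would give the reliability of the whole composite service, in line with the remark that the process is a special form of scope.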
Our QoS model does not deal with the behavior where forced termination may
take place. Hence, termination handlers are outside the scope of this research.
3.4.2 Suppression of Join Failures
The entire exposition above assumes that a bpel:joinFailure is thrown whenever
a join condition is not satisfied. However, WS-BPEL allows skipping of activities
through Dead Path Elimination - a process where if a join condition evaluates to false
and the suppressJoinFailure attribute is set to yes, then the activity is skipped and
a false status is propagated on all its outgoing links with no fault generation. Since a
skipped activity is also deemed to be complete, the expression for the probability of
successful completion has to be suitably modified in such cases. Here the probability
of success of an activity is improved by the probability of the activity being skipped.
\[ P(\text{success}') = P(\text{joinCondition} = \text{true}) \times P(\text{success}) + (1 - P(\text{joinCondition} = \text{true})) \tag{3.19} \]
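A short numeric sketch of Equation 3.19 in Python follows; the function name and the input values are assumptions chosen only to show the size of the adjustment.

```python
def success_with_suppressed_join_failure(p_join_true, p_success):
    """Equation 3.19: a skipped activity also counts as complete."""
    return p_join_true * p_success + (1.0 - p_join_true)


# Hypothetical values: the join condition holds with probability 0.931 and the
# activity has P(success) = 0.9 before the adjustment.
p_success_prime = success_with_suppressed_join_failure(0.931, 0.9)
```

The adjusted value is never lower than the unadjusted one, which matches the statement that skipping can only improve the probability of completion.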
However, in the evaluation of P(link = true) (see Equation 3.1), the probability
of successful completion of the source will be substituted by the old P(success) and not
P(success′), because a failed status is propagated on each of the outgoing links of a
skipped activity.
3.5 Response Time Modeling
Response time is tracked by determining for each activity the expected time instant
when it starts and the expected time of its completion. Time is measured from the
start of the BPEL process, i.e., start time of the process scope is taken to be zero.
All time values are represented as random variables characterized by their means and
standard deviations.
Analogous to reliability modeling, here too we present a generalized algorithm for
computation of start times (ST) and then specifically design techniques to estimate
end times (ET) for each activity type. An activity may only start after all its Type
B and Type C dependencies have completed execution. In case the activity has a
Type A dependency, its parent activity should start before it does. The maximum of
all completion times of the Type B and Type C dependencies and start time of the
Type A dependency is thus taken to obtain the start time (ST) for an activity (see
Algorithm 3).
Note: In Algorithm 3, waiting times are taken into account only for those receive
activities that have their createInstance attribute set to no. Otherwise, the receive
activity is a start activity and marks the beginning of the business process. It may
be observed that all the start activities have a start time of zero. Scope, sequence and
flow are the only non-start activities allowed that may not have a control dependency
on start activities (vide WS-BPEL static analysis requirement SA00056) and thereby
they may also attain a zero start time.
Estimation of End Time (ET ) of an activity requires the start time of the activity
and the end times of all its children. Again, computation of End Time (ET ) varies
significantly from one activity to another and is detailed for each activity / scope /
handler in the next section.
Algorithm 3 findStartTime(X): Finding the Start Time (ST_X) for an activity X
  Create an empty list TimeList of times each represented as (µ, σ)
  if X has a Type A dependency then
    if ST_parent(X) is not known then
      findStartTime(parent(X))
    end if
    Add ST_parent(X) to TimeList
  end if
  if X has a Type B dependency then
    if ET_predecessor(X) is not known then
      findEndTime(predecessor(X))
    end if
    Add ET_predecessor(X) to TimeList
  end if
  if X has a Type C dependency then
    for all Y such that X has a Type C dependency on Y do
      if ET_Y is not known then
        findEndTime(Y)
      end if
      Add P(start_Y) × ET_Y to TimeList
    end for
  end if
  ST_X ← Max of all elements in TimeList
  if X is a message based event in pick or a receive with createInstance = no then
    ST_X ← ST_X + AverageWaitingTime_X
  end if
Algorithm 4 findEndTime(X): Finding the End Time (ET_X) for an activity X
  findStartTime(X)
  for all Z such that Z is a child of X do
    if ET_Z is not known then
      findEndTime(Z)
    end if
  end for
  · · · Activity-Wise Computation of ET_X · · ·
3.5.1 Activity Wise Response Time Computation
Most basic activities (except invoke and wait) may be treated as though they
complete instantaneously; for a receive, any waiting time is already added to its
start time in Algorithm 3. Thus, their start times are equal to their end times.
\[ ET_X = ST_X \tag{3.20} \]
where, X may be one of the following:
(a) receive
(b) reply
(c) assign
(d) throw
(e) rethrow
(f) exit
(g) empty
(h) compensate
(i) compensateScope
Wait
A wait activity is designed to either introduce a delay for a given duration or wait until
a certain deadline is reached. In case an absolute deadline is specified, we need to
know the absolute time when our process starts so that we can obtain the relative
end time of completion as shown below.
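\[ ET_{\text{wait}} = \begin{cases} ST_{\text{wait}} + \text{duration}, & \text{if for specifies a delay,} \\ \text{deadline} - \text{Absolute Time of Start of Process}, & \text{if until specifies a deadline.} \end{cases} \tag{3.21} \]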
Sequence
A sequence is deemed to complete when the last activity enclosed in it ends. Thus,
we have:
\[ ET_{\text{sequence}} = ET_{\text{lastChild}} \tag{3.22} \]
Flow
As demonstrated in Section 3.4, the completion of all activities with no outgoing links
contained in a flow shall mark the end point of the flow. Thus, the end time of a flow
may be computed as below:
\[ ET_{\text{flow}} = \max_{\forall i}(ET_{\text{sink}_i}) \tag{3.23} \]
If
The end time of an if activity will refer to its expected time of completion taking into
account the probabilities of execution of each of its branches. It is computed as the
weighted sum of the end times of all the branches, the weights being the probabilities
of taking the branches.
\[ ET_{\text{if}} = \sum_{i=1}^{\text{numBranches}} P(\text{branchTaken}_i) \times ET_{\text{branchActivity}_i} \tag{3.24} \]
Pick
Pick activities are treated almost identically to if activities. Here, the weights refer
to the probability of occurrence of the concerned event.
\[ ET_{\text{pick}} = \sum_{i=1}^{\text{numEvents}} P(\text{eventOccurs}_i) \times ET_{\text{eventActivity}_i} \tag{3.25} \]
Loops
The activity nested inside a loop cannot have any of the three types of dependencies
(Type A and Type B are disallowed by definition; Type C is not possible because of
static analysis requirement SA00070). Thus, the start time of the child activity of
the loop will be evaluated to zero and its QoS computation performed as if it were the
primary activity of a BPEL process. The end times for while / repeatUntil loops are
obtained as:
\[ ET_{\text{loop}} = ST_{\text{loop}} + \text{numIterations} \times ET_{\text{loopChild}} \tag{3.26} \]
ForEach allows its iterations to be rolled out in parallel if the parallel
attribute is set to yes. In such a case, we simply take the end time of the forEach
loop to be equal to that of a single execution of the loop child, assuming that all the
unrolled loop branches executing in parallel finish in more or less the same time.
Some degree of randomness is already accounted for because the times are represented
through random variables having a certain standard deviation.
\[ ET_{\text{forEach}} = \begin{cases} ST_{\text{forEach}} + \text{numIterations} \times ET_{\text{child}}, & \text{if parallel = no} \\ ST_{\text{forEach}} + ET_{\text{child}}, & \text{if parallel = yes} \end{cases} \tag{3.27} \]
Here, numIterations is given by the expression in Section 3.4.1.
Invoke
By setting appropriate parameters one may force the BPEL engine to implement time-
outs in synchronous web service calls made through Invoke activities. The fraction of
timeouts occurring in invocations of a particular web service may be obtained from
logs. Such an input enables us to write the expression for determining the expected
end time of a web service call.
\[ ET'_{\text{invoke}} = ST_{\text{invoke}} + (1 - \text{fractionTimeout}) \times T_{ws} + \text{fractionTimeout} \times \text{TimeOut} \tag{3.28} \]
where,
T_ws = Total time taken by the web service call to return
TimeOut = Wait time before an invocation is timed out
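Equation 3.28 can be illustrated with the following Python sketch, which works on mean values only and ignores standard deviations. The function name, parameter names and the numbers are assumptions for the example.

```python
def expected_invoke_end(st_invoke, t_ws, timeout, fraction_timeout):
    """Equation 3.28 on mean values: expected end time of a web service call."""
    if not 0.0 <= fraction_timeout <= 1.0:
        raise ValueError("fraction_timeout must lie in [0, 1]")
    return (st_invoke
            + (1.0 - fraction_timeout) * t_ws
            + fraction_timeout * timeout)


# Hypothetical figures in seconds: the call starts at t = 2.0, normally
# returns in 1.5, and 10% of the calls hit a 5.0 timeout.
et_invoke = expected_invoke_end(2.0, 1.5, 5.0, 0.1)
```

A fuller treatment would carry (µ, σ) pairs through the same expression using the random-variable operations of the model.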
Scope
Ignoring the attached handlers, the end time of a scope may be taken to be that of
its child activity.
\[ ET'_{\text{scope}} = ET_{\text{childActivity}} \tag{3.29} \]
Fault Handler
The fraction of faults captured by a catch block multiplied by the fault rate of the
enclosed scope/invoke gives the probability that the catch block will be executed.
The end time of a scope or an invoke can thus be incremented by the time taken for
execution of all catch blocks and may be estimated by the following expression:
\[ ET''_X = ET'_X + \max_{\forall i}\big(\text{fractionCapture}_i \times \text{FaultRate} \times (ET_{\text{catch}_i} - ST_{\text{catch}_i})\big) \tag{3.30} \]
where X may be a scope or an invoke.
Compensation Handler
Compensation handlers may run concurrently with other activities of a scope and
other handlers. The start time of a compensation handler would be given by the
start times of compensate or compensateScope activities that are responsible for
invoking it. Again, the end time of the child activity inside a compensation handler
may be computed independently (taking its start time as zero) due to absence of
dependencies. The expected end time may be obtained as:
\[ ET_{\text{compensationHandler}} = P(\text{start}_{CH}) \times (ST_{\text{compensationHandler}} + ET_{\text{child}}) \tag{3.31} \]
Event Handler
Event handlers also run concurrently along with other handlers and the primary
activity of the scope to which they are attached. All events are enabled when the parent
scope of the event handler starts. The event handler is disabled when the primary
activity of the scope ends but already running instances of events are allowed to
complete. The start time of the associated child scope of an event as computed by
Algorithm 3 refers to the time when the first instance of that event may be ready to
be fired. The time taken by each instance of an event to complete may be taken as:
\[ T_{\text{event}} = \text{WaitingTime} + (ET_{\text{childScope}} - ST_{\text{childScope}}) \]
\[ ET_{\text{event}} = ST_{\text{event}} + \text{numOccurs} \times T_{\text{event}} \]
In case of message based events, WaitingT ime is taken to be average waiting time
of the event obtained from execution logs and for timer based alarms it refers to the
value of the timer specified in the alarm. Now, end time for an event handler may be
computed as follows:
\[ ET_{\text{eventHandler}} = \max_{\forall i} ET_{\text{event}_i} \tag{3.32} \]
Finally, we can write the equations to estimate the end times of invoke activities
and scopes.
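\[ \begin{aligned} ET_{\text{invoke}} &= \max(ET''_{\text{invoke}}, ET_{\text{compensationHandler}}) \\ ET_{\text{scope}} &= \max(ET''_{\text{scope}}, ET_{\text{compensationHandler}}, ET_{\text{eventHandler}}) \end{aligned} \tag{3.33} \]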
3.5.2 Operations on Random Variables
In our model for response time, all times (ST and ET) are represented as random
variables characterized by their means and standard deviations. Here, we look at how
mean and standard deviation are computed for a resultant random variable that is
obtained as a function of other random variables. We list some of the operations that
are commonly used in the equations laid down in Section 3.5.1.
(a) Addition of two random variables: If two random variables A and B are
added to get a third random variable R, the mean and standard deviation for
R are obtained as follows:
\[ \mu_R = \mu_A + \mu_B \]
\[ \sigma_R = \sqrt{\sigma_A^2 + \sigma_B^2} \]
(b) Addition of a random variable and a constant: If a constant t is added to
a random variable A, the mean and standard deviation for the resultant random
variable R are computed as:
µR = µA + t
σR = σA
(c) Multiplication of a random variable by a constant: If a random vari-
able A is multiplied by a constant t, the mean and standard deviation for the
resultant random variable R are computed as:
µR = t× µA
σR = | t | ×σA
(d) Weighted Sum of random variables: In case a weighted sum is performed
on a set of random variables (A1, A2, · · · , An) with the help of constant weights
(p1, p2, · · · , pn) that sum up to 1, we obtain mean and standard deviation for
the resultant random variable R as below:
\[ \mu_R = \sum_{i=1}^{n} p_i \times \mu_{A_i} \]
\[ \sigma_R = \sqrt{\sum_{i=1}^{n} p_i^2 \times \sigma_{A_i}^2} \]
(e) Maximum of random variables: Unfortunately, there is no closed-form expression
for obtaining the mean and standard deviation of the maximum of a set of random
variables that may have any distribution. Even if we assume all the variables
to be normally distributed (a good approximation for characterizing the behavior
of response times, which have inherent randomness), we cannot arrive at
a generalized expression that would work for any number of random variables.
Thus, we apply simulation over a large number of data points to estimate the mean
and standard deviation of a random variable that represents the maximum
of a set of normally distributed random variables with (µ, σ) being specified for
each. Although the resultant random variable does not follow an exact normal
distribution, it has been observed that its probability distribution deviates very
little from a standard bell-shaped curve.
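Operations (a) to (e) can be sketched in Python as below. This is an illustrative sketch, not the thesis implementation: the `RV` class, the method names, the sample count and the fixed seed are all assumptions. The sum formulas additionally assume that the variables involved are independent, and the maximum is estimated by Monte Carlo sampling from normal distributions, as the text describes.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RV:
    """A time value summarized by its mean and standard deviation."""
    mu: float
    sigma: float

    def add(self, other):            # (a) sum of two independent variables
        return RV(self.mu + other.mu, math.hypot(self.sigma, other.sigma))

    def shift(self, t):              # (b) adding a constant
        return RV(self.mu + t, self.sigma)

    def scale(self, t):              # (c) multiplying by a constant
        return RV(t * self.mu, abs(t) * self.sigma)


def weighted_sum(rvs, weights):      # (d) constant weights that sum up to 1
    mu = sum(p * r.mu for p, r in zip(weights, rvs))
    sigma = math.sqrt(sum((p * r.sigma) ** 2 for p, r in zip(weights, rvs)))
    return RV(mu, sigma)


def simulated_max(rvs, samples=20_000, seed=0):   # (e) Monte Carlo estimate
    rng = random.Random(seed)
    draws = [max(rng.gauss(r.mu, r.sigma) for r in rvs) for _ in range(samples)]
    mean = sum(draws) / samples
    variance = sum((d - mean) ** 2 for d in draws) / (samples - 1)
    return RV(mean, math.sqrt(variance))


# End time of a flow with two sinks (hypothetical values), as in Equation 3.23.
flow_end = simulated_max([RV(10.0, 2.0), RV(12.0, 1.0)])
```

Fixing the seed keeps the estimate reproducible between runs; a larger sample count trades run time for a tighter estimate.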
3.6 Cost Modeling
The cost model presented here gives the expected cost for each activity in the WS-
BPEL process, if it may be executed in a run of the business process. Cost modeling
is different than the models for response time or reliability in the sense that whilst
calculating the cost of an activity one is concerned only with costs of activities that
are nested under it. However, the model computes for each activity the probability
that it starts given that its parent has already started. This probability, referred to as
PC, is used extensively in our model for aggregation of the costs of child activities in
order to estimate the expected cost of an activity.
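\[ \begin{aligned} PC_X &= P(\text{start}_X \mid \text{start}_{\text{parent}(X)}) \\ &= \frac{P(\text{start}_X \cap \text{start}_{\text{parent}(X)})}{P(\text{start}_{\text{parent}(X)})} \\ &= \frac{P(\text{start}_{\text{parent}(X)} \mid \text{start}_X) \cdot P(\text{start}_X)}{P(\text{start}_{\text{parent}(X)})} \\ &= \frac{P(\text{start}_X)}{P(\text{start}_{\text{parent}(X)})} \qquad \text{since } P(\text{start}_{\text{parent}(X)} \mid \text{start}_X) = 1 \\ &= P(\text{depB}_X) \times P(\text{depC}_X) \qquad \text{(see Equation 3.2)} \end{aligned} \tag{3.34} \]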
The basic methodology followed in obtaining the expected cost of an activity is
calculating a weighted sum of costs of all child activities where the weights are given
by their PCs. Thus, for each activity we determine its expected cost if it starts
execution. The precondition for such a computation to happen is that the expected
costs of all the child activities are known.
3.6.1 Activity-Wise Cost Computation
Zero costs are associated with all basic activities except invoke. Thus, we detail out
the steps for cost determination only for structured activities, invoke, scopes and
handlers.
Sequence and Flow
For a sequence or a flow, the expected cost is simply a weighted sum of the expected
costs of all its activities with their PCs being the weights.
\[ \text{Cost}_X = \sum_{i=1}^{|\text{children}(X)|} PC_{\text{child}_i} \times \text{Cost}_{\text{child}_i} \tag{3.35} \]
where X may be a sequence or a flow
If and Pick
To determine the cost of if / pick activities, we take a weighted sum of costs of all
branches / events and the weights are given by the probabilities of selection of those
branches / events.
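\[ \text{Cost}_{\text{if}} = \sum_{i=1}^{\text{numBranches}} P(\text{branchTaken}_i) \times \text{Cost}_{\text{branchActivity}_i} \tag{3.36} \]
\[ \text{Cost}_{\text{pick}} = \sum_{i=1}^{\text{numEvents}} P(\text{eventOccurs}_i) \times \text{Cost}_{\text{eventActivity}_i} \tag{3.37} \]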
Loops
PC of the child activity of a loop will always evaluate to 1 because it cannot have
any dependencies. Thus, there is no scope of relaxation of the cost of the child of a
loop. Cost estimation in loops is handled as below:
\[ \text{Cost}_{\text{loop}} = \text{numIterations} \times \text{Cost}_{\text{loopChild}} \tag{3.38} \]
Invoke
The cost of an invoke activity is computed as a sum of the cost of invocation of the web
service and the costs associated with the various catch blocks and the compensation
handler if they are present.
\[ \text{Cost}_{\text{invoke}} = \text{Cost}_{ws} + \text{Cost}_{\text{catches}} + \text{Cost}_{\text{compensationHandler}} \tag{3.39} \]
Scope
The cost of a scope is given as follows:
\[ \text{Cost}_{\text{scope}} = \text{Cost}_{\text{child}} + \text{Cost}_{\text{catches}} + \text{Cost}_{\text{compensationHandler}} + \text{Cost}_{\text{eventHandler}} \tag{3.40} \]
Handlers
The costs of all catch blocks are summed up after taking into account the probability
with which they will execute.
\[ \text{Cost}_{\text{catches}} = \text{FaultRate} \times \sum_{\forall i} \text{fractionCapture}_i \times \text{Cost}_{\text{catchActivity}_i} \tag{3.41} \]
The cost of a compensation handler is taken as the cost of its enclosed activity after
relaxation by its PC.
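\[ PC_{CH} = P(\text{start}_{CH}) / P(\text{start}_{\text{invoke/scope}}) \]
\[ \text{Cost}_{\text{compensationHandler}} = PC_{CH} \times \text{Cost}_{\text{childActivity}} \tag{3.42} \]
The cost of an event handler is determined by the sum of the costs of all its
events, each multiplied by its number of occurrences.
\[ \text{Cost}_{\text{eventHandler}} = \sum_{i=1}^{\text{numEvents}} \text{numOccur}_i \times \text{Cost}_{\text{event}_i} \tag{3.43} \]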
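The cost equations compose bottom-up, which the following Python sketch tries to make concrete. All function names, parameter names and cost figures are hypothetical. Costs are plain numbers here, and each helper mirrors one of Equations 3.35 to 3.43.

```python
def cost_sequence_or_flow(children):
    """Equation 3.35. children: list of (PC_child_i, Cost_child_i)."""
    return sum(pc * cost for pc, cost in children)


def cost_branching(options):
    """Equations 3.36/3.37. options: list of (selection probability, cost)."""
    return sum(p * cost for p, cost in options)


def cost_loop(num_iterations, cost_child):
    """Equation 3.38."""
    return num_iterations * cost_child


def cost_catches(fault_rate, catches):
    """Equation 3.41. catches: list of (fractionCapture_i, cost of catch activity i)."""
    return fault_rate * sum(capture * cost for capture, cost in catches)


def cost_event_handler(events):
    """Equation 3.43. events: list of (numOccur_i, Cost_event_i)."""
    return sum(occurrences * cost for occurrences, cost in events)


def cost_invoke(cost_ws, fault_rate=0.0, catches=(), pc_ch=0.0, cost_ch_child=0.0):
    """Equations 3.39 and 3.42; the handler terms are zero when absent."""
    return cost_ws + cost_catches(fault_rate, catches) + pc_ch * cost_ch_child


# An invoke costing 10 units with one catch block, followed in a sequence by a
# second activity that starts with PC = 0.9; the sequence is looped 3 times.
invoke_cost = cost_invoke(10.0, fault_rate=0.1, catches=[(0.5, 4.0)])
sequence_cost = cost_sequence_or_flow([(1.0, invoke_cost), (0.9, 5.0)])
loop_cost = cost_loop(3, sequence_cost)
```

A scope cost (Equation 3.40) would add `cost_event_handler` to the same three terms used in `cost_invoke`, with the child cost in place of the web service cost.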
Chapter 4
QoS Improvement with Fault Tolerance
During orchestration of a web service composition, the designer may be unable to find
a web service for some task in the workflow that meets the reliability requirements. If
the task is central to the workflow, the impact on the reliability of the overall web service
composition will be severe. It is also possible that all web services available
to perform a certain task on the critical path of a workflow show large variations in
their response time (indicated by high standard deviations in our QoS model). In
such cases, fault tolerant constructs may be used to create dependable web services
out of undependable ones and attain the desired reliability and performance levels.
Of course, keeping redundant implementations means incurring higher costs.
In this chapter, we present four conventional fault tolerance (FT) techniques,
namely, N-version programming, Recovery Blocks, Return Fastest Response and Deadline
Mechanisms; show how they may be implemented with the help of standard
WS-BPEL 2.0; and derive expressions that help in QoS determination of these FT
constructs. The first two constructs focus on reliability improvement and the latter
two seek to enhance performance. The rest of the chapter is organized as follows. For
each of the fault tolerance approaches, we describe the scheme used in its WS-BPEL
implementation and its QoS estimation formulae, followed by a demonstration of the utility
of the approach through the running example of the Passport Application Service.
4.1 N-Version Programming (NVP)
Avizienis and Chen [6] define N-version programming as the independent generation
of N ≥ 2 functionally equivalent programs, called “versions”, from the same initial
specification. In an N-version program, multiple implementations of a program having
diverse designs are invoked in parallel. After all of them complete execution, a voting
function decides the output of the N-version program. Traditionally, N-version programs
use majority voting on some attribute of the output of the program. However,
voting mechanisms often turn out to be complex (especially in inexact cases), require
considerable user intervention and are seldom automatically generated. Our N-version
framework assumes as input multiple web services having the same functionality and
a voter web service that acts upon the outputs of the redundant services to supply
the result.
N-version programming may be used to improve the reliability of a task without
compromising much on time. Thus, it may be applied to tasks on the critical path
of a workflow where an increase in response time is an issue. The N-version program
has to wait for all the implementations to complete and also for the additional voting
method, but it performs better with respect to response time than recovery
blocks when the response times of services are close to each other. However, it may
prove to be more costly than other FT approaches.
4.1.1 WS-BPEL implementation
Given a set of web services (A1, A2, . . . , An) offering the same functionality and
a voting web service, our N-version program in BPEL can be laid out as follows.
A1, A2, . . . , An are all invoked from inside a 〈flow〉. The voter web service is called
after the completion of the flow to obtain the result. Thus, the 〈flow〉 and the
〈invoke〉 for the voter web service are in a 〈sequence〉. The voter web service must
expect as input an array of the output type of the other web services and produce
an output of the same type as that of the redundant web services. All the 〈invoke〉
activities have attached catch blocks to ensure that only outputs of successful web
services reach the voting web service.
4.1.2 QoS formulation
We denote the reliability, time and cost of a redundant web service $A_i$ by
$(R_{A_i}, T_{A_i}, C_{A_i})$. We consider the voting web service to perform majority voting, and thus
it may work only if k ≥ 2 out of n services have completed successfully. We compute
the probability that at least two implementations succeed as below. (k denotes the
number of successful services, $P(X)$ gives the probability that X is successful, and $P(\bar{X})$
gives the probability that X fails.)
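\[ P(k = 0) = \prod_{i=1}^{n} (1 - R_{A_i}) \]
\[ P(k = 1) = \sum_{i=1}^{n} P(\bar{A}_1 \bar{A}_2 \ldots A_i \ldots \bar{A}_n) = \sum_{i=1}^{n} \frac{R_{A_i} \times P(k = 0)}{1 - R_{A_i}} \]
\[ P(k \ge 2) = 1 - \big( P(k = 0) + P(k = 1) \big) \tag{4.1} \]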
Now, $R_{voter}$ denotes the conditional probability of success of the voting
web service given that at least two redundant services have been successful. The voter
program requires at least two values to perform matching or majority voting.
\[ P(voter) = P(voter \mid k \ge 2) \times P(k \ge 2) = R_{voter} \times P(k \ge 2) \tag{4.2} \]
Finally, reliability of the N-version program is given by the probability of success
of the voter service.
\[ R_{NVP} = P(voter) \tag{4.3} \]
Time taken by an N-version program may be estimated as below.
\[ T_{NVP} = T_{voter} + \max_{\forall i} T_{A_i} \tag{4.4} \]
Cost is simply the sum of the costs of all the redundant web services and the voter web
service.
\[ C_{NVP} = C_{voter} + \sum_{i=1}^{n} C_{A_i} \tag{4.5} \]
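The reliability and cost expressions for an N-version program can be checked numerically.
The following Python sketch is a hedged illustration: the function names are hypothetical,
services are assumed to fail independently (as the product form for P(k = 0) implies), and
the cost follows the prose description, that is, all redundant services plus the voter.

```python
from math import prod
from typing import Sequence


def p_none_succeed(reliabilities: Sequence[float]) -> float:
    """P(k = 0): every redundant service fails."""
    return prod(1.0 - r for r in reliabilities)


def p_exactly_one(reliabilities: Sequence[float]) -> float:
    """P(k = 1): exactly one service succeeds and all the others fail.

    The product over the other services is computed directly instead of
    dividing P(k = 0) by (1 - R_i), which would fail when R_i equals 1.
    """
    total = 0.0
    for i, r_i in enumerate(reliabilities):
        others_fail = prod(1.0 - r for j, r in enumerate(reliabilities) if j != i)
        total += r_i * others_fail
    return total


def nvp_reliability(reliabilities: Sequence[float], r_voter: float) -> float:
    """R_NVP = R_voter * P(k >= 2)."""
    p_at_least_two = 1.0 - p_none_succeed(reliabilities) - p_exactly_one(reliabilities)
    return r_voter * p_at_least_two


def nvp_cost(costs: Sequence[float], c_voter: float) -> float:
    """Cost of all redundant services plus the voter."""
    return c_voter + sum(costs)


# Hypothetical inputs: three redundant services and a voter.
r_nvp = nvp_reliability([0.9, 0.8, 0.7], r_voter=0.99)
c_nvp = nvp_cost([5.0, 4.0, 3.0], c_voter=1.0)
```

For these inputs P(k = 0) = 0.006 and P(k = 1) = 0.092, so P(k ≥ 2) = 0.902 and the
block reliability is 0.99 × 0.902.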
4.2 Recovery Blocks
In a recovery block, the redundant components are executed sequentially. A comput-
ing element is run and an acceptance test or assertion check is applied to the result.
An alternative implementation is invoked only if the output fails the acceptance test.
There can be many redundant services lined up, each of which may be executed only
if all the services prior to it have failed the assertion test.
A recovery block is used when the user has an order of preference amongst the various
services. The primary service is the most preferred of all the redundant implementations
and thus is invoked first. A recovery block always costs less than an equivalent NVP
implementation with the same services, assuming the costs of the assertion checker and
the voter to be comparable.
In our case, the redundant programs (A1, A2, . . . , An) in a recovery block as well
as the assertion checker (AC) are implemented as web services. Again, the framework
assumes as input the values of (R, T, C) for each of these web services.
4.2.1 WS-BPEL Implementation
All invocations of redundant web services, along with the calls made to the
assertion checker (after the execution of each Ai), are in a 〈sequence〉. The alternative
implementations are conditionally invoked inside 〈if〉 activities if the implementations
earlier in the sequence have either failed to respond (error status set by an attached catch)
or failed the assertion test.
4.2.2 QoS Formulation
In a recovery block, a service Ai is said to be successful when it returns successfully
and its output is passed by the assertion checker. Thus, the conditional probability
that a service succeeds given it starts execution may be written as:
\[ P(A_i \mid Start_{A_i}) = R_{A_i} \times R_{AC} \]
A service in a recovery block may start only if the preceding service failed
after being invoked. Thus, the probability of start of a service Ai is given by the following
recurrence:
\[ P(Start_{A_1}) = 1 \]
\[ P(Start_{A_i}) = P(Start_{A_{i-1}}) \times P(\bar{A}_{i-1} \mid Start_{A_{i-1}}) \quad \text{if } i > 1 \tag{4.6} \]
where $P(\bar{A}_{i-1} \mid Start_{A_{i-1}}) = 1 - P(A_{i-1} \mid Start_{A_{i-1}})$.
Now, the probability of success of a service may be given as:
\[ P(A_i) = P(A_i \mid Start_{A_i}) \times P(Start_{A_i}) \]
Finally, since the events of success of the redundant services are mutually exclusive,
we can write:
\[ R_{RB} = P(A_1 \cup A_2 \cup \ldots \cup A_n) = \sum_{i=1}^{n} P(A_i) \tag{4.7} \]
Both the response time and the cost of a recovery block construct are estimated as a
weighted sum of the response times/costs of the several units of execution (including
the time taken by the assertion checker in each case), the weights being the probabilities
of execution of the units, P(Start) (given by Equation 4.6). However, the time/cost
of the assertion checker is incorporated only in case the service returns.
\[ T_{RB} = \sum_{i=1}^{n} P(Start_{A_i}) \times (T_{A_i} + R_{A_i} \times T_{AC}) \tag{4.8} \]
\[ C_{RB} = \sum_{i=1}^{n} P(Start_{A_i}) \times (C_{A_i} + R_{A_i} \times C_{AC}) \tag{4.9} \]
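The recurrence for P(Start) and the three weighted sums can be evaluated in a single pass
over the services in their order of preference. The Python sketch below is a minimal
illustration with hypothetical names. As a simplification, it treats each response time as
a single number (for example the mean) rather than as the random variable used in the
QoS model.

```python
from typing import Sequence, Tuple


def recovery_block_qos(
    services: Sequence[Tuple[float, float, float]],
    r_ac: float,
    t_ac: float,
    c_ac: float,
) -> Tuple[float, float, float]:
    """Return (R_RB, T_RB, C_RB) for services given as (R, T, C) in invocation order."""
    p_start = 1.0  # the primary service always starts
    reliability = time = cost = 0.0
    for r, t, c in services:
        p_success_given_start = r * r_ac  # returns and passes the assertion check
        reliability += p_start * p_success_given_start
        # The assertion checker runs only if the service returns (probability r).
        time += p_start * (t + r * t_ac)
        cost += p_start * (c + r * c_ac)
        # The next service starts only if this one started and did not succeed.
        p_start *= 1.0 - p_success_given_start
    return reliability, time, cost


# Hypothetical inputs: a primary and one alternate, with a perfectly reliable checker.
r_rb, t_rb, c_rb = recovery_block_qos(
    [(0.9, 2.0, 5.0), (0.8, 3.0, 4.0)], r_ac=1.0, t_ac=0.5, c_ac=1.0
)
```

With these inputs the alternate starts with probability 0.1, which gives a reliability of
0.98, an expected time of 2.79 and an expected cost of 6.38.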
4.3 Return Fastest Response
The Return Fastest Response construct primarily aims to improve the performance
levels of a web service. All the redundant services are executed in parallel and the
first response obtained is chosen as the result of the FT-construct. Such a construct
is especially helpful if the redundant programs have comparable reliabilities and per-
formance is crucial.
4.3.1 WS-BPEL Implementation
The WS-BPEL implementation of this construct is the same as that proposed in
[17]. The set of redundant web services is invoked in parallel in a 〈flow〉 activity.
However, a 〈flow〉 activity completes only on the completion of all the activities nested
inside it, which does not serve our purpose. Thus, on completion of execution of one
activity, we copy its result as output and forcibly terminate the 〈flow〉 activity,
stopping all other executions of redundant services, by throwing a fault that is caught by a
fault handler of the enclosing scope. These faults are simply ignored within the fault
handlers by placing an empty activity in the catch block.
4.3.2 QoS Formulation
In any invocation of a Return Fastest Response (RFR) block, only the service that returns
in minimum time is counted. Thus, in order to compute the reliability of such
a block, we require the probability that a service takes minimum time and happens
to be the first to return. The probability $P(First_{A_i})$ that a web service Ai returns
the first response in an execution of the RFR block is written as:
\[ P(First_{A_i}) = P(T_{A_i} = \min\{T_{A_1}, T_{A_2}, \ldots, T_{A_n}\}) \tag{4.10} \]
The reliability of the RFR block may be then computed as a weighted sum of
reliabilities, RAi, of all the constituent services.
\[ R_{RFR} = \sum_{i=1}^{n} P(First_{A_i}) \times R_{A_i} \tag{4.11} \]
Time taken by the RFR block is simply the minimum of the response times of all
services.
\[ T_{RFR} = \min_{\forall i}\{T_{A_i}\} \tag{4.12} \]
Cost of the RFR block is the sum of costs of all services in it.
\[ C_{RFR} = \sum_{i=1}^{n} C_{A_i} \tag{4.13} \]
Note that time in our QoS model is represented by random variables, and the minimum
of a set of random variables is obtained through simulation. In Equation 4.10
and Equation 4.12, $P(First_{A_i})$ and $\min\{T_{A_i}\}$ are estimated by simulating the time
quantities as normally distributed random variables.
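The simulation step for an RFR block can be sketched as a small Monte Carlo loop. The
Python code below is an illustrative assumption rather than the thesis implementation:
the function name, the tuple layout and the fixed seed are hypothetical, and the response
times are drawn independently from normal distributions.

```python
import random
from typing import List, Sequence, Tuple


def simulate_rfr(
    services: Sequence[Tuple[float, float, float, float]],
    trials: int = 100_000,
    seed: int = 0,
) -> Tuple[List[float], float, float, float]:
    """Estimate (P(First), R_RFR, mean T_RFR, C_RFR).

    Each service is (reliability, mean_time, sd_time, cost).
    """
    rng = random.Random(seed)
    first_counts = [0] * len(services)
    total_min_time = 0.0
    for _ in range(trials):
        times = [rng.gauss(mean, sd) for _, mean, sd, _ in services]
        winner = min(range(len(times)), key=times.__getitem__)
        first_counts[winner] += 1
        total_min_time += times[winner]
    p_first = [count / trials for count in first_counts]
    # Reliability is the P(First)-weighted sum of service reliabilities.
    reliability = sum(p * s[0] for p, s in zip(p_first, services))
    # Every service is invoked, so the costs simply add up.
    cost = sum(s[3] for s in services)
    return p_first, reliability, total_min_time / trials, cost


# Hypothetical inputs: two services with identical time distributions.
p_first, r_rfr, t_rfr, c_rfr = simulate_rfr(
    [(0.9, 2.0, 0.5, 1.0), (0.8, 2.0, 0.5, 2.0)], trials=20_000
)
```

Because both services here share the same time distribution, each should be first in
roughly half of the trials, and the estimated mean of the minimum falls below the
common mean of 2.0.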
4.4 Deadline Mechanism
In case of time critical applications, a designer may want to impose hard deadlines for
completion of certain activities within the workflow. Deadline mechanisms support
setting deadlines for completion of tasks and provide for forking off redundant services
for a task if a primary service does not complete within some specified length of time.
In our model, the user sets a hard deadline for completion of a task. If an output is
to be returned from the deadline mechanism block, then one of the constituent services
must return within the specified hard deadline. Moreover, the designer is allowed to
specify the time instants at which the alternate implementations may be invoked if no
response is received from services that have been running.
4.4.1 WS-BPEL implementation
Deadline mechanisms can be incorporated with the help of the event handler construct
available in WS-BPEL. The primary web service is run inside a scope and the alternate
services are placed inside the event handler of the same scope, with a timer based alarm
event, 〈onAlarm〉, associated with each alternate implementation. We ensure that
the response is returned as soon as it is received from a constituent service invocation,
in the same manner as in the Return Fastest Response construct. If a service returns
successfully, we copy its response to the output and throw a fault that is caught by
an empty fault handler of the enclosing scope. In case a service fails, we make the
scope wait till the deadline is reached with the help of a wait activity inside the
catchAll block attached to the invoke activity for the service.
4.4.2 QoS Formulation
The model requires the hard deadline HD for the process and, for each service
Ai in a deadline mechanism (DM) block, its firing time $TF_{A_i}$, apart from the regular
inputs (R, T, C). We denote the sum of the firing time and the response time
by $T'$. In an analysis similar to that done for Return Fastest
Response, for each service in a DM block, we find the probability that it returns the
first response.
\[ P(First_{A_i}) = P\big(T'_{A_i} = \min\{T'_{A_1}, T'_{A_2}, \ldots, T'_{A_n}\} \text{ and } T'_{A_i} < HD\big) \tag{4.14} \]
The reliability of the DM block is then computed as a weighted sum of reliabilities,
RAi, of all the constituent services in exactly the same way as in RFR.
\[ R_{DM} = \sum_{i=1}^{n} P(First_{A_i}) \times R_{A_i} \tag{4.15} \]
Time taken by the DM block is the minimum of the sums of response times and firing
times of all services.
\[ T_{DM} = \min_{\forall i}\{T'_{A_i}\} \tag{4.16} \]
For cost estimation in deadline mechanism we find for each service the probability
that it is invoked inside the DM block. A service inside a DM block may start if a
successful response has not been generated by other services till the point of its firing.
\[ P(Start_{A_i}) = P\big(TF_{A_i} \le \min\{T'_{A_1}, T'_{A_2}, \ldots, T'_{A_n}, HD\}\big) \tag{4.17} \]
Cost of the DM block is a weighted sum of costs of all services in it, the weights
being the probabilities of start of the services.
\[ C_{DM} = \sum_{i=1}^{n} P(Start_{A_i}) \times C_{A_i} \tag{4.18} \]
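The DM quantities can be estimated with the same kind of simulation as for RFR, with
firing times and the hard deadline added. The Python sketch below is hypothetical in its
names and data layout. It follows the prose reading of the start condition: a service is
counted as started when its firing time is reached before any other service has responded
and before the hard deadline. A service's own completion time is left out of that
comparison, since it cannot precede the service's own firing.

```python
import random
from typing import Dict, List, Sequence, Tuple

# (reliability, mean_time, sd_time, cost, firing_time)
Service = Tuple[float, float, float, float, float]


def simulate_dm(
    services: Sequence[Service],
    hard_deadline: float,
    trials: int = 100_000,
    seed: int = 0,
) -> Dict[str, object]:
    rng = random.Random(seed)
    n = len(services)
    first = [0] * n
    started = [0] * n
    total_min = 0.0
    for _ in range(trials):
        # T' = firing time + sampled response time
        finish = [tf + rng.gauss(mean, sd) for _, mean, sd, _, tf in services]
        earliest = min(finish)
        total_min += earliest
        if earliest < hard_deadline:  # the winner must also beat the deadline
            first[finish.index(earliest)] += 1
        for i, service in enumerate(services):
            others = [finish[j] for j in range(n) if j != i]
            if service[4] <= min(others + [hard_deadline]):
                started[i] += 1
    p_first: List[float] = [c / trials for c in first]
    p_start: List[float] = [c / trials for c in started]
    return {
        "p_first": p_first,
        "p_start": p_start,
        "reliability": sum(p * s[0] for p, s in zip(p_first, services)),
        "mean_time": total_min / trials,
        "cost": sum(p * s[3] for p, s in zip(p_start, services)),
    }


# Hypothetical inputs: the primary fires at time 0, the alternate at time 3.
dm = simulate_dm(
    [(0.9, 2.0, 0.5, 5.0, 0.0), (0.8, 2.0, 0.3, 4.0, 3.0)],
    hard_deadline=6.0,
    trials=20_000,
)
```

In this configuration the alternate is started only in the rare trials where the primary
takes longer than 3 time units, so the expected cost stays close to that of the primary.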
Chapter 5
Implementation Details
We have implemented the QoS model for WS-BPEL 2.0 processes presented in the
last two chapters as stand-alone software using Java 1.5. The QoS Calculator
takes as input the values of reliability, time and cost for the constituent web services
and the various essential control flow parameters of the business process outlined
in Section 3.2.2. Also, for improving QoS through fault tolerant constructs, it takes
WSDLs of several redundant web services and of web services for voting and assertion
checking. In this chapter, we look at the utility of some of the important modules in
our implementation and describe the interactions that take place between them.
Figure 5.1 shows that the implementation may be organized under two major heads,
namely, BPEL parser and QoS Calculator. We discuss these two modules, explore the
components within them and explain the flow of data in and out of these modules.
5.1 BPEL Parser
The BPEL parsing unit is responsible for parsing the BPEL file and converting the
business process into an internal graph-like representation consisting of nodes, which
may be activities, scopes or handlers. Such a node is the central data structure
in the implementation, as QoS computation happens at this level. We track all three
types of dependencies (we can treat them as different classes of directed edges) for
each node in the graph. Thus, the module takes the input BPEL file and produces a
graph structure out of it that clearly shows the dependencies between the activities,
scopes and handlers present in the BPEL process. The invoke nodes in the graph are
specially stored as service nodes and annotated with the values of the QoS parameters
for the web services being called. Redundant web services, when introduced,
are also represented as service nodes. Further, several service nodes may be
suitably composed to form a fault tolerant structure, which is a composite service node.
Whilst adding fault tolerant constructs for various constituent web services, the user
is shown a listing of the service nodes already introduced and, for each original external
web service, chooses one service node, which may be a composite one. The QoS values
are also listed against every service node. Waiting times of incoming messages are
set against the nodes receiving them. Various other inputs (listed in Section 3.2.2)
are also appropriately stored in the data structures maintained for nodes and links.
[Figure 5.1: Block Diagram of Implementation. Block labels recoverable from the diagram:
User Interface; XML Parser; WSDL Parser; BPEL Parser; Activity Graph Generation;
Dependency Manager; Boolean Condition Parser; Create BPEL Project Files; Service Node
Generation (NVP, RFR, DM, RB); QoS Calculator; Reliability Modeler; Response Time Modeler;
Cost Modeler; Random Variable Simulator. Data labels: Source BPEL Project Files; Control
Flow Parameters; QoS Values for Component WS; Activity Graph annotated with dependencies
and control flow parameters; Service Nodes tagged with QoS parameters; Fault Tolerant BPEL
Project Files; Estimated QoS for Overall Process + Activity Wise QoS.]
Figure 5.1 shows the different sub-modules within the system. Here, we discuss
each of them briefly.
(a) XML Parser: We make use of a Xerces implementation (Xerces2 Java Parser
2.9.1) of an XML-DOM parser for all our purposes. The DOM style of parsing
suits us better than SAX parsing because we need to keep the parsed DOM
tree in memory for a long time and frequently access widely separated
parts of the document at the same time.
(b) Workflow Graph Generation: Transforming the BPEL process, along with
other inputs, into a workflow data structure that may be easily tapped by the
QoS calculator is the key functionality rendered by the BPEL parsing unit.
The XML-DOM tree produced by the XML parser is traversed in order to build
the workflow graph structure. All activities / links / handlers for which
additional inputs are required from the user are extracted and fed into the
User Interface module so that the user may be informed of the inputs that
are to be supplied. For example, the User Interface is provided with operations,
portTypes and partnerLinks (required for uniquely identifying an invoke) for all
invoke activities so that it can seek QoS inputs from the user. The workflow is
generated after all forms of dependencies between nodes have been incorporated
into it.
(c) User Interface: The BPEL file is first loaded and passed on to the XML parser.
The workflow generation module provides a list of fields whose values are
necessary for QoS computation, and the user is required to enter these inputs
through a graphical interface. The inputs are passed back into the workflow
builder so that they may be properly maintained for future use.
(d) Dependency Manager: All dependencies in the BPEL process are properly
classified and maintained against the dependent nodes. Each link gets annotated
by the probability that its transition condition is evaluated to true (this is
obtained as an input).
(e) Boolean Condition Parser: A join condition defined for an activity/scope is
a boolean expression (containing boolean operators such as AND and OR) on
incoming links. The expression is parsed from the BPEL file and converted into
a tree where each non-leaf vertex is an operator and the leaves contain identifiers for
links. This expression tree is then transformed such that it represents a canonical
boolean expression (ORs of ANDs, analogous to the sum of products form).
The probability of success of such an expression may be easily evaluated given
the probability of success of the links that are involved in it (see Evaluation of
Join Conditions in Section 3.3).
(f) Service Node Generation: This block supports the generation of BPEL
implementations for each of the four fault tolerant constructs. It also records how
the various redundant web services are arranged and which of them
will feature in the final BPEL file to be created.
(g) WSDL Parser: WSDLs of the various redundant web services are taken as
input. They are parsed to show the user the various elements present in them
and to allow the user to select the operations, port types, bindings and addresses to
be used in the fault tolerant constructs.
(h) Creation of BPEL project files: The final service nodes are considered in
the production of the fault tolerant BPEL file. Also all WSDLs and deployment
descriptors are suitably enhanced to generate a BPEL project that is ready to
be deployed. We follow the directory structure laid out by the ActiveBPEL
engine and produce files that may be readily deployed using it.
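The join condition handling described in item (e) can be sketched as follows. This Python
code is an assumption-laden illustration, not the method of Section 3.3: the tuple
encoding of the expression tree and the function names are hypothetical, only AND and OR
are supported, links are assumed to be statistically independent, and the probability of
the ORs-of-ANDs form is evaluated by inclusion-exclusion over its terms.

```python
from itertools import combinations
from math import prod
from typing import Dict, FrozenSet, Set, Tuple, Union

# A leaf is a link identifier; an inner vertex is ("AND" | "OR", operand, operand, ...).
Expr = Union[str, Tuple]
Dnf = Set[FrozenSet[str]]


def to_dnf(expr: Expr) -> Dnf:
    """Convert an AND/OR tree into a set of conjunctions (each a set of links)."""
    if isinstance(expr, str):
        return {frozenset([expr])}
    op, *operands = expr
    parts = [to_dnf(operand) for operand in operands]
    if op == "OR":
        return set().union(*parts)
    if op == "AND":
        terms: Dnf = {frozenset()}
        for part in parts:
            # Distribute AND over OR: pair every term so far with every new term.
            terms = {t | u for t in terms for u in part}
        return terms
    raise ValueError(f"unsupported operator: {op}")


def probability(dnf: Dnf, p_link: Dict[str, float]) -> float:
    """P(at least one conjunction holds) for independent links, by inclusion-exclusion."""
    terms = list(dnf)
    total = 0.0
    for k in range(1, len(terms) + 1):
        sign = 1.0 if k % 2 else -1.0
        for subset in combinations(terms, k):
            links = frozenset().union(*subset)  # all links that must be true together
            total += sign * prod(p_link[link] for link in links)
    return total


# Hypothetical join condition: (l1 AND l2) OR l3, each link true with probability 0.5.
condition = ("OR", ("AND", "l1", "l2"), "l3")
p_join = probability(to_dnf(condition), {"l1": 0.5, "l2": 0.5, "l3": 0.5})
```

Inclusion-exclusion is exponential in the number of conjunctions, which is acceptable for
the small join conditions typical of a single activity; larger expressions would need a
different evaluation strategy.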
5.2 QoS Calculator
The QoS calculator implements the various mathematical formulations presented in
the previous two chapters. It may be noted that reliability modeling must be carried
out before response times and costs can be estimated, because the latter computations
make use of P(start) and P(success). All values that help in QoS computation, viz.
P(start), P(success), ST, ET, PC and Cost, are evaluated and stored at each node
in the workflow graph. All QoS computations for the fault tolerant constructs are
performed at the service node level. Figure 5.1 also shows a module for performing
simulations to compute the maximum, minimum etc. of a set of random variables
used to represent time values. A random variable is simulated by a set of 100,000 data
points having the specified mean and standard deviation.
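As an illustration of how such a simulator can combine time values, the Python sketch
below estimates the N-version response time of Equation 4.4, the voter time plus the
maximum of the service times. The function names are hypothetical, and the samples are
drawn from normal distributions, so their mean and standard deviation only approximate the
specified values.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def sample_normal(mean: float, sd: float, size: int, rng: random.Random) -> List[float]:
    """Represent one time-valued random variable by a list of data points."""
    return [rng.gauss(mean, sd) for _ in range(size)]


def pointwise(op: Callable, *samples: Sequence[float]) -> List[float]:
    """Apply op (e.g. max or min) across variables, one data point at a time."""
    return [op(values) for values in zip(*samples)]


def summarize(points: Sequence[float]) -> Tuple[float, float]:
    """Collapse a simulated variable back to (mean, standard deviation)."""
    return statistics.fmean(points), statistics.pstdev(points)


def nvp_time(
    service_times: Sequence[Tuple[float, float]],
    voter_time: Tuple[float, float],
    size: int = 100_000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Estimate T_NVP = T_voter + max_i T_Ai from (mean, sd) inputs."""
    rng = random.Random(seed)
    services = [sample_normal(mean, sd, size, rng) for mean, sd in service_times]
    voter = sample_normal(voter_time[0], voter_time[1], size, rng)
    slowest = pointwise(max, *services)
    return summarize([s + v for s, v in zip(slowest, voter)])


# Hypothetical inputs: two services with mean 2.0 and sd 0.5, a voter taking exactly 1.0.
mean_t, sd_t = nvp_time([(2.0, 0.5), (2.0, 0.5)], (1.0, 0.0), size=20_000)
```

The result is again a mean and a standard deviation, so it can be fed back into the model
as the time value of the composite node.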
Chapter 6
Conclusions
A comprehensive QoS model has been presented in this work. The model is capable
of estimating the QoS of arbitrarily complex structures that may be written in
WS-BPEL. To the best of our knowledge, no QoS estimation technique exists in the literature
for graph based flow languages. As argued in previous chapters, most QoS modeling
frameworks are customized for structured programming languages and thus support
only a handful of workflow patterns. In the presence of goto-like links that may extend
from within one structured construct into another, QoS modeling for an activity has
to involve tracking the QoS of more activities than just those nested within it.
This work is also, to the best of our knowledge, the first attempt to lend QoS determination
support in the presence of event driven programming, fault handling and backward recovery.
Unlike some other approaches in the literature, our model does not call for
transformation of BPEL processes to formalizations which have ratified QoS determination
frameworks. Instead, it offers a direct approach that is firmly grounded in probability
theory. The model tries to exhaustively cover all aspects of BPEL and can scale very
well in the face of increasing complexity in BPEL processes.
The QoS calculator provides values for reliability, response time and cost for each
activity / scope / handler. This feature enables the designer to track the critical
parts of the program more closely. For example, the end times for each activity allow
the user to detect an activity that might be a performance bottleneck for
the composite process. Similarly, low reliabilities for certain activities may prompt the
application of redundancy and fault tolerance constructs. The designer may choose
to modify parts of the BPEL process workflow (organize the control flow differently or
add/remove fault handlers and compensation handlers etc.) and then check whether
the QoS has improved. The model will also help the integrator improve QoS by
making appropriate changes to the bindings for constituent web services.
The QoS tool also supports increasing the dependability of the web service
composition through fault tolerant constructs. All fault tolerant constructs have been
implemented in a manner that is compliant with the WS-BPEL 2.0 standard. Our
work is also the first to introduce the notion of enforcing hard deadlines through
deadline mechanism constructs in web service compositions. However, the QoS
computation of a fault tolerant BPEL file cannot be performed in a uniform way
through our QoS model. Detecting the presence of redundant services inside
a BPEL file requires tracking data dependencies in the BPEL file, whereas
our QoS determination model for WS-BPEL tracks only control
dependencies. Finding fault tolerance structures inside a BPEL file through static analysis
of the BPEL code remains a challenge and provides scope for future work. Our QoS
tool may be utilized to create highly dependable web service compositions out of
undependable web services.
References
[1] Aalst, W. M. P. V. D., Hofstede, A. H. M. T., Kiepuszewski, B., and Barros, A. P. Workflow Patterns. Distrib. Parallel Databases 14, 1 (2003), 5–51.
[2] Aggarwal, R., Verma, K., Miller, J., and Milnor, W. Constraint driven web service composition in METEOR-S. In SCC ’04: Proceedings of the 2004 IEEE International Conference on Services Computing (Washington, DC, USA, 2004), IEEE Computer Society, pp. 23–30.
[3] Amsden, J., Gardner, T., Griffin, C., and Iyengar, S. Draft UML 1.4 profile for automated business processes with a mapping to BPEL 1.0, 2004.
[4] Anderson, T., and Kerr, R. Recovery blocks in action: A system supporting high reliability. Proceedings of the 2nd international conference on Software engineering (1976), 447–457.
[5] Avizienis, A. The N-Version Approach to Fault-Tolerant Software. IEEE Trans. Softw. Eng. 11, 12 (1985), 1491–1501.
[6] Avizienis, A., and Chen, L. On the implementation of N-version programming for software fault tolerance during execution. Proc. IEEE COMPSAC 77 (1977), 149–155.
[7] Baresi, L., and Guinea, S. Towards Dynamic Monitoring of WS-BPEL Processes. Proceedings of the 3rd International Conference on Service Oriented Computing (2005).
[8] Berbner, R., Spahn, M., Repp, N., Heckmann, O., and Steinmetz, R. Heuristics for QoS-aware web service composition. In ICWS ’06: Proceedings of the IEEE International Conference on Web Services (Washington, DC, USA, 2006), IEEE Computer Society, pp. 72–82.
[9] Canfora, G., Penta, M. D., Esposito, R., and Villani, M. L. An approach for QoS-aware service composition based on genetic algorithms. In GECCO ’05: Proceedings of the 2005 conference on Genetic and evolutionary computation (New York, NY, USA, 2005), ACM, pp. 1069–1075.
[10] Cappiello, C., Pernici, B., and Plebani, P. Quality-agnostic or quality-aware semantic service descriptions?
[11] Cardoso, A. J. S. Quality of Service and Semantic Composition of Workflows. PhD thesis, University of Georgia, Athens, Georgia, 2002.
[12] Chen, L., and Avizienis, A. N-Version Programming: A Fault-Tolerance Approach to Reliability of Software Operation. Fault-Tolerant Computing, 1995, ’Highlights from Twenty-Five Years’, Twenty-Fifth International Symposium on (1995).
[13] Clark, D. D., Shenker, S., and Zhang, L. Supporting real-time applications in an integrated services packet network: Architecture and mechanism. In SIGCOMM (1992), pp. 14–26.
[14] Cruz, R. L. Quality of service guarantees in virtual circuit switched networks. IEEE Journal on Selected Areas in Communications 13, 6 (1995), 1048–1056.
[15] Csenki, A., Square, N., and London, E. Recovery Block reliability analysis with Failure Clustering. Dependable Computing for Critical Applications (1991).
[16] D’Ambrogio, A., and Bocciarelli, P. A model-driven approach to describe and predict the performance of composite services. In WOSP ’07: Proceedings of the 6th international workshop on Software and performance (New York, NY, USA, 2007), ACM, pp. 78–89.
[17] Dobson, G. Using WS-BPEL to implement Software Fault Tolerance for Web Services. In EUROMICRO ’06: Proceedings of the 32nd EUROMICRO Conference on Software Engineering and Advanced Applications (Washington, DC, USA, 2006), IEEE Computer Society, pp. 126–133.
[18] Dobson, G. Using WS-BPEL to Implement Software Fault Tolerance for Web Services. Proceedings of the 32nd EUROMICRO Conference on Software Engineering and Advanced Applications (2006), 126–133.
[19] Dobson, G., Hall, S., and Sommerville, I. Dependable Service Engineering: A Fault-tolerance based Approach. Submitted to ACM Transactions on Software Engineering and Methodology, http://digs. sourceforge.net/papers/2005 tosem ftc. pdf (2005).
[20] Dugan, J., and Lyu, M. System reliability analysis of an N-version programming application. Reliability, IEEE Transactions on 43, 4 (1994), 513–519.
[21] Ege, M., Eyler, M., and Karakas, M. Reliability analysis in N-version programming with dependent failures. Euromicro Conference, 2001. Proceedings. 27th (2001), 174–181.
[22] Frolund, S., and Koistinen, J. Quality-of-service specification in distributed object systems. IOP/BCS Distributed Systems Engineering Journal (December 1998). To appear.
[23] Gorbenko, A., Kharchenko, V., Popov, P., Romanovsky, A., and Boyarchuk, A. Development of Dependable Web Services out of Undependable Web Components. School of Computing Science, University of Newcastle upon Tyne CS-TR-863 (2004).
[24] G. Rommel. Simplicity Wins: How Germany’s Mid-Sized Industrial Companies Succeed. Harvard Business School Press, Boston, 1995.
[25] G. Stalk, and T. M. Hout. Competing Against Time: How Time-Based Competition is Reshaping Global Markets. Free Press, New York, 1990.
[26] Hillier, F. S., and Lieberman, G. J. Introduction to Operations Research, 7th ed. McGraw-Hill, 2002.
[27] Hoyland, A., and Rausand, M. System Reliability Theory: Models and Statistical Methods. John Wiley and Sons, 1994.
[28] Hwang, S.-Y., Wang, H., Tang, J., and Srivastava, J. A probabilistic approach to modeling and estimating the QoS of web-services-based workflows. Inf. Sci. 177, 23 (2007), 5484–5503.
[29] International Standards Organization. Model for quality assurance, ISO 9000:1987 ed., 1987.
[30] International Standards Organization. Information Technology - Software Product Evaluation - Quality Characteristics and Guidelines for their Use, ISO/IEC IS 9126 ed., 1991.
[31] International Telecommunication Union (ITU). Terms and definitions related to quality of service and network performance including dependability, ITU Recommendation E.800 ed., 1994.
[32] International Telecommunication Union (ITU). Communications Quality of Service: A framework and definitions, ITU Recommendation G.1000 ed., 2001.
[33] Ireson, W., Jr., C., and Moss, R. Y. Handbook of Reliability Engineering and Management. McGraw Hill, New York, 1996.
[34] Jaeger, M., and Ladner, H. Improving the QoS of WS Compositions Based on Redundant Services. Next Generation Web Services Practices, 2005. NWeSP 2005. International Conference on (2005), 189–194.
[35] Jaeger, M. C., Rojec-Goldmann, G., and Muhl, G. QoS Aggregation for Web Service Composition using Workflow Patterns. In EDOC ’04: Proceedings of the Enterprise Distributed Object Computing Conference, Eighth IEEE International (Washington, DC, USA, 2004), IEEE Computer Society, pp. 149–159.
[36] Jalote, P. Software Design Faults. John Wiley and Sons Ltd, 1994, ch. Fault Tolerance in Distributed Systems, pp. 355–396.
[37] Leymann, F. Web Services Flow Language (WSFL 1.0). IBM, May 2001.
[38] Looker, N., and Munro, M. WS-FTM: A Fault Tolerance Mechanism forWeb Services. University of Durham, Technical Report 19 (2005).
[39] Looker, N., and Xu, M. Increasing Web Service Dependability Through Con-sensus Voting. Computer Software and Applications Conference, 2005. COMP-SAC 2005. 29th Annual International 2 (2005).
[40] Ludwig, H. Web services QoS: external SLAs and internal policies or: how dowe deliver what we promise? Web Information Systems Engineering Workshops,2003. Proceedings. Fourth International Conference on (2003), 115–120.
[41] Maximilien, E. M., and Singh, M. P. Conceptual model of web servicereputation. SIGMOD Rec. 31, 4 (2002), 36–41.
[42] Menasce, D. Composing Web Services: A QoS View. IEEE INTERNETCOMPUTING (2004), 88–90.
[43] Menasce, D. A. QoS issues in web services. IEEE Internet Computing 6, 6(2002), 72–75.
[44] Musa, J., Iannino, A., and Okumoto, K. Software reliability: measure-ment, prediction, application. 1987.
[45] Purtilo, J. M., and Jalote, P. An Environment for Developing Fault-Tolerant Software. IEEE Trans. Softw. Eng. 17, 2 (1991), 153–159.
[46] Randell, B. System structure for software fault tolerance. In Proceedings ofthe international conference on Reliable software (New York, NY, USA, 1975),ACM, pp. 437–449.
[47] Randell, B., and Xu, J. The Evolution of the Recovery Block Concept.Software Fault Tolerance (1995), 1–22.
[48] Reiter, T. Transformation of web service specification languages into UMLactivity diagrams. Diploma thesis.University of South Australia, 2005.
[49] Rud, D., Schmietendorf, A., and Dumke, R. Performance Modeling ofWS-BPEL-Based Web Service Compositions. Services Computing Workshops,2006. SCW’06. IEEE (2006), 140–147.
[50] Sherchan, W., Loke, S. W., and Krishnaswamy, S. A fuzzy model forreasoning about reputation in web services. In SAC ’06: Proceedings of the 2006ACM symposium on Applied computing (New York, NY, USA, 2006), ACM,pp. 1886–1892.
[51] Tartanoglu, F., Issarny, V., Romanovsky, A., and Levy, N. Coordinated Forward Error Recovery for Composite Web Services. Proceedings of the 22nd International Symposium on Reliable Distributed Systems, SRDS 2003.
[52] Thatte, S. XLANG: Web Services for Business Process Design. Microsoft, 2001.
[53] Thatte, S. BPEL4WS: Business Process Execution Language for Web Services Version 1.1. BEA, IBM, Microsoft, SAP and Siebel, May 2003.
[54] Tian, M., Gramm, A., Naumowicz, T., Ritter, H., and Freie, J. A concept for QoS integration in Web services. Web Information Systems Engineering Workshops, 2003. Proceedings. Fourth International Conference on (2003), 149–155.
[55] Tian, M., Gramm, A., Ritter, H., and Schiller, J. Efficient selection and monitoring of QoS-aware web services with the WS-QoS framework. In WI ’04: Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence (Washington, DC, USA, 2004), IEEE Computer Society, pp. 152–158.
[56] Tian, M., Gramm, A., Ritter, H., and Schiller, J. Efficient Selection and Monitoring of QoS-Aware Web Services with the WS-QoS Framework. Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on (2004), 152–158.
[57] van der Aalst, W. Don’t go with the flow: Web services composition standards exposed, 2003.
[58] Wang, G., Wang, C., Chen, A., Wang, H., Fung, C., Uczekaj, S., Chen, Y.-L., Guthmiller, W. G., and Lee, J. Service level management using QoS monitoring, diagnostics, and adaptation for networked enterprise systems. In EDOC ’05: Proceedings of the Ninth IEEE International EDOC Enterprise Computing Conference (Washington, DC, USA, 2005), IEEE Computer Society, pp. 239–250.
[59] Wohed, P., van der Aalst, W. M., Dumas, M., and ter Hofstede, A. H. Analysis of web services composition languages: The Case of BPEL4WS, 2003.
[60] Wu, B., Chi, C.-H., and Xu, S. Service Selection Model Based on QoS Reference Vector. IEEE Congress on Services (2007), 270–277.
[61] Zeng, L., Benatallah, B., Ngu, A. H., Dumas, M., Kalagnanam, J., and Chang, H. QoS-aware middleware for web services composition. IEEE Trans. Softw. Eng. 30, 5 (2004), 311–327.
[62] Zeng, L., Lei, H., and Chang, H. Monitoring the QoS for web services. Service-Oriented Computing - ICSOC 2007 (2007), 132–144.
[63] Zhou, C., Chia, L., and Lee, B. DAML-QoS ontology for Web services. Web Services, 2004. Proceedings. IEEE International Conference on (2004), 472–479.
[64] Zinky, J. A., Bakken, D. E., and Schantz, R. E. Architectural support for quality of service for CORBA objects. Theory and Practice of Object Systems 3, 1 (1997).