
QOS IN WS-BPEL PROCESSES

A DISSERTATION SUBMITTED TO

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

AT THE INDIAN INSTITUTE OF TECHNOLOGY DELHI

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

FOR THE DEGREE OF MASTER OF TECHNOLOGY

Debdoot Mukherjee

May 2008

© Copyright by Indian Institute Of Technology Delhi 2008

All Rights Reserved

CERTIFICATE

This is to certify that the thesis titled “QoS in WS-BPEL Processes” be-

ing submitted by Debdoot Mukherjee to the Indian Institute of Technology

Delhi, for the award of the degree of Master of Technology in Computer Sci-

ence & Engineering, is a record of bona-fide research work carried out by him

under our supervision. The work presented in this thesis has not been submitted to

any other university or institute for the award of any other degree or diploma.

Prof. Pankaj Jalote

Microsoft Chair Professor

Dept. of Computer Science & Engg.

IIT Delhi

Dr. Mangala Gowri Nanda

Research Staff Member

IBM India Research Lab

New Delhi

Abstract

With a large number of web services offering the same functionality, the Quality of

Service (QoS) rendered by a web service becomes a key differentiator. WS-BPEL

has emerged as the de facto industry standard for composing web services. Thus,

determining the QoS of a composite web service expressed in BPEL can be extremely

beneficial. While there has been much work on QoS computation of workflows repre-

sented in custom built Workflow Management Systems (WfMS), there exists no tool

to ascertain QoS for BPEL processes. In this thesis, we illustrate the differences in

expressiveness of BPEL and conventional workflow systems and show that a BPEL

process cannot always be reduced to a composition of series, parallel, conditional or

loop constructs, which is an assumption of the existing QoS computation approaches.

We propose a model for estimating three QoS parameters, namely, Response Time,

Cost and Reliability, of an executable BPEL process with requisite QoS and flow in-

formation available for its constituent activities. We have built a tool to compute QoS

of a WS-BPEL process that accounts for all workflow patterns that may be expressed

by standard WS-BPEL. Again, with mission critical applications of organizations

getting supported through web service compositions, very high levels of reliability

and performance may be warranted. We present an approach that utilizes traditional

fault tolerance constructs to improve QoS of BPEL processes and alleviate the risk of

not meeting SLA requirements. We have modeled two constructs each for reliability

improvement and performance improvement using standard WS-BPEL 2.0 elements.

N-Version Programming and Recovery Blocks are the two approaches implemented

that help in improving the reliability of a service invocation. To increase the chances

of meeting the performance requirements outlined in SLA, one may use a construct

where the fastest response is returned from a parallel execution of redundant services

or enforce a deadline mechanism to fork off alternate services if a primary service does

not deliver within a certain length of time. Our tool allows the designer of the web

service composition to arbitrarily nest these fault tolerant constructs for any activity

and strive to achieve the desired QoS levels for the composite service.

Acknowledgment

Contents

Abstract

Acknowledgment

1 Introduction

1.1 QoS in Web Services

1.2 Workflows and QoS computation

1.3 Fault Tolerance in Web Services

1.4 Motivation

1.4.1 BPEL - A Case Apart

1.4.2 Fault Tolerance in a WS-World - New Opportunities

2 Related Work

2.1 QoS in Workflows

2.1.1 Cardoso’s QoS Model

2.1.2 Other Efforts

2.2 Fault Tolerance

2.3 WS-BPEL

2.3.1 Evolution

2.3.2 Brief Overview

3 QoS Computation

3.1 Passport Application Service Example

3.2 QoS Model Usage

3.2.1 User Environment

3.2.2 Inputs to the Model

3.3 Model Preliminaries

3.4 Reliability Modeling

3.4.1 Activity-Wise Reliability Computation

3.4.2 Suppression of Join Failures

3.5 Response Time Modeling

3.5.1 Activity Wise Response Time Computation

3.5.2 Operations on Random Variables

3.6 Cost Modeling

3.6.1 Activity-Wise Cost Computation

4 QoS Improvement with Fault Tolerance

4.1 N-Version Programming (NVP)

4.1.1 WS-BPEL implementation

4.1.2 QoS formulation

4.2 Recovery Blocks

4.2.1 WS-BPEL Implementation

4.2.2 QoS Formulation

4.3 Return Fastest Response

4.3.1 WS-BPEL Implementation

4.3.2 QoS Formulation

4.4 Deadline Mechanism

4.4.1 WS-BPEL implementation

4.4.2 QoS Formulation

5 Implementation Details

5.1 BPEL Parser

5.2 QoS Calculator

6 Conclusions

References

List of Figures

2.1 A system that is neither series nor parallel

2.2 Synchronization with links

2.3 Dependability Stack in Web Services

3.1 Passport Office BPEL Process

5.1 Block Diagram of Implementation

Chapter 1

Introduction

ISO 9000 defines quality as the degree to which a set of inherent characteristics fulfills

requirements [29]. It may be regarded as the customer’s perception of the supplier’s

work output or the extent to which the service delivery meets user expectations. Also,

quality is inherently subjective - different people may experience the quality of the

same software differently. Thus, according to Gerald Weinberg, quality is “value to

some person”. Software quality measures how well software is designed as well as

the degree of its conformance to its design. The study of software quality concerns

itself as much with internal quality characteristics viz. maintainability, portability,

understandability and adherence to coding standards; as it does with external user

requirements such as reliability, security, efficiency etc. [30].

Quality of Service (QoS) is a term that was originally coined in the world of tele-

phony and traffic engineering to refer to the levels of performance and transmission

characteristics of a communications channel. ITU standard X.902 defines QoS as a

set of quality requirements on the collective behavior of one or more objects. QoS has

been heavily studied in the areas of computer networks [14, 31, 32], middleware [64, 22]

and real time systems [13]. However, in computer networking the term QoS is used more

often to refer to resource reservation control mechanisms than to the achieved

service quality. QoS is clearly a multi-dimensional quantity with its various dimen-

sions getting dictated by the context within which it is being studied. For example


in telephony, the different aspects of a connection such as service response time, loss,

signal-to-noise ratio, cross-talk, echo, interrupts, frequency response, loudness levels

etc. contribute to the measurement of QoS. On the other hand, Quality of

Service of a real time system will involve an assessment of factors such as guarantees

on response time, and the degree of predictability of delays.

1.1 QoS in Web Services

Quality of a web service or for that matter any service may be modeled only through

its externally measurable characteristics. A web service presents the functionality

being rendered as a black box and does not give the consumer a chance to peruse

the quality processes of the service provider. This is perhaps the reason why we

talk about Quality of Service and not just quality (as in software quality) because

quality involves inspection of the development process. The Service Level Agreement

(SLA), where deliverable limits on the qualitative attributes are listed, forms the

legal binding through which consumers or brokers can track the service provider’s

offerings. Since SLA violations could lead to penalties being incurred by the service

provider, the SLA contains only those attributes of the service that can be monitored by both

parties and possibly by neutral third parties who may be approached for mediation of

disputes. In the web services world, SLA management drives QoS research; thus, it is

only natural that the dimensions of QoS will have to be unambiguously measurable

so that they can find a place in SLAs.

Quality of Service (QoS) of a web service has been parameterized in terms of its

response time, throughput, reliability, availability, security and other network related

parameters. Interestingly, reputation or fidelity has also been studied [41, 50, 61]

as a potential QoS factor. Various web services consortiums have tried to formalize

taxonomies for QoS dimensions aiming at automated SLA management. Efforts have

been focused on arriving at semantic models for representation of different quality

attributes and their assessment functions, capturing the inter-relationships between

the entities involved in SLA management [55, 10].


Cappiello et al. [10] point out that QoS research in web services revolves around

two basic heads. Firstly, measurement of QoS dimensions such as response time,

throughput, reliability and availability has drawn attention [58, 62]. QoS monitoring

systems have come up that track SOAP messages and attributes of the network

connection set up during an invocation of a web service. These values are averaged

over a number of calls to the web service to derive its expected QoS parameters.

Secondly, QoS aware composition has emerged as a hot topic in recent years. The

problem of optimally composing a web service that invokes several other web services

during the course of its workflow has captured the minds of many researchers. Several

solutions based on diverse techniques such as integer linear programming [61, 60],

genetic algorithms [9], constraint based optimization [2] and use of heuristics in mixed

integer programming [8] have been proposed in the literature.

Knowing the QoS of the web service being composed is extremely crucial during

the process of service orchestration (binding concrete web services to tasks in the

workflow). The integrator of the WS-composition needs to keep track of the QoS of

the composite service whenever he makes modifications to the bindings. Section ??

looks at some of the traditional workflow modeling systems and gives an insight into

the QoS computation technique used in such systems.

Although the use of fault tolerant constructs in web services has been studied in the

literature [17, 19, 51, 23, 39, 18], there has been no research that quantifies the

QoS improvement that may be brought about by these constructs. Our work aims

to provide a framework within which various fault tolerant constructs may be built

into BPEL processes and puts forward models to help the designer measure the QoS

improvement that may take place. Section 1.3 sets the background of fault tolerance

research in web services.


1.2 Workflows and QoS computation

A workflow is an abstraction of a business process wherein the logical steps performed

in it and the dependencies between them are listed. It also contains the rules for rout-

ing and sharing of information amongst the different participants. Workflow Manage-

ment Systems (WfMSs) have been in vogue to streamline business processes. They

help in increasing efficiency of organizational workflows by automating invocation of

tasks following the pre-defined rules and providing better coordination amongst par-

ticipants wherever manual intervention is required. Moreover they allow the business

processes to be continuously monitored and analyzed providing an opportunity for

improvements to be implemented.

With the advent of web services, business workflows are being implemented as

WS-compositions. Thus, there arises a need for proper QoS management of workflow

instances running as web services. Cardoso [11] presents a model for QoS computation

of workflows. Although his research draws motivation from the web services world,

the QoS model is generic and works the same way irrespective of whether the tasks

are being implemented as web services or not. The formulation repeatedly reduces the

workflow by collapsing constructs such as sequential, parallel, conditional, loop and

fault tolerant systems, and computes the QoS parameters at each reduction in order

to arrive at the overall QoS for the workflow. The model provides estimates of

time, cost and reliability and has been implemented on a WfMS called METEOR-S.

Chapter 2 looks at Cardoso’s model and other workflow QoS models in greater detail.

1.3 Fault Tolerance in Web Services

With mission critical web service compositions often being dependent on rented ser-

vices, high risk is involved in the process of binding tasks in the workflow to concrete

web services. When reputation is at stake, the web services designer cannot simply

delegate a task in the workflow to a single web service. He must have redundant web

services for every critical task in the composition to account for failures that may


occur. Traditional fault tolerant approaches like N-version programming, recovery

blocks and deadline mechanisms may be used to enhance reliability and/or perfor-

mance of web services. Fault tolerant constructs are based on the simple principle

that “two heads are better than one”. When one is faced with choosing a web

service (liable to failures) as part of a critical web service composition, it is imperative

that one backs it up with sufficient redundant implementations. In order to alleviate

design faults, the redundant implementations must have different designs so that one

can expect that a design fault is not repeated across all designs - the key assumption in

all software fault tolerance approaches. In distributed systems, one also finds redun-

dant implementations of the same design being deployed to improve performance and

availability through better load balancing.

However, web service standards currently do not have any provision in the architec-

tural infrastructure for software fault tolerance. Redundancy is generally incorporated

in a proprietary manner by service developers. Looker and Munro [39] were the first

to propose N-version programming in web services. Sommerville [19] followed with a

container-based approach to fault tolerance, allowing the container to be configured

with a policy which specifies what kind of fault tolerance mechanisms may be ap-

plied to the services it contains. Dobson [17] has used the mechanisms provided by

WS-BPEL to support NVP and recovery blocks.

Our framework provides a platform through which the web services composer can

easily bind redundant web services to a task after knowing the QoS improvement

that may be expected in the overall BPEL process. We present four different fault

tolerance constructs, show how they may be written using WS-BPEL and illustrate

QoS computation methodologies for each of them. These formulations for QoS deter-

mination may be tied to our QoS computation framework for WS-BPEL, to estimate

QoS in presence of redundancies.


1.4 Motivation

The SOA dream culminates in a marketplace for services wherein more services

will be produced by brokers who will be composing various web services to provide

new functionality rather than coding programs from scratch. As in any competitive

market, where a number of offerings are available for the same functionality, Quality

of Service is slated to be the key differentiator. The consumer shall select a web service

that best suits his QoS requirements and enter into the process of SLA negotiation

with the vendor of that service. This has led to research on Quality of Service gaining

prominence in the area of web services.

The orchestrator has to judiciously choose every web service that he binds to the

composition in order to attain a high level of QoS and meet his SLA requirements.

A tool that can provide accurate estimates of QoS of the resultant WS-composition,

given the values of QoS parameters for constituent web services (as specified in their

SLAs) will come in most handy for the integrator. This thesis aims to provide such

a framework for QoS determination in BPEL processes. Again, it might be possible

that the designer is unable to find a single web service that may perform a task to the

desired levels of QoS. The tool also helps in QoS improvement through fault toler-

ance constructs and re-computation of QoS of the BPEL process with redundancies.

Section 1.4.1 argues that the WS-BPEL 2.0 standard compliant work presented here

is significant, illustrating that QoS modeling in BPEL is far more challenging

than the same for traditional workflow systems. Section 1.4.2 throws light upon fresh

motivations for application of fault tolerance constructs in a web services world.

1.4.1 BPEL - A Case Apart

Business Process Execution Language (BPEL) has emerged as the de facto standard

for representation of industrial workflows either as abstract processes or as executable

processes that can be easily deployed. BPEL (its first standardized version was known

as BPEL4WS 1.1, later ratified as WS-BPEL 2.0 after some modifications) evolved

from two early process modeling languages, namely, IBM’s WSFL (Web Services Flow


Language) and Microsoft’s XLANG. Thus, it combines the best features of a block

structured process language (XLANG) with those of a graph-based workflow language

(WSFL).

The XML based language includes tags for hierarchical control flow constructs

such as <sequence>, <if> and <while>, and also models graph-like behavior with

the help of links that are allowed to synchronize various activities

inside a flow activity. BPEL has been proven to be more expressive than most

other workflow modeling languages [59]. It is able to capture some of the workflow

patterns exclusively because of the power it derives from the synchronization links

and the transition and join conditions one can impose on these links. Moreover,

it provides support for fault handling and also backward recovery of long running

transactions through fault handlers and compensation handlers respectively. It also

allows event driven programming in some sense with constructs like receive, pick

and event handlers. All of these make BPEL much more powerful than traditional

workflow languages - most of which support only block structured flow constructs.

The reasons cited above necessitate QoS computation for BPEL processes to be

organized differently from that for the traditional workflow systems. BPEL workflows

can be made arbitrarily complex and may not always be decomposable into a combination

of simple structured constructs as outlined in Cardoso’s work [11]. Despite

QoS management of web service compositions being the focal point of research in web

services for the past few years, no attempts have been made to come up with a clean

formulation for determination of QoS of BPEL processes. With WS-BPEL playing a

major role in the realization of the SOA dream, a comprehensive model for the QoS

estimation of BPEL workflows is clearly warranted.

This thesis outlines separate strategies for calculating reliability, response time, and

cost for BPEL processes, given the values of these parameters for the component web

services and a few other estimates about the control flow structure. A dependency

graph structure is built by annotating all the activity nodes in BPEL process tree with


their dependencies. The QoS parameters are computed thereof at each activity. Reli-

ability for an activity is taken to be the probability that it will successfully complete

execution. It is computed as the product of the probability that all its dependencies

have successfully completed and the conditional probability of its success given all

preconditions have been satisfied. Response time (modeled as a random variable) is

estimated by tracking for each activity, the cumulative time, measured from the start

of the process, within which it is expected to complete. Expected cost is ascertained

on the lines of the reduction based approach as the sum of costs of all activities after

suitable weighting by their probabilities of execution.
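The reliability and cost strategies outlined above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the tool described in this thesis: the function names, the three-activity chain and all numeric values are hypothetical, and each activity is assumed to depend only on the activity immediately before it.

```python
def activity_reliability(p_dependencies_ok, p_success_given_ready):
    """P(activity completes) = P(all dependencies completed) * P(success | preconditions met)."""
    return p_dependencies_ok * p_success_given_ready


def expected_cost(activities):
    """Sum of activity costs, each weighted by its probability of execution.

    activities: iterable of (cost, probability_of_execution) pairs.
    """
    return sum(cost * p_exec for cost, p_exec in activities)


# Hypothetical chain of three activities; each value is the conditional
# probability of success given that the previous activity completed.
conditional_success = [1.0, 0.9, 0.8]
reliability = 1.0
for p in conditional_success:
    reliability = activity_reliability(reliability, p)

# Hypothetical costs: one activity that always runs and two conditional
# branches that are each taken half of the time.
cost = expected_cost([(10.0, 1.0), (4.0, 0.5), (6.0, 0.5)])
```

In a real BPEL process the dependencies of an activity may share ancestors, so the probability that all of them complete is generally not a simple product; the chain above sidesteps that issue on purpose.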

1.4.2 Fault Tolerance in a WS-World - New Opportunities

Traditionally, software fault tolerance approaches have tried to capitalize on the design

diversity present in the redundant implementations. Although these techniques were

proposed as far back as the 1970s, they were never widely adopted because coding diverse

implementations proved to be very costly. In the web services era, cost is less of an

issue since the companies do not have to build the components from scratch - they only

pay a certain fee for them. Also with interfaces for various software modules getting

standardized, a large number of implementations offering the same functionality is

expected to be available - many of them actually diverse in their designs. Also, fault

tolerant constructs will enable the orchestrator to compare web services against one

another by actually running them together for a length of time, before forming an

opinion about their vendors.

Techniques such as N-version programming and recovery blocks have been primarily

devised to make software resistant against design faults. But, in a WS world, service

compositions have to guard against node failures too. The fact that the redundant

components in a fault tolerant design are widely spread out, deployed at geographically

separated hosts, protects the service in case a network partition occurs between

the site of service composition and that of one of its components.

Chapter 2

Related Work

Project management techniques like CPM (Critical Path Method) and PERT (Program

Evaluation and Review Technique) [26] have been used traditionally to track

performance and cost of business workflows. Orthogonally, techniques like Reliability

Block Diagrams [33] have been used to model reliability of complex systems. In

recent years, modern workflow management systems have provided an integrated platform

to monitor, analyze and improve business processes in terms of the Quality of Service

they deliver. Moreover, the workflow modeling languages available have much more

expressive power and hence better formulations for QoS computation are necessary.

For example, CPM/PERT does not support conditional or loop structures.

In this chapter, we review the state of the art of QoS determination of workflows,

fault tolerance literature in web services and present an overview of WS-BPEL that

has emerged as the workflow modeling language of choice in the web services world.

2.1 QoS in Workflows

2.1.1 Cardoso’s QoS Model

Cardoso’s thesis [11] is the seminal work in literature to demonstrate the importance

of Quality of Service in workflows and propose a framework for estimation of QoS in

web service processes. The thesis presents some insight into fidelity computation and


a detailed formulation for the following three QoS dimensions.

1. Task Time (T) : Task response time refers to the time taken by a request to

be processed by a task, measured from the inception of the request. The task

response time includes delay time, which is a sum of queuing delay and setup

time as well as process time.

2. Task Reliability (R) : For modeling reliability, Cardoso takes into consider-

ation two kinds of failures: system failures and process failures. Reliability is

defined as:

\[ R(t) = 1 - (\mathit{SystemFailureRate} + \mathit{ProcessFailureRate}) \]

3. Task Cost (C) : Task cost is taken to be the cost incurred by the service provider

when a task is executed and it has also been broken down into two components:

enactment cost and realization cost. The enactment cost is the cost associated

with the management of the workflow system and with the monitoring of work-

flow instances. The realization cost is attributed to the runtime execution of

the task.
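The three task-level dimensions listed above can be captured in a small record. The sketch below is a hypothetical illustration: the class name `TaskQoS`, its field names and the sample values are assumptions made for this example and do not come from Cardoso's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskQoS:
    delay_time: float            # queuing delay plus setup time
    process_time: float          # time spent actually processing the request
    system_failure_rate: float
    process_failure_rate: float
    enactment_cost: float        # workflow management and monitoring
    realization_cost: float      # runtime execution of the task

    @property
    def time(self):
        """Task response time T = delay time + process time."""
        return self.delay_time + self.process_time

    @property
    def reliability(self):
        """R(t) = 1 - (system failure rate + process failure rate)."""
        return 1.0 - (self.system_failure_rate + self.process_failure_rate)

    @property
    def cost(self):
        """Task cost C = enactment cost + realization cost."""
        return self.enactment_cost + self.realization_cost


# Hypothetical task: 2 s of delay, 3 s of processing, 12.5% failure rate of each kind.
task = TaskQoS(2.0, 3.0, 0.125, 0.125, 1.5, 2.5)
```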

Cardoso proposes Stochastic Workflow Reduction (SWR) to arrive at QoS esti-

mates for the overall workflow, provided the QoS values for all tasks in the workflow

are known. The SWR algorithm repeatedly applies a reduction process on various

structured constructs until only one atomic task remains. He introduces reduction

rules for systems listed below:

1. Sequential system: Two tasks $t_i$ and $t_j$ that are in sequence may be reduced to a single task $t_{new}$ with the help of the following reduction formulae. Cardoso notes that for such a reduction to take place, $t_i$ must not be a xor/and split and $t_j$ must not be a xor/and join.

\[ T(t_{new}) = T(t_i) + T(t_j) \]

\[ C(t_{new}) = C(t_i) + C(t_j) \]

\[ R(t_{new}) = R(t_i) \times R(t_j) \]

2. Parallel system (and split/and join) : In a parallel system, multiple tasks ($t_1, t_2, \ldots, t_n$) can be concurrently executed after an and split task $t_a$ and are merged with synchronization in an and join task $t_b$. SWR reduces the system to $t_a$, followed by a new activity $t_{new}$ (QoS parameters for it are defined below) and $t_b$. All incoming transitions to $t_a$ and outgoing transitions from $t_b$ remain unaltered.

\[ T(t_{new}) = \max_{i \in \{1, 2, \ldots, n\}} \{T(t_i)\} \]

\[ C(t_{new}) = \sum_{1 \le i \le n} C(t_i) \]

\[ R(t_{new}) = \prod_{1 \le i \le n} R(t_i) \]

3. Conditional system (xor split/xor join) : A conditional system is made up of tasks ($t_1, t_2, \ldots, t_n$), one of which is possibly carried out subject to satisfaction of the condition associated with it. The probabilities of execution of the branches are given by ($p_1, p_2, \ldots, p_n$). These conditional tasks emanate from a XOR split task and merge in a XOR join task. The reduction collapses all the conditional tasks into one new task ($t_{new}$) that is then sandwiched between the split and the join tasks. However, the reduction is limited by the restriction that there cannot be any other outgoing transitions from the split task and incoming transitions to the join task except for the conditional branches. QoS estimates of the task $t_{new}$ are given below:

\[ T(t_{new}) = \sum_{1 \le i \le n} p_i \times T(t_i) \]

\[ C(t_{new}) = \sum_{1 \le i \le n} p_i \times C(t_i) \]

\[ R(t_{new}) = \sum_{1 \le i \le n} p_i \times R(t_i) \]

4. Loop System: Cardoso characterizes a loop task ($t_{loop}$) by its probability ($p_{loop}$) of repeating the loop and the probabilities ($p_{o1}, p_{o2}, \ldots, p_{on}$) of its outgoing transitions. After removal of the loop, the reduced task ($t_{new}$) only has outgoing transitions with each of their probabilities relaxed by a factor of $1 - p_{loop}$.

\[ T(t_{new}) = \frac{T(t_{loop})}{1 - p_{loop}} \]

\[ C(t_{new}) = \frac{C(t_{loop})}{1 - p_{loop}} \]

\[ R(t_{new}) = \frac{(1 - p_{loop}) \times R(t_{loop})}{1 - p_{loop} R(t_{loop})} \]

5. Fault Tolerant System: A k-out-of-n system with n tasks ($t_1, t_2, \ldots, t_n$) is modeled along with an and split task and a XOR join task. The tasks in the n branches can be replaced by a single task $t_{new}$ and QoS estimated through the following formulae:

\[ T(t_{new}) = k\text{th}\operatorname{Min}_{i \in \{1, 2, \ldots, n\}} \{T(t_i)\} \]

\[ C(t_{new}) = \sum_{1 \le i \le n} C(t_i) \]

\[ R(t_{new}) = \sum_{i_1=0}^{1} \cdots \sum_{i_n=0}^{1} g\Big(\sum_{j=1}^{n} i_j - k\Big) \times \big((1 - i_1) + (2 i_1 - 1) R(t_1)\big) \times \cdots \times \big((1 - i_n) + (2 i_n - 1) R(t_n)\big) \]

where $g(x)$ takes the value 0 for $x < 0$ and 1 for $x \ge 0$.

[Figure 2.1: A system that is neither series nor parallel (components A, B, C, D and E)]

[Figure 2.2: Synchronization with links. Labels: XOR split task, conditional tasks, XOR join task; one transition is marked "Such a transition is not allowed"]
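The five reduction rules above can be sketched in Python as follows. This is a minimal illustration rather than Cardoso's implementation: the `QoS` record and the function names are hypothetical, QoS values are treated as deterministic numbers, and g is taken to be 1 exactly when at least k of the n tasks succeed.

```python
from dataclasses import dataclass
from itertools import product
from math import prod


@dataclass(frozen=True)
class QoS:
    time: float
    cost: float
    reliability: float


def sequential(a, b):
    """Two tasks in sequence: times and costs add, reliabilities multiply."""
    return QoS(a.time + b.time, a.cost + b.cost, a.reliability * b.reliability)


def parallel(tasks):
    """and split / and join: the slowest branch determines the time."""
    return QoS(max(t.time for t in tasks),
               sum(t.cost for t in tasks),
               prod(t.reliability for t in tasks))


def conditional(branches):
    """xor split / xor join: branches is a list of (probability, QoS) pairs."""
    return QoS(sum(p * t.time for p, t in branches),
               sum(p * t.cost for p, t in branches),
               sum(p * t.reliability for p, t in branches))


def loop(task, p_loop):
    """Loop that repeats with probability p_loop (must be below 1)."""
    return QoS(task.time / (1 - p_loop),
               task.cost / (1 - p_loop),
               (1 - p_loop) * task.reliability / (1 - p_loop * task.reliability))


def k_out_of_n(tasks, k):
    """Fault tolerant system that needs k successes out of len(tasks) branches."""
    if not 1 <= k <= len(tasks):
        raise ValueError("k must lie between 1 and the number of tasks")
    reliability = 0.0
    # Enumerate every success (1) / failure (0) outcome i_1 ... i_n.
    for outcome in product((0, 1), repeat=len(tasks)):
        if sum(outcome) >= k:  # g(sum of i_j - k) = 1
            reliability += prod(t.reliability if i else 1 - t.reliability
                                for i, t in zip(outcome, tasks))
    return QoS(sorted(t.time for t in tasks)[k - 1],  # k-th smallest time
               sum(t.cost for t in tasks),
               reliability)


# Hypothetical tasks used only to exercise the rules.
a = QoS(2.0, 1.0, 0.5)
b = QoS(4.0, 3.0, 0.5)
c = QoS(3.0, 2.0, 0.5)
```

The enumeration in `k_out_of_n` visits all 2^n outcomes, which mirrors the nested sums of the formula directly and is practical only for the small numbers of redundant branches that fault tolerant designs usually involve.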

Limitations

A restrictive rider is attached to most of the reduction rules given above.

For example, in the sequential system the start task cannot be a split and the end

task cannot be a join. Cardoso’s model adapts the reductions used in standard

reliability theory for computing reliability of series-parallel systems [33, 27]. But the

model is not capable of handling complex systems such as the one shown in Figure 2.1

that can neither be reduced to a series nor decomposed as a parallel system. Reliabil-

ity modeling of such systems can be performed using various approaches such as path

tracing, state enumeration, decomposition method and cut set method. Again, only

structured programming constructs have been considered here. Presence of goto-like

transitions that extend from one structured construct to another as shown in Figure

2.2 prevents application of the proposed reductions. Cardoso treats QoS for a task as

a deterministic value. But due to the uncertainty that is generally associated with

web services, especially if they are not owned by the integrator’s enterprise, it is more


realistic to model QoS probabilistically.
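To make the state enumeration approach mentioned above concrete, the sketch below computes the reliability of a five-component bridge network, the textbook example of a system that is neither series nor parallel. The topology is an assumption made for illustration (the minimal paths A-D, B-E, A-C-E and B-C-D); it is not taken from Figure 2.1, and the function name is hypothetical. Component failures are assumed to be independent.

```python
from itertools import product
from math import prod

# Assumed minimal paths of a bridge network in which C is the bridging component.
MIN_PATHS = [{"A", "D"}, {"B", "E"}, {"A", "C", "E"}, {"B", "C", "D"}]


def system_reliability(reliability, min_paths=MIN_PATHS):
    """Sum the probabilities of all component states in which the system works.

    reliability: dict mapping component name to its probability of working.
    The system works when every component of at least one minimal path is up.
    """
    names = sorted(reliability)
    total = 0.0
    for state in product((True, False), repeat=len(names)):
        up = {name for name, ok in zip(names, state) if ok}
        if any(path <= up for path in min_paths):
            total += prod(reliability[name] if ok else 1.0 - reliability[name]
                          for name, ok in zip(names, state))
    return total


# Hypothetical component reliabilities.
half = system_reliability({name: 0.5 for name in "ABCDE"})
perfect = system_reliability({name: 1.0 for name in "ABCDE"})
```

No sequence of series and parallel reductions yields this result, because component C takes part in paths through both halves of the network.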

2.1.2 Other Efforts

Hwang et al. [28] propose a probabilistic framework for QoS computation. They

extend Cardoso’s model to have each QoS parameter for a web service represented by

a discrete random variable having a certain probability mass function (PMF). The

work discusses how to efficiently aggregate PMFs of a number of random variables

over different domains and contrasts greedy and dynamic programming approaches

for sample space reduction in the aggregation problem.
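The idea of aggregating probability mass functions can be illustrated with a short sketch. It is not the algorithm of Hwang et al.: it performs no sample space reduction, it assumes the two random variables are independent, and the function names and sample PMFs are hypothetical.

```python
from collections import defaultdict
from itertools import product


def combine(pmf_x, pmf_y, op):
    """PMF of op(X, Y) for independent discrete random variables X and Y.

    Each PMF is a dict mapping a value to its probability.
    """
    result = defaultdict(float)
    for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
        result[op(x, y)] += px * py
    return dict(result)


def sequence_time(pmf_x, pmf_y):
    """Response time of two tasks in sequence: the sum X + Y."""
    return combine(pmf_x, pmf_y, lambda a, b: a + b)


def parallel_time(pmf_x, pmf_y):
    """Response time of two synchronized parallel tasks: max(X, Y)."""
    return combine(pmf_x, pmf_y, max)


# Hypothetical response time distributions, in seconds.
x = {1: 0.5, 2: 0.5}
y = {1: 0.5, 3: 0.5}
in_sequence = sequence_time(x, y)
in_parallel = parallel_time(x, y)
```

The size of the combined PMF can grow with the product of the input sizes, which is the reason sample space reduction matters when many tasks are aggregated.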

Canfora et al. [9] apply Cardoso’s QoS model with minor modifications in their

middleware that uses genetic algorithms for QoS aware composition and replanning.

They make use of estimated number of loop iterations in the reduction for a loop

rather than unfolding loops based on probability of repeating the loop. They also

introduce availability as a parameter with exactly same reduction rules as reliability.

Zeng et al. [61] model composite web services as state charts and put forward aggre-

gation functions to ascertain QoS of execution plans. The critical path of all possible

execution paths is determined and the duration of that path is taken to be the dura-

tion of the composite service. Estimation of successful execution rate and availability

takes into consideration only critical tasks based on the assumption that non-critical

tasks can be re-executed successfully without altering the QoS characteristics of the

final response. The model is simplistic since the state charts are allowed to have only

two types of compound states, namely, AND states and OR states, with no provision

for conditional or loop constructs. Menasce [43] presents a formulation to determine

throughput in composite web services from the flow graph that lists the web service

invocations. Jaeger et al. [35] propose aggregation of QoS dimensions on the workflow

patterns listed in Van der Aalst’s seminal work [1]. The approach is an elegant one

but the authors do not explain how to dig out such workflow patterns from a process

and to carry out an implementation of the same. Model-driven computation of QoS

has gained attention of late. We briefly look at some of the techniques used in the

section below.

Model Driven QoS Computation

D’Ambrogio and Bocciarelli [16] propose a model-driven approach wherein a BPEL process is described by a UML (Unified Modeling Language) model, extended according to the UML Profile for Automated Business Processes [3]. The UML model is then annotated with performance data and an LQN (Layered Queueing Network) model is obtained, which is solved to predict the performance of the BPEL process. Although the conversion of models built according to the UML Profile into BPEL has been thoroughly described in [3], the complex control flow offered by BPEL has not been exhaustively mapped back onto the UML profile. However, BPEL to UML transformation is an active research topic [48].

2.2 Fault Tolerance

Traditionally, software fault tolerance research has revolved around two approaches

- N-Version programming formulated by Avizienis [6] and Randell’s Recovery Blocks

[46]. N-version programs run multiple implementations providing the same function-

ality but having diverse designs in parallel. Thereafter, majority voting is done to

obtain the result of the programs. Recovery blocks invoke the redundant alternate

implementations sequentially if the output produced by one cannot pass assertion

checks. These two fault tolerant constructs have been studied in various contexts.

Reliability improvement that may possibly be derived through the use of these constructs

has been analyzed [21, 20, 15, 44]. Other topics that have gained interest in

fault tolerance research include how to provision voting mechanisms [45] and generate

acceptance tests [4].
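The two constructs can be contrasted with a small sketch. It is illustrative only: the function names are hypothetical, the versions are run one after another rather than in parallel, and the state restoration that a recovery block normally performs before trying the next alternate is omitted.

```python
from collections import Counter


def n_version(versions, x):
    """Run every version on the same input and return the majority result."""
    results = [version(x) for version in versions]
    value, votes = Counter(results).most_common(1)[0]
    if votes <= len(results) // 2:
        raise RuntimeError("no majority among the versions")
    return value


def recovery_block(alternates, acceptance_test, x):
    """Try the alternates in order; return the first accepted result."""
    for alternate in alternates:
        try:
            result = alternate(x)
        except Exception:
            continue  # a crash counts as a failed alternate
        if acceptance_test(x, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


# Hypothetical implementations of "square a number"; one of them is faulty.
good = lambda v: v * v
faulty = lambda v: -1

voted = n_version([good, good, faulty], 3)
recovered = recovery_block([faulty, good], lambda v, r: r >= 0, 3)
```

The voter needs an odd number of comparable results to mask a faulty version, whereas the recovery block needs only one alternate that passes the acceptance test, at the price of a possibly longer sequential execution.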

The notion of dependability has been studied at various levels in the area of web

services. Figure 2.3 lists the different levels where various standards exist to model

behavior of web services. Reliability of message exchanges has been documented in

OASIS standards such as WS-Reliability and WS-ReliableMessaging. A host of security

standards (WS-Security, WS-Trust, WS-SecurityPolicy, etc.) make use of available

encryption techniques to model integrity and confidentiality in SOAP messages.

[Figure 2.3 (diagram). Block labels: WS-ReliableMessaging, WS-Reliability; WS-Security; HTTPS; WS-Transaction, WS-Coordination; WS-BPEL Fault Handlers, Compensation Handlers; Redundancy in WS components. Level labels: Network Level, Message Level, Transaction Level, Component Level.]

Figure 2.3: Dependability Stack in Web Services

WS-Transaction and WS-Coordination define mechanisms for transactional interoperability

between Web services domains and seek to ratify transactional qualities of

service in web services applications. At the level of web service composition, we have

WS-BPEL constructs such as fault handlers and compensation handlers that help the

designer incorporate error handling and backward recovery in case of failures. The

block at the top right corner signifies our focus area of research, i.e., how to make

use of redundant components to improve dependability. No web services standard

exists as of date in this area. However, there have been some research efforts that use

redundancy as a tool for dependability enhancement.

2.3 WS-BPEL (Business Process Execution Language)

2.3.1 Evolution

Web services aim at providing an environment that supports flexible integration of

business processes implemented as heterogeneous systems and on diverse platforms

across enterprise boundaries. A standard process integration model is essential to let

business processes and applications carry out complex interactions that are often long

running.

A host of standards have come up in the process integration space, each supported by a company or a standards body. Microsoft’s XLANG (Web Services for Business Process Design) [52], one of the earliest process modeling languages, is block-structured with basic control flow structures such as sequence, switch, while, all (for parallel routing), and pick (for forking activities based on timing or external triggers). IBM’s WSFL (Web Services Flow Language), a graph-based language, offers capabilities to represent control flow as directed graphs that can be nested but must be acyclic. It derives most of its control flow constructs from the workflow language of IBM’s MQ Series Workflow. Web Service Choreography Interface (WSCI) is an XML-based interface description language that describes the flow of messages exchanged by a Web Service participating in interactions with other services. WSCI was conceived and developed by BEA, SAP, Intalio and Sun Microsystems. Intalio promoted the Business Process Management Initiative (BPMI.org), which came up with BPML (Business Process Markup Language). ebXML (Electronic Business using eXtensible Markup Language) contains BPSS (Business Process Specification Schema), which is yet another workflow language with similar capabilities. The plethora of standards, most of which overlap and add no real value, has contributed in great measure only to WSAH (Web Services Acronym Hell).

Business Process Execution Language (BPEL) combines the capabilities of both

XLANG and WSFL. BPEL 1.0 was jointly developed by IBM, BEA, SAP, Siebel,

and Microsoft in August 2002. In April 2003, BPEL 1.1 [53], which came to be

known as BPEL4WS, was submitted to OASIS for ratification. WS-BPEL 2.0 was

approved as an OASIS standard in April, 2007 by a technical committee with rep-

resentatives from 37 different organizations. Two other standards, WS-Coordination

and WS-Transaction, strengthen BPEL’s cause in supporting long running transactions.

They lend a framework for distributed processes to interact and let ACID

transactions happen between business activities. Also, WS-BPEL utilizes several

XML specifications: WSDL 1.1, XML Schema 1.0, XPath 1.0 and XSLT 1.0. Due to

BPEL’s greater expressive power (see a comparison of various workflow languages in

[57] in terms of the different workflow patterns supported by them) and the patronage

it received from the two giants IBM and Microsoft, it eventually managed to leave

the pack behind to emerge as the choicest of all web service composition languages.

2.3.2 Brief Overview

WS-BPEL is intended for modeling two types of processes: executable and abstract

processes. An abstract process may hide some of the concrete operational details that

are required by an executable artifact and can thereby serve as a process template

capturing the process logic embodying the domain specific best practices. WS-BPEL

lays down a grammar for capturing the behavior of a business process based on interactions

between the process and its partner processes. The notion of 〈partnerLinks〉 is used to model peer-to-peer conversational partner relationships. Again, the actual

partner service may be dynamically determined within the process.

A WS-BPEL process specification is analogous to a flow-chart. Each element in

the process is called an activity. An activity can be of two types: basic or structured.

Basic activities either describe interactions with other partners or model primitive

steps in the process. Structured activities encode control-flow logic and can have

other activities nested in them. WS-BPEL 2.0 has nine different basic activities that

are listed below:

1. invoke: An 〈invoke〉 activity is used to call operations (either one way or

request-response) embodied in web services offered by partners of the business

process being described.

2. receive: A 〈receive〉 activity is used to accept inbound messages from part-

ners of the service being provided by the business process. A receive activity

annotated with createInstance="yes" denotes the starting point of execution

of the service.

3. reply: A 〈reply〉 activity is used to send a response to a request previously

accepted through an inbound message activity such as receive or pick.

4. assign: An 〈assign〉 activity is used to carry out updates on variables.

5. throw: A 〈throw〉 activity is used to signal an internal fault explicitly.

6. wait: A 〈wait〉 activity forces the process to be delayed for a certain period of

time or wait until a certain deadline is reached.

7. exit: An 〈exit〉 activity immediately ends the business process instance, terminating all running activities without executing any fault handlers or termination handlers.

8. rethrow: A 〈rethrow〉 activity is used inside fault handlers to rethrow the fault

they caught, propagating the fault name and the fault data of the original fault.

9. empty: An 〈empty〉 activity does nothing but sometimes finds use as a synchronization point or for suppression of faults.

Apart from these basic activities, WS-BPEL has a provision for addition of new

activities using the tag 〈extensionActivity〉. WS-BPEL also enumerates seven differ-

ent structured activities.

1. sequence: A 〈sequence〉 activity contains one or more activities that are performed sequentially, in the order in which they appear within the 〈sequence〉 element. The 〈sequence〉 activity completes when the last activity nested within the sequence has completed.

2. flow: A 〈flow〉 activity provisions execution of activities concurrently and also

allows for synchronization between the activities contained in it through the

notion of links. The 〈flow〉 completes on completion of all activities nested

directly within the flow. However, skipping execution of activities within a flow

is allowed if their enabling conditions evaluate to false.

3. if: An 〈if〉 activity consists of one or more conditional branches defined by

the 〈if〉 and optional 〈elseif〉 elements, followed by an optional 〈else〉 element.

The first branch whose 〈condition〉 holds good is taken, and the activity nested

within it is executed.

4. pick: A 〈pick〉 activity waits for the occurrence of exactly one event from a

set of events and executes the activity contained within that event. The events

can either be receipt of inbound messages (〈onMessage〉) or triggering of timer

based alarms (〈onAlarm〉).

5. while: A 〈while〉 activity is one of the three constructs for provisioning loops.

The activity contained in 〈while〉 is executed until the 〈condition〉 at the start

of the loop evaluates to false.

6. repeatUntil: In a 〈repeatUntil〉 activity, the contained activity is executed

until the given 〈condition〉 becomes true.

7. forEach: A 〈forEach〉 activity provides a loop structure controlled by an implicit index variable that is initialized to 〈startCounterValue〉 and ends at 〈finalCounterValue〉. The number of iterations can be further limited by specifying a 〈completionCondition〉, wherein one can force the construct to happen “at least K-out-of-N” times, where K is the unsigned integer value given by the 〈branches〉 expression. The value of the parallel attribute lends a unique feature whereby the “iterations” of the loop can happen in parallel. In case the parallel attribute is set to “yes”, the nested 〈scope〉 is replicated as many times as the number of iterations of the loop and the index variable takes up values from 〈startCounterValue〉 through 〈finalCounterValue〉 in each of these

branches.

WS-BPEL’s notion of a 〈scope〉 offers the ability to specify a behavioral context

within which an activity may execute. A scope allows definition of variables, partner

links, message exchanges and correlation sets that are visible only within the scope.

Event handlers, fault handlers, a compensation handler, and a termination handler

may also be attached to a scope. A brief description of these handlers is given below.

• Event Handler: It provisions event driven programming to some degree in

WS-BPEL. The activity associated with an event is executed when the corre-

sponding event is fired. Again events may be either incoming messages or timer

based alarms.

• Fault Handler: The 〈catch〉 or 〈catchAll〉 constructs inside a fault handler

help to intercept faults that might occur and specify appropriate measures that

need to be taken to negate their effects.

• Compensation Handler: It allows backward recovery through the compen-

sation logic that it contains.

• Termination Handler: It helps forced termination of a scope by terminating its primary activity, stopping all running event handler instances and then executing the activity contained in the termination handler.

Chapter 3

QoS Computation

We have already argued in previous chapters that the workflow QoS models available in the literature are geared towards structured programming constructs, and illustrated their inability to cope with the graph-based patterns supported through BPEL (see Section 2.1.1). Our QoS model is specifically designed to deal with the complex graph-like structures that may be written by tapping WS-BPEL’s greater expressive power. It also provides mechanisms to handle fault handlers and event-driven programming that may be embedded in WS-BPEL processes.

The QoS dimensions considered in our framework, namely response time, reliability and cost, are the three most important parameters that all successful companies must track in their effort to remain competitive [25, 24]. In the approach outlined here, these QoS dimensions are evaluated at each activity en route to QoS computation for the overall BPEL process.

In a reduction based approach for computation of a QoS parameter of a structured

activity (sequence / parallel / conditional / loop), one independently determines

the values of the parameter for all child activities and then composes them using

an appropriate aggregation function. Since activities in WS-BPEL may be heavily

intertwined with synchronization links, independent computation of QoS parameters

followed by aggregation is not possible. We follow a more direct approach to infer QoS

of an activity by tracking all the dependencies that the activity may have. The term

dependencies of an activity refers to other activities that play a role in determining

when the activity may start. Effectively listing the dependencies of all activities in a

BPEL process is central to our approach and helps us to tackle the challenges posed

by the graph based nature of BPEL.

Reliability of a BPEL activity is typified by the probability that it will successfully

complete execution. Here, the term successful completion encompasses the activity

delivering its desired functionality to the effect that it measures up to the expectations

of all other activities that might be dependent on it. A successful web service

invocation would mean the service being called with appropriate inputs, the service

being available and it performing the required functionality entirely in conformance

with its semantic implications. Reliability, as given in Service Level Agreements,

refers to the conditional probability that the web service will succeed provided it

gets a valid input. Again, network failures may jeopardize arguments on their way

to the service provider’s site and prevent results from reaching the client. Now, the

WS-composition can be assured to supply an appropriate input to a constituent web

service if all other activities on the execution path leading up to the point of invocation

have done their jobs properly. Thus, in order to derive the probability of successful

execution of a web service, one needs to multiply the conditional probability of its

success with the probability that all its dependencies have been successfully com-

pleted. The QoS model computes the probability of successful completion for each

activity in the BPEL process.
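The multiplication described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, the probability that all dependencies have completed is taken as already known, and the reliability figures are invented.

```python
def success_probability(conditional_reliability, p_dependencies_completed):
    """P(success) = P(success | valid input) * P(all dependencies completed).

    conditional_reliability: reliability as quoted in an SLA, i.e. the
        probability of success given a valid input.
    p_dependencies_completed: probability that every dependency of the
        activity has completed successfully.
    """
    return conditional_reliability * p_dependencies_completed


# Hypothetical sequence of two invocations: B depends only on A.
p_success_a = success_probability(0.99, 1.0)          # A has no dependencies
p_success_b = success_probability(0.95, p_success_a)  # B needs A to succeed
```

In this two-step sequence the second service is quoted at 0.95 in isolation, but its probability of successful completion within the process is lower because it inherits the risk of the activity it depends on.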

Response Time is modeled as a random variable characterized by values for mean

and standard deviation. The expected time taken for completion if the activity com-

pletes in a run of the BPEL process is estimated for each activity. All times are

measured relative to the start of the BPEL process. Response time of the overall

BPEL process can be taken to be the expected completion time of its main activity.

We follow the steps given below to compute the end time, $ET_X$, of an activity X:

1. The expected time by which all the dependencies of the activity are complete

and the activity is ready to start, $ST_X$, is determined.

2. The end times of all the child activities (activities nested within X), are esti-

mated.

3. The end times of all the child activities and $ST_X$ are suitably aggregated to

compute $ET_X$.

Cost is computed more along the lines of the reduction-based approach that is prevalent in QoS models for traditional workflow systems. For each activity, we compute the probability $P_C$ that the activity will start execution given that the parent activity starts. Broadly speaking, the expected cost of an activity is calculated as the sum of the costs of all its child activities, each weighted by its respective $P_C$.
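As a rough sketch of this weighting (the function name and the figures are hypothetical, and the treatment of an activity's own cost is left out), the expected cost of a parent activity can be written as a probability-weighted sum over its children:

```python
def expected_cost(children):
    """Expected cost of a parent activity.

    children: iterable of (p_c, cost) pairs, where p_c is the probability
        that the child starts given that the parent starts, and cost is
        the expected cost of that child.
    """
    return sum(p_c * cost for p_c, cost in children)


# Hypothetical conditional with two branches taken with
# probabilities 0.8 and 0.2, costing 10 and 25 units respectively.
branch_cost = expected_cost([(0.8, 10.0), (0.2, 25.0)])
```

For a sequence in which every child is certain to start, each weight is 1 and the expression reduces to a plain sum of the child costs.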

3.1 Passport Application Service - A Running Example

We digress to introduce a running example of an online passport application service.

The passport office workflow is implemented with the help of WS-BPEL and deployed

as a web service. In this section, we briefly look at how the WS-BPEL constructs may

be effectively used to model the passport office workflow that calls several external

web services hosted by other government departments. This example will be cited at

several places throughout this thesis to elucidate various concepts.

Before a passport may be issued, the date of birth and the place of residence need to be verified. The passport application offers a choice of age-proof document: the applicant may furnish the transcript issued by his/her education board, his/her birth certificate, or both. Depending upon the type of age-proof, its verification is delegated to either the education board or the municipal office. The address proof is always verified by the municipal office. After verification of age and address, one may proceed to issue a passport only if the bank payment has been made by the applicant.

[Figure 3.1 (diagram): a 〈flow〉 containing the invocations EducationBoard DOBVerify, MunicipalBoard DOBVerify and MunicipalBoard AddressVerify, with links X, Y and Z and a 〈transitionCondition〉 verifyResult = true. PassportOffice MakePassport is guarded by the 〈condition〉 $BankPayment = "Paid" and has the 〈joinCondition〉 ($X or $Y) and $Z.]

Figure 3.1: Passport Office BPEL Process

In our example, all the verification processes and the passport issue process are

deployed as web services. The passport office workflow composes these web services

through WS-BPEL. A part of the BPEL process is illustrated graphically in Figure

3.1. The workflow logic described above is encapsulated in a flow activity and the

control flow is captured through the notion of synchronization links. The links named

X, Y and Z are used to ensure that the web service to issue passport may be started

only after the web services for verification have successfully completed execution.

Further, the 〈transitionCondition〉s defined at the source of each link guarantee that

the link may be taken (the status of the link evaluates to true) only if the furnished

document has been successfully validated. The boolean expression on the links defined

as the 〈joinCondition〉 at the invoke for passport issue web service examines whether

at least one age-proof is valid and the given address proof has been verified. The if

condition checks whether the bank payment has been done or not. The status of the

bank payment is set by some means that is outside the scope of this example.

It may be noted that the links enter the if activity from outside and control the

execution of an activity nested within it. This is an example of the goto type of

links, illustrated in Figure 2.2, that are not advocated by the structured programming

paradigm.

3.2 QoS Model Usage

This section will answer questions like: “Who is the targeted user?”, “Does the model

find use at design time or run time?”, “Does the model add any value if the dream of a

marketplace is not realized?”, “What infrastructure should be in place for the model

to work?”, “What are the inputs to the model and how would one obtain them?” etc.

3.2.1 User Environment

A tool that implements the QoS model for WS-BPEL will come as a boon to the

population of web service brokers who compose web services in BPEL out of readily

available web services. The web service designer is always striving to improve upon

quality whilst staying within his budget. At the very least, the orchestrator has to

ensure adherence to the QoS limits to be stated in the SLA that he may sign with his

clients. He tries out several combinations of binding concrete services to tasks in the

workflow and wants to have an idea of the QoS rendered by the composite service for

each of these combinations. This process of arriving at the optimum orchestration

may be automated [61, 9], but even there one requires estimation of QoS of the WS-

composition. Again, once the BPEL process has been deployed its QoS has to be

constantly monitored and re-orchestration might become necessary. Thus, the tool

may come in handy at run-time if a need arises for dynamic reconfiguration of the

deployed BPEL process.

As pointed out in earlier chapters, one of the primary motivations of such a design

lies in the fact that a marketplace for services will evolve where several competing

services will be available providing the same functionality. The dream of having

such a marketplace has suffered a setback with the three giants, IBM, Microsoft and

SAP, discontinuing their public UDDI registries since early 2006. However, even the

internal repositories of large companies can be expected to have duplicate services

for the same functionality, each varying in cost and correspondingly offering a

different degree of QoS. Thus, different web services may have to be chosen for the

same task in different projects depending upon the quality and cost requirements at

hand - our tool will simply facilitate such a selection.

3.2.2 Inputs to the Model

The web service composition in BPEL may include invocations of various other web

services and the service composer may have entered into service level agreements with

the providers of the constituent web services. The WS-composition can be assured

to have the component web services comply with the QoS limits mentioned in their

SLAs. The model presented here expects values of response time, reliability and

cost for all web service invocations of the BPEL process. The limits on these QoS

parameters stated in the SLAs after accounting for the failures and delays introduced

by the network can serve as inputs to the model. The input for reliability of a web

service is a number between 0 and 1. Response time inputs are specified by a 2-tuple

- (Mean, Standard Deviation). Costs are taken as fixed numeric values.

Average waiting times for all activities used to intercept inbound messages to

the business process are required by the model. Waiting time for receive is taken

to be the amount of time elapsed between the completion of all its dependencies

and the completion of the receive activity. It is assumed that if the process has

already received the corresponding incoming message, then the receive activity will

instantaneously complete, otherwise the particular thread of execution will halt and

can resume only after the message is received. Similarly, we require the average

waiting times for the onMessage events inside a pick activity or an event handler.

Waiting time of an onMessage event is taken to be the time between the start of the

parent pick activity or event handler and the instant when the corresponding message

arrives, if the message does arrive. Such average waiting times may be obtained from

logs of past executions of the business process collected by monitoring mechanisms

integrated with the BPEL engine.

The model also assumes certain parameters that characterize the control flow of

the business process to be available as inputs to it. These have been listed below:

• For every branch in an if activity, the probability of a branch being taken.

• For all events (both message based and timer based) attached to pick activities,

the probabilities that they get fired.

• For each transitionCondition, the probability of its success.

• For every catch or catchAll block, the fraction of failures of its parent scope

that it is able to catch.

• For all loops, the average number of iterations taken by them in an execution

of the business process.

• For all message based events in event handlers, the average number of times

they are fired in one execution of the associated scope.

Again, all of these attributes may be extracted from the execution logs of the business

process.

Our passport application example (See Section 3.1) will require as inputs the QoS

parameters, Reliability, Response Time and Cost for each of the four component web

services. Moreover, we will have to supply probabilities of the transition conditions

(defined at sources of links X, Y and Z) evaluating to true, and the probability that

the variable bankPayment attains the value “paid”.
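One possible way of organizing these inputs for the passport example is sketched below. The class, field and service names are hypothetical and every number is invented purely for illustration; none of them comes from measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceQoS:
    """QoS inputs for one web service invocation (hypothetical layout)."""
    reliability: float      # probability of success, between 0 and 1
    response_time: tuple    # (mean, standard deviation)
    cost: float             # fixed numeric value

    def __post_init__(self):
        if not 0.0 <= self.reliability <= 1.0:
            raise ValueError("reliability must lie between 0 and 1")
        mean, std_dev = self.response_time
        if mean < 0 or std_dev < 0:
            raise ValueError("mean and standard deviation must be non-negative")


# Invented values for the four component web services.
services = {
    "EducationBoardDOBVerify": ServiceQoS(0.98, (2.0, 0.5), 5.0),
    "MunicipalBoardDOBVerify": ServiceQoS(0.95, (3.0, 1.0), 4.0),
    "MunicipalBoardAddressVerify": ServiceQoS(0.96, (3.5, 1.2), 4.0),
    "PassportOfficeMakePassport": ServiceQoS(0.99, (10.0, 2.0), 20.0),
}

# P(transitionCondition = true) at the source of each link.
transition_probabilities = {"X": 0.90, "Y": 0.85, "Z": 0.95}

# P(bankPayment = "paid"), i.e. the probability that the if branch is taken.
p_bank_payment_paid = 0.92
```

Validating the ranges at construction time keeps malformed inputs, such as a reliability above 1, from silently propagating into later computations.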

3.3 Model Preliminaries

All computations in our QoS model happen at the level of an activity or a scope or a

handler, which are various units of encapsulation of process logic in WS-BPEL. The


BPEL workflow is represented as a graph where the activities/scopes/handlers are

represented by nodes. Naturally, the model maintains a host of parameters for each

node in the WS-BPEL process. A node, X, in the BPEL process is annotated by: (a)

its child activities, i.e., activities that are directly contained by X (b) its dependencies,

i.e., activities that must necessarily complete before X can start. Additionally, invoke

activities and scopes are attached to catch blocks, fault handlers and compensation

handlers that they may be associated with.

Dependencies and Join Conditions

Here, we formally define dependency of an activity and also categorize the nature of

a dependency. A unit of execution X in WS-BPEL is said to be dependent on another unit

Y if any of the following holds:

• Type A: Y is a structured activity (cannot be a loop) / scope / catch block

/ event and X is any activity / scope / event contained in it. In this case, if

Y is a sequence then X can only be the first activity within it; if Y is a flow

then X (a child of Y ) must not contain any of the links defined under Y as its

incoming links. In other words, X is an activity that marks the beginning of

execution of a sequence or a flow. Loop constructs viz. while, repeatUntil and

forEach are handled differently and thus the activity enclosed by a loop does

not have a Type A dependency.

• Type B: X is the i-th child of a sequence and Y is the (i − 1)-th child of the same sequence, i.e., an activity is dependent on the prior activity in the sequence, provided the dependent is not the first activity in the sequence.

• Type C: Y is the source of a link whose target is X.

It may be noted that Type B and Type C dependencies qualify as control dependencies

because they imply completion of Y as a necessary requirement for X to start. In

Type A dependencies, Y must start before X can but X should complete before Y

does. In the BPEL process for passport application, the 〈invoke〉 for the passport


issue web service has the enclosing 〈if〉 activity as its Type A dependency and the

three other 〈invoke〉 activities as its Type C dependencies. The 〈invoke〉 activities for

verification and the 〈if〉 activity have the 〈flow〉 as their Type A dependency.
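The three dependency types can be derived mechanically from the nesting structure and the links. The sketch below is an illustration under simplifying assumptions, not the implementation used in this work: the class and activity names are hypothetical, loops and handlers are ignored, and any incoming link on a child of a flow is treated as a link defined under that flow.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    name: str
    kind: str                        # "flow", "sequence", "if", "invoke", ...
    children: list = field(default_factory=list)


def dependencies(root, links):
    """Return {activity name: [(dependency type, dependency name), ...]}.

    links maps a link name to a (source activity, target activity) pair.
    """
    deps = {}
    link_targets = {target for _, target in links.values()}

    def visit(node):
        deps.setdefault(node.name, [])
        for index, child in enumerate(node.children):
            deps.setdefault(child.name, [])
            if node.kind == "sequence":
                if index == 0:       # first child marks the start of the sequence
                    deps[child.name].append(("A", node.name))
                else:                # later children wait for their predecessor
                    deps[child.name].append(("B", node.children[index - 1].name))
            elif node.kind == "flow":
                if child.name not in link_targets:
                    deps[child.name].append(("A", node.name))
            else:                    # if, scope and other containers
                deps[child.name].append(("A", node.name))
            visit(child)

    visit(root)
    for source, target in links.values():
        deps.setdefault(target, []).append(("C", source))
    return deps


# Passport example with hypothetical activity names.
make_passport = Activity("MakePassport", "invoke")
if_paid = Activity("IfPaid", "if", [make_passport])
flow = Activity("Flow", "flow", [
    Activity("EduDOBVerify", "invoke"),
    Activity("MunDOBVerify", "invoke"),
    Activity("MunAddressVerify", "invoke"),
    if_paid,
])
links = {
    "X": ("EduDOBVerify", "MakePassport"),
    "Y": ("MunDOBVerify", "MakePassport"),
    "Z": ("MunAddressVerify", "MakePassport"),
}
deps = dependencies(flow, links)
```

For this input the passport issue invoke ends up with one Type A dependency on the enclosing if and three Type C dependencies on the verification invokes, matching the description above.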

A join condition for an activity contains a boolean expression defined on the incom-

ing links for the activity. An activity may start only if its join condition evaluates to

true. If no join condition element is present, a disjunction (logical-OR) of all incom-

ing links is taken as default. A transition condition on a link is defined at its source

and refers to the condition defined on variables that must be passed for the link to

attain a true value. The probability that a link assumes a true value is dependent

on the successful completion of the source activity and the transition condition being

evaluated to true.

$$P(\mathit{link}_i = \mathrm{true}) = P(\mathit{success}_{\mathit{source}_i}) \cdot P(\mathit{transitionCond}_i = \mathrm{true}) \tag{3.1}$$

The suppressJoinFailure attribute is instrumental in defining the behavior of ex-

ecution if a joinCondition is evaluated to false. If the value of the attribute is yes,

then the activity is skipped in the event of its joinCondition being false, but no

bpel:joinFailure is generated. A false status is assigned to all outgoing links of the

skipped activity. However, the default value of suppressJoinFailure is no. For our

purposes, we assume the suppressJoinFailure attribute for all activities to be set to

no and all statements made (if not explicitly mentioned otherwise) are based on such

an assumption. Section 3.4.2 looks at modeling QoS if the attribute has a yes value.

Evaluation of Join Condition

The QoS model will estimate the probability that an activity starts after successful

completion of all its dependencies. For such an estimation to happen, it becomes

imperative to compute the probability that the join condition for an activity evaluates

to true.

First, the boolean expression of the join condition is converted to a canonical Sum


of Products (SOP) form. A generalized canonical join condition may be written as:

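$$(A_{11} \wedge A_{12} \wedge \ldots \wedge A_{1n_1}) \vee (A_{21} \wedge A_{22} \wedge \ldots \wedge A_{2n_2}) \vee \ldots \vee (A_{m1} \wedge A_{m2} \wedge \ldots \wedge A_{mn_m})$$

where $A_{ij}$ represents the status of a link, $\wedge$ stands for logical-AND and $\vee$ denotes logical-OR.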

We assume the events of success of any combinations of incoming links to be indepen-

dent. Thus, the probabilities of the links may be multiplied to obtain the probability

of the ANDed portions being true.

$$P(A_{i1} \wedge A_{i2} \wedge \ldots \wedge A_{in_i}) = P(A_{i1} \cap A_{i2} \cap \ldots \cap A_{in_i}) = P(A_{i1}) \cdot P(A_{i2}) \cdots P(A_{in_i})$$

Note: Here, $P(A_i)$ is used as a shorthand for $P(A_i = \mathrm{true})$.

Then, we can compute the probability of success of the join condition by applying the inclusion-exclusion principle given below, which helps us to deal with the ORed parts of the canonical expression.

$$P(A_1 \vee A_2 \vee \ldots \vee A_n) = P(A_1 \cup A_2 \cup \ldots \cup A_n) = \sum_{i=1}^{n} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \ldots + (-1)^{n-1} P(A_1 \cap A_2 \cap \ldots \cap A_n)$$

Again, we assume independence of status of the incoming links and thus can apply:

\[ P\Big(\bigcap_{i=1}^{n} A_i\Big) = \prod_{i=1}^{n} P(A_i) \]

The invoke activity for the passport issue web service has a join condition: (X ∨ Y) ∧ Z. First we convert it into its SOP form: (X ∧ Z) ∨ (Y ∧ Z). If the probabilities of the links evaluating to true are known, we can compute the probability of the join condition attaining a true value as follows:

\[ \begin{aligned} P((X \lor Y) \land Z) &= P((X \land Z) \lor (Y \land Z)) \\ &= P((X \cap Z) \cup (Y \cap Z)) \\ &= P(X) \cdot P(Z) + P(Y) \cdot P(Z) - P(X) \cdot P(Y) \cdot P(Z) \end{aligned} \]
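The evaluation above can be sketched in Python. This is only an illustration under stated assumptions: the function name `join_condition_probability`, the encoding of an SOP expression as a list of sets of link names, and the numeric link probabilities are all hypothetical and do not come from the thesis. The sketch applies inclusion-exclusion over the ANDed terms and, relying on the independence assumption, takes the probability of an intersection of terms to be the product over the union of their links.

```python
from itertools import combinations
from math import prod


def join_condition_probability(sop_terms, link_prob):
    """Probability that a join condition in SOP form evaluates to true.

    sop_terms: list of sets of link names, one set per ANDed term.
    link_prob: dict mapping a link name to P(link = true).
    Link statuses are assumed to be independent, as in the text.
    """
    total = 0.0
    for k in range(1, len(sop_terms) + 1):
        # Inclusion-exclusion: odd-sized intersections are added, even-sized subtracted.
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(sop_terms, k):
            # The intersection of ANDed terms is the AND over the union of their links.
            links = set().union(*subset)
            total += sign * prod(link_prob[name] for name in links)
    return total


# (X and Z) or (Y and Z), with illustrative link probabilities.
probs = {"X": 0.9, "Y": 0.8, "Z": 0.95}
p_join = join_condition_probability([{"X", "Z"}, {"Y", "Z"}], probs)
```

With these illustrative inputs the result agrees with the closed form P(X)·P(Z) + P(Y)·P(Z) − P(X)·P(Y)·P(Z). The enumeration is exponential in the number of terms, which is acceptable only for the small join conditions typical of a flow.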

For an activity with incoming links (having Type C dependency) to start, the status

of all its incoming links must be determined and then the join condition (implicit or

explicit) must evaluate to true. The scopes and the structured activities also play a

role in deciding when an activity contained in them may be ready for execution. The

parent scope or parent structured activity must start for a child to start execution.

Again, an activity in sequence can only start if the prior activity has completed. Thus,

the probability that an activity X will start, P (startX), is computed as a product of

the probabilities introduced below:

• P(depA_X): Probability that the parent activity or scope will start, P(start_parent), if X has a Type A dependency, and 1 otherwise.

• P(depB_X): Probability that the previous activity Y in sequence will successfully complete, P(success_Y), if X has a Type B dependency, and 1 otherwise.

• P(depC_X): Probability that the joinCondition is true, P(joinCond_X = true), if X has a Type C dependency, and 1 otherwise.

\[ P(\text{start}_X) = P(\text{depA}_X) \cdot P(\text{depB}_X) \cdot P(\text{depC}_X) \tag{3.2} \]

The passport issue invoke activity in our example has only Type A and Type C

dependencies. Thus, the probability that it may start may be computed as:

\[ P(\text{start}_{\text{passportIssue}}) = P(\text{start}_{\text{if}}) \times P((X \cup Y) \cap Z) \]

The reader should note that P (startX) refers to the probability that the activity

X will start with all preconditions met, i.e, at the point of execution of X the system

is in a state that is in absolute conformance with the expectation of X. For a more

formal description of how to compute P(start_X), see Algorithm 1.

Algorithm 1 findProbStart(X): Finding the probability that an activity X will start

    probStart ← 1.0
    if X has a Type A dependency then
        if P(start_{parent(X)}) is not known then
            findProbStart(parent(X))
        end if
        probStart ← probStart × P(start_{parent(X)})
    end if
    if X has a Type B dependency then
        if P(success_{predecessor(X)}) is not known then
            findProbSuccess(predecessor(X)) {See Algorithm 2}
        end if
        probStart ← probStart × P(success_{predecessor(X)})
    end if
    if X has a Type C dependency then
        for all Y such that X has a Type C dependency on Y do
            if P(success_Y) is not known then
                findProbSuccess(Y)
            end if
        end for
        Compute P(joinCond_X = true)
        probStart ← probStart × P(joinCond_X = true)
    end if
    P(start_X) ← probStart

3.4 Reliability Modeling

In our pursuit to estimate the QoS parameters, we first compute the reliability of

the BPEL process. As mentioned earlier, by reliability we refer to the probability of

successful execution, and we determine this probability, P (successi), for each activity

i, in the BPEL process. Again, successful completion encompasses a behavior that is

both syntactically correct and semantically desirable.

Algorithm 2 shows that in order to compute the probability of success of an activity X, we first compute P(start_X) and then, if X is a structured activity, determine P(success) for all its children. However, the computation of P(success) involves steps that are specific to each activity, scope or handler.

Algorithm 2 findProbSuccess(X): Finding the probability that an activity X will successfully complete

    findProbStart(X)
    for all Z such that Z is a child of X do
        if P(success_Z) is not known then
            findProbSuccess(Z)
        end if
    end for
    · · · · · · · · · · · ·
    Activity-Wise Computation of P(success_X)
    · · · · · · · · · · · ·
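The mutual recursion of Algorithms 1 and 2 can be sketched in Python as follows. This is a simplified illustration rather than the implementation used in this work: the `Activity` class and its field names are hypothetical, Type C dependencies are left out, and only basic activities, invoke and sequence are covered. The sketch also assumes that only the first child of a sequence carries the Type A dependency on the sequence, while later children carry a Type B dependency on their predecessor.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Activity:
    name: str
    kind: str = "basic"                       # "basic", "invoke" or "sequence"
    parent: Optional["Activity"] = None       # Type A dependency, if any
    predecessor: Optional["Activity"] = None  # Type B dependency, if any
    children: List["Activity"] = field(default_factory=list)
    r_ws: float = 1.0                         # reliability of the bound service (invoke only)
    p_start: Optional[float] = None           # memoized P(start)
    p_success: Optional[float] = None         # memoized P(success)


def find_prob_start(x: Activity) -> float:
    """Algorithm 1, restricted to Type A and Type B dependencies."""
    if x.p_start is None:
        p = 1.0
        if x.parent is not None:
            p *= find_prob_start(x.parent)
        if x.predecessor is not None:
            p *= find_prob_success(x.predecessor)
        x.p_start = p
    return x.p_start


def find_prob_success(x: Activity) -> float:
    """Algorithm 2 with the activity-wise step for three activity kinds."""
    if x.p_success is None:
        p_start = find_prob_start(x)
        for child in x.children:
            find_prob_success(child)
        if x.kind == "invoke":
            x.p_success = x.r_ws * p_start               # failure-prone web service call
        elif x.kind == "sequence":
            x.p_success = x.children[-1].p_success       # success of the last child
        else:
            x.p_success = p_start                        # basic activities never fail
    return x.p_success


# A sequence of invoke -> assign -> invoke with illustrative reliabilities.
seq = Activity("seq", kind="sequence")
first = Activity("checkIdentity", kind="invoke", parent=seq, r_ws=0.9)
copy = Activity("copyResult", predecessor=first)
second = Activity("makePayment", kind="invoke", predecessor=copy, r_ws=0.8)
seq.children = [first, copy, second]
p_seq = find_prob_success(seq)
```

Memoizing P(start) and P(success) on the activity mirrors the "is not known" checks in the pseudocode and keeps each activity from being evaluated more than once.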

3.4.1 Activity-Wise Reliability Computation

In our QoS model, invocations of external web services are considered to be the only

activities that may be regarded as source of failures. All basic activities except invoke

are assumed to complete successfully provided they do start. Hence, their probability

of successful completion is equal to the probability that they get to execute. Thus,

we have,

\[ P(\text{success}_X) = P(\text{start}_X) \tag{3.3} \]

where X may be one of the following:

(a) receive

(b) reply

(c) assign

(d) wait

(e) throw


(f) rethrow

(g) exit

(h) empty

(i) compensate

(j) compensateScope

Sequence

A sequence is said to be complete when the last activity contained in it completes successfully. An activity nested inside a sequence can only start if the previous activity in the sequence has been successful. Thus, if the ith child of a sequence is executing, then all child activities from the first to the (i − 1)th can be taken to be complete. Therefore, we can model success of a sequence by:

\[ P(\text{success}_{\text{sequence}}) = P(\text{success}_{\text{lastChild}}) \tag{3.4} \]

Flow

A flow activity is deemed to complete only if all activities enclosed by it are complete.

If the suppressJoinFailure attribute is set to no, the flow terminates because of a

joinCondition being evaluated to false and a bpel:joinFailure is thrown to the enclosing

scope. The notion of activities being skipped and the flow still completing is applicable

only when the suppressJoinFailure has the value yes. We track Type C dependencies

and estimate the probability of joinCondition being true, to figure out whether an

activity may start. The synchronization links in effect model the control flow of

execution. Thus, it may be contended (with a similar argument to that posed in case

of sequence) that the completion of all child activities without any outgoing links

would mark the completion of the flow. Therefore, the probability of success of a flow


activity may be written as:

\[ P(\text{success}_{\text{flow}}) = \prod_{\forall i} P(\text{success}_{\text{flowSink}_i}) \tag{3.5} \]

where flowSink_i is a child activity of the flow with no outgoing links, i.e., an activity where the 〈sources〉 element is absent. In our passport application example, the 〈if〉 activity is the sink since it does not have any outgoing links. Thus, P(success_flow) = P(success_if).

If

An if activity is complete when the activity nested in the taken branch completes.

It completes immediately when no condition evaluates to true and no else branch is

specified. In order to elucidate the model, we assume an else branch with an empty activity to be inserted where no else exists. The probability with which a branch is taken, P(branchTaken_i), is obtained as an input. The probability of selection of the else branch is computed as

\[ P(\text{branchTaken}_{\text{else}}) = 1 - \sum_{i=1}^{n-1} P(\text{branchTaken}_i) \]

where n is the total number of branches in an if activity.

The probability of success of an if activity is calculated as a weighted sum of the probabilities of success of the activities contained in all its branches, the weights being the probability with which a branch gets selected for execution.

\[ P(\text{success}_{\text{if}}) = \sum_{i=1}^{n} P(\text{branchTaken}_i) \times P(\text{success}_{\text{branchActivity}_i}) \tag{3.6} \]

The 〈if〉 activity in our example does not have an else branch. Thus, we simulate

an else branch, which is selected with a probability that the bank payment is not

done.


Pick

Pick activities are treated in a similar way to if activities, with the probabilities of selection of each event being taken as input. Since WS-BPEL stipulates that exactly one event should be executed when a pick is started, the inputs should be validated to check whether they add up to 1.

\[ P(\text{success}_{\text{pick}}) = \sum_{i=1}^{n} P(\text{eventSelected}_i) \times P(\text{success}_{\text{eventActivity}_i}) \tag{3.7} \]

where n is the total number of events in a pick activity.
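The weighted sums of Equations 3.6 and 3.7 can be sketched in Python as below. The function names and the representation of branches and events as (probability, success) pairs are hypothetical choices made for this illustration. For an if without an else branch, the success probability of the simulated empty else activity has to be supplied by the caller; by Equation 3.3 it equals the start probability of that empty activity.

```python
def if_success(branches, p_success_else):
    """Equation 3.6.

    branches: list of (P(branchTaken_i), P(success_i)) for every branch except else.
    p_success_else: P(success) of the (possibly simulated) else activity.
    """
    p_else = 1.0 - sum(p for p, _ in branches)
    if p_else < -1e-9:
        raise ValueError("branch probabilities exceed 1")
    p_else = max(p_else, 0.0)  # clamp tiny negative values caused by rounding
    return sum(p * s for p, s in branches) + p_else * p_success_else


def pick_success(events, tol=1e-9):
    """Equation 3.7.

    events: list of (P(eventSelected_i), P(success_i)) for every event.
    """
    if abs(sum(p for p, _ in events) - 1.0) > tol:
        raise ValueError("event selection probabilities must add up to 1")
    return sum(p * s for p, s in events)


# Illustrative inputs: one explicit branch taken 70% of the time.
p_if = if_success([(0.7, 0.9)], p_success_else=1.0)
p_pick = pick_success([(0.6, 0.9), (0.4, 0.5)])
```

The validation in `pick_success` corresponds to the check that the input probabilities add up to 1; `if_success` only requires that the explicit branches leave a non-negative remainder for the else branch.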

Loops

QoS modeling for loops follows a reduction based approach in the sense that com-

putation of QoS parameters for the child activity may be performed independently

without requiring QoS information of its parent activity. This is facilitated by the

WS-BPEL stipulation (see WS-BPEL 2.0 Static Analysis requirement SA00070) that

synchronization links cannot enter into repeatable constructs by crossing their bound-

aries. Note that we have not included loops in the list of structured activities that

can lead to a Type A dependency because they are dealt with differently.

Loops are handled by unfolding them to the number of iterations that they make.

In case of while and repeatUntil, the number of iterations is taken as an input, pre-

sumably obtained from execution logs. In a forEach activity, the number of iterations

is taken to be either of the following:

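\[ \text{numIterations} = \begin{cases} B, \text{ the unsigned integer value of completionCondition,} & \text{if it exists,} \\ \langle\text{finalCounterValue}\rangle - \langle\text{startCounterValue}\rangle + 1 & \text{otherwise.} \end{cases} \]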

The various iterations of the loop are assumed to be independent and in effect the loop

construct is captured as a number of copies of the contained child activity running

in sequence. Hence, the reduction applied to compute the reliability of loops can be


formulated as:

\[ P(\text{success}_{\text{loop}}) = P(\text{start}_{\text{loop}}) \times P(\text{success}_{\text{child}})^{\text{numIterations}} \tag{3.8} \]
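A minimal Python sketch of the loop reduction follows. The function and parameter names are hypothetical. The child's success probability is assumed to have been computed independently of the loop, as described above. Clamping the iteration count at zero when the final counter value is below the start counter value is an added assumption of this sketch, not something stated in the text.

```python
def for_each_iterations(start_counter, final_counter, completion_branches=None):
    """Number of iterations of a forEach, following the case expression above."""
    if completion_branches is not None:
        return completion_branches
    # Assumption of this sketch: an empty counter range yields zero iterations.
    return max(0, final_counter - start_counter + 1)


def loop_success(p_start_loop, p_success_child, num_iterations):
    """Equation 3.8: the loop is unfolded into independent copies of its child."""
    return p_start_loop * p_success_child ** num_iterations


n = for_each_iterations(1, 3)
p_loop = loop_success(0.9, 0.95, n)
```

For while and repeatUntil, `num_iterations` would be passed in directly from the value obtained from execution logs.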

Invoke

Invoke activities denote the point of calling external web services that may be prone to failures. For each invoke activity, the model expects (as input) the reliability of the web service bound to it. SLAs list the rate of failures that may occur even though the web service is fed with proper inputs. Apart from failures at the site of the provider, one also has to take into account network failures. The QoS model demands as input the conditional probability R_ws that an invocation of an external web service will succeed, given that the call is made with proper arguments; the corresponding failure probability is 1 − R_ws. At any point of time during execution, if the business process is running then it is implicit that the system is in a consistent state. Thus, if the process is active at the point of invocation, it may be assumed that it provides “correct” inputs to the callee web service. Now, the probability of a successful invocation can be formulated as below:

\[ \begin{aligned} P(\text{success}'_{\text{invoke}}) &= P(\text{success}'_{\text{invoke}} \mid \text{proper inputs supplied}) \times P(\text{proper inputs supplied}) \\ &= R_{ws} \times P(\text{start}_{\text{invoke}}) \end{aligned} \tag{3.9} \]

However, an invoke activity may have catch blocks and compensation handlers attached to it, and completion of an invoke activity would encompass their completion too. We compute the improved probability of success P(success'') of the invoke (see Equation 3.16) after estimating the number of faults that may be handled through catch blocks. We may then write the expression for the probability of success of an invoke activity after incorporating the same for the attached compensation handler (see Equation 3.17), if there is any.

\[ P(\text{success}_{\text{invoke}}) = P(\text{success}''_{\text{invoke}}) \times P(\text{success}_{\text{compensationHandler}}) \tag{3.10} \]


Scope

A scope provisions attachment of fault handlers, compensation handlers, event han-

dlers and termination handlers that together set the context within which the primary

activity of the scope may run. Reliability computation in scopes is done in the same

way as it is performed for the invoke activities. In the absence of any handler, the

probability of success of a scope is given by that of its child activity.

\[ P(\text{success}'_{\text{scope}}) = P(\text{success}_{\text{scopeChild}}) \tag{3.11} \]

The catch blocks inside a fault handler may help remove some of the faults, and

thereby improve the reliability of the scope to P (success′′scope). The following equation

models reliability of a scope in presence of fault handlers, compensation handlers and

event handlers.

\[ P(\text{success}_{\text{scope}}) = P(\text{success}''_{\text{scope}}) \times P(\text{success}_{\text{compensationHandler}}) \times P(\text{success}_{\text{eventHandler}}) \tag{3.12} \]

The above equation when applied to the process (that is nothing but a special form

of scope) will give the reliability of the composite web service written in BPEL.

Fault Handler

The model assumes as input the fraction of faults caught by each catch block, FractionCapture, out of the total number of faults produced by its scope. Note that even a catchAll block may not have a FractionCapture of 1 because there may be uncaught semantic inconsistencies passed on. Now, the probability that a catch block is started (after completion of its associated scope, successful or otherwise) may be given by the probability that the associated invoke/scope fails and a fault is captured by the catch. Here, we consider the conditional probability of success of the scope/invoke given that it starts, in order to capture only those failures that originate from the web service invocation or from within the child scope, and then multiply it with P(start_{scope/invoke}).

\[ \begin{aligned} P(\text{start}_{\text{catch}_i} \mid \text{start}_{\text{invoke}}) &= (1 - R_{ws}) \times \text{FractionCapture}_i \\ P(\text{start}_{\text{catch}_i} \mid \text{start}_{\text{scope}}) &= (1 - P(\text{success}_{\text{scopeChild}} \mid \text{start}_{\text{scope}})) \times \text{FractionCapture}_i \\ &= (1 - P(\text{success}_{\text{scopeChild}}) / P(\text{start}_{\text{scope}})) \times \text{FractionCapture}_i \\ P(\text{start}_{\text{catch}_i}) &= P(\text{start}_{\text{catch}_i} \mid \text{start}_X) \times P(\text{start}_X), \quad \text{where } X = \text{scope/invoke} \end{aligned} \tag{3.13} \]

The probability of successful completion of a catch block, P(success_catch), is equal to that of the activity nested in it. It may be noted that the child activity of a catch may only have a Type A dependency (Type C dependencies are forbidden vide WS-BPEL static analysis requirement SA00071) but it may have outgoing links. Now, the fraction of faults removed (FFR) by all catch blocks may be computed as:

\[ \text{FractionFaultRemoval (FFR)} = \sum_{\forall i} \big( \text{FractionCapture}_i \times P(\text{success}_{\text{catch}_i}) \big) \tag{3.14} \]

The rate of faults thrown up by the web service invocation or the child scope may be taken to be:

\[ \text{FaultRate}_X = \begin{cases} 1 - R_{ws} & \text{if } X \text{ is an invoke,} \\ 1 - P(\text{success}_{\text{scopeChild}} \mid \text{start}_{\text{scope}}) & \text{if } X \text{ is a scope} \end{cases} \tag{3.15} \]

After taking into account the fraction of faults removed, the improved probability of success of the invoke / scope will stand as below:

\[ P(\text{success}''_X) = P(\text{success}'_X) + \text{FaultRate}_X \times \text{FFR} \tag{3.16} \]

where X may be invoke/scope.
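Equations 3.14 to 3.16 can be traced with a small Python sketch. All function names are hypothetical, and the success probability of each catch block, P(success_catch_i), is simply taken as an input here rather than derived from the nested activity.

```python
def fault_rate_invoke(r_ws):
    """Equation 3.15, invoke case."""
    return 1.0 - r_ws


def fault_rate_scope(p_success_child, p_start_scope):
    """Equation 3.15, scope case: 1 - P(success_scopeChild | start_scope)."""
    return 1.0 - p_success_child / p_start_scope


def fraction_fault_removal(catches):
    """Equation 3.14. catches: list of (FractionCapture_i, P(success_catch_i))."""
    return sum(capture * p_success for capture, p_success in catches)


def improved_success(p_success_prime, fault_rate, ffr):
    """Equation 3.16."""
    return p_success_prime + fault_rate * ffr


# Illustrative invoke: R_ws = 0.9, always started, two catch blocks.
r_ws, p_start = 0.9, 1.0
ffr = fraction_fault_removal([(0.5, 1.0), (0.3, 0.8)])
p_improved = improved_success(r_ws * p_start, fault_rate_invoke(r_ws), ffr)
```

In this illustrative case the catch blocks remove 74% of the faults, so the success probability rises from 0.9 to about 0.974.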


Compensation Handler

Compensation handlers are just a wrapper for an activity that is supposed to provide backward recovery. A compensation handler may be invoked through a compensateScope or a compensate activity. Again, a compensation handler is only invoked if the scope or the invoke being compensated has completed, i.e., the fault handlers successfully catch all faults that may have occurred. Thus, the probability that the compensation handler starts may be given by:

\[ P(\text{start}_{CH}) = P(\text{success}_X) \times P(\text{success}_Y) \]

where X can be either compensateScope or compensate, and Y can be either scope or invoke. If the compensation handler is not invoked, there is no chance of any failures emanating from it and hence the reliability may be taken to be 1. Now, P(success_compensationHandler) may be computed as the weighted sum of the reliabilities of the two cases: when it does execute and when it does not.

\[ P(\text{success}_{\text{compensationHandler}}) = P(\text{start}_{CH}) \times P(\text{success}_{\text{child}}) + (1 - P(\text{start}_{CH})) \times 1 \tag{3.17} \]

Again, P(success_child) may be obtained independently of activities outside the compensation handler because the child activity can have no Type C dependency (vide WS-BPEL static analysis requirement SA00070).

Event Handler

In order to model QoS of event handlers one requires information about the number

of times an event may occur. For timer based events one may estimate the number

of times the timer goes off and the event completes within the lifetime of the scope

(see computation of start and end times for scopes in Section 3.5). For message based

events this information has to be derived from execution logs. The probability of

success of an event P (successevent) is given by the same for the scope that is enclosed

within it. The following equation shows how reliability modeling may be performed for event handlers.

\[ P(\text{success}_{\text{eventHandler}}) = \prod_{i=1}^{\text{numEvents}} P(\text{success}_{\text{event}_i})^{\text{numOccur}_i} \tag{3.18} \]

where,

numEvents = number of events in the event handler,

numOccur_i = number of occurrences of event_i
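The handler equations and their combination into the reliability of a scope (Equations 3.12, 3.17 and 3.18) can be sketched as follows. Function names and the sample numbers are hypothetical and serve only to show how the quantities fit together.

```python
from math import prod


def compensation_handler_success(p_success_compensate, p_success_target, p_success_child):
    """Equation 3.17, with P(start_CH) = P(success_X) * P(success_Y)."""
    p_start_ch = p_success_compensate * p_success_target
    return p_start_ch * p_success_child + (1.0 - p_start_ch) * 1.0


def event_handler_success(events):
    """Equation 3.18. events: list of (P(success_event_i), numOccur_i)."""
    return prod(p ** occurrences for p, occurrences in events)


def scope_success(p_success_improved, p_success_ch=1.0, p_success_eh=1.0):
    """Equation 3.12; a missing handler contributes a factor of 1."""
    return p_success_improved * p_success_ch * p_success_eh


p_ch = compensation_handler_success(0.5, 0.8, 0.9)
p_eh = event_handler_success([(0.99, 2), (0.9, 1)])
p_scope = scope_success(0.974, p_ch, p_eh)
```

Defaulting absent handlers to a success probability of 1 reflects the statement that a handler which never runs cannot contribute failures.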

Our QoS model does not deal with the behavior where forced termination may

take place. Hence, termination handlers are outside the scope of this research.

3.4.2 Suppression of Join Failures

The entire exposition above assumes that a bpel:joinFailure is thrown whenever a join condition is not satisfied. However, WS-BPEL allows skipping of activities through Dead Path Elimination, a process where, if a join condition evaluates to false and the suppressJoinFailure attribute is set to yes, the activity is skipped and a false status is propagated on all its outgoing links with no fault generation. Since a skipped activity is also deemed to be complete, the expression for the probability of successful completion has to be suitably modified in such cases. Here the probability of success of an activity is improved by the probability of the activity being skipped.

\[ P(\text{success}') = P(\text{joinCondition} = \text{true}) \times P(\text{success}) + (1 - P(\text{joinCondition} = \text{true})) \tag{3.19} \]

However, in the evaluation of P(link = true) (see Equation 3.1), the probability of successful completion of the source will be substituted by the old P(success) and not P(success'), because a failed status is propagated on each of the outgoing links of a skipped activity.
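As a brief numeric illustration of Equation 3.19 together with the remark about Equation 3.1, consider the Python sketch below. The function names and input values are hypothetical.

```python
def success_with_suppressed_join_failure(p_join_true, p_success):
    """Equation 3.19: a skipped activity also counts as complete."""
    return p_join_true * p_success + (1.0 - p_join_true)


def link_true_probability(p_success_source, p_transition_true):
    """Equation 3.1, which keeps using the unimproved P(success) of the source."""
    return p_success_source * p_transition_true


p_success = 0.9                      # old P(success) of the activity
p_success_prime = success_with_suppressed_join_failure(0.8, p_success)
p_link = link_true_probability(p_success, 0.5)
```

The link probability is computed from `p_success` rather than `p_success_prime`, since a skipped activity propagates a false status on its outgoing links.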

3.5 Response Time Modeling

Response time is tracked by determining for each activity the expected time instant

when it starts and the expected time of its completion. Time is measured from the

start of the BPEL process, i.e., start time of the process scope is taken to be zero.

All time values are represented as random variables characterized by their means and

standard deviations.

Analogous to reliability modeling, here too we present a generalized algorithm for

computation of start times (ST) and then specifically design techniques to estimate

end times (ET) for each activity type. An activity may only start after all its Type

B and Type C dependencies have completed execution. In case the activity has a

Type A dependency, its parent activity should start before it does. The maximum of

all completion times of the Type B and Type C dependencies and start time of the

Type A dependency is thus taken to obtain the start time (ST) for an activity (see

Algorithm 3).

Note: In Algorithm 3, waiting times are taken into account only for those receive

activities that have their createInstance attribute set to no. Otherwise, the receive

activity is a start activity and marks the beginning of the business process. It may

be observed that all the start activities have a start time of zero. Scope, sequence and

flow are the only non-start activities allowed that may not have a control dependency

on start activities (vide WS-BPEL static analysis requirement SA00056) and thereby

they may also attain a zero start time.

Estimation of End Time (ET ) of an activity requires the start time of the activity

and the end times of all its children. Again, computation of End Time (ET ) varies

significantly from one activity to another and is detailed for each activity / scope /

handler in the next section.


Algorithm 3 findStartTime(X): Finding the Start Time (ST_X) for an activity X

    Create an empty list TimeList of times, each represented as (µ, σ)
    if X has a Type A dependency then
        if ST_{parent(X)} is not known then
            findStartTime(parent(X))
        end if
        Add ST_{parent(X)} to TimeList
    end if
    if X has a Type B dependency then
        if ET_{predecessor(X)} is not known then
            findEndTime(predecessor(X))
        end if
        Add ET_{predecessor(X)} to TimeList
    end if
    if X has a Type C dependency then
        for all Y such that X has a Type C dependency on Y do
            if ET_Y is not known then
                findEndTime(Y)
            end if
            Add P(start_Y) × ET_Y to TimeList
        end for
    end if
    ST_X ← Max of all elements in TimeList
    if X is a message based event in pick or a receive with createInstance = no then
        ST_X ← ST_X + AverageWaitingTime_X
    end if

Algorithm 4 findEndTime(X): Finding the End Time ET_X for an activity X

    findStartTime(X)
    for all Z such that Z is a child of X do
        if ET_Z is not known then
            findEndTime(Z)
        end if
    end for
    · · · · · · · · · · · ·
    Activity-Wise Computation of ET_X
    · · · · · · · · · · · ·


3.5.1 Activity Wise Response Time Computation

Most basic activities (except invoke, receive and wait) may be treated as though they

complete instantaneously. Thus, their start times are equal to their end times.

\[ ET_X = ST_X \tag{3.20} \]

where, X may be one of the following:

(a) receive

(b) reply

(c) assign

(d) throw

(e) rethrow

(f) exit

(g) empty

(h) compensate

(i) compensateScope

Wait

The wait activity is designed to either introduce a delay for a given duration or wait until a certain deadline is reached. In case an absolute deadline is specified, we need to know the absolute time when our process starts so that we can obtain the relative end time of completion as shown below.

\[ ET_{\text{wait}} = \begin{cases} ST_{\text{wait}} + \text{duration} & \text{if } \textit{for} \text{ specifies a delay,} \\ \text{deadline} - \text{Absolute Time of Start of Process} & \text{if } \textit{until} \text{ specifies a deadline.} \end{cases} \tag{3.21} \]


Sequence

A sequence is deemed to complete when the last activity enclosed in it ends. Thus,

we have:

\[ ET_{\text{sequence}} = ET_{\text{lastChild}} \tag{3.22} \]

Flow

As demonstrated in Section 3.4, the completion of all activities with no outgoing links

contained in a flow shall mark the end point of the flow. Thus, the end time of a flow

may be computed as below:

\[ ET_{\text{flow}} = \max_{\forall i} (ET_{\text{sink}_i}) \tag{3.23} \]

If

The end time of an if activity will refer to its expected time of completion taking into

account the probabilities of execution of each of its branches. It is computed as the

weighted sum of the end times of all the branches, the weights being the probabilities

of taking the branches.

\[ ET_{\text{if}} = \sum_{i=1}^{\text{numBranches}} P(\text{branchTaken}_i) \times ET_{\text{branchActivity}_i} \tag{3.24} \]

Pick

Pick activities are treated almost identically to if activities. Here, the weights refer

to the probability of occurrence of the concerned event.

\[ ET_{\text{pick}} = \sum_{i=1}^{\text{numEvents}} P(\text{eventOccurs}_i) \times ET_{\text{eventActivity}_i} \tag{3.25} \]

Loops

The activity nested inside a loop cannot have any of the three types of dependencies (Type A and Type B are disallowed by definition, Type C is not possible because of static analysis requirement SA00070). Thus, the start time of the child activity of the loop will be evaluated to zero and its QoS computation performed as if it were the primary activity of a BPEL process. The end times for while / repeatUntil loops are obtained as:

\[ ET_{\text{loop}} = ST_{\text{loop}} + \text{numIterations} \times ET_{\text{loopChild}} \tag{3.26} \]

ForEach allows its iterations to be rolled out in parallel if the parallel attribute is set to yes. In such a case, we simply take the end time of the forEach loop to be equal to that of a single execution of the loop child, assuming that all the unrolled loop branches executing in parallel finish in more or less the same time. Some degree of randomness is already accounted for because the times are represented through random variables having a certain standard deviation.

\[ ET_{\text{forEach}} = \begin{cases} ST_{\text{forEach}} + \text{numIterations} \times ET_{\text{child}} & \text{if parallel = no} \\ ST_{\text{forEach}} + ET_{\text{child}} & \text{if parallel = yes} \end{cases} \tag{3.27} \]

Here, numIterations is given by the expression for forEach in Section 3.4.1.

Invoke

By setting appropriate parameters one may force the BPEL engine to implement timeouts in synchronous web service calls made through invoke activities. The fraction of timeouts occurring in invocations of a particular web service may be obtained from logs. Such an input enables us to write the expression for determining the expected end time of a web service call.

\[ ET'_{\text{invoke}} = ST_{\text{invoke}} + (1 - \text{fractionTimeout}) \times T_{ws} + \text{fractionTimeout} \times \text{TimeOut} \tag{3.28} \]

where,

T_ws = Total time taken by the web service call to return

TimeOut = Wait time before an invocation is timed out
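Equation 3.28 is an expectation over two outcomes of the call, which the following Python sketch makes explicit for the mean values only. The function name, parameter names and the sample numbers are hypothetical; standard deviations would be carried along using the operations of Section 3.5.2.

```python
def invoke_expected_end_time(st_invoke, t_ws, timeout, fraction_timeout):
    """Mean of ET'_invoke per Equation 3.28 (all times in the same unit)."""
    if not 0.0 <= fraction_timeout <= 1.0:
        raise ValueError("fraction_timeout must lie in [0, 1]")
    # With probability (1 - fraction_timeout) the call returns after t_ws,
    # otherwise the engine gives up after the configured timeout.
    return st_invoke + (1.0 - fraction_timeout) * t_ws + fraction_timeout * timeout


# Illustrative values: the invoke starts at t = 2.0 s, the service takes 1.5 s,
# the timeout is 5.0 s and 10% of the calls time out.
et_invoke = invoke_expected_end_time(2.0, 1.5, 5.0, 0.1)
```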

Scope

Ignoring the attached handlers, the end time of a scope may be taken to be that of

its child activity.

\[ ET'_{\text{scope}} = ET_{\text{childActivity}} \tag{3.29} \]

Fault Handler

The fraction of faults captured by a catch block multiplied by the fault rate of the

enclosed scope/invoke gives the probability that the catch block will be executed.

The end time of a scope or an invoke can thus be incremented by the time taken for

execution of all catch blocks and may be estimated by the following expression:

\[ ET''_X = ET'_X + \max_{\forall i} \big( \text{fractionCapture}_i \times \text{FaultRate} \times (ET_{\text{catch}_i} - ST_{\text{catch}_i}) \big) \tag{3.30} \]

where X may be a scope or an invoke.

Compensation Handler

Compensation handlers may run concurrently with other activities of a scope and

other handlers. The start time of a compensation handler would be given by the

start times of compensate or compensateScope activities that are responsible for

invoking it. Again, the end time of the child activity inside a compensation handler

may be computed independently (taking its start time as zero) due to absence of

dependencies. The expected end time may be obtained as:

\[ ET_{\text{compensationHandler}} = P(\text{start}_{CH}) \times (ST_{\text{compensationHandler}} + ET_{\text{child}}) \tag{3.31} \]


Event Handler

Event handlers also run concurrently along with other handlers and the primary

activity of the scope to which it is attached. All events are enabled when the parent

scope of the event handler starts. The event handler is disabled when the primary

activity of the scope ends but already running instances of events are allowed to

complete. The start time of the associated child scope of an event as computed by

Algorithm 3 refers to the time when the first instance of that event may be ready to

be fired. The time taken by each instance of an event to complete may be taken as:

\[ T_{\text{event}} = \text{WaitingTime} + (ET_{\text{childScope}} - ST_{\text{childScope}}) \]

\[ ET_{\text{event}} = ST_{\text{event}} + \text{numOccurs} \times T_{\text{event}} \]

In case of message based events, WaitingT ime is taken to be average waiting time

of the event obtained from execution logs and for timer based alarms it refers to the

value of the timer specified in the alarm. Now, end time for an event handler may be

computed as follows:

\[ ET_{\text{eventHandler}} = \max_{\forall i} ET_{\text{event}_i} \tag{3.32} \]

Finally, we can write the equations to estimate the end times of invoke activities

and scopes.

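\[ \begin{aligned} ET_{\text{invoke}} &= \max(ET''_{\text{invoke}}, ET_{\text{compensationHandler}}) \\ ET_{\text{scope}} &= \max(ET''_{\text{scope}}, ET_{\text{compensationHandler}}, ET_{\text{eventHandler}}) \end{aligned} \tag{3.33} \]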

3.5.2 Operations on Random Variables

In our model for response time, all times (ST and ET) are represented as random

variables characterized by their means and standard deviations. Here, we look at how

mean and standard deviation are computed for a resultant random variable that is

obtained as a function of other random variables. We list some of the operations that

are commonly used in the equations laid down in Section 3.5.1.


(a) Addition of two random variables: If two random variables A and B are added to get a third random variable R, the mean and standard deviation for R are obtained as follows:

\[ \mu_R = \mu_A + \mu_B, \qquad \sigma_R = \sqrt{\sigma_A^2 + \sigma_B^2} \]

(b) Addition of a random variable and a constant: If a constant t is added to a random variable A, the mean and standard deviation for the resultant random variable R are computed as:

\[ \mu_R = \mu_A + t, \qquad \sigma_R = \sigma_A \]

(c) Multiplication of a random variable by a constant: If a random variable A is multiplied by a constant t, the mean and standard deviation for the resultant random variable R are computed as:

\[ \mu_R = t \times \mu_A, \qquad \sigma_R = |t| \times \sigma_A \]

(d) Weighted Sum of random variables: In case a weighted sum is performed on a set of random variables (A_1, A_2, · · · , A_n) with the help of constant weights (p_1, p_2, · · · , p_n) that sum up to 1, we obtain the mean and standard deviation for the resultant random variable R as below:

\[ \mu_R = \sum_{i=1}^{n} p_i \times \mu_{A_i}, \qquad \sigma_R = \sqrt{\sum_{i=1}^{n} p_i^2 \times \sigma_{A_i}^2} \]

(e) Maximum of random variables: Unfortunately, there is no fixed expression for obtaining the mean and standard deviation of the maximum of a set of random variables that may have any distribution. Even if we assume all the variables to be normally distributed (a good approximation for characterizing the behavior of response times, which have inherent randomness), we cannot arrive at a generalized expression that would work for any number of random variables. Thus, we apply simulation over a large number of data points to estimate the mean and standard deviation of a random variable that represents the maximum of a set of normally distributed random variables with (µ, σ) being specified for each. Although the resultant random variable does not follow an exact normal distribution, it has been observed that its probability distribution deviates very little from a standard bell shaped curve.
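The five operations can be sketched in Python as below. This is an illustrative sketch, not the code used in this work: the `RV` class, the function names, the sample size and the fixed seed are all assumptions made here. Operations (a) and (d) rely on the random variables being independent, and the maximum is estimated by Monte Carlo sampling from normal distributions, as described in item (e).

```python
import random
from dataclasses import dataclass
from math import sqrt
from statistics import mean, pstdev


@dataclass(frozen=True)
class RV:
    mu: float     # mean
    sigma: float  # standard deviation


def add(a, b):
    """(a) Sum of two independent random variables."""
    return RV(a.mu + b.mu, sqrt(a.sigma ** 2 + b.sigma ** 2))


def add_const(a, t):
    """(b) Shift by a constant."""
    return RV(a.mu + t, a.sigma)


def scale(a, t):
    """(c) Multiplication by a constant."""
    return RV(t * a.mu, abs(t) * a.sigma)


def weighted_sum(rvs, weights):
    """(d) Weighted sum with constant weights."""
    mu = sum(p * a.mu for a, p in zip(rvs, weights))
    sigma = sqrt(sum((p * a.sigma) ** 2 for a, p in zip(rvs, weights)))
    return RV(mu, sigma)


def maximum(rvs, samples=20000, seed=0):
    """(e) Maximum of normally distributed variables, estimated by simulation."""
    rng = random.Random(seed)
    draws = [max(rng.gauss(a.mu, a.sigma) for a in rvs) for _ in range(samples)]
    return RV(mean(draws), pstdev(draws))


total = add(RV(1.0, 3.0), RV(2.0, 4.0))
latest = maximum([RV(0.0, 1.0), RV(0.0, 1.0)])
```

For two independent standard normal variables the exact mean of the maximum is 1/√π, which gives a simple sanity check for the simulated estimate.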

3.6 Cost Modeling

The cost model presented here gives the expected cost for each activity in the WS-BPEL process, if it may be executed in a run of the business process. Cost modeling differs from the models for response time or reliability in the sense that, whilst calculating the cost of an activity, one is concerned only with the costs of activities that are nested under it. However, the model computes for each activity the probability that it starts given that its parent has already started. This probability, referred to as PC, is used extensively in our model for aggregation of the costs of child activities in order to estimate the expected cost of an activity.


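\[ \begin{aligned} PC_X &= P(\text{start}_X \mid \text{start}_{\text{parent}(X)}) \\ &= \frac{P(\text{start}_X \cap \text{start}_{\text{parent}(X)})}{P(\text{start}_{\text{parent}(X)})} \\ &= \frac{P(\text{start}_{\text{parent}(X)} \mid \text{start}_X) \cdot P(\text{start}_X)}{P(\text{start}_{\text{parent}(X)})} \\ &= \frac{P(\text{start}_X)}{P(\text{start}_{\text{parent}(X)})} \qquad \text{since } P(\text{start}_{\text{parent}(X)} \mid \text{start}_X) = 1 \\ &= P(\text{depB}_X) \times P(\text{depC}_X) \qquad \text{(see Equation 3.2)} \end{aligned} \tag{3.34} \]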

The basic methodology followed in obtaining the expected cost of an activity is

calculating a weighted sum of costs of all child activities where the weights are given

by their PCs. Thus, for each activity we determine its expected cost if it starts

execution. The precondition for such a computation to happen is that the expected

costs of all the child activities are known.

3.6.1 Activity-Wise Cost Computation

Zero costs are associated with all basic activities except invoke. Thus, we detail out

the steps for cost determination only for structured activities, invoke, scopes and

handlers.

Sequence and Flow

For a sequence or a flow, the expected cost is simply a weighted sum of the expected

costs of all its activities with their PCs being the weights.

\[ \text{Cost}_X = \sum_{i=1}^{|\text{children}(X)|} PC_{\text{child}_i} \times \text{Cost}_{\text{child}_i} \tag{3.35} \]

where X may be a sequence or a flow.


If and Pick

To determine the cost of if / pick activities, we take a weighted sum of costs of all

branches / events and the weights are given by the probabilities of selection of those

branches / events.

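\[ \text{Cost}_{\text{if}} = \sum_{i=1}^{\text{numBranches}} P(\text{branchTaken}_i) \times \text{Cost}_{\text{branchActivity}_i} \tag{3.36} \]

\[ \text{Cost}_{\text{pick}} = \sum_{i=1}^{\text{numEvents}} P(\text{eventOccurs}_i) \times \text{Cost}_{\text{eventActivity}_i} \tag{3.37} \]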

Loops

The PC of the child activity of a loop will always evaluate to 1 because it cannot have any dependencies. Thus, there is no scope for relaxation of the cost of the child of a loop. Cost estimation in loops is handled as below:

\[ \text{Cost}_{\text{loop}} = \text{numIterations} \times \text{Cost}_{\text{loopChild}} \tag{3.38} \]

Invoke

The cost of an invoke activity is computed as a sum of the cost of invocation of the web

service and the costs associated with the various catch blocks and the compensation

handler if they are present.

\[ \text{Cost}_{\text{invoke}} = \text{Cost}_{ws} + \text{Cost}_{\text{catches}} + \text{Cost}_{\text{compensationHandler}} \tag{3.39} \]

Scope

The cost of a scope is given as follows:

\[ \text{Cost}_{\text{scope}} = \text{Cost}_{\text{child}} + \text{Cost}_{\text{catches}} + \text{Cost}_{\text{compensationHandler}} + \text{Cost}_{\text{eventHandler}} \tag{3.40} \]


Handlers

The costs of all catch blocks are summed up after taking into account the probability with which each will execute.

$$Cost_{catches} = FaultRate \times \sum_{\forall i} fractionCapture_i \times Cost_{catchActivity_i} \qquad (3.41)$$

The cost of a compensation handler is taken as the cost of its enclosed activity after relaxation by its PC.

$$PC_{CH} = P(start_{CH}) / P(start_{invoke/scope})$$

$$Cost_{compensationHandler} = PC_{CH} \times Cost_{childActivity} \qquad (3.42)$$

The cost of an event handler is determined by the sum of the costs of all its events, each multiplied by its number of occurrences.

$$Cost_{eventHandler} = \sum_{i=1}^{numEvents} numOccur_i \times Cost_{event_i} \qquad (3.43)$$
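The handler, invoke and scope formulas (Equations 3.39 to 3.43) compose naturally, as the following sketch suggests. All function names and the list-of-pairs inputs are hypothetical conveniences, and the numbers in the example are made up for illustration.

```python
def catches_cost(fault_rate, catches):
    """Equation 3.41. catches: list of (fraction_captured, cost) pairs."""
    return fault_rate * sum(frac * cost for frac, cost in catches)


def compensation_handler_cost(p_start_ch, p_start_parent, child_cost):
    """Equation 3.42: relax the enclosed activity's cost by PC_CH."""
    pc_ch = p_start_ch / p_start_parent
    return pc_ch * child_cost


def event_handler_cost(events):
    """Equation 3.43. events: list of (num_occurrences, cost) pairs."""
    return sum(n * cost for n, cost in events)


def invoke_cost(ws, catches=0.0, compensation=0.0):
    """Equation 3.39."""
    return ws + catches + compensation


def scope_cost(child, catches=0.0, compensation=0.0, event_handler=0.0):
    """Equation 3.40."""
    return child + catches + compensation + event_handler


# Hypothetical invoke: web service cost 3.0, 10% fault rate, two catch blocks,
# and a compensation handler that starts in 5% of the runs of the invoke.
c_catch = catches_cost(0.1, [(0.6, 5.0), (0.4, 2.0)])
c_comp = compensation_handler_cost(0.05, 1.0, 8.0)
print(invoke_cost(3.0, c_catch, c_comp))
```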

Chapter 4

QoS Improvement with Fault Tolerance

During orchestration of a web service composition, the designer may be unable to find a web service for some task in the workflow that meets the reliability requirements. If the task is central to the workflow, the impact on the reliability of the overall web service composition will be severe. It is also possible that all web services available to perform a certain task on the critical path of a workflow show large variations in their response times (indicated by high standard deviations in our QoS model). In such cases, fault tolerant constructs may be used to create dependable web services out of undependable ones and attain the desired reliability and performance levels. Of course, keeping redundant implementations means incurring higher costs.

In this chapter, we present four conventional fault tolerance (FT) techniques, namely, N-version programming, Recovery Blocks, Return Fastest Response and Deadline Mechanisms; show how they may be implemented with the help of standard WS-BPEL 2.0; and derive expressions that help in QoS determination of these FT-constructs. The first two constructs focus on reliability improvement and the latter two seek to enhance performance. The rest of the chapter is organized as follows. For each of the fault tolerance approaches, we describe the scheme used in its WS-BPEL implementation and its QoS estimation formulae, followed by a demonstration of the utility of the approach through the running example of the Passport Application Service.


4.1 N-Version Programming (NVP)

Avizienis and Chen [6] define N-version programming as the independent generation of N ≥ 2 functionally equivalent programs, called “versions”, from the same initial specification. In an N-version program, multiple implementations of a program having diverse designs are invoked in parallel. After all of them complete execution, a voting function decides the output of the N-version program. Traditionally, N-version programs use majority voting on some attribute of the output of the program. However, voting mechanisms often turn out to be complex (especially in inexact cases), require considerable user intervention and are seldom automatically generated. Our N-version framework assumes as input multiple web services having the same functionality and a voter web service that acts upon the outputs of the redundant services to supply the result.

N-version programming may be used to improve the reliability of a task without compromising much on time. Thus, it may be applied to tasks on the critical path of a workflow, where an increase in response time is an issue. The N-version program has to wait for all the implementations to complete and then for the additional voting step, but it performs better with respect to response time than recovery blocks when the response times of the services are close to each other. However, it may prove to be more costly than other FT approaches.

4.1.1 WS-BPEL implementation

Given a set of web services (A1, A2, . . . , An) offering the same functionality and a voting web service, our N-version program in BPEL can be laid out as follows. A1, A2, . . . , An are all invoked from inside a 〈flow〉. The voter web service is called after the completion of the flow to obtain the result. Thus, the 〈flow〉 and the 〈invoke〉 for the voter web service are in a 〈sequence〉. The voter web service must expect as input an array of the output type of the other web services and produce an output of the same type as that of the redundant web services. All the 〈invoke〉 activities have attached catch blocks to ensure that only outputs of successful web services reach the voting web service.

4.1.2 QoS formulation

We denote the reliability, time and cost of a redundant web service $A_i$ by $(R_{A_i}, T_{A_i}, C_{A_i})$. We consider the voting web service to perform majority voting; thus it can work only if k ≥ 2 out of n services have completed successfully. We compute the probability that at least two implementations succeed as below (k denotes the number of successful services, $P(X)$ gives the probability that X is successful, and $P(\bar{X})$ gives the probability that X fails). The events k = 0 and k = 1, as well as the n ways of obtaining exactly one success, are mutually exclusive, so their probabilities add.

$$
\begin{aligned}
P(k = 0) &= \prod_{i=1}^{n} (1 - R_{A_i}) \\
P(k = 1) &= \sum_{i=1}^{n} P(\bar{A}_1 \bar{A}_2 \ldots A_i \ldots \bar{A}_n) \\
&= \sum_{i=1}^{n} \frac{R_{A_i} \times P(k = 0)}{1 - R_{A_i}} \\
P(k \geq 2) &= 1 - (P(k = 0) + P(k = 1)) \qquad (4.1)
\end{aligned}
$$

Now, $R_{voter}$ denotes the conditional probability of success of the voting web service provided at least two redundant services have been successful. The voter program requires at least two values to perform matching or majority voting.

$$
\begin{aligned}
P(voter) &= P(voter \mid k \geq 2) \times P(k \geq 2) \\
&= R_{voter} \times P(k \geq 2) \qquad (4.2)
\end{aligned}
$$

Finally, the reliability of the N-version program is given by the probability of success of the voter service.

$$R_{NVP} = P(voter) \qquad (4.3)$$
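The reliability derivation of Equations 4.1 to 4.3 can be sketched as follows. The function name is hypothetical, and the sketch assumes, as the product form of P(k = 0) implies, that the redundant services fail independently. The probability of exactly one success is computed as a direct product rather than by dividing P(k = 0) by (1 − R), which avoids a division by zero when some reliability equals 1.

```python
from math import prod


def nvp_reliability(reliabilities, r_voter):
    """Reliability of an N-version program (Equations 4.1 to 4.3).

    reliabilities: reliabilities of the redundant services A_1..A_n.
    r_voter: conditional reliability of the voter given k >= 2 successes.
    Assumption: the services succeed or fail independently.
    """
    # P(k = 0): every service fails.
    p_k0 = prod(1.0 - r for r in reliabilities)
    # P(k = 1): exactly one service succeeds and all the others fail.
    p_k1 = sum(
        r * prod(1.0 - q for j, q in enumerate(reliabilities) if j != i)
        for i, r in enumerate(reliabilities)
    )
    p_k_ge_2 = 1.0 - (p_k0 + p_k1)
    return r_voter * p_k_ge_2


# Hypothetical example: three services of reliability 0.9, voter at 0.99.
print(nvp_reliability([0.9, 0.9, 0.9], 0.99))
```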

Time taken by an N-version program may be estimated as below.

$$T_{NVP} = T_{voter} + \max_{\forall i} T_{A_i} \qquad (4.4)$$


Cost is simply the sum of the costs of all the redundant web services and the voter web service.

$$C_{NVP} = C_{voter} + \sum_{i=1}^{n} C_{A_i} \qquad (4.5)$$

4.2 Recovery Blocks

In a recovery block, the redundant components are executed sequentially. A computing element is run and an acceptance test or assertion check is applied to the result. An alternative implementation is invoked only if the output fails the acceptance test. There can be many redundant services lined up, each of which may be executed only if all the services prior to it have failed the assertion test.

A recovery block is used when the user has an order of preference amongst the various services. The primary service is the choicest of all the redundant implementations and thus is invoked first. A recovery block always costs less than an equivalent NVP implementation with the same services, assuming the costs of the assertion checker and the voter to be comparable.

In our case, the redundant programs (A1, A2, . . . , An) in a recovery block as well as the assertion checker (AC) are implemented as web services. Again, the framework assumes as input the values of (R, T, C) for each of these web services.

4.2.1 WS-BPEL Implementation

All invocations of redundant web services, along with the various calls made to the assertion checker (after execution of each Ai), are in a 〈sequence〉. The alternative implementations are conditionally invoked inside 〈if〉 activities if the implementations prior in sequence have either failed to respond (error status set by an attached catch) or failed the assertion test.


4.2.2 QoS Formulation

In a recovery block, a service $A_i$ is said to be successful when it returns successfully and its output is passed by the assertion checker. Thus, the conditional probability that a service succeeds given that it starts execution may be written as:

$$P(A_i \mid Start_{A_i}) = R_{A_i} \times R_{AC}$$

A service in a recovery block may start only if the preceding service failed after being invoked. The probability of start of a service $A_i$ is therefore given by the following recurrence:

$$
\begin{aligned}
P(Start_{A_1}) &= 1 \\
P(Start_{A_i}) &= P(Start_{A_{i-1}}) \times P(\bar{A}_{i-1} \mid Start_{A_{i-1}}) \quad \text{if } i > 1 \qquad (4.6)
\end{aligned}
$$

where $P(\bar{A}_{i-1} \mid Start_{A_{i-1}}) = 1 - P(A_{i-1} \mid Start_{A_{i-1}})$.

Now, the probability of success of a service may be given as:

$$P(A_i) = P(A_i \mid Start_{A_i}) \times P(Start_{A_i})$$

Finally, since the events of success of the redundant services are mutually exclusive, we can write:

$$
\begin{aligned}
R_{RB} &= P(A_1 \cup A_2 \cup \ldots \cup A_n) \\
&= \sum_{i=1}^{n} P(A_i) \qquad (4.7)
\end{aligned}
$$

Both the response time and the cost of a recovery block construct are estimated as a weighted sum of the response times/costs of the several units of execution (including the time taken by the assertion checker in each case), the weights being the probabilities of execution of the units, P(Start) (given by Equation 4.6). However, the time/cost of the assertion checker is incorporated only in case the service returns.

$$T_{RB} = \sum_{i=1}^{n} P(Start_{A_i}) \times (T_{A_i} + R_{A_i} \times T_{AC}) \qquad (4.8)$$

$$C_{RB} = \sum_{i=1}^{n} P(Start_{A_i}) \times (C_{A_i} + R_{A_i} \times C_{AC}) \qquad (4.9)$$
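The recurrence of Equation 4.6 and the sums of Equations 4.7 to 4.9 can be evaluated in a single pass over the services in their order of preference. The sketch below is illustrative: the function name and tuple layout are hypothetical, and times are treated as plain expected values rather than as the random variables used in the full model.

```python
def recovery_block_qos(services, r_ac, t_ac, c_ac):
    """Reliability, expected time and expected cost of a recovery block.

    services: list of (R, T, C) tuples in order of preference.
    r_ac, t_ac, c_ac: reliability, time and cost of the assertion checker.
    Simplification: T values are expected times, not distributions.
    """
    p_start = 1.0  # P(Start_A1) = 1
    reliability = time = cost = 0.0
    for r, t, c in services:
        p_success_given_start = r * r_ac
        reliability += p_start * p_success_given_start  # Equation 4.7
        time += p_start * (t + r * t_ac)                # Equation 4.8
        cost += p_start * (c + r * c_ac)                # Equation 4.9
        p_start *= 1.0 - p_success_given_start          # Equation 4.6
    return reliability, time, cost


# Hypothetical example: a primary and one alternate, with a perfect checker.
print(recovery_block_qos([(0.9, 2.0, 1.0), (0.8, 3.0, 2.0)], 1.0, 0.5, 0.1))
```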

4.3 Return Fastest Response

The Return Fastest Response construct primarily aims to improve the performance levels of a web service. All the redundant services are executed in parallel and the first response obtained is chosen as the result of the FT-construct. Such a construct is especially helpful if the redundant programs have comparable reliabilities and performance is crucial.

4.3.1 WS-BPEL Implementation

The WS-BPEL implementation of this construct is the same as that proposed in [17]. The set of redundant web services is invoked in parallel in a 〈flow〉 activity. However, a 〈flow〉 activity completes only on the completion of all the activities nested inside it, which does not serve our purpose. Thus, on completion of execution of one activity, we copy its result as output and forcibly terminate the 〈flow〉 activity, stopping all other executions of redundant services, by throwing a fault that is caught by a fault handler of the enclosing scope. These faults are simply ignored within the fault handlers by placing an empty activity in the catch block.


4.3.2 QoS Formulation

In any invocation of a Return Fastest Response (RFR) block, the service that returns within the minimum time gets counted. Thus, in order to compute the reliability of such a block, we require the probability that a service takes the minimum time and happens to be the first to return. The probability $P(First_{A_i})$ that a web service $A_i$ returns the first response in an execution of the RFR block is written as:

$$P(First_{A_i}) = P(T_{A_i} = \min\{T_{A_1}, T_{A_2}, \ldots, T_{A_n}\}) \qquad (4.10)$$

The reliability of the RFR block may then be computed as a weighted sum of the reliabilities, $R_{A_i}$, of all the constituent services.

$$R_{RFR} = \sum_{i=1}^{n} P(First_{A_i}) \times R_{A_i} \qquad (4.11)$$

Time taken by the RFR block is simply the minimum of the response times of all services.

$$T_{RFR} = \min_{\forall i}\{T_{A_i}\} \qquad (4.12)$$

Cost of the RFR block is the sum of the costs of all services in it.

$$C_{RFR} = \sum_{i=1}^{n} C_{A_i} \qquad (4.13)$$

Note that time in our QoS model is represented by random variables, and the minimum of a set of random variables is obtained through simulation. In Equation 4.10 and Equation 4.12, $P(First_{A_i})$ and $\min\{T_{A_i}\}$ are estimated by simulating the time quantities as normally distributed random variables.
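A Monte Carlo estimate of Equations 4.10 to 4.13 might look like the sketch below. It is an illustration under stated assumptions rather than the tool's actual simulator: response times are drawn independently from normal distributions, only the mean of the minimum is reported, and the function name, tuple layout, sample count and seed are all hypothetical.

```python
import random


def simulate_rfr(services, samples=20000, seed=0):
    """Estimate P(First), reliability, mean time and cost of an RFR block.

    services: list of (R, mean_T, std_T, C) tuples.
    Assumption: response times are independent normal random variables.
    """
    rng = random.Random(seed)
    first_counts = [0] * len(services)
    total_min = 0.0
    for _ in range(samples):
        times = [rng.gauss(mean, std) for _, mean, std, _ in services]
        fastest = min(range(len(times)), key=times.__getitem__)
        first_counts[fastest] += 1
        total_min += times[fastest]
    p_first = [n / samples for n in first_counts]                    # Eq. 4.10
    reliability = sum(p * s[0] for p, s in zip(p_first, services))   # Eq. 4.11
    mean_time = total_min / samples                                  # mean of Eq. 4.12
    cost = sum(s[3] for s in services)                               # Eq. 4.13
    return p_first, reliability, mean_time, cost


# Hypothetical example: a faster but less reliable service next to a slower one.
print(simulate_rfr([(0.85, 8.0, 2.0, 1.0), (0.95, 10.0, 1.0, 2.0)]))
```

Because a normal distribution has unbounded support, a sample can occasionally be negative when the standard deviation is large relative to the mean; the sketch does not guard against that.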

4.4 Deadline Mechanism

In case of time-critical applications, a designer may want to impose hard deadlines for completion of certain activities within the workflow. Deadline mechanisms support setting deadlines for completion of tasks and provide for forking off redundant services for a task if a primary service does not complete within some specified length of time.

In our model, the user sets a hard deadline for completion of a task. If an output is to be returned from the deadline mechanism block, then one of the constituent services must return within the specified hard deadline. Moreover, the designer is allowed to specify the time instants at which the alternate implementations may be invoked if no response has been received from the services that are already running.

4.4.1 WS-BPEL implementation

Deadline mechanisms can be incorporated with the help of the event handler construct available in WS-BPEL. The primary web service is run inside a scope and the alternate services are placed inside the event handler of the same scope, with a timer-based alarm event, 〈onAlarm〉, associated with each alternate implementation. We ensure that the response is returned as soon as it is received from a constituent service invocation, in the same manner as in the Return Fastest Response construct. If a service returns successfully, we copy its response to the output and throw a fault that is caught by an empty fault handler of the enclosing scope. In case a service fails, we make the scope wait till the deadline is reached with the help of a wait activity inside the catchAll block attached to the invoke activity for the service.


4.4.2 QoS formulation

The model requires the hard deadline HD for the process and, for each service $A_i$ in a deadline mechanism (DM) block, its firing time $TF_{A_i}$, apart from the regular inputs of (R, T, C). We denote the sum of the firing time and the response time of $A_i$ by $T'_{A_i}$. In an analysis similar to that done for Return Fastest Response, for each service in a DM block, we find the probability that it returns the first response.

$$P(First_{A_i}) = P(T'_{A_i} = \min\{T'_{A_1}, T'_{A_2}, \ldots, T'_{A_n}\} \text{ and } T'_{A_i} < HD) \qquad (4.14)$$

The reliability of the DM block is then computed as a weighted sum of the reliabilities, $R_{A_i}$, of all the constituent services, in exactly the same way as in RFR.

$$R_{DM} = \sum_{i=1}^{n} P(First_{A_i}) \times R_{A_i} \qquad (4.15)$$

Time taken by the DM block is the minimum of the sums of response times and firing times of all services.

$$T_{DM} = \min_{\forall i}\{T'_{A_i}\} \qquad (4.16)$$

For cost estimation in the deadline mechanism, we find for each service the probability that it is invoked inside the DM block. A service inside a DM block may start only if no successful response has been generated by the other services, and the hard deadline has not passed, by the point of its firing.

$$P(Start_{A_i}) = P(TF_{A_i} \leq \min\{T'_{A_1}, T'_{A_2}, \ldots, T'_{A_n}, HD\}) \qquad (4.17)$$

Cost of the DM block is a weighted sum of the costs of all services in it, the weights being the probabilities of start of the services.

$$C_{DM} = \sum_{i=1}^{n} P(Start_{A_i}) \times C_{A_i} \qquad (4.18)$$
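The deadline mechanism quantities can be estimated by the same kind of simulation as for Return Fastest Response. The sketch below is an interpretation, not the tool's code: it follows the textual condition that a service starts only if its firing time precedes both the hard deadline and every other service's completion time, it draws response times from independent normal distributions, and all names, the tuple layout, sample count and seed are hypothetical.

```python
import random


def simulate_dm(services, hard_deadline, samples=20000, seed=0):
    """Estimate P(First), P(Start), reliability, mean time and cost of a DM block.

    services: list of (R, mean_T, std_T, C, firing_time) tuples.
    Assumption: response times are independent normal random variables.
    """
    rng = random.Random(seed)
    n = len(services)
    first = [0] * n
    started = [0] * n
    total_min = 0.0
    for _ in range(samples):
        # T' = firing time + sampled response time.
        t_prime = [tf + rng.gauss(mean, std) for _, mean, std, _, tf in services]
        fastest = min(range(n), key=t_prime.__getitem__)
        total_min += t_prime[fastest]
        if t_prime[fastest] < hard_deadline:
            first[fastest] += 1
        for i in range(n):
            others = [t_prime[j] for j in range(n) if j != i]
            # The service fires only if nothing has finished and the deadline
            # has not passed by its firing time.
            if services[i][4] <= min(others + [hard_deadline]):
                started[i] += 1
    p_first = [c / samples for c in first]
    p_start = [c / samples for c in started]
    reliability = sum(p * s[0] for p, s in zip(p_first, services))
    cost = sum(p * s[3] for p, s in zip(p_start, services))
    return p_first, p_start, reliability, total_min / samples, cost


# Hypothetical example: the alternate fires 6 time units after the primary.
print(simulate_dm([(0.9, 5.0, 2.0, 2.0, 0.0), (0.8, 4.0, 1.0, 3.0, 6.0)], 12.0))
```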

Chapter 5

Implementation Details

We have implemented the QoS model for WS-BPEL 2.0 processes presented in the last two chapters as stand-alone software using Java 1.5. The QoS Calculator takes as input the values of reliability, time and cost for the constituent web services and the various essential control flow parameters of the business process outlined in Section 3.2.2. Also, for improving QoS through fault tolerant constructs, it takes the WSDLs of several redundant web services and of the web services for voting and assertion checking. In this chapter, we look at the utility of some of the important modules in our implementation and describe the interactions that take place between them.

Figure 5.1 shows that the implementation may be organized under two major heads,

namely, BPEL parser and QoS Calculator. We discuss these two modules, explore the

components within them and explain the flow of data in and out of these modules.

5.1 BPEL Parser

The BPEL parsing unit is responsible for parsing the BPEL file and converting the business process into an internal graph-like representation consisting of nodes, which may be activities, scopes or handlers. Such a node is the central data structure in the implementation, as QoS computation happens at this level. We track all three types of dependencies (we can treat them as different classes of directed edges) for each node in the graph. Thus, the module takes the input BPEL file and produces a graph structure out of it that clearly shows the dependencies between the activities, scopes and handlers present in the BPEL process. The invoke nodes in the graph are specially stored as service nodes and annotated with the values of the QoS parameters for the web services being called. Redundant web services, when introduced, are also represented as service nodes. Further, several service nodes may be suitably composed to form a fault tolerant structure, which is a composite service node. Whilst adding fault tolerant constructs for various constituent web services, the user is shown a listing of the service nodes already introduced and, for each original external web service, chooses one service node, which may be a composite one. The QoS values are also listed against every service node. Waiting times of incoming messages are set against the nodes receiving them. Various other inputs (listed in Section 3.2.2) are also appropriately stored in the data structures maintained for nodes and links.

[Figure 5.1: Block Diagram of Implementation. Blocks labelled in the diagram: User Interface, XML Parser, WSDL Parser, BPEL Parser, Activity Graph Generation, Dependency Manager, Boolean Condition Parser, Create BPEL Project Files, Service Node Generation (NVP, RB, RFR, DM), QoS Calculator, Reliability Modeler, Response Time Modeler, Cost Modeler, Random Variable Simulator. Data labelled in the diagram: Source BPEL Project Files, Control Flow Parameters, QoS Values for Component WS, Activity Graph annotated with dependencies and control flow parameters, Service Nodes tagged with QoS parameters, Fault Tolerant BPEL Project Files, Estimated QoS for Overall Process + Activity Wise QoS.]

Figure 5.1 shows the different sub-modules within the system. Here, we discuss

each of them briefly.


(a) XML Parser: We make use of a Xerces implementation (Xerces2 Java Parser 2.9.1) of an XML-DOM parser for all our purposes. The DOM style of parsing suits us better than SAX parsing because we need to keep the parsed DOM tree in memory for a long time and frequently access widely separated parts of the document at the same time.

(b) Workflow Graph Generation: Transforming the BPEL process, along with other inputs, into a workflow data structure that may be easily tapped by the QoS calculator is the key functionality rendered by the BPEL parsing unit. The XML-DOM tree produced by the XML parser is traversed in order to build the workflow graph structure. All activities / links / handlers for which additional inputs are required from the user are extracted and fed into the User Interface module so that the user may be informed of the inputs to be supplied. For example, the User Interface is provided with operations, portTypes and partnerLinks (required for uniquely identifying an invoke) for all invoke activities so that it can seek QoS inputs from the user. The workflow is generated after all forms of dependencies between nodes have been incorporated into it.

(c) User Interface: The BPEL file is first loaded and passed on to the XML parser. The workflow generation module provides a list of fields whose values are necessary for QoS computation, and the user is required to enter these inputs through a graphical interface. The inputs are passed back into the workflow builder so that they may be properly maintained for future use.

(d) Dependency Manager: All dependencies in the BPEL process are properly

classified and maintained against the dependent nodes. Each link gets annotated

by the probability that its transition condition is evaluated to true (this is

obtained as an input).

(e) Boolean Condition Parser: A join condition defined for an activity/scope is a boolean expression (containing boolean operators such as AND and OR) on incoming links. The expression is parsed from the BPEL file and converted into a tree where each non-leaf vertex is an operator and the leaves contain identifiers for links. This expression tree is then transformed such that it represents a canonical boolean expression (ORs of ANDs, analogous to the sum of products form). The probability of success of such an expression may be easily evaluated given the probabilities of success of the links involved in it (see Evaluation of Join Conditions in Section 3.3).

(f) Service Node Generation: This block handles the generation of BPEL implementations for each of the four fault tolerant constructs. It also records how the various redundant web services are arranged and which of them will feature in the final BPEL file to be created.

(g) WSDL Parser: WSDLs of the various redundant web services are taken as

input. They are parsed to show the user the various elements present in them

and allow him to select the operations, port types, bindings and addresses to

be used in the fault tolerant constructs.

(h) Creation of BPEL project files: The final service nodes are considered in

the production of the fault tolerant BPEL file. Also all WSDLs and deployment

descriptors are suitably enhanced to generate a BPEL project that is ready to

be deployed. We follow the directory structure laid out by the ActiveBPEL

engine and produce files that may be readily deployed using it.
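To illustrate the idea behind the Boolean Condition Parser in item (e), the sketch below evaluates the probability that an OR-of-ANDs join condition is true. It is not the procedure of Section 3.3: it assumes the links are statistically independent and enumerates all truth assignments, which is exact even when a link appears in several AND terms but is exponential in the number of links. The function name and data layout are hypothetical.

```python
from itertools import product


def dnf_probability(terms, link_prob):
    """Probability that an OR-of-ANDs expression over links is true.

    terms: list of sets of link names; each set is one AND term.
    link_prob: dict mapping a link name to the probability that it is true.
    Assumption: links are statistically independent.
    """
    links = sorted(set().union(*terms))
    total = 0.0
    for values in product([False, True], repeat=len(links)):
        assignment = dict(zip(links, values))
        # The expression holds if any AND term has all of its links true.
        if any(all(assignment[name] for name in term) for term in terms):
            weight = 1.0
            for name in links:
                p = link_prob[name]
                weight *= p if assignment[name] else 1.0 - p
            total += weight
    return total


# Hypothetical join condition: (a AND b) OR c.
print(dnf_probability([{"a", "b"}, {"c"}], {"a": 0.5, "b": 0.5, "c": 0.2}))
```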

5.2 QoS Calculator

The QoS calculator implements the various mathematical formulations presented in the previous two chapters. It may be noted that reliability modeling must be carried out before response times and costs can be estimated, because the latter computations make use of P(start) and P(success). All values that help in QoS computation, viz. P(start), P(success), ST, ET, PC and Cost, are evaluated and stored at each node in the workflow graph. All QoS computations for the fault tolerant constructs are performed at the service node level. Figure 5.1 also shows a module for performing simulations to compute the maximum, minimum, etc. of a set of random variables used to represent time values. A random variable is simulated by a set of 100000 data points having the specified mean and standard deviation.
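A simulator of this kind can be sketched in a few lines. The version below is an assumption-laden illustration, not the Java module itself: it draws independent normal samples, applies `max` or `min` pointwise, and summarizes the result by its mean and standard deviation; the function name, parameters and seed are hypothetical.

```python
import random
import statistics


def simulate_extreme(variables, pick=max, samples=100000, seed=0):
    """Mean and standard deviation of the max (or min) of random variables.

    variables: list of (mean, std) pairs, each simulated as a normal
    random variable (assumed independent of the others).
    pick: the built-in max or min.
    """
    rng = random.Random(seed)
    draws = [pick(rng.gauss(m, s) for m, s in variables) for _ in range(samples)]
    return statistics.fmean(draws), statistics.pstdev(draws)


# Hypothetical example: the Max term of Equation 4.4 for two services.
print(simulate_extreme([(10.0, 1.0), (12.0, 3.0)], pick=max))
```

For instance, the mean returned for `pick=max` could be added to the voter's mean time to approximate the N-version response time.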

Chapter 6

Conclusions

A comprehensive QoS model has been presented in this work. The model is capable

of estimating QoS of arbitrarily complex structures that may be written through WS-

BPEL. To the best of our knowledge, no QoS estimation technique exists in literature

for graph based flow languages. As argued in previous chapters, most QoS modeling

frameworks are customized for structured programming languages and thus support

only a handful of workflow patterns. In presence of goto-like links that may extend

from within one structured construct into another, QoS modeling for an activity has

to involve tracking QoS of more activities than just those which are nested within it.

This work should also come as the first attempt that lends QoS determination support

in presence of event driven programming, fault handling and backward recovery.

Unlike some other approaches in literature, our model does not call for transformation of BPEL processes to formalizations which have ratified QoS determination frameworks. Instead, it offers a direct approach that is firmly grounded in probability theory. The model tries to exhaustively cover all aspects of BPEL and can scale very well in the face of increasing complexity in BPEL processes.

The QoS calculator provides values for reliability, response time and cost for each activity / scope / handler. This feature enables the designer to track the critical parts of the program more closely. For example, the end times of the activities allow the user to detect an activity that might be a performance bottleneck for the composite process. Similarly, low reliabilities for certain activities may prompt the application of redundancy and fault tolerance constructs. The designer may choose to modify parts of the BPEL process workflow (organize the control flow differently, or add/remove fault handlers and compensation handlers, etc.) and then check whether the QoS has improved. The model also helps the integrator improve QoS by making appropriate changes to the bindings for constituent web services.

The QoS tool also provides for increasing the dependability of the web service composition through fault tolerant constructs. All fault tolerant constructs have been implemented in a manner that is compliant with the WS-BPEL 2.0 standard. Again, our work is the first to introduce the notion of enforcing hard deadlines through the deadline mechanism construct in web service compositions. However, the QoS computation of a fault tolerant BPEL file cannot be performed in a uniform way through our QoS model. Detecting the presence of redundant services inside a BPEL file requires tracking data dependencies in the BPEL file, whereas our QoS determination model for WS-BPEL only tracks control dependencies. Finding fault tolerance structures inside a BPEL file through static analysis of the BPEL code remains a challenge and provides scope for future work. Our QoS tool may be utilized to create highly dependable web service compositions out of undependable web services.

References

[1] Aalst, W. M. P. V. D., Hofstede, A. H. M. T., Kiepuszewski, B., and Barros, A. P. Workflow Patterns. Distrib. Parallel Databases 14, 1 (2003), 5–51.

[2] Aggarwal, R., Verma, K., Miller, J., and Milnor, W. Constraint driven web service composition in METEOR-S. In SCC ’04: Proceedings of the 2004 IEEE International Conference on Services Computing (Washington, DC, USA, 2004), IEEE Computer Society, pp. 23–30.

[3] Amsden, J., Gardner, T., Griffin, C., and Iyengar, S. Draft UML 1.4 profile for automated business processes with a mapping to BPEL 1.0, 2004.

[4] Anderson, T., and Kerr, R. Recovery blocks in action: A system supporting high reliability. Proceedings of the 2nd international conference on Software engineering (1976), 447–457.

[5] Avizienis, A. The N-Version Approach to Fault-Tolerant Software. IEEE Trans. Softw. Eng. 11, 12 (1985), 1491–1501.

[6] Avizienis, A., and Chen, L. On the implementation of N-version programming for software fault tolerance during execution. Proc. IEEE COMPSAC 77 (1977), 149–155.

[7] Baresi, L., and Guinea, S. Towards Dynamic Monitoring of WS-BPEL Processes. Proceedings of the 3rd International Conference on Service Oriented Computing (2005).

[8] Berbner, R., Spahn, M., Repp, N., Heckmann, O., and Steinmetz, R. Heuristics for QoS-aware web service composition. In ICWS ’06: Proceedings of the IEEE International Conference on Web Services (Washington, DC, USA, 2006), IEEE Computer Society, pp. 72–82.

[9] Canfora, G., Penta, M. D., Esposito, R., and Villani, M. L. An approach for QoS-aware service composition based on genetic algorithms. In GECCO ’05: Proceedings of the 2005 conference on Genetic and evolutionary computation (New York, NY, USA, 2005), ACM, pp. 1069–1075.

[10] Cappiello, C., Pernici, B., and Plebani, P. Quality-agnostic or quality-aware semantic service descriptions?


[11] Cardoso, A. J. S. Quality of Service and Semantic Composition of Workflows. PhD thesis, University of Georgia, Athens, Georgia, 2002.

[12] Chen, L., and Avizienis, A. N-Version Programming: A Fault-Tolerance Approach to Reliability of Software Operation. Fault-Tolerant Computing, 1995, ’Highlights from Twenty-Five Years’., Twenty-Fifth International Symposium on (1995).

[13] Clark, D. D., Shenker, S., and Zhang, L. Supporting real-time applications in an integrated services packet network: Architecture and mechanism. In SIGCOMM (1992), pp. 14–26.

[14] Cruz, R. L. Quality of service guarantees in virtual circuit switched networks. IEEE Journal on Selected Areas in Communications 13, 6 (1995), 1048–1056.

[15] Csenki, A., Square, N., and London, E. Recovery Block reliability analysis with Failure Clustering. Dependable Computing for Critical Applications (1991).

[16] D’Ambrogio, A., and Bocciarelli, P. A model-driven approach to describe and predict the performance of composite services. In WOSP ’07: Proceedings of the 6th international workshop on Software and performance (New York, NY, USA, 2007), ACM, pp. 78–89.

[17] Dobson, G. Using WS-BPEL to implement Software Fault Tolerance for Web Services. In EUROMICRO ’06: Proceedings of the 32nd EUROMICRO Conference on Software Engineering and Advanced Applications (Washington, DC, USA, 2006), IEEE Computer Society, pp. 126–133.

[18] Dobson, G. Using WS-BPEL to Implement Software Fault Tolerance for Web Services. Proceedings of the 32nd EUROMICRO Conference on Software Engineering and Advanced Applications (2006), 126–133.

[19] Dobson, G., Hall, S., and Sommerville, I. Dependable Service Engineering: A Fault-tolerance based Approach. Submitted to ACM Transactions on Software Engineering and Methodology, http://digs. sourceforge.net/papers/2005 tosem ftc. pdf (2005).

[20] Dugan, J., and Lyu, M. System reliability analysis of an N-version programming application. Reliability, IEEE Transactions on 43, 4 (1994), 513–519.

[21] Ege, M., Eyler, M., and Karakas, M. Reliability analysis in N-version programming with dependent failures. Euromicro Conference, 2001. Proceedings. 27th (2001), 174–181.

[22] Frolund, S., and Koistinen, J. Quality-of-service specification in distributed object systems. IOP/BCS Distributed Systems Engineering Journal (December 1998). To appear.


[23] Gorbenko, A., Kharchenko, V., Popov, P., Romanovsky, A., and Boyarchuk, A. Development of Dependable Web Services out of Undependable Web Components. School of Computing Science, University of Newcastle upon Tyne CS-TR-863 (2004).

[24] Rommel, G. Simplicity Wins: How Germany's Mid-Sized Industrial Companies Succeed. Harvard Business School Press, Boston, 1995.

[25] Stalk, G., and Hout, T. M. Competing Against Time: How Time-Based Competition is Reshaping Global Markets. Free Press, New York, 1990.

[26] Hillier, F. S., and Lieberman, G. J. Introduction to Operations Research, 7th ed. McGraw-Hill, 2002.

[27] Hoyland, A., and Rausand, M. System Reliability Theory: Models and Statistical Methods. John Wiley and Sons, 1994.

[28] Hwang, S.-Y., Wang, H., Tang, J., and Srivastava, J. A probabilistic approach to modeling and estimating the QoS of web-services-based workflows. Inf. Sci. 177, 23 (2007), 5484–5503.

[29] International Standards Organization. Model for quality assurance, ISO 9000:1987 ed., 1987.

[30] International Standards Organization. Information Technology - Software Product Evaluation - Quality Characteristics and Guidelines for their Use, ISO/IEC IS 9126 ed., 1991.

[31] International Telecommunication Union (ITU). Terms and definitions related to quality of service and network performance including dependability, ITU recommendation E.800 ed., 1994.

[32] International Telecommunication Union (ITU). Communications Quality of Service: A framework and definitions, ITU recommendation G.1000 ed., 2001.

[33] Ireson, W., Jr., C., and Moss, R. Y. Handbook of Reliability Engineering and Management. McGraw-Hill, New York, 1996.

[34] Jaeger, M., and Ladner, H. Improving the QoS of WS Compositions Based on Redundant Services. Next Generation Web Services Practices, 2005. NWeSP 2005. International Conference on (2005), 189–194.

[35] Jaeger, M. C., Rojec-Goldmann, G., and Muhl, G. QoS Aggregation for Web Service Composition using Workflow Patterns. In EDOC ’04: Proceedings of the Enterprise Distributed Object Computing Conference, Eighth IEEE International (Washington, DC, USA, 2004), IEEE Computer Society, pp. 149–159.

[36] Jalote, P. Software Design Faults. John Wiley and Sons Ltd, 1994, ch. Fault Tolerance in Distributed Systems, pp. 355–396.


[37] Leymann, F. Web Services Flow Language (WSFL 1.0). IBM, May 2001.

[38] Looker, N., and Munro, M. WS-FTM: A Fault Tolerance Mechanism forWeb Services. University of Durham, Technical Report 19 (2005).

[39] Looker, N., and Xu, M. Increasing Web Service Dependability Through Con-sensus Voting. Computer Software and Applications Conference, 2005. COMP-SAC 2005. 29th Annual International 2 (2005).

[40] Ludwig, H. Web services QoS: external SLAs and internal policies or: how dowe deliver what we promise? Web Information Systems Engineering Workshops,2003. Proceedings. Fourth International Conference on (2003), 115–120.

[41] Maximilien, E. M., and Singh, M. P. Conceptual model of web servicereputation. SIGMOD Rec. 31, 4 (2002), 36–41.

[42] Menasce, D. Composing Web Services: A QoS View. IEEE INTERNETCOMPUTING (2004), 88–90.

[43] Menasce, D. A. QoS issues in web services. IEEE Internet Computing 6, 6(2002), 72–75.

[44] Musa, J., Iannino, A., and Okumoto, K. Software reliability: measurement, prediction, application. 1987.

[45] Purtilo, J. M., and Jalote, P. An Environment for Developing Fault-Tolerant Software. IEEE Trans. Softw. Eng. 17, 2 (1991), 153–159.

[46] Randell, B. System structure for software fault tolerance. In Proceedings of the international conference on Reliable software (New York, NY, USA, 1975), ACM, pp. 437–449.

[47] Randell, B., and Xu, J. The Evolution of the Recovery Block Concept. Software Fault Tolerance (1995), 1–22.

[48] Reiter, T. Transformation of web service specification languages into UML activity diagrams. Diploma thesis. University of South Australia, 2005.

[49] Rud, D., Schmietendorf, A., and Dumke, R. Performance Modeling of WS-BPEL-Based Web Service Compositions. Services Computing Workshops, 2006. SCW’06. IEEE (2006), 140–147.

[50] Sherchan, W., Loke, S. W., and Krishnaswamy, S. A fuzzy model for reasoning about reputation in web services. In SAC ’06: Proceedings of the 2006 ACM symposium on Applied computing (New York, NY, USA, 2006), ACM, pp. 1886–1892.

[51] Tartanoglu, F., Issarny, V., Romanovsky, A., and Levy, N. Coordinated Forward Error Recovery for Composite Web Services. Proceeding of the 22nd International Symposium on Reliable Dependable Systems, SRDS 2003.

[52] Thatte, S. XLANG: Web Services for Business Process Design. Microsoft, 2001.

[53] Thatte, S. BPEL4WS: Business Process Execution Language for Web Services Version 1.1. BEA, IBM, Microsoft, SAP and Siebel, May 2003.

[54] Tian, M., Gramm, A., Naumowicz, T., Ritter, H., and Freie, J. A concept for QoS integration in Web services. Web Information Systems Engineering Workshops, 2003. Proceedings. Fourth International Conference on (2003), 149–155.

[55] Tian, M., Gramm, A., Ritter, H., and Schiller, J. Efficient selection and monitoring of QoS-aware web services with the WS-QoS framework. In WI ’04: Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence (Washington, DC, USA, 2004), IEEE Computer Society, pp. 152–158.

[56] Tian, M., Gramm, A., Ritter, H., and Schiller, J. Efficient Selection and Monitoring of QoS-Aware Web Services with the WS-QoS Framework. Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on (2004), 152–158.

[57] van der Aalst, W. Don’t go with the flow: Web services composition standards exposed, 2003.

[58] Wang, G., Wang, C., Chen, A., Wang, H., Fung, C., Uczekaj, S., Chen, Y.-L., Guthmiller, W. G., and Lee, J. Service level management using QoS monitoring, diagnostics, and adaptation for networked enterprise systems. In EDOC ’05: Proceedings of the Ninth IEEE International EDOC Enterprise Computing Conference (Washington, DC, USA, 2005), IEEE Computer Society, pp. 239–250.

[59] Wohed, P., van der Aalst, W. M., Dumas, M., and ter Hofstede, A. H. Analysis of web services composition languages: The Case of BPEL4WS, 2003.

[60] Wu, B., Chi, C.-H., and Xu, S. Service Selection Model Based on QoS Reference Vector. IEEE Congress on Services (2007), 270–277.

[61] Zeng, L., Benatallah, B., Ngu, A. H., Dumas, M., Kalagnanam, J., and Chang, H. QoS-aware middleware for web services composition. IEEE Trans. Softw. Eng. 30, 5 (2004), 311–327.

[62] Zeng, L., Lei, H., and Chang, H. Monitoring the QoS for web services. Service-Oriented Computing - ICSOC 2007 (2007), 132–144.

[63] Zhou, C., Chia, L., and Lee, B. DAML-QoS ontology for Web services. Web Services, 2004. Proceedings. IEEE International Conference on (2004), 472–479.

[64] Zinky, J. A., Bakken, D. E., and Schantz, R. E. Architectural support for quality of service for CORBA objects. Theory and Practice of Object Systems 3, 1 (1997).
