[acm press the 12th international conference - paris, france (2010.11.08-2010.11.10)] proceedings of...


A Sampling-Based Approach to Identify QoS for Web Service Orchestrations

Eduardo Blanco [email protected]

Yudith Cardinale [email protected]

María-Esther Vidal [email protected]

Universidad Simón Bolívar
Departamento de Computación y T.I.

Apartado 89000, Caracas 1080-A, Venezuela

ABSTRACT

QoS parameters are used to describe services in terms of their behavior and can be used to rank services according to non-functional criteria. To provide an accurate characterization of the quality of a service, we propose a sampling-based technique. The proposed technique uses Adaptive and Sequential Sampling strategies to estimate the QoS parameters that satisfy the required confidence levels while the size of the sample remains small. QoS estimates are used by a hybrid composer, named PT-SAM, to identify the service compositions that satisfy a functional condition and best meet non-functional criteria of a user query. PT-SAM adapts a Petri-Net unfolding algorithm to find a desired marking from an initial state by using a utility function defined on QoS estimates and functional properties of the available services. PT-SAM uses a QoS-based utility function to guide the search into portions of good quality service compositions; thus, PT-SAM is able to scale up to large-scale search spaces of services. We report on the quality of the sampling techniques and the performance of the composer. First, we show correlation between the estimates and the real values of the QoS parameters; then, we report on the benefits of using these estimates to traverse large search spaces of service compositions (e.g., in the range of 1,000 to 100,000 services). Our experiments show that the quality of the compositions identified by our algorithm is close to the optimal solution produced by the exhaustive algorithm.

Categories and Subject Descriptors

D.2.8 [Software]: Metrics—performance measures; G.2.2 [Graph Theory]; G.3 [Probability and Statistics]; H.2.6 [Database Management]: Database Machines

Keywords

Web Service Composition, Query Optimization, Semantic Matching, Estimation Techniques, QoS Estimation

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
iiWAS2010 8-10 November, 2010, Paris, France.
Copyright 2010 ACM 978-1-4503-0421-4/10/11 ...$10.00.

1. INTRODUCTION

In the context of Service-Oriented Architectures (SOA) [12], Web Services (WSs) support registration, routing, management, and interoperability of independent and autonomous applications. SOA relies on policies, practices and frameworks to ensure that the appropriate services are retrieved; it provides the basis for software reusability and interoperability of heterogeneous applications. This heterogeneity is associated with diverse functionalities (e.g., ticket purchase, payment) and with different QoS values (e.g., response time, cost, reliability, throughput, or trust). This variability, caused by multiple distributed services delivering the same functionality, must be adequately managed in order to ensure efficient implementations of service compositions that satisfy complex user queries.

In SOA, services are usually described in terms of functional and non-functional properties. Service functionality is defined based on input and output parameters, and pre- and post-conditions; QoS parameters specify non-functional properties and are used to rank services according to non-functional criteria.

Services with the highest quality will produce more efficient implementations; thus, precise estimates of the QoS that describe a given set of services are required. However, it is not always possible to know the information of QoS parameters beforehand. In this paper, we propose a sampling-based solution to accurately estimate the values of the QoS parameters of WSs.

Sampling techniques have been successfully applied as the basis for a variety of approximate techniques in cost-based query optimization [13, 15, 22, 23, 27]. The challenge of these methods is to reach estimates that meet a given confidence level while sample size remains small.

We propose two metric-dependent sampling techniques based on Adaptive Sampling and Sequential Sampling [13, 15, 23], to accurately estimate the QoS values of available services, e.g., execution time and cardinality. These sampling-based techniques are supported by a well-developed statistical theory that establishes a bound on the sampling stop conditions which ensures that the quality of the solution is within a given level of confidence.

In the context of WS compositions, QoS estimates are combined by a method called QoS aggregation to compute an aggregated QoS value. These combined values are used to verify whether a composition of services satisfies the QoS requirements of the user request [7, 16, 17, 25, 30, 31].

In addition to the sampling technique, we also propose a hybrid solution to identify WS Compositions that takes advantage of search meta-heuristic techniques to consider functional conditions expressed as input and output attributes, and non-functional constraints represented by a set of QoS parameters and their permissible values. Functional and non-functional requirements are considered at the same time to compute, in relatively short time, "good" service compositions; goodness is measured in terms of the combination of functional and non-functional degrees of satisfaction. We use Petri Nets as a formalism to model services, their dependencies and WS compositions. The cost-based composer, named PT-SAM, adapts a Petri Net unfolding algorithm to identify orderings of WS compositions; the unfolding method is guided by the QoS-based utility function defined on the QoS estimates to prune the space of possibilities, while maximizing the execution quality parameter values and meeting functional requirements. We report on the quality of the solutions identified by the proposed techniques. Our experiments show a linear relationship between estimates and real values; also, we can observe that the quality of the identified compositions is close to the quality of the optimal solution.

To summarize, our contributions are the following:

• Sampling-based techniques to accurately estimate QoS parameters.

• Hybrid metrics to estimate QoS values of atomic and composite WSs.

• A hybrid composer that relies on information about the functionality of the services and an aggregation of QoS estimates, to guide the search toward the space of compositions that best meet the given functional and non-functional user requirements. The technique proposed in [7, 8] is enhanced with QoS estimates to improve the quality of the selected WS compositions.

The remainder of the paper is structured as follows. We start by summarizing related approaches. Then, we define our approach in section 3. An experimental study is reported in section 4, and section 5 outlines our conclusions and future work.

2. RELATED WORK

In this section we briefly describe related approaches in the areas of QoS estimation and service composition.

2.1 Sampling-based Techniques to Estimate QoS Parameters

Al-Masri et al. [1, 2] propose a Web Service Crawler Engine (WSCE) that describes services in terms of 13 QoS parameters. The QWS dataset¹ has been created by WSCE; it is comprised of 5,000 WSs. WSCE is able to discover services and estimate their QoS parameters by continuously monitoring them over a six-day period; the estimates represent averages of the measurements collected during the observed period from a given third party site. Thus, the predicted QoS parameters are particular to the site from which the services were invoked, and estimates of parameters such as performance and response time might not precisely describe the behavior of a service when it is invoked from a different server. Malak et al. [24] propose a multi-agent solution that uses a neural network to predict WS QoS; although this technique is general and the prediction can be done at a particular site, there is no guarantee of the confidence level of the predicted values.

¹ http://www.uoguelph.ca/~qmahmoud/qws/index.html

In the literature, sampling techniques have been successfully applied as the basis for a variety of approximate techniques. For example, in the context of query optimization, different sampling-based algorithms have been proposed to efficiently estimate the cardinality of a query [13, 15, 22, 23, 27]. The challenge of these methods is to reach estimates that satisfy the required confidence levels, while the size of the sample remains small. A key decision involves when to stop sampling the population; this is determined by the mean and variance of the sample in comparison to the target population, and there are different methods to reach this decision. In [23], mean and variance are approximated using upper bounds which are defined in terms of the cardinality constraints that characterize the relationships among the objects in the population to be sampled. The sampling techniques described in [13, 15] do not define an upper bound for these statistics; in contrast, mean and variance are estimated on-the-fly. In [15], mean and variance are computed from a small portion of the data, which is sampled in the first stage; if the variance is low, this approach may reach relatively good estimates. Finally, in [13], mean and variance are recomputed during each iteration of the sampling; the last two techniques are able to reach better estimates but require, in general, more time for sampling.

In this paper, building on related work of sampling-based techniques, we devise a QoS parameter prediction tool and use strategies proposed in [23] and [13], to accurately estimate the QoS values of the available services by only sampling the amount of data needed to reach good levels of accuracy. Additionally, we define a quality metric that estimates the QoS parameter values of an execution plan to resolve a user query by aggregating the QoS parameter values of the services that comprise the plan.

2.2 The QoS-aware Service Selection Problem

The problem of identifying a coordinated set of concrete services that instantiate a given abstract workflow is known as the QoS-aware service selection problem, and it has been shown to be NP-hard [30]; a survey of existing approaches can be found in [14, 21]. This problem is a combinatorial optimization problem and several heuristics have been proposed to find a relatively efficient solution in a reasonably short period of time [5, 6, 11, 18, 26, 28]. Ko et al. [18] propose a constraint-based approach that encodes the non-functional permissible values as a set of constraints whose violation needs to be minimized; to traverse the space of possibly optimal solutions, a hybrid algorithm that combines the tabu search and simulated annealing meta-heuristics is implemented. In [11] a restriction of the service selection problem is encoded as a Linear Programming problem, providing a scalable solution. Barreiro et al. [4] describe a two-fold solution to compose services in terms of functional and non-functional requirements. In the first step a planning algorithm is performed and services that meet the functional request are identified; in the second step, QoS parameters are considered to select the services that best meet the non-functional requirements among the ones chosen in the first step. Recently, two new planning-based approaches have been proposed. Kuter et al. [19] extend the SHOP2 planning algorithm to select the trustworthy composition of services that implements a given OWL-S process model, while Sohrabi et al. [29] develop an HTN planning-based solution where user preference metrics and domain regulations are used to guide the planner into the space of relevant compositions. Finally, a genetic-based algorithm to identify the composition of services that best meet the quality criteria for a set of QoS parameters is presented in [20]. Although these solutions are able to efficiently solve the optimization problem and scale up to a large number of abstract processes, they are not suitable to identify the set of services that satisfy a given user request comprised of functional and non-functional requirements. In contrast, we propose a hybrid approach that receives and considers, at the same time, a functional user request expressed by input and output attributes and non-functional conditions represented by a set of QoS parameters and their permissible values; thus, the returned composition satisfies the functional request and meets the non-functional restrictions.

Finally, in [7], we define two algorithms to identify WS compositions. These algorithms follow different strategies to prune the space of possibilities, while maximizing the execution quality parameter values and meeting non-functional requirements. We also introduce the principles for the aggregation of QoS in a composition to deliver upper and lower boundaries. This technique is used by both algorithms to guide them toward the space of good compositions. In this paper, we enhance the quality of our previously defined approach and define a Petri Net based solution named PT-SAM; in addition, we propose a sampling-based technique to estimate QoS parameters and a cost-based approach to identify service compositions. Our approach takes advantage of the estimates and search meta-heuristic techniques to efficiently identify the service compositions that best meet the functional and non-functional conditions expressed in a user request.

3. OUR APPROACH

In this section we define our proposed framework. First, we formally define a query, a service graph, and the conditions to be satisfied by a composition of services that meet a user request; based on these definitions we formalize the Web Service Composition Problem (WSC). Then, we describe the sampling technique to estimate QoS parameters of services, and a predictive model to approximate an aggregated value of the QoS parameters that characterizes a composition of services. Finally, we present the PT-SAM algorithm.

3.1 The Web Service Composition Problem-Framework

We assume that the relationships among existing services and attributes are given in an input dependency graph G. Thus, given a Query Q, our problem is to generate an Execution Plan G' that satisfies the functional and non-functional criteria. To ensure functionality, nodes in G' correspond to the services in G required to evaluate Q, and edges indicate service execution dependencies, i.e., edges induce a partial order of the services that guarantees the satisfaction of the input restrictions of each selected service. Additionally, the Execution Plan G' is built in a way that its cost is minimized and the non-functional criteria are met.

Definition 1. Query: A Query Q is a pair (F, NF), where F and NF represent functional and non-functional requirements, respectively. The functional requirement F is represented by a pair (I, O), where I is a set of input attributes and O is the set of attributes that need to be produced. The non-functional requirement NF is represented by a set of triples (P, Op, Va), where P corresponds to a QoS parameter, Op to a relational operator, and Va to a value.

In this sense, each triple (P, Op, Va) in NF establishes a permissible range for the parameter P.

Next, we formally define a service graph, and the conditions required to satisfy a given query.

Definition 2. Service Graph: A Service Graph G = (V, E, nf) is a directed bipartite graph. Nodes in V are of two types: attributes and services. Edges in E represent relationships between attributes and services, such that, if T is an attribute and S is a service, then:

• if (T, S) ∈ E, then T corresponds to an input parameter of the service S,

• if (S, T) ∈ E, then elements of the attribute T are produced by the service S.

• nf is a set of pairs (P, v) representing the non-functional properties of G, where P is a QoS parameter and v the value that describes G in terms of P.

Definition 3. Satisfiability: Let Q = (F, NF) be a Query. Let G = (V, E, nf) be a Service Graph. The set nf satisfies NF iff for each triple (P, Op, Va) in NF, there exists a pair (P, v) in nf and the expression v Op Va holds.
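The satisfiability check of Definition 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `satisfies`, the dictionary encoding of nf, the string encoding of the relational operators, and the sample parameter names are all hypothetical and do not come from the paper.

```python
import operator

# Relational operators allowed in a non-functional triple (P, Op, Va).
# The string encoding of the operators is an assumption of this sketch.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}


def satisfies(nf, NF):
    """Return True iff nf satisfies NF in the sense of Definition 3.

    nf: dict mapping a QoS parameter name P to its value v.
    NF: iterable of triples (P, Op, Va).
    """
    for param, op, bound in NF:
        if param not in nf:
            # No pair (P, v) exists in nf for this triple.
            return False
        if not OPS[op](nf[param], bound):
            # The expression "v Op Va" does not hold.
            return False
    return True


# Hypothetical plan properties and query constraints.
plan_nf = {"Time": 270, "Cost": 12.5}
query_nf = [("Time", "<=", 400), ("Cost", "<", 20)]
print(satisfies(plan_nf, query_nf))
```

A triple whose parameter is missing from nf makes the check fail, which mirrors the "there exists a pair (P, v)" requirement of the definition.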

Definition 4. Petri-Net: A Petri-Net is a directed bipartite Service Graph G = (P ∪ T, E, nf), where:

• P is a finite set of nodes, called Places

• T is a finite set of nodes, called Transitions, and P ∩ T = ∅

• E ⊆ (P × T) ∪ (T × P) is a set of directed edges called arcs, known as the flow relation. A pair (p, t) indicates that p is an Input Place of t; a pair (t, p) expresses that p is an Output Place of t.

A Petri-Net G is marked by assigning "tokens" to Places. When all the Input Places of a Transition have at least one token, the Transition can be "fired". When a Transition is fired, a token is removed from each of its Input Places and tokens are added to its Output Places.
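The token game described above can be sketched as follows. This is a minimal illustration of the generic firing rule only, not the PT-SAM data structure: the class name `PetriNet`, its methods, and the two example services (`geocode`, `weather`) are hypothetical.

```python
from collections import Counter


class PetriNet:
    """Minimal Petri-Net: Places hold tokens, Transitions consume and produce them."""

    def __init__(self, inputs, outputs):
        # inputs[t]  = Input Places of Transition t  (arcs (p, t) in E)
        # outputs[t] = Output Places of Transition t (arcs (t, p) in E)
        self.inputs = inputs
        self.outputs = outputs
        self.tokens = Counter()

    def mark(self, place, n=1):
        """Assign n tokens to a Place."""
        self.tokens[place] += n

    def can_fire(self, t):
        """A Transition is enabled when every Input Place has at least one token."""
        return all(self.tokens[p] >= 1 for p in self.inputs[t])

    def fire(self, t):
        """Remove one token from each Input Place and add one to each Output Place."""
        if not self.can_fire(t):
            raise ValueError(f"transition {t!r} is not enabled")
        for p in self.inputs[t]:
            self.tokens[p] -= 1
        for p in self.outputs[t]:
            self.tokens[p] += 1


# Hypothetical services: attributes are Places, services are Transitions.
net = PetriNet(
    inputs={"geocode": ["address"], "weather": ["lat", "lon"]},
    outputs={"geocode": ["lat", "lon"], "weather": ["forecast"]},
)
net.mark("address")
net.fire("geocode")
net.fire("weather")
print(dict(net.tokens))
```

Note that Definition 5 below uses a different token count (|T| + 1 per marked Place), so that a marked attribute can feed every service that needs it; the sketch keeps the plain one-token rule stated in the paragraph above.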

Definition 5. Marking: A Marking of a Petri-Net G = (P ∪ T, E) is a mapping M : P → ℕ such that, for p ∈ P with predecessors(p) ≠ ∅, if there exists a transition t ∈ predecessors(p) and t is fired, then M(p) = |T| + 1; otherwise, M(p) = 0. If predecessors(p) = ∅, then M(p) = |T| + 1, i.e., p is part of the initial Marking.

A marking M for G is a specific configuration of marked Places in G.

Definition 6. Initial Marking (MI). Let Q = (F, NF) be a Query with F = (I, O). The Initial Marking is the Marking induced by Q, such that, if a ∈ I, MI(a) = |T| + 1; otherwise MI(a) = 0.


Definition 7. Firing sequence. A firing sequence is a sequence $\sigma = \{s_1, \ldots, s_n \mid s_i \in S\}$ such that $M_I \xrightarrow{\sigma} M_n$ iff there are markings $M_1, \ldots, M_n$ with $M_I \xrightarrow{s_1} M_1 \cdots M_{n-1} \xrightarrow{s_n} M_n$.

Definition 8. Final Marking (MF). Let Q = (F, NF) be a Query with F = (I, O). A Final Marking for Q is a Marking $M_F$ iff there exists a firing sequence $\sigma$ such that $M_I \xrightarrow{\sigma} M_F$ and $\forall o \in O,\ M_F(o) > 0$.

Definition 9. Execution Plan or Service Composition: Let Q = (F, NF) be a Query. An Execution Plan for Q is a Petri-Net G = (P ∪ T, E, nf), such that,

• there exists a Final Marking MF for Q in G

• nf satisfies NF (see Definition 3).

Note that a firing sequence σ corresponds to a set of fired transitions, which in fact represents the selection of several WSs.

Definition 10. Cut-off Service: Let ≺ be an adequate order of Transitions in a Petri-Net and β be a prefix, i.e., a path in the Petri-Net. Let e'' be a Transition in the Petri-Net, and let Marking(e'') be the Marking induced when the Transition e'' is fired. A Transition e is a cut-off Transition in β with respect to ≺ iff β contains some event e' such that Marking(e) = Marking(e') and e' ≺ e.

Cut-off Transitions refer to Transitions where a Petri-Net unfolding algorithm can stop and all possible Markings are reached. In consequence, we define the WSC problem as follows:

Definition 11. The Web Service Composition Problem (WSC): Given a Query Q = (F, NF) and a Petri-Net G = (V, E) that represents the relationships among the available services and the attributes, the WSC problem is to identify a Petri-Net G' = (V', E') that corresponds to an Execution Plan for Q, i.e., if F = (I, O), then the attributes o ∈ O are in a Marking M for G and the values of the QoS parameters in nf satisfy the constraints expressed in NF.

3.2 A Sampling-Based Strategy to Estimate QoS Parameters

In this section we present our proposed prediction tool that uses strategies proposed in [23] and [13], to accurately estimate the QoS values of the available services. Although there exists a great variety of sampling techniques, we have chosen these two because they have been shown to perform quite well when predicting values of parameters whose distributions are non-uniform and have high variance. Other techniques, such as double sampling techniques [15], are more sensitive to this type of distribution and their predictions could be less accurate.

Our sampling technique assumes that there is a population P of all the different valid instantiations of the input attributes of a service s, and the set S corresponds to the partition of the answer produced by s into n partitions according to the instantiations of the input arguments of s. Each element in S is associated with QoS values such as its execution cost and its cardinality, and the population S is characterized by the statistics mean and variance of these QoS parameters.

The objective of our sampling technique is to identify a sample of S, called ES, such that the mean and variance of the QoS values of ES are valid within a predetermined accuracy and confidence level.

Without loss of generality, let us define the sampling technique for cardinality and execution cost. To estimate the mean of the cardinality (resp., execution cost) of ES, say $Y$, within $Y/d$ with probability $p$, where $0 \le p < 1$ and $d > 0$, and

$$\alpha = \frac{d \times (d + 1)}{1 - \sqrt{p}},$$

the sampling method assumes an urn model. The urn has $n$ balls from which $m$ samplings are repeatedly taken, until the sum $z$ of the cardinalities (resp., execution costs) of the samples is greater than $\alpha \times (V/Y)$. The estimated mean of the cardinality (resp., execution cost) is:

$$\hat{Y} = \frac{z}{m}$$

The values $d$ and $1/(1 - \sqrt{p})$ are associated with the relative error and the confidence level, and $V$ and $Y$ represent the cardinality (resp., execution cost) variance and mean of S, respectively. The sampling techniques stop sampling when the sum $z$ exceeds an upper bound $b$. We consider the following two methods, based on the estimation of mean ($Y$) and variance ($V$) of the cardinality (resp., execution cost), to determine the upper bound $b$. Our objective is to reach (through sampling) ES with mean $\hat{Y}$, such that, with a high confidence (at least $\alpha$), the relative error of the estimation is greater than some given constant $\epsilon$.

$$P\left(\left|\frac{Y - \hat{Y}}{Y}\right| \ge \epsilon\right) = \alpha$$

3.2.1 Adaptive Sampling

Following this sampling technique [23], the upper bound $b$ is defined as an approximation of $V/Y$. Accordingly, $b$ is defined as the maximal cardinality (resp., execution cost) of the answers produced by the service s, i.e., $b$ corresponds to an upper bound of the cardinalities (resp., execution costs) of the objects in S. The stop condition of the sampling is defined in terms of the sum of the estimates ($z$) or the number of samples ($m$):

$$\left(z \ge k_1 \times b \times \left(\frac{1}{\epsilon} + 1\right)\frac{1}{\epsilon}\right) \vee (m \ge k_2 \times h)$$

where $h = 100\epsilon$ and $k_1$ and $k_2$ are constants. $k_1$ covers the case in which the distribution from which the variance of the cardinality (resp., execution cost) of the objects of ES is computed is normal, while $k_2$ covers the situation in which the cardinalities (resp., execution costs) of the objects in S are not normally distributed. The first sub-condition imposes an upper bound on the sum $z$. The second sub-condition establishes a sanity bound and controls the termination of the sampling process when $b$ is high and oversampling can arise. This sampling technique can be precise and efficient if the value $b$ nicely fits the properties of the cardinality (resp., execution cost) of the objects in S. However, if $b$ over-estimates the maximum cardinality (resp., execution cost) value, this sampling technique can be expensive and inefficient. We refer the reader to [23] for details.
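The stop condition above can be sketched as a simple sampling loop. This is a minimal sketch under stated assumptions, not the implementation used in the paper: the function name `adaptive_sample`, the `draw` callback (standing for one invocation of the service on a randomly chosen input instantiation, returning the observed cardinality or execution cost), the default values of `k1` and `k2`, and passing the sanity bound `h` as a plain argument are all hypothetical.

```python
import random


def adaptive_sample(draw, b, eps, h, k1=1.0, k2=1.0):
    """Sample until (z >= k1*b*(1/eps + 1)*(1/eps)) or (m >= k2*h).

    draw: zero-argument callable returning one observation.
    b:    assumed upper bound on a single observation.
    eps:  target relative error.
    h:    sanity bound on the number of samples (supplied by the caller).
    Returns (estimated mean, number of samples taken).
    """
    d = 1.0 / eps
    z_bound = k1 * b * d * (d + 1)   # bound on the running sum z
    m_bound = k2 * h                 # sanity bound on the sample count m
    z, m = 0.0, 0
    while True:
        z += draw()
        m += 1
        if z >= z_bound or m >= m_bound:
            return z / m, m


# Hypothetical observed cardinalities of a service, sampled with replacement.
observations = [12, 7, 30, 18, 25]
rng = random.Random(42)
estimate, samples = adaptive_sample(
    lambda: rng.choice(observations), b=max(observations), eps=0.25, h=200
)
print(estimate, samples)
```

The sketch makes the trade-off in the text visible: a loose bound `b` inflates `z_bound`, so the loop keeps sampling until the sanity bound on `m` cuts it off.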

3.2.2 Sequential Sampling

To overcome the limitations of the Adaptive Sampling technique, Sequential Sampling [13] offers a more general solution. Following this technique, the statistics $V$ and $Y$ are estimated at each sampling step using all the observations seen so far. Let $V_t$ and $Y_t$ be the estimators of $V$ and $Y$ after $t$ objects in S have been sampled with replacement; as previously, $z$ corresponds to the sum of observed cardinalities (resp., execution costs) of objects in ES, and $m$ is the number of samples. Then, the stop condition is as follows:

$$(m \ge 1) \wedge (V_t > 0) \wedge \left(\epsilon \times \max(z, m \times \upsilon) \ge t_\alpha (t \times V_t)^{1/2}\right)$$

Thus, the termination condition is determined at every sampling step, and sampling will terminate when the desired accuracy and confidence level $\alpha$ are reached; $t_\alpha$ is defined based on $\alpha$ and a standardized normal random variable. The expression $\upsilon$ represents a bound on the cardinality (resp., execution cost) values that can be considered during each iteration. The term $\max(z, m \times \upsilon)$ is a sanity bound that helps to solve the oversampling problem that arises when the cardinalities (resp., execution costs) of objects in S are small.
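A sketch of this stop condition, with the variance re-estimated after every observation, is given below. Several details are assumptions made for illustration and are not taken from the paper: the function name `sequential_sample`, the `draw` callback, the use of the unbiased sample variance as $V_t$, the choice of $t_\alpha$ as the two-sided standard normal quantile for confidence $\alpha$, and the `max_samples` cap that prevents an endless loop when all observations are identical.

```python
import math
from statistics import NormalDist


def sequential_sample(draw, eps, alpha, upsilon, max_samples=10_000):
    """Sample until eps * max(z, m*upsilon) >= t_alpha * sqrt(m * V_t), with V_t > 0.

    draw: zero-argument callable returning one observation.
    Returns (estimated mean, number of samples taken).
    """
    # Assumption: t_alpha is the two-sided standard normal quantile.
    t_alpha = NormalDist().inv_cdf((1 + alpha) / 2)
    z = 0.0      # running sum of observations
    m = 0        # number of samples (t in the stop condition)
    mean = 0.0   # running mean, used by the variance update
    m2 = 0.0     # running sum of squared deviations (Welford's method)
    while m < max_samples:
        x = draw()
        m += 1
        z += x
        delta = x - mean
        mean += delta / m
        m2 += delta * (x - mean)
        var = m2 / (m - 1) if m > 1 else 0.0
        if var > 0 and eps * max(z, m * upsilon) >= t_alpha * math.sqrt(m * var):
            break
    return z / m, m


# Hypothetical execution times (ms) observed for one service.
observed = iter([100, 120, 110, 110, 95, 130])
estimate, samples = sequential_sample(lambda: next(observed), eps=0.1, alpha=0.95, upsilon=1)
print(estimate, samples)
```

Unlike the adaptive variant, no a priori bound `b` is needed: the loop stops as soon as the observed variance is small enough relative to the running sum.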

3.3 Quality Estimation of a Composite Web Service

We propose a cost model to estimate QoS parameter values of an Execution Plan; this cost model is used to guide the PT-SAM algorithm into the space of low execution cost service compositions.

The quality of an Execution Plan can be defined according to the QoS parameters that will be minimized/maximized. In this paper we define the Execution Plan quality based on a function that aggregates functional and non-functional requirements in Query Q = (F, NF):

$$\mathit{Quality}(G, Q, t) = \mathit{Cost}(G, NF, t) \times \mathit{NumOutputs}(G, F, t) \qquad (1)$$

where:

• G is an Execution Plan.

• NumOutputs(G, F, t) is the number of Query Outputs in F which could be reached from Transition t.

• Cost(G, NF, t) is the normalized cost for Execution Plan G; it should be defined according to NF.

Then, if NF = {(Time, ≤, MaxTime)}, the cost of an Execution Plan G = (P ∪ T, E, nf), denoted as f(G), is obtained by evaluating Equation 2, which is a combination of the values related to the non-functional parameters of services s ∈ T. It is defined as follows:

$$f(G) = \sum_{s \in T} \mathit{InCard}(s) \times \mathit{ExecutionTime}(s) \qquad (2)$$

where InCard(s) and ExecutionTime(s) correspond to the values estimated by our proposed sampling techniques (see section 3.2). Note that InCard(s) refers to the size of the set that Service s needs to be invoked with.

In this sense, the normalized cost for the Execution Plan G w.r.t. Q is defined in Equation 3.

$$\mathit{Cost}(G, Q, t) = \frac{\mathit{MaxTime}}{f(G) + h(G, F, t)} \qquad (3)$$

where h(G, F, t) is an admissible heuristic that estimates the cost of reaching a Final Marking MF for Q.

We illustrate how these Equations guide PT-SAM into thespace of Firing Sequences that will produce the Final Mark-ing MF for Q = (F,NF ). Suppose an Execution Plan Gwith estimated execution time of 200 and {MaxT ime <400} ∈ NF , and there exist two Transitions, t1 and t2, thatcan be fired. Each Transition is annotated with values forInCard, ExecutionT ime, and the admissible heuristic h, asshown in Table 1:

Table 1: Estimated QoS values for Firable Transitions t1 and t2

Transition   InCard(t)   ExecutionTime(t)   h(G,F,t)
t1           5           10                 20
t2           5           15                 5

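Note that the estimated total cost f(G) + h(G, F, t) is 270 for t1 (200 + 5 × 10 + 20) and 280 for t2 (200 + 5 × 15 + 5); with MaxTime = 400 and taking NumOutputs = 1 for both Transitions, this yields Quality(G, Q, t1) = 1.48 and Quality(G, Q, t2) = 1.42. PT-SAM will fire Transition t1 because the resulting Execution Plan has a better quality, i.e., the resulting Execution Plan is closer to the Final Marking MF.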
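The selection in this example can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the class and function names are hypothetical, the cost of the plan after firing t is taken to be the current f(G) plus the Equation 2 term of t, and NumOutputs is fixed to 1 because the example does not state it.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    name: str
    in_card: float         # InCard(t), estimated by sampling
    execution_time: float  # ExecutionTime(t), estimated by sampling
    h: float               # h(G, F, t), heuristic cost to reach the Final Marking


def cost(f_g: float, t: Transition, max_time: float) -> float:
    """Normalized cost (Equation 3) of the plan obtained by firing t."""
    # Assumption: f(G) after firing t adds the Equation 2 term of t.
    f_after = f_g + t.in_card * t.execution_time
    return max_time / (f_after + t.h)


def quality(f_g: float, t: Transition, max_time: float, num_outputs: int = 1) -> float:
    """Quality (Equation 1); num_outputs=1 is an assumption of this sketch."""
    return cost(f_g, t, max_time) * num_outputs


t1 = Transition("t1", in_card=5, execution_time=10, h=20)
t2 = Transition("t2", in_card=5, execution_time=15, h=5)

# Pick the firable Transition with the highest quality.
best = max([t1, t2], key=lambda t: quality(200, t, 400))
print(best.name, round(quality(200, t1, 400), 3), round(quality(200, t2, 400), 3))
# prints: t1 1.481 1.429
```

The values 1.48 and 1.42 reported in the text correspond to these ratios (400/270 and 400/280) truncated to two decimals.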

In Section 4, we will empirically show the quality of the proposed metric by reporting on its predictive capability.

3.4 PT-SAM: Petri-NET Service Aggregation Matchmaking

In order to consider functional and non-functional requirements to solve the WSC problem (see Definition 11), we have extended the greedy algorithm SAM [10] by adapting a Petri-Net unfolding algorithm [9]. The goal of the PT-SAM algorithm is to identify a set of services that need to be added to the plan to reach the desired Marking from the Initial Marking. The Initial Marking corresponds to a state in the Petri-Net where only the Places associated with the Query Inputs are marked. The desired Marking corresponds to the state where Query Outputs are marked. In order to identify efficient plans, PT-SAM uses a meta-heuristic to prune the space of service compositions, while minimizing/maximizing QoS values specified in the non-functional condition.

PT-SAM, defined in Algorithm 1, starts by creating a Petri-Net and marks the Places that correspond to the Input attributes in the Query (Steps 1 and 2). PT-SAM iterates until the desired Marking has been reached or there are no more services to be added to the plan (Step 9). Each iteration begins by selecting a service that improves the quality of the plan, i.e., the state induced by the new plan is closer to the desired Marking (Step 3); the selection of the service is based on the quality metric defined in Equation 1. Then, PT-SAM filters out the services that are cutoff (Step 4), because they will not produce new Markings (see Definition 10). Only services that represent new information will be added to the plan, improving the plan quality (Step 5). Then, the list of candidate services is updated, first by adding the new firable services (Step 7) and then by removing the service added to the plan (Step 8). PT-SAM is sound and complete, as shown in [7].

It is important to note that in traditional definitions and implementations of Petri-Nets, if a Place p precedes a set of Transitions Sucs(p) and |Sucs(p)| > 1, then p is replicated for each ti ∈ Sucs(p) to avoid exclusivity in firing Transitions, such that only pi precedes ti and for each t that precedes p, t precedes pi.


Algorithm 1: PT-SAM: Petri-NET Service Aggregation Matchmaking

Input: Query Q = (F, NF), where F = (I, O), I is the set of inputs, O is the set of outputs, and NF is a set of non-functional properties (see Definition 1)
Input: OT: Ontology describing the domain
Input: OWS: Ontology of Web Services
Output: G: an Execution Plan that satisfies Q

begin
 1    Create a Petri-Net G = (T ∪ P, E, nf)
      Initialize σ
 2    Assign ∀i ∈ I, M(i) ← |T| + 1
      Candidates ← {t ∈ T : ∃ i ∈ I and (i, t) ∈ E}
      repeat
 3        Select t ∈ Candidates s.t.: (∀t1 ∈ Candidates: Quality(G, Q, t) ≥ Quality(G, Q, t1))
 4        if ¬ isCutOff(t) then
              Fire t
 5            Add t to σ
 6            foreach p ∈ successors(t, G) do
                  M(p) ← |T| + 1
 7                Candidates ← Candidates ∪ {tc : (p, tc) ∈ E ∧ |predecessors(tc)| = |{pp : (pp, tc) ∈ E ∧ M(pp) > 0}|}
 8        Candidates ← Candidates − {t}
 9    until Candidates = ∅ or (∃ MF for Q in G)
      if ∃ MF for Q in G then
10        Return G
      else
          Return ERROR
end

The traditional Petri Net implementation increases the number of nodes in the Petri-Net, which makes the unfolding process more expensive. Our definition of Marking (see Definition 5) allows PT-SAM to remember which Places have been marked even if all the Transitions fed by a Place p have already been fired. Thus, PT-SAM speeds up the process of unfolding a Petri-Net. In experimental studies, we have observed that PT-SAM outperforms traditional Petri Net implementations by up to 4 orders of magnitude for Petri-Nets with at most 400 Transitions.
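The control flow of Algorithm 1 can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the function name `pt_sam`, the dictionary encoding of Transitions, and the `quality` callback are hypothetical; the Marking is modeled as a set of marked Places rather than token counts; candidates are restricted to Transitions whose input Places are all marked; and the cutoff test of Definition 10 is approximated by "firing t marks no new Place".

```python
def pt_sam(inputs, outputs, transitions, quality):
    """Greedy sketch of Algorithm 1.

    transitions: dict name -> (set of input Places, set of output Places)
    quality: function (name, marked) -> number; higher is better
    Returns the firing sequence, or None if the outputs cannot be reached.
    """
    marked = set(inputs)            # Step 2: mark the Query Inputs
    goal = set(outputs)
    sigma = []                      # firing sequence
    done = set()                    # Transitions already removed from Candidates

    def enabled(name):
        # A Transition is firable when all its input Places are marked.
        return transitions[name][0] <= marked

    candidates = {n for n in transitions if enabled(n)}
    while candidates and not goal <= marked:                  # Step 9
        # Step 3: best candidate; sorting makes ties deterministic.
        t = max(sorted(candidates), key=lambda n: quality(n, marked))
        new_places = transitions[t][1] - marked
        if new_places:              # Step 4: simplified cutoff test (assumption)
            sigma.append(t)         # Step 5
            marked |= new_places    # Step 6: Places stay marked once reached
            candidates |= {n for n in transitions
                           if n not in done and enabled(n)}   # Step 7
        candidates.discard(t)       # Step 8
        done.add(t)
    return sigma if goal <= marked else None


# Hypothetical services: "a" and "b" offer the same functionality.
services = {
    "a": ({"i"}, {"x"}),
    "b": ({"i"}, {"x"}),
    "c": ({"x"}, {"o"}),
}
est_time = {"a": 10.0, "b": 3.0, "c": 5.0}
plan = pt_sam({"i"}, {"o"}, services, lambda n, marked: 1.0 / est_time[n])
print(plan)  # ['b', 'c']
```

Because marked Places are never unmarked, a Place does not need to be replicated for each of its successor Transitions, which mirrors the role of the Marking definition discussed above.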

4. EXPERIMENTS

We present an empirical study that shows the predictive capability of the Quality function; the execution time of the cost-based optimization techniques; and the quality of the compositions identified by the PT-SAM algorithm when the estimates are used to guide the search.

4.1 Predictive capability of the Sampling Strategies

A set of 16 WSs was defined over the DBLP dataset available at http://dblp.uni-trier.de/. These WSs were developed using OpenJDK 6 and deployed on an Apache Tomcat 6.0.20 application server running in two different domains in order to generate different QoS values for the same services.

We ran the Adaptive and Sequential Sampling techniques to estimate the cardinality and execution time of this set of 16 real-world services. The relative error of the estimation is 0.05, and the confidence level was set to 0.95; the constants tα and k2 are 1.1972893 and 1.43350, respectively. We sampled less than 1% of the population. For these services, Adaptive Sampling converges at 36 samples, while Sequential Sampling requires at most 8 samples.

For each service, we have compared the estimated and actual values of cardinality and execution time, and we report the Pearson correlation coefficient (footnote 2) between these values in Table 2. In particular, the correlation is at least 0.64 for cardinality and at least 0.87 for execution time, i.e., there is a linear relationship between estimates and real values. This result suggests that the proposed sampling techniques can accurately estimate the QoS parameters, while the number of samples is less than 1% of the whole population.

Table 2: Correlation of QoS Estimates using two sampling strategies

                 Adaptive   Sequential
Cardinality      0.93       0.64
Execution Time   0.87       0.92
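For reference, the Pearson coefficient used in Table 2 can be computed as in the following minimal sketch. The function name and the sample values are hypothetical and serve only to illustrate the formula; they are not the measurements of this study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of the same length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations (numerator) and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical estimated vs. actual execution times of four services.
estimated = [12.0, 30.5, 7.2, 55.0]
actual = [10.0, 33.0, 8.0, 50.0]
print(round(pearson(estimated, actual), 2))
```

A value close to +1 indicates that services with large estimated values also tend to have large actual values, which is the property the estimates need in order to rank services.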

We also composed the 16 services and computed estimates of execution time and cardinality by using the previously defined formulas. Similarly, we ran these compositions and computed the actual costs and cardinalities. First, we observed that there is a high variance in the distribution of these two QoS parameters. In Table 3 we report on the mean and standard deviation. We compare the estimates and the actual values, and we note that, even though the data is non-uniform and contains some outliers, we could obtain a relatively *high* correlation between the estimated values and the actual ones. In Table 4, we show the correlation coefficients of our experiments after removing 4 outliers out of a total of 191 points. We report a positive correlation between these two values, i.e., there is a linear trend between these two values, especially for Adaptive Sampling with a correlation coefficient of 0.57. This result suggests that our cost model can be a good predictive tool of the execution time of service compositions. However, we note that the correlation of service compositions is lower than the correlation of atomic services; we hypothesize that the decrease of the predictive capability may be caused by unexpected network delays that are not represented in our cost formulas.

Table 3: Standard Deviation of the Execution Time of Service Compositions

Size   Number of Plans   Mean     StdDev
1      40                15.17    13.01
2      100               81.98    121.84
3      51                479.27   1617.72

Table 4: Pearson correlation coefficient for 187 Service Compositions

Plan Size   Adaptive   Sequential
1           0.94       0.93
2           0.41       0.38
3           0.49       0.50
All Plans   0.55       0.54

Footnote 2: Pearson’s correlation is a number between -1 and +1 that measures the degree of association between two variables (call them X and Y). A positive value for the correlation implies a positive association (large values of X tend to be associated with large values of Y and small values of X tend to be associated with small values of Y). A negative value for the correlation implies a negative or inverse association (large values of X tend to be associated with small values of Y and vice versa).

4.2 Effectiveness of PT-SAM using the Proposed Sampling Techniques

We have also empirically studied the effectiveness of the search strategy guided by our sampling-based cost models. To test our techniques, we generated a base dependency graph comprised of 100 services. This graph was generated using the Barabasi-Albert model [3] with an average degree of 4. In order to expand the size of possible solutions, two different strategies were followed. First, sets of co-replicas were added to the base graph for each pair of services. Then, for each service in this new graph, we added a random number of replicas. This number was selected by following a uniform distribution between 2 and 5; each new service was associated with QoS values. At the end, we obtained a dependency graph comprised of 4,748 services.
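A rough sketch of this kind of generator is shown below, using only the Python standard library. It makes several assumptions that are not taken from the paper: the base graph starts from a small star, each new node attaches to m = 2 existing nodes (giving an average degree close to 4), the co-replica step is omitted, and the single QoS value attached to each replica is drawn from an arbitrary range. The resulting service count therefore does not match the 4,748 services reported above.

```python
import random


def barabasi_albert(n, m, rng):
    """Edges of a preferential-attachment graph with n nodes, m edges per new node."""
    # Seed graph: a star on m + 1 nodes (assumption of this sketch).
    edges = [(0, v) for v in range(1, m + 1)]
    # Each node appears in the pool once per incident edge, so a uniform
    # choice from the pool is a degree-proportional choice of node.
    pool = [node for edge in edges for node in edge]
    for v in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))
        for u in sorted(chosen):
            edges.append((v, u))
            pool += [v, u]
    return edges


def add_replicas(n, rng, low=2, high=5):
    """Attach between low and high replicas (uniform, inclusive) to each service."""
    replicas = {}
    for s in range(n):
        count = rng.randint(low, high)
        replicas[s] = [
            # Hypothetical QoS value; the range is arbitrary.
            {"name": f"s{s}_r{k}", "execution_time": rng.uniform(1.0, 100.0)}
            for k in range(count)
        ]
    return replicas


rng = random.Random(42)
edges = barabasi_albert(100, 2, rng)
replicas = add_replicas(100, rng)
total_services = 100 + sum(len(r) for r in replicas.values())
print(len(edges), total_services)
```

With n = 100 and m = 2 the sketch always produces 196 edges, i.e., an average degree of 3.92.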

A set of queries, classified according to their size, was generated for the final generated graph. The size measures the number of services in the optimal plan in the whole space of solutions for each query. The sizes of queries range from one to twenty services. There are twenty queries for each size, and a total of 400 queries. These queries were randomly generated following a uniform distribution.

We compare PT-SAM to SAM and to DP-Best, which is an exhaustive approach that finds all possible solutions of each query. SAM [10] is a greedy-based algorithm that is able to identify the service compositions that satisfy a given functional requirement. SAM can be very efficient; however, since cost or quality of the services is not considered, the output composition can be far from the optimal.

All the solutions were run on a Sun workstation with 2 GBytes of memory and two Dual Core AMD Opteron 180 processors at 2.4 GHz, running the Ubuntu 9.04 operating system. The OpenJDK 6 virtual machine was used to develop and run the programs. The OWL-S API was used to parse the Web Service definitions and to deal with the OWL classification reasoning process.

4.2.1 Optimization Time

In Figure 1, the time consumed by DP-Best was computed for queries of size up to four. We could not exhaustively produce the search space of larger queries, because DP-Best ran out of memory.

As can be seen in Figure 1, SAM and PT-SAM exhibit similar behavior, and both techniques are able to produce solutions to the queries in less than 40 seconds.

4.2.2 Plan Quality

In Figure 2, we present the estimated evaluation cost of the plans identified by each approach. We note that SAM produces plans that are clearly more expensive than those produced by PT-SAM.

From Figures 1 and 2 we can see that PT-SAM produced plans that are closer to the optimal, while its evaluation time remains low.

Figure 1: Optimization Time

Figure 2: Estimated Cost of identified solutions

5. CONCLUSIONS AND FUTURE WORK

We have proposed a sampling-based approach to estimate QoS parameters of Web Services. We have empirically studied the predictive capability of our approach, and the observed behavior indicates that accurate estimates can be generated, while the number of samples remains low. We have also presented a Petri-Net-based search technique, named PT-SAM, to traverse the space of service compositions; this technique uses cost and cardinality estimates to guide the search into the space of good quality service compositions that satisfy a user request. Initial experimental results show that the proposed technique is able to identify compositions close to the optimal, while the evaluation time remains low. In the future we plan to extend the proposed sampling techniques to estimate other types of QoS parameters, and use these estimates to identify good compositions. We also plan to conduct a more exhaustive experimental study by using other real-world services.

6. REFERENCES

[1] E. Al-Masri and Q. Mahmoud. Investigating Web Services on The World Wide Web. In WWW, pages 795–804, 2008.

[2] E. Al-Masri and Q. H. Mahmoud. Discovering the Best Web Service. In WWW ’07: Proceedings of The 16th International Conference on World Wide Web, pages 1257–1258, New York, NY, USA, 2007. ACM.

[3] A. L. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, October 1999.

[4] D. Barreiro, O. Licchelli, P. Albers, and R.-J. de Araujo. Personalized Reliable Web Service Compositions. In WONTO, 2008.

[5] D. Berardi, F. Cheikh, G. D. Giacomo, and F. Patrizi. Automatic Service Composition via Simulation. Int. J. Found. Comput. Sci, 19(2):429–451, 2008.

[6] D. Berardi, G. D. Giacomo, M. Mecella, and D. Calvanese. Composing Web Services with Nondeterministic Behavior. In ICWS, pages 909–912, 2006.

[7] E. Blanco, Y. Cardinale, and M.-E. Vidal. Aggregating Functional and Non-Functional Properties to Identify Service Compositions, page pp. IGI BOOK (53), 201. Accepted to be published in 2010.

[8] E. Blanco, Y. Cardinale, M.-E. Vidal, and J. Graterol. Techniques to Produce Optimal Web Service Compositions. In 2008 IEEE Congress on Services 2008 - Part I (SERVICES-1 2008), pages 553–558, Honolulu, Hawaii, USA, 2008. IEEE Computer Society.

[9] B. Bonet, P. Haslum, S. Hickmott, and S. Thiebaux. Directed unfolding of petri nets. pages 172–198, 2008.

[10] A. Brogi, S. Corfini, and R. Popescu. Composition-oriented Service Discovery. In Proc. of Software Composition’05, LNCS, volume 3628, pages 15–30, 2005.

[11] V. Cardellini, E. Casalicchio, V. Grassi, and F. L. Presti. Flow-Based Service Selection for Web Service Composition Supporting Multiple QoS Classes. In Proc. of IEEE 2007 Int’l Conf. on Web Services, 2007.

[12] T. Erl. Service-Oriented Architecture: Concepts, Technology, and Design. Prentice Hall PTR, August 2005.

[13] P. Haas and A. Swami. Sequential Sampling Procedures for Query Estimation. In Proc. of VLDB, 1992.

[14] Hong Qing Yu and Stephan Reiff-Marganiec. Non-functional Property Based Service Selection: A Survey and Classification of Approaches. November 2008.

[15] W. Hou, G. Ossoyoglu, and Doglu. Error-constrained count query evaluation in relational databases. In Proc. of SIGMOD, 1991.

[16] M. C. Jaeger, G. Muhl, and S. Golze. Qos-aware composition of web services: An evaluation of selection algorithms. LNCS, 3760:646–661, October 2005.

[17] M. C. Jaeger, G. Rojec-Goldmann, and G. Muehl. QoS Aggregation for Web Service Composition using Workflow Patterns. In Proceedings of Eighth IEEE International Conference on Enterprise Distributed Object Computing (EDOC’04), volume 00, pages 149–159. IEEE Computer Society, 2004.

[18] J. M. Ko, C. O. Kim, and I.-H. Kwon. Quality-of-Service Oriented Web Service Composition Algorithm and Planning Architecture. Journal of Systems and Software, 81(11):2079–2090, 2008.

[19] U. Kuter and J. Golbeck. Semantic web service composition in social environments. In International Semantic Web Conference, pages 344–358, 2009.

[20] F. Lecue. Optimizing qos-aware semantic web service composition. In International Semantic Web Conference, pages 375–391, 2009.

[21] Q. Li, A. Liu, H. Liu, B. Lin, L. Huang, and N. Gu. Web services provision: solutions, challenges and opportunities (invited paper). In Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication (ICUIMC ’09), pages 80–87, New York, NY, USA, 2009. ACM.

[22] Y. Ling and W. Sun. A Supplement to Sampling-Based Methods for Query Size Estimation in a Database System. SIGMOD Record, 21(4):12–15, 1992.

[23] R. Lipton and J. Naughton. Query Size Estimation By Adaptive Sampling (Extended Abstract). In PODS ’90: Proc. of the 9th ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, pages 40–46. New York, NY, USA, 1990.

[24] J. S. Malak, M. Mohsenzadeh, and M. A. Seyyedi. Web service qos prediction based on multi agents. In ICCTD ’09: Proceedings of the 2009 International Conference on Computer Technology and Development, pages 265–269, 2009.

[25] D. Menasce. Composing Web Services: A QoS View. IEEE Internet Computing, 8(6):88–90, November 2004.

[26] H. Rahmani, G. GhasemSani, and H. Abolhassani. Automatic Web Service Composition Considering User Non-functional Preferences. Next Generation Web Services Practices, 0:33–38, 2008.

[27] E. Ruckhaus, E. Ruiz, and M. Vidal. Query optimization in the semantic web. In Theory and Practice of Logic Programming. Special issue on Logic Programming and the Web, 2006.

[28] S. Sardina and G. D. Giacomo. Composition of congolog programs. In C. Boutilier, editor, Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), pages 904–910, Pasadena, California, USA, July 11-17 2009.

[29] S. Sohrabi and S. A. McIlraith. Optimizing web service composition while enforcing regulations. In International Semantic Web Conference, 2009.

[30] H. Wada, P. Champrasert, J. Suzuki, and K. Oba. Multiobjective Optimization of SLA-aware Service Composition. In IEEE Congress on Services, Workshop on Methodologies for Non-functional Properties in Services Computing, 2008.

[31] T. Yu, Y. Zhang, and K.-J. Lin. Efficient algorithms for Web Services Selection with End-to-End QoS Constraints. ACM Trans. Web, 1(1):6, 2007.
