[IEEE 22nd International Conference on Data Engineering (ICDE'06) - Atlanta, GA, USA (2006.04.3-2006.04.7)] 22nd International Conference on Data Engineering (ICDE'06) - Syntactic Rule Based Approach toWeb Service Composition

Download [IEEE 22nd International Conference on Data Engineering (ICDE'06) - Atlanta, GA, USA (2006.04.3-2006.04.7)] 22nd International Conference on Data Engineering (ICDE'06) - Syntactic Rule Based Approach toWeb Service Composition

Post on 27-Mar-2017




0 download

Embed Size (px)


  • A Syntactic Rule Based Approach to Web Service Composition

    Ken PuUniversity of Torontokenpu@cs.toronto.edu

    Vagelis HristidisFlorida International University


    Nick KoudasUniversity of Toronto



    This paper studies a problem of web service composi-tion from a syntactic approach. In contrast with other ap-proaches on enriched semantic description such as state-transition description of web services, our focus is in thecase when only the input-output type information from theWSDL specifications is available.

    The web service composition problem is formally for-mulated as deriving a given desired type from a collec-tion of available types and web services using a prescribedset of rules with costs. We show that solving the minimalcost composition is NP-complete in general, and presenta practical solution based on dynamic programming. Ex-periements using a mixture of synthetic and real data setsshow that our approach is viable and produces good results.

    1 Introduction

    Several efforts, including the Web Service ConversationLanguage (WSCL), the Business Process Execution Lan-guage for Web Services (BPELWS), and the integration ofweb service calls on XQuery [10] address the issue of webservice composition. A natural problem is to automate thecomposition of web services. Towards this goal, a large cor-pus of work has focused on composing semantically rich(semantically annotated) web services. For instance, in[2, 5, 1] web services are modeled as finite-state machinesthat reflect their functionality, and their compositions aremodeled as compositions of state machines. Such a seman-tically rich description of the web services allows discoveryof high-quality compositions [2] as well as verification ofexisting ones [5]. However, WSDL, which is the currentlywidely adopted standard for description of web services, of-fers limited syntactic description of the services. With onlysuch limited information on the input-output types of theweb services, is it possible to automatically produce com-positions to perform specific tasks?

    In this paper, we propose a syntactic approach to webservice composition, given only their WSDL descriptions.In particular, we view the web services as black boxes ca-pable of transforming XML fragments of the input type toXML fragments of the output type, and hence the prob-

    lem of web service composition becomes a type-derivationproblem as follows: Given a collection T0 of base types, de-scribed in XML Schema, a collection S of web services, anda target type t, derive t from T0 using the services in S.

    The target type is derived by applying a set of derivation-rules, or simply rules. These rules, which we will presentin detail, describe the ways in which new types are derivedfrom existing ones by application of web services. The tar-get type is derived from the base types by a series of ap-plications of these rules, which include sequential or paral-lel invocation of web services, and iteration over contain-ers. Each step of a derivation (application of a rule) incursa cost according to a cost model, which generally favorsderivations with fewer rule applications (i.e., fewer struc-tural transformations and web service applications).Example: Consider the type books which is a collection ofbook elements. Each book element is identified either by itstitle or its ISBN number, along with the author names andthe publisher.

    books : book[1,] :


    [title : string orISBN : string

    authors : author[1,] :[

    fname : stringlname : string

    publisher : string

    Suppose that we have two web services described as fol-lows:

    WS1 :

    [title : stringpublisher : string

    [bestprice : numericbookstore : string

    WS2 : ISBN : string

    [bestprice : numericbookstore : string

    Suppose that one wishes to find the best prices of all thebooks in a collection (of type books). The target type is:

    bestprices : bestprice[1,] :[

    bestprice : numericbookstore : string

    One can manually verify that the target type bestpricescan indeed be derived from the collection of base types T0 ={books} and web services S= {WS1, WS2}. The derivation is notunique, but a sensible one is the following.

    Derivation Plan:1. Iterate through the collection books,2. if it is identified by its title, then3. pass the title and publisher information to WS1,4. else, if it is identified by its ISBN, then5. pass the ISBN number to WS2.6. Combine the outputs and form the collection bestprices.

    Proceedings of the 22nd International Conference on Data Engineering (ICDE06) 8-7695-2570-9/06 $20.00 2006 IEEE

  • The rules proposed in this paper support this derivationas we provide rules for iteration (line 1), if-then-else on atype (line 2 and 4), type-construction and destruction (lines3, 5 and 6), and application of web services (lines 3 and 5).We use a cost model to eliminate non-sensible derivations(such as using the authors last name as the publisher, andso on).

    Naturally, the quality of the generated candidate compo-sitions depends on the choice of the derivation rules. The setof rules used in this paper, which are shown to be sound ex-perimentally, are based on previous work on schema match-ing, type derivation and tree edit distance, which are ar-eas related to this work as we explain in Section 2. How-ever, our algorithmic framework is generic and a differentset of rules can be specified according to the applicationneeds 1. Our case studies show that one can still find use-ful derivations using real-life WSDL files. In this paper, wepresent our theoretical and algorithmic results on the type-derivation problem.

    2 Related Work

    Tree edit distance works [16, 21] (see [3] for a sur-vey) define a minimal set of operations (typically add node,delete node, and relabel node) to tackle the problem oftransforming a labeled tree T1 to a labeled tree T2 apply-ing a minimum sequence of operations. Our problem ismore complex since the labels of an XML Schema tree carryadditional information (e.g., list, union, complex type, tagname, base-type, and so on) which makes plain relabelinginapplicable. For example, if two XML Schema trees differonly on the label of a single node, whose label is unionfor the first tree and list for the second, then they haveunit distance in tree edit distance, but they are incompatiblein our framework. Tree matching works [4] have the samelimitation.

    Dong et al. [6] propose a method to discover similarWeb Services based on the textual descriptions (in WSDL)and their input/output parameters names. It does not dealwith composition however. Paolucci et al. [11] describe aframework to semantically match Web services describedusing DAML-S. Recent work has tackled the problem ofcomposing web services by exploiting the flow diagramsthat represent them. In particular, web services are viewedas state machines [20, 18, 1, 2].

    Wombacher et al. [18] determines if a composition isvalid by the intersection of the corresponding state ma-chines. Furthermore, since the web services are modeled astransition structures, one can verify the web service compo-sition by simulation [9] or by model-checking methods [5].Such works, which are complementary to our approach, as-sume access to the behavioral characteristics of the servicewhich may or may not be available; we only assume ac-cess to the service type signature which is typically avail-able through WSDL (and XML Schema). In SWORD [12],

    1For instance, if we want to create more conservative compositions, wecan remove the optional rule (described in Figure 3).

    web services are treated as input-output functions, and a setof rules of composition is specified. In our work, we alsotreat web services as functions, but work with a more com-plex set of rules that are capable constructing more sophis-ticated composition plans.

    Schema matching tackles the problem of finding corre-spondences between the elements of two given schemas.Much work has been conducted on matching relationalschemata [14], but recently there has also been work onmatching of semi-structured schemata [13, 17, 15, 7, 8].These algorithms match two schemata by matching theirtree structures, exploiting possible semantic constraints[7, 17, 15]. In [8], the management of these mappings isconsidered. In contrast, our work tackles the problem oftransforming a set of base types (schemata) to a target typeusing the available web services as transformation tools.The web services can be invoked in various ways duringthe transformation process according to a set of derivationrules.

    3 The Optimal Type-Derivation Problem

    In this section, we formally define the optimal type-derivation problem with respect to an arbitrary set of deriva-tion rules. Then, we introduce a set of XML-centric ruleswhich are used to capture the ways in which XML Schemascan be modified, combined, and web services invoked, inthe derivation process. These rules naturally correspond toprogramming constructs found in BPEL, such as sequentialand parallel invocations, iteration over lists, and branch-ing based on node-types. We also included limited struc-tural transformations, which can easily be implemented byXQuery, as part of the rule-set. Derivations using these rulescorrespond to compositions of web services coupled withpossible structural transformations.

    We represent an XML Schema as a tree of labeled nodes.The fragment of XML Schema we consider contains com-plex types of sequence, choice, and elements can haveminOccurs and maxOccurs constraints. As a labeled tree,each node can be a tag name, a complex type construc-tor (sequence, union), or a primitive type such as int orstring. A union-node represents a choice-complex type,that is, the instance of which can only be one of the chil-dren types in accordance with the XML Schema specifica-tion. Each node may optionally have a multiplicity modifier[m,n] indicating that in the instance, its occurrence is be-tween m and n inclusively this corresponds to the minOc-curs and maxOccurs constraints in XML Schema. We calla node optional if it is either a union-node, or a node withminOccurs = 0 constraint. Types can, thus, be representedby terms such as UNION(title(string), ISBN(string)). A web ser-vice is simply a multi-input-multi-output function of theform f : s1,s2, ,sn t1, tm where si and ti are com-plex types.

    PROBLEM FORMULATION: Let Ai and B be sets of types,called environments, and si and t types. We write Ai si tomean that the type si can be derived from the environment

    Proceedings of the 22nd International Conference on Data Engineering (ICDE06) 8-7695-2570-9/06 $20.00 2006 IEEE

  • Ai. Intuitively, an environment Ai is the collection of avail-able types, and Ai si is to say that, by some means, we canderive si from Ai. A derivation-rule, or simply a rule, hasthe form

    A1 s1 An snB t


    It reads that if si can be derived from the environment Ai,then t can be derived from the environment B. The rule canbe written as r(B t) to indicate that r derives t from theenvironment B. A rule is grounded if it requires no Ai si,written /0Bt . We further assume that each rule r carries a costgiven by a generic cost function cost(r(B t)). Derivationrules can be composed in the natural way to form deriva-tions.

    Formally, a derivation D(B t) is defined as,

    any rule r(B t) is a derivation, and

    if r(B t) is a rule of the formA1 s1 An sn

    B tr,

    and Di(Ai si) are derivations, then

    D1(A1 s1) Dn(An sn)

    B tr is a derivation.

    Nothing else is a derivation for B t.

    A derivation D(B t) = {Di(Aiti)}iBt r can be viewed as a tree

    of rules, with the root being the rule AitiBt r, and sub-treesDi(Ai ti). It says that type t is derived from the environ-ment B by the derivation D. The derivation is groundedif its leaf-rules are all grounded. The cost for a derivationD(B t), written D(B t) is defined as,

    if D(B t) = r(B t), then D(B t)= cost(r(B t)),

    if D( t) = D1(A1s1) Dn(Ansn)Bt r, then

    D(B t)= cost(r(B t))+g(D1(A1 s1), . . . ,Dn(An sn)),

    where g( ) is either sum, or maximum 2 depending on therule r.

    Definition 1. The optimal type derivation problem is: givena rule set R, a cost function cost, an environment A and atype t, find a grounded derivation D(A t) with the minimalcost D(A t).

    The classical tree-edit distance problem [3] is a specialinstance of the optimal type derivation problem where theenvironment contains the source tree, and the derived typeis the target tree. There are three rules: insert-, delete-and replace, each of unit cost. Tree edit-scripts corre-spond to derivations where the cost is always additive. Thederivations, as defined, generalize tree-edit scripts in vari-ous ways. Derivations, in general, edit multiple trees, ortypes, in the environment to derive the new tree, and the

    2In most cases, the cost of a derivation is the sum the costs of the sub-derivations and the applied rule. However, as defined in Figure 3, if theapplied rule is union-rule, then we take the maximum of the costs of thesub-derivations.

    rules allowed are a richer set that includes the basic tree-edit operations, but also ways of invoking web services.

    From this point, we refer to types in the environment asthe base-types, and the type to be derived as the target type.

    3.1 Derivation rules for XML Schema

    We use rules to express ways in which XML-documents(more accurately XML Schemas) can be transformed, com-bined and evaluated by web services. The set of rules in-clude basic structural transformations (similar to the clas-sical tree-edit operations), merging structures (substitutingparts of a schema document with parts from other schemadocuments), and finally applying web services (evaluatinga tuple of documents using available web services). In thisframework, the web service composition problem is for-mally posed as a type-derivation problem in which the base-types are the available inputs, and the target type is the de-sired output. A derivation of the target type is then a plan ofhow the available web services can be composed, togetherwith possibly some structural modifications to the interme-diate results, to produce the desired output.

    /0A t

    MEM, if t A.A1 t1 A2 t2 Ai tn

    i Ai c(t1, t2, . . . ,tn)CON

    A tA t|p

    DES, where p is not under an optional node.

    A tA t|p

    DELA t A s

    A t p


    A t A sA t[s]p


    Figure 1. Structural transformations

    BASIC STRUCTURAL TRANSFORMATION RULES: Thefirst set of rules deals with structural transformations. Werefer to this set of rules as the structural rules, shown inFigure 1.Membership: If type t is part of the environment, then it canbe derived from the environment by the membership-rule(MEM).Construct: One can derive new types from existingtypes by introducing new tags. Suppose one hasderived ISBN(string) and city(string), then cer-tainly, we should be able to construct a new elementX(ISBN(string), city(string)) by introducing thenew tag X. This is captured by the construction-rule (CON)shown in Figure 1.Destruct: Contrary to the construct-rule, suppose thatw...


View more >