The Active XML project: an overview Serge Abiteboul · Omar Benjelloun · Tova Milo Lazy Query Evaluation for Active XML Abiteboul, Benjelloun, Cautis, Manolescu, Milo, Preda presented by: Irene Genitsaridh Univ. of Crete hy561 April 28, 2009


The Active XML project: an overview

Serge Abiteboul · Omar Benjelloun · Tova Milo

Lazy Query Evaluation for Active XML

Abiteboul, Benjelloun, Cautis, Manolescu, Milo, Preda

Presented by: Irene Genitsaridh, Univ. of Crete, hy561, April 28, 2009


The problem addressed is web data management.

Web characteristics:
- High heterogeneity of data sources.
- Autonomy of data sources.
- The scale of the Web.

Result: the Web revolution is setting up new standards.


A language based on XML, Web services and XQuery, for complex data management tasks.

XML is a suitable model for Web data exchange.

XQuery is a query language for XML promoted by the W3C (the "SQL of the Web").

Web services are network-accessible programs taking XML parameters and returning XML results.


Embedding calls to Web services inside XML documents.


Materialization: the service is invoked using the SOAP protocol, and the result of this invocation is used to enrich the document.

The same document may therefore have different semantics at different times.
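Materialization of a single call can be sketched in Python as follows. This is only an illustration under stated assumptions: the `sc` element and its `service` attribute are a simplified, hypothetical encoding of an embedded call, not the actual AXML syntax, and a local stub stands in for the SOAP invocation.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for remote Web services: name -> function returning XML text.
# A real peer would invoke the service over SOAP instead.
SERVICES = {
    "getTemperature": lambda city: "<temp>15</temp>",
}

def materialize_once(root):
    """Invoke each call node currently in the tree and replace it with its result."""
    # Snapshot (parent, call) pairs first so the tree is not mutated while iterating.
    pairs = [(p, c) for p in root.iter() for c in list(p) if c.tag == "sc"]
    for parent, call in pairs:
        result = ET.fromstring(SERVICES[call.get("service")](call.text))
        index = list(parent).index(call)
        parent.remove(call)
        parent.insert(index, result)   # the result takes the place of the call
    return len(pairs)

doc = ET.fromstring(
    '<city><name>Paris</name><sc service="getTemperature">Paris</sc></city>'
)
materialize_once(doc)
print(ET.tostring(doc, encoding="unicode"))
# prints <city><name>Paris</name><temp>15</temp></city>
```

Running the same function on another day, against a service that returns a different temperature, would yield a different document, which is the sense in which the semantics depend on time.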


AXML services are Web services that accept AXML documents as input parameters, and return AXML documents as results.

Materialization becomes a recursive process, since calling an AXML service may return some data that may contain new service calls.

After invoking [email protected].


The data exchanged by Web services is controlled by schemas for their input and output, specified within a WSDL description. Similarly, schemas are used to control AXML data exchange (presented here in a DTD-like syntax).

The schema distinguishes between accepting a concrete type, e.g., a temperature element, and accepting a service call returning data of this particular type.

The actual syntax in the system is an extension of XML Schema.


A site about a city's night-life (restaurants, movies). Query: /goingOut/movies//show[title="The Hours"]/schedule.

There is no point in materializing calls below the path /goingOut/restaurants. We also want to avoid materializing a call found below /goingOut/movies when it cannot contribute to the query.

Solution (naive approach): materialize all the calls in the document recursively until a fixpoint is reached, then run the query over the resulting document.
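The naive approach can be sketched in Python as below. The `sc` element, the service names and the returned data are all hypothetical stand-ins chosen for illustration, and the round limit is an added safety guard, not part of the approach as stated.

```python
import xml.etree.ElementTree as ET

# Hypothetical services; getShows returns data that itself embeds a new call.
SERVICES = {
    "getShows": lambda: '<show><title>The Hours</title><sc service="getSchedule"/></show>',
    "getSchedule": lambda: "<schedule>20:30</schedule>",
    "getRestaurants": lambda: "<restaurant>Chez Nous</restaurant>",
}

def materialize_all(root, max_rounds=10):
    """Naive approach: invoke every call, round after round, until none is left."""
    invoked = []
    for _ in range(max_rounds):          # guard against non-terminating recursion
        pairs = [(p, c) for p in root.iter() for c in list(p) if c.tag == "sc"]
        if not pairs:                    # fixpoint: no call nodes remain
            break
        for parent, call in pairs:
            name = call.get("service")
            invoked.append(name)
            index = list(parent).index(call)
            parent.remove(call)
            parent.insert(index, ET.fromstring(SERVICES[name]()))
    return invoked

doc = ET.fromstring(
    '<goingOut><movies><sc service="getShows"/></movies>'
    '<restaurants><sc service="getRestaurants"/></restaurants></goingOut>'
)
calls = materialize_all(doc)
# Only after full materialization is the query evaluated.
schedules = [s.text for s in doc.findall("./movies//show[title='The Hours']/schedule")]
print(calls, schedules)
```

In this toy run the restaurant service is invoked even though the query never looks below /goingOut/restaurants, which is exactly the waste that motivates a lazier strategy.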


Evaluation approach: lazy evaluation. Identify in advance a tight superset of the service calls that should actually be invoked to answer a query.

General problem: service calls may appear anywhere in the data, and dynamically in the results of previously materialized calls.

Solution: impose sufficient conditions for termination, or halt the computation if a full state is not reached within some time limit.


Sample Active XML document.


A sample query.

Queries are modeled by tree patterns. The relevant functions in the above AXML document are 1, 3, 4 and 10.


Computing the set of relevant service calls: Given a query, generate a set of auxiliary queries that, when evaluated on a document, retrieve all service calls that are relevant to the query.

Advantage: in contrast to the naive approach, only functions that may contribute to the query result are invoked.

Disadvantage: There is a tradeoff between accuracy and efficiency. It is expensive to exactly detect which calls are relevant and which are not.

The challenge is thus to find the right balance between the efforts spent on ruling out irrelevant calls and the actual time saved by avoiding their invocation.


Pruning via typing: The return types of services are used to rule out more irrelevant service calls.
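A rough Python sketch of the idea, assuming each service declares the set of element labels its result may contain. The table `RETURN_TYPES`, the service `getNearbyMuseums` and the function name are hypothetical; real signatures would come from the service schemas.

```python
# Hypothetical signatures: service name -> element labels its result may contain.
RETURN_TYPES = {
    "getNearbyRestos": {"restaurant", "name", "address", "rating"},
    "getNearbyMuseums": {"museum", "name", "address"},
}

def may_contribute(service, needed_labels):
    """Keep a call only if its declared output can contain a needed label."""
    declared = RETURN_TYPES.get(service)
    if declared is None:          # unknown signature: stay safe and keep the call
        return True
    return bool(declared & set(needed_labels))

# The query position below "nearby" expects restaurant elements.
calls = ["getNearbyRestos", "getNearbyMuseums", "getSomethingElse"]
kept = [c for c in calls if may_contribute(c, {"restaurant"})]
print(kept)
```

Calls with no known signature are kept, so the pruning can only remove calls that are certainly irrelevant.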

Pushing queries: for instance, getNearbyRestos may return many restaurants, but we are only interested in five-star ones and, more precisely, only in their names and addresses.

Push to the function call a precise subquery, specifying that it has to apply the five-star rating selection and return only the relevant names and addresses.
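The effect of pushing a subquery can be illustrated with a small Python sketch. The data is invented, and the `rating` and `fields` parameters are a hypothetical way of passing the selection and projection to the service; the source does not specify how the subquery is transmitted.

```python
# Invented sample data standing in for what the remote service holds.
RESTAURANTS = [
    {"name": "Chez Nous", "address": "1 Main St", "rating": "*****", "phone": "555-0100"},
    {"name": "Quick Bite", "address": "2 Main St", "rating": "**", "phone": "555-0101"},
    {"name": "La Table", "address": "3 Main St", "rating": "*****", "phone": "555-0102"},
]

def get_nearby_restos(rating=None, fields=None):
    """Hypothetical service; the optional arguments are the pushed subquery."""
    rows = [r for r in RESTAURANTS if rating is None or r["rating"] == rating]
    if fields is not None:
        rows = [{f: r[f] for f in fields} for r in rows]   # projection on the service side
    return rows

everything = get_nearby_restos()                 # without pushing: all data is shipped
pushed = get_nearby_restos(rating="*****", fields=("name", "address"))
print(len(everything), pushed)
```

With the subquery pushed, the selection and projection run where the data lives, and only the two relevant fields of the five-star restaurants travel back to the caller.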


Algorithms to find a complete relevant rewriting: linear path queries (LPQ)

1. /*()
2. /nyHotels/*()
3. /nyHotels/hotel/*()
4. /nyHotels/hotel/name/*()
5. /nyHotels/hotel/rating/*()
6. /nyHotels/hotel/nearby/*()
7. /nyHotels/hotel/nearby//*()
8. /nyHotels/hotel/nearby//restaurant/*()
9. /nyHotels/hotel/nearby//restaurant/name/*()
10. /nyHotels/hotel/nearby//restaurant/address/*()
11. /nyHotels/hotel/nearby//restaurant/rating/*()
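One way to derive such a list from a tree pattern is sketched below in Python. The tuple encoding of the query and the function name are assumptions made for this illustration, and the procedure is a reconstruction that reproduces the eleven paths listed above rather than the authors' own algorithm.

```python
# A query node is (label, axis, children); axis is "/" (child) or "//" (descendant).
QUERY = ("nyHotels", "/", [
    ("hotel", "/", [
        ("name", "/", []),
        ("rating", "/", []),
        ("nearby", "/", [
            ("restaurant", "//", [
                ("name", "/", []),
                ("address", "/", []),
                ("rating", "/", []),
            ]),
        ]),
    ]),
])

def linear_path_queries(query):
    """Return the linear paths that retrieve calls possibly relevant to the query."""
    paths = ["/*()"]                       # a call directly under the document root

    def visit(node, prefix):
        label, axis, children = node
        if axis == "//":                   # a call at any depth below prefix may yield the node
            paths.append(prefix + "//*()")
        here = prefix + axis + label
        paths.append(here + "/*()")        # a call directly under this node
        for child in children:
            visit(child, here)

    visit(query, "")
    return paths

for number, path in enumerate(linear_path_queries(QUERY), start=1):
    print(number, path)
```

Note that value conditions such as the five-star rating are not part of the encoding at all: the paths depend only on labels and axes.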


Linear path queries: one linear path query is constructed per node of the query.

The approach is correct, but usually inaccurate. It ignores filtering conditions in the path from the root or in other branches that could make some of the functions irrelevant (e.g. there is no chance that a getNearbyRestos() function node under a hotel is relevant if the hotel rating is not "*****").


Node Focused Queries (NFQ): instead of constructing one linear path query per node in the query, an algorithm called NFQ is used that includes the filtering conditions from the original query.

In contrast with linear path queries, the function nodes that are relevant for a query q are now precisely the ones retrieved by the NFQs of q.


Service call sequencing: the relationships among the calls are analyzed to derive an efficient sequence of call invocations appropriate to answer the query.

An algorithm based on NFQ, called NFQA, is used to compute a (possibly infinite) relevant rewriting. If it terminates, the obtained document is complete for the query q.


F-guide: A specialized access structure in the style of data-guides is used to speed up the search for relevant calls.

The structure acts as an index, summarizing concisely the occurrences of functions (service calls) in the documents (hence its name, F-guide).

The F-guide also holds the path extents: for each path we keep pointers to the corresponding function call nodes in the document.
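A minimal Python sketch of such an index, assuming calls are encoded as hypothetical `sc` elements and that a plain dictionary from label paths to node lists is an acceptable stand-in for the real structure. The service name `getHotels` is invented for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_f_guide(root, call_tag="sc"):
    """Map each label path to the call nodes found directly under it."""
    guide = defaultdict(list)

    def walk(node, path):
        for child in node:
            if child.tag == call_tag:
                guide[path].append(child)      # path extent: pointers to call nodes
            else:
                walk(child, path + "/" + child.tag)

    walk(root, "/" + root.tag)
    return dict(guide)

doc = ET.fromstring(
    "<nyHotels>"
    '<hotel><name>Best Western</name><nearby><sc service="getNearbyRestos"/></nearby></hotel>'
    '<hotel><name>Plaza</name><nearby><sc service="getNearbyRestos"/></nearby></hotel>'
    '<sc service="getHotels"/>'
    "</nyHotels>"
)
guide = build_f_guide(doc)
for path, extent in sorted(guide.items()):
    print(path, [call.get("service") for call in extent])
```

Paths that lead to no call are simply absent from the dictionary, so looking up the calls under a given path does not require traversing the document again.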


A distributed architecture based on the peer-to-peer paradigm is adopted to support the AXML language.

Each participant may act both as a client and as a server.

AXML peers have essentially three facets:
- Repository.
- Server (may provide Web services for other peers to use).
- Client (may invoke the corresponding Web services that other peers provide).


Enforce the following policy: Temperature information is refreshed daily.

Simple constructs in the language support specifying when service calls are invoked. So the language will enable specifying the above policy.

In this situation:
- Service calls should generally be kept inside AXML documents, for future reuse.
- Materialization will no longer replace service calls by their results, but will append the results of each call next to it.
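A daily refresh policy of this kind might look as follows in Python. The `sc` element, the `period` and `last` attributes and the `from-call` marker are all hypothetical names invented for this sketch. It also assumes that a fresh result replaces the previous one, which the source does not state.

```python
import xml.etree.ElementTree as ET

DAY = 24 * 3600  # seconds

def refresh(parent, call, invoke, now):
    """Re-invoke `call` if its period has elapsed; keep the call, update its result."""
    period = float(call.get("period", DAY))        # hypothetical attribute
    last = call.get("last")                        # hypothetical attribute
    if last is not None and now - float(last) < period:
        return False                               # result is still fresh
    children = list(parent)
    pos = children.index(call)
    # Drop the previous result, which this sketch marks with from-call="true".
    if pos + 1 < len(children) and children[pos + 1].get("from-call") == "true":
        parent.remove(children[pos + 1])
    result = ET.fromstring(invoke())
    result.set("from-call", "true")
    parent.insert(pos + 1, result)                 # placed next to the call, not instead of it
    call.set("last", str(now))
    return True

city = ET.fromstring('<city><sc service="getTemperature"/></city>')
call = city.find("sc")
readings = iter(["<temp>15</temp>", "<temp>18</temp>"])   # stub service answers
invoke = lambda: next(readings)
outcomes = [
    refresh(city, call, invoke, now=0),        # first invocation
    refresh(city, call, invoke, now=3600),     # one hour later: skipped
    refresh(city, call, invoke, now=DAY),      # one day later: refreshed
]
print(outcomes, ET.tostring(city, encoding="unicode"))
```

Because the call node stays in the document, the same function can be applied again at any later time without needing to know where the data originally came from.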


In an AXML peer, AXML services can be defined as parameterized queries or updates over the peer’s AXML documents.

Sample Service


The AXML peer is implemented in Java.

The AXML peer relies on the Apache Xerces XML parser to parse and manipulate documents. It also uses the Apache Xalan processor for XPath queries and XSLT transformations.

The Tomcat servlet engine is used because the AXML peer needs to act as a Web server. AXML documents can be turned into a Web application through Java Server Pages.

Axis is a Java toolkit that enables Web services functionality both on the server-side and the client-side. The AXML peer relies on the X-OQL engine to execute complex queries on XML documents.


Some of the applications developed using AXML peers:
- Peer-to-peer auctions: the main goal of this application is to illustrate the flexible discovery mechanism of new peers and auctions.
- Electronic patient record management: the goal of this application is to show that AXML can seamlessly manage distributed data and the privacy of this data. This is done by combining the AXML language with the GUPster framework (access control).

Academic and industrial collaborations:
- Distribution of Mandriva Linux: aims at better management of the production and distribution of open source software.
