


POLYTICS: Provenance-Based Analytics of Data-Centric Applications

Pierre Bourhis
CNRS, UMR 9189 - CRIStAL

Daniel Deutch
Tel Aviv University

Yuval Moskovitch
Tel Aviv University

Abstract—We consider in this demonstration the analysis of complex data-intensive applications. We focus on three classes of analytical questions that are important for application owners and users alike: Why was a result obtained? What would be the result if the application logic or database is modified in a particular way? How can one interact with the application to achieve a particular goal? Answering these questions efficiently is a fundamental step towards optimizing the application and its use. Noting that provenance was a key component in answering similar questions in the context of database queries, we have developed POLYTICS, a system that employs novel provenance-based solutions for these analytic questions for data-centric applications. We propose to demonstrate POLYTICS using an online bicycle shop application as an example, letting participants play the role of both analysts and users.

Video: https://youtu.be/mOBpUh7luO4

I. INTRODUCTION

Our proposed demonstration focuses on the analysis of complex applications that rely on, and dynamically update, an underlying database in the course of their execution. The complexity of such applications leads to many challenges, faced by both the application owners and users. The owner needs to analyze the application and logs of its executions, so that she can identify bugs and misuses and ultimately optimize the application; the user typically wishes to identify optimal uses of the application. We start by presenting the example used throughout our demonstration. It involves an online bicycle shop allowing users to view bicycles, parts and accessories, and choose products to add to the shopping cart. Upon item selection, the system updates its price according to the availability of discount deals. Before payment, the user can remove products from the shopping cart, and if the order is not empty she can pay and exit. We now present the main scenario for our analysis.

Example 1.1: Consider a user who first adds a mountain bike to the shopping cart, and then a cycling helmet. Assuming a discount deal for a combined bike and helmet purchase, the helmet price is updated to a discount price. Then, before payment, the user removes the bike from the shopping cart. Due to a bug in the application logic, the helmet price may remain as if the discount applies, although the user has eventually not purchased the mountain bike. Upon viewing the wrong price, multiple questions arise: Why was it obtained as such? What would be the price if the owner changed the database or the application logic in a particular way? How can one interact with the system to obtain a correct price?

We model the online shop as a data-centric process whose partial state machine and underlying database are shown in Figures 2 and 1 respectively. Transitions are associated with insertion/deletion/modification queries, in turn captured by (unions of) Conjunctive Queries augmented with a +/−/M sign for insertion/deletion/modification. The database includes a relation for each item type, a Deals relation for special offers, a Cart relation storing the products selected by the user, and relations standing for user input (Rp, Rb, Ra and Rc) in different transitions. The formal model appears in [1].
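To make the scenario of Example 1.1 concrete, the following is a minimal Python sketch of the shop logic. It is not the authors' implementation: the dictionary encoding of the relations and the function names `add_item` and `remove_item` are assumptions made for illustration. Insertion is followed by the deal-based price modification, while removal leaves other prices untouched, which reproduces the stale discount.

```python
# Hypothetical in-memory encoding of the relations in Fig. 1 (names are assumptions).
bicycles = {"Mtn Bike": 2000}
accessories = {"Helmet": 50}
deals = [("Mtn Bike", "Helmet", 25)]  # (buy item, get item, discount price)


def add_item(cart, catalog, item):
    """Insertion step: copy the chosen item and its price into the cart,
    then apply the modification step for every deal whose two items are in the cart."""
    cart[item] = catalog[item]
    for buy, get, discount in deals:
        if buy in cart and get in cart:
            cart[get] = discount


def remove_item(cart, item):
    """Deletion step: drop the item. Prices of the remaining items are not
    recomputed, which is the bug described in Example 1.1."""
    cart.pop(item, None)


cart = {}
add_item(cart, bicycles, "Mtn Bike")
add_item(cart, accessories, "Helmet")
remove_item(cart, "Mtn Bike")
print(cart)  # {'Helmet': 25}
```

The helmet keeps its discounted price even though the bike is no longer in the cart; this is the result that the "why", "what-if" and "how-to" questions are asked about.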

In the context of database queries, a prominent approach (see e.g. [2], [3], [4]) for analyzing answers is based on the tracking of provenance, i.e. a record of the transformations that data undergoes. The idea is to efficiently track the “core” aspects of the transformations that have taken place, and then use this record for answering questions such as the above. Such a solution is absent in the context of data-centric applications. To address this need, we have implemented POLYTICS, PrOvenance-based anaLYTICS for data-centric applications. POLYTICS leverages the model and algorithms for provenance generation and usage from [1]; we next briefly highlight the role of provenance in answering analytical questions.

Example 1.2: Figure 1 depicts a database fragment with provenance annotations next to tuples. Intuitively, the provenance of an output tuple (in this example, tuples of the Cart table) describes the relevant actions leading to it; in the case of (Helmet, $25) these are p1 and p3, which are associated in the state machine with the insertion of bicycles and accessories (resp.) into the shopping cart. It also includes the relevant database tuples, such as the mountain bike (d1) and the helmet (d3), as well as the existence of a deal (d4) and the user choices (u1, u2). Importantly, it also shows the way in which they are combined to form the output (in this case via conjunction). This is highly useful for explaining a result tuple (i.e. answering a “why” question): such an explanation would contain only events that have affected the result, along with the relevant data items. In the above example, an explanation for the helmet price would be the insertion of the bike and the helmet into the shopping cart, and the deal on the helmet; the provenance expression “translates” into such an intuitive explanation. Furthermore, provenance is useful for “what-if” analysis, where every hypothetical scenario corresponds to a boolean assignment to the provenance annotations. For instance, assigning false to d1 and true to the other variables corresponds to the scenario where the mountain bike is not available. The shopping cart in this case would contain only the helmet, showing a price of $50 (instead of $25). Last, for “how-to” queries, a more complex provenance expression is generated, capturing the set of all possible executions rather than a particular one; a SAT solver is then used to find an execution yielding a tuple or sub-instance of interest, such as a desired helmet price.

Bicycles
Item        Price    Annotation
Mtn Bike    $2000    d1

Parts
Item        Price    Annotation
Wire Lock   $7       d2

Accessories
Item        Price    Annotation
Helmet      $50      d3

Deals
Buy Item    Get Item    Discount Price    Annotation
Mtn Bike    Helmet      $25               d4

Rb
Item        Annotation
Mtn Bike    u1

Rp
Item        Annotation
(empty)

Ra
Item        Annotation
Helmet      u2

Rc
Item        Annotation
Mtn Bike    u3

Cart
Item        Price    Provenance
Mtn Bike    $2000    (p1 ∧ (u1 ∧ d1)) ∧ ¬(p5 ∧ u3)
Helmet      $50      (p3 ∧ (u2 ∧ d3)) ∧ ¬(p3 ∧ (d4 ∧ (p1 ∧ (u1 ∧ d1)) ∧ (p3 ∧ (u2 ∧ d3))))
Helmet      $25      p3 ∧ (d4 ∧ (p1 ∧ (u1 ∧ d1)) ∧ (p3 ∧ (u2 ∧ d3)))

Fig. 1: Database with provenance

States: Homepage, Bicycles, Parts, Accessories, Cart, Payment

Transition queries:
Cart^{+,p1}(i, p) :- Bicycles(i, p), Rb(i)
Cart^{M,p1}(i1, op, i1, np) :- Cart(i1, op), Cart(i2, p), Deals(i2, i1, np)
Cart^{+,p2}(i, p) :- Parts(i, p), Rp(i)
Cart^{M,p2}(i1, op, i1, np) :- Cart(i1, op), Cart(i2, p), Deals(i2, i1, np)
Cart^{+,p3}(i, p) :- Acc(i, p), Ra(i)
Cart^{M,p3}(i1, op, i1, np) :- Cart(i1, op), Cart(i2, p), Deals(i2, i1, np)
Cart^{−,p5}(i, p) :- Cart(i, p), Rc(i)
Q_g^{p6} = H() :- Cart(x, y)

Fig. 2: Partial Process Logic
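The what-if reading of Example 1.2 can be sketched in a few lines of Python. This is an illustrative sketch, not the POLYTICS engine: the nested-tuple encoding and the names `evaluate` and `what_if` are assumptions, and the helmet's database annotation is taken to be d3, as in the Accessories relation and the text of the example. Each Cart tuple carries its provenance expression, and a hypothetical scenario is a boolean assignment under which the expressions are re-evaluated.

```python
def evaluate(expr, assignment):
    """Evaluate a provenance expression given as a variable name or a nested
    tuple ("and", e1, e2, ...) / ("not", e) under a boolean assignment."""
    if isinstance(expr, str):
        return assignment[expr]
    op, *args = expr
    if op == "not":
        return not evaluate(args[0], assignment)
    if op == "and":
        return all(evaluate(arg, assignment) for arg in args)
    raise ValueError(f"unknown operator: {op}")


# Provenance of the Cart tuples, following Fig. 1.
bike_added = ("and", "p1", "u1", "d1")
helmet_added = ("and", "p3", "u2", "d3")
deal_applied = ("and", "p3", "d4", bike_added, helmet_added)

cart_provenance = {
    ("Mtn Bike", 2000): ("and", bike_added, ("not", ("and", "p5", "u3"))),
    ("Helmet", 50): ("and", helmet_added, ("not", deal_applied)),
    ("Helmet", 25): deal_applied,
}


def what_if(assignment):
    """Return the Cart tuples whose provenance holds under the assignment."""
    return [t for t, expr in cart_provenance.items() if evaluate(expr, assignment)]


variables = ["p1", "p3", "p5", "u1", "u2", "u3", "d1", "d3", "d4"]
actual = dict.fromkeys(variables, True)
no_bike = {**actual, "d1": False}  # scenario: the mountain bike is not available

print(what_if(actual))   # [('Helmet', 25)]
print(what_if(no_bike))  # [('Helmet', 50)]
```

With every variable set to true the sketch yields the buggy cart of Example 1.1, and setting d1 to false yields the $50 helmet described above, without re-running the application.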

Related Work: The use of data provenance for “why” (e.g. [2]), “what-if” (e.g. [3]) and “how-to” (e.g. [4]) has been extensively studied, focusing on database queries rather than on data-centric processes. In [5] we have proposed a “what-if” analysis of data-centric processes; POLYTICS supports a significantly larger class of applications (specifically, ones that can update the underlying database) and of analysis questions (including “why” and “how-to”).

II. SYSTEM OVERVIEW

POLYTICS’s server side is implemented in C#, and its client side in AngularJS using the Bootstrap framework. The client web application is deployed on the Node.js JavaScript runtime environment and runs on Windows 10. The system architecture is depicted in Figure 3. The system requires the application’s owner to provide a description of the application, including an FSM describing its flow and its database. Each action in the FSM and each DB relation should further be associated with a textual description (for presentation purposes). POLYTICS may be used both by system analysts and users. For “what-if” and “why” analysis, users/analysts interact with a dedicated interface; for “how-to” analysis they interact with a wrapper of the original application allowing them to view recommendations for navigation. We next explain the main system components.

Provenance engine: The provenance engine consists of two generators: (1) a real-time generator, which tracks the provenance of executions and is used for “why” and “what-if” analysis; (2) a static generator, which computes, based on the application structure, a provenance expression capturing the set of all possible executions, used for “how-to” analysis.
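The how-to step can be illustrated with a small Python sketch under strong simplifying assumptions. Instead of the statically generated expression over all executions, it uses the single provenance expression of the (Helmet, $50) tuple from Fig. 1 (with d3 as the helmet's annotation), and instead of a SAT solver it enumerates assignments exhaustively. The names `helmet_at_50`, `FIXED`, `USER_CHOICES` and `find_choices` are hypothetical.

```python
from itertools import product


def helmet_at_50(v):
    """Provenance of the Cart tuple (Helmet, $50), written as a Python predicate."""
    bike_added = v["p1"] and v["u1"] and v["d1"]
    helmet_added = v["p3"] and v["u2"] and v["d3"]
    deal_applied = v["p3"] and v["d4"] and bike_added and helmet_added
    return helmet_added and not deal_applied


# Assumption: database tuples and actions are fixed, only user choices are free.
FIXED = {"p1": True, "p3": True, "d1": True, "d3": True, "d4": True}
USER_CHOICES = ["u1", "u2"]


def find_choices(target):
    """Exhaustive search standing in for a SAT solver: return the first
    assignment to the user choices under which the target tuple is produced."""
    for values in product([False, True], repeat=len(USER_CHOICES)):
        choices = dict(zip(USER_CHOICES, values))
        if target({**FIXED, **choices}):
            return choices
    return None


print(find_choices(helmet_at_50))  # {'u1': False, 'u2': True}
```

The satisfying assignment reads as a recommendation: choose the helmet (u2) but not the mountain bike (u1). Exhaustive enumeration is exponential in the number of free variables, which is why a SAT solver is the natural tool for expressions of realistic size.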

Analysis Interface: Provenance is fed to the analysis engine, whose output is shown in Fig. 4 (the “how-to” screen is omitted for lack of space). The “why” interface allows the user to choose an output tuple and view a textual representation of its reason, based on the provenance and on the textual description provided for each action and DB relation. The “what-if” screen allows the user to apply hypothetical modifications and observe, at interactive speed, their effect on the output.
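A "why" explanation of this kind can be sketched in Python as follows. This is an assumption-laden illustration rather than the actual interface logic: the description texts, the nested-tuple encoding and the names `variables` and `explain` are hypothetical, and d3 is used as the helmet's annotation. The sketch walks the provenance expression of the (Helmet, $25) tuple and maps each annotation to its owner-supplied description.

```python
# Hypothetical textual descriptions, as an application owner might supply them.
DESCRIPTIONS = {
    "p1": "a bicycle was added to the cart",
    "p3": "an accessory was added to the cart",
    "u1": "the user chose the mountain bike",
    "u2": "the user chose the helmet",
    "d1": "the mountain bike is listed in Bicycles",
    "d3": "the helmet is listed in Accessories",
    "d4": "a deal offers the helmet at $25 with a mountain bike",
}


def variables(expr):
    """Yield annotation names left to right. Negated sub-expressions are
    skipped in this sketch, so only contributing events are reported."""
    if isinstance(expr, str):
        yield expr
    elif expr[0] == "and":
        for sub in expr[1:]:
            yield from variables(sub)


def explain(expr):
    """Return one description per distinct annotation, in order of appearance."""
    distinct = dict.fromkeys(variables(expr))
    return [DESCRIPTIONS[name] for name in distinct]


helmet_at_25 = ("and", "p3", "d4",
                ("and", "p1", "u1", "d1"),
                ("and", "p3", "u2", "d3"))

for line in explain(helmet_at_25):
    print("-", line)
```

Only annotations that occur in the tuple's provenance are mentioned, so unrelated items such as the wire lock (d2) never appear in the explanation.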

III. DEMONSTRATION SCENARIO

We will demonstrate the usefulness of POLYTICS in the context of a simple online bicycle shop as described above.

[Diagram not reproduced. Labels recovered from the figure: User/Analyst, Smart shop, Online shop, Provenance engine (Static, Real time), Analysis interface, Actions, Response, DB + FSM, Provenance, How-to queries, Navigation sequence, Why queries, Explanation, What-if queries, Results.]

Fig. 3: System architecture

Fig. 4: Analysis output

We will first introduce the shop to participants and allow them to freely interact with it, while we track provenance. We will then use the tracked provenance for “why” and “what-if” analysis: we will show explanations computed by POLYTICS for items in the shopping cart, and will consider hypothetical modifications to the application logic and data, observing their anticipated effect on the execution and its artifacts. Further, we will show the usefulness of how-to analysis to users; to this end, we will let participants select products and desired prices (which we give them, simulating a case where a friend has reported a purchase with a particular price), and let POLYTICS generate a recommended sequence of navigation actions that would lead to a purchase at the desired price. Finally, we will allow the audience to look “under the hood”, showing and explaining the underlying provenance expressions.

ACKNOWLEDGMENT

This research was partially supported by the Israeli Science Foundation (ISF, grant No. 1636/13), by the Blavatnik Interdisciplinary Cyber Research Center and by Intel.

REFERENCES

[1] P. Bourhis, D. Deutch, and Y. Moskovitch, “Analyzing data-centric applications: Why, what-if, and how-to,” in ICDE, 2016.

[2] S. Roy and D. Suciu, “A formal approach to finding explanations for database queries,” in SIGMOD, 2014.

[3] S. Assadi, S. Khanna, Y. Li, and V. Tannen, “Algorithms for provisioning queries and analytics,” in ICDT, 2016.

[4] A. Meliou, W. Gatterbauer, and D. Suciu, “Reverse data management,” PVLDB, vol. 4, no. 12, 2011.

[5] D. Deutch, Y. Moskovitch, and V. Tannen, “PROPOLIS: provisioned analysis of data-centric processes,” PVLDB, vol. 6, no. 12, 2013.