POLYTICS: Provenance-Based Analytics of Data-Centric Applications
Pierre Bourhis, CNRS, UMR 9189 - CRIStAL
Daniel Deutch, Tel Aviv University
Yuval Moskovitch, Tel Aviv University
Abstract: We consider in this demonstration the analysis of complex data-intensive applications. We focus on three classes of analytical questions that are important for application owners and users alike: Why was a result obtained? What would be the result if the application logic or database is modified in a particular way? How can one interact with the application to achieve a particular goal? Answering these questions efficiently is a fundamental step towards optimizing the application and its use. Noting that provenance was a key component in answering similar questions in the context of database queries, we have developed POLYTICS, a system that employs novel provenance-based solutions for these analytic questions for data-centric applications. We propose to demonstrate POLYTICS using an online bicycle shop application as an example, letting participants play the role of both analysts and users.
Video: https://youtu.be/mOBpUh7luO4
I. INTRODUCTION
Our proposed demonstration focuses on the analysis of complex applications that rely on, and dynamically update, an underlying database in the course of their execution. The complexity of such applications leads to many challenges, faced by both the application owners and users. The owner needs to analyze the application and logs of its executions, so that she can identify bugs and misuses and ultimately optimize the application; the user typically wishes to identify optimal uses of the application. We start by presenting the example used throughout our demonstration. It involves an online bicycle shop allowing users to view bicycles, parts and accessories, and choose products to add to the shopping cart. Upon item selection, the system updates its price according to the availability of discount deals. Before payment, the user can remove products from the shopping cart, and if the order is not empty she can pay and exit. We now present the main scenario for our analysis.
Example 1.1: Consider a user who first adds a mountain bike to the shopping cart, and then a cycling helmet. Assuming a discount deal for a combined bike and helmet purchase, the helmet price is updated to the discounted price. Then, before payment, the user removes the bike from the shopping cart. Due to a bug in the application logic, the helmet price may remain as if the discount applies, although the user ends up not purchasing the mountain bike. Upon viewing the wrong price, multiple questions arise: Why was it obtained? What would the price be if the owner changed the database or the application logic in a particular way? How should one interact with the system to obtain a correct price?
We model the online shop as a data-centric process whose partial state machine and underlying database are shown in Figures 2 and 1, respectively. Transitions are associated with insertion/deletion/modification queries, in turn captured by (union of) Conjunctive Queries augmented with a +/−/M sign for insertion/deletion/modification. The database includes a relation for each item type, a Deals relation for special offers, a Cart relation storing the products selected by the user, and relations standing for user input (Rp, Rb, Ra and Rc) in different transitions. The formal model appears in [1].
In the context of database queries, a prominent approach (see e.g. [2], [3], [4]) for analyzing answers is based on the tracking of provenance, i.e. a record of the transformations that data undergoes. The idea is to efficiently track the “core” aspects of the transformations that have taken place, and then use this record for answering questions such as the above. Such a solution is absent in the context of data-centric applications. To address this need, we have implemented POLYTICS, PrOvenance-based anaLYTICS for data-centric applications. POLYTICS leverages the model and algorithms for provenance generation and usage from [1]; we next briefly highlight the role of provenance in answering analytical questions.
Example 1.2: Figure 1 depicts a database fragment with provenance annotations next to tuples. Intuitively, the provenance of an output tuple (in this example, tuples of the Cart table) describes the relevant actions leading to it; in the case of (Helmet, $25) these are p1 and p3, which are associated in the state machine with insertion of bicycles and accessories (resp.) to the shopping cart. It also includes relevant database tuples, such as the mountain bike (d1) and the helmet (d3), as well as the existence of a deal (d4) and the user choices (u1, u2). Importantly, it also shows the way in which they are combined to form the output (in this case via conjunction). This is highly useful for explaining a result tuple (i.e. answering a “why” question): such an explanation would contain only events that have affected the result, along with the relevant data items. In the above example, an explanation of the helmet price would be the insertion of the bike and the helmet into the shopping cart, and the deal on the helmet; the provenance expression “translates” into such an intuitive explanation. Furthermore, provenance is useful for “what-if” analysis, where every hypothetical scenario corresponds to a Boolean assignment to the provenance annotations. For instance, assigning false to d1 and true to the other variables corresponds to the scenario where the mountain bike is not available. The shopping cart in this case would contain only the helmet, showing a price of $50 (instead of $25). Last, for “how-to” queries, a more complex provenance expression is generated, capturing the set of all possible executions rather than a particular one; a SAT solver is then used to find an execution yielding a tuple or sub-instance of interest, such as a desired helmet price.

Bicycles
Item        Price    Provenance
Mtn Bike    $2000    d1

Parts
Item        Price    Provenance
Wire Lock   $7       d2

Accessories
Item        Price    Provenance
Helmet      $50      d3

Deals
Buy Item    Get Item    Discount Price    Provenance
Mtn Bike    Helmet      $25               d4

Rb
Item        Provenance
Mtn Bike    u1

Rp
Item        Provenance
(empty)

Ra
Item        Provenance
Helmet      u2

Rc
Item        Provenance
Mtn Bike    u3

Cart
Item        Price    Provenance
Mtn Bike    $2000    (p1 ∧ (u1 ∧ d1)) ∧ ¬(p5 ∧ u3)
Helmet      $50      (p3 ∧ (u2 ∧ d2)) ∧ ¬(p3 ∧ (d4 ∧ (p1 ∧ (u1 ∧ d1)) ∧ (p3 ∧ (u2 ∧ d2))))
Helmet      $25      p3 ∧ (d4 ∧ (p1 ∧ (u1 ∧ d1)) ∧ (p3 ∧ (u2 ∧ d2)))

Fig. 1: Database with provenance

States: Homepage, Bicycles, Parts, Accessories, Cart, Payment

Transition queries:
Cart^{+,p1}(i, p) :- Bicycles(i, p), Rb(i, p)
Cart^{M,p1}(i1, op, i1, np) :- Cart(i1, op), Cart(i2, p), Deals(i2, i1, np)
Cart^{+,p2}(i, p) :- Parts(i, p), Rp(i)
Cart^{M,p2}(i1, op, i1, np) :- Cart(i1, op), Cart(i2, p), Deals(i2, i1, np)
Cart^{+,p3}(i, p) :- Acc(i, p), Ra(i)
Cart^{M,p3}(i1, op, i1, np) :- Cart(i1, op), Cart(i2, p), Deals(i2, i1, np)
Cart^{−,p5}(i, p) :- Cart(i, p), Rc(i)
Q^{p6}_g = H() :- Cart(x, y)

Fig. 2: Partial Process Logic
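The what-if reading of Example 1.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the POLYTICS implementation: the nested-tuple encoding of expressions and the names `evaluate`, `what_if` and `CART` are hypothetical. The helmet tuple is written as d3, following the Accessories table and the text of Example 1.2 (the Cart expressions in Fig. 1 print d2 at that position).

```python
from typing import Dict, List, Tuple, Union

# A provenance expression is either a variable name (str) or a nested
# tuple: ("and", e1, e2, ...) or ("not", e). This encoding is an
# assumption of the sketch, not the representation used by POLYTICS.
Expr = Union[str, tuple]


def evaluate(expr: Expr, assignment: Dict[str, bool]) -> bool:
    """Evaluate a provenance expression under a truth assignment."""
    if isinstance(expr, str):
        return assignment[expr]
    op, *args = expr
    if op == "and":
        return all(evaluate(arg, assignment) for arg in args)
    if op == "not":
        return not evaluate(args[0], assignment)
    raise ValueError(f"unknown operator: {op!r}")


# Sub-expressions shared by the Cart tuples of Fig. 1.
# Assumption: the helmet tuple is d3, as in the Accessories table.
BIKE_ADDED = ("and", "p1", ("and", "u1", "d1"))
HELMET_ADDED = ("and", "p3", ("and", "u2", "d3"))
DEAL_APPLIED = ("and", "p3", ("and", "d4", BIKE_ADDED, HELMET_ADDED))

CART: List[Tuple[str, str, Expr]] = [
    ("Mtn Bike", "$2000", ("and", BIKE_ADDED, ("not", ("and", "p5", "u3")))),
    ("Helmet", "$50", ("and", HELMET_ADDED, ("not", DEAL_APPLIED))),
    ("Helmet", "$25", DEAL_APPLIED),
]

VARIABLES = ["p1", "p3", "p5", "u1", "u2", "u3", "d1", "d2", "d3", "d4"]


def what_if(overrides: Dict[str, bool]) -> List[Tuple[str, str]]:
    """Return the Cart tuples whose provenance holds when every annotation
    is true except those listed in `overrides`."""
    assignment = {name: True for name in VARIABLES}
    assignment.update(overrides)
    return [(item, price) for item, price, prov in CART
            if evaluate(prov, assignment)]


print(what_if({}))             # tracked run: [('Helmet', '$25')]
print(what_if({"d1": False}))  # bike unavailable: [('Helmet', '$50')]
```

With all annotations true the sketch reproduces the tracked run, in which the discounted helmet remains after the bike is removed; setting d1 to false yields the $50 helmet described in the example.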
Related Work: The use of data provenance for “why” (e.g. [2]), “what-if” (e.g. [3]) and “how-to” (e.g. [4]) analysis has been extensively studied, focusing on database queries rather than on data-centric processes. In [5] we have proposed a “what-if” analysis of data-centric processes; POLYTICS supports a significantly larger class of applications (specifically, ones that can update the underlying database) and of analysis questions (including “why” and “how-to”).
II. SYSTEM OVERVIEW
The server side of POLYTICS is implemented in C#, and the client side in AngularJS using the Bootstrap framework. The client web application is deployed on the Node.js JavaScript runtime environment and runs on Windows 10. The system architecture is depicted in Figure 3. The system requires the application’s owner to provide a description of the application, including an FSM describing its flow and its database. Each action in the FSM and each DB relation should further be associated with a textual description (for presentation purposes). POLYTICS may be used both by system analysts and by users. For “what-if” and “why” analysis, users/analysts interact with a dedicated interface; for “how-to” analysis they interact with a wrapper of the original application that allows them to view recommendations for navigation. We next explain the main system components.
Provenance engine: The provenance engine consists of two generators: (1) a real-time generator, which tracks the provenance of executions and is used for “why” and “what-if” analysis; (2) a static generator, which computes, based on the application structure, a provenance expression capturing the set of all possible executions, used for “how-to” analysis.
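To give a feel for how a provenance expression can drive “how-to” analysis, the toy sketch below searches for a smallest set of actions and user choices that makes a target Cart tuple appear. It rests on several assumptions that are not taken from POLYTICS: exhaustive enumeration stands in for the SAT solver, the targets are the single-run Cart expressions of Fig. 1 rather than the all-executions expression produced by the static generator (whose construction is given in [1]), the helmet tuple is written as d3 as in the Accessories table, and the names `how_to`, `CONTROLLABLE` and `DATABASE` are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Optional, Set

Assignment = Dict[str, bool]


def helmet_at_25(v: Assignment) -> bool:
    """Provenance of (Helmet, $25) in Fig. 1, with the helmet tuple as d3."""
    bike_added = v["p1"] and v["u1"] and v["d1"]
    helmet_added = v["p3"] and v["u2"] and v["d3"]
    return v["p3"] and v["d4"] and bike_added and helmet_added


def helmet_at_50(v: Assignment) -> bool:
    """Provenance of (Helmet, $50): helmet added and the deal not applied."""
    helmet_added = v["p3"] and v["u2"] and v["d3"]
    return helmet_added and not helmet_at_25(v)


# Database tuples are taken as given; actions (p*) and user choices (u*)
# are what the user can control. Both lists are assumptions of the sketch.
DATABASE: Assignment = {"d1": True, "d2": True, "d3": True, "d4": True}
CONTROLLABLE = ["p1", "p3", "p5", "u1", "u2", "u3"]


def how_to(target: Callable[[Assignment], bool]) -> Optional[Set[str]]:
    """Brute-force stand-in for a SAT call: return a smallest set of
    controllable variables that, when true, satisfies `target`."""
    best: Optional[Set[str]] = None
    for bits in product([False, True], repeat=len(CONTROLLABLE)):
        assignment = dict(DATABASE)
        assignment.update(zip(CONTROLLABLE, bits))
        if target(assignment):
            chosen = {name for name, bit in zip(CONTROLLABLE, bits) if bit}
            if best is None or len(chosen) < len(best):
                best = chosen
    return best


print(sorted(how_to(helmet_at_25)))  # ['p1', 'p3', 'u1', 'u2']
print(sorted(how_to(helmet_at_50)))  # ['p3', 'u2']
```

Enumeration is exponential in the number of variables, which is why a real solver is the natural choice once the expression covers all possible executions.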
Analysis Interface: Provenance is fed to the analysis engine, whose output is shown in Fig. 4 (the “how-to” screen is omitted for lack of space). The “why” interface allows the user to choose an output tuple and view a textual representation of its reason, based on the provenance and on the textual description provided for each action and DB relation. The “what-if” screen allows the user to apply hypothetical modifications and observe, at interactive speed, their effect on the output.
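The translation from provenance to a textual “why” explanation can be sketched as follows. This is a minimal illustration, not the POLYTICS implementation: the description strings stand for the owner-supplied texts and are invented here, the nested-tuple encoding and the names `variables` and `explain` are hypothetical, and the helmet tuple is written as d3 as in the Accessories table of Fig. 1.

```python
from typing import Dict, List, Union

# A provenance expression is a variable name or ("and", e1, e2, ...).
# Encoding chosen for this sketch only.
Expr = Union[str, tuple]

# Hypothetical owner-supplied descriptions of actions, user inputs and tuples.
DESCRIPTIONS: Dict[str, str] = {
    "p1": "a bicycle was added to the cart",
    "p3": "an accessory was added to the cart",
    "u1": "the user selected Mtn Bike",
    "u2": "the user selected Helmet",
    "d1": "Bicycles contains (Mtn Bike, $2000)",
    "d3": "Accessories contains (Helmet, $50)",
    "d4": "Deals offers Helmet at $25 with Mtn Bike",
}

# Provenance of (Helmet, $25) from Fig. 1, with the helmet tuple as d3.
HELMET_25: Expr = (
    "and", "p3",
    ("and", "d4",
     ("and", "p1", ("and", "u1", "d1")),
     ("and", "p3", ("and", "u2", "d3"))),
)


def variables(expr: Expr) -> List[str]:
    """List the annotations occurring in `expr`, in order of first use."""
    if isinstance(expr, str):
        return [expr]
    seen: List[str] = []
    for sub in expr[1:]:
        for name in variables(sub):
            if name not in seen:
                seen.append(name)
    return seen


def explain(item: str, price: str, expr: Expr) -> str:
    """Render a conjunctive provenance expression as a bulleted reason.
    Negated sub-expressions would need separate wording; none occur here."""
    lines = [f"({item}, {price}) is in the cart because:"]
    lines += [f"  - {DESCRIPTIONS[name]}" for name in variables(expr)]
    return "\n".join(lines)


print(explain("Helmet", "$25", HELMET_25))
```

Only annotations that occur in the expression are mentioned, so unrelated tuples such as the wire lock (d2) never show up in the explanation.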
III. DEMONSTRATION SCENARIO
We will demonstrate the usefulness of POLYTICS in the context of a simple online bicycle shop as described above.
[Figure 3 (diagram). Components: Smart shop, Online shop, Provenance engine (Static, Real time), Analysis interface, User \ Analyst. Labels: Actions, Response, How-to queries, Navigation sequence, DB FSM, Provenance, Why queries, Explanation, What-if queries, Results.]
Fig. 3: System architecture
Fig. 4: Analysis output
We will first introduce the shop to participants and allow them to freely interact with it, while we track provenance. We will then use the tracked provenance for “why” and “what-if” analysis: we will show explanations computed by POLYTICS for items in the shopping cart, and will consider hypothetical modifications to the application logic and data, observing their anticipated effect on the execution and its artifacts. Further, we will show the usefulness of how-to analysis to users; to this end, we will let participants select products and desired prices (which we give them, simulating a case where a friend has reported a purchase at a particular price), and let POLYTICS generate a recommended sequence of navigation actions that would lead to a purchase at the desired price. Finally, we will allow the audience to look “under the hood”, showing and explaining the underlying provenance expressions.
ACKNOWLEDGMENT
This research was partially supported by the Israeli Science Foundation (ISF, grant No. 1636/13), by the Blavatnik Interdisciplinary Cyber Research Center and by Intel.
REFERENCES
[1] P. Bourhis, D. Deutch, and Y. Moskovitch, “Analyzing data-centric applications: Why, what-if, and how-to,” in ICDE, 2016.
[2] S. Roy and D. Suciu, “A formal approach to finding explanations for database queries,” in SIGMOD, 2014.
[3] S. Assadi, S. Khanna, Y. Li, and V. Tannen, “Algorithms for provisioning queries and analytics,” in ICDT, 2016.
[4] A. Meliou, W. Gatterbauer, and D. Suciu, “Reverse data management,” PVLDB, vol. 4, no. 12, 2011.
[5] D. Deutch, Y. Moskovitch, and V. Tannen, “PROPOLIS: provisioned analysis of data-centric processes,” PVLDB, vol. 6, no. 12, 2013.