triggers over xml views of relational dataname>crt 15 ... vide an efficient and scalable way to...

26
Triggers over XML Views of Relational Data Feng Shao Antal Novak Jayavel Shanmugasundaram Cornell University {fshao, afn, jai}@cs.cornell.edu Abstract Current systems that publish relational data as XML views are passive in the sense that they can only re- spond to user-initiated queries over the XML views. In this paper, we propose an active system whereby users can place triggers on (unmaterialized) XML views of relational data. In this architecture, we present scal- able and efficient techniques for processing triggers over XML views by leveraging existing support for SQL triggers in commercial relational databases. We have implemented our proposed techniques in the context of the Quark XML middleware system. Our performance results indicate that our proposed techniques are a fea- sible approach to supporting triggers over XML views of relational data. 1 Introduction XML has emerged as a dominant standard for informa- tion exchange on the Internet. A large fraction of data, however, continues to be stored in relational databases. Consequently, there has been a lot of interest in publish- ing relational data as XML. A powerful and flexible way to achieve this goal is to create XML views of relational data [12, 24, 27, 33]. In this way, the data can continue to reside in relational databases, while Internet applications can access the same data in XML format through the XML view. This architecture is shown in Fig. 1. As a concrete example, consider a supplier that stores its product catalog information in a relational database. In order to expose the product catalog as an XML web service to buyers, the sup- plier can create an XML view of the product catalog and expose this view as a web service. Current systems that support XML views of relational data are passive in the sense that they can only support user-initiated queries over the views. For instance, in the Permission to copy without fee all or part of this material is granted pro- vided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment. Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005 XML Query Query Result Trigger Figure 1: XML views of relational data. web services example above, current systems only allow buyers to explicitly initiate a request to query the catalog for products of interest. In this paper, we propose an ac- tive system that allows users to specify triggers over XML views. Thus, a buyer can set a trigger to be notified when- ever a new product is introduced, or when a product of in- terest goes out of stock, without having to repeatedly query the XML view to detect these changes. At a high level, there are two approaches to supporting triggers over XML views. The first approach is to material- ize the entire XML view and implement triggers over this view. However, this approach suffers from the overhead of replicating and incrementally maintaining the material- ized XML view on every relational update that affects the view, even though users may only be interested in relatively rare events. Another practical downside of this approach is that it requires a full-function XML database that supports incremental view updates and triggers, even though the un- derlying relational database supports all of this functional- ity and is typically much more optimized for these tasks. In fact, none of the XML (or XML-enhanced relational) databases that we are aware of support triggers over incre- mentally maintained XML views, while virtually all major commercial relational databases have sophisticated support for SQL triggers over relational tables [16, 19, 21]. To address the above issues, we propose an alternative approach of implementing triggers over XML views of re- lational data, which is by translating XML triggers into SQL triggers. The primary benefits of this approach are that: (a) it avoids having to materialize the XML view, (b) it does not require a sophisticated XML data management

Upload: vutruc

Post on 02-Jul-2018

222 views

Category:

Documents


0 download

TRANSCRIPT

Triggers over XML Views of Relational Data

Feng Shao Antal Novak Jayavel Shanmugasundaram

Cornell University{fshao, afn, jai}@cs.cornell.edu

Abstract

Current systems that publish relational data as XMLviews are passive in the sense that they can only re-spond to user-initiated queries over the XML views. Inthis paper, we propose an active system whereby userscan place triggers on (unmaterialized) XML views ofrelational data. In this architecture, we present scal-able and efficient techniques for processing triggersover XML views by leveraging existing support for SQLtriggers in commercial relational databases. We haveimplemented our proposed techniques in the context ofthe Quark XML middleware system. Our performanceresults indicate that our proposed techniques are a fea-sible approach to supporting triggers over XML viewsof relational data.

1 IntroductionXML has emerged as a dominant standard for informa-tion exchange on the Internet. A large fraction of data,however, continues to be stored in relational databases.Consequently, there has been a lot of interest in publish-ing relational data as XML. A powerful and flexible wayto achieve this goal is to create XML views of relationaldata [12, 24, 27, 33]. In this way, the data can continue toreside in relational databases, while Internet applicationscan access the same data in XML format through the XMLview. This architecture is shown in Fig. 1. As a concreteexample, consider a supplier that stores its product cataloginformation in a relational database. In order to expose theproduct catalog as an XML web service to buyers, the sup-plier can create an XML view of the product catalog andexpose this view as a web service.

Current systems that support XML views of relationaldata are passive in the sense that they can only supportuser-initiated queries over the views. For instance, in the

Permission to copy without fee all or part of this material is granted pro-vided that the copies are not made or distributed for direct commercialadvantage, the VLDB copyright notice and the title of the publication andits date appear, and notice is given that copying is by permission of theVery Large Data Base Endowment. To copy otherwise, or to republish,requires a fee and/or special permission from the Endowment.

Proceedings of the 31st VLDB Conference,Trondheim, Norway, 2005

XML Query Query Result Trigger

Figure 1: XML views of relational data.

web services example above, current systems only allowbuyers to explicitly initiate a request to query the catalogfor products of interest. In this paper, we propose an ac-tive system that allows users to specify triggers over XMLviews. Thus, a buyer can set a trigger to be notified when-ever a new product is introduced, or when a product of in-terest goes out of stock, without having to repeatedly querythe XML view to detect these changes.

At a high level, there are two approaches to supportingtriggers over XML views. The first approach is to material-ize the entire XML view and implement triggers over thisview. However, this approach suffers from the overheadof replicating and incrementally maintaining the material-ized XML view on every relational update that affects theview, even though users may only be interested in relativelyrare events. Another practical downside of this approach isthat it requires a full-function XML database that supportsincremental view updates and triggers, even though the un-derlying relational database supports all of this functional-ity and is typically much more optimized for these tasks.In fact, none of the XML (or XML-enhanced relational)databases that we are aware of support triggers over incre-mentally maintained XML views, while virtually all majorcommercial relational databases have sophisticated supportfor SQL triggers over relational tables [16, 19, 21].

To address the above issues, we propose an alternativeapproach of implementing triggers over XML views of re-lational data, which is by translating XML triggers intoSQL triggers. The primary benefits of this approach arethat: (a) it avoids having to materialize the XML view, (b)it does not require a sophisticated XML data management

productPID pname mfrP1 CRT 15 SamsungP2 LCD 19 SamsungP3 CRT 15 Viewsonic

vendorVID PID price

Amazon P1 100.00Bestbuy P1 120.00Circuitcity P1 150.00Buy.com P2 200.00Bestbuy P2 180.00Bestbuy P3 120.00Circuitcity P3 140.00

<db><product>

<row><pid>P1</pid><name>CRT 15</name><mfr>Samsung</mfr>

</row>· · ·

</product><vendor>

<row><vid>Amazon </vid><pid>P1</pid><price>100.00</price>

</row>· · ·

</vendor></db>

Figure 2: Example database and its default view.

create view catalog as {<catalog>{for $prodname in distinct(view(''default'')/

product/row/pname)let $products := view(''default'')/product/

row[./pname = $prodname]let $vendors := view(''default'')/vendor/

row[./pid = $products/pid]where count($vendors) >= 2return <product name={$prodname}>

{ for $vendor in $vendorsreturn <vendor>

{$vendor/*}</vendor>}

</product>}</catalog> }

Figure 3: XML view definition.

<catalog><product name=''CRT 15''>

<vendor><pid>P1</pid><vid>Amazon</vid><price>100.00</price>

</vendor><vendor>

<pid>P1</pid><vid>Bestbuy</vid><price>120.00</price>

</vendor>· · ·

</product><product name=''LCD 19''>· · ·

</catalog>

Figure 4: Catalog view.

system, and (c) it works with and fully leverages existingrelational technology.

The main technical contribution of this paper is a sys-tematic way to translate triggers over XML views of re-lational data into SQL triggers. This translation is fairlychallenging because XML triggers can be specified overcomplex nested XML views with nested predicates, whileSQL triggers can only be specified over flat relational ta-bles. Consequently, even identifying the parts of an XMLview that could have changed due to a (possibly deeplynested) SQL update is a non-trivial task, as is the problemof computing the old and new values of an updated frag-ment of the view. Another issue is that current commercialrelational databases are not very scalable with respect to thenumber of SQL triggers, while we expect a large numberof XML triggers to be specified over XML views exposedas web services.

In this paper, we address the above challenges. Specif-ically, our two main contributions are: (1) a system archi-tecture for supporting triggers over XML views of rela-tional data (Section 3), and (2) an algorithm for identifyingand computing changes in an XML view based on possiblydeeply nested relational updates (Section 4). We also showhow prior work on scalable trigger processing [6, 15] canbe adapted for the XML view problem, and investigate theeffects of using incrementally maintained relational mate-rialized views for evaluating XML triggers (Section 5).

We have implemented our proposed techniques in thecontext of the Quark XML middleware system. Oneof the original goals of Quark (like SilkRoute [12] andXPERANTO [27]) was to support queries over XML viewsof relational data. By integrating with Quark, we were ableto leverage many of the techniques originally developed forquerying XML views, and adapt them to the trigger prob-lem. This suggests that our techniques can be easily inte-grated into systems that already support queries over XMLviews of relational data (including relational databases withbuilt-in XML publishing support). Our performance resultsusing our prototype show that our proposed techniques pro-vide an efficient and scalable way to support triggers overXML views of relational data.

While our focus is on triggers over XML views, our

techniques also apply to the less general problem of trig-gers over (flat) relational views. We note that current re-lational systems only support INSTEAD OF triggers [28]over unmaterialized views. Using INSTEAD OF triggers,users can manually specify how updates on a view are to betranslated into updates on base tables. In contrast, we aresolving the problem of automatically inferring when up-dates on base tables cause triggers on a view to be fired.We are not aware of any published work or systems thatsupport such SQL triggers over unmaterialized views.

2 BackgroundWe have developed our trigger processing techniques inthe context of the Quark system, which is similar toXPERANTO [27] in its support for querying XML views.We thus present an overview of XPERANTO, and also pro-vide some background on XML and SQL triggers. We notethat although our techniques are implemented in Quark,they are applicable to any XML publishing system.

2.1 XPERANTO OverviewIn order to publish relational data as XML, XPERANTOfirst automatically creates a default view of the the rela-tional data. The default view, which is not materialized, isa simple mapping from relational tables to XML elements.Users can create their own application-specific views byspecifying the transformation from the default view usingXQuery. As an example, consider a relational database andits default view shown in Fig. 2 (the database contains prod-ucts and vendors for each product; primary keys are capi-talized). Now suppose this database is exposed as a (vir-tual) XML view in which vendors are nested under prod-ucts, with the restriction that only products sold by at leasttwo vendors appear in the view. The corresponding XQueryview definition is shown in Fig. 3; the result of materializ-ing this view is shown in Fig. 4.

While there are many details about query processingin XPERANTO that are not relevant here, one impor-tant relevant aspect is XQGM (the XML Query GraphModel). XQGM is used to represent and manipulateXQuery queries and views. XQGM consists of a set ofoperators (shown in Table 1) and functions. Each operator

table: product$pid $pname $mfr

table: vendor$vid $pid

1 2

project: $vendor = <vendor> <vid>$vid</..> <pid>$pid</..> <price>$price</…> </vendor>

$vendor$pname

4

groupby $pname $vdrs = aggXMLFrag($vendor)

$count = count($vendor)

$pname

5

select: $count ≥ 26

project: $product = <product name=”$pname”> $vdrs

</product>

$product

groupby:$prds = aggXMLFrag($product)

$prds

8

7

project: $catalog = <catalog> $prds

</catalog>

$catalog

9

$vdrs$pname

3 join: 1.$pid = 2.$pid$pname $vid

$price

$price$pid

$vdrs $count

A

B

Figure 5: XQGM for the catalog view

produces a set of output tuples whose columns are XMLnodes/values. Various functions can be embedded in oper-ators to manipulate XML nodes.

As an illustration, the XQGM graph for the view defi-nition from Fig. 3 is shown in Fig. 5. Operators (boxes) 1and 2 produce the tuples in the product and vendor tables,respectively. Box 3 joins each vendor with the product itsells, and box 4 constructs a <vendor> element for eachof these tuples. Box 5 then groups the elements by productname: the aggXMLFrag() function groups all the vendorelements in a group into a sequence, while the count func-tion counts the number of vendors per group. Box 6 selectsonly the tuples with count ≥ 2. Finally, boxes 7-9 create a<product> element for each product, group these into asingle sequence, and produce a <catalog> element con-taining this sequence.

2.2 XML Trigger Specification LanguageWe use a subset of the trigger specification language pro-posed by Bonifati et al. [3], with the syntax:

CREATE TRIGGER Name AFTER EventON Path WHERE Condition DO Action

A trigger has a unique Name. The Event specifies theoperation that activates the trigger, and can be either UP-

Operator DescriptionTable Represents a relational tableProject Computes results based on its inputSelect Restricts its inputJoin Joins two or more inputsGroupby Applies aggregate functions and groupingUnion Unions inputs and removes duplicatesUnnest Applies super-scalar functions to input

Table 1: XQGM operators.

DATE, INSERT, or DELETE. Path is an XPath expressionspecifying the portion of the XML view to be monitoredfor the event. Condition is a Boolean XQuery expressionwhich determines whether the trigger is to be fired. Whenthe condition is satisfied, an Action is performed; in oursystem, the action is a call to an external function whichtakes in XQuery expressions as parameters. Finally, twovariables, OLD NODE and NEW NODE, are bound to the valueof the node specified by Path before and after the Event;they may be referenced in the Condition and the Action.(When the Event is INSERT or DELETE, only the NEW NODEor OLD NODE, respectively, can be used.)

An example trigger over the view in Fig. 3 is shown be-low. On any update to a product whose name was “CRT 15”(before the update), the trigger invokes an external func-tion notifySmith() with the new value of that prod-uct. Note that the trigger will be fired not only for directupdates to a <product> element, but also for updates toits descendant nodes (i.e. vendors selling that product).

CREATE TRIGGER Notify AFTER UpdateON view('catalog')/productWHERE OLD_NODE/@name = 'CRT 15'DO notifySmith(NEW_NODE)

2.3 SQL Triggers

In contrast to XML triggers, which are specified on XMLnodes, SQL triggers [7, 8] are fired when an event (IN-SERT, UPDATE, or DELETE) occurs on a specific relationaltable. When an SQL trigger is activated, it has access to thepre- and post-update versions of the affected rows throughtransition tables. We use the notation �table to denote thetransition table that contains the updated rows before an up-date, and �table to denote the transition table that containsthe updated rows after an update (�table is empty for IN-SERT triggers, and �table is empty for DELETE triggers).For example, if product P1 goes on sale at Amazon, thenthe transition tables might look like:

�vendor

vid pid priceAmazon P1 100.00

�vendor

vid pid priceAmazon P1 75.00

3 Semantics & System ArchitectureWe now formalize the semantics of triggers on views, andthen present our system architecture.

3.1 Semantics of Triggers on XML Views

In order to define the semantics of triggers on views, weneed a precise definition of when an XML element in aview is said to be updated, inserted, or deleted. This inturn requires us to define the identity of an element in theview (so that we can talk about which specific element isupdated, inserted or deleted). Note that the issue of iden-tity is not as problematic for triggers over native XML databecause each physical XML element has a well-defined no-tion of identity based on the XML data model. In contrast,XML elements in views are virtual and do not have a stan-dard notion of identity. We now present an intuitive defini-tion of the identity of XML elements based on the semanticstructure of a view (in terms of the view’s XQGM graph).The main idea is to use the notion of keys of XQGM oper-ators to define the identity of nodes.

Definition 1 (Keys of XQGM Operators). Given an op-erator o in XQGM graph G, a key of o is a minimal set of(existing or derivable) columns of o whose values uniquelyidentify each output tuple produced by o.

As an illustration, a key of the table operator in box 1 inFig. 3 is the pid column (which is the product table’s rela-tional primary key). A key of the project operator in box 7is the column containing the $pname values (since the op-erator produces an output for each unique $pname). Notethat this key column is not directly present in the projectoperator but can be derived from its input operator.

In general, an operator can have more than one key. Forinstance, a relational table can have one primary key col-umn, and a unique constraint on a different column. In suchcases, we pick one of the keys to be the canonical key. Fora table operator, we use its primary key. The canonical keysfor the other XQGM operators can be defined in terms ofthe canonical keys of its input operators Appendix A. Fora tuple t produced by an operator o, we use the notationv(t) to denote the value of (all columns of) t. We denoteby ckvo(t) the value of the canonical key columns of o fortuple t. We denote the top operator of an XQGM graph G,which produces the final result of the graph, as oG.

In order to define updates, inserts, and deletes on a view,we first formalize the notation for a database transition,which is the result of UPDATEs, INSERTs, and/or DELETEson relational tables. We do so in terms of the databasestate, where the database is in a state D before the transi-tion, and a different state D ′ after the transition; we writethe transition itself as D

∗→ D′. When considering theeffect of UPDATEs, INSERTs, and/or DELETEs to a singletable T (as is the case when a SQL trigger on T is fired), wedenote the transition as D

T→ D′. The result of evaluatingoperator o in state D is written R(o, D).

We now define updates, inserts and deletes on views.

Definition 2 (View Trigger Updates). A tuple t is saidto be updated in view G by relational transition D

∗→ D′

Trigger Parser

Event Pushdown

Affected-Node Graph Generator

XQuery Trigger

Event/Path XQGM

Path XQGM & Relevant Relational

Update Events

Action/Condition XQGM

Affected-node XQGM

XML Fragments

Trigger Pushdown

RDBMS

SQL Triggers Tuples (When trigger fires)

Trigger GroupingAffected-node

XQGM Tagger

TriggerActivation

Action

Figure 6: System architecture.

iff t ∈ R(oG, D), and ∃t′(t′ ∈ R(oG, D′) ∧ ckvoG(t) =ckvoG(t′) ∧ v(t) �= v(t′)).

Definition 3 (View Trigger Inserts (Deletes)). A tuplet is said to be inserted (deleted) in view G by relationaltransition D

∗→ D′ (D′ ∗→ D) iff t ∈ R(oG, D′) and¬∃t′(t′ ∈ R(oG, D) ∧ ckvoG(t) = ckvoG(t′)).

Given the above definition of events, we use the seman-tics of XML triggers specified by Bonifati et al. [3]. Notethat our events are well-defined only for operators with(canonical) keys. We thus need to define a class of viewsfor which triggers are well-defined.

Definition 4 (Trigger-Specifiable Views). A view withXQGM graph G is trigger-specifiable iff every operator inG has a (canonical) key.

We require every operator (not just the top operator) inthe view to have a canonical key because the user can spec-ify a trigger on a nested element (and not just a top levelelement). In Appendix B we have proven that any view istrigger-specifiable if all its Table operators have (canoni-cal) keys. Thus, arbitrarily complex views can have trig-gers specified on them, so long as the underlying relationaltables have primary keys (which is the common case). Wecan also relax this restriction for a certain class of viewsand triggers but we do not describe these relaxations here.

3.2 System ArchitectureOur system architecture is shown in Fig. 6. Users can cre-ate triggers (using the syntax in Section 2.2) on trigger-specifiable views. The Path, Condition and Action of thetrigger are converted into their respective XQGM graphs(recall that Path, Condition and Action are all XPathor XQuery expressions, and hence can be converted toXQGM). The trigger Event and the Path graph are then an-alyzed by the Event Pushdown module to determine the

minimal set of base relations on which inserts, updates,or deletes could cause the trigger to be fired. For eachof these tables, the Affected-Node Graph Generator con-structs an XQGM graph which, when evaluated, producesthe OLD NODE and NEW NODE values for each affected XMLnode. This graph is fed into the Trigger Grouping module,which groups similar triggers together for improved scal-ability. The Trigger Pushdown module takes the groupedtrigger graph, pushes down selection conditions, and pro-duces a set of SQL triggers, one for each relational event.

When activated, an SQL trigger issues a single SQLquery to retrieve the relational data required for the actionsof the XML triggers. A constant-space Tagger [27] thenconverts these results to XML. Finally, the Trigger Acti-vation module activates the appropriate XML triggers andpasses in the XML results as parameters to their actions.

In our implementation, we support a powerful subsetof XQuery. Specifically, we support arbitrarily complexnested views with FLWOR expressions, quantified expres-sions, XPath expressions with child/descendant axes, arith-metic operators, comparison operators, and element con-structors. We do not support XQuery type expressions orsibling / parent / ancestor XPath axes. For XML triggers,the Path, Condition and parameters to the Action can alsobe arbitrarily complex XQuery expressions with the samerestrictions as for XML views. The complete grammar ofsupported expressions is specified in Appendix D. We notethat our restrictions on XQuery expressions are an artifactof our current implementation and not an inherent limita-tion of the architecture.

Finally, a limitation of our current implementation isthat XML trigger(s) are fired for each SQL INSERT, UP-DATE, or DELETE statement, rather than for each SQLtransaction (which could contain more than one statement).This is not a limitation of our approach itself, but due to thefact that most commercial databases do not support SQLtriggers at the transaction level; they only support SQL trig-gers at the granularity of a statement within a transaction.However, we note that our approach is general enough tosupport transaction level XML triggers if the underlying re-lational databases exposes transaction level SQL triggers.

3.3 Trigger Parsing and Event PushdownThe first step in our architecture is to convert the trig-ger Path, Condition and Action XQuery expressions intoXQGM; this is done in a manner similar to convertingXQuery views to XQGM (see Section 2.1). In addition, weapply view composition rules [27] on the Path expression.This has the effect of removing Unnest operators (sincethe base data is relational and has no inherent nesting), andidentifies the specific part of the view to be monitored bythe trigger. For example, the trigger in Section 2.2 mon-itors the path view(‘catalog’)/product. On composing thispath with the catalog view, it produces the graph in Fig. 5A.Note that this graph only produces products and not the en-tire catalog (since the trigger only monitors products).

$new_node $old_node

Gaffectedproduces

affected nodes

param1 paramn

Gactionevaluates parameters to

external function

$new_node $old_node

Gcondevalutes triggerCondition

Figure 7: Gparams: Producing parameters to Action.

The next step is to determine which events on whichrelational tables can cause the event specified in the XMLtrigger. This is similar to the problem of identifying eventson the base tables that can affect materialized views [5]and violate constraints [4]. We adopt a similar approach toidentifying relevant events on base tables and present thedetails in Appendix C. In our example, we are interestedin UPDATE on the result of Box 7 in Fig. 5A; this can becaused either by an INSERT, UPDATE or DELETE on theproduct table or the vendor table.

4 Affected-Node Graph GenerationThe goal of the Affected-Node Graph Generator (seeFig. 6) is to produce XQGM graphs that compute the inputparameters for the trigger action. Specifically, the mod-ule takes as input the XQGM graphs for the Path, Condi-tion, and parameters for the Action, and also the set of re-lational table-event pairs identified by the Event Pushdownmodule. For each of these table-event pairs, it produces anXQGM graph which computes the transformation from therelational transition tables to the parameters for the triggeraction.

Our high-level approach is to produce a single XQGMgraph, Gparams, which consists of three parts as shown inFig. 7. First, Gaffected produces an (OLD NODE, NEW NODE)tuple for each affected node of the view. Next, G cond, theXQGM graph corresponding to the Condition, filters outany tuples that do not satisfy the condition. Finally, Gaction

computes the XQuery expressions given as parameters tothe Action. The main technical contribution of this sectionis an algorithm to produce Gaffected.

4.1 Technical Challenges in Producing Gaffected

On the surface, the problem of producing G affected may ap-pear similar to the incremental view maintenance problem(whose goal is to compute changes to a materialized viewbased on updates to the base data). However, there are threenew challenges that arise in our context, which require thedevelopment of new techniques.

First, as mentioned in the introduction, one of our de-sign goals is to not materialize the XML view. We avoidmaterialization because (a) it would require a sophisti-cated XML database that can support incremental view

updates and XML triggers, and (b) it would require theview to be updated for every relevant relational update eventhough user triggers may have very selective predicates 1.In contrast, most incremental view maintenance algorithms(e.g., [1, 2, 5, 10, 11, 13, 17, 18, 25]) assume that the viewis materialized, and use the materialized old value of a dataitem to compute its new value. We thus need to devise tech-niques that can selectively compute the relevant new valuesfrom the base data.

Second, in producing Gaffected, we need to computenew and old values after an update. In contrast, only thenew value needs to be computed for materialized views.Thus, even materialized view techniques that can selec-tively compute new values without using materialized oldvalues (e.g., [22, 23]) are not applicable because they can-not compute (old value, new value) pairs. This problem isespecially acute for INSERT/DELETE events because theyintroduce specific restrictions on whether the old/new val-ues can appear in the view before/after an update (Defini-tion 3).

Finally, the third (and perhaps most important) chal-lenge arises due to nested predicates in XQuery. For in-stance, in Fig. 5, we have multiple group-by (nesting) op-erators along with a selection predicate on a group-by ag-gregate value. While prior work on view maintenancefor object-oriented [13, 18, 25], nested relational [17] andsemi-structured [1, 2, 10, 11] databases supports nesting,these do not work with nested predicates. To understandwhy, consider the following example where a transactioninserts a row into the vendor table. The transition table is:

�vendorvid pid price

Amazon P2 500.00

Intuitively, for the XML view in Fig. 5A, the above in-sert corresponds to an update of the “LCD 19” XML prod-uct (since a new vendor is added to this product). However,it turns out that the change computation technique (also re-ferred to as the propagate phase [22]) commonly used forview maintenance will not detect this update. Specifically,most view maintenance algorithms compute changes to aview by replacing an updated table in the view definitionwith its corresponding transition table. In our example, thiscorresponds to replacing the vendor table in Fig. 5A withthe �vendor table, and evaluating the resulting query tocompute the changes to the view. However, since �vendorhas a single row, boxes 2, 3 and 4 will each produce a sin-gle row and the selection in box 6 will return no rows since$count = 1. Hence, no changes will be detected!

As the reader has probably observed, the above prob-lem arises because we are trying to compute changes fornested predicate views using only tuples from the transi-

1Note that if we chose to materialize the view, all items in the view(even those that do not satisfy any trigger selection predicate) would haveto be incrementally maintained, because any item could become the oldvalue of an updated item that does satisfy a trigger predicate.

tion table. This results in inaccurate aggregate values andhence misses some relevant updates (it can also introducespurious updates in other cases). We thus need to devisetechniques for correctly computing changes for views withnested predicates. We note that [1] does present a techniquefor computing changes to views with existential predicatesand a single level of nesting (existential predicates can beseen as a very specific form of a select over an aggregation),but we are not aware of any prior technique that can handlecomplex query predicates at arbitrary levels of nesting.

4.2 Proposed AlgorithmWe now present our algorithm for producing G affected. Thealgorithm first detects the keys of the XML nodes affectedby an update (affected keys) and then use the affected keysto compute the actual node values. Our main contribu-tions are (a) a technique for correctly determining affectedkeys even when the view has arbitrary nested predicates,and (b) a technique for using the affected keys to generate(OLD NODE, NEW NODE) pairs that satisfy the definition oftrigger events, without using any materialized data.

In what follows, we use the following notation. G is thePath XQGM graph; T is the post-update version of the ta-ble in question (recall that this algorithm is invoked oncefor each table-event pair); Told is the pre-update version ofthis table; Gold is a graph identical to G with the sole excep-tion that T is replaced by Told. While most DBMSes do notexpose the Told table directly, it can easily be constructedusing a query of the form [8]:

(SELECT * FROM T) EXCEPT (SELECT * FROM �T)

UNION (SELECT * FROM �T)

4.2.1 CreateAKGraph: Finding Affected Keys.In Fig. 8, we present our algorithm for determining the

affected keys.2 The algorithm takes as input an operator O(the top operator in the Path graph), a base table T , and atransition table dT (which is either �T or �T ). It returnsa new operator, O′, the top operator of an XQGM graphwhich has the property that O �� O ′ will produce exactlythe subset of O’s output tuples which are affected by therelational update captured by dT .

In order to determine the keys of G affected by �T (or�T ), we traverse G (or Gold, respectively) in depth-first or-der, building up a parallel graph G�key (or G�key , respec-tively). At each step, we maintain the following invariant:for each operator o in G and the corresponding operator o�in G�key , joining o and o� on the key of o will produceexactly those tuples from the result of o that were affectedby �T . Thus, if o is the top operator of G, then the corre-sponding o� operator provides a way to identify the nodesin the result of G affected by the relational update.

We now walk through the algorithm using the Pathgraph in Fig. 5A, for the case of an UPDATE on ven-dor (the other cases are similar). At the leaf level, a

2Note that this algorithm does not explicitly consider Unnest opera-tors, which are eliminated by view composition (see Sec. 3.3 and [27]).

1: CreateAKGraph (O, T, dT ) : (Operator, Key){O is an operator; T and dT are table names.}

2: I ← input operators to O3: if O.type = Table then4: if O.tableName = T then5: PK ← primary key of table T6: (O′, K) ← (Project(PK)(Table(dT )), PK)

7: else (O′, K) ← (NIL, NIL)8: else if O.type = GroupBy then9: I′ ← CreateAKGraph(I, T, dT )

10: if I′ = NIL then (O′, K) ← (NIL, NIL)11: else12: J ← Join(key(I′))(I, I′)13: K ← grouping columns of O14: O′ ← new GroupBy on J with grouping cols K15: end if16: else if O.type = Select or O.type = Project then17: (O′, K) ← CreateAKGraph(I, T, dT )18: else if O.type = Join then19: {Assuming, without loss of generality, that |I| = 2.}20: for all i ∈ I do21: (i′, k) ← CreateAKGraph(i, T, dT )22: if i′ �= NIL then (I′, K) ← (I′ ∪ {i′}, K ∪ {k})23: end for24: if |I′| = 0 then O′ ← NIL

25: else if |I′| = 1 then O′ ← I′

26: else27: {Create a union of cross-products}28: Ja ← Project(K)(Join(I′

0, I1))

29: Jb ← Project(K)(Join(I0, I′1))

30: O′ ← Union(Ja, Jb)31: end if32: else if O.type = Union then33: {Assuming, without loss of generality, that |I| = 2.}34: for all i ∈ I do35: (i′, k) ← CreateAKGraph(i, T, dT )36: if i′ �= NIL then37: I′ ← I′ ∪ i′ ; Ki ← k38: end if39: end for40: {Construct the key, as described in Appendix A}41: K ← S

i∈I

Sc∈Ki

M(c)

42: {Create union of all operators in the set I′}43: O′ ← Union(I′)44: end if45:46: {Ensure that O’s key is propagated}47: Add K to O.outputColumns ; return (O′, K)

Figure 8: Algorithm for producing affected keys.

Table(�vendor) operator will first be created. Clearlythe invariant holds at this point: joining Table(�vendor)with Table(vendor) on the $vid column (the key ofTable(vendor)) would produce exactly the vendor tuplesthat changed. The result after this step is shown in Fig. 9.

Box 3 (the Join operator) merely propagates the$vid column without creating any new operators inG�key . The corresponding operator in G�key remainsTable(�vendor), and the invariant still holds: joiningbox 3 with �vendor on $vid would produce exactly thoseproduct-vendor pairs affected by �vendor. For box 4, wecan similarly just propagate the $vid column without hav-ing to create any new operators in G�key .

We then arrive at box 5, a GroupBy operator. Sincea GroupBy operator aggregates multiple input values, anyupdate to any one the input values in a group can change theaggregate result for that group. We therefore need a wayto create an operator o� in G�key that only produces thekeys of those groups affected by the update. First, we jointhe operator below the current GroupBy (box 4) with its

key

Original Path graph, G

Path: view(‘catalog’)/product

table: product$pid $pname

table: vendor

$vid $pid

2

project: $vendor = <vendor> ... </vendor>

$vendor$pname

4

groupby $pname $vdrs = aggXMLFrag($vendor)

$count = count($vendor)

$pname

5

select: $count ≥ 26

project: $product = <product name=”$pname”> ... </product>

$product

7

$vdrs$pname

3 join: 1.$pid = 2.$pid$pname $vid

$price

$price$pid

$vdrs $count

1

$vid

$pname

G∆key

1∆ table: ∆vendor$vid $pid $price

Figure 9: CreateAKGraph: Step 1.

G∆keyOriginal Path graph, G

Path: view(‘catalog’)/product

project: $vendor = <vendor> ... </vendor>

$vendor$pname

4

groupby $pname $vdrs = aggXMLFrag($vendor)

$count = count($vendor)

$pname

5

$vdrs $count

$vid

groupby $pname

$pname

3∆

1∆ table: ∆vendor$vid $pid $price

$pname

2∆join: 4.$vid = 1∆.$vid

key

Figure 10: CreateAKGraph: GroupBy operator.

corresponding operator in G�key (1�). By the algorithminvariant, we can infer that this new join produces exactlythe set of input values to the GroupBy operator that havechanged. Thus, to identify the keys of all affected groups,we simply need to project distinct values of the groupingcolumns, which we achieve by creating a new GroupByoperator (3�) which groups on this column ($pname). TheG�key graph at this point is shown in Fig. 10.

The final two operators are Select and Project oper-ators and, like boxes 3 and 4, merely propagate the keycolumn ($pname); no new operators are created in G�key .The final graph is shown in Fig. 11. Note that the only dif-ference from Fig. 10 is that box 7 now propagates $pname.

The application of GetAKGraph is virtually identicalfor producing G�key , so we don’t walk through it here; theonly difference is that vendorold and �vendor are used

G∆keyOriginal Path graph, G

Path: view(‘catalog’)/product

table: product$pid $pname

table: vendor

$vid $pid

2

project: $vendor = <vendor> ... </vendor>

$vendor$pname

4

groupby $pname $vdrs = aggXMLFrag($vendor)

$count = count($vendor)

$pname

5

select: $count ≥ 26

project: $product = <product name=”$pname”> ... </product>

$product

7

$vdrs$pname

3 join: 1.$pid = 2.$pid$pname $vid

$price

$price$pid

$vdrs $count

1

$vid

$pname

groupby $pname

$pname

3∆

1∆ table: ∆vendor$vid $pid $price

$pname

2∆join: 4.$vid = 1∆.$vid

Figure 11: The complete G�key graph.

1: CreateANGraph(E : Event, G : XQGMGraph, T : String) :

{Build up the affected-key graph for �T , and project out just thekey column(s)}

2: O�key ← CreateAKGraph(oG, T, �T )3: G�key ← Project(O�key.key)(O�key)

{Do the same for �T}4: O�key ← CreateAKGraph(oGold

, Told,�T )

5: G�key ← Project(O�key.key)(O�key)

{Take the union of the affected keys}6: Ou ← Union(G�key, G�key)

{And join it back with G/Gold to produce NEW NODE/OLD NODE}7: Onew ← Join(Ou.key=G.key)(Ou, G)

8: Oold ← Join(Ou.key=Gold.key)(Ou, Gold)

{Finally, the way we produce Gaffected depends on the type of event}9: if E = UPDATE then9: {Inner join; we want those nodes which are present in both

OLD NODE and NEW NODE}10: Gaffected ← Join(key)(Onew, Oold)

11: If required, Gaffected ← Select(OLD NODE �=NEW NODE)(Gaffected)

12: else if E = INSERT then12: {Here we only want those nodes which are not present in

OLD NODE}13: Gaffected ← LeftAntiJoin(key)(Onew, Oold)

14: else if E = DELETE then15: Gaffected ← RightAntiJoin(key)(Onew, Oold)

16: end if

Figure 12: Algorithm for producing Gaffected.

instead of vendor and �vendor, respectively.

4.2.2 CreateANGraph: Producing Affected Nodes.At this stage, for a given relational table-event pair,

we have the Path graphs (G and Gold) and the affected-key graphs (G�key and G�key). Our next goal is to pro-duce the Gaffected graph, which generates the (OLD NODE,NEW NODE) pairs corresponding to the relational update.The algorithm for producing Gaffected is given in Fig. 12.

notifySmith($param1)

select: $pname = “CRT 15”

$new_node

14

$old_node

project:$param1 = $new_node

$param1

G Gold

G key

join: 12new.$pname = 12old.$pname13

$old_node$new_node

11Union

$pname

Gaffected

Gcond

join: 11.$pname = 8old.$pname12old

$product $pname

12new

join: 11.$pname = 8.$pname

$product $pname

G key

Gaction15

$pname

Figure 13: The final Gparams graph.

We begin by creating a Union operator merging the G�key

and G�key subgraphs. This captures the notion that weneed to compute the (OLD NODE, NEW NODE) pairs for nodesaffected by either insertions or deletions at the relationallevel. Next, the result of the union is joined with G toget NEW NODE, and with Gold to get OLD NODE. Finally,OLD NODE and NEW NODE are joined on the key. The typeof this join depends on the XML trigger Event: an UP-DATE has both OLD NODE and NEW NODE (hence an in-ner join), while an INSERT (DELETE) has only NEW NODE(OLD NODE), hence a left (right) anti join. In our exam-ple, we perform an inner join since we are monitoring UP-DATEs. Fig. 13 shows our final Gaffected graph.

In some special cases of UPDATEs (not relevant to ourexample), we need to explicitly check whether the values ofOLD NODE and NEW NODE are different; we refer the readerto Appendix F for more details of these special cases andrelated optimizations. We prove the correctness of Create-ANGraph in Appendix E.

4.3 Adding Condition and Action

Finally, as described in the beginning of this section, weneed to produce Gparams, the graph that produces param-eters for the Action after selecting only the (OLD NODE,NEW NODE) pairs that satisfy the trigger condition (recallFig. 7). Gparams is produced by converting the XQueryexpressions for Condition and parameters of Action intoXQGM (to produce Gcond and Gaction, respectively), and

select: $name = “Samsung”

$new_node $old_node

......

join: $name = $Const1

$new_node $old_node

...

table: Constants$TriggerID $Const1 $Constn...

$TriggerID

Figure 14: Converting select to join.

param1 paramn

Gparams

join$param1$TriggerID $paramn

table: Constants$TriggerID $Const1 $Constn

Figure 15: Correlated Ggrouped graph.

stacking these graphs on top of Gaffected. Fig. 13 showsGparams for our running example.

5 Trigger Grouping and PushdownGiven Gparams for each relevant table-event pair, the finaltwo steps in generating SQL triggers are Trigger Groupingand Trigger Pushdown (Fig. 6). We describe each in turn.

5.1 Trigger GroupingA simple approach to producing SQL triggers is to createone SQL trigger for each Gparams graph of an XML trigger.This approach is not likely to be very efficient, however, be-cause the number of SQL triggers produced will be at leastas many as the number of XML triggers (which we expectto be large), and current relational databases are not veryscalable with respect to the number of SQL triggers. Wetherefore explore techniques for grouping structurally sim-ilar Gparams graphs together, and producing a single SQLtrigger for each group. We note that our focus is not on de-veloping new techniques for grouping triggers; rather, ourfocus in on adapting existing techniques [6, 15] to workwith nested XML views.

For the purposes of this paper, we only consider group-ing Gparams graphs that differ in the constant value(s) ofa selection condition (this corresponds to grouping struc-turally similar XML triggers that only differ in selectionconstant(s) in the WHERE clause). For instance, we wouldconsider grouping the Gparams graph in Fig. 13 with an-other graph that has a different selection condition in box14 (which, say, selects “LCD 19” instead). The proposedapproach can also be extended for grouping joins [6], butwe do not discuss this extension here.

The first step is to create a constants table [15] for eachgroup of structurally similar Gparams graphs. The constantstable has a TrigIDs column, which identifies the XML trig-gers which share a particular set of constants, followed byas many columns as there are constants in the triggers. Forinstance, if in our example, XML triggers 1 and 2 both

share the value CRT 15, while trigger 3 uses LCD 19,the constants table would look like:

TrigIDs Const11,2 CRT 153 LCD 19

Given the constants table, the standard grouping tech-nique [6, 15] is to directly convert the selection conditionwith constant(s) into a join with the constants table, asshown in Fig. 14 for our example. In this way, multipleindividual selections are converted into a single join, andare hence more efficient.

However, this direct replacement of a select with ajoin does not work for complex nested XML conditions.To see why, suppose the WHERE condition in our ex-ample is modified to be of the form: count(NEW NODE/vendor[./price < x]) ≥ y (i.e., the new node contains atleast y vendors who sell an item for less than x; here x andy are different constants for different XML triggers). Inthis case, the condition contains a selection (./price < x)nested under a group-by (count) nested under another se-lection (≥ y). Simply replacing a selection (such as./price < x) with a corresponding join would be incorrectbecause this would change the output cardinality of the op-erator (due to the join with the constants table, where mul-tiple triggers could be fired); this would in turn change thegroup-by (count) result, thereby producing wrong results.

To address this issue, we propose a simple yet powerfulapproach that works for arbitrarily complex nested selec-tions. The basic idea is to use the constants table to setup a correlation in the Gparams graph to produce a Ggrouped

graph, as shown in Fig. 15. Conceptually, this means thatthe Gparams graph is evaluated once for each row in the con-stants table (i.e., for each unique set of constants). Whilethis will certainly produce the correct results, it is likely tobe inefficient because we still do selections one by one foreach unique set of constants. However, the key idea nowis to decorrelate this graph using query rewrite techniquesdeveloped for SQL [26] and XML [27] queries. Decor-relation converts correlated selections to joins [26, 27] (aswe desire) and also preserves the correct semantics of thegraph by adding appropriate grouping columns to group-byoperators so that nested selections are handled correctly.

5.2 Trigger Pushdown OptimizationsThe final step is to generate a SQL trigger that, when ac-tivated, produces the output of the decorrelated G grouped

graph. In generating this SQL trigger, we leverage tech-niques developed for publishing relational data as XML.Specifically, we apply selection/join pushdown and taggerpull-up transformations [27] on Ggrouped to generate a singlesorted outer union SQL query whose results can be taggedin constant space to produce the XML output; this querybecomes the body of the SQL trigger generated.

In addition, we apply an important optimization to avoiddirectly computing the contents of Told (the pre-update ver-

sion of T ), which can be expensive since it is not directlymade available by the RDBMS. For instance, in our exam-ple trigger, since only NEW NODE is returned to the user andTold is only required to compute an aggregate, we wouldlike to produce the aggregate using only T and the transi-tion tables, rather than materializing Told. Note that this ap-proach is exactly the inverse of the incremental view main-tenance problem, which computes new aggregates from oldvalues. Consequently, by switching the role of old valuesand new values, we can directly use existing incrementalview maintenance techniques [23] to compute aggregateson Told using just T and the transition tables.

The SQL trigger generated for our running example ofan UPDATE on vendor is shown in Fig. 16 (formatted to bemore human-readable). The trigger first finds the affectedkeys by taking a union of the product names associated withtuples in the transition tables (lines 4-9). The trigger thencomputes the number of vendors for each affected productafter the update (lines 11-15), and selects only those withmore than one vendor as potential NEW NODEs (lines 17-18).Note that vendors are only computed for affected products,by using regular query rewrite techniques to push down thejoin on affected keys [22, 27]. The trigger then computesthe number of vendors for each affected product before theupdate by using the corresponding values after the updateand the transition tables (lines 20-38); also note that theselection on the OLD NODE name is transformed into a joindue to trigger grouping. Finally, the action parameters areproduced using a sorted outer union (lines 46-47).

We also investigated the use of relational material-ized views to optimize the performance of XML triggers.We considered subqueries for materialization when they:(a) contained distributive aggregates (such as count(∗));(b) could be incrementally maintained by the relationaldatabase (which imposes other restrictions such as not al-lowing nested predicates or having clauses); and (c) didnot contain transition tables. The materialized views werethen substituted for the subquery in the trigger. For ourexample trigger (Fig. 16), the subquery chosen for materi-alization was ProductCount (lines 11-15), without the jointo AffectedKeys. Ideally, we would have materialized onlythe subset where numVendors ≥ 2 (i.e. MultiVendorProd-uct without the join to AffectedKeys), but such nested pred-icates are disallowed in materialized views. Due to thisrestriction, the materialized view needs to incrementallymaintain more tuples (including those that do not satisfy thenested predicate), which is one explanation for the surpris-ingly poor performance of this optimization (Section 6.2).

6 Experimental EvaluationWe have developed and evaluated a prototype of the sys-tem architecture proposed in this paper. We observe thatthe compile time for an XML trigger, which is the timeto manipulate the intermediate XQGM graphs and producethe final SQL trigger, is fairly small (on the order of a hun-dred milliseconds, even for a complex view) and is only ex-

� �1CREATE TRIGGER sqlTrigger AFTER UPDATE ON VENDOR2REFERENCING OLD_TABLE AS DELETED, NEW_TABLE AS INSERTED34WITH AffectedKeys (name) AS (5SELECT P.name FROM product AS P, INSERTED AS V6WHERE P.pid = V.pid7UNION8SELECT P.name FROM product AS P, DELETED AS V9WHERE P.pid = V.pid),1011ProductCount (name, numVendors) AS (12SELECT P.name, COUNT(*) AS numVendors13FROM product AS P, vendor AS V, AffectedKeys as C14WHERE P.pid = V.pid AND P.name = C.name15GROUP BY P.name),1617MultiVendorProduct (name) AS (18SELECT name FROM ProductCount WHERE numVendors >= 2),1920deltaCount (name, numVendors) AS (21SELECT P.name, 1 FROM product AS P, DELETED AS D22WHERE P.pid = D.pid23UNION ALL24SELECT P.name, -1 FROM product AS P, INSERTED AS I25WHERE P.pid = I.pid),2627MultiVendorProduct_old (name) AS (28SELECT name29FROM (SELECT DISTINCT name, numVendors30FROM ProductCount PC, Constants C31WHERE PC.name = C.Const32UNION ALL33SELECT DISTINCT name, numVendors34FROM deltaCount DC, Constants C35WHERE DC.name = C.Const36) AS T(name, numVendors)37GROUP BY T.name38HAVING SUM(T.numVendors) >= 2),3940ProductInfo (pid, name) AS (41SELECT P.pid, P.name42FROM Product AS P, MultiVendorProduct AS MVP,43MultiVendorProduct_old AS MVP_old44WHERE MVP.name = MVP_old.name AND MVP.name = P.name)4546-- Produce sorted outer union of47-- ProductInfo, Vendor, and Constants

� �

Figure 16: The generated SQL trigger.

Parameter Values (default in bold)Hierarchy depth 2, 3, 4, 5# leaf tuples (×1000) 32, 64, 128, 256, 512, 1024# leaf tuples/element 16, 32, 64, 128, 256# triggers 1, . . ., 10,000, . . ., 100,000# updated elements 1, 20, 40, 60, 100# fired triggers/updated element 1, 10, 100, 1000, 10000

Table 2: Experimental parameters.

pended once during the creation of the trigger. We thereforefocus our evaluation on the run time, which is the overheadof evaluating the generated SQL trigger(s) on an update tothe underlying base table(s).

6.1 Experimental SetupThe parameters of our experimental setup are given in Ta-ble 2. Hierarchy depth specifies the depth of the relationalschema. For depth 2, we use the product/vendor schemaand XML view described earlier. For deeper views, weadd additional “ancestor” tables above product, so that eachchild table has a foreign key column referencing its parent’sprimary key, and the XML view contains children nestedinside of parents. The # of leaf tuples is the number of

0

50

100

150

200

1 10 100 1000 10000 100000

Ave

rage

tim

e pe

r up

date

(m

illis

econ

ds)

Number of triggers on Vendor table

UNGROUPED-AOPTGROUPED

GROUPED-AOPTGROUPED-MV

GROUPED-AOPT-MV

Figure 17: Varying the number of triggers.

0

50

100

150

200

250

0 2000 4000 6000 8000 10000

Ave

rage

tim

e pe

r up

date

(m

illis

econ

ds)

Number of triggers fired per produced node

GROUPEDGROUPED-AOPT

GROUPED-MVGROUPED-AOPT-MV

Figure 18: Varying # of fired triggers.

0

100

200

300

400

500

600

5 4 3 2

Ave

rage

tim

e pe

r up

date

(m

illis

econ

ds)

Depth of relational hierarchy

GROUPEDGROUPED-AOPT

GROUPED-MVGROUPED-AOPT-MV

Figure 19: Varying hierarchy depth.

Approach Grouping? Agg Opt? MV?UNGROUPED No No NoUNGROUPED-AOPT No Yes NoUNGROUPED-MV No No YesUNGROUPED-AOPT-MV No Yes YesGROUPED Yes No NoGROUPED-AOPT Yes Yes NoGROUPED-MV Yes No YesGROUPED-AOPT-MV Yes Yes Yes

Table 3: Evaluated approaches

rows in the leaf (vendor) table. # leaf tuples/element isthe number of leaf tuples per top-level XML element pro-duced by the view; this measures the size of OLD NODE andNEW NODE. # triggers specifies the number of structurallysimilar XML triggers in the system, and # updated elementsspecifies the number of XML elements that are updated foreach relational update. Finally, we also vary the number offired triggers per updated element. In all cases, the XMLtrigger was placed on the top-level XML element in theview, and the count(· · · ) ≥ 2 predicate remained on thelowest level (vendors). We defined the actions of the trig-gers to insert the entire NEW NODE into a temporary table.

We evaluated eight alternative implementations to eval-uate the various aspects of our approach (Table 3). We cat-egorize them based on the optimization techniques used.Grouping refers to grouping structurally similar XML trig-gers (Section 5.1); Agg opt refers to optimizing aggregatecomputation on Told; and MV refers to utilizing material-ized views in triggers (Section 5.2).

Our experiments were performed on a Linux systemwith a 933MHz PIII processor and 1GB of main memory,running IBM DB2 8.1. We defined primary keys for all therelational tables and built appropriate indices on the keycolumns and other join columns. Unless otherwise speci-fied, for each experiment, we varied one of the parametersin Table 2 and used default values for the rest (the defaultvalues are in bold). The run time was averaged over 100 in-dependent updates to the vendor table using a cold cache.

6.2 Varying # TriggersFig. 17 shows the performance of the different approacheswhen we vary the number of XML triggers. For this ex-periment alone, we set # fired triggers/updated elementto be 1 instead of its default value of 100, since the de-fault value is not applicable when # triggers is small. As

shown, UNGROUPED-AOPT does not scale well becauseit does not benefit from shared computation across trig-gers. Other UNGROUPED approaches (not shown) per-formed even more poorly because they do not take ad-vantage of the different optimizations. In contrast, all theGROUPED approaches scale gracefully due to the groupingoptimizations (note the log scale in the x-axis). This sug-gests that we can successfully employ existing groupingtechniques for triggers over XML views.

GROUPED-AOPT provides a 30% improvement overGROUPED due to our aggregation optimization. Surpris-ingly, GROUPED-AOPT-MV shows a 25% performancedegradation over GROUPED-AOPT indicating that there isa performance degradation due to using relational materi-alized views. There are two reasons for this performancedegradation: (1) the overhead of maintaining the view onevery relational update, and (2) lack of support for nestedpredicates, which increases the number of tuples that needto be incrementally maintained. GROUPED and GROUPED-MV show similar relative performance.

6.3 Varying # fired triggers/updated elementFig. 18 shows the effect of varying the number of fired trig-gers per updated XML element (for this and subsequentexperiments, we do not consider UNGROUPED approachesdue to their bad scalability properties). The GROUPED ap-proaches scale linearly with the number of fired triggers;the time to perform a relational update is only 150-200ms even when up to 10,000 XML triggers are activated foreach relational update. Again, as with varying # triggers,our optimizations provide a 30% improvement in perfor-mance when compared to not applying the optimizations.

6.4 Varying Hierarchy depthFig. 19 shows the effect of varying the hierarchy depth,which we define as the depth of the relational schema (i.e.the number of tables to join), and usually translates to amuch deeper nesting in XML elements. For instance, theexample view could be written so that <price> is nestedunder <vid>; while this intuitively increases the depth ofthe XML view, it does not affect the trigger depth: the re-sulting SQL will still execute the same number of joins.Thus, to characterize performance, we use the hierarchydepth instead of the depth of the resulting XML elements.

As shown, the run time of all the approaches increasesapproximately linearly with the hierarchy depth. This

is because, as the depth increases, the relational triggermust evaluate more joins to recreate the hierarchy. Fur-ther, the size of the produced result also increases be-cause the number of intermediate nodes grows larger (eventhough the number of leaf nodes remains constant). Inparticular, GROUPED-AOPT scales gracefully and indicatesthat we can get good performance for XML triggers evenfor deeply-nested views. The performance of GROUPED-MV and GROUPED-AOPT-MV is less scalable because theoverhead of maintaining the materialized view increaseswith the number of joins.

6.5 Summary of Other ResultsWe also varied the other parameters in Table 2 and obtainedsimilar results: all grouped approaches scaled gracefully,and GROUPED-AOPT provides a 30-50% performance gainas compared to GROUPED. These results are described ingreater in Appendix H.

7 Related workThere have been recent advances in supporting triggers innative XML databases [3, 20]. However, unlike our ap-proach, these systems do not support triggers over (unmate-rialized or incrementally maintained) XML views, and alsodo not exploit relational technology. Many commercial re-lational databases have also recently added built-in XMLsupport. However, these systems do not support triggersover incrementally maintained (or unmaterialized) XMLviews, as they lack sophisticated XML data managementcapabilities. In contrast, our approach supports triggersover XML views using existing relational technology byleveraging SQL triggers.

A related class of systems is XML publish/subscribesystems [6, 9, 14, 20, 30], where the goal is to efficientlymatch streaming XML documents against a large set ofsubscriptions. However, most pub/sub systems do not con-sider updates to XML documents, which is the main issuewith XML triggers. Even those that consider updates [20]work with native XML engines and are not designed towork with relational engines or exploit SQL triggers.

Our work is also related to the incremental mainte-nance of relational/nested-relational/object-oriented/semi-structured views; we refer the reader to Section 4.1 forthe main technical differences with respect to view mainte-nance. Our work also differs from XML view maintenancein that we exploit relational triggers and query processing.

8 Conclusion and Future WorkWe have presented a systematic way of translating triggersover XML views of relational data into SQL triggers, andfor translating relational updates into their correspondingXML updates. We have also presented various optimiza-tions and quantified their efficiency and scalability. Oneinteresting result of our performance study is that the useof relational materialized views for XML triggers actuallyresults in a performance degradation. This is partly due to

a lack of support for materialized views with nested pred-icates, which are common in XML views. Thus, an in-teresting direction for future work is investigate whetherour general algorithm for detecting changes over XQueryviews can be adapted for incrementally maintaining mate-rialized views with nested predicates.

References[1] S. Abiteboul, J. McHugh, M. Rys, V. Vassalos, J. Wiener, “Incremen-

tal Maintenance for Materialized Views over Semistructured Data”,VLDB 1998.

[2] P. Bohannon, P. Buneman, B. Choi, W. Fan, “Incremental Evaluationof Schema-Directed XML Publishing”, SIGMOD 2004.

[3] A. Bonifati, D. Braga, A. Campi, S. Ceri, “Active XQuery”, ICDE2002.

[4] S. Ceri, J. Widom, “Deriving Production Rules for Constraint Main-tenance”, VLDB 1990.

[5] S. Ceri, J. Widom, “Deriving Production Rules for Incremental ViewMaintenance”, VLDB 1991.

[6] J. Chen, D. J. DeWitt, F. Tian, Y. Wang, “NiagaraCQ: A ScalableContinuous Query System for Internet Databases”, SIGMOD 2000.

[7] R. Cochrane, K. Kulkarni, N. Mattos, “Active Database Featuresin SQL3”, In N. Paton (ed.) Active Rules in Database Systems,Springer-Verlag, 1999.

[8] R. Cochrane, H. Pirahesh, N. Mattos, “Integrating Triggers andDeclarative Constraints in SQL Database Systems”, VLDB 1996.

[9] Y. Diao, P. Fischer, M. Franklin, R. To, “YFilter: Efficient and Scal-able Filtering of XML Documents”, ICDE 2002.

[10] K. Dimitrova, M. El-Sayed, E. Rundensteiner, “Order-sensitive ViewMaintenance of Materialized XQuery Views”, ER 2003.

[11] M. El-Sayed, L. Wang, L. Ding, E. Rundensteiner, “An AlgebraicApproach for Incremental Maintenance of Materialized XQueryViews”, WIDM 2002.

[12] M. Fernandez, D. Suciu, W. Tan , “SilkRoute: Trading between Re-lations and XML”, WWW 2000.

[13] D. Gluche, T. Grust, C. Mainberger, M. Scholl, “Incremental Up-dates for Materialized OQL Views”, DOOD 1997.

[14] A. Gupta, D. Suciu. “Stream Processing of XPath Queries with Pred-icates”, SIGMOD 2003.

[15] E. Hanson, C. Carnes, L. Huang, M. Konyala, L. Noronha, S.Parthasarathy, J. Park, A. Vernon, “Scalable Trigger Processing”,ICDE 1999.

[16] IBM DB2 UDB, http://www.ibm.com/software/data/db2/[17] A. Kawaguchi, D. Lieuwen, I. Mumick, K. Ross, “Implementing In-

cremental View Maintenance in Nested Data Models”, DBPL 1997.[18] H. Kuno, E. Rundensteiner, “Incremental Maintenance of Materi-

alized Object-Oriented Views in MultiView: Strategies and Perfor-mance Evaluation”, TKDE 10(5), 1996.

[19] Microsoft SQL Server, http://www.microsoft.com/sql/[20] B. Nguyen, S. Abiteboul, G. Cobena, M. Preda, “Monitoring XML

Data on the Web”, SIGMOD 2001[21] Oracle 10g DBMS, http://www.oracle.com/database/[22] T. Palpanas, R. Sidle, R. Cochrane, H. Pirahesh, “Incremental Main-

tenance for Non-Distributive Aggregate Functions”, VLDB 2002.[23] D. Quass, “Maintenance Expressions for Views with Aggregation”,

SIGMOD 1996.[24] M. Rys, “State-of-the-Art Support in RDBMS: Microsoft SQL

Server’s XML Features”, Data Engineering Bulletin 24(2), 2001.[25] M. Rys, M. Norrie, H. Schek, “Intra-Transaction Parallelism in the

Mapping of an Object-Model to a Relational Multi-Processor Sys-tem”, VLDB 1996.

[26] P. Seshadri, H. Pirahesh, T. Leung, “Complex Query Decorrelation”,ICDE 1996.

[27] J. Shanmugasundaram, J. Kiernan, E. Shekita, C. Fan, J. Funderburk.“Querying XML views of relational data”, VLDB 2001

[28] M. Stonebraker, A. Jhingran, J. Goh, S. Potamianos, “On Rules, Pro-cedures, Caching and Views in Data Base Systems”, SIGMOD 1990.

[29] D. Suciu, “Query Decomposition and View Maintenance for QueryLanguages for Unstructured Data”, VLDB 1996.

[30] F. Tian, B. Reinwald, H. Pirahesh, T. Mayr, J. Myllymaki, “Im-plementing A Scalable XML Publish/Subscribe System Using Re-lational Database Systems”, SIGMOD 2004.

[31] World Wide Web Consortium. “XQuery 1.0: An XMLQuery Language”, W3C Working Draft, 12 Nov. 2003. Seehttp://www.w3.org/TR/2003/WD-xquery-20031112/.

[32] World Wide Web Consortium. “XQuery 1.0 and XPath 2.0 Func-tions and Operators”, W3C Working Draft, 12 Nov. 2003. Seehttp://www.w3.org/TR/2003/WD-xpath-functions-20031112/.

[33] X. Zhang, M. Mulchandani, S. Christ, B. Murphy, E. Rundensteiner,“Rainbow: mapping-driven XQuery processing system”, SIGMOD2002.

A Definitions of Canonical KeysTable 4 defines the canonical key for each XQGM operator(except for Unnest) in terms of the canonical keys of itsinput operator(s).

B Proof of Theorem 1

Theorem 1 1. A view of relational data, G, is trigger-specifiable if all the table operators in G have (canonical)keys.

Proof. We need to prove that every operator in G has acanonical key. In Table 4, we define the canonical keys forevery type of operator, except for Unnest, in terms of itsinput operator(s). Thus, if G does not contain any Unnestoperators, then we can simply derive the canonical key foreach operator o by applying the definitions in Table 4.

If G does contain Unnest operators, then it can berewritten to an equivalent graph G ′ that does not containany Unnest operators. This transformation is possible be-cause G is an XML view of relational data and the underly-ing relational data contains no inherent nesting. Hence, anUnnest operator can only unnest an XML hierarchy createdin the view itself. We can thus use the sound and completeview composition rules from [27] to remove Unnest oper-ators in such cases. We can thus assume without loss ofgenerality that G does not contain any Unnest operators.

Since all operators in G have canonical keys, the view istrigger-specifiable by Definition 4.

C Event pushdownIn this part, we show our algorithm for determining whatrelational events can cause XML triggers to be fired.

First, we precisely define the notion of INSERT, UP-DATE, and DELETE events on an operator o, given adatabase transition D

∗→ D′:

• INSERT(o): There exists t ∈ R(o, D′) such that¬∃t′(t′ ∈ R(o, D) ∧ ckvo(t) = ckvo(t′)).

• DELETE(o): There exists t ∈ R(o, D) such that¬∃t′(t′ ∈ R(o, D′) ∧ ckvo(t) = ckvo(t′)).

• UPDATE(o, C): There exists t ∈ R(o, D) such thatfor the set of columns C,∃t′(t′ ∈ R(O, D′) ∧ ckvo(t) =ckvo(t′) ∧ πC(v(t)) �= πC(v(t′))).

The procedure we use is quite straightforward: for eachtype of operator (Join, GroupBy, etc.), and for each ofthe three event types, there is a set of possible input eventswhich can directly cause that output event (see Table 5).Put another way, there is a set of rules EI → EO , whereoperator I is an input to operator O, such that the event EO

can occur if EI occurs. Thus, starting at the top operator

1: GetSrcEvents (o : Operator, e : Event) :2: S ← {(o′, e′) |

determined from (o, e)using Table 5}3: S′ ← ∅4: for all (o′, e′) ∈ S do5: if o′.type = Table then6: S′ ← S′ ∪ {(o′, e′)}7: else8: S′ ← S′ ∪ GetSrcEvents(o′, e′)9: end if

10: end for11: Return S′.

Figure 20: GetSrcEvents: given an operator, o, and a desiredevent on that operator, e, returns the set of table-level events whichcan cause e.of the Path graph, we can determine the set of events onall of its input operators which can directly cause the Eventspecified in the trigger. Applying these rules recursively,as shown in Fig. 20, we eventually reach the base Tableoperators, at which point we have the set of all base-tableevents IB such that IB → Event.

D Supported XQuery ExpressionsOur implementation supports a powerful subset of XQueryin view and trigger definitions. As mentioned in Sec-tion 2.2, the trigger’s Path is an XPath expression, whilethe Condition as well as the parameters to the Action func-tion are XQuery expressions. Rather than list the full XPathand XQuery grammar we support, we will instead highlightthe differences between the grammar supported by our im-plementation and the XQuery 1.0 specification [31].

Fig. 21 shows the restrictions we place on the XQuery1.0 grammar. The two main restrictions are on the axes sup-ported (we do not support parent or sibling accessors), andon type expressions, which we do not support (hence the re-moval of InstanceofExpr). There are also restrictionson the XQuery functions [32] which we support; currently,we allow only arithmetic functions which have counter-parts in SQL, and we do not allow user-defined XQueryfunctions.

E Proof of Theorem 2In this section, we will prove the correctness of the Create-ANGraph algorithm (see Fig. 12).

E.1 Avoiding Spurious UPDATE EventsThe goal of CreateANGraph is to produce an XQGMgraph which, when evaluated, will produce the values ofOLD NODE and NEW NODE for each XML node which wasinserted, updated, or deleted. If, as we shall prove, Create-AKGraph (Fig. 8) is correct, then the anti-join at the topof Gaffected (lines 13 and 15 of CreateANGraph) preventsINSERT and DELETE triggers from producing spurious out-put.

Operator type Input (operator, key) pairs How to derive output key (KeyO)

Select, Project (I, KeyI)/* Simply propagate the key of our input operator. */KeyO ← KeyI

Join (I1, KeyI1), · · · , (In, KeyIn

)/* New key is the concatenation of input keys. */KeyO ← KeyI1

∪ · · · ∪ KeyIn

Union (I1, KeyI1), · · · , (In, KeyIn

)

Let M : CI → CO be the mapping from the columns of input operators tocolumns of O.

KeyO ←[

Ii∈I

0@ [

c∈KeyIi

M(c)

1A

GroupBy (I,KeyI) KeyO ← the grouping columns of O.

Table — KeyO ← the primary key of O.Table 4: Deriving canonical keys for XQGM operators.

Operator type Output event Input (operator,event) pairs

Select, Project

DELETE(O)DELETE(I) (I is the input operator);UPDATE(I,Cσ) where Cσ are the columns used in the selection condition

INSERT(O)INSERT(I);UPDATE(I,Cσ)

UPDATE(O,C) UPDATE(I,C)

Join

DELETE(O)DELETE(I) for any input operator I ;UPDATE(I,C), where CI are the columns of operator I .

INSERT(O)INSERT(I) for any input operator I ;UPDATE(I,C)

UPDATE(O,C) UPDATE(I,CI)

GroupBy

DELETE(O)DELETE(I);UPDATE(I,G), where G is the set of grouping columns

INSERT(O)INSERT(I);UPDATE(I,G)

UPDATE(O,C)UPDATE(I,C);INSERT(I) unless C ⊆ G;DELETE(I) unless C ⊆ G

Union

DELETE(O)DELETE(I) for any input operator I ;UPDATE(I, IC) for any input operator I (Note that DELETE(O) could becaused by an UPDATE where a previously unique tuple becomes a duplicate.)

INSERT(O)INSERT(I) for any input operator I ;UPDATE(I, IC) for any input operator I (analogously to DELETE(O))

UPDATE(O,C) UPDATE(I, IC) for any input operator ITable 5: Operator-specific rules used in event pushdown.

[31] < MainModule > − > < Prolog >< QueryBody >

< MainModule∗ > − > < QueryBody >

[72] < AxisStep > − > (< ForwardStep > | < ReverseStep >)

| < Predicates >

< AxisStep∗ > − > < ForwardStep >< Predicates >

[89] < ForwardAxis > − > (”child” ” :: ”)

|(”descendant” ” :: ”)

|(”attribute” ” :: ”)

|(”self” ” :: ”)

|(”descendant − or − self” ” :: ”)

|(”following − sibling” ” :: ”)

|(”following” ” :: ”)

< ForwardAxis∗ > − > (”child” ” :: ”)

|(”descendant” ” :: ”)

|(”attribute” ” :: ”)

|(”self” ” :: ”)

|(”descendant − or − self” ” :: ”)

[56] < AndExpr > − > < InstanceofExpr >

@”and” < InstanceofExpr >

< AndExpr∗ > − > < ComparisonExpr >

@”and” < ComparisonExpr >

Figure 21: Supported XQuery grammar. Non-terminals markedwith an asterisk (∗) indicate our implementation, while the brack-eted number refers to the rule in the XQuery specification [31].

However, when the XML trigger is on an UPDATE event,there are cases in which spurious updates might be de-tected. To see why, suppose the view for our running ex-ample (Fig. 3) were changed so that instead of producingthe list of vendors for each product, it only produced theminimum price for each. The XQGM graph for this modi-fied view is shown in Fig. 22. Note that the only differencefrom the original (Fig. 5) is in the Project and GroupBy(boxes 4 and 5).

Now suppose vendor initially contains the following twotuples:

vendorvid pid price

Amazon P1 100.00Buy.com P1 60.00

and one vendor updates its price:�vendor

vid pid priceAmazon P1 100.00

�vendor

vid pid priceAmazon P1 75.00

Since the view creates product elements, P1 will beidentified by our algorithm as an affected key because�vendor and �vendor (potentially) affect the aggregateproduced by the GroupBy operator in Box 5. However,

table: product$pid $pname $mfr

table: vendor$vid $pid

1 2

project: $pname, $price

$price$pname

4

groupby $pname$minprice = min($price)

$count = count($vendor)

$pname

5

select: $count ≥ 26

project: $product = <product name="$pname"> <min>$minprice</min>

</product>

$product

groupby:$prds = aggXMLFrag($product)

$prds

8

7

project: $catalog = <catalog> $prds

</catalog>

$catalog

9

$pname

3 join: 1.$pid = 2.$pid$pname $vid

$price

$price$pid

$minprice $count

A

B

$minprice

Figure 22: XQGM for modified catalog view.

since this particular update does not change the minimumprice, the XML node for product P1 remains unchanged:both before and after the update, the node is simply:

<product name="P1"><min>60.00</min>

</product>

On the other hand, if the new price for Amazon hadbeen, say, $50.00, this XML node would have been af-fected. Thus, when the XML trigger is on an UPDATE

event, we need to ensure that the node in question wasactually updated. The simplest, but far from the mostefficient, solution is to place a selection condition at thetop of Gaffected (Line 11 in Fig. 12) which filters outthose (OLD NODE, NEW NODE) pairs where OLD NODE =NEW NODE. This is implemented as a string comparison inthe tagger (since it’s a comparison of the full XML nodes),which has two drawbacks. First, it is expensive when thenodes are large. Second, and more importantly, it requirespassing the entire (OLD NODE, NEW NODE) pair to the mid-dleware, which prevents many of the optimizations whichwould otherwise be performed by XQGM graph rewriterules when the Condition and Action do not require the en-tire nodes. We therefore present some optimizations and

prove their correctness in Appendix F. For this section,however, we assume the use of the simpler, less-efficientapproach.

E.2 Proof of Correctness of CreateAKGraphCentral to our proof of Theorem 2 is the correctness of theaffected-keys algorithm, CreateAKGraph (Fig. 8).

First, we formally define some terminology. In the fol-lowing, we use R(T, D) to denote the contents of table Tin database state D. (In other words, R(T, D) = R(o, D)where o is the XQGM operator Table(T ).)

Definition 5 (Valid transition tables). For any givensingle-table database transition D

T→ D′, (�T,�T ) isa valid pair of transition tables iff

�T ⊆ R(T, D), � T ⊆ R(T, D′),�T ⊇ {x|x ∈ R(T, D) ∧ x �∈ R(T, D′)},

�T ⊇ {x|x ∈ R(T, D′) ∧ x �∈ R(T, D)},and (R(T, D) −�T ) = R(T, D′) −�T .

Definition 6 (Hypothetical state). For a given databasestate D and a transition table dT , the hypothetical stateD−dT is the database state such that R(T, D−dT ) =R(T, D) − dT and for all tables T ′ �= T , R(T ′, D−dT ) =R(T ′, D).

We refer to a database transition as monotonic if�T = ∅ or �T = ∅. It follows from Definitions 5 and6 that both D

T→ D′−�T and D′

−�TT→ D′ are monotonic

transitions.

Definition 7 (“Affected”). A tuple t is said to be affectedin view G by relational transition D

∗→ D′ iff t is updated,inserted, or deleted in G by D

∗→ D′ (as per Definitions 2and 3).

We now prove the correctness of CreateAKGraph.

Lemma 1 (Correctness of CreateAKGraph). Given aview graph G, a relational table T , and a monotonicdatabase transition D1

T→ D2 with non-empty transitiontable dT , let O′ = CreateAKGraph(oG, T, dT ). ThenckvO′(x) ∈ R(O′, D2) for all tuples x where x is affectedin G by D1

T→ D2.

Proof. We prove Lemma 1 by induction on the depth of G.

Base case: depth = 1.

In this case, the view graph only consists of a singleoperator, Table(X), for some relational table X .

Suppose T �= X . Since we stipulated that a databasetransition D1

T→ D2 occurred, D1 and D2 are identicalstates except for the contents of table T . Therefore, sinceR(Table(X), D1) = R(Table(X), D2), there are no tu-ples affected, so the lemma is vacuously true.

On the other hand, suppose that T = X .Then CreateAKGraph(oG, T, dT ) = πT.key(Table(dT )).

Hence, by the definition of transition tables, R(O ′, D2)contains all tuples x affected by D1

T→ D2.Thus, the base case holds.

Induction Hypothesis: For a graph H of depth ≤ k, sup-pose Lemma 1 holds.

We will now show that Lemma 1 holds for a graph Gof depth k + 1. There are four cases, one for each type ofoperator except for Table (which can only occur at the leaflevel of the graph).

Case 1: oG is a GroupBy operator.

This case is handled by lines 9-15 of the algorithm.Then there are two cases to consider, depending on the

value returned by the recursive call to CreateAKGraph,I ′. First, if I ′ = ∅, then there are no tuples in the in-put to the GroupBy which were affected as a result of thedatabase transition. Since the input to the GroupBy is un-changed, and it merely aggregates its input, its output mustalso be unaffected by the transition; hence, in this case, wesimply return ∅.

Otherwise, the algorithm creates a Join operator, J ,joining I with I ′, and then returns a new GroupBy op-erator which merely projects out the values of the group-ing columns of oG. Note that for a GroupBy operator, thegrouping columns are its canonical key. By the inductionhypothesis, ckvI′(x) ∈ R(I ′, D2) for every tuple x affectedby the transition. If ckvI′(x) is produced by I ′, then Jwill produce all tuples y where ckvoG(x) = ckvoG(y), andO′ will produce ckvO′(y). In other words, for each tupleproduced by operator I affected by the transition, O ′ pro-duces the corresponding value of the canonical key of theGroupBy operator, O′.

Finally, since each tuple z produced by a GroupByoperator Q depends only on those input tuples w whereckvQ(w) = ckvQ(z), the keys of all tuples affected in G by

the transition D1T→ D2 are included in R(O′, D2).

Case 2: oG is a Select or Project operator.

This case is handled by lines 16-17 of the algorithm.Both Select and Project take a single input, I . Sup-

pose, by contradiction, that there exists a tuple x such that xis affected in G by D1

T→ D2, but ckvO′(x) �∈ R(O′, D2).By the algorithm (line 17), O ′ = I ′, so it must also be truethat ckvI′(x) �∈ R(I ′, D2). Let y be the tuple produced byI such that ckvI(y) = ckvI(x). There must be exactly onesuch tuple, because Project does not change the cardinal-ity of its input, and Select can only decrease it. ThereforeckvI′(y) �∈ R(I ′, D2). By the induction hypothesis, thisimplies that y is not affected by D1

T→ D2, for if it were,ckvI′(y) would be in R(I ′, D2). Both Select and Projectare deterministic (given the limitations on XQuery func-tions laid out in Appendix D), and compute each outputtuple in terms of exactly one input tuple. But this yields acontradiction, because an unaffected input tuple y resulted

in an affected corresponding output tuple x.

Case 3: oG is a Join.

This case is handled by lines 20-30 of the algorithm.A Join with predicates is semantically equivalent to a

Join with no predicates (i.e., a cross-product) followed bya Select imposing the predicates. Since a Select requires noadditional operators added to the affected-keys graph (seeCase 2 above), we can assume, without loss of generality,that oG has no predicates.

Furthermore, a Join with fewer than 2 input opera-tors is equivalent to a Select, and a Join with more than2 input operators can be split up into several joins, i.e.,Join(I1, I2, · · · , In) = Join(I1,Join(I2, · · · , In)). Wetherefore make the additional simplifying assumption thateach Join has exactly 2 inputs.

The algorithm begins by invoking CreateAKGraph re-cursively on each of the inputs (I0 and I1). There are threepossible cases to consider: first, suppose both invocationsreturn ∅. By the induction hypothesis, this implies that theinput to oG is unchanged; since Join is deterministic, itsoutput must also be unchanged.

The second possibility is that CreateAKGraph re-turned ∅ for exactly one of I0 or I1, and returned someI ′ for the other. The proof for this case is almost iden-tical to the proof-by-contradiction used in Case 2 for Se-lect, so we just sketch it out: assume that there exists atuple x such that x is affected in G by D1

T→ D2, butckvO′(x) �∈ R(O′, D2); then a contradiction arises becauseonly one leg of the Join changed, and I ′ = O′, so the corre-sponding tuple y ∈ R(I ′, D2) should have been identifiedby I ′.

The third and final possibility is that CreateAKGraphreturned I ′

0 and I ′1 for I0 and I1, respectively. In this

case, we produce O′ by creating two Join operators—Ja

computing the cross-product (I ′0 × I1), and Jb computing

(I0 × I ′1)—and taking their Union. Suppose by contradic-tion that there exists a tuple x such that x is affected in G

by D1T→ D2, but ckvO′(x) �∈ R(O′, D2). Then, by def-

inition of a cross-product, there exists exactly one pair oftuples (y, z) such that x = y · z, i.e. y ∈ R(I0, D2), z ∈R(I1, D2), ckvI0(y) = ckvI0(x), and ckvI1(z) = ckvI1(x).Since x was affected, and Join is deterministic, it must bethe case that either y or z was also affected. Without loss ofgenerality, assume it was y. Then, by the induction hypoth-esis, it must be the case that ckvI′

0(y) ∈ R(I ′0, D2). There-

fore, since z ∈ R(I1, D2), there is a tuple x′ ∈ R(Ja, D2)such that ckvI′

0(x′) = ckvI′

0(y) and ckvI′

1(x′) = ckvI′

1(z).

But this yields a contradiction, since the Union will simplypropagate x′, and x′ = ckvO′(x), contradicting our originalassumption that ckvO′(x) �∈ R(O′, D2).

Case 4: oG is a Union.

The final possibility is that oG is a Union operator; thisis handled in lines 34-43. The algorithm for Union calls

CreateAKGraph recursively on each of its inputs i, andthen creates a new Union operator computing the union ofeach of these results i′.

We prove the correctness for this final case by contra-diction. Suppose there exists some tuple x such that x

is affected in G by the transition D1T→ D2, but x �∈

R(O′, D2). By definition of the Union operator, there mustbe at least one input operator I such that x was affectedin I . By the induction hypothesis, ckvI′(x) ∈ R(I ′, D2).However, this yields a contradiction because I ′ is one ofthe inputs to O′, and the semantics of Union require that∀y(y ∈ I ′ → y ∈ O′).

E.3 Proof of Correctness of CreateANGraphBefore we can prove the correctness of CreateANGraph,we must first prove the following lemma:

Lemma 2. For a database transition DT→ D′ with tran-

sition tables �T and �T , and a view graph G, if a tuplet is affected in G by D

T→ D′, then either t is affected byD

T→ D′−�T , or t is affected by D′

−�TT→ D′.

Proof. For conciseness of notation, let D̄ = D′−�T . Sup-

pose by contradiction that (1) t is affected by DT→ D′,

but (2) t is not affected by DT→ D̄ and t is not af-

fected by D̄T→ D′. By Definitions 2 and 3, it fol-

lows from (2) that one of two cases is possible: either (3)(t ∈ R(oG, D) ∧ t ∈ R(oG, D̄) ∧ t ∈ R(oG, D′)) or (4)(t �∈ R(oG, D) ∧ t �∈ R(oG, D̄) ∧ t �∈ R(oG, D′)).

If (3) holds, then t ∈ R(oG, D) ∧ t ∈ R(oG, D′), so itfollows from Definitions 2 and 3 that t is neither inserted,deleted, or updated in G by D

T→ D′, contradicting ourassumption (1) that t is affected by D

T→ D′.On the other hand, if (4) holds, then t �∈ R(oG, D)∧ t �∈

R(oG, D′); again, it follows from Definitions 2 and 3 thatt is neither inserted, deleted, or updated in G by D

T→ D′,contradicting assumption (1).

We now proceed with the proof of Theorem 2, the cor-rectness of CreateANGraph. Please refer to Fig. 12 forthe text of the algorithm, which is referenced throughoutthis proof.

Theorem 2 1. Given an event E, view graph G, and ta-ble T , CreateANGraph(E, G, T ) produces graph Gaffected

such that for all valid database transitions DT→ D′,

(OLD NODE, NEW NODE) ∈ R(oGaffected , D′) iff:

(a) E = UPDATE∧OLD NODE ∈ R(oG, D)∧NEW NODE ∈R(oG, D′)∧ckvoG(OLD NODE) = ckvoG(NEW NODE)∧v(OLD NODE) �= v(NEW NODE), or

(b) E = INSERT ∧ OLD NODE = ∅ ∧ NEW NODE ∈R(oG, D′)∧�x|(x ∈ R(oG, D)∧ckvoG(NEW NODE) =ckvoG(x)), or

(c) E = DELETE ∧ NEW NODE = ∅ ∧ OLD NODE ∈R(oG, D)∧�x|(x ∈ R(oG, D′)∧ckvoG(OLD NODE) =ckvoG(x)).

Proof. Lemma 1 proved that if some tuple x is affectedin G by the database transition D ′

−�TT→ D′, then

ckvoG(x) ∈ R(CreateAKGraph(oG, T,�T ), D′), andthat if some tuple x is affected in G by D

T→ D′−�T , then

ckvoG(x) ∈ R(CreateAKGraph(oGold , Told,�T ), D′).As Lemma 2 showed, for any tuple x affected in G byD

T→ D′, it must be affected either by D ′−�T

T→ D′ or by

DT→ D′

−�T . Therefore Union created in line 6 producesa superset of affected keys. I.e., for any affected tuple x, itmust be the case that x ∈ R(Ou, D′) (see line 6).

We now consider each of the three event types sepa-rately, and we prove the theorem in both directions for each.

(a) E = UPDATE.

Suppose E = UPDATE, and there exist OLD NODE andNEW NODE such that OLD NODE ∈ R(oG, D) ∧ NEW NODE ∈R(oG, D′) ∧ ckvoG(OLD NODE) = ckvoG(NEW NODE) ∧v(OLD NODE) �= v(NEW NODE). This is the definition ofan UPDATE event on oG, and since we have shown thatR(Ou, D′) contains ckvoG(x) for all tuples x affectedby the database transition, we can infer that R(Ou, D′)contains ckvoG(OLD NODE). Then, since OLD NODE ∈R(oG, D), it follows that OLD NODE ∈ R(Oold, D

′)(line 8); similarly, since NEW NODE ∈ R(oG, D′), there-fore NEW NODE ∈ R(Onew, D′) (line 8). Next, sincethere is exactly one tuple in each of R(Oold, D

′) andR(Onew, D′), the inner join (line 10) produces exactly(OLD NODE, NEW NODE). Since we initially stipulated thatOLD NODE �= NEW NODE, the final Select (line 11) does notremove this pair from the output. Therefore, G affected pro-duces the tuple (OLD NODE, NEW NODE).

Conversely, suppose E = UPDATE, and there ex-ists (OLD NODE, NEW NODE) ∈ R(OGaffected , D

′). Then, thismust have been returned in line 11, so we can infer thatOLD NODE �= NEW NODE. Furthermore, OLD NODE andNEW NODE come from Oold and Onew, respectively, joinedon their respective keys; therefore, since the key of bothOold (line 8) and Onew (line 7) is the same as the key ofG, we have that ckvoG(OLD NODE) = ckvoG(NEW NODE).Finally, since R(Onew, D′) ⊆ R(G, D′), we can con-clude that NEW NODE ∈ R(G, D′). We can similarly con-clude that OLD NODE ∈ R(G, D) because R(Oold, D

′) ⊆R(G, D).

(b) E = INSERT.

Next, suppose that E = INSERT, and there ex-ist OLD NODE and NEW NODE such that OLD NODE =∅ ∧ NEW NODE ∈ R(oG, D′) ∧ �x|(x ∈ R(oG, D) ∧ckvoG(NEW NODE) = ckvoG(x)). This is the definitionof an INSERT event on oG, and since we have shownthat R(Ou, D′) contains ckvoG(x) for all tuples x affected

by the database transition, we can infer that R(Ou, D′)contains ckvoG(NEW NODE). Then, since NEW NODE ∈R(oG, D′), it follows that NEW NODE ∈ R(Onew, D′); simi-larly, since there is no corresponding tuple x in R(oG, D),it follows that there is no tuple y ∈ R(Oold, D

′) such thatckvoG(y) = ckvoG(NEW NODE). Therefore, the LeftAnti-Join (line 13) will produce (∅, NEW NODE).

Conversely, suppose E = INSERT, and there ex-ists (OLD NODE, NEW NODE) ∈ R(OGaffected , D

′). Then,this must have been returned in line 13, so we can in-fer that OLD NODE = ∅. Furthermore, OLD NODE andNEW NODE come from Oold and Onew, respectively, anti-joined on their respective keys. Therefore, since the keyof both Oold (line 8) and Onew (line 7) is the same asthe key of G, we have that NEW NODE ∈ R(G, D ′), and�y|(y ∈ R(oOold , D

′) ∧ ckvoG(y) = ckvoG(NEW NODE)).Because NEW NODE ∈ R(Onew, D′), we conclude thatckvoG(NEW NODE) ∈ R(Ou, D′). Therefore, since we haveshown that Oold does not contain a corresponding tuple y,it follows that �x|(x ∈ R(oG, D) ∧ ckvoG(NEW NODE) =ckvoG(x)).

(c) E = DELETE.

The final case is analogous to (b). Suppose that E =DELETE, and there exist OLD NODE and NEW NODE suchthat NEW NODE = ∅ ∧ OLD NODE ∈ R(oG, D) ∧ �x|(x ∈R(oG, D′) ∧ ckvoG(OLD NODE) = ckvoG(x)). This isthe definition of a DELETE event on oG, and since wehave shown that R(Ou, D′) contains ckvoG(x) for all tu-ples x affected by the database transition, we can inferthat R(Ou, D′) contains ckvoG(OLD NODE). Then, sinceOLD NODE ∈ R(oG, D), it follows that OLD NODE ∈R(Oold, D

′); similarly, since there is no correspondingtuple x in R(oG, D′), it follows that there is no tupley ∈ R(Onew, D′) such that ckvoG(y) = ckvoG(OLD NODE).Therefore, the RightAntiJoin (line 15) will produce(OLD NODE, ∅).

Conversely, suppose E = DELETE, and there ex-ists (OLD NODE, NEW NODE) ∈ R(OGaffected , D

′). Then,this must have been returned in line 15, so we can in-fer that NEW NODE = ∅. Furthermore, OLD NODE andNEW NODE come from Oold and Onew, respectively, anti-joined on their respective keys. Therefore, since the keyof both Oold (line 8) and Onew (line 7) is the same asthe key of G, we have that OLD NODE ∈ R(G, D), and�y|(y ∈ R(oOnew , D

′) ∧ ckvoG(y) = ckvoG(OLD NODE)).Because OLD NODE ∈ R(Oold, D

′), we conclude thatckvoG(OLD NODE) ∈ R(Ou, D′). Therefore, since we haveshown that Onew does not contain a corresponding tuple y,it follows that �x|(x ∈ R(oG, D′) ∧ ckvoG(OLD NODE) =ckvoG(x)).

F Optimizations for CreateANGraphAs described in Appendix E.1, in the CreateANGraphalgorithm (Fig. 12), we initially put a Select operator

(line 11) at the top of the affected-node graph in order to en-sure that OLD NODE and NEW NODE actually differ; however,doing this comparison in the tagger can be expensive. Inthis section, we identify a general class of views for whichwe do not have to explicitly check whether OLD NODE andNEW NODE differ, while still ensuring that we do not iden-tify spurious updates - many views, including the runningexample in the paper, fall into this class. For views thatdo not fall into this class, we present a few optimizationsthat can push down the check to the relational engine undercertain conditions.

F.1 DefinitionsAs Definition 5 shows, the transition tables provided bythe relational database system are actually a superset of thetuples which changed. This is because an update statementsuch as:

UPDATE VENDORSET PRICE = 1 * PRICE

will result in the transition tables containing as many rowsas there are vendor rows, even though none of the ven-dor rows actually changed in value. Therefore, Create-AKGraph would produce keys of XML nodes which werenot actually updated, and these would not be identified asspurious until reaching the final Select operator.

This problem can be avoided by replacing all referencesto �T and �T in the SQL trigger with �T ′ and �T ′,respectively, where �T ′ = �T −�T and �T ′ = �T −�T . Then we can refine Definition 5:

Definition 8 (Pruned transition tables). For any givensingle-table database transition D

T→ D′, the pruned tran-sition tables �T and �T are:

�T = {x|x ∈ R(T, D′) ∧ x �∈ R(T, D)}, and

�T = {x|x ∈ R(T, D) ∧ x �∈ R(T, D′)}.For a large class of views, including the running exam-

ple in the paper, we can actually remove the selection con-dition OLD NODE �= NEW NODE from CreateANGraph ifwe prune the transition tables. This class of views, whichwe call injective, has the property that there is a one-to-onemapping between each XML node produced by oG (the topoperator of the view graph) and the set of relational tuplesused to construct the node.

In order to prove this claim, we must first define the no-tion of an injective view more formally. We begin by defin-ing the contributing set of a tuple: intuitively, for any tuplet produced by an operator o, there is a set of tuples pro-duced by each of its input operators o i which contributes tot. For example, the contributing set of a tuple t producedby a GroupBy operator is the set of all input tuples havingthe same grouping-column value as t. For a Project orSelect, the contributing set of a tuple t is the input tuple

from which t is computed by projection or selection, re-spectively. We now formalize this notion of a contributingset of a tuple for arbitrary operators.

Definition 9 (Contributing set). Given an operator (o),one of its input operators (oi), a database state (D), and atuple (to ∈ R(o, D)), the contributing set of to is:

ζ(to, oi, o, D) = {ti ∈ R(oi, D)|∀D′(ti ∈ R(oi, D

′) →∃t′o(t′o ∈ R(o, D′) ∧ ckvo(to) = ckvo(t′o)))}.

In the following, C̃(o) is the set of columns producedby operator o. For a tuple t produced by operator o, wedenote the value of columns C (where C ⊆ C̃(o)) asπC(t). We similarly denote projection for a set of tu-ples: πC(S) = {πC(t)|t ∈ S}. Finally, when C is a set ofcolumns belonging to (potentially) multiple operators, then(C|o) denotes the subset of C belonging to operator o; i.e.(C|o) = C ∩ C̃(o).

We now define injection for a single operator in terms ofthe contributing set of each tuple it produces:

Definition 10 (Injection for operators). Given an XQGMoperator (o) with a set of input operators (I), a set ofo’s columns (Co), and a subset of I’s columns (CI ): thecolumns Co are injective with respect to the columns CI

(denoted as CI �→ Co) iff:

∀t1, D1, t2, D2((t1 ∈ R(o, D1) ∧ t2 ∈ R(o, D2) ∧ πCo(t1) = πCo(t2))

→ (∀oi ∈ I(π(CI |oi)(ζ(t1, oi, o, D1)) =π(CI |oi)(ζ(t2, oi, o, D2))))).

In other words, if CI �→ Co, then there is a one-to-onemapping such that for each tuple t produced by that opera-tor, πCo(t) (the value of columns Co in tuple t) maps to aunique set of CI values produced by the input operator(s)I .

Definition 11 (Transitive injection). An operator o istransitively injective for Co with respect to a table T (de-noted T

∗�→ Co) iff one of the following holds:

• o = Table(T ) and Co is all of T ’s columns, or

• ∃CI((CI �→ Co)∧∀oi((oi ∈ I) → (T ∗�→ (CI |oi)))).

That is, an operator is transitively injective for a subsetof its output columns, Co, if and only if there is a one-to-one mapping such that for every tuple t produced by theoperator Table(T ), vCo(t) maps to a unique set of tuplesin Table(T ).

Finally, we say that a view with graph G is injective forC with respect to table T if its top operator, oG, is tran-sitively injective for C with respect to T . Although theseconditions may seem restrictive, most XML views of rela-tional data are injective with respect to each of their basetables. For example, the original catalog view (Fig. 5) isinjective with respect to both product and vendor.

In Section F.3, we will prove the following theorem,stating that for injective views, CreateANGraph will notproduce spurious updates if we remove the final selectioncondition in line 11; we refer to this modified version asCreateANOpt.

Theorem 3 1. Given an UPDATE event, a view graph Gwhich is injective for all columns of its top operator, andtable T , let Gaffected = CreateANGraph(UPDATE, G, T ),and Gopt = CreateANOpt(UPDATE, G, T ). Then for all

valid database transitions DT→ D′ with pruned transition

tables �T and �T , (OLD NODE, NEW NODE) ∈ R(oGopt , D′)

if and only if (OLD NODE, NEW NODE) ∈ R(oGaffected , D′).

F.2 Sufficient Conditions for Injection

For each operator o in a graph G, and a set of columnsCo, we can determine whether o is injective for Ci �→ Co,based on the type of operator:

• Project, Select, and Join. o is injective for CI �→Co if, for input operator(s) I , there exists CI such thatfor each ci ∈ CI , one of the following holds:

– ci ∈ Co, or

– If o is Project or Select: ∃c ∈ Co such that c isproduced by an injective function which takes c i

as a parameter. The most commonly-used suchfunction is the XML constructor function.

• GroupBy. o is injective for Ci �→ Co if, for its inputoperator i, there exists Ci such that for each ci ∈ Ci,one of the following holds:

– ci ∈ Co, or

– ∃c ∈ Co such that c = aggXMLFrag(ci).

It is easy to see that the view in Fig. 5 satisfies the aboveconditions. Note that the above conditions are sufficient butnot necessary for injection.

F.3 Correctness of Theorem 3

Given an injective view, can now prove a stronger versionof Lemma 1:

Lemma 3. Given a relational table T , a viewgraph G which is injective for columns C w.r.t.T , and a monotonic database transition D1

T→D2 with pruned nonempty transition table dT , letO′ = CreateAKGraph(oG, T, dT ). Then �x, y, z(x ∈ R(O′, D2) ∧ y ∈ R(oG, D1) ∧ ckvoG(y) = x ∧vC(y) = vC(z) ∧ z ∈ R(oG, D2) ∧ ckvoG(z) = x).

Proof. We prove Lemma 3 by induction on the depth of G.

Base case: depth = 1.

In this case, the view graph only consists of a singleoperator, Table(X), for some relational table X .

Suppose X �= T . Then CreateAKGraph returns ∅, sothere are no tuples x ∈ R(O′, D2); the lemma is vacuouslytrue.

Otherwise, X = T . Then CreateAKGraph(oG, T, dT )returns πT.key(Table(dT )). Suppose, by contradiction, thatthere exist x, y, z contradicting Lemma 3. By the definitionof transitive injection, C must be the set of all columns ofT ; therefore, since vC(y) = vC(z), we infer that y = z.The pruned transition table dT is either �T or �T . ByDefinition 8, if dT = �T , then z ∈ R(T, D2) implies thatz �∈ R(T, D1); otherwise dT = �T and y ∈ R(T, D1)implies that y �∈ R(T, D2). In both cases, a contradictionis reached because we had concluded that y = z.

Thus, the base case holds.

Induction Hypothesis: For a graph H of depth ≤ k, sup-pose Lemma 3 holds.

We will now show that Lemma 3 holds for a graph Gof depth k + 1. There are four cases, one for each type ofoperator except for Table (which can only occur at the leaflevel of the graph).

Case 1: oG is a GroupBy operator.

Suppose, by contradiction, that there exist tuples x, y, zcontradicting Lemma 3. We know that oG is injective forCI �→ C, where CI is some set of columns of the inputoperator, I . By the definition of injection, we know thatthe set of I-tuples used for computing vC(y) and vC(z)did not change in the transition: πCI (ζ(y, I, oG, D1)) =πCI (ζ(z, I, oG, D2)). Let Z = ζ(z, I, oG, D2) and XI ={ckvI(z)|z ∈ Z}. Finally, let Cg be the grouping columnsof oG (note that the set of grouping columns defines the keyof a GroupBy operator).

By the induction hypothesis, when CreateAKGraph isinvoked recursively (line 9), R(I ′, D2) will not contain anyxI ∈ XI . Therefore, the Join created in line 12 will notpropagate any z ∈ Z . Furthermore, there can’t be a tu-ple z′ �∈ ζ(z, I, oG, D2) such that πCg (z′) = x, becausex = πCg (z) and therefore πCg(z′) would have to be in thecontributing set Z (by Definition 9). As a result, since theJoin is not propagating any tuples t such that πCg (t) = x,the GroupBy created in line 14 will not produce x. Thisyields a contradiction, however, since our original assump-tion was that x ∈ R(O′, D2).

Case 2: oG is a Select or Project operator.

Suppose, by contradiction, that there exist tuples x, y, zcontradicting Lemma 3. We know that oG is injective forCI �→ C, where CI is some set of columns of the inputoperator, I . The canonical key for Select and Project isdefined to be the same as the key of its input, so there is ex-actly one pair of input tuples y ′, z′ such that y′ ∈ R(I, D1),z′ ∈ R(I, D2), ckvI(y′) = ckvoG(y), and ckvI(z′) =

ckvoG(z). By the definition of injection, we know thatvC(y) = vC(z) implies that vCI (y′) = vCI (z′).

Therefore, by the induction hypothesis, when Create-AKGraph is invoked recursively (line 16), R(I ′, D2) willnot contain x. However, this yields a contradiction, sinceO′ = I ′ (line 17), and our original assumption is thatx ∈ R(O′, D2).

Case 3: oG is a Join.

If CreateAKGraph returned ∅ (line 24), then there areno tuples x ∈ R(O′, D2), so the lemma is vacuously true.

If the recursive call to CreateAKGraph only returnednon-∅ for a single leg of the Join, then this call will simplyreturn I ′ (line 25), and therefore the proof is identical toCase 2.

Thus, we only need consider the case where O ′ isa union of cross-products (lines 27-30). Suppose, bycontradiction, that there exist tuples x, y, z contradictingLemma 3. We know that oG is injective for CI �→ C,where CI is a set of columns of the input operators I .Since the canonical key of Join is the concatenation ofthe keys of I , we infer that for each i ∈ I , there is ex-actly one triple of input tuples (x′

i, y′i, z

′i) such that y′

i ∈R(i, D1), z′

i ∈ R(i, D2), x′ = ckvi(y′) = ckvi(y), andx′ = ckvi(z′) = ckvi(z). By the definition of injection, weknow that vC(y) = vC(z) implies that vCi(y′

i) = vCi(z′i).By the induction hypothesis, when CreateAKGraph is

invoked recursively (line 21), R(i ′, D2) will not containx′

i. Therefore, the cross-product Ja = I ′0 × I1 (line 28)cannot contain x, since x is the concatenation of x ′

0 andx′

1, and we have just shown that x′0 �∈ R(I ′0, D2). By

the same argument, Jb (line 29) also cannot contain x, asx′

1 �∈ R(I ′1, D2). Therefore, by the semantics of the Unionoperator, the union of Ja and Jb (line 30) will not producex. Thus, we have reached a contradiction, since our origi-nal assumption was that x ∈ R(O′, D2).

Case 4: oG is a Union.

Suppose, by contradiction, that there exist tuples x, y, zcontradicting Lemma 3. Then by the definition of theUnion operator, ∃Ii ∈ I such that y ∈ R(Ii, D1), and∃Ij ∈ I such that z ∈ R(Ij , D2). Because oG is tran-sitively injective for C, it follows that Ii and Ij are bothtransitively injective for C.

If it were the case that z ∈ R(Ii, D2), this would con-tradict the induction hypothesis for Ii. Therefore, it mustbe the case that z �∈ R(Ii, D2). Because we had stipulatedthat oG is injective for cI �→ C, and y ∈ R(oG, D1) ∧z ∈ R(oG, D2), it follows from Definition 10 thatπC(ζ(y, Ii, oG, D1)) = πCζ((z, Ii, oG, D2)). BecauseckvoG(y) = ckvoG(z), therefore πCζ(y, Ii, oG, D1)) =πC(ζ(z, Ii, oG, D1)). However, this yields a contra-diction because z ∈ πC(ζ(z, Ii, oG, D1)) but z �∈πC(ζ(z, Ii, oG, D2)).

We have now proven that for a single transition, Create-

AKGraph will not produce the keys of any XML nodeswhose value did not actually change. However, this is notsufficient, as CreateAKGraph is invoked twice (once foreach partial transition). The following corollary is neces-sary to show that given a pair of partial transitions, Create-AKGraph will not produce the keys of any node thatchanged to an intermediate value due to D

T→ D′−�T , and

changed back to its original value due to D ′−�T

T→ D′.

Corollary 3.1. Given a relational table T , a view graph Gwhich is injective for columns C with respect to T , and adatabase transition D

T→ D′ with pruned transition tables�T and �T :

(a) Let O� = CreateAKGraph(oG, T, �T ). �x, y, z(x ∈ R(O�, D′) ∧ y ∈ R(oG, D) ∧ ckvoG(y) = x ∧vC(y) = vC(z) ∧ z ∈ R(oG, D′) ∧ ckvoG(z) = x);and

(b) Let O� = CreateAKGraph(oGold , Told, �T ).�x, y, z (x ∈ R(O�, D′) ∧ y ∈ R(oG, D) ∧ckvoG(y) = x ∧ vC(y) = vC(z) ∧ z ∈ R(oG, D′) ∧ckvoG(z) = x).

Proof. We prove only case (a), as (b) is analogous.Suppose that there exist x, y, z contradicting case (a)

above. We know from Lemma 3 that there must ∃w suchthat w ∈ R(oG, D′

−�T )∧ckvoG(w) = x∧vC(w) �= vC(y).Let the set S = {s ∈ R(T, D)|ckvoG(s) = x}, andsimilarly, S̄ = {s ∈ R(T, D′

−�T )|ckvoG(s) = x}. UsingLemma 1, it follows from vC(w) �= vC(y) that S �= S̄.Furthermore, from the defintion of injection, we infer fromvC(y) = vC(z) that S′ = {s ∈ R(T, D′)|ckvoG(s) =x} = S.

By the definition of pruned transition tables,�T ∩ �T = ∅, so by Definition 5 it follows fromS = S′ that S ∩ �T = ∅. However, this yields a contra-diction: we had previously concluded that S �= S̄, whichwould require s ∈ �T for some s ∈ S, and thereforeS ∩ �T �= ∅.

Finally, we can prove that CreateANOpt will not pro-duce spurious updates.

Theorem 3 2. Given an UPDATE event, a view graph Gwhich is injective for all columns of its top operator, andtable T , let Gaffected = CreateANGraph(UPDATE, G, T ),and Gopt = CreateANOpt(UPDATE, G, T ). Then for all

valid database transitions DT→ D′ with pruned transition

tables �T and �T , (OLD NODE, NEW NODE) ∈ R(oGopt , D′)

if and only if (OLD NODE, NEW NODE) ∈ R(oGaffected , D′).

Proof. It is easy to see that the “if” direction is true: fromthe definition of pruned transition tables, it follows thatpruned transition tables also satisfy the definition of transi-tion tables, so the correctness of Theorem 2 is not affectedby pruning the transition tables. The only difference then

is that CreateANOpt does not perform the final selection.Since Select can only decrease the cardinality of its in-put, there will be no affected nodes lost as a result of thechange. Thus, (OLD NODE, NEW NODE) ∈ R(oGaffected , D

′)implies that (OLD NODE, NEW NODE) ∈ R(oGopt , D

′).We prove the converse by contradiction. Sup-

pose ∃(OLD NODE, NEW NODE) ∈ R(oGopt , D′) such that

(OLD NODE, NEW NODE) �∈ R(oGaffected , D′). Since Gaffected =

Select(OLD NODE �=NEW NODE)(Gopt), it must be the case thatOLD NODE = NEW NODE. For brevity, we’ll refer to thisnode as n.

By line 10, it must be the case that n ∈ R(Oold) ∧ n ∈R(Onew). Therefore, by the definition of Onew and Oold

(lines 7 and 8), it follows that x ∈ R(Ou, D′) such thatckvoG(n) = x. From the definition of Ou (line 6), we seethat either (a) x ∈ R(O�key) or (b) x ∈ R(O�key).However, this yields a contradiction: in case (a) this vi-olates Corollary 3.1(a), and in case (b) it violates Corol-lary 3.1(b). Therefore, our initial assumption that n ∈R(oGopt , D

′) must have been false.

F.4 Additional Optimizations

In the previous section, we showed that for injective viewgraphs, we can avoid performing an expensive tagger-level comparison of OLD NODE and NEW NODE for UPDATE

events. If a view is not injective, then we generally needto keep this selection condition. In certain cases, however,we can still optimize the performance by pushing down theselection condition to the relational level.

If a view graph G is injective except for the presence of anon-injective aggregate functions (e.g. min, max, count,etc.) in GroupBy operators, such as in the example shownin Appendix E.1, then it is not necessary to compare the en-tire old and new XML nodes. Instead, it is only necessaryto compare the values of these aggregates. Since this is acomparison of numeric values with no nesting involved, itcan be pushed down to the relational engine, thus avoidingthe need to perform an expensive string comparison in thetagger.

This is just one of many possible optimizations to avoidperforming a tagger-level comparison for non-injectiveviews. A possible direction of future work is to identifythe general class of views where the top-level selection canbe pushed down to the relational level.

G Generated SQL TriggerThe SQL trigger generated for our running example of anUPDATE on vendor is shown in Fig. 23 (formatted to bemore human-readable). Since the view is injective, wedo not have to explicitly produce OLD NODE for compar-ing with NEW NODE (although we still need to compute theaggregate value on Told to ensure that OLD NODE appearedin the view before the update). The trigger first finds the

� �1CREATE TRIGGER sqlTrigger2AFTER UPDATE ON VENDOR3REFERENCING OLD_TABLE AS DELETED, NEW_TABLE AS INSERTED4FOR EACH STATEMENT56WITH PrunedIns(pid, vid, price) AS (7SELECT * FROM INSERTED EXCEPT SELECT * FROM DELETED8),9PrunedDel(pid, vid, price) AS (10SELECT * FROM DELETED

EXCEPT SELECT * FROM INSERTED11),1213AffectedKeys (name) AS (14SELECT P.name15FROM product AS P, PrunedIns AS V16WHERE P.pid = V.pid17UNION18SELECT P.name19FROM product AS P, PrunedDel AS V20WHERE P.pid = V.pid),2122ProductCount (name, numVendors) AS (23SELECT P.name, COUNT(*) AS numVendors24FROM AffectedKeys AS C, product AS P, vendor AS V25WHERE P.name = C.name26AND P.pid = V.pid27GROUP BY P.name),2829MultiVendorProduct (name) AS (30SELECT name31FROM ProductCount32WHERE numVendors >= 2),3334deltaCount (name, numVendors) AS (35SELECT P.name, 136FROM product AS P, PrunedDel AS D37WHERE P.pid = D.pid38UNION ALL39SELECT P.name, -140FROM product AS P, PrunedIns AS I41WHERE P.pid = I.pid),4243MultiVendorProduct_old (name) AS (44SELECT name45FROM (SELECT DISTINCT name, numVendors46FROM ProductCount PC, Constants1 C47WHERE PC.name = C.Const148UNION ALL49SELECT DISTINCT name, numVendors50FROM deltaCount DC, Constants1 C51WHERE DC.name = C.Const152) AS T(name, numVendors)53GROUP BY T.name54HAVING SUM(T.numVendors) >= 2),5556ProductInfo (pid, name) AS (57SELECT P.pid, P.name58FROM Product AS P, MultiVendorProduct AS MVP,59MultiVendorProduct_old AS MVP_old60WHERE MVP.name = MVP_old.name61AND MVP.name = P.name),6263outerUnion(type, pname, triggerIds, vid, price) AS (64-- Produce the <product> elements65SELECT 1, PI.name, NULL, NULL, NULL66FROM ProductInfo AS PI67UNION ALL68-- Produce the <vendor> elements69SELECT 2, PI.name, NULL, vid, price70FROM Vendor AS V, ProductInfo AS PI71WHERE V.pid = PI.pid72UNION ALL73-- Produce the <trigger> elements74SELECT 3, PI.name, C.TrigIDs, NULL, NULL75FROM Constants AS C, ProductInfo AS PI76WHERE C.value = PI.name),7778SELECT type, pname, triggerIds, vid, price79FROM outerUnion80ORDER BY type, pname, triggerIds, vid

� �

Figure 23: The generated SQL trigger.

0

50

100

150

200

1 10 100 1000 10000 100000

Ave

rage

tim

e pe

r up

date

(m

illis

econ

ds)

Number of triggers on Vendor table

UNGROUPED-EOPT-AOPTGROUPED

GROUPED-EOPTGROUPED-AOPT

GROUPED-EOPT-AOPT

Figure 24: Injective optimization: varying the number of triggers.

affected keys by taking a union of the product names asso-ciated with tuples in the pruned transition tables (lines 6-20). The trigger then computes the number of vendors foreach affected product after the update (lines 22-27), andselects only those with more than one vendor as potentialNEW NODEs (lines 29-32). Note that vendors are only com-puted for affected products by using regular query rewritetechniques to push down the join on affected keys [22, 27].The trigger then computes the number of vendors for eachaffected product before the update by using the correspond-ing values after the update and the pruned transition tables(lines 34-54); also note that the selection on the OLD NODEname is transformed into a join due to trigger grouping.Finally, the action parameters are produced using a sortedouter union (lines 63-80). Since multiple triggers can befired for the same update, the list of fired triggers is com-puted as the final leg of the sorted outer union (lines 74-76).

H Additional Experimental ResultsIn this section, we first show the results of evaluating theoptimizations for injective views when we varied # trig-gers, # fired triggers and hierarchy depth. We use EOPT todenote the approaches using this optimization.

We then show the experimental results of varying theparameter # updated nodes, # leaf tuples, and # leaf tu-ples/element for all approaches. In all experiments, we varythe parameter of interest and use default values for the rest.Note that varying these three parameters, unlike the twoparameters evaluated in Section 6, changes the number oftuples inserted into the temporary table by the SQL triggeraction. This introduces additional overhead not directly rel-evant to trigger processing. To isolate this effect, we firstcompute all the rows produced by the trigger, but only in-sert the maximum row into the temporary table (by usingthe max aggregate function), thus keeping the cost of therelational inserts constant.

H.1 Evaluating injective view optimizations

H.1.1 Varying # TriggersFigure 24 shows the performance of the different ap-proaches when we vary the number of XML triggers.

0

50

100

150

200

250

0 2000 4000 6000 8000 10000

Ave

rage

tim

e pe

r up

date

(m

illis

econ

ds)

Number of triggers fired per produced node

GROUPEDGROUPED-EOPTGROUPED-AOPT

GROUPED-EOPT-AOPT

Figure 25: Injective optimization: varying # of fired triggers.

0

50

100

150

200

250

300

350

5 4 3 2

Ave

rage

tim

e pe

r up

date

(m

illis

econ

ds)

Depth of relational hierarchy

GROUPEDGROUPED-EOPTGROUPED-AOPT

GROUPED-EOPT-AOPT

Figure 26: Injective optimization: varying hierarchy depth.

Note that for this experiment alone, we set # fired trig-gers/updated element to be 1 instead of its default valueof 100, since the default value is not applicable when #triggers is small. As shown, UNGROUPED-EOPT-AOPT

does not scale well with the number of triggers becauseit does not benefit from shared computation across trig-gers. Other UNGROUPED approaches (not shown) per-formed even worse because they do not take advantage ofthe different optimizations. In contrast, all the GROUPED

approaches scale gracefully due to the grouping optimiza-tions (note the log scale in the x-axis). This suggests thatwe can successfully employ existing grouping techniquesfor the triggers over XML views problem.

GROUPED-EOPT and GROUPED-AOPT provide a 25-35% improvement over GROUPED due to our optimiza-tions for injective views and computing aggregates. In-terestingly, these optimizations provide an additive benefit,and GROUPED-EOPT-AOPTrovides a 50% improvement inperformance over GROUPED.

H.1.2 Varying # fired triggers/updated elementFigure 25 shows the effect of varying the number of firedtriggers per updated XML element (for this and subsequentexperiments, we do not consider UNGROUPED approachesdue to their bad scalability properties). As shown, the

0

50

100

150

200

250

100 80 60 40 20

Ave

rage

tim

e pe

r up

date

(m

illis

econ

ds)

Number of produced nodes

GROUPEDGROUPED-AOPT

GROUPED-MVGROUPED-AOPT-MV

Figure 27: Varying # updated elements.

GROUPED approaches scale gracefully and linearly withthe number of fired triggers, such that the time to performa relational update is only between 150-200 millisecondseven when up to 10,000 XML triggers are activated foreach relational update. Again, as with varying # triggers,our different optimizations provide up to a 30% improve-ment in performance when compared to not applying theoptimizations.

H.1.3 Varying Hierarchy depthFigure 26 shows the effect of varying the hierarchy depth.We note that this depth refers to the depth of the relationalschema (i.e., depth of nested sets), and usually translates toa much deeper nesting in XML elements. For instance, theexample view could be written so that <price> is nestedunder <vid>; while this intuitively increases the depth ofthe XML view, it does not affect the trigger depth: the re-sulting SQL will still execute the same number of joins.Thus, to characterize performance, we use the hierarchydepth instead of the depth of the resulting XML elements.

As shown, all approaches have the same relative perfor-mance and their run time increases approximately linearlywith the hierarchy depth. This is because, as the depth in-creases, the relational trigger must evaluate more joins torecreate the hierarchy. Further, the size of the producedresult also increases because, although the number of leafnodes remains constant, the number of intermediate nodesgrows larger. The results indicate that we can get good per-formance for XML triggers even for deeply-nested XMLviews.

H.2 Varying # updated elementsFig. 27 and Fig. 28 shows the effect of varying the numberof XML elements in the view that are affected by a singlerelational update. As shown, all approaches have the samerelative performance – the running time grows slowly whenthe number of produced nodes increases.

However, the benefits of the injective view optimiza-tion (GROUPED-EOPT and GROUPED-EOPT-AOPT) are

0

100

200

300

400

500

100 80 60 40 20

Ave

rage

tim

e pe

r up

date

(m

illis

econ

ds)

Number of produced nodes

GROUPEDGROUPED-EOPTGROUPED-AOPT

GROUPED-EOPT-AOPT

Figure 28: Injective optimizations: varying # updated elements.

40

60

80

100

120

256 128 64 32 16

Ave

rage

tim

e pe

r up

date

(m

illis

econ

ds)

# leaf tuples/elements

GROUPED-EOPTGROUPED-EOPT-AOPT

GROUPED-EOPT-MVGROUPED-EOPT-AOPT-MV

Figure 29: Varying the fanout.

50

100

200

1M512K256K128K64K32K

Ave

rage

tim

e pe

r up

date

(m

illis

econ

ds)

# leaf nodes

GROUPED-EOPTGROUPED-EOPT-AOPT

GROUPED-EOPT-MVGROUPED-EOPT-AOPT-MV

Figure 30: Varying the data size.

striking because it avoids having to compute unneces-sary OLD NODEs, and this benefit increases as the num-ber of updated XML elements increases. Thus, in thiscase, GROUPED-EOPT-AOPT provides approximately a60% performance gain over GROUPED.

Now we show the performance results for optimized ap-proaches when we vary # leaf tuples per XML element and# leaf tuples.

H.3 Varying # leaf tuples per XML elementFig. 29 shows the effect of varying the fanout, the numberof leaf tuples per XML element. All optimized approacheshave the same relative performance, and there is only asmall increase in runtime as the fanout increases. This in-crease is due primarily to the fact that the OuterUnionintermediate result grows as OLD NODE and NEW NODE be-come larger.

H.4 Varying # leaf tuplesWe vary the data size by varying the number of leaf tuples;the result is shown in Fig. 30. All optimized approachesscale gracefully when the data size increases. This is be-cause, although the total number of leaf tuples increases,the number of leaf nodes in the affected XML element re-mains the same. This graph shows that our system indeedbenefits from not materializing the entire XML view, sothat we only need to compute a small fraction of leaf nodes.