order-sensitive xml query processing over relational sources

103
Order-sensitive XML Query Processing Over Relational Sources by Brian Murphy A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUTE In partial fulfillment of the requirements for the Degree of Master of Science in Computer Science by May 2003 APPROVED: Professor Elke Rundensteiner, Major Thesis Advisor Professor Lee Becker, Thesis Reader Professor Micha Hofri, Head of Department

Upload: others

Post on 03-Feb-2022

15 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Order-sensitive XML Query Processing Over Relational Sources

Order-sensitiveXML Query ProcessingOver Relational Sources

by

BrianMurphy

A Thesis

Submittedto theFaculty

of the

WORCESTERPOLYTECHNIC INSTITUTE

In partialfulfillment of therequirementsfor the

Degreeof Masterof Science

in

ComputerScience

by

May 2003

APPROVED:

ProfessorElkeRundensteiner, Major ThesisAdvisor

ProfessorLeeBecker, ThesisReader

ProfessorMichaHofri, Headof Department

Page 2: Order-sensitive XML Query Processing Over Relational Sources

Abstract

XML is an emerging standardformat for dataon the Web aswell asin businessappli-

cations. In order to storeandaccessthis information in an efficient manner, database

technologymustbeutilized. A relationaldatabasesystem,themostestablishedandma-

turetechnologyfor queryprocessingandstorage,createsa strongfoundationfor suchan

XML datamanagementsystem.However, while relationaldatabasesarebasedon SQL

queries,theoriginal userqueriesarewritten in XQuery, an XML querylanguage.This

XML querylanguagehassupportfor order-sensitivequeriesasXML is anorder-sensitive

markuplanguage.

A major problemhasbeendiscoveredwith loading XML in a relationaldatabase.

That problemis the lack of native SQL supportfor andmanagementof orderhandling.

While XQuery hasorderandpositionalsupport,SQL doesnot have the samesupport.

For example,individualswhowereviewing XML informationaboutmusicalbumswould

haveahardtimequeryingfor thefirst threesongsof atracklist from arelationalbackend.

MappingXML documentsto relationalbackendsalso proves hard as the datamodels

(hierarchicalelementsversusflat tables)aresodifferent.

For thesereasons,andother purposes,the Rainbow Systemis being developedat

WPI as a systemthat bridgesXML dataand relationaldata. This thesisin particular

dealswith the algebraoperatorsthat affect order, ordersensitive loadingandmapping

of XML documents,and the pushdown of orderhandlinginto SQL-capablequeryen-

gines. The contributionsof the thesisaretheorder-sensitive rewrite rules,new XML to

relationalmappingswith differentorderstyles,order-sensitivetemplate-drivenSQLgen-

eration,anda proposedmetadatatable for order-sensitive information. A systemthat

implementstheseproposedtechniqueswith XQuery as the XML query languageand

Page 3: Order-sensitive XML Query Processing Over Relational Sources

Oracleasthebackendrelationalstoragesystemhasbeendeveloped.Experimentswere

createdto measureexecutiontime basedon variousfactors.First, scalabilityof thesys-

temasbackenddatasetsizegrowsis studied.Second,scalabilityof thesystemasresults

returnedfrom thedatabasegrows,andfinally, queryexecutiontimeswith differentload-

ing typesareexplored.Theexperimentalresultsareencouraging.Queryexecutionwith

therelationalbackendprovesto bemuchfasterthannativeexecutionwithin theRainbow

system.Theseresultsconfirmthepracticalutility of ourproposedorder-sensitiveXQuery

executionsolutionover relationaldata.

2

Page 4: Order-sensitive XML Query Processing Over Relational Sources

Acknowledgements

First, I would like to thankmy advisorProf. Elke Rundensteiner. Shepushedand

proddedmethewholeway throughthis Thesis.Without her, noneof this wouldbehere.

Also to my reader, Prof. Lee Becker. I kept him in the dark throughoutmost of my

researchandwriting phases,but every time I saw him, hehadnothingbut positive things

to say.

Second,I would like to thankmy fellow Rainbow teammembersnew andold: Brad

Pielech,MukeshMulchandani,Steffen Christ, Katica Dimitrova, Luping Ding, Song

Wang,Ling Wang,andMagedEl Sayedwith specialthanksto Xin Zhang. Also many

thanksto my DSRGofficemates,again,old andnew: Li Chen,HongSu,Yali Zhou,and

AndreasKoeller.

Finally, I would like to thankmy friendsandfamily for all theirhelpandsupport.

i

Page 5: Order-sensitive XML Query Processing Over Relational Sources

Contents

1 Intr oduction 1

1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Stateof theQueryProcessingSystems. . . . . . . . . . . . . . . . . . . 2

1.3 ProblemDefinition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.4 My Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.5 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.6 ThesisOutline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2 RelatedWork 6

2.1 XML QueryProcessing. . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2 OrderBasedQueries . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.3 TemporalDataBases . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3 Background 9

3.1 XML Schema. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

3.2 XML Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.3 XQueryStatements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.4 DefaultXML View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

4 Rainbow 12

ii

Page 6: Order-sensitive XML Query Processing Over Relational Sources

4.1 SystemOverview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

4.2 Order-sensitiveRainbow Modifications . . . . . . . . . . . . . . . . . . 13

4.3 DataFlow of theOrder-SensitiveSystem. . . . . . . . . . . . . . . . . . 14

5 XML Algebra and Rewrite Rules 15

5.1 XAT DataModel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

5.2 XAT BindingTable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

5.3 XAT Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

5.3.1 Select . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

5.3.2 PositionFunction . . . . . . . . . . . . . . . . . . . . . . . . . . 17

5.3.3 GroupBy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

5.4 XAT Decorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

5.5 XAT Rewrites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

5.5.1 GroupBy/FunctionReplace . . . . . . . . . . . . . . . . . . . . 21

5.5.2 SingleStepOrderRewrite . . . . . . . . . . . . . . . . . . . . . 22

5.5.3 Multi StepOrderRewrite . . . . . . . . . . . . . . . . . . . . . . 22

6 XML DocumentLoading 24

6.1 RelationalStorage. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

6.2 OrderPreservation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

6.2.1 LocalOrder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

6.2.2 GlobalOrder . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

6.2.3 Full PathOrder . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

6.3 Guidelinesfor OrderingMethods. . . . . . . . . . . . . . . . . . . . . . 27

6.4 Loading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

6.4.1 Inline Loading . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

6.4.2 EdgeLoading . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

iii

Page 7: Order-sensitive XML Query Processing Over Relational Sources

6.5 GeneralLoading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

6.6 MetadataTable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

6.7 QueryComposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

6.8 View Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

7 Order-basedMethodology 35

7.1 TheOrder-SensitiveXQuery . . . . . . . . . . . . . . . . . . . . . . . . 35

7.2 PositionOperatorReplace . . . . . . . . . . . . . . . . . . . . . . . . . 36

7.3 MetadataTableQuery. . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

7.4 Main MemoryPositionExecution . . . . . . . . . . . . . . . . . . . . . 38

8 Order-basedTemplateHeuristics 39

8.1 Order-SensitiveSQL grammar . . . . . . . . . . . . . . . . . . . . . . . 40

8.2 TemplateCompletion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

8.3 GeneralTemplateDiscussion. . . . . . . . . . . . . . . . . . . . . . . . 44

9 Implementation 45

9.1 SQL Generation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

10 Experiments 51

10.1 SystemSetup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

10.2 ExperimentalEvaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 52

10.2.1 SingleStepQueryExperiments . . . . . . . . . . . . . . . . . . 52

10.2.2 Multi StepQueryExperiments. . . . . . . . . . . . . . . . . . . 55

10.3 Experimentswith VaryingSelectivity . . . . . . . . . . . . . . . . . . . 57

10.3.1 SingleStepQueryExperiments . . . . . . . . . . . . . . . . . . 57

10.3.2 Multi StepQueryExperiments. . . . . . . . . . . . . . . . . . . 58

10.4 ExperimentalSummary. . . . . . . . . . . . . . . . . . . . . . . . . . . 59

iv

Page 8: Order-sensitive XML Query Processing Over Relational Sources

11 Conclusions 61

11.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

11.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

11.3 FutureWork . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

A Loading Queries 67

A.1 SchemaQueries. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

A.1.1 EdgeLoadingSchemaQuery . . . . . . . . . . . . . . . . . . . 67

A.1.2 Inline LoadingSchemaQuery . . . . . . . . . . . . . . . . . . . 68

A.2 Step0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

A.2.1 LocalEdgeLoadingStep0 . . . . . . . . . . . . . . . . . . . . . 69

A.2.2 GlobalEdgeLoadingStep0 . . . . . . . . . . . . . . . . . . . . 71

A.2.3 LexicographicalEdgeLoadingStep0 . . . . . . . . . . . . . . . 73

A.2.4 Local Inline LoadingStep0 . . . . . . . . . . . . . . . . . . . . 75

A.2.5 GlobalInline LoadingStep0 . . . . . . . . . . . . . . . . . . . . 77

A.2.6 LexicographicalInline LoadingStep0 . . . . . . . . . . . . . . . 79

A.3 Step1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

A.3.1 EdgeLoadingStep1 . . . . . . . . . . . . . . . . . . . . . . . . 81

A.3.2 Inline LoadingStep1 . . . . . . . . . . . . . . . . . . . . . . . . 82

A.4 Step2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

A.4.1 EdgeLoadingStep2 . . . . . . . . . . . . . . . . . . . . . . . . 84

A.4.2 Inline LoadingStep2 . . . . . . . . . . . . . . . . . . . . . . . . 84

B View Queries 86

B.0.3 EdgeLoading . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

B.0.4 Inline Loading . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

v

Page 9: Order-sensitive XML Query Processing Over Relational Sources

C Recordlist XML Schema 87

D Default XML View 88

vi

Page 10: Order-sensitive XML Query Processing Over Relational Sources

List of Figures

3.1 RecordList Schemain theCondensedW3C Format[18] . . . . . . . . . 9

3.2 RecordList XML Document . . . . . . . . . . . . . . . . . . . . . . . . 10

3.3 OrderSensitiveXQueryQ1 . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.4 XML DocumentResultof XQueryQ1 . . . . . . . . . . . . . . . . . . . 10

3.5 OrderSensitiveXQueryQ2 . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.6 Resultof XQueryQ2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

4.1 TheRainbow SystemArchitecture . . . . . . . . . . . . . . . . . . . . . 13

5.1 OrderSensitiveXQueryQ1asXAT TreebeforeDecorrelation . . . . . . 18

5.2 OrderSensitiveXQueryQ2asXAT TreebeforeDecorrelation . . . . . . 19

5.3 Decorrelationexamplewith queryQ2 . . . . . . . . . . . . . . . . . . . 20

5.4 OrderSensitiveXQueryQ1asXAT TreeafterDecorrelation . . . . . . . 20

5.5 OrderSensitiveXQueryQ2asXAT TreeafterDecorrelation . . . . . . . 21

5.6 Q1 XAT SegmentRewriting . . . . . . . . . . . . . . . . . . . . . . . . 22

5.7 Q2 XAT SegmentRewriting . . . . . . . . . . . . . . . . . . . . . . . . 23

6.1 LocalOrderEncoding:SiblingOrderTraversal . . . . . . . . . . . . . . 25

6.2 GlobalOrderEncoding:Post-OrderTraversal . . . . . . . . . . . . . . . 26

6.3 Full PathOrderEncoding. . . . . . . . . . . . . . . . . . . . . . . . . . 27

6.4 Full PathOrderEncodingwith UpdateSn . . . . . . . . . . . . . . . . . 27

vii

Page 11: Order-sensitive XML Query Processing Over Relational Sources

6.5 RelationalInlining Strategy with LocalOrder . . . . . . . . . . . . . . . 30

6.6 RelationalEdgesStrategy with GlobalOrder . . . . . . . . . . . . . . . 30

6.7 View (Extraction)Queryfor Q1(Figure3.3)with Inline Strategy . . . . . 34

7.1 MetdataQueryfor Q1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

7.2 QueryQ1 asXAT Treewith Rewrite . . . . . . . . . . . . . . . . . . . . 37

7.3 QueryQ2 asXAT Treewith Rewrite . . . . . . . . . . . . . . . . . . . . 37

8.1 SQL Grammarfor Order-SensitiveQueries . . . . . . . . . . . . . . . . 40

8.2 TheTemplatefor our ExampleFilled . . . . . . . . . . . . . . . . . . . . 42

8.3 QueryQ1 from Figure7.2afterSQLGeneration. . . . . . . . . . . . . . 43

8.4 QueryQ2 from Figure7.3afterSQLGeneration. . . . . . . . . . . . . . 43

8.5 QueryQ1 overanInline Loadingwith AlphanumericOrder. . . . . . . . 44

8.6 QueryQ1 overanEdgeLoadingwith AlphanumericOrder . . . . . . . . 44

9.1 OrderSensitiveXQueryQ3 . . . . . . . . . . . . . . . . . . . . . . . . . 48

9.2 XML DocumentResultof XQueryQ3 . . . . . . . . . . . . . . . . . . . 48

9.3 Q3 IncorrectSQLStatementfor Inline GlobalLoading . . . . . . . . . . 49

9.4 Q3 CorrectSQL Statementfor Inline GlobalLoading . . . . . . . . . . . 49

viii

Page 12: Order-sensitive XML Query Processing Over Relational Sources

List of Tables

6.1 Order-BasedMetadataTablefor theInlinedRecordlistDocument . . . . 31

8.1 CondensedOrder-BasedMetadataTablefor theInlinedRecordlistDocu-

ment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

10.1 QueryQ1 Executedwith GlobalEdgeLoading . . . . . . . . . . . . . . 52

10.2 QueryQ1 Executedwith LocalEdgeLoading . . . . . . . . . . . . . . . 53

10.3 QueryQ1 Executedwith GlobalInline Loading . . . . . . . . . . . . . . 54

10.4 QueryQ1 Executedwith Local Inline Loading. . . . . . . . . . . . . . . 54

10.5 QueryQ2 Executedwith GlobalInline Loading . . . . . . . . . . . . . . 55

10.6 QueryQ2 Executedwith Local Inline Loading. . . . . . . . . . . . . . . 56

10.7 QueryQ1 with Selectivity Executedwith GlobalEdgeLoading. . . . . . 56

10.8 QueryQ1 with Selectivity Executedwith GlobalInline Loading . . . . . 57

10.9 QueryQ2 with Selectivity Executedwith GlobalEdgeLoading. . . . . . 58

10.10QueryQ2 with Selectivity Executedwith GlobalInline Loading . . . . . 58

ix

Page 13: Order-sensitive XML Query Processing Over Relational Sources

Chapter 1

Intr oduction

1.1 Moti vation

XML [1] hasbecomethe standardmarkuplanguagefor documentson the World Wide

Webandwithin thebusinesscommunity. In many instances,thesedocumentsarestored

in hard to queryfile systems.A bettersolutionwould be to storethemin a relational

database[11], which hasbeenproven throughtime to be highly optimizedfor queries,

secureandmature. Theseare qualitieswhich aredesiredby the businesscommunity.

However, utilizing bothtechnologiesin harmony canbea difficult task.Storingor load-

ing XML documentsinto relationaltablesis notalwaysa losslessoperation.Information

regardingorderstructure,hierarchicalandsibling orderspecificallymay be lost in the

relationaldatabase.This orderingof elementinformation,despitebeingimplicit, is crit-

ical to documentreconstructionandthusto correctqueryprocessing.Queriesbasedon

this implicit documentordercanalsobeimportantto theenduser. For example,suppose

theenduserswantsto know thesecondsongtitle from a tracklist. This querywould be

rathersimplein XQuery, but moredifficult to producein SQL.Hencea systemneedsto

beimplementedwhich caneffectively manageflexible order-sensitive documentloading

1

Page 14: Order-sensitive XML Query Processing Over Relational Sources

aswell asefficientorder-sensitivequeryprocessing.

1.2 Stateof the Query ProcessingSystems

Currentlysystemsexist to queryover XML documentsusingXQuery, the now almost

standardXML querylanguage.Many [13, 9] areconsiderednative systemsin that their

datais representedandstoredin someproprietaryXML specificrepresentation,suchas

DOM [16]. Thesesystemsareusefulasa proofof concepts,but in thelong run probably

won’t beefficient for largedocumentsthatcouldbegigabytesin size,with many queries

over them. A few of thesesystemshandlethe orderproblem,often by simply keeping

theXML dataaswell asintermediateresultsin physicalXML documentorder[13]. This

however is basedontheir inefficientstorage,whichmakestheirsystemsapoorchoicefor

the businesscommunity. For this reason,many systems[8, 21, 6] arebeginning to use

relationalbackends.OnceXML documentshave beenmappedto relationalschemas,a

techniquefor order-sensitive queryexecutionis thenrequired. The translationof XML

querylanguagesto SQLwhilepreservingorderremainsachallengingtask.Someprojects

do a direct translation– syntaxto syntax[2, 6], while otherprojectstranslatefrom syn-

tax to algebrarepresentationsto syntax[8, 10]. Regardlessof the translationstrategy,

theprocessis complicatedwhenit comesto order-sensitive translation.Currently, only

oneproject[15] attemptsto solve this issue,andit is doneby a direct syntaxto syntax

translation.While this methodmaysolve an immediateneed,if a new methodof order

storagewasproposed,this groupwould have to recodetheir syntaxto syntaxtranslator.

TheRainbow systemwasdesignedin partto solve theorderproblemandthis thesisnow

focuseson theseissues.

2

Page 15: Order-sensitive XML Query Processing Over Relational Sources

1.3 ProblemDefinition

Considerthefollowing queries.“Return thesecondsongtitle of every album”, “Return

thefirst 3 bandnamesof all bandsfrom Boston”,“Of all thesongsin a discography, re-

turn thesixth title.” Thesequeriesarealmostimpossibleto createwith a set-basedquery

language,suchasSQL, which hasno conceptof orderedsets. XQuery, however, uti-

lizesorderedsetsandcanexecutethesequeries.Providing a middlewarefrom relational

storageto XML representationthatwill allow us to createsuchqueriesin XQuery and

executethemin SQL is thegoal.

Beforethisgoalcanbereached,therearemany researchissueswhichmustbesolved.

First, the questionis if XML documentscanalwaysbe genericallydecomposedinto a

relationalstructure,thengenericallybecomposedbackto theoriginaldocumentwithout

losingany informationespeciallyany order-sensitiveinformation?Secondly, how canthis

order-sensitive informationbestoredwithin the relationaldatabase?Next, is it possible

to pushdown all order-sensitiveexecutionto therelationalbackend,or mustcertainparts

beexecutedby thenative system?If this order-sensitive executioncanbepusheddown,

how canit properlybeexecutedoveraorder-insensitiverelationalbackend?Finally, what

is thecostof thispushdown solution?

1.4 My Approach

In this thesisI amproposingto processandexecuteorder-sensitiveXQueryqueriesusing

the following steps. First XML documentsandschemasmustbe loadedandstoredin

the relationaldatabasesuchthat their order-sensitive information is kept. All required

implicit order-sensitive informationis madeexplicit. This explicit informationmustalso

becaptured.To do this,anorder-sensitivemetadatatableis createdthatcontainsthis in-

formationfor eachschemaandeachchosenloadingstrategy. Order-sensitiveuserqueries

3

Page 16: Order-sensitive XML Query Processing Over Relational Sources

mustthenbe mergedwith queriesthat containknowledgeof theseloadingsin orderto

composetheminto acompletequery.

Thismergedqueryis thentranslatedto thesystem-specificalgebratree,XAT (relevant

algebraoperatorsarediscussedin Chapter5). This queryis massagedusingequivalence

rulesin orderto optimizethetreebeforeSQLgenerationtakesplace.Algebratreeswhich

containorder-sensitive position informationarerewritten with extra metadatainforma-

tion. Rewrite rulesexist to createspecialorder-sensitive operatorsin their place. These

specialoperatorsalert SQL generationthat an order-specificquerywasissued. This is

fully describedin Section5.5.1.

Oncethe tree is sufficiently optimized,asdeterminedby the system,the operators

which canbe translatedto SQL aretranslated.As eachoperatoris translated,specific

portionsof an SQL statementarecreated.Whenthe newly createdorder-sensitive op-

eratorsarereached,specialSQL templatesareusedandfilled in with informationfrom

thealgebratreeaswell asinformationfrom themetadatatable.ThefinalizedSQL state-

ment(s)are thenexecutedover the relationaldatabase,with the resultsreturnedto our

system. Finally Rainbow executesall the remainingnon-SQLtranslatableoperatorsin

mainmemory. Thefinal resultis thenreturnedto theuser.

1.5 Contrib utions

Togetherwith myadditionto theRainbow system,Rainbow now will bethefirst systemto

processorder-basedquerieswith algebraoptimizations,notasyntaxto syntaxtranslation.

Eachcontribution is listedspecificallybelow:

1. Createddifferentloadingqueries,basedon3 differentorderencodings

2. Createda framework to allow generalmappingsandgeneralencodings

4

Page 17: Order-sensitive XML Query Processing Over Relational Sources

3. CreatedRewrite rulesfor order-sensitivecomputationpushdown

4. Designedthemetadatatablefor order-sensitive informationmanagement

5. Designedamethodfor generalorder-sensitiveSQL generationvia templates

6. Implementedthis work aspartof theRainbow system

7. Designedandconductedexperimentsto testtheorder-sensitivesystem

1.6 ThesisOutline

The next chapterdetailsthe relatedwork for the thesis. Chapter3 gives background

informationonXML andXQuery, while Chapter4 is ageneraloverview of theRainbow

System. Chapter5 discussesthe XML Algebra Tree of Rainbow as well as Operator

Rewrites. The following chapterthendealswith orderbasedloadingqueriesandview

queries. Chapter7 describesorder-basedheuristics,while Chapter8 describesorder-

basedtemplateheuristics.Chapter9 discussesour systemimplementation.Experimental

resultsareshown in Chapter10. Conclusionsandcontributionsaregivenin Chapter11.

Therestof thedocumentis dedicatedto Appendices.

5

Page 18: Order-sensitive XML Query Processing Over Relational Sources

Chapter 2

RelatedWork

2.1 XML Query Processing

TheKweelt[13] projectis anXML queryenginethatusesQuilt [3] asthequerylanguage.

Quilt, the predecessorto XQuery, is an ordersensitive language.It candetermine,for

example,what theglobally 3���

elementof theoverall documentis or what the2���

child

is relative to a specificelementin the document.Kweelt, however, doesnot supporta

relationaldatabaseasbackend,ratherit worksdirectlyonnativeXML documentsthrough

a file directorystructure.This is not a very sophisticatednor efficient documentstorage

method.Also, sinceorderis implicitly preserved throughouttheir system(it appearsby

physicalsequentialorderin main-memorydatastructures),little canbelearnedfrom their

projectin respectto relationalbackends.

Anotherprojectwith many of thesamebenefitsasKweelt is the Timber project[9]

from theUniversityof Michigan.This is a nativeXML queryengine,suchthattheXML

datais actuallystoredin a graphor treestructureon disk. Following W3C standards,

thedatais storedin theappropriatedocumentorder. This allows thequeryresultsto be

returnedcorrectlyevenwhenqueriesdonothaveorder-specificpredicates.Unfortunately,

6

Page 19: Order-sensitive XML Query Processing Over Relational Sources

this projectdoesnot rely on a RelationalBackendbut ratheranobjectstoragemanager,

Shore[14]. While theremaybebenefitsto usingShore,a low-level storagemedium,the

Rainbow systemwould have to be redesignedfrom scratch.We insteadaim to exploit

relationaltechnologies.

[22] is anXML queryengine.Thisenginecreatesanalgebra,whichpassesdatatables

betweenthenodeswithin thealgebra.Thecurrentsystem,in orderto handleorder, simply

createsa final columnthat is filled with integersthat containpositionalinformation. If

howevera queryonly requiresone2���

value,thesystemwouldpull toomany valuesout

of the relationalbackend,only to discarda majority of themwithin a few steps. This

causesresourcesto begreedilyconsumed,whichfurthercausesthesystemto slow down.

This is notanefficientquerysystem.

Many projects[2, 8, 24, 21, 6, 10] do userelationalbackends.They allow for the

translationof a particularXML Querylanguageinto SQL for thequeryingof XML doc-

uments.However, theseprojects,for thetime being,have notyet focusedon theimplicit

orderof thesedocuments.Thefuturework sectionof eachof thesepaperslists document

orderasa taskfor thefuture.

TheXPERANTO groupin [15] is theonly work we areawareof thusfar thatworks

with XML, SQL andthe implicit orderof the documents.In their paper, they describe

how they directlytranslateXQuerystatementsinto SQL”order-based”statements,syntax

to syntax.Part of theentireXPERANTO systemusesalgebratranslation,but thisportion

of their work doesnot. By not usingtheir algebra,XQGM, theSQL statementsappear

to be always”hardcoded.” If an individual createda new mappingstrategy, a Database

Administratorwouldhave to stepin andcreatesomepossiblynew strategy for thisoneto

onemappingfrom XQueryto SQL.Rainbow avoidsthisby beingmoregeneric.Rainbow

first translatestheXQueryinto analgebra,optimizesthisalgebraasmuchaspossible,and

thentranslatesamajority of thealgebrato SQL.

7

Page 20: Order-sensitive XML Query Processing Over Relational Sources

2.2 Order BasedQueries

Attemptingto changethenatureof SQL in orderto extendit towardssupportingordered

queriesis anotherpossibility. In SRQL [12], addingthe conceptof positionandorder

to relationaldatabasesis solvedby extendingtheSQL operatorsetandby creatinga hi-

erarchicalorderingstrategy. This hierarchicalorderingstrategy extendsthe conceptof

a relationalsetto includea setof groupingattributesaswell asa setof sequencingat-

tributes.Thisprojectdealswith orderingelementsbasedon thegroupingandsequencing

attributeswhich aredeterminedon thefly. Basedon this on demandsequencing,all tu-

plesarethenorderednumerically. This orderingtechnique,however, is not appropriate

for XML elementordering,especiallywhenSRQL determinesorderbasedon the nu-

mericrankof anelement’sdata(in amannersimilar to TemporalDatabases,asdiscussed

in thenext section),notbasedon locationwithin theoriginaldocument.

2.3 Temporal Data Bases

Anotherpossibility could be storing the XML in temporaldatabases.While temporal

databasesandtemporalquerylanguagesfocuson tupleswith timestampsrelatedto time

order, XML documentscontaindocumentorder. Thetwo orderconceptsarein two differ-

entdomains.Attemptingto forceXML documentsto usetime domainsis too awkward

of a concept[19], especiallywhenconsideringtheirrelevantdetails,suchastime granu-

larities (minutesversushours)or thedifferentpossiblycalendars.Thesedetailsbearno

relevanceon documentordernor on how a hierarchicaldocumentcould be orderedin

suchamanner. Also temporaldatabasesandtemporalquerylanguagesarenot supported

widely in relationaldatabasesor relationalquerylanguages.For thesereasonstemporal

databaseconceptsarenot idealfor XML order-sensitivequeryprocessing.

8

Page 21: Order-sensitive XML Query Processing Over Relational Sources

Chapter 3

Background

3.1 XML Schema

For therestof thedocument,theXML schemawhichrefersto MusicalRecords,asshown

in Figure3.1, will be usedin examples.This schemais useful for explaining alternate

mappingqueriesin Chapter6. Thefull W3CXML Schemais shown in AppendixC.

RECORDLIST�

SHORT PLAY�

BAND�string� ,

LABEL�string� ,

SONG�string� *

]*]

Figure3.1: RecordList Schemain theCondensedW3C Format[18]

Theschemain Figure3.1,calledtheRECORDLISTschema,hasarootRECORDLIST,

with zeroor moreSHORT PLAYs, eachwith oneBAND of typestring,oneLABEL of

typestringandtwo SONGsof typestring.

9

Page 22: Order-sensitive XML Query Processing Over Relational Sources

3.2 XML Data

<recordlist><short_play>

<band>Misfits</band><label>blank</label><song>Cough/Cool</song><song>She</song>

</short_play><short_play>

<band>Misfits</band><label>Plan 9</label><song>Bullet </song><song>We Are 138</song>

</short_play><short_play>

<band>Project X </band><label>Schism</label><song>SXE Revenge</song><song>Shutdown </song>

</short_play></recordlist>

Figure3.2: RecordList XML Document

�RESULT

FOR$recordin document(”r.xml”)/SHORT PLAYRETURN�

SONG $record/SONG[2]/text()�

/SONG�/RESULT

Figure3.3: OrderSensitiveXQueryQ1

�RESULT �

SONG She�

/SONG�SONG We Are 138

�/SONG�

SONG Shutdown�

/SONG�/RESULT

Figure 3.4: XML Document Result ofXQueryQ1

TheXML documentin Figure3.2conformsto theRECORDLISTschemain Figure

3.1.

3.3 XQuery Statements

The XQuery [17] statementQ1 in Figure3.3 that canbe interpretedas, ”Return every

secondSONGof eachSHORT PLAY in the XML document,r.xml”, canbe executed

over the RECORDLISTschemato return somesimple orderedinformation. The re-

sulting XML documentis shown in Figure 3.4. SinceRECORDLISThasonly three

SHORT PLAYs with SONGsin the2���

position,thesethreeSONGsarereturned.This

queryis anexampleof a querybelongingto thesinglestepclass, whereeachorderpred-

icateonly refersto asinglestep.

TheuserXQuerystatementQ2 in Figure3.5 is similar to Q1exceptit hasasmallbut

importantvariation;Q2 doesnot grouplocally. Instead,Q2 “groups” over the root ele-

10

Page 23: Order-sensitive XML Query Processing Over Relational Sources

mentasdenotedby theparenthesisalongtheXPath( ��� ������� ������������! "�#%$'&(�)�*��+-,�. ).Giventhat �� ������� �� is theroot, this requiresorderwith respectto theroot. QueryQ2 can

beinterpretedas”ReturnthefourthSONGof ALL theSHORT PLAYs in theXML doc-

ument,r.xml.” The resultingXML documentis shown in Figure3.6. This query is an

exampleof aquerybelongingto themulti-stepclass, whereeachorderpredicaterefersto

multiple steps.

�RESULT

FOR$recordin document(”r.xml”)RETURN�

SONG ($record/SHORT PLAY/SONG)[4]/text()�

/SONG�/RESULT

Figure3.5: OrderSensitive XQueryQ2

�RESULT �

SONG We Are 138�

/SONG�/RESULT

Figure3.6: Resultof XQueryQ2

3.4 Default XML View

ThedefaultXML view (DXV) is thedirectinstantiationof therelationalbackendschema

anddatain XML format.A DXV is createdby restructuringtheoriginalXML document

sothatit canbemappedin aoneto onemethodto therelationalbackend.This is accom-

plishedthroughvariousmethodsof mapping,asdiscussedin Chapter6. The example

shown in AppendixD is createdby theInline methodwith globalordering.

11

Page 24: Order-sensitive XML Query Processing Over Relational Sources

Chapter 4

Rainbow

4.1 SystemOverview

TheRainbow systemis acompleteXML datamanagementsystem.It canloadXML doc-

umentsinto relationaldatabases.XML Queriesareexecutedover therelationaldatabase

aftera translationto theXML AlgebraTree(XAT) andthento SQL.Beforesuchquery

processingcanbeenabled,wemusttranslatetheXML schemainto therelationalschema,

andsecondlyloadtheXML datainto therelationalschema.Moredetailson this loading

will bediscussedbelow. Detailedaccountsof eachparticularloadingcanbefoundin [4].

The Rainbow systemarchitectureis shown in Figure4.1. The systemis composed

of threemajor modulesanda relationalbackend.The first beingtheLoadingManager.

This part of the systemis responsiblefor managingthe flexible loading of the XML

documentswith schemasto therelationalbackend.Thesecondmajorcomponentis the

ExtractionManager. This componentis essentiallythe reverseof theLoadingManager.

It extractstheloadedinformationfrom therelationaldatabase.Thefinal componentis the

XML QueryEngine.This componentis responsiblefor creatingsystemspecificalgebra

treesfrom the userXQuery statements,optimizing thesetrees,andexecutingthem in

12

Page 25: Order-sensitive XML Query Processing Over Relational Sources

conjunctionwith the relationalbackend.For eachof thecomponents,orderplaysa key

role. Data is loadedwith order knowledgeandextractedwith order knowledge. The

algebratreesarealsooptimizedwith order-specificconsiderations.

Load

ing

Man

ager

Ext

ract

ion

Man

ager

XML Repository

Optimizer

SQL

Relational Engine

QueryResult

XML Publishing

XML Query XML

User

XM

LQ

uery

Eng

ine

SubSystem

Data

Process

Legend

XML Algebra Tree

XQueryEngine

XML Schema XML Data

DefaultXML View

XMLSource

Wrapper

Default XML Schema

Schema generation Data Loading

XQueryEngine

DefaultXML View

XMLSource

Wrapper

Default XML Schema

Data ExtractionSchema generation

XML View

User

Figure4.1: TheRainbow SystemArchitecture

4.2 Order-sensitiveRainbow Modifications

For this thesiswork, I had to updateseveral moduleswithin the optimizer and select

portionsof theoverall Rainbow framework. First, view querieshadto be segmentedto

becomeorder-sensitive, asshown in Chapter6.8. Thentheoptimizerrequiredposition-

sensitive rewrite rulesto becreated,asshown in Section5.5. The SQL module,aspart

of theoptimizer, requiredorder-sensitiveSQL generationcapabilitiesaswell. To accom-

plish this, metadatainformation thenneededto be storedwithin the RelationalEngine

(Section6.6).

13

Page 26: Order-sensitive XML Query Processing Over Relational Sources

4.3 Data Flow of the Order-SensitiveSystem

Thefirst stepin thenow upgradedRainbow systemis to loadtheXML documentwith its

schemainto thesystem.Thisis doneby choosingaparticularorderencodingmethod(the

explicit order-sensitive capturein a numericor alpha-numericformat)anda datamodel

mappingmethod(the methodin which theXML documentwill beshreddedandstored

in thedatabase).Both of thesemethodsarewritten in XQuery. Whenexecutedover the

original documentandschema,a DefaultXML View is created.This view canthenbe

directly loadedone-to-oneinto therelationalengine.After thedocumenthasbeenloaded

into theRDB with explicit ordercapture,a metadatatablemustbecreatedthatcaptures

theexplicit orderinformationof the loading. With this loading,a particularview query

mustbecreatedto extract the informationfrom theDXV for thecreationof theoriginal

document,againwith orderpreservation. At thispoint,anorder-sensitiveuserquerycan

beissuedagainsttheview. Thequeryenginewould mergetheview querywith theuser

query. This mergedqueryis translatedinto analgebratreethatcanberewritten. Special

rewrite rulescanthenbeappliedto theorder-sensitiveportionsof thealgebra.Whenthe

treeis optimized,it will betranslatedto SQL statements.Order-sensitive portionsof the

algebraaretranslatedvia specialorder-sensitiveSQLTemplates.Thisnewly createdSQL

statementis executedover therelationalengine,which returnstheresultingtuplesto the

Rainbow system.Rainbow thenexecutesany remainingqueryoperatorsthatcouldn’t be

translatedto SQL.Thisfinal order-sensitiveXML resultwill bereturnedto theuser.

14

Page 27: Order-sensitive XML Query Processing Over Relational Sources

Chapter 5

XML Algebra and Rewrite Rules

Thealgebrafor theRainbow systemis calledXML AlgebraTree,or XAT [26]. An XAT

is madeupof algebranodes,thatarein partbasedonXPERANTO [2] aswell asNiagara

[8]. Thesealgebranodesoperateover thedatamodelof thesystem,theXAT Tables.For

a full descriptionof the algebranodes,pleaserefer to [26]. In this document,we only

discussa subsetof algebranodes,namelythosethat are most relevant for order-based

queries.

5.1 XAT Data Model

The XAT datamodel consistsof an order-sensitive table composedof order-sensitive

columnsandtupleswhereorderingamongthetuplesis essential.This tableis calledthe

XAT table. It is basedon the W3C’s XQuery 1.0 Formal Semantics,whereevery cell

valueof a tuplein agivencolumncanconsistof:

/ anatomicvalueor

/ a node:anXML element,XML documentor anXML attributeor

/ a collection:anorderedbagof itemsthatcontainsatomicvaluesor nodes.

15

Page 28: Order-sensitive XML Query Processing Over Relational Sources

Eachcolumnhasa name. This nameis associatedwith a binding, which is either

createdexplicitly by theuserqueryor is a temporaryvaluecreatedby thesystem.

5.2 XAT Binding Table

The XAT Binding hashtablecontainsall one-to-onemappingsfrom variablesto XML

elements,whetherthey are usercreated,suchas $record,or systemcreated,suchas

$col3. The bindingsareusedaskeys of the hash,whereeachkey’s value is the XML

element’s path or an alias to anotherkey. For example,in QueryQ1 $recordmapsto

“/RECORDLIST/SHORT PLAY” and$col2to “/RECORDLIST/SHORT PLAY/SONG”.

This hashtableis mostfrequentlyusedfor comparisonsof variableswithin the rewrite

rule componentandwithin theSQL generationcomponent.

5.3 XAT Operators

Thefollowing sectionbriefly describestheoperatorsthatwill beusedfor order-sensitive

computationpushdown. A completedescriptionof eachoperatorcanbefoundin [26].

5.3.1 Select

The Selectoperatorhasthesamefunctionality astheSQL selectionoperatorandis de-

notedby the 021354). symbolset,where � is antheexpressionand 4 is anXAT input table.

The Selectoperatorwill filter all tuplesfrom 4 basedon the conditionof the selectex-

pression,� .

16

Page 29: Order-sensitive XML Query Processing Over Relational Sources

5.3.2 Position Function

ThePositionoperatoris afunctionoperatorof typeposition.Thefunctionoperatoris also

usedfor aggregateslike SUM, AVG, COUNT andotherfunctionsthatarederivedfrom

SQLor arestrictly XML specific.ThepositionfunctionhasnotrueSQLequivalent.This

operatorcreatesa new columnin the XAT datatable that containsincrementalinteger

valuesthatencodetherelativeorderamongthetuples.

5.3.3 GroupBy

TheGroupByoperatoris verysimilar to its SQLequivalentandis denotedby the

6 18759;:=<?>=> ��@ A4CB34EDF. symbol set, where ���EGIH;J)KLK=MON are the columnswithin the input table 4 to

groupover, and 4ED is thesubquerythatwill beexecutedon eachgroup.For all examples

in thispaper, thesubqueryfor aGroupBynodewill beasimpleaggregationoperator, and

in mostinstances,it will specificallybethePositionoperator.

Whenthesubqueryof a GroupBynodeis a Positionfunction, thePositionfunction

will createnumericorder valuesfor eachgroup. For the Query Q1 example shown

in Figure 5.1, thereare threeSHORT PLAY groups,with two SONGseach,the Po-

sition function in conjunctionwith the GroupBynodewill createthe following values

IJ�B3P2B�J�B3P�B�J�B3P�. . For the Query Q2 exampleasshown in Figure5.2, the groupingwill

occurover therootelementcreatingonly onegroup,andthePositionfunctionwill corre-

spondinglycreatethefollowing valuesQJ)B3P�B3R�BISTB3U�BWVX. .Whenthesubqueryof a GroupBynodeis anAggregateoperator, theaggregateoper-

atorturnsall groupingtuplesinto onetuple.

17

Page 30: Order-sensitive XML Query Processing Over Relational Sources

5.4 XAT Decorrelation

WhenanXAT (XML AlgebraTree)is first constructed,it cancontainzeroor moreFOR

nodeswith eachFORnodehaving two subqueries.A FORnodeis effectively aniterator

modelingthe semanticsof the ”FOR” statementin an XQuery FLWR expression.The

purposeof this nodeis “for eachitem returnedfrom the inner branch,executetheouter

branch.” In orderto makethesystemmoreefficient, we needto pushthis computation

down suchthattheinnerbranchis only executedonce.

Figures5.1and5.2show theXATsbeforeto decorrelation.

Figure5.1: OrderSensitiveXQueryQ1 asXAT TreebeforeDecorrelation

For the purposeof efficiency, upondecorrelation,the FOR nodeis removed andin

mostcasesis replacedwith a seriesof GroupBynodesandaggregatenodes. In some

circumstancesthe FOR node is replacedwith a Left Outer Join nodeand a seriesof

brancheswith CartesianProductnodes.For both cases,insteadof executingeachFOR

18

Page 31: Order-sensitive XML Query Processing Over Relational Sources

Figure5.2: OrderSensitiveXQueryQ2 asXAT TreebeforeDecorrelation

branchpossiblymultiple times,thebranchesaremergedandareexecutedonce,andthe

resultsareaggregatedor joined.Thenonly thedesiredcolumnsareprojectedout.

Whenfunction nodesaredecorrelated,they becomea functionof a GroupBynode,

and are groupedon the appropriateprevious FOR binding(s). An example for Q2 is

shown in Figure5.3. An exampleis not shown for Q1, asthat treeis alreadygrouped

over $record.Upondecorrelation,QueryQ1 would attemptto createanotherGroupBy

operatoroverthealreadypresentGroupByoperator. SincebothGroupByoperatorswould

begroupingover$record,theoutermostGroupByis redundantandnotnecessary. In this

example,theoutermostGroupBywasnotgeneratedfor this reason.

A completedescriptionof decorrelationcanbe found in [20]. Figures5.4 and5.5

show theXATsafterdecorrelation.

19

Page 32: Order-sensitive XML Query Processing Over Relational Sources

Figure5.3: Decorrelationexamplewith queryQ2

Figure5.4: OrderSensitiveXQueryQ1 asXAT TreeafterDecorrelation

20

Page 33: Order-sensitive XML Query Processing Over Relational Sources

Figure5.5: OrderSensitiveXQueryQ2 asXAT TreeafterDecorrelation

5.5 XAT Rewrites

As with all algebras,thereare many equivalencerules that cansimplify or vice versa

makea treemorecomplex. Order-sensitive pushdown relieson a few key rules. These

rulesarebriefly explainedin thefollowingsections.A full reportcanbefoundin [23, 25].

5.5.1 GroupBy/Function Replace

The most important rewrite rule for order-sensitive queriesis the GroupBy/ Function

replacementstrategy whichis oneof thecontributionsof thiswork. Thisruleonly applies

to Positionfunctionnodesor GroupBynodeswith a Position()nodesubquery. It is only

executedduringtheSQLgenerationstep.Thatis, if SQL isn’t generatedthen,thereis no

needfor suchareplacementasshown laterin Chapter7.4.

Therearetwocasesfor replacement.Bothrely onwherethePosition()nodeis located

21

Page 34: Order-sensitive XML Query Processing Over Relational Sources

in thetreeasillustratedbelow.

5.5.2 SingleStepOrder Rewrite

Thefirst caseis thatof aSingleStepOrderreplacement.Duringtheinitial treegeneration,

aPosition()Functionis creatednestedwithin aGroupBynode,asis naturalfor thealgebra

generationof asinglestepquery.

Thiscanbeseenin QueryQ1by thefollowing containedXPath: �� ������� ����)�*��+-,YHZPEN .At this point, �� ������� �� representsSHORT PLAY, which becomesthe context for the

SONGsto begroupedover.

In thiscase,theimmediatechild of theGroupBynodeis anavigatenode,demonstrat-

ing a needfor thesinglesteporderreplacement.This will alwaysbethecasefor Single

Steporderqueries.In this case,theGroupBynode,includingtheGroupBysubquery, is

replacedwith theSingleStepOrdernode.This rewrite is shown in Figure5.6.

Figure5.6: Q1XAT SegmentRewriting

5.5.3 Multi StepOrder Rewrite

The secondcaseis for Multi StepOrder replacement. In the tree generationstep, a

GroupBy G1 is createdwith an Aggregatenodesubquery. A partial tree is shown in

Figure5.7.ThisGroupByG1groupstheentiredocumentbasedon theXPATH of Query

Q2: ��� ������� ����)�*�-���! "�#[$\&]���*�^+�,�. . The parentof this GroupByis a Position()

Functionnode,asshown by thenext XPATH of QUERY Q2:

��� ������� ����)�*�-���! "�#[$\&]���*�^+�,�.�HZPEN .

22

Page 35: Order-sensitive XML Query Processing Over Relational Sources

Figure5.7: Q2XAT SegmentRewriting

This is onestrategy in which a Multi Stepquerytreecanbegenerated.Thenduring

Decorrelation,the Position()Functionwill be nestedinto a GroupBy, G2. This creates

two GroupBynodesin parent–childlocations.TheGroupByG2 will be replaced.This

rewrite is demonstratedin Figure5.7.

TheotherinstancewhereaMulti StepOrderreplacementis necessaryis whenthereis

only a Position()Function(not nestedwithin a GroupBynode)evenafterdecorrelation.

In this instance,it is clearthatnohierarchicalgroupingis necessaryasis naturalfor Multi

Stepqueries.HencetheMulti StepOrderrewrite is requiredfor this tree.

In both Multi Step Order rewrite cases,the GroupBy or Position() node and the

GroupBysubquery, if thenodeis a GroupBy, is replacedwith theMultiStepOrdernode.

Parametersto thesenew functionsaretakenfrom thenavigatechild (in thefirst case

of Multi StepOrder, G1’s child) aswell asinformationgainedfrom theMetadatatable,

aswill bediscussedin Chapter6.6.

23

Page 36: Order-sensitive XML Query Processing Over Relational Sources

Chapter 6

XML DocumentLoading

6.1 Relational Storage

Therearemany strategiesfor mappingtheXML documentin Figure3.2to relations.The

resultof two examplemethodsgivenin Figures6.5and6.6areshown in Section6.4.

6.2 Order Preservation

Thenext threesectionsdescribea samplingof orderingmethods.Many moreareavail-

able,but the threeorderingmethodschosenshow the full rangeof orderencodings.A

guidelineon generalorderingsfollows theorderencodings.In [15], thefollowing order

methodsarediscussedin full detail: local, global,andDewey order. The full pathorder

describedin Section6.2.3 is similar to the Dewey orderbut improved upon,asshown

below.

[15] proved someinterestingfacts aboutorder types. Throughexperiments,they

showed that local order is the bestmethodfor storagein situationswith heavy updates

becausetheneedto changemany ordervalueswill besmaller, especiallyin instancesof

24

Page 37: Order-sensitive XML Query Processing Over Relational Sources

sparseordering.Sparseorderingwould leavegapsin betweeneachordervaluewhile still

maintaininganascendingorder(i.e. 1, 4, 7, ...). Only theordervaluefor the immediate

siblingswill needto bechanged.

Other experimentsin [15] showed that global order is the bestmethodfor storage

whena heavy queryloadandlight updateload is expected.It is betterthanlocal order

in that it doesn’t have to groupsiblingsbeforeordering(whereasfor the local ordering

method,eachsetof siblingsmustbeorderedthroughtheir parentspositions.This would

recursively continueall the way to the root). Hencethis reducesthe complexity of the

SQL querycreated.If a mediumlevel of updatesis expected,sparseorderingwould be

mosteffective.

6.2.1 Local Order

Local order is kept throughsibling order. The root will have a value of 1, and its n

childrenwill haveordervaluesfrom 1 to n respectively andsoon. Theordernumbersare

not uniqueglobally but ratheronly uniquelocally with respectto siblingsregardlessof

elementnames.An exampleis shown in Figure6.1.

_a`bdc edf g;hi j k lm n o p q r s t

uv w x

yz {}| ~� ����� ���Figure6.1: LocalOrderEncoding:SiblingOrderTraversal

6.2.2 Global Order

Globalorderrefersto globally uniquepositionvalues.They arederivedby a postorder

traversalstrategy with a valueof ”1” for theroot elementandall theothernodesordered

consecutively basedon theordervisited. The root’s left child will have the next value,

25

Page 38: Order-sensitive XML Query Processing Over Relational Sources

andits left child will have thenext value,while thebottommost,right mostelementwill

have themaximumvalue.An exampleis shown in Figure6.2.

�a��d� �d� �;�� � � �� � � � � � � �

�� � � �

¡¢ £¥¤ ¦¨§© ª « ¬®­¨¯°± ²¨³µ´L´Figure6.2: GlobalOrderEncoding:Post-OrderTraversal

6.2.3 Full Path Order

A full pathorderis keptby combiningthe local sibling orderat eachlevel of the hier-

archyandconcatenatingtheseinto onepathkey. This combinestheconceptsof sibling

orderaswell asuniqueglobal order in an effort to maximizequeryefficiency for both

heavy updateworkloadsandheavy queryworkloads.An exampleis shown in Figure6.3.

This couldbesimpleintegersseparatedby dots,which is how it is representedby [15].

Therearepitfalls with the integer-dot method,including creatingspecialOracle”order

by” operators.Insteadof simplycomparingintegers,thedottedintegersstringswouldbe

comparedagainsteachotherin ASCII order. For this reason,ASCII orderwouldn’t be

correctas1.2wouldcomeafter1.10.

Anothermethodwould be to createlargemostly zerofilled integer strings. Instead,

a lexicographicalmethodasshown in [5] is utilized. This methodusessparseordering

regardlessof theworkloadandexploitsASCII alphanumericcomparisons.By doingthis,

it avoidshaving to createnew ”order by” classesin SQL. Lexicographicalorderingalso

providesa strategy for non-cascadingupdatesdueto its siblingandsparse”numbering.”

Forexample,if anew SONGis addedinto thesecondpositionof thefirstSHORT PLAY

in Figure6.3,thenthenew treewouldresembleFigure6.4,whereSnis thenewly inserted

SONG.

26

Page 39: Order-sensitive XML Query Processing Over Relational Sources

¶¸·¹;º »½¼ ¾;¿

À Á Â Ã

ÄÅ�Æ Å Ç�È É Ê�Ë Ì

Í�Î Í�Î Í Í?Î Í�Î ÏÐÍ�Î Í�Î ÑÒ�Ó Ò�Ó Ô Õ Ö × ØÙ�Ú Û Ú Ù Ù�Ú Û Ú ÜÐÙ�Ú Û Ú ÛÝ�Þ ß Þ àá

Figure6.3: Full PathOrderEncoding

â�ãäWå

æ ç è éWê

ë

ìTí ì

î2ï îTï î îTï îTï ð}îTï îTï ñòîTï îTï óôTõ ôTõ ö÷ø ù

Figure6.4: Full PathOrderEncodingwith UpdateSn

6.3 Guidelinesfor Ordering Methods

In orderfor theRainbow systemto properlyexecuteorderbasedqueries,theordervalue

loadingmustadhereto thefollowing guidelines.First, all orderencodingsmustbe loss-

less,suchthatall elementswhich requireordervaluesaregivenordervalues.Elements

which requireordervaluesarea)elementswith any numberof childrengreaterthanzero

andb) elementswith schemamodifiers,like * or +. Secondlyall ordervaluesmustbe

kept in ascendingorder, suchthat thenaturalascendingdocumentorderis captured.Fi-

nally, thenumberingmustbeeithernumericor alphanumeric,suchthatthey caneitherbe

completelycomparednumericallyor in ASCII order.

6.4 Loading

Theprocessof loadingXML dataintoarelationaldatabasemustpreserveall order-related

aspectsof thedocument.Thereareafew factorsthatmustbekeptin mind. First,theorder

encodingbetweenthedirectchildrenof a parentnodemustbekept.

Secondly, all sametypeelementsmustbestoredin theproperorder. In theRECORDLIST

27

Page 40: Order-sensitive XML Query Processing Over Relational Sources

document,theSONGsmustbestoredsothat they arein documentorder. Documentor-

der refersto how the SONGsareimplicitly orderedwithin the document.For the first

SHORT PLAY, ”Cough/Cool”mustcomebefore”She”, wherein thecaseof localorder,

”Cough/Cool” would get an ordervalueof ”1,” and”She” would get an ordervalueof

”2.”

Finally, all siblings,similar or not, mustalsobe storedin the properorder. For ex-

ample,in theRECORDLISTdocument,the labelelementmustcomebeforeany SONG

element.

Thereare many ways to load a document. First, two ways that were usedin the

experimentalsectionaredescribedbelow. Secondly, wegiveadescriptionof how general

loadingsarehandled.

6.4.1 Inline Loading

Inline loadingstrategy [7] is usedto keepthenumberof tablesin therelationalbackend

to a minimum. For this loading,mostchildren(onesthathave no subelements,or fewer

than4 subelementswith nosubelementsof their own) of oneelementareloadedinto the

sametable.An exampleof this inline loadingis shown in Figure6.5.

The table in Figure6.5 caneven be loadedmorecompactly. If it is known, by the

schema,thattherearealways2 or lessSONGs,thesevaluescanbeinlined into themain

table. TheSONGelementscanthenberelabeledSONG1 andSONG2 sothatorderis

preserved. They arelabeledin sucha way asto preserve the original documentorder;

mostimportantlyto indicatewhichSONGcamefirst andwhichSONGcamesecond.

If, for instance,eachSONGin thedocumenthadsubchildrenof typelyrics by, where

many individualshadhelpedto contributeto the lyrics, theSONGelementcouldnot be

inlined into the basetable. This is becausethe amountof columnsnecessarywould be

unknown, andcouldin factvary for eachSONG.

28

Page 41: Order-sensitive XML Query Processing Over Relational Sources

6.4.2 EdgeLoading

The edgeloading strategy [7] requiresthe fewest tables: one. This table will have 4

columnsonly. Thefirst columnis theSOURCEcolumn. This columnis equivalentto a

ParentID column.Thesecondcolumnis thePOSITIONcolumn.This columncontains

theexplicitly createdpositioninformation.Thethird columnis theNAME column.This

columnstoresall XML tagelementvalueor attributetypevalues.For ourloadingwith the

RECORDLISTdocument,this columnwould contain: (RECORDLIST, SHORT PLAY,

BAND, etc).Thefinal columnis theTARGETcolumn.Thiscolumnservestwopurposes.

First, it containsthedatavalueof any tagelementor attributedatavalue. For instance,

in our loadingit would contain: (Misfits, Blank, She,etc). The secondpurposeof this

columnis to containa referencebetweenelementsandtheir children. For our example,

thefirst elementRECORDLISTwould containthevalue”1.0” in theTARGET column.

This information could then be usedto join with its child’s SOURCEcolumn values,

whereeverySHORT PLAY SOURCEwouldequal”1.0.” Thefull exampleof thisloading

is shown in Figure6.6.

This loadingis usefulfor many reasons.First, theamountof redundantdatais min-

imized,consideringthe only redundantdata(not including any stringswithin theXML

sourcewhich maybethesame)is theSOURCEandTARGET join information. This is

normallyloadedasasmallsizedfloat. Sinceall thedatais storedin only onetable,when

updatesare applied,the action is lessexpensive dueto only touchingone table. This

mappingdoeshave a few drawbackshowever. The self joins that could be requiredto

recreatethedocumentcanbevery expensive. Also the loadingis not asintuitive asthe

previousinline mapping,whichwasverystraight-forward.

29

Page 42: Order-sensitive XML Query Processing Over Relational Sources

Table recordlistIID PID ORDER1 0 1

Table short_playIID PID ORDER band_PCDATA label_PCDATA2 1 1 Misfits blank3 1 2 Misfits Plan 94 1 3 Project X Schism

Table songIID PID ORDER song_PCDATA5 2 1 Cough/Cool6 2 2 She7 3 1 Bullet8 3 2 We Are 1389 4 1 SXE Revenge10 4 2 Shutdown

Figure6.5: RelationalInlining Strategy with LocalOrder

Table EDGESOURCE POSITION NAME TARGET0.0 1.0 RECORDLIST 1.01.0 2.0 SHORT_PLAY 2.02.0 3.0 BAND Misfits2.0 4.0 LABEL Blank2.0 5.0 SONG She2.0 6.0 SONG Cough/Cool1.0 7.0 SHORT_PLAY 7.07.0 8.0 BAND Misfits7.0 9.0 LABEL Plan 97.0 10.0 SONG Bullet7.0 11.0 SONG We Are 1381.0 12.0 SHORT_PLAY 12.0

12.0 13.0 BAND Project X12.0 14.0 LABEL Schism12.0 15.0 SONG SXE Revenge12.0 16.0 SONG Shutdown

Figure6.6: RelationalEdgesStrategy with GlobalOrder

30

Page 43: Order-sensitive XML Query Processing Over Relational Sources

6.5 GeneralLoading

For Rainbow to handleany loading,only onething is required:themappingmustbeable

to createa non-recursivedefaultXML view. We madetheassumptionherethatall input

XML documentsarenon-recursive. Any loadingwith a defaultXML view thatadheres

to this canbemanagedby theRainbow system.

6.6 Metadata Table

Anotherpartof loadingis thecreationof a metadatatablefor storingorderinformation.

Oneissueof this is theloadingstrategy usedwhile a secondorthogonalissueis thestrat-

egy selectedfor retainingorder. This informationmustbe capturedso thatorder-based

userqueriescanbeexecutedcorrectlyregardlessof how thedatais storedin therelational

schema.Thisinformationwill bestoredin themetadatatablefor eachXML Schema.The

metadatatableis createdin parallelwith thedocumentloading.An examplemetadatata-

ble is shown in Table6.1.

XML PATH DXV Order OrderLoadingState OrderContext AorN

RECORDLIST RECORDLIST/RECORDLIST.ROW/POSITION fully preserved RECORDLIST/RECORDLIST.ROW/PID aRECORDLIST/SHORT PLAY SHORT PLAY/SHORT PLAY.ROW/POSITION fully preserved SHORT PLAY/SHORT PLAY.ROW/PID aRECORDLIST/SHORT PLAY/BAND SHORT PLAY/SHORT PLAY.ROW/POSITION fully preserved SHORT PLAY/SHORT PLAY.ROW/PID aRECORDLIST/SHORT PLAY/LABEL SHORT PLAY/SHORT PLAY.ROW/POSITION fully preserved SHORT PLAY/SHORT PLAY.ROW/PID aRECORDLIST/SHORT PLAY/SONG SONG/SONG.ROW/POSITION fully preserved SONG/SONG.ROW/PID a

Table6.1: Order-BasedMetadataTablefor theInlined RecordlistDocument

In Figure6.1, the first columnof the tablewill storetheXML Pathof all theXML

elementsof theoriginaldocument.For ourexample,thefull XML pathis storedfor each

element:(RECORDLIST, SHORT PLAY, BAND, etc)as(RECORDLIST,

RECORDLIST/SHORT PLAY, RECORDLIST/SHORT PLAY/BAND, etc).

Thesecondcolumnwill storethedefaultxml view (DXV) elementpathof the rela-

tional columnthatstorestheselectedorderencodingvalues.

31

Page 44: Order-sensitive XML Query Processing Over Relational Sources

The third columnwill storethe orderingstatein which the documentwasordered.

This canbe oneof two values. In the exampletableabove, the valuefor eachelement

is “fully preserved”. This meansthat the documentwill only requireonesingleorder

encodingvalueto recreatethe documentin its entirety. A valueof “context required”

would be for any otherorderencodingsthat cannotfully preserve the documentorder

with onevalue.Thisholdsfor thecaseof LocalOrder.

The fourth columnwill containall orderingcontext elements,which in somecases

could be more thanone item. For the caseof multiple elements,the elementswill be

storedasastring,delimitedby the’:’ character. Thisitemwill beespeciallynecessaryfor

thegroupingof elementsin orderto achievethecorrecttupleoutputduringthesinglestep

orderSQL generationphase,describedbelow. This columnmayberequiredfor certain

multi stepqueriesdependingon theparticularXPATH with multi stepinformationin it.

In our multi stepexample,this pathincludestheroot. Hencethereis no needto usethis

column’s informationfor grouping.

For example,thefourth columnwill containmultiple itemsfor a local orderloading.

This caserequiresthat eachelementbe properly nestedby joining on parentIDs and

orderingover eachhierarchicallevel of thedocument.

The final column indicateswhetherthe ordervaluesare numericor alphanumeric.

If the ordervalue is numeric,a ”N” valuewill be present.An ”A” will be presentfor

alphanumericordervalues.

Themetadatatableshown in Figure6.1 is designedfor aninlined tablemappingwith

full pathorderingscheme.Theinline mappingcanbeseenby thesecondcolumnsuseof

two tables(onefor SONGsandonefor theotherelements),thefull pathis indicatedby

thethird column,”fully preserved”, andthefinal columnis setto ”a” for alphanumeric.

At this time,this tableis createdmanuallyfor eachXML Schemaandloading,though

conceptually, it wouldbepossibleto automatethis step.

32

Page 45: Order-sensitive XML Query Processing Over Relational Sources

6.7 Query Composition

In the system,the userquery is not in itself sufficient to producethe result (unlessthe

userhasaDatabaseAdministrator’sknowledgeof thesystemandloadingmethodsused),

asall theinformationis storedin a relationaldatabaseandnot in anXML document.In

order to executea userqueryproperlyover the relationalbackend,the userqueryand

theview querymustbemerged. View queriesarediscussedin Chapter6.8. Theuseris

assumingthequeryis overtheoriginalXML document,but thisdocumentis notavailable.

Insteadtheview queryvirtually createsthis document.Therearetwo possiblestrategies

to accomplishthis. First, executethe view query, createa temporaryXML document,

andthenfinally executetheuserquery. A bettermethodwhich doesn’t utilize a two step

executionprocedurerequirestheuserqueryto bemergedwith theview query. With this

method,no materializedview of the documentneedsto be dynamicallycreatedwhich

improvesthesystem’s performance.

6.8 View Queries

In orderto properlyexecuteanordered-basedqueryover anXML view, that view must

be created(or seeminglycreated)from the relationaldatabaseinformation and sorted

appropriatelybasedon theexplicitly capturedXML documentorder, makingtheorder-

sensitive knowledgeimplicit onceagain. With this knowledge,theXML documentcan

bereconstructed.

Insteadof reconstructingthedocument,however, theview queryis translatedinto a

XAT treeandmergedwith theuserquery, asdescribedin Chapter6.7.After thetwo have

beencomposed,thetreecanbesimplifiedandoptimizedby therewrite equivalencerules.

A generalview queryexistsfor eachtypeof loadingdiscussedin [4]. However, these

queriesarecurrentlytoo complex for thefirst prototypeversionof theRainbow system,

33

Page 46: Order-sensitive XML Query Processing Over Relational Sources

�RECORDLIST

FOR$playMapin DXV()/SHORT PLAY/SHORT PLAY.ROWRETURN�

SHORT PLAY pos=$playMap/POSITION/text()FOR$SONGMapin DXV()/SONG/SONG.ROW[PID/text() = $playMap/IID/text()]RETURN�

SONGpos=$SONGMap/POSITION/text()$SONGMap/SONGPCDATA/text()�

/SONGSORTBY (./@posASCENDING)�

/SHORT PLAY SORTBY(./@pos)�

/RECORDLIST

Figure6.7: View (Extraction)Queryfor Q1(Figure3.3)with Inline Strategy

asthey requireasof now unimplementedfunctionsaswell asrecursion.For this reason,

querieswith schemaknowledge,calledinstantiatedqueries,areusedin ourcurrenteffort.

Instantiatedview queriesarepartiallypre-optimizedquerieswith schemaknowledge.

Thesequeriesaregenerallymorecompact,anddonotrequirerecursivecalls.TheXQuery

shown in Figure6.7 is an exampleof an instantiatedview query. This querywould re-

trieve every SONGof eachSHORT PLAY from thedefaultxml view (DXV) shown in

AppendixD. It would thencorrectlyorderandnesttheelements,sothatthequeryresult

resemblesthe original XML document(without the BAND or LABEL elementswhich

areunnecessaryfor bothQ1andQ2).

34

Page 47: Order-sensitive XML Query Processing Over Relational Sources

Chapter 7

Order-basedMethodology

Themainpurposefor algebrarewritesis to pushtheSQL-executablecomputationsdown

to theSQL engine,while all XML specificnodesremainat thetop of thetree.All SQL-

executablenodesshouldbe pusheddown asfar aspossiblein order to be translatedto

SQL statements.SQL-executableoperatorsare operatorswith clear SQL equivalents;

XML selectbecomesSQL whereoperator, XML SortbybecomesSQL ”Order By” and

so forth. This processis further complicatedwhenthe order-sensitive query is merged

with theview query, creatinga largerandmorecomplex query. To makethis composed

queryefficient,all SQL-executablenodesneedto pusheddown asfar aspossible.These

SQL-executablenodescan then be translatedto SQL and executedover the relational

backend.

7.1 The Order-SensitiveXQuery

Whenatree,asshown in Figure5.1,is foundto haveaGroupBy/Positionnode,thisnode

canbeswappedoutandreplacedwith aspecialorder-sensitivenodebasedonthelocation

of theoriginalGroupBy/Positionnodeandits childrenasdiscussedin Sections5.5.2and

35

Page 48: Order-sensitive XML Query Processing Over Relational Sources

5.5.3.

7.2 Position Operator Replace

After the GroupBy/Positionoperatorhasbeenfound andremoved, the appropriateor-

der node,be it SingleStepOrderor MultiStepOrder, will be insertedin the placeof the

original GroupBy/Positionasshown in Figures5.6 and5.7. The parametersfrom the

original GroupBy/Positionoperatorsarenot lost. They arekeptastheparametersto the

SingleStepOrderor MultiStepOrderfunction.Theotherparametersto this new nodewill

now bedetermined,asdiscussedbelow.

7.3 Metadata Table Query

An SQLstatementwill becreatedto queryover themetadatatableaspartof thePosition

Operatorrewrite. This statementwill thenqueryover themetadataordertableby select-

ing thematchingXML Pathfrom theimmediatechild Navigateoperator, asthisoperator

is theonein which we wantto captureorderinformation. TheDXV orderelementpath

is thenprojectedfrom themetadatatable,alongwith therestof thetuple. Themetadata

queryfor Q1 is shown in Figure7.1.

selectDXV ORDERfrom RECORDLISTMETADATAwhereXML PATH = “RECORDLIST/SHORT PLAY/SONG”

Figure7.1: MetdataQueryfor Q1

The nodeis now completewith the newly producedparametersandthe parameters

from the original GroupBy/Positionnode. The tree is thenpassedto the SQL genera-

tion componentasdiscussedin Chapter8. A rewritten exampleof eachorder-sensitive

optimizedusertreeis shown in Figures7.2and7.3.

36

Page 49: Order-sensitive XML Query Processing Over Relational Sources

Figure7.2: QueryQ1 asXAT Treewith Rewrite

Figure7.3: QueryQ2 asXAT Treewith Rewrite

37

Page 50: Order-sensitive XML Query Processing Over Relational Sources

7.4 Main Memory Position Execution

Theremaybe caseswhereSQL generationis not desired,particularlyif the datais not

storedin a relationalengine.This canoccurwhentheoriginal XML documentis stored

locally or in a native XML documentrepository. For sucha case,theGroupBy/Position

operatorshouldnotbeswappedoutor pusheddownto SQLgeneration,but insteadshould

beexecutedusingour native XML queryprocessingstrategy. This executionis straight-

forward. The positionoperatorcreatesa new columnin the datamodel,is groupedap-

propriately, andthentheXAT tablecellsarefilled in with theappropriateorderedvalues

ascreatedby thePositionoperator.

38

Page 51: Order-sensitive XML Query Processing Over Relational Sources

Chapter 8

Order-basedTemplateHeuristics

SQL generationis an incrementalprocessdriven by bottomup tree traversal. As one

operatoris visited, an appropriateSQL statementfragmentis created. The Navigate,

SelectandotherSQL-executablenodesareall translatedto SQL. SQL-executableXAT

nodesarethosewhich have equivalentSQL operators.For themostpart, this is a direct

translationof operatorsto SQL statements.However, when order is of interest,other

order-specificpiecesmustbeaddedto theSQLstatement.Finally, whenthetreetraverser

reachesanoperatorthatcannotbeexecutedin SQL,thetraverserstopsandcreatesaSQL

Statementoperator.

To createtheorder-specificpiecesof theSQLquery, atemplateis used.Thegrammar

for the order-sensitive SQL templateis shown in Figure8.1. This grammaradheresto

Oracle’s ”ordered”querylingua.Othertemplateswouldneedto becreatedto coverother

proprietaryDBMS thathandlesimilar order-basedstatements.

39

Page 52: Order-sensitive XML Query Processing Over Relational Sources

TEMPLATE:SELECT

�ELEMENT ,+ row number()over

(�

PARTITION ?�

ORDERBY ) $positionfunction bindingFROM

�TABLE +

PARTITION:partitionby

�ELEMENT

ORDERBY:orderby

�TONUMBERûú � ELEMENT

TONUMBER:to number(

�ELEMENT )

TABLE:tablename ú TEMPLATE

ELEMENT:elementname

Figure8.1: SQLGrammarfor Order-SensitiveQueries

8.1 Order-SensitiveSQL grammar

The templateis basedupon6 rules,which areeachfairly simple. The following tem-

plateis baseduponOracle’s SQL rules,but othertemplatescanbecreatedfor otherDB

specificSQL versions. The itemsthat arespecialto Oracleare the analyticalfunction

���ü Mþýûÿ�� �3 T�. (which createsinteger valuesin the samefashionas the XAT position

function), ���þ�3 method(which tells the analyticalfunction what valuesto work with),

andthe ���� ����Q��M��� phrase(which createsgroupsor partitionsbasedon theelementit is

given).

The key grammarrule is theTEMPLATE rule. This rule is thekey root rule for all

the otherelements.Algebratreeswill fill out the templatein specificways,depending

on theorder-sensitivespecificoperator, whetherit is SingleStepOrderor MultiStepOrder

andtheparametersfor this operator.

Thekey rule to denotethis differenceis capturedby the � PARTITION � ? rule. Not

all queriesrequirepartitioninginformation,hencethe ’?’ at the endof the rule within

theTEMPLATE rule. More specifically, relationaltablesthatareencodedwith a ”fully

40

Page 53: Order-sensitive XML Query Processing Over Relational Sources

preserved” orderschemado not needpartitioningwith multi stepqueries,whereasmulti

stepquerieswith a ”context needed”parameterwill requirenestedpartitioning. Single

stepqueriesare similar, except they alwaysrequirepartitioning. In the instanceof a

”context needed”parameter, Singlestepquerieswouldalsorequirenestedpartitioning.

The ���X�E4�������M �2ý M ���Q��M ��5M ��5M�� is thebindingfrom thealgebratree.Foreachof our

examples,the ���û�E4���Q��M �2ýûMÐ�������M ��5M ��5M�� is � ���EG R . This binding is thenconstrained

outsideof the templatewithin the WHERE clauseof the remainingSQL query. For

example �E���EG R�� P . A completeexampleis shown in Section8.2 in Figure8.2.

8.2 TemplateCompletion

During theSQL generationphase,the templatewill befilled in. Somesectionswill be

filled in basedmostly on the metadatatable information. The new order-basedquery

nodecontainsvaluableinformationessentialfor thefilling of thetemplate.We now walk

throughtheexampleof Q1 for a completeexplanation.

XML PATH DXV Order OrderLoadingState OrderContext AorN

RECORDLIST/SHORT PLAY/SONG SONG/SONG.ROW/POSITION fully preserved SONG/SONG.ROW/PID A

Table8.1: CondensedOrder-BasedMetadataTablefor theInlined RecordlistDocument

The row shown in Figure8.1 is the onethat matchesthe metadataqueryfor Q1 as

shown in Figure7.1. From the “Order LoadingState”column,we canseethat the in-

formationwasloadedwith a fully preserved order. This indicatesto the templatethat

thereis no needto partition over any context to producethe correctorderof the result.

However, sincethequeryQ1 is a singlestepquery, thereis still a needto partitionover

theParentIDs in orderto geteach2���

SONG.The“AorN” columnshows thattheorder

valuesare alphanumeric,so thereis no needto usethe TO NUMBER() function. We

alsoknow from theDXV OrdercolumnwheretheSONGordercolumnis. With this raw

information,thetemplateconstructionis nearingcompletion.

41

Page 54: Order-sensitive XML Query Processing Over Relational Sources

Thefirst stepfor templatecompletionis to taketheraw stringsderivedfrom themeta-

datatable,andput themin an appropriateSQL executableformat. This taskis simple.

SincetheDXV storeseachof theelementsin thefollowing format:

”TABLE NAME/TABLE NAME.ROW/ELEMENT”,

thetransformationis fairly simple.By creatingareverselookupwithin theBindingHash

table,wecanusetheXML pathto returna$binding.Thenwithin theSQLgenerationpor-

tion, thereis alocalhashstructurethatmapseachbindingto its ’RDB equivalent,’ suchas

�E���EG P����! F� 4 ��M"�# �K½"����%$C &$��^+ , where�E���EG P is thebindingand F� 4 ��M"�# �K½"����%$C &$��^+is the’RDB equivalent.’ This ’RDB equivalent’is thedesiredSQLexecutableformat.Fi-

nally, we addeach’RDB equivalent’ to its appropriatelocationwithin the template,as

shown in Figure8.2. Thedoublequotesaroundthevariablesarerequired.In Oracle,the

$ symbolis a reservedsymbol.

row number()over(partionby ”$song”.PIDorderby ”$song”.POSITION

) $col4

Figure8.2: TheTemplatefor our ExampleFilled

Oncethe templateis complete,the restof the SQL generationstepscontinue.SQL

generationwill continueto consumenodesandcreatemoreSQL fragments.The tagger

andaggregateoperatorshowever, arenotconvertedinto SQL,asthey havenoSQLequiv-

alents.Thesetwo nodes,aswell asa few others,mustbeexecutedwithin theRainbow

engineproper.

WhentheSQL generationstephasfinished,a new nodeis createdin thetree,a SQL

Stmtnode. The nodesthatwereusedto createtheSQL statementandtemplatewill be

deletedfrom thetreeandreplacedby theSQL Stmtnode,which is simply onenodethat

containsanSQLstring,preparedfor execution.An exampleof Q1 in thefinalizedformat

is shown in Figure8.3.

42

Page 55: Order-sensitive XML Query Processing Over Relational Sources

Figure8.3: QueryQ1 from Figure7.2afterSQL Generation

If thequerywasinsteada multi stepquery, asin Q2, thetemplateandresultingSQL

statementwouldlook differently, aswould thetree.In themulti stepcase,nopartitioning

wouldberequired(asshown by thealgebratreein Figure7.3). Thequeryis attemptingto

find the”global” 2')( SONGsonogroupingis required.Hencenopartitioninginformation

is requiredeither. The *�+-,/.�+-0�1 informationis still required,asis the +2*�3 4�576�08.�+ and

*�9�.�+ method.Thisexampleis shown in Figure8.4.

Figure8.4: QueryQ2 from Figure7.3afterSQL Generation

43

Page 56: Order-sensitive XML Query Processing Over Relational Sources

8.3 GeneralTemplateDiscussion

In the previous section,we want to point out that our proceduredoesnot rely on any

hard-codedquerystatementsfor specificloadingsor specificmappings.Thesystemwas

designedto begeneral,to handlealargerangeof loadingsor orderencodingsthatcomply

with thepreviously discussedguidelinesfrom Sections6.5and6.3. In fact, thereis only

a small differencebetweenthe QueryQ1 whendesignedfor an inlined loadingversus

an edgeloading. The SQL statementfor Q1 queryover an inline loading is shown in

Figure8.5,while theexamplewith anedgeloadingis shown in Figure8.6.Thesimilarity

confirmsthegeneralnatureof our solution.

SELECT”$col2”FROM ( SELECT”s1”.PCDATA as”$col2”, ROW NUMBER() OVER

(PARTITION BY ”s1”.PIDORDERBY ”s1”.POSITION)”$col3”FROM SONG”s1”)

WHERE”$col3” = 2

Figure8.5: QueryQ1 overanInline Loadingwith AlphanumericOrder

SELECT”$col2”FROM ( SELECT”s1”.TARGETas”$col2”, ROW NUMBER() OVER

(PARTITION BY ”s1”.SOURCEORDERBY ”s1”.POSITION)”$col3”FROM SONG”s1”WHERE”s1”.NAME=”song”)

WHERE”$col3” = 2

Figure8.6: QueryQ1 overanEdgeLoadingwith AlphanumericOrder

44

Page 57: Order-sensitive XML Query Processing Over Relational Sources

Chapter 9

Implementation

TheRainbow systemis implementedwith JavaJDK 1.2,JDBCandOracle8i. Themajor

softwareengineeringcontrolbehindthesystemis thevisitor pattern.This allows pieces

to be removed or replacedwithout the entirecodeneedingto be updated.Considering

thereareover 300 files that combinedareover 1.2 M in size,this is a very convenient

pattern.

Orderandorderbasedqueriesareasubsetof thesystem.To properlyprocessandex-

ecute,I hadto updatemany components,thoughthemostorder-sensitivework wentinto

therewrite rules,SQLgeneration,andvariousfunctions,suchasSINGLESTEPORDER,

MULTISTEPORDERandTRIM. Thecombinedtotal for all of thesemethodsandclasses

thatI’ veaddedis morethan1200linesof code.

For the rewrite rules,a new classwasadded,”RewritePositionRules.” To this class,

threemethodswere added. Following the visitor patternwith two parameters,a visit

methodwasaddedfor eachof thesinglestepandmultistepconfigurations:GroupByand

Navigate – SingleStep;GroupBy and GroupBy – MultiStep; Positionand Navigate –

MultiStep.

Eachof thesemethodsmust thencommunicatewith the metadatatable in order to

45

Page 58: Order-sensitive XML Query Processing Over Relational Sources

get theappropriateinformationfor thenewly constructedorder-sensitive node. This re-

quireda JDBCconnectionto anOracle8.1.7or higherbackend,not the8.1.5versionof

Oracle. The 8.1.5versiondid not have someof theappropriatefunctions,in particular

row number().

For SQL generation,a methodcalled getStatement()was addedin order to allow

for SingleStepandMultiSteptemplatecreations.Thismethodreadsfragmentsof queries

andtranslatesthemto SQL.I updatedthiscodesothatit couldtranslateto theappropriate

order-sensitivequery.

Functionswere also addedto the Rainbow Engine. First functions for both Sin-

gleStepOrderandMultiStepOrderwereadded. Thesefunctionsarecurrentlystubsbe-

causeasdiscussedbefore,they shouldnever beexecutedin native. Later, functionswere

alsoaddedfor basicXQuery commands,suchasTRIM. Also a functionwascreatedto

producetheLexicographicalkey valuesusedin thefull pathordertype.

In general,many thingsneededto maintainedandupdatedthroughouttheRainbow

system.As a seniormemberof theRainbow Coreteam,my main focuswaswithin the

Decorrelationcomponent.I alsoworkedon fixing bugsin everysectorof thecode,from

theBinding HashTable,to Expressions,to theExecutioncomponent,wheremany bugs

werefixed.Many weeksandmonthswerespenttesting,fixing andtestingagain.

Decorrelationrequiredthemostinitial effort. Theoriginaldesignfor this component

wastoo incompletein that it wasmissingcases,andsomeof thecasesit coveredwere

incorrect. I updatedit so that it would effectively work with the new casesdiscovered,

andthepreviouscasesthatwereincorrect.Lateron,thegroupleaderrevisedall thecases

andredidit onceagain.At thispoint, thedecorrelationcomponentis stable.

The Executioncomponenthadmany small bugsthatneededto beworkedout. The

mostfrustratingsetof thesebugswasin theGroupByoperator. Themajorproblemwas

tricky, in thatit wouldgroupthefirst itemsproperly, but everysetof itemsfollowing that

46

Page 59: Order-sensitive XML Query Processing Over Relational Sources

werenotgrouped.It turnsout thattheoriginaldesignerof thecodewascomparingevery

item to the first item insteadof incrementingthe comparisonitem. With that fixed, the

GroupByoperatorworkedasintended.

TheExpressionpackagehadaproblemrelatedto thecomparisonof orderelementsas

describedearlier. At acertainpoint,all valuesweregettingcomparedin ASCII order, re-

gardlessof if they werestringsor numbers.Thiswasresolvedby changingthecascading

classcastexceptioncatchingto oneof creatingnew valuespecificobjects,andcascading

thecatchingof numberformatexceptions.Thecomparisonsnow work asdesigned.

Oneotherproblemwith theExpressionpackagewasthecallbackscreatedin SQLgen-

eration.DuringSQLgeneration,whenanexpressionwasneededin aWHEREclause,the

SQL generationpackagewould call backto theExpressionpackagefor expressionSQL

phrasing.Within this callbacktherewasa Binding HashTablelookup for theparticular

terminalpartsof theexpression.However, whentheBinding Hashtablecodeupdated,it

no longerkeptthebindingsof elementsin Expressions.Therefore,whenthecallbackwas

executedin SQL, theBinding Hashtablewascalledupon,anda null pointerexception

wasthrown sincetheterminaldid not exist within theHash.I fixedthis by changingthe

methodin which thecallbackwasexecuted.Now it no longerrequiresinformationfrom

theHash,andit cancreatetheterminalcolumnby itself.

BeyondtheJava coding,with theXQuerylanguage,6 loadingqueriesweredesigned

aswell as2 view queries.The loadingquerieswerebasedon work doneby [4]. More

than10 testquerieswerecreatedwith XQuery.

9.1 SQL Generation

The implementationof SQL generationavailableinitially in Rainbow wasrathernaive.

Thegenerationis an incrementalprocess.As the traversergoesfrom thebottomsource

47

Page 60: Order-sensitive XML Query Processing Over Relational Sources

nodesto theuppertaggernodes,eachoperatoris consumed.Thesemanticinformation

from eachoperatoris thenput into a systemof flat structures.Theflat structuresconsist

of vectors,onefor eachpieceof requiredSQL: SELECT, FROM, WHERE,HAVING,

ORDERBY, GROUPBY, DISTINCT andFUNCTION. Oncea taggernodeis reached,

eachvectoris unrolledto createoneSQLstatement.Thisprocesshowever losesvaluable

hierarchyinformationaboutthequery, mostimportantlywhento join tablesandwhento

applyselectionoperators.

Theexamplesusedthroughoutthis paperexecuteproperlyandcreatetheappropriate

resultswith theinitial naivegenerationprocess.This is dueto thefact thatall thequeries

want leaf-mostelements,anddo not queryaboutsiblingslike BAND andSONGin the

samequery. If querieswereexecutedthataskedfor samenestedsiblings,inappropriate

joinswouldbecreated.

As an example,considera new query, Q3, as shown in Figure 9.1. This query is

looking for all theSONGsof the2���

SHORT PLAY. Theresultof thisqueryis shown in

Figure9.2.

�RESULT

FOR$recordin document(”r.xml”)/SHORT PLAY[2]/SONGRETURN�

SONG $record/text()�

/SONG�/RESULT

Figure9.1: OrderSensitive XQueryQ3

�RESULT �

SONG Bullet�

/SONG�SONG We Are 138

�/SONG�

/RESULT

Figure 9.2: XML Document Result ofXQueryQ3

Whenthis queryis executedwithin thenaive SQL generationover anInlined Global

loading,anincorrectSQLstatementis createddueto thisflat structuresystem.TheSQL

statementis shown in Figure9.3.

This statementis incorrectbecausetheSHORT PLAY andSONGtablesaremerged

too soon. Whenthesetablesaremergedon PID andIID valuesandthenordered,each

2���

row number()will now refer to 2���

SONGsinsteadof 2���

SHORT PLAYs. The

48

Page 61: Order-sensitive XML Query Processing Over Relational Sources

select songpc, numfrom (

select short_play.iid as shid,song.pid as spid,

song.POSITION as spos,song.SONG_PCDATA as songpc,row_number() over

(partition by short_play.pid

order by to_number(short_play.position)) num

from short_play, songwhere short_play.iid = song.pid

)where num = 2ORDER BY spos

Figure9.3: Q3 IncorrectSQL Statementfor Inline GlobalLoading

appropriateSQL statementis shown in Figure9.4.

select song.SONG_PCDATAfrom (

select shidfrom (

select short_play.iid as shid, row_number() over(

partition by short_play.pidorder by to_number(short_play.position)

) numfrom short_play

)where num = 2

), songwhere shid = song.pidORDER BY song.POSITION

Figure9.4: Q3CorrectSQLStatementfor Inline GlobalLoading

Thestatementin Figure9.4 joinsproperly. It first createsthegroupsover

SHORT PLAYs, ordersappropriatelyandthenfindsthe2���

SHORT PLAY. At thispoint,

thejoin is createdwith SONGs,with theappropriateconditions.ThisSQLstatementthen

producesthecorrectresult.Thisstatementcapturesthehierarchicalinformationfrom the

treethatwaslost in theflat structurequery.

In short, if SQL statementswere generatedin a true incrementalfashionwith one

operatorfor every createdportionof anSQL statement,thefinal SQL statementswould

49

Page 62: Order-sensitive XML Query Processing Over Relational Sources

becorrect. The currentimplementationcanbeextendedsuchthat theflat structuresare

reused,wherecommonnodescancreateSQL statementchunksthatcanbejoinedprop-

erly togetherto form thefinal SQLstatement.Theflat structurescouldbeusedfor groups

of operators,but whenan operatoris reachedthat couldcausea differentnesting,such

asa join, theflat structureswould createtheir pieceof SQL andnestit properlywith the

otherSQLstatementchunks.

50

Page 63: Order-sensitive XML Query Processing Over Relational Sources

Chapter 10

Experiments

10.1 SystemSetup

Thechartsin theexperimentalsectionarebasedon experimentsrun on a Windows2000

machinewith a 1.2 Gig processorand512 Megs of RAM. The experimentswere run

duringtimesof minorprocessload.

The dataloadedwasgeneratedby a randomXML datagenerationscript. Thirteen

documentswerecreatedthatall comply to theschemashown in AppendixC. Thefirst

documenthas100SHORT PLAY elements,with 600totalSONGelements.Thesecond

documenthas200SHORT PLAY elements,wherethefirst 100elementsarethesameas

thefirst document.This seconddocumenthas1200SONGelements,wherethefirst 600

arealsothesameasthefirst document.This generationof documentscontinuedto the

thirteenthdocumentwhichhas1300SHORT PLAY elements,and7800SONGelements.

Of these7800songelements,thereare650distinctsongtitles. Theselectivity of thesong

distribution is random.

Two loadingmethodswereused,the inline loadingandthe edgeloading. For each

loading,two differentorderencodingswereused:globalandlocal. This givesfour total

51

Page 64: Order-sensitive XML Query Processing Over Relational Sources

testcases.

Thequeriesexecutedfor theexperimentsareQ1andQ2,asshown in Figures3.3and

3.5 respectively. Eachquerywasexecutedover thedescribedabove loadingsten times.

Theresultof thesetentestswereaveragedfor thefinal result.

TheY axisof eachchartis time in ms. For thefirst setof chartstheX axisdisplays

how many SHORT PLAYs were loaded. The final four chartsvary the selectivity of

theamountof SONGsreturnedby thequery. This wasdoneby creatingqueriesalmost

identicalto Q1 andQ2,but insteadof usingthebinaryoperator� , theoperator�&� was

used.For instance,for Q1,thefirst barin thechartrepresentsall SONGSof position �&�1, or 1300SONGs.

10.2 Experimental Evaluation

10.2.1 SingleStepQuery Experiments

:;�<�=)>?A@�B)CD)E�F)GH�I�J)KL)M�N)O

PRQTS UWVTXZY\[T] ^`_TacbWdTe fhgTikj�lTmonhpTq r\sTtvuRwTxTy{z}|R~T���`�W�h�{�}�\�T����#���/���2���)�����2���/���/�A�¡ 

¢£ ¤¥¦ §¨© ª�«�¬%­¯®±°³²�´¶µA·)¸A¹�º±» ¼�½ ¾)¿�À}ÁAÂ8Ã2Ä�Å�Æ�Ç�鱃 ÊAË

Table10.1:QueryQ1 Executedwith GlobalEdgeLoading

52

Page 65: Order-sensitive XML Query Processing Over Relational Sources

ÌÍ�Î�Ï)ÐÑAÒ�Ó)ÔÕ)Ö�×)ØÙ�Ú�Û)ÜÝ)Þ�ß)à

áRâTã äWåTæZç\èTé ê`ëTìcíWîTï ðhñTòkó�ôTõoöh÷Tø ù\úTûvüRýTþTÿ���������� ����������������������! #"%$'&)(!*,+�-/.�02143

56 789 :;< =?>�@BA�CED�F'G!HJILKJMONEP Q2R SUTWV�XJY%Z\[2]%^2_a`Eb cJd

Table10.2:QueryQ1Executedwith LocalEdgeLoading

Tables10.1and10.2show comparisonsbetweenSQL statementexecutionandtotal

executiontimesmeasuredin ms for QueryQ1 asexecutedover Global andLocal Edge

loadedtablesrespectively. The X axis displaystheamountof SHORT PLAY elements

loaded.For100SHORT PLAY elements,amajorityof thetotalexecutiontimeis theSQL

statementexecuting,which includestheOracleconnection.By the13e f test,however, the

SQL statementexecutionis almostlessthana third of thetotal executionfor bothtables,

despitethefact thatthereareonly threeotherXAT operatorsto execute,asshown in the

algebratreeof Figure8.3. At the most for Local andGlobal loading, the time is only

4000ms,with thelocal loadingoutperformingglobal.

Tables10.3and10.4show comparisonsbetweenSQL statementexecutionandtotal

executionmeasuredin msfor QueryQ1 asexecutedover GlobalandLocal Inline loaded

tables. Thesetablesshow similar resultsasthe Q1 Edgetablesfor SQL statementex-

ecutiontimes. The inline tableshowever have overall lower timesfor all tests. This is

dueto thedifferentshredding.Insteadof creatingself joins over onegigantictable,this

methodjoinsseveralsmallertablestogether. While Table10.1showsthe13e f testrunning

53

Page 66: Order-sensitive XML Query Processing Over Relational Sources

ghOiLj%kl2mLn%op%qLr%stJuLv%wx%yLz%{

|�}�~ � ��������� ������� ��� ����������������� ������������ �¡�¢�£�¤¥�¦ §�¨�©�ª�«�¬­�®�¯�°�±�²!³#´%µ'¶)·!¸,¹�º/»�¼2½4¾

¿À ÁÂà ÄÅÆ Ç#È�ÉËÊÍÌEÎÐÏ'Ñ\Ò2Ó%Ô2ÕaÖE× ØJÙ Ú%Û�Ü�Ý2ÞUß!àJáLâJãOäEå æ2ç

Table10.3:QueryQ1Executedwith GlobalInline Loading

èéOêLë%ìí2îLï%ðñ%òLó%ôõJöL÷%ø

ù�ú�û ü ý�þ�ÿ���� ��������� ���������������� ������� ��!�"$#&% '�(*)�+�,�-$.&/�0�12436587:9<;>=@?BADCFE>GIH:JLK:MONQP

RS TUV WXY Z\[8]_^6`ba8cDd>egfihgjlkbm nOo prqts&ugvBwyxOzB{O|~}b� �g�

Table10.4:QueryQ1Executedwith Local Inline Loading

54

Page 67: Order-sensitive XML Query Processing Over Relational Sources

over 1300ms,Table10.3shows a smallertime of 1050ms,which is almosta negligible

differencebut still important. The local loadingsaresimilar with timesof 1600ms for

theEdgetable,and1050msfor theInline table,which is a greaterdifferenceandshows

thepotentialof the inline loadingbetter. The overall executiontimesarestill 3-4 times

greaterthantheSQL statementexecutiontimes,for only threeXAT operators.

10.2.2 Multi StepQuery Experiments

�����������������l��������

����������������� �l¡�¢¤£�¥�¦¨§�©�ª�«�¬�­¤®�¯�°¨±�²�³µ´�¶�·�¸º¹�»�¼�½¿¾�À�Á�ÂÄÃ�Å�Æ�ÇÈ<ÉËÊ:Ì~ÍtÎiÏBÐ�ÑÓÒÕÔ�Ö ×OØ_ÙOÚËÛÝÜ

Þß àáâ ãäå æiç:èêétë�ìgítî�ï~ð�ñlòôóIõ ö~÷ ø�ù�ú û~ü�ý�þ~ÿ�������� �

Table10.5:QueryQ2Executedwith GlobalInline Loading

Tables10.5and10.6alsocompareSQL statementexecutionandoverall executionas

measuredin ms. Thesetablesshow a differentpicturethantheSingleStepcharts.This

is dueto thefact thattheQ2 queryonly returns1 tuple,regardlessof theamountof data

in thebasetables.TheSQL statementexecutiontimesincreaseto almost350msat their

highestpoints.

55

Page 68: Order-sensitive XML Query Processing Over Relational Sources

��� ����������������������

�! #" $&%#' (*)#+ ,.-#/ 0&1#2 354#687:9#; <5=#> ?*@#ACB!D#E#FCGIH!J#KML.N&O#PCQIR*S5TUWVYX[Z]\_^a`cb�dcegfih�jlknmlo�prq

st uvw xyz {}|[~��Y�������a���������� ��� ���:�I�����i����������� ���

Table10.6:QueryQ2Executedwith Local Inline Loading

� �¡�¢�£�¤¥�¦�§�¨�©ª�«�¬�­�®¯�°�±�²�³´�µ�¶�·�¸¹�º�»�¼�½

¾ ¿ À Á  ÃÄcÅÇÆ�ÈaÉËÊÍÌ ÎÐÏÒÑÔÓÖÕÇ×lØÇÙÛÚÇÜ[ÝlÞ_ßià}á�â�ã

äå æçè éêë ì}í[î�ïYð�ñ�ò�óaô�õ�ö�÷ø�ù ú�û ü�ýËþIÿ������������ � ���

Table10.7:QueryQ1with Selectivity Executedwith GlobalEdgeLoading

56

Page 69: Order-sensitive XML Query Processing Over Relational Sources

������������������������� �!"�#�$�%�&'�(�)�*�+,.-�/�0�1

2 3 4 5 6 78:9<;>=�?A@CB DFEHGJILK<MON<PRQ<SUTOVXWZY\[�]_^

`a bcd efg h\iUjlknm oqp_r�s�t�u�v�w x y�z {.|A}�~�������������� � ���

Table10.8:QueryQ1with Selectivity Executedwith GlobalInline Loading

10.3 Experiments with Varying Selectivity

10.3.1 SingleStepQuery Experiments

Tables10.7 and 10.8 have differentX-axis valuesthan the previous tables. Theseta-

blesshow selectivity of the datareturnedby the querychangingover time (on the X-

axis). Thesequerieswereexecutedover a staticdatabasewith a fixeddatasizeof 1300

SHORT PLAYs. Tables10.7and10.8show QueryQ1asexecutedoverGlobalEdgeand

Inline loadedtables.Thesequeriesno longerusethebinaryexpression� , but ratheruse

�&� . For this reason,thetablesappearto beskewed,in thattheslopesdo not follow the

trendof theprevioustables.Theseslopesareappropriatethough.For thefirst bar, there

are 1300 SONGsreturned,for the secondbar thereare 2600 SONGsreturnedand so

forth. By thelastbar, 7800SONGsarereturned.Thecomputationpushedinto theSQL

enginescaleswell with an increasein datasearchedfor while the effect on the native

XML executionis morenoticeabledueto thetime requiredto tagall thereturnedSONG

elements.For theselectivity cases,theinline methodis still cheaperin termsof execution

57

Page 70: Order-sensitive XML Query Processing Over Relational Sources

time.

10.3.2 Multi StepQuery Experiments

��������������������������

� � � � � � \¡<¢>£�¤¦¥C§ ¨Z©Hª¬«®­O¯O°n±³²n´Uµ·¶X¸�¹:º�»:¼

½¾ ¿ÀÁ ÂÃÄ

Å:ÆUÇÉÈ<Ê ËUÌ_ÍZÎ�Ï�Ð�Ñ�Ò Ó Ô�Õ Ö.×AØÚÙ�Û.ÜZÝ�Þ�ß�à�á â ã�ä

Table10.9:QueryQ2with Selectivity Executedwith GlobalEdgeLoading

åæèçêéëíìêîïêðêñò�óêô

õ ö ÷ ø ù úû�ü�ý þ�ÿ���� ��� ��� �����������������������

! "#$ %&'

(*)�+-,�.0/�1�243 576�8:9<; = > ?A@CBED FHG4I J7K�L:M<N O P

Table10.10:QueryQ2 with Selectivity Executedwith GlobalInline Loading

Tables10.9 and 10.10 againshow anotherpicture, despitehaving the static 1300

SHORT PLAYs in thebasetables.TheX axisof thesechartsis still selectivity of SONG

58

Page 71: Order-sensitive XML Query Processing Over Relational Sources

elementsreturnedfrom thebasedata.Theonly differencebetweentheseQ2 queriesand

thepreviousexamplesaretheuseof the � � predicateinsteadof the � predicate.While

thepreviouschartshaveexperiencedincreasingslopes,thesechartsshow jaggedpositive

andnegative slopes.This canbeexplainedby thesemanticsof theQ2 queryin termsof

selectivity. Thefirst barof eachtableshowseverySONGwith position �&� 1, thatis only

thefirst SONG.Thesecondbarshows every SONGwith position �&� 2, or just thefirst

two SONGs,while the lastonly returns6 SONGs.Thetime differencebetweentagging

1 to 6 elementsis very minimal whichexplainsthejaggedline. Dueto thesmallamount

of tuplesreturnedtheoverall timesarealmostall lessthan400. Theinline methodagain

shows smallertimevaluesthantheedgemethod.

10.4 Experimental Summary

As describedin the introduction, the executionof XAT operatorsin main memory is

expensive in the context of the Rainbow system. This is clearly shown by eachof the

experimentalfiguresasthenumberof SONGsincreases.TheSQL statementis executed

followedby threeXAT operators.As theamountof tuplesreturnedincreases,theSQL

statementgrows slowly. However, thecostof executingthe threeXAT operatorsbegins

to grow quite quickly. For instance,in Table10.3, the time to executethe threeXAT

operatorsis lessthan 70 ms with 100 SHORT PLAYs, but upon executingover 1300

SHORT PLAYs, the time to executethe threeXAT operatorsis over 3000ms. If SQL

wasn’t created,main memoryexecutionwould requireexecutingover 25 suchXAT op-

erators.At therateof Table10.3,this time would be25,000ms,over 5 timesthelength

of executionwith anSQL statement.

This point is moreclearlyshown by Table10.8. At thehighesttime line, the3 XAT

operatorsaretagging7800SONGelements.This takesnearly50,000ms. If therewas

59

Page 72: Order-sensitive XML Query Processing Over Relational Sources

no SQL statement,therewouldbemorethan20 otherXAT operatorsto executein main

memoryaswell, whichat thatratewould total almost300,000ms.Asking a userto wait

morethan300secondsfor aqueryresultis unreasonable.

60

Page 73: Order-sensitive XML Query Processing Over Relational Sources

Chapter 11

Conclusions

11.1 Summary

XML documentsareorder-sensitive. Queriesover thesedocumentsarelike-wise order

sensitive with orderpredicatesandproperlyorderedresults. Efficient storagemethods

for thesedocumentsare also desired. Many systemshave chosento utilize relational

databasesasthisstoragestrategy. However, thereis still amissinglink betweentheXML

queriesandthe relationalstorage,especiallywhereorderis concerned.For this reason,

Rainbow wasextendedto handlethe order-sensitive issue. This wasaccomplishedby

the following: order-specificloadingsthat capturedimplicit order-sensitive information

makingit explicit; metadatatablesthatmanagetheimplicit to explicit mappingcapture;

rewrite rules for order-specificqueries;SQL generationtechniquesfor order-sensitive

statementcreationwith theuseof templates;andintegrationandimplementationwithin

the Rainbow system. The experimentalstudiesshow the correctnessof this approach,

aswell asshow a moreefficient executioncomparedto native XML queryexecutionin

Rainbow.

61

Page 74: Order-sensitive XML Query Processing Over Relational Sources

11.2 Contrib utions

My contributionsof this thesisareasfollowed:

1. Thesystemprovidesguidelinesfor handlingany generalloadingqueries;

2. Six order-sensitive loadingquerieshavebeencreatedto testtheguidelines;

3. Rewrite rules now exist for XQuery statementsthat force maximal computation

pushdown of order-sensitiveoperators;

4. A generaltemplategrammarfor order-sensitive SQL query statementshasbeen

created;

5. A metadatatableconcepthasbeendesignedto captureimplicit order-sensitive in-

formation.It wascreatedin amannerthatit is extendableandgeneralto enablethe

additionof ”new” orderencodingsor ”new” loadingstrategiesin thefuture;

6. SQL generationin Rainbow now includesorder-sensitive SQL statementcreation

throughtheuseof templates;

7. This framework wasimplementedandintegratedwithin theRainbow system;

8. An experimentalstudywasconductedto evaluatetheproposedorder-basedXML

queryprocessingsystem.

Most importantly, the systemwasdesignedin a generalmannerto handleany new

mappingsor orderingsthat may arise. The scenariofor eachis quite simple. If a new

ordermethodis created,any currentmappingwould requirea new orderpre-processing

stepaspartof thefull loadingquery. This stepis shown in eachStepZeroof Appendix

A. This simply meansthata new StepZeroXQuery functionwould needto bewritten.

No Java codewould needto be addedto the system.If a new mappingmethodwasto

62

Page 75: Order-sensitive XML Query Processing Over Relational Sources

be utilized, thena new loadingXQuery statementwould needto be created– one for

eachcurrentordertypes.Again,nocodewouldneedto beaddedto thecurrentsystemto

handlethis. Informationaboutthismappingwouldalsoneedto becreatedwithin its own

metadatatable.

11.3 Futur eWork

Currently, Rainbow only handlesnumericorderpredicates.In thefuture, it will benec-

essaryto implementLAST(), FIRST()andotherrelatedorderfunctions,aswell astheir

translationsto SQL.

Also, the metadatatableis createdmanuallyfor eachdocumentand loadingin our

system.Part of thefuturework wouldbeto automatethisprocess.Thisprocesscouldbe

donewith a mixtureof XQueryandJava in parallelto theloadingprocess.

Finally, thework describedin Section9.1mustbeimplementedsuchthatSQLgener-

ationis purelyincrementalandawareof thehierarchyof thealgebratree.

63

Page 76: Order-sensitive XML Query Processing Over Relational Sources

Bibliography

[1] E.T. Bray, J.Paoli,andC.Sperberg-McQueen.ExtensibleMarkupLanguage(XML),1997.http://www.w3.org/TR/PR-xml-971208.

[2] M. J. Carey, J. Kiernan,J. Shanmugasundaram,E. J. Shekita,andS. N. Subrama-nian.XPERANTO: Middlewarefor PublishingObject-RelationalDataasXML Doc-uments.In TheVLDBJournal, pages646–648,2000.

[3] D. ChamberlinandJ. RobieandD. Florescu.Quilt: An XML QueryLanguageforHeterogeneousDataSources.In ACM SIGMODAssociatedWorkshopon the WebandDatabases(WebDB2000),Dallas,Texas, pages53–62,May 2000

[4] S. Christ,X. ZhangandE. A. Rundensteiner. X-Cube: Flexible XML MappingbyXQuery. Technicalreport,WorcesterPolytechnicInstitute,2001.

[5] K. Deschler. Xml navigationindexing. Master’s thesis,WorcesterPolytechnicInsti-tute,2001.

[6] M. F. Fernandez,A. Morishima,D. Suciu,andW. C.Tan.PublishingRelationalDatain XML: the SilkRouteApproach. IEEE Data EngineeringBulletin, 24(2):12–19,2001.

[7] D. Florescu,D. Kossman.StoringandQueryingXML DatausinganRDBMS. IEEEData EngineeringBulletin, 22(3),1999.

[8] L. Galanis, E. Viglas, D. J. DeWitt, J. F. Naughton, and D. Maier. Follow-ing the paths of xml data: An algebraic framework for xml query evaluation.http://www.cs.wisc.edu/niagara/papers/algebra.pdf,2001.

[9] H. V.Jagadish,S. Al-Khalifa, A. Chapman,L. Lakshmanan,A. Nierman,S. Papari-zos,J.M. Patel,D. Srivastava,N. Wiwatwattana,Y. Wu,andC. Yu. Timber:A nativexml database,2002.VLDBJournal, 11(4),2002.

[10] I. Manolescu,D. Florescu,D. Kossmann,F. Xhumari,andD. Olteanu.Agora: Liv-ing with xml andrelational.In A. E.Abbadi,M. L. Brodie,S.Chakravarthy, U. Dayal,N. Kamel,G.Schlageter, andK.-Y. Whang,editors,VLDB2000,Proceedingsof 26thInternationalConferenceon Very LargeData Bases,September10-14,2000,Cairo,Egypt, pages623–626.MorganKaufmann,2000.

64

Page 77: Order-sensitive XML Query Processing Over Relational Sources

[11] R. Ramakrishnan.DatabaseManagementSystems. WCB/McGraw-Hill, 2002

[12] R. Ramakrishnan,D. Donjerkovic, A. Ranganathan,K. S. Beyer, and M. Krish-naprasad.Srql: Sortedrelationalquery language. In M. Rafanelli and M. Jarke,editors,10th InternationalConferenceon Scientificand StatisticalDatabaseMan-agement,Proceedings,Capri, Italy, July 1-3, 1998, pages84–95.IEEE ComputerSociety, 1998.

[13] A. Sahuguet.Kweelt: More thanjust ”yet anotherframework to queryxml!”. InDemoSessionProceedingsof SIGMOD’01, page602,2001.

[14] ShoreTeam. Shore:Combiningthe bestfeaturesof oodbmsandfile systems.InM. J. Carey andD. A. Schneider, editors,Proceedingsof the 1995ACM SIGMODInternationalConferenceon Managementof Data,SanJose,California, May22-25,1995, page486.ACM Press,1995.

[15] I. Tatarinov, S. D. Viglas, K. Beyer, J. Shanmugasundaram,E. J. Shekita,andC. ZhangStoringandQueryingOrderedXML Usinga RelationalDatabaseSystem.ACM SIGMOD, pages204-215,2002.

[16] W3C. DocumentObjectModel (DOM). http://www.w3.org/TR/REC-DOM-Level-1/, 1998.

[17] W3C. XQuery: A Query Languagefor XML. http://www.w3.org/TR/xquery/,February2001.

[18] W3C. XQuery 1.0 and XPath 2.0 Data Model. http://www.w3.org/TR/query-datamodel,December2001.

[19] C. Zainolo,S. Ceri, C. Faloustos,R. Snodgrass,V. S. SubrahmanianandR. Zicari.AdvancedDatabaseSystems. MorganKaufman,1997.

[20] X. Zhang. XML Query Decorrelation. Technicalreport, WorcesterPolytechnicInstitute,2002. to appear.

[21] X. Zhang,G. Mitchell, W.-C. Lee,andE. A. Rundensteiner. Clock: SynchronizingInternal RelationalStoragewith ExternalXML Documents. In RIDE-DM, pages111–118,April 2001.

[22] X. Zhang, M. Mulchandani,S. Christ, B. Murphy, and B. Pielech. Rainbow:Mapping-DrivenXQueryProcessingSystem.In DemoSessionProceedingsof SIG-MOD’02, 2002.

[23] X. Zhang,B. Pielech,andE. A. Rundensteiner. Honey, I ShrunktheXQuery! —An XML AlgebraOptimizationApproach.In WIDM, pages15–22,Nov. 2002.

65

Page 78: Order-sensitive XML Query Processing Over Relational Sources

[24] X. Zhang,B. Pielech,L. Ding, B. Murphy, L. Wang,K. Dimitrova,M. El SayedandE. A. Rundensteiner. RainbowII: Multi-XQuery OptimizationUsing MaterializedXML Views. In DemoSessionProceedingsof SIGMOD’03, 2003.

[25] X. Zhang,B. Pielech,and E. A. Rundensteiner. XAT Optimization. TechnicalReportWPI-CS-TR-02-25,WorcesterPolytechnicInstitute,2002.

[26] X. ZhangandE. A. Rundensteiner. XAT: XML Algebrafor theRainbow System.TechnicalReportWPI-CS-TR-02-24,WorcesterPolytechnicInstitute,July2002.

66

Page 79: Order-sensitive XML Query Processing Over Relational Sources

Appendix A

Loading Queries

A.1 SchemaQueries

A.1.1 EdgeLoading SchemaQuery<xsd:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">

<xsd:element name="DB"><xsd:complexType>

<xsd:sequence><xsd:element name="EDGES">

<xsd:complexType><xsd:sequence>

<xsd:element name="EDGES.ROW" minOccurs="1" maxOccurs="unbounded"><xsd:complexType>

<xsd:sequence><xsd:element name="SOURCE" type="xsd:double"/>,<xsd:element name="POSITION" type="xsd:double"/>,<xsd:element name="NAME" type="xsd:string"/>,<xsd:element name="TARGET" type="xsd:string"/>

</xsd:sequence></xsd:complexType>

</xsd:element></xsd:sequence>

</xsd:complexType></xsd:element>

</xsd:sequence></xsd:complexType>

</xsd:element></xsd:schema>

67

Page 80: Order-sensitive XML Query Processing Over Relational Sources

A.1.2 Inline Loading SchemaQuery<xsd:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">

<xsd:element name="DB"><xsd:complexType>

<xsd:sequence><xsd:element name="EDGES">

<xsd:complexType><xsd:sequence>

<xsd:element name="EDGES.ROW" minOccurs="1" maxOccurs="unbounded"><xsd:complexType>

<xsd:sequence><xsd:element name="SOURCE" type="xsd:double"/>,<xsd:element name="POSITION" type="xsd:double"/>,<xsd:element name="NAME" type="xsd:string"/>,<xsd:element name="TARGET" type="xsd:string"/>

</xsd:sequence></xsd:complexType>

</xsd:element></xsd:sequence>

</xsd:complexType></xsd:element>

</xsd:sequence></xsd:complexType>

</xsd:element></xsd:schema>

68

Page 81: Order-sensitive XML Query Processing Over Relational Sources

A.2 Step0

A.2.1 Local EdgeLoading Step0FUNCTION Q1($root, $pid, $porder1){

LET $maintag := gettag($root),$iid := getiid(),$porder2 := 0,$pos := 0

RETURN{

<$maintag type="ELEM" iid=$iid pid=$pid porder=$porder1>FOR $attribute IN $root/@*LET $atttag := gettag($attribute),

$attiid := getiid(),$pcdataiid := getiid()

RETURN{

<$atttag type="ATT" iid=$attiid pid=$iid porder="0.0"><CDATA type="CDATA" iid=$pcdataiid pid=$attiid porder="0.0">

$attribute</CDATA>

</$atttag>},FOR $elem IN $root/*LET $pos := $pos + 1,

$pcd := $root/text()[position()=$pos]RETURN{

IF (TRIM($pcd)="")THEN{

""}ELSE{

LET $pcdataiid := getiid(),$porder2 := $porder2 + 1

RETURN{

<PCDATA type="PCDATA" iid=$pcdataiid pid=$iid porder=$porder2>TRIM($pcd)

</PCDATA>}

},LET $porder2 := $porder2 + 1RETURN{

Q1($elem, $iid, $porder2)}

},LET $pos := $pos + 1,

$pcd := $root/text()[position()=$pos]RETURN{

IF (TRIM($pcd)="")THEN{

""}ELSE{

LET $pcdataiid := getiid(),$porder2 := $porder2 + 1

RETURN

69

Page 82: Order-sensitive XML Query Processing Over Relational Sources

{<PCDATA type="PCDATA" iid=$pcdataiid pid=$iid porder=$porder2>

TRIM($pcd)</PCDATA>

}}

}</$maintag>

}}

Q1(document("temp/source.xml"), 0, 1)

70

Page 83: Order-sensitive XML Query Processing Over Relational Sources

A.2.2 Global EdgeLoading Step0FUNCTION Q1($root, $pid){

LET $maintag := gettag($root),$iid := getiid(),$pos := 0

RETURN{

<$maintag type="ELEM" iid=$iid pid=$pid porder=$iid>FOR $attribute IN $root/@*LET $atttag := gettag($attribute),

$attiid := getiid(),$pcdataiid := getiid()

RETURN{

<$atttag type="ATT" iid=$attiid pid=$iid porder="0.0"><CDATA type="CDATA" iid=$pcdataiid pid=$attiid porder="0.0">

$attribute</CDATA>

</$atttag>},FOR $elem IN $root/*LET $pos := $pos + 1,

$pcd := $root/text()[position()=$pos]RETURN{

IF (TRIM($pcd)="")THEN{

""}ELSE{

LET $pcdataiid := getiid()RETURN{

<PCDATA type="PCDATA" iid=$pcdataiid pid=$iid porder=$pcdataiid>TRIM($pcd)

</PCDATA>}

},RETURN{

Q1($elem, $iid)}

},LET $pos := $pos + 1,

$pcd := $root/text()[position()=$pos]RETURN{

IF (TRIM($pcd)="")THEN{

""}ELSE{

LET $pcdataiid := getiid()RETURN{

<PCDATA type="PCDATA" iid=$pcdataiid pid=$iid porder=$pcdataiid>TRIM($pcd)

</PCDATA>}

}

71

Page 84: Order-sensitive XML Query Processing Over Relational Sources

}</$maintag>

}}

Q1(document("temp/source.xml"), 0)

72

Page 85: Order-sensitive XML Query Processing Over Relational Sources

A.2.3 Lexicographical EdgeLoading Step0FUNCTION Q1($root, $pid, $paorder){

LET $maintag := gettag($root),$iid := getiid(),$pos := 0,$porder1 := "aa",$fake := LEXICOGRAPHICALORDER("reset"),$corder := "aa"

RETURN{

<$maintag type="ELEM" iid=$iid pid=$pid porder=$paorder>FOR $attribute IN $root/@*LET $atttag := gettag($attribute),

$attiid := getiid(),$pcdataiid := getiid()

RETURN{

<$atttag type="ATT" iid=$attiid pid=$iid porder="0.0"><CDATA type="CDATA" iid=$pcdataiid pid=$attiid porder="0.0">

$attribute</CDATA>

</$atttag>},FOR $elem IN $root/*LET $pos := $pos + 1,

$pcd := $root/text()[position()=$pos],$porder1 := LEXICOGRAPHICALORDER(count($root/../*)),$corder := CONCAT($paorder, ".", $porder1)

RETURN{

IF (TRIM($pcd)="")THEN{

""}ELSE{

LET $pcdataiid := getiid()RETURN{

<PCDATA type="PCDATA" iid=$pcdataiid pid=$iid porder=$corder>TRIM($pcd)

</PCDATA>}

},RETURN{

Q1($elem, $iid, $corder)}

},LET $pos := $pos + 1,

$pcd := $root/text()[position()=$pos],$porder1 := LEXICOGRAPHICALORDER(count($root/../*)),$corder := CONCAT($paorder, ".", $porder1)

RETURN{

IF (TRIM($pcd)="")THEN{

""}ELSE{

LET $pcdataiid := getiid()

73

Page 86: Order-sensitive XML Query Processing Over Relational Sources

RETURN{

<PCDATA type="PCDATA" iid=$pcdataiid pid=$iid porder=$corder>TRIM($pcd)

</PCDATA>}

}}

</$maintag>}

}

Q1(document("temp/source.xml"), 0, "aa")

74

Page 87: Order-sensitive XML Query Processing Over Relational Sources

A.2.4 Local Inline Loading Step0FUNCTION Q1($root, $pid, $porder1){

LET $maintag := gettag($root),$iid := getiid(),$porder2 := 0,$pos := 0

RETURN{

<$maintag type="ELEM" iid=$iid pid=$pid porder=$porder1>FOR $attribute IN $root/@*LET $atttag := gettag($attribute),

$attiid := getiid(),$pcdataiid := getiid()

RETURN{

<$atttag type="ATT" iid=$attiid pid=$iid porder="0.0"><CDATA type="CDATA" iid=$pcdataiid pid=$attiid porder="0.0">

$attribute</CDATA>

</$atttag>},FOR $elem IN $root/*LET $pos := $pos + 1,

$pcd := $root/text()[position()=$pos]RETURN{

IF (TRIM($pcd)="")THEN{

""}ELSE{

LET $pcdataiid := getiid(),$porder2 := $porder2 + 1

RETURN{

<PCDATA type="PCDATA" iid=$pcdataiid pid=$iid porder=$porder2>TRIM($pcd)

</PCDATA>}

},LET $porder2 := $porder2 + 1RETURN{

Q1($elem, $iid, $porder2)}

},LET $pos := $pos + 1,

$pcd := $root/text()[position()=$pos]RETURN{

IF (TRIM($pcd)="")THEN{

""}ELSE{

LET $pcdataiid := getiid(),$porder2 := $porder2 + 1

RETURN{

<PCDATA type="PCDATA" iid=$pcdataiid pid=$iid porder=$porder2>

75

Page 88: Order-sensitive XML Query Processing Over Relational Sources

TRIM($pcd)</PCDATA>

}}

}</$maintag>

}}

Q1(document("temp/source.xml"), 0, 1)

76

Page 89: Order-sensitive XML Query Processing Over Relational Sources

A.2.5 Global Inline Loading Step0FUNCTION Q1($root, $pid){

LET $maintag := gettag($root),$iid := getiid(),$pos := 0

RETURN{

<$maintag type="ELEM" iid=$iid pid=$pid porder=$iid>FOR $attribute IN $root/@*LET $atttag := gettag($attribute),

$attiid := getiid(),$pcdataiid := getiid()

RETURN{

<$atttag type="ATT" iid=$attiid pid=$iid porder="0.0"><CDATA type="CDATA" iid=$pcdataiid pid=$attiid porder="0.0">

$attribute</CDATA>

</$atttag>},FOR $elem IN $root/*LET $pos := $pos + 1,

$pcd := $root/text()[position()=$pos]RETURN{

IF (TRIM($pcd)="")THEN{

""}ELSE{

LET $pcdataiid := getiid()RETURN{

<PCDATA type="PCDATA" iid=$pcdataiid pid=$iid porder=$pcdataiid>TRIM($pcd)

</PCDATA>}

},LET $porder1 := getiid()RETURN{

Q1($elem, $iid)}

},LET $pos := $pos + 1,

$pcd := $root/text()[position()=$pos]RETURN{

IF (TRIM($pcd)="")THEN{

""}ELSE{

LET $pcdataiid := getiid()RETURN{

<PCDATA type="PCDATA" iid=$pcdataiid pid=$iid porder=$pcdataiid>TRIM($pcd)

</PCDATA>}

77

Page 90: Order-sensitive XML Query Processing Over Relational Sources

}}

</$maintag>}

}

Q1(document("temp/source.xml"), 0)

78

Page 91: Order-sensitive XML Query Processing Over Relational Sources

A.2.6 Lexicographical Inline Loading Step0FUNCTION Q0($root, $pid, $paorder){

LET $maintag := gettag($root),$iid := getiid(),$pos := 0,$porder1 := "aa",$fake := LEXICOGRAPHICALORDER("reset"),$corder := "aa"

RETURN{

<$maintag type="ELEM" iid=$iid pid=$pid porder=$paorder>FOR $attribute IN $root/@*LET $atttag := gettag($attribute),

$attiid := getiid(),$pcdataiid := getiid()

RETURN{

<$atttag type="ATT" iid=$attiid pid=$iid porder="0.0"><CDATA type="CDATA" iid=$pcdataiid pid=$attiid porder="0.0">

$attribute</CDATA>

</$atttag>},FOR $elem IN $root/*LET $pos := $pos + 1,

$pcd := $root/text()[position()=$pos],$porder1 := LEXICOGRAPHICALORDER(count($root/../*)),$corder := CONCAT($paorder, ".", $porder1)

RETURN{

IF (TRIM($pcd)="")THEN{

""}ELSE{

LET $pcdataiid := getiid()RETURN{

<PCDATA type="PCDATA" iid=$pcdataiid pid=$iid porder=$corder>TRIM($pcd)

</PCDATA>}

},RETURN{

Q0($elem, $iid, $corder)}

},LET $pos := $pos + 1,

$pcd := $root/text()[position()=$pos],$porder1 := LEXICOGRAPHICALORDER(count($root/../*)),$corder := CONCAT($paorder, ".", $porder1)

RETURN{

IF (TRIM($pcd)="")THEN{

""}ELSE{

LET $pcdataiid := getiid()

79

Page 92: Order-sensitive XML Query Processing Over Relational Sources

RETURN{

<PCDATA type="PCDATA" iid=$pcdataiid pid=$iid porder=$corder>TRIM($pcd)

</PCDATA>}

}}

</$maintag>}

}

Q0(document("temp/source.xml"), 0, "aa")

80

Page 93: Order-sensitive XML Query Processing Over Relational Sources

A.3 Step1This stepis the samefor eachloading regardlessof ordering

A.3.1 EdgeLoading Step1FUNCTION Q1($root){

LET $tag := gettag($root)RETURN{

IF ($root/PCDATA)THEN{

<EDGES.ROW><SOURCE>$root/@pid</SOURCE>,<POSITION>$root/@eorder</POSITION>,<NAME>$tag</NAME>,<TARGET>$root/PCDATA/text()</TARGET>

</EDGES.ROW>}ELSE{

IF ($root/CDATA)THEN{

<EDGES.ROW><SOURCE>$root/@pid</SOURCE>,<POSITION>$root/@eorder</POSITION>,<NAME>$tag</NAME>,<TARGET>$root/CDATA/text()</TARGET>

</EDGES.ROW>}ELSE{

<EDGES.ROW><SOURCE>$root/@pid</SOURCE>,<POSITION>$root/@eorder</POSITION>,<NAME>$tag</NAME>,<TARGET>$root/@iid</TARGET>

</EDGES.ROW>,FOR $child IN $root/*[./@type="ELEM" OR ./@type="ATT"]RETURN{

Q1($child)}

}}

}}

<DB xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:noNamespaceSchemaLocation="schema.xsd"><EDGES>

Q1({document("temp/result0.xml")})</EDGES>

</DB>

81

Page 94: Order-sensitive XML Query Processing Over Relational Sources

A.3.2 Inline Loading Step1FUNCTION Inlinable($root){

LET $elements := document("temp/source.xsd")//xs:element [./@name=$root AND NOT ./@maxOccurs!="1"]RETURN{

IF ( COUNT($elements)=0 )THEN{

FALSE}ELSE{

LET $choices := document("temp/source.xsd")//xs:choice [./xs:element/@name=$root]UNION document("temp/source.xsd")/xs:element [./@name=$root]

RETURN{

IF ( COUNT($choices)=0 )THEN{

TRUE}ELSE{

FALSE}

}}

}}

FUNCTION PCDATAInline($root){

LET $choices := document("temp/source.xsd")//xs:element[./@name=$rootAND NOT ./@ref]/xs:complexType/xs:choice

RETURN{

IF ( COUNT($choices)=0 )THEN{

TRUE}ELSE{

FALSE}

}}

FUNCTION Q1($root, $parentinlinable, $PCDATAInline){

LET $tag := gettag($root)RETURN{

IF ($root/@type="ELEM")THEN{

<$tag INLINABLE=Inlinable($tag) $root/@*>FOR $child IN $root/*

RETURN{

Q1($child, Inlinable($tag), PCDATAInline($tag))}

</$tag>

82

Page 95: Order-sensitive XML Query Processing Over Relational Sources

}ELSE{

IF ($root/@type="ATT")THEN{

LET $tag := CONCAT($tag, "_ATT")RETURN{

<$tag INLINABLE="TRUE" $root/@*>$root/*[1]/text()

</$tag>}

}ELSE{

IF ($parentinlinable="TRUE" and $PCDATAInline="TRUE")THEN{

$root/text()}ELSE{

<$tag INLINABLE=$PCDATAInline $root/@*>$root/text()

</$tag>

}}

}}

}

<DB>Q1({document("temp/result0.xml")}, "FALSE", "TRUE")

</DB>

83

Page 96: Order-sensitive XML Query Processing Over Relational Sources

A.4 Step2This stepis the samefor eachloading regardlessof ordering

A.4.1 EdgeLoading Step2Thereis no secondstepfor theEdgemapping.

A.4.2 Inline Loading Step2FUNCTION Q2($root, $path){

RETURN{

LET $tag := CONCAT($path, "_", gettag($root)),$elements := $root/*[(./@type="ELEM" OR ./@type="PCDATA") AND ./@INLINABLE="TRUE"]

RETURN{

IF (COUNT($elements) = 0)THEN{

IF (TRIM($root/text())="")THEN{

LET $iidtag := CONCAT($tag, "_IID")RETURN{

<$iidtag>$root/@iid</$iidtag>}

}ELSE{

<$tag>$root/text()</$tag>}

}ELSE{

FOR $elem IN $elementsRETURN{

Q2($elem, $tag)}

}},FOR $attchild IN $root/*[./@type="ATT"]LET $atttag := CONCAT($path, "_", gettag($root), "_", gettag($attchild))RETURN{

<$atttag>$attchild/text()</$atttag>}

}}

<DB xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:noNamespaceSchemaLocation="schema.xsd">FOR $tag IN DISTINCT (FOR $xxx IN document("temp/result1.xml")//*[./@INLINABLE="FALSE"]

RETURN{<T>gettag($xxx)</T>})LET $tables := document("temp/result1.xml")//*[name(.)=TRIM($tag/text()) AND ./@INLINABLE="FALSE"],

$tag := $tag/text()RETURN{

<$tag>FOR $table IN $tablesLET $tabletag := CONCAT($tag, ".ROW")RETURN

84

Page 97: Order-sensitive XML Query Processing Over Relational Sources

{<$tabletag>

<IID>$table/@iid</IID>,<PID>$table/@pid</PID>,<POSITION>$table/@porder</POSITION>,IF ($tag="PCDATA")THEN{

<VALUE>$table/text()</VALUE>}ELSE{

FOR $attchild IN $table/*[./@type="ATT"]LET $atttag := CONCAT($tag, "_", gettag($attchild))RETURN{

<$atttag>$attchild/text()</$atttag>},FOR $elem IN $table/*[(./@type="ELEM" OR ./@type="PCDATA") AND ./@INLINABLE="TRUE"]RETURN{

Q2($elem, $tag)}

}</$tabletag>

}</$tag>

}</DB>

85

Page 98: Order-sensitive XML Query Processing Over Relational Sources

Appendix B

View Queries

B.0.3 EdgeLoading<RECORDLIST>FOR $playMap in document("temp/dxvfe.xml")/EDGES/EDGES.ROW[

NAME/text()="short_play"]RETURN<SHORT_PLAY pos=$playMap/POSITION/text()>FOR $songMap in document("temp/dxvfe.xml")/EDGES/EDGES.ROW[

SOURCE = $playMap/TARGET and NAME/text()="song"]RETURN

<SONG pos=$songMap/POSITION/text()>$songMap/TARGET/text()</SONG>SORTBY (./@pos ASCENDING)

</SHORT_PLAY>SORTBY(./@pos)</RECORDLIST>

B.0.4 Inline Loading<RECORDLIST>FOR $playMap in document("temp/dxvli.xml")/SHORT_PLAY/SHORT_PLAY.ROWRETURN<SHORT_PLAY pos=$playMap/POSITION/text()>FOR $songMap in document("temp/dxvli.xml")/SONG/SONG.ROW[

PID/text() = $playMap/IID/text()]RETURN

<SONG pos=$songMap/POSITION/text()>$songMap/SONG_PCDATA/text()

</SONG>SORTBY (./@pos ASCENDING)

</SHORT_PLAY>SORTBY(./@pos)</RECORDLIST>

86

Page 99: Order-sensitive XML Query Processing Over Relational Sources

Appendix C

Recordlist XML Schema

<?xml version="1.0"?><xs:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"

elementFormDefault="qualified"><xs:element name="RECORDLIST">

<xs:complexType><xs:sequence>

<xs:element name="SHORT_PLAY" maxOccurs="unbounded"><xs:complexType>

<xs:sequence><xs:element name="BAND" type="xs:string"/><xs:element name="LABEL" type="xs:string"/><xs:element name="SONG" maxOccurs="6"

type="xs:string"/></xs:sequence>

</xs:complexType></xs:element>

</xs:sequence></xs:complexType>

</xs:element></xs:schema>

87

Page 100: Order-sensitive XML Query Processing Over Relational Sources

Appendix D

Default XML View

Recordlist DXV by Inline Global Strategy

<?xml version="1.0"?><DB xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:noNamespaceSchemaLocation="schema.xsd">

<band><band.ROW>

<IID>33.0

</IID><PID>31.0

</PID><POSITION>33.0

</POSITION><band_PCDATA>Project X

</band_PCDATA></band.ROW><band.ROW>

<IID>19.0

</IID><PID>17.0

</PID><POSITION>19.0

</POSITION><band_PCDATA>Misfits

</band_PCDATA></band.ROW><band.ROW>

<IID>5.0

</IID><PID>3.0

</PID><POSITION>5.0

</POSITION><band_PCDATA>Misfits

</band_PCDATA>

88

Page 101: Order-sensitive XML Query Processing Over Relational Sources

</band.ROW></band><label>

<label.ROW><IID>36.0

</IID><PID>31.0

</PID><POSITION>36.0

</POSITION><label_PCDATA>Schism

</label_PCDATA></label.ROW><label.ROW>

<IID>22.0

</IID><PID>17.0

</PID><POSITION>22.0

</POSITION><label_PCDATA>Plan 9

</label_PCDATA></label.ROW><label.ROW>

<IID>8.0

</IID><PID>3.0

</PID><POSITION>8.0

</POSITION><label_PCDATA>blank

</label_PCDATA></label.ROW>

</label><song>

<song.ROW><IID>39.0

</IID><PID>31.0

</PID><POSITION>39.0

</POSITION><song_PCDATA>SXE Revenge

</song_PCDATA></song.ROW><song.ROW>

<IID>42.0

</IID>

89

Page 102: Order-sensitive XML Query Processing Over Relational Sources

<PID>31.0

</PID><POSITION>42.0

</POSITION><song_PCDATA>Shutdown

</song_PCDATA></song.ROW><song.ROW>

<IID>25.0

</IID><PID>17.0

</PID><POSITION>25.0

</POSITION><song_PCDATA>Bullet

</song_PCDATA></song.ROW><song.ROW>

<IID>28.0

</IID><PID>17.0

</PID><POSITION>28.0

</POSITION><song_PCDATA>We Are 138

</song_PCDATA></song.ROW><song.ROW>

<IID>11.0

</IID><PID>3.0

</PID><POSITION>11.0

</POSITION><song_PCDATA>Cough Cool

</song_PCDATA></song.ROW><song.ROW>

<IID>14.0

</IID><PID>3.0

</PID><POSITION>14.0

</POSITION><song_PCDATA>She

</song_PCDATA>

90

Page 103: Order-sensitive XML Query Processing Over Relational Sources

</song.ROW></song><short_play>

<short_play.ROW><IID>3.0

</IID><PID>1.0

</PID><POSITION>3.0

</POSITION></short_play.ROW><short_play.ROW>

<IID>17.0

</IID><PID>1.0

</PID><POSITION>17.0

</POSITION></short_play.ROW><short_play.ROW>

<IID>31.0

</IID><PID>1.0

</PID><POSITION>31.0

</POSITION></short_play.ROW>

</short_play><recordlist>

<recordlist.ROW><IID>1.0

</IID><PID>0.0

</PID><POSITION>1.0

</POSITION></recordlist.ROW>

</recordlist></DB><!-- end of document -->

91