Projecting XML Documents


Post on 07-Jan-2016


DESCRIPTION

Projecting XML Documents. Amélie Marian, Columbia University; Jérôme Simeon, Bell Laboratories. Motivation. XQuery used in very different environments: XQuery implementations on XML stored in databases (with indexes).

TRANSCRIPT

  • Projecting XML Documents
    Amélie Marian, Columbia University
    Jérôme Simeon, Bell Laboratories

  • Motivation
    XQuery is used in very different environments:
    - XQuery implementations on XML stored in databases (with indexes).
    - Main-memory XQuery implementations on XML in files, sent as streams, or computed on the fly.
    Example applications:
    - Web Services (e.g., ActiveXML).
    - Telecommunication apps (XML messages).
    - XML documents.
    - Information integration.

  • Memory Limitations
    - Main-memory XQuery implementations cannot handle large documents.
    - Complex XQuery expressions require materialization (DOM).
    - DOM is the bottleneck.
    XMark Query 1 on an IBM T23 laptop (256 MB RAM)

  • Projection: Example
    Text content of the document fragment:
        ... Wagar Bougaut mailto:[email protected] Waheed Rando mailto:[email protected] 32 Mallela St Tucson United States 377486 5185 1962 7735 ...
    XMark Query 1:
        for $b in /site/people/person[@id="person0"]
        return $b/name
    The projected document is less than 2% of the original document!

  • Projection: Intuition
    Given a query:
        for $b in /site/people/person[@id="person0"]
        return $b/name
    - Most nodes in the input document(s) are not required.
    - The projection operation removes unnecessary nodes.
    - Evaluating the query on the projected document yields the same result as on the original document.
    How it works:
    - Projection is defined by a set of paths.
    - Static analysis infers the set of paths used within a query.
    Projection paths for the query above:
        /site/people/person
        /site/people/person/@id
        /site/people/person/name
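The intuition above can be tried on a toy document. The following is a minimal sketch, not the Galax implementation: it assumes child-axis paths only, uses Python's `xml.etree.ElementTree` on an already loaded tree, and the helper names (`parse_path`, `project`, `query`) and the sample document are made up for illustration. The `name` path carries the `#` marker (keep the subtree) so that the text inside `name` survives projection.

```python
import xml.etree.ElementTree as ET
from copy import deepcopy

def parse_path(path):
    """'/a/b/c#' -> (['a', 'b', 'c'], True); '#' means 'keep the subtree'."""
    return path.rstrip("#").strip("/").split("/"), path.endswith("#")

def project(elem, paths, depth=0):
    """Return a pruned copy of elem, or None if no projection path reaches it."""
    # Paths that still match at this depth (earlier steps matched in the callers).
    live = [(s, sub) for s, sub in paths if len(s) > depth and s[depth] == elem.tag]
    if not live:
        return None
    ends_here = [sub for s, sub in live if len(s) == depth + 1]
    if any(ends_here):
        return deepcopy(elem)            # a '#' path ends here: keep everything below
    kept = ET.Element(elem.tag)
    for s, _ in live:                    # keep attributes named by a final '@attr' step
        if len(s) == depth + 2 and s[depth + 1].startswith("@"):
            name = s[depth + 1][1:]
            if name in elem.attrib:
                kept.set(name, elem.attrib[name])
    for child in elem:
        sub = project(child, live, depth + 1)
        if sub is not None:
            kept.append(sub)
    if ends_here or kept.attrib or len(kept):
        return kept
    return None

def query(root):
    """Toy stand-in for XMark Query 1, evaluated from the <site> root."""
    return [n.text for n in root.findall("./people/person[@id='person0']/name")]

DOC = """<site>
  <regions><africa><item id="item0"><name>Ball</name></item></africa></regions>
  <people>
    <person id="person0"><name>Wagar Bougaut</name><city>Tucson</city></person>
    <person id="person1"><name>Waheed Rando</name></person>
  </people>
</site>"""

original = ET.fromstring(DOC)
paths = [parse_path(p) for p in ("/site/people/person/@id", "/site/people/person/name#")]
projected = project(original, paths)

print(query(original), query(projected))
print(ET.tostring(projected, encoding="unicode"))
```

Both calls to `query` return the same list, while the `regions` subtree and the `city` elements are absent from the projected tree.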

  • Projection: Challenges
    For an XQuery expression, compute all paths needed to reach the nodes required to evaluate the expression.
    XQuery is complex:
    - Variables
    - Composition
    - Syntactic sugar
    - Complex expressions
    The analysis has to cover all of XQuery.

  • Contributions
    - Definition of a notion of projection for XML documents, based on path expressions.
    - Static analysis algorithm for arbitrary XQuery expressions, used to infer projection paths.
    - Loading algorithm to build the projected XML document.
    - Integration of XML projection in an XQuery processor (Galax).
    - Experimental evaluation of projection on XML query processing.

  • XML Projection
    Similar to relational projection:
    - One key operation.
    - Prunes unnecessary parts of the data.
    - Essential for memory management.
    Specific problems related to XML:
    - Projection must operate on trees.
    - Requires analysis of the query.
    - Need to address XQuery complexity.

  • Notation
    Projection paths:
    - Path expressions are written in XPath notation, e.g. /site/people/person/@id
    - The # notation is used when the whole subtree should be kept, e.g. /site/people/person/name#
    Static analysis uses inference rule notation:
        Expr => Paths

  • Static Analysis: Variables
    Variables can be bound to nodes coming from different paths:
        for $b in /site/people/(teacher | student)
        return $b/name
    The analysis must remember the paths to which a variable was bound:
        /site/people/teacher
        /site/people/student
    An environment is maintained during path analysis:
        Env |- Expr => Paths

  • Static Analysis: Example
    Literals do not require any paths:
        32 => {}
    Paths are propagated in a sequence:
        /a/b => {/a/b}
        /a/d => {/a/d}
        /a/b, /a/d => {/a/b, /a/d}
    The static analysis algorithm is correct (see Technical Report).

  • Static Analysis: Composition
        (if (count(/site/regions/*) = 3)
         then /site/people/person
         else /site/open_auctions/open_auction)/@id
    The step /@id does not apply to /site/regions/*.
    The final set of paths should be:
        /site/regions/*
        /site/people/person/@id
        /site/open_auctions/open_auction/@id
    Two sets of paths need to be differentiated during the analysis:
    - Returned Paths: returned by the expression; further path steps are applied to them.
    - Used Paths: used to compute the expression.

        Env |- Expr => Paths using UPaths
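The judgment `Env |- Expr => Paths using UPaths` can be illustrated on a tiny expression language. This is a simplified sketch under stated assumptions, not the rule set from the paper or technical report: expressions are encoded as hypothetical Python tuples (`lit`, `root`, `step`, `seq`, `call`, `if`, `for`, `var`), only child steps are handled, and the `for` rule used here simply adds the paths of the iterated expression to the used paths.

```python
def path(p):
    """Build nested child steps from a string such as '/site/people/person'."""
    expr = ("root",)
    for name in p.strip("/").split("/"):
        expr = ("step", expr, name)
    return expr

def analyze(expr, env=None):
    """Return (returned_paths, used_paths) for expr under the environment env."""
    env = env or {}
    kind = expr[0]
    if kind == "lit":                       # literals need no paths
        return set(), set()
    if kind == "root":                      # '/' is the empty path prefix
        return {""}, set()
    if kind == "var":                       # look up the paths the variable is bound to
        return set(env[expr[1]]), set()
    if kind == "step":                      # a step extends the returned paths only
        ret, used = analyze(expr[1], env)
        return {p + "/" + expr[2] for p in ret}, used
    if kind == "seq":                       # sequences propagate both sets
        r1, u1 = analyze(expr[1], env)
        r2, u2 = analyze(expr[2], env)
        return r1 | r2, u1 | u2
    if kind == "call":                      # ("call", name, *args): arguments are used
        used = set()
        for arg in expr[2:]:
            r, u = analyze(arg, env)
            used |= r | u
        return set(), used
    if kind == "if":                        # the condition is used, branches are returned
        rc, uc = analyze(expr[1], env)
        rt, ut = analyze(expr[2], env)
        re_, ue = analyze(expr[3], env)
        return rt | re_, rc | uc | ut | ue
    if kind == "for":                       # ("for", var, source, body)
        _, var, source, body = expr
        rs, us = analyze(source, env)
        rb, ub = analyze(body, {**env, var: rs})
        return rb, rs | us | ub
    raise ValueError(f"unknown expression kind: {kind}")

# (if (count(/site/regions/*) = 3) then ... else ...)/@id
composition = ("step",
               ("if",
                ("call", "=", ("call", "count", path("/site/regions/*")), ("lit", 3)),
                path("/site/people/person"),
                path("/site/open_auctions/open_auction")),
               "@id")
returned, used = analyze(composition)
print(sorted(returned | used))

# for $b in /site/people/(teacher | student) return $b/name
loop = ("for", "b",
        ("seq", path("/site/people/teacher"), path("/site/people/student")),
        ("step", ("var", "b"), "name"))
loop_returned, loop_used = analyze(loop)
print(sorted(loop_returned), sorted(loop_used))
```

Keeping the two sets apart is what stops `/@id` from being appended to `/site/regions/*`: that path only ever appears among the used paths, and steps extend returned paths only.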

  • XQuery Core
    Subset of XQuery:
    - Reduced grammar.
    - All operations are made explicit.
    - Same expressive power as XQuery.
    - Removes syntactic sugar.
    - Simplifies complex expressions.
    The analysis only needs to be applied to a small set of expressions.
    XQuery:
        for $b in /site/people/person
        return $b/name
    XQuery Core:
        for . in /
        return for . in child::site
        return for . in child::people
        return for . in child::person
        return child::name

  • Optimized Projection
    XQuery Core decomposition may lead to redundant paths being kept:
        /site
        /site/people
        /site/people/person
        /site/people/person/name#
    Optimizing the inference rules avoids this redundancy.
    Details are in the paper; the full optimized analysis is in the technical report.

  • XQuery Processing Architecture
    [Architecture diagram; label: Document Data Model]

  • Loading Algorithm: Description
    Input:
    - Set of projection paths.
    - Document SAX events.
    Decide on the action to apply to each document node:
    - Skip: ignore the node and its subtree.
    - KeepSubtree: keep the node and its subtree.
    - Keep: keep the node without its subtree.
    - Move: keep processing SAX events. The current node is kept only if some of its children are kept.
    A set of current paths is maintained during loading.

  • Loading Algorithm: Example
    Projection paths:
        /a/b/c#
        /a/d
    [Animated figure. Labels: Document Stream; Current Paths (/a/b/c#, /a/d, /b/c#, /d, /c#); Action (Move, Skip, Keep Subtree, Keep); Loaded Nodes (a, b, c, f, d).]
    Similar to XML filtering algorithms.
    Limitations:
    - Backward axes!
    - The number of current paths can be huge (descendant axis).
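The four loading actions can be sketched with Python's `xml.sax`. This is an illustrative sketch under simplifying assumptions, not the Galax loader: paths use the child axis only (no wildcards, attribute steps, or descendant axis), text is kept only inside KeepSubtree regions, and the class name `ProjectingLoader`, the function `load_projected`, and the sample document are hypothetical.

```python
import xml.sax
import xml.etree.ElementTree as ET

class ProjectingLoader(xml.sax.ContentHandler):
    def __init__(self, paths):
        super().__init__()
        # Each projection path becomes (list of steps, keep_subtree flag).
        self.paths = [(p.rstrip("#").strip("/").split("/"), p.endswith("#")) for p in paths]
        self.names = []     # element names from the root to the current node
        self.stack = []     # [element, kept] for every open, non-skipped node
        self.skip = 0       # > 0 while inside a skipped subtree
        self.subtree = 0    # > 0 while inside a KeepSubtree region
        self.root = None
        self.actions = []   # (path, action) log, for illustration only

    def action(self):
        here = self.names
        exact = [sub for steps, sub in self.paths if steps == here]
        if any(exact):
            return "KeepSubtree"
        if exact:
            return "Keep"
        if any(steps[:len(here)] == here for steps, _ in self.paths):
            return "Move"
        return "Skip"

    def startElement(self, name, attrs):
        if self.skip:
            self.skip += 1
            return
        self.names.append(name)
        if self.subtree:                      # everything below a '#' node is kept
            self.subtree += 1
            self.stack.append([ET.Element(name, dict(attrs.items())), True])
            return
        act = self.action()
        self.actions.append(("/" + "/".join(self.names), act))
        if act == "Skip":
            self.names.pop()
            self.skip = 1
            return
        if act == "KeepSubtree":
            self.subtree = 1
            elem = ET.Element(name, dict(attrs.items()))
        else:
            elem = ET.Element(name)
        # A Move node starts as "not kept" and is kept once a descendant is kept.
        self.stack.append([elem, act != "Move"])

    def characters(self, content):
        if self.subtree:
            elem = self.stack[-1][0]
            if len(elem):                     # text after a child is that child's tail
                elem[-1].tail = (elem[-1].tail or "") + content
            else:
                elem.text = (elem.text or "") + content

    def endElement(self, name):
        if self.skip:
            self.skip -= 1
            return
        self.names.pop()
        if self.subtree:
            self.subtree -= 1
        elem, kept = self.stack.pop()
        if not kept:
            return
        if self.stack:
            self.stack[-1][0].append(elem)
            self.stack[-1][1] = True
        else:
            self.root = elem

def load_projected(xml_text, paths):
    handler = ProjectingLoader(paths)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.root, handler.actions

root, actions = load_projected(
    "<a><b><c><f>text</f></c><e/></b><d><g/></d><h/></a>",
    ["/a/b/c#", "/a/d"],
)
print([e.tag for e in root.iter()])
for step in actions:
    print(step)
```

Nodes are attached to their parent only when their end tag is seen, which is one simple way to realise the rule that a Move node survives only if something below it was kept.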

  • Experiments: Settings
    XML projection evaluation:
    - Effectiveness: impact of projection on different queries.
    - Maximum document size: largest document that can be processed.
    - Processing time: effect on processing time.
    Experimental setup:
    - Default XMark document size: 50 MB.

  • Experiments: EffectivenessAll queries but one require less than 5% of the document.

  • Experiments: Maximal Document Size

    Configuration                                               A        B        C
    XMark Query 3 (simple selection with predicate)
      No Projection                                             33 MB    220 MB   520 MB
      Optimized Projection                                      1 GB     1.5 GB   1.5 GB
    XMark Query 14 (non-selective path query with predicates)
      No Projection                                             20 MB    20 MB    20 MB
      Optimized Projection                                      100 MB   100 MB   100 MB
    XMark Query 15 (long, very selective path query)
      No Projection                                             33 MB    220 MB   520 MB
      Optimized Projection                                      1 GB     2 GB     2 GB

  • Experiments: Query Execution Time
    - Projection significantly reduces query processing time.
    - Next bottleneck: joins!

    [Chart: Total Query Execution Time (in seconds) for No Projection, Projection, and Optimized Projection]

    [Charts and underlying spreadsheet data for XMark Queries 1 to 20 on the default, 10M, 20M, and 50M documents, comparing No Projection (Default), Projection, and Optimized Projection. Measured columns: Size, Time, Memory, ParseTime, Nodes. Chart titles: Size of projected document as a percentage of the size of the original document; Memory usage as a percentage of the memory used for the original document; Total Query Execution Time (in seconds); Parsing + Loading time (in seconds).]

    Total Query Execution Time (in seconds), 50M document

    Query       No Projection   Projection   Optimized Projection
    Query 1     227.91          190.26       189.73
    Query 2     61.18           24.94        24.04
    Query 3     65.6            29.64        28.94
    Query 4     60.89           26.26        25.95
    Query 5     59.02           11.76        13.9
    Query 6     43.25           131.16       25.88
    Query 7     43.05           118.11       67.99
    Query 13    60.01           9.73         9.8
    Query 15    59.66           14.39        8.99
    Query 16    58.8            15.38        12.01
    Query 17    59.41           23.12        22.3
    Query 18    60.29           17.19        16.9
    Query 19    71.48           65.94        28.45
    Query 20    63.39           25.16        25.11

    Query       No Projection   Projection   Optimized Projection
    Query 8     4784.13         3963.79      3939.89
    Query 9     5587.04         4898.31      4896.45
    Query 10    667.74          609.5        612.85
    Query 11    6781.09         5034.28      4999.27
    Query 12    6804.63         4962.16      4900.78
    Query 14    0               0            2106.67


    DOM vs. SAX loading

    Document   Text size (KB)   Live memory after load (KB)   Memory as % of text size   Parsing + loading time (s)   Total eval time (s)
                                DOM           SAX             DOM        SAX             DOM       SAX                DOM       SAX
    Default    570.761          2717.43       2238.824        476.11     392.25          0.31      0.26               0.94      0.73
    10M        10546.923        45003.703     36024.328       426.70     341.56          6.23      4.34               20.59     16.14
    20M        20935.385        89249.758     71360.695       426.31     340.86          12.89     9.48               54.24     47.58
    50M        52465.82         222465.039    178098.301      424.02     339.46          34.99     27.26              248.45    227.91

    Nodes created

    Document   DOM       SAX       Projection (avg)   Optimized (avg)   Projection (% of original)   Optimized (% of original)
    Default    16435     16435     3659.7             1038.05           22.27                        6.32
    10M        292547    292547    67409.7            20966.2           23.04                        7.17
    20M        581797    581797    133386.3           41041.05          22.93                        7.05
    50M        1454033   1454033   333624.8           102824.3          22.94                        7.07

    00

    00

    00

    00

    DOM

    SAX

    Document size as text file

    Percentage of text document size

    Memory requirement for XML files in XQuery implementation as a percentage of the XML text document size

    00

    00

    00

    00

    DOM

    SAX

    Time after document load (query 1)

    00

    00

    00

    00

    DOM

    SAX

    Total Evaluation time

    SizeTimeMemoryParseTimeNodes

    DefaultProjectionOptimizedDefaultProjectionOptimizedDefaultProjectionOptimizedDefaultProjectionOptimizedDefaultProjectionOptimized

    ProjectionOptimizedProjectionOptimizedProjectionOptimizedProjectionOptimized

    500Kb57076181261.42371325381261.4237132530.740.270.252238.773299.83613.392871899299.72313.38782449140.250.13520.1248164355113.1092181325113.109218132

    10Mb116697051655391.41853628691655391.418536286918.88.939.0539885.4571314.573.29586295071314.4573.29557963955.012.3847.504990022.3446.7065868263324272102033.1464326245102033.1464326245

    20Mb235144503318491.41125563223318491.411255632254.6734.8434.879845.072382.1482.98346284872382.0352.983321324710.035.1251.04685942175.0550.3489531406650333204033.1373158059204033.1373158059

    50Mb580057328343571.43840439778343571.4384043977279.76229.63228.72197078.6415585.822.83431018795585.7072.834252850430.7215.6250.846354166715.3950.097656251609991510033.1679059075510033.1679059075


    [Charts: per-query results for Queries 1 to 20 on the default (about 570 KB), 10M, 20M and 50M documents, comparing Default (No Projection), Projection and Optimized Projection. Panels: size of the projected document as a percentage of the size of the original document; memory usage as a percentage of the memory used for the original document; total query execution time (in seconds); parsing + loading time (in seconds).]

    Total Query Execution Time (in seconds), 50M document

    Query       No Projection    Projection    Optimized Projection
    Query 1            227.91        190.26                  189.73
    Query 2             61.18         24.94                   24.04
    Query 3             65.6          29.64                   28.94
    Query 4             60.89         26.26                   25.95
    Query 5             59.02         11.76                   13.9
    Query 6             43.25        131.16                   25.88
    Query 7             43.05        118.11                   67.99
    Query 13            60.01          9.73                    9.8
    Query 15            59.66         14.39                    8.99
    Query 16            58.8          15.38                   12.01
    Query 17            59.41         23.12                   22.3
    Query 18            60.29         17.19                   16.9
    Query 19            71.48         65.94                   28.45
    Query 20            63.39         25.16                   25.11

    Query       No Projection    Projection    Optimized Projection
    Query 8           4784.13       3963.79                 3939.89
    Query 9           5587.04       4898.31                 4896.45
    Query 10           667.74        609.5                   612.85
    Query 11          6781.09       5034.28                 4999.27
    Query 12          6804.63       4962.16                 4900.78
    Query 14             0             0                    2106.67


    Live memory after load (DOM vs. SAX)

    Document    Text size    DOM           SAX           DOM (% of text)    SAX (% of text)
    Default       570.761      2717.43       2238.824             476.11             392.25
    10M         10546.923     45003.703     36024.328             426.70             341.56
    20M         20935.385     89249.758     71360.695             426.31             340.86
    50M         52465.82     222465.039    178098.301             424.02             339.46

    Parsing + loading time and total evaluation time (in seconds)

    Document    Parse+load DOM    Parse+load SAX    Total eval DOM    Total eval SAX
    Default               0.31              0.26              0.94              0.73
    10M                   6.23              4.34             20.59             16.14
    20M                  12.89              9.48             54.24             47.58
    50M                  34.99             27.26            248.45            227.91

    Nodes created

    Document    DOM        SAX        Proj (avg)    Optim (avg)    Proj (% of DOM)    Optim (% of DOM)
    Default       16435      16435        3659.7        1038.05              22.27                6.32
    10M          292547     292547       67409.7       20966.2               23.04                7.17
    20M          581797     581797      133386.3       41041.05              22.93                7.05
    50M         1454033    1454033      333624.8      102824.3               22.94                7.07
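As a quick check on the load-memory figures, each percentage is the live memory after loading divided by the size of the document as a text file. The following is a minimal Python sketch that recomputes those ratios from the values listed above; the variable and function names are illustrative and not taken from the slides.

```python
# Values copied from the load-memory figures (same unit in all three columns).
text_size = {"Default": 570.761, "10M": 10546.923, "20M": 20935.385, "50M": 52465.82}
dom_memory = {"Default": 2717.43, "10M": 45003.703, "20M": 89249.758, "50M": 222465.039}
sax_memory = {"Default": 2238.824, "10M": 36024.328, "20M": 71360.695, "50M": 178098.301}


def percent_of(value, reference):
    """Express value as a percentage of reference."""
    return 100.0 * value / reference


for doc in text_size:
    dom_pct = percent_of(dom_memory[doc], text_size[doc])
    sax_pct = percent_of(sax_memory[doc], text_size[doc])
    print(f"{doc:8s} DOM {dom_pct:6.1f}%  SAX {sax_pct:6.1f}%")
```

Both loaders need several times the text size in memory, which is the overhead that projection is meant to reduce.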

    [Charts, DOM vs. SAX loading: memory requirement for XML files in the XQuery implementation as a percentage of the XML text document size (x-axis: document size as text file); time after document load (Query 1); total evaluation time.]


  • Improvements. Complete XQuery implementation with projection available in Galax. Demo at VLDB 2003. Galax uses a more recent, pure streaming algorithm for applying projection to a document: better performance, and it can be used as a stand-alone operation, without loading.
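To make the idea of a pure streaming projection concrete, here is a minimal Python sketch built on the standard `xml.sax` module. It is not the Galax algorithm: it assumes projection paths are simple child-step paths given as tuples of names, that a matched element path keeps its whole subtree, and that a final step written `"@name"` keeps that attribute on its parent element. The class `ProjectionFilter` and the helper `project` are hypothetical names introduced for this illustration.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class ProjectionFilter(xml.sax.ContentHandler):
    """Forward only the SAX events needed by a set of projection paths."""

    def __init__(self, paths, out):
        super().__init__()
        self.paths = {tuple(p) for p in paths}
        self.out = XMLGenerator(out)
        self.stack = []   # names of the currently open elements
        self.kept = []    # parallel stack: 'subtree', 'prefix' or 'skip'

    def _state(self):
        here = tuple(self.stack)
        if any(here[:len(p)] == p for p in self.paths):
            return "subtree"   # at or below a projected element
        if any(p[:len(here)] == here for p in self.paths):
            return "prefix"    # ancestor needed to reach a projected node
        return "skip"

    def startElement(self, name, attrs):
        self.stack.append(name)
        state = self._state()
        self.kept.append(state)
        if state == "subtree":
            self.out.startElement(name, attrs)
        elif state == "prefix":
            # Ancestors keep only the attributes named by a projection path.
            here = tuple(self.stack)
            wanted = {k: v for k, v in attrs.items()
                      if here + ("@" + k,) in self.paths}
            self.out.startElement(name, wanted)

    def endElement(self, name):
        if self.kept.pop() != "skip":
            self.out.endElement(name)
        self.stack.pop()

    def characters(self, content):
        # Text is kept only inside fully projected subtrees.
        if self.kept and self.kept[-1] == "subtree":
            self.out.characters(content)


def project(xml_text, paths):
    """Return the projected document as a string, without building a tree."""
    out = io.StringIO()
    xml.sax.parseString(xml_text.encode("utf-8"), ProjectionFilter(paths, out))
    return out.getvalue()


doc = ("<site><people><person id='person0'><name>Wagar</name>"
       "<emailaddress>x</emailaddress></person></people>"
       "<regions><item/></regions></site>")
paths = [("site", "people", "person", "@id"),
         ("site", "people", "person", "name")]
print(project(doc, paths))
```

Because the filter only tracks the stack of open element names, its memory use depends on document depth rather than document size, which is what allows projection to run as a stand-alone step before loading.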

  • Conclusion and Future Work. Main contributions: definition of a notion of projection for XML; static analysis to infer projection paths from any XQuery expression; full implementation in Galax. Experimental results: a dramatic increase in the size of the documents a main-memory XQuery processor can handle; projection also helps reduce query processing time. Future work: define a loading algorithm for backward axes; combine projection with other optimizations; push down query operations during projection (e.g., predicate evaluation).
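As a toy illustration of what inferring projection paths means, the Python sketch below resolves variable-rooted paths against their `for` bindings for a single simple query shape. It is far simpler than the static analysis in the talk, which covers all of XQuery; the function `infer_paths` and its input format (a list of bindings plus a list of paths used in predicates and the return clause) are assumptions made for this example.

```python
def infer_paths(for_clauses, used_paths):
    """Return absolute paths needed by a simple for/return query.

    for_clauses: list of (variable, path) bindings, in order.
    used_paths:  paths used in predicates or the return clause; each is
                 rooted at "/" or at a previously bound variable.
    """
    env = {}
    result = []

    def resolve(path):
        head, _, rest = path.partition("/")
        if head.startswith("$"):
            base = env[head]  # absolute path the variable ranges over
            return base + ("/" + rest if rest else "")
        return path

    for var, path in for_clauses:
        absolute = resolve(path)
        env[var] = absolute
        if absolute not in result:
            result.append(absolute)

    for path in used_paths:
        absolute = resolve(path)
        if absolute not in result:
            result.append(absolute)
    return result


# for $b in /site/people/person[@id="person0"] return $b/name
needed = infer_paths([("$b", "/site/people/person")], ["$b/@id", "$b/name"])
for p in needed:
    print(p)
```

For XMark Query 1 this yields the three paths shown earlier in the talk: the `person` elements, their `id` attribute, and their `name` child.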

  • Advertisement. XQuery from the Experts, just released by Addison Wesley. Ask Jerome for 20% discount flyers!