indexing and querying xml data for regular path expressions
DESCRIPTION
Indexing and Querying XML Data for Regular Path Expressions. Quanzhong Li and Bongki Moon Dept. of Computer Science University of Arizona VLDB 2001. Querying XML. XML has tree structured data model. Queries involve navigating data using regular path expressions.(e.g., XPath) - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/1.jpg)
Indexing and Querying XML Data for Regular Path
Expressions
Quanzhong Li and Bongki Moon Dept. of Computer Science
University of ArizonaVLDB 2001.
![Page 2: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/2.jpg)
Querying XML
XML has tree structured data model.Queries involve navigating data using regular path expressions.(e.g., XPath)e.g. /chapter/-*/figure[@caption=“Tree Frogs”] Accessing all elements with same name
string. Ancestor-descendant relationship between
elements.
![Page 3: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/3.jpg)
Contribution
New system for Indexing XML data.Querying XML data based on a numbering scheme for elementsJoin algorithms for processing complex regular path expressions.
![Page 4: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/4.jpg)
Outline
Numbering schemeIndex structureJoin algorithmsExperimental results
![Page 5: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/5.jpg)
Path expression evaluation
Previous approaches Conventional tree traversals
Disadvantage: Overhead of traversing for long or unknown path lengths.
New approach Indexing for efficient element access. Numbering scheme for ancestor-
descendant relationship.
![Page 6: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/6.jpg)
Dietz’s Numbering Scheme
for two given nodes x and y, x is an ancestor of y, if and only if x occurs before y
in the preorder traversal of T and
after y in postorder traversal.
(1,7)
(2,4)
(3,1) (4,2) (5,3)
(6,6)
(7,5)
![Page 7: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/7.jpg)
Proposed numbering scheme
This associates with each nodea pair of numbers <order,
size>as follows:
For a tree node y and its parent x,
order(x) < order(y) order(y)+size(y) =<
order(x) + size(x)For two sibling nodes x and y, if x is the predecessor of y in preorder traversal then
order(x) + size(x) < order(y)
(1,100)
(10,30)
(11,5) (17,5)(25,5)
(41,10)
(45,5)
![Page 8: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/8.jpg)
Advantages
Efficient Updates• Extra space can be reserved to
accommodate future insertions.
![Page 9: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/9.jpg)
Ancestor–descendant relationship
For two given nodes x and y of a tree T, x is an ancestor of y if and only if order(x) < order(y) =< order(x) +
size(x).
![Page 10: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/10.jpg)
Outline
Numbering schemeIndex structureJoin algorithmsExperimental results
![Page 11: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/11.jpg)
Index and Data Organization
XML Raw
Data
Document
Loader
Element
Index
Attribute
Index
Structure
Index
Name
Index
Value
Table
Paged File
Query
ProcessorQuery
XISS
Result
![Page 12: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/12.jpg)
Element Index
Element nid
Document ID list
Element list with the
Same name in the
Same Document
B+-tree<Order, Size>
Depth,
Parent ID
Element
Record
Element nid
B+-tree
![Page 13: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/13.jpg)
Structure Index
Document ID
(did)
Array of All Elements
And Attributes in the
Same Document
nid,
<order,size>,
Parent order,
Child order,
Sibling order,
Attribute order
B+-tree
![Page 14: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/14.jpg)
Outline
Numbering schemeIndex structureJoin algorithmsExperimental results
![Page 15: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/15.jpg)
Regular Path expression
complex regular path expressions. e.g.,
/chapter/_*/figure[@caption=“Tree Frogs”]
Symbol Function of symbol
__ Any single node
/ Union of node
* Zero or more occurrences of a node
@ Denotes attributes
![Page 16: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/16.jpg)
Regular expression Decomposition
A regular path expression can be decomposed to a combination of following basic subexpressions:
1. A subexpression with a single element or a single attribute,
2. A subexpression with an element and an attribute ( e.g., figure[@caption = “Tree Frogs”])
3. A subexpression with two elements (e.g., chapter/figure or chapter/_*/figure),
4. A subexpression with a Kleene closure (+,*) of another subexpression, and
5. A subexpression that is a union of two other subexpressions.
![Page 17: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/17.jpg)
Example ( E1 / E2 ) * / E3 / ( ( E4 [ @A = v ] ) | ( E5 / _* / E6 ) )
*
[ ]
E1 E2 E3 E4 @A=v E5 E6
/
/
/
/
/_*/EE-Join
KC-Join
EE-Join
EE-Join
Union
EA-Join EE-Join
![Page 18: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/18.jpg)
Join algorithms
Element – Attribute joinElement – Element joinKleene – Closure join
![Page 19: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/19.jpg)
EA-Join Algorithm
Input: {E1..Em}: Ei is a set of elements having a common
document identifier; {A1..An}: Aj is a set of attributes having a common
document identifier;Output:
A set of (e,a) pairs such that the element e is the parent of the attribute a.
//Sort-merge {Ei} and {Aj} by document identifier.For each Ei and Aj with the same did do
//Sort-merge Ei and Aj by PARENT-CHILD relationship.For each e in Ei and a in Aj do
If ( e is a parent of a) then output (e,a);End
End.
![Page 20: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/20.jpg)
Example
chapter chapter chapter appendix
Figure Figure Figure
book
![Page 21: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/21.jpg)
Attribute-element position
chapter <1,3>
chapter<2,1>
name <3,0>
name <4,0>
chapter <1,3>
name<2,0>
chapter <3,1>
name <4,0>
![Page 22: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/22.jpg)
EE-Join Algorithm
Input: {E1..Em} and {F1..Fn}: Ei and Fj is a set of elements
having a common document identifier.Output:
A set of (e,f) pairs such that the element e is an ancestor of the element f.
//Sort-merge {Ei} and {Fj} by doc. identifier.For each Ei and Fj with the same did do
//Sort-merge Ei and Fj by ANCESTOR-DESCENDANT relationship.For each e in Ei and f in Fj do
If (e is an ancestor of f ) then output (e,f)End
End
![Page 23: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/23.jpg)
Extreme case of EE-Join
chapter <1,90>
chapter <2,80>
chapter <8,20>
chapter <9,10>
figure <10,0>
figure <11,0>
figure <19,0>
![Page 24: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/24.jpg)
KC-Join Algorithm
Input: {E1..Em}: where Ei is a group of elements from an
XML document.Output:
A Kleene Closure of {E1..Em}//Apply EE-Join algorithm repeatedly.Set x = 1;Set Ki = {E1..Em};Repeat
Set I = I +1;Set Ki = EE-Join(Ei-1, E1);
Until ( Ki is empty);Output union of K1,K2..Ki-1.
![Page 25: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/25.jpg)
Outline
Numbering schemeIndex structureJoin algorithmsExperimental results
![Page 26: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/26.jpg)
Experiment Results
Comparison with top-down and bottom-up evaluation methods.Comparison for EE-Join ( E1 /_*/ E2 ) EA-Join ( E[@A] )
Scalability test
![Page 27: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/27.jpg)
EE-Join performance
![Page 28: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/28.jpg)
EA-Join performance
![Page 29: Indexing and Querying XML Data for Regular Path Expressions](https://reader036.vdocuments.mx/reader036/viewer/2022081515/5681592d550346895dc65bbf/html5/thumbnails/29.jpg)
Results
EE-Join algorithm outperformed bottom-up.EA-Join algorithm is comparable with top-down but outperformed bottom-up.Both are linearly scalable.