on the use of query-driven xml auto-indexing€¦ · isp: 10 tables each having 8 columns and max 3...
TRANSCRIPT
![Page 1: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/1.jpg)
University of Kaiserslautern
On the Use of Query-driven XML Auto-Indexing
Karsten Schmidt and Theo Härder
SMDB'10 (ICDE), Long Beach – March, 1
![Page 2: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/2.jpg)
University of Kaiserslautern
Motivation – Self-Tuning '10● The last 10+ years
– Index tuning
What-if
Wizards, Guides, Druids
– Monitoring
– Workload analysis
– Static resources
– Single-user optimization
– …● Today: dynamic environments (queries, data, resources)
– Self-tuning has to be done Permanently and Online1
![Page 3: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/3.jpg)
University of Kaiserslautern
Motivation – Index Tuning● Which column(s) to index?
2
ID Name Age DepID Location
1 Norman 28 12A North
2 Charles 58 98 West
![Page 4: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/4.jpg)
University of Kaiserslautern
Motivation – Index Tuning● Which column(s) to index?
2
PK?
ID Name Age DepID Location
1 Norman 28 12A North
2 Charles 58 98 West
![Page 5: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/5.jpg)
University of Kaiserslautern
Motivation – Index Tuning● Which column(s) to index?
2
ID Name Age DepID Location
1 Norman 28 12A North
2 Charles 58 98 West
query-supportingcandidate?
![Page 6: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/6.jpg)
University of Kaiserslautern
Motivation – Index Tuning● Which column(s) to index?
2
ID Name Age DepID Location
1 Norman 28 12A North
2 Charles 58 98 West
multiple candidates?
![Page 7: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/7.jpg)
University of Kaiserslautern
Motivation – Index Tuning● Which column(s) to index?
2
ID Name Age DepID Location
1 Norman 28 12A North
2 Charles 58 98 West
multi-column indexor
multiple indexes ?
![Page 8: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/8.jpg)
University of Kaiserslautern
ID Name Age DepID Location
1 Norman 28 12A North
2 Charles 58 98 West
ID Name Age DepID Location
1 Norman 28 12A North
2 Charles 58 98 West
Motivation – Index Tuning● Which column(s) to index?
2
ID Name Age DepID Location
1 Norman 28 12A North
2 Charles 58 98 West
ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes
redundancyand
maintenance!
![Page 9: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/9.jpg)
University of Kaiserslautern
ID Name Age DepID Location
1 Norman 28 12A North
2 Charles 58 98 West
ID Name Age DepID Location
1 Norman 28 12A North
2 Charles 58 98 West
Motivation – Index Tuning● Which column(s) to index?
2
ID Name Age DepID Location
1 Norman 28 12A North
2 Charles 58 98 West
ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes
● XML specifics on top of the ISP
– Semi-structured and schema changes
– Indexing may focus on structure and/or content
– Flexible path expressions (axes, wildcard, name)
→ containment problem and generalized indexes
→ descendant axis is evil, but wildcards too
– Node types (elements, attributes, text, …)
–
![Page 10: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/10.jpg)
University of Kaiserslautern
Outline
● XML Index Types● Statistics for Cost-based Index Selection● Index Configuration Management● Evaluation● Summary & Outlook
![Page 11: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/11.jpg)
University of Kaiserslautern
Outline
● XML Index Types● Statistics for Cost-based Index Selection● Index Configuration Management● Evaluation● Summary & Outlook
![Page 12: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/12.jpg)
University of Kaiserslautern
XML Index Types - Storage● XML Mapping
– Native
– No Shredding or Blob
● Fundamentals– Node labeling
– B-tree
4
![Page 13: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/13.jpg)
University of Kaiserslautern
XML Index Types - Storage
5
bib
publication
book book book . . .
year id title
1994 1 TCP/IP
author
last first
Stevens W.
price
65.95
Structure
content
. . .
● XML Mapping Elementless
![Page 14: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/14.jpg)
University of Kaiserslautern
XML Index Types - Storage
5
1
1.3
bib
publication
1.3.3 book 1.3.5 book 1.3.7 book . . .
1.3.3.1year
1.3.3.2.1id
1.3.3.3title
1994 1 TCP/IP
1.3.3.5author
1.3.3.5.3last
1.3.5.5first
Stevens W.
1.3.3.7price
65.95
1.3.3.1.3 1.3.3.2.1.3 1.3.3.3.3
1.3.3.5.3.3 1.3.3.5.5.3
1.3.3.7.3
Structure
content
. . .
● XML Mapping Elementless
![Page 15: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/15.jpg)
University of Kaiserslautern
● XML Mapping Elementless 1
1.3
bib
publication
1.3.3 book 1.3.5 book 1.3.7 book . . .
1.3.3.1year
1.3.3.2.1id
1.3.3.3title
1994 1 TCP/IP
1.3.3.5author
1.3.3.5.3last
1.3.5.5first
Stevens W.
1.3.3.7price
65.95
1.3.3.1.3 1.3.3.2.1.3 1.3.3.3.3
1.3.3.5.3.3 1.3.3.5.5.3
1.3.3.7.3
Structure
content
. . .
XML Index Types - Storage
5
![Page 16: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/16.jpg)
University of Kaiserslautern
● XML Mapping Elementless
Path Synopsis
bib
last
id titleyear price
book
publication
type author paperpublisher
name
journal
first loc titlename
1
1817
16
1514
13
12
11
109
87654
3
2
1
1.3
bib
publication
1.3.3 book 1.3.5 book 1.3.7 book . . .
1.3.3.1year
1.3.3.2.1id
1.3.3.3title
1994 1 TCP/IP
1.3.3.5author
1.3.3.5.3last
1.3.5.5first
Stevens W.
1.3.3.7price
65.95
1.3.3.1.3 1.3.3.2.1.3 1.3.3.3.3
1.3.3.5.3.3 1.3.3.5.5.3
1.3.3.7.3
Structure
content
. . .
XML Index Types - Storage
5
![Page 17: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/17.jpg)
University of Kaiserslautern
● XML Mapping Elementless
content
4
1.3.3.5.3.3
. . .
1994
1.3.3.7.3
1.3.3.3.3 TCP/IP10
1.3.3.2.1.3
W.
DeweyID PCR
1.3.3.1.3 1.3.3.9.3
1.3.7.1.3 . . .
document container
documentindex1.3.9
1.3.3.9.3 1.3.11.1.3
1.3.11
. . .
9 Stevens
1.3.3.1.3
5 1 . . .7
1.3.3.5.5.311
. . .13
65.95 4
. . .
Path Synopsis + B-tree layout
bib
last
id titleyear price
book
publication
type author paperpublisher
name
journal
first loc titlename
1
1817
16
1514
13
12
11
109
87654
3
2
bib1
1994 1 TCP/IP
Stevens W.
65.95
1.3.3.1.3 1.3.3.2.1.3 1.3.3.3.3
1.3.3.5.3.3 1.3.3.5.5.3
1.3.3.7.3
Structure
content
XML Index Types - Storage
5
![Page 18: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/18.jpg)
University of Kaiserslautern
● XML Mapping Elementless
content
4
1.3.3.5.3.3
. . .
1994
1.3.3.7.3
1.3.3.3.3 TCP/IP10
1.3.3.2.1.3
W.
1.3.3.1.3 1.3.3.9.3
1.3.7.1.3 . . .
documentindex1.3.9
1.3.3.9.3 1.3.11.1.3
1.3.11
. . .
9 Stevens
1.3.3.1.3
5 1 . . .7
1.3.3.5.5.311
. . .13
65.95 4
. . .
Path Synopsis + B-tree layout
bib
last
id titleyear price
book
publication
type author paperpublisher
name
journal
first loc titlename
1
1817
16
1514
13
12
11
109
87654
3
2
bib1
1994 1 TCP/IP
Stevens W.
65.95
1.3.3.1.3 1.3.3.2.1.3 1.3.3.3.3
1.3.3.5.3.3 1.3.3.5.5.3
1.3.3.7.3 content
DeweyID PCR
document container
Primary target: space efficiencySecondary: more indexing options for free!
XML Index Types - Storage
5
![Page 19: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/19.jpg)
University of Kaiserslautern
XML Index Types● Content
6
● Element
![Page 20: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/20.jpg)
University of Kaiserslautern
XML Index Types
● Content
● Path
6
● Element
● Content And Structure
● Content
● Path
● Element
● Content And StructurePath: /bib//title PCRs: → 5,11,33 Path: /bib//title PCRs: → 5,11,33
![Page 21: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/21.jpg)
University of Kaiserslautern
XML Index Types
● Content
● Path
● Element
● Content And Structure
Path: /bib//title PCRs: → 5,11,33 Path: /bib//title PCRs: → 5,11,33
● Content
● Path
● Element
● Content And Structure
●Easy to define●Generic●Large●Maintenance costs
●Easy to define●Generic●False positive filtering●Large●Maintenance costs
●Specific●Clustering●Small●Hard to define
●Specific●Clustering●Medium size●No document order●Type support●Hard to define
6
![Page 22: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/22.jpg)
University of Kaiserslautern
XML Index Types – Usage Sample● Simplified Query Graph Model for XMark 01
7
Query: let $auction := doc("auction.xml") return for $b in $auction/site/people/person[@id = "person0"] return $b/name/text()
No Indexes: 3 x document scans 1 x document navigational access
Query: let $auction := doc("auction.xml") return for $b in $auction/site/people/person[@id = "person0"] return $b/name/text()
![Page 23: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/23.jpg)
University of Kaiserslautern
XML Index Types – Usage Sample
7
Query: let $auction := doc("auction.xml") return for $b in $auction/site/people/person[@id = "person0"] return $b/name/text()
● Simplified Query Graph Model for XMark 01
![Page 24: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/24.jpg)
University of Kaiserslautern
XML Index Types – Usage Sample
7
Query: let $auction := doc("auction.xml") return for $b in $auction/site/people/person[@id = "person0"] return $b/name/text()
No Indexes: 3 x document scans 1 x document navigational access
Query: let $auction := doc("auction.xml") return for $b in $auction/site/people/person[@id = "person0"] return $b/name/text()
● Simplified Query Graph Model for XMark 01
![Page 25: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/25.jpg)
University of Kaiserslautern
XML Index Types – Usage Sample
7
Query: let $auction := doc("auction.xml") return for $b in $auction/site/people/person[@id = "person0"] return $b/name/text()
Query: let $auction := doc("auction.xml") return for $b in $auction/site/people/person[@id = "person0"] return $b/name/text()
● Simplified Query Graph Model for XMark 01
![Page 26: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/26.jpg)
University of Kaiserslautern
XML Index Types – Usage Sample
7
Query: let $auction := doc("auction.xml") return for $b in $auction/site/people/person[@id = "person0"] return $b/name/text()
● Simplified Query Graph Model for XMark 01
![Page 27: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/27.jpg)
University of Kaiserslautern
XML Index Types – Usage Sample
7
Query: let $auction := doc("auction.xml") return for $b in $auction/site/people/person[@id = "person0"] return $b/name/text()
● Simplified Query Graph Model for XMark 01
![Page 28: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/28.jpg)
University of Kaiserslautern
XML Index Types – Usage Sample
Query: let $auction := doc("auction.xml") return for $b in $auction/site/people/person[@id = "person0"] return $b/name/text()
● Simplified Query Graph Model for XMark 01
Query: let $auction := doc("auction.xml") return for $b in $auction/site/people/person[@id = "person0"] return $b/name/text()
Besides element and content indexMore than 280 path or CAS indexes possible
by varying depth, axes (“/”,”//”), clustering, and wildcards
7
![Page 29: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/29.jpg)
University of Kaiserslautern
Outline
● XML Index Types● Statistics for Cost-based Index Selection● Index Configuration Management● Evaluation● Summary & Outlook
![Page 30: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/30.jpg)
University of Kaiserslautern
Statistics● Extended Path Synopsis
9
bib
last
id titleyear price
book
publication
type author paperpublisher
name
journal
first loc titlename
1
1817
16
1514
13
12
11
109
87654
3
2
bib1
Each node is extended:
● Index Statistics
Gathered during storage:●B-tree height●# leave pages●Cardinality●Index size
…
… h
pages, size
Instance counter | o content length | IUD counter
●Extended PS typically <0.1% of XML document●No optimization yet (e.g., recursion)●Processing overhead less than 5-6%
Overhead:
journal12
…
… …
![Page 31: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/31.jpg)
University of Kaiserslautern
Statistics● Cost Estimation for Index Candidates
● Estimation Accuracy
10
1.Evaluate Index Expression on Path Synopsis → PCR set2.For each PCR add node's statistics (cardinality, width, IUD counter)3.Index type-dependent estimation of height, no. of leaves, and size
![Page 32: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/32.jpg)
University of Kaiserslautern
Outline
● XML Index Types● Statistics for Cost-based Index Selection● Index Configuration Management● Evaluation● Summary & Outlook
![Page 33: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/33.jpg)
University of Kaiserslautern
Index Configuration Management● Integrated into XTC
backend
● Asynchronous jobs for
– Candidate search
– Index building● Flexible configuration
– Type support
– Pruning thresholds
– Size limits
– Job schedule
12
![Page 34: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/34.jpg)
University of Kaiserslautern
Index Configuration Management1. Record query processing costs
2. If cost > threshold do Auto Indexing
3. Feedback Auto Indexing costs
Auto Indexing
1. Traverse query plan bottom-up (access operators)
2. Generate index candidates
3. Rerun optimization including candidates
4. Analyze plan(s) for selected candidates
5. Update candidate set
Cost/benefit calculation → schedule index materialization jobs13
![Page 35: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/35.jpg)
University of Kaiserslautern
Candidate Generation
14
Auto Indexing – Traverse
Candidates:QGM representation
Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title
Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title
![Page 36: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/36.jpg)
University of Kaiserslautern
14
Auto Indexing – Traverse
Candidates:QGM representation
Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title
1. //
Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title
Candidate Generation
![Page 37: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/37.jpg)
University of Kaiserslautern
1. Record query processing costs
2. If cost < threshold do Auto Indexing
3. Feedback Auto Indexing costs
Auto Indexing
1. Traverse query plan bottom-up (access operators)
2. Create index candidates
3. Rerun optimization including candidates
4. Analyze plan(s) for selected candidates
5. Update candidate set
14
Auto Indexing – Traverse
Candidates:QGM representation
Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title
1. //2. book
Candidate Generation
![Page 38: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/38.jpg)
University of Kaiserslautern
14
Auto Indexing – Traverse
Candidates:QGM representation
Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title
1. //2. book3. @id
Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title
Candidate Generation
![Page 39: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/39.jpg)
University of Kaiserslautern
14
Auto Indexing – Traverse
Candidates:QGM representation
1. //2. book3. @id4. “book1”
Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title
Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title
Candidate Generation
![Page 40: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/40.jpg)
University of Kaiserslautern
14
Auto Indexing – Traverse
Candidates:QGM representation
1. //2. book3. @id4. “book1”…
Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title
Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title
Candidate Generation
![Page 41: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/41.jpg)
University of Kaiserslautern
14
Auto Indexing – Traverse
Candidates:QGM representation
1. //2. book3. @id4. “book1”…
Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title
Candidate Generation
Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title
Over 18 enumeration rules and optimization rules
![Page 42: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/42.jpg)
University of Kaiserslautern
Cost-Benefit Calculation
● Rank candidates by benefit
● Observe space limitation (Greedy algorithm)
● Search space pruning (merge indexes)
Containment Problem:
● Comparing PCR sets, but index types and semantics may differ
Reconfigure Index Set
● Schedule materialization and deletion jobs
Candidate Selection
Query: let $doc:=doc("sample.xml") return for $book in $doc//book[@xtc:id="book1"] return $book/title
15
![Page 43: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/43.jpg)
University of Kaiserslautern
Outline
● XML Index Types● Statistics for Cost-based Index Selection● Index Configuration Management● Evaluation● Summary & Outlook
![Page 44: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/44.jpg)
University of Kaiserslautern
Evaluation● Workload Documents
– XMark documents (12MB and 112MB)
– TPoX collection (~250MB)
– Standard XML documents (dblp, treebank, psd, …)
● Benchmark Scenarios– No Indexes
– Manual (element + content index)
– Manual + Self-tuning
– Self-tuning
17
![Page 45: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/45.jpg)
University of Kaiserslautern
Evaluation
18
![Page 46: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/46.jpg)
University of Kaiserslautern
Evaluation
18
![Page 47: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/47.jpg)
University of Kaiserslautern
Evaluation
18
(2)
![Page 48: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/48.jpg)
University of Kaiserslautern
Evaluation
18
(2)
![Page 49: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/49.jpg)
University of Kaiserslautern
Evaluation
18
(2)(2 + 23)
![Page 50: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/50.jpg)
University of Kaiserslautern
Evaluation
18
(2)(2 + 23)
![Page 51: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/51.jpg)
University of Kaiserslautern
Evaluation
18
(2)(2 + 23)
(23)
![Page 52: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/52.jpg)
University of Kaiserslautern
Evaluation
Impact of parallel indexing
19
Impact of aggressive indexing
![Page 53: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/53.jpg)
University of Kaiserslautern
Evaluation
20
Overhead in %
% of workload
Overhead / Pruning effects Workload shifts
![Page 54: On the Use of Query-driven XML Auto-Indexing€¦ · ISP: 10 tables each having 8 columns and max 3 columns per index → ~4,000 indexes XML specifics on top of the ISP – Semi-structured](https://reader033.vdocuments.mx/reader033/viewer/2022050211/5f5d7896695f201f3340b8b2/html5/thumbnails/54.jpg)
University of Kaiserslautern
Summary● Self-tuning of XML indexes causes new challenges to the ISP
● Path Synopsis use for storage, indexing, and managing
● Overhead of statistics and management (mostly) pays off
● Self-tuning of tuning “frequency” and pruning are effective
● Integrate Update workload
● Analyze workload shift reaction (stability vs. effectiveness)
● Evaluate XML warehouse scenarios
● Integrate data placement decision
● Combine with buffer tuning
Outlook
21