Bitmap Indexes for Relational XML Twig Query Processing

Kyong-Ha Lee and Bongki Moon, The University of Arizona

Post on 24-Jun-2015

DESCRIPTION

The slides I presented at CIKM'09

TRANSCRIPT

Page 1: Bitmap Indexes for Relational XML Twig Query Processing

Kyong-Ha Lee and Bongki Moon, The University of Arizona

Page 2: Bitmap Indexes for Relational XML Twig Query Processing

CIKM'09, Hong Kong 2

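XML Data and Queries

[Figure: sample XML tree. Each element carries a (start, end, level) label and a position number in document order.]

position  element  (start, end, level)
0         a1       (1, 32, 1)
1         a2       (2, 11, 2)
2         b1       (3, 4, 3)
3         c1       (5, 10, 3)
4         d1       (6, 7, 4)
5         e1       (8, 9, 4)
6         a3       (12, 21, 2)
7         b2       (13, 16, 3)
8         e2       (14, 15, 4)
9         d2       (17, 20, 3)
10        c2       (18, 19, 4)
11        a4       (22, 31, 2)
12        b3       (23, 28, 3)
13        c3       (24, 25, 4)
14        d3       (26, 27, 4)
15        e3       (29, 30, 3)

XML fragment:
<a> <a> <b>t1</b> <c> <d>t2</d> <e>t3</e> </c> </a> <a> <b> <e>t4</e> </b> <d> <c>t5</c> </d> </a> . . . </a>

Example twig queries (each drawn as a tree pattern on the slide):
• //A/B/C
• //A[//B]//C
• //A[./B/C]//E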

Page 3: Bitmap Indexes for Relational XML Twig Query Processing

XML Stored in RDB

NODE table

tagName  start  end  level  value  pathId
a1       1      32   1      -      0
a2       2      11   2      -      1
b1       3      4    3      t1     2
c1       5      10   3      -      3
d1       6      7    4      t2     4
e1       8      9    4      t3     5
a3       12     21   2      -      1
b2       13     16   3      -      2
e2       14     15   4      t4     6
. . .
e3       29     30   3      -      11

PATH table

pathId  pathString
0       A#
1       A##A#
2       B##A##A#
3       C##A##A#
4       D##C##A##A#
5       E##C##A##A#
6       E##B##A##A#
7       D##A##A#
8       C##D##A##A#
9       C##B##A##A#
10      D##B##A##A#
11      E##A##A#
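The (start, end, level) labels stored in the NODE table are what a conventional twig join uses to test structural relationships. The slides do not spell the test out, so the following is a minimal sketch assuming the standard region-encoding rule (an ancestor's interval strictly contains the descendant's interval). The `Node` type and the function names are illustrative only; the labels are taken from the sample tree, with end = 32 for a1.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    start: int
    end: int
    level: int


def is_ancestor(a: Node, d: Node) -> bool:
    # Assumed rule: a's (start, end) interval strictly contains d's interval.
    return a.start < d.start and d.end < a.end


def is_parent(a: Node, d: Node) -> bool:
    # A parent is an ancestor exactly one level above.
    return is_ancestor(a, d) and a.level + 1 == d.level


# Labels from the sample tree.
a1 = Node("a1", 1, 32, 1)
a2 = Node("a2", 2, 11, 2)
b1 = Node("b1", 3, 4, 3)
b2 = Node("b2", 13, 16, 3)

print(is_ancestor(a1, b1), is_parent(a2, b1), is_ancestor(a2, b2))
```

Reading these labels requires fetching the records themselves, which is the cost the hybrid bitmap indexes later in the deck try to avoid.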

Page 4: Bitmap Indexes for Relational XML Twig Query Processing

Twig Join

To answer a twig query:
• A twig pattern is decomposed into several path patterns.
• Path solutions are joined together to compose the final result.

Holistic Twig Join (HTJ) algorithm:
• A specialized multi-way sort-merge join
• Guarantees I/O optimality for a certain subset of XML queries
• The optimality depends on how the elements are partitioned.
• Uses stacks and streams in which elements are kept in sorted order

[Figure: a twig pattern with query nodes A, B, C, and E; its path patterns; the input streams S_A = a1 a2 a3 a4, S_B = b1 b2 b3, S_C = c1 c2 c3, S_E = e1 e2 e3 (the d elements d1 d2 d3 are also shown); and the stacks, one per query node.]
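The decomposition step can be illustrated with a small sketch. It assumes a twig is held as a tree of query nodes, each recording the axis ("/" or "//") on the edge from its parent; `QNode` and `root_to_leaf_paths` are hypothetical names, not part of the presented system.

```python
from dataclasses import dataclass, field


@dataclass
class QNode:
    tag: str
    axis: str = "//"  # axis on the edge from the parent: "/" or "//"
    children: list["QNode"] = field(default_factory=list)


def root_to_leaf_paths(node: QNode, prefix: str = "") -> list[str]:
    """Decompose a twig pattern into its root-to-leaf path patterns."""
    path = prefix + node.axis + node.tag
    if not node.children:
        return [path]
    paths = []
    for child in node.children:
        paths.extend(root_to_leaf_paths(child, path))
    return paths


# The twig //A[./B/C]//E from the example queries.
twig = QNode("A", "//", [QNode("B", "/", [QNode("C", "/")]), QNode("E", "//")])
print(root_to_leaf_paths(twig))
```

A twig join then has to merge the solutions of these path patterns on their shared branching node (A in this example).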

Page 5: Bitmap Indexes for Relational XML Twig Query Processing

Motivation

• Discrepancy between XML stored in an RDB and conventional HTJ algorithms
  - Logical: streams vs. tables
  - Physical: partitioned vs. record-oriented
• Supporting actual data, including a large volume of text, requires references to records.
• How should tuples be fed to an HTJ algorithm?
• What is the best partitioning scheme for XML stored in an RDB?
• Bitmap index, a conventional index in RDBMSs
  - An efficient way to indicate tuples
  - Efficient support for logical operations
• Can we use the bitmap index to support HTJ?

Page 6: Bitmap Indexes for Relational XML Twig Query Processing

HTJ on Different Partitioning Schemes

• Tag-based partitioning
  - Simple, and a skipping technique can be used to read useful elements only.
  - For a query node, only one stream is accessed.
• Tag+Level partitioning
  - Better I/O optimality; suitable for deep XML
  - Several streams may be accessed for a single query node.
• Path-based partitioning
  - Better I/O optimality; suitable for shallow XML
  - A path with //-axes may require accessing many streams for a single query node.

Page 7: Bitmap Indexes for Relational XML Twig Query Processing

Bitmap Index

• How to partition tuples in the NODE table: by building a bitmap index on certain column(s) of the table.
  - bitTag for tagName, bitTag+ for (tagName, level), bitPath for the pathId column
• The choice determines the I/O optimality of holistic twig join algorithms.
• During the twig join process, useful tuples are accessed via the bitmap index.

[Figure: bit-vectors for A, B, and E (1100001000, 0010000100, 0000010000, . . .) pointing to tuples in disk blocks.]
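As a rough illustration of the three partitioning choices, the sketch below builds one bit-vector per distinct key over the 16 sample elements in document order. The `NODE` list is reconstructed from the sample tree and PATH table, and `build_bitmap_index` is a hypothetical helper; a real index would store compressed vectors rather than Python strings.

```python
from collections import defaultdict

# (tagName, level, pathId) for the 16 sample elements, in document order.
NODE = [
    ("a", 1, 0), ("a", 2, 1), ("b", 3, 2), ("c", 3, 3), ("d", 4, 4), ("e", 4, 5),
    ("a", 2, 1), ("b", 3, 2), ("e", 4, 6), ("d", 3, 7), ("c", 4, 8),
    ("a", 2, 1), ("b", 3, 2), ("c", 4, 9), ("d", 4, 10), ("e", 3, 11),
]


def build_bitmap_index(rows, key):
    """Return {key value: bit-vector string}; bit i is 1 if row i has that key."""
    index = defaultdict(lambda: ["0"] * len(rows))
    for pos, row in enumerate(rows):
        index[key(row)][pos] = "1"
    return {k: "".join(bits) for k, bits in index.items()}


bit_tag = build_bitmap_index(NODE, key=lambda r: r[0])               # bitTag
bit_tag_plus = build_bitmap_index(NODE, key=lambda r: (r[0], r[1]))  # bitTag+
bit_path = build_bitmap_index(NODE, key=lambda r: r[2])              # bitPath

print(bit_tag["a"])   # all A elements
print(bit_path[2])    # elements on path /A/A/B
```

The finer the key, the more vectors there are and the fewer 1s each contains, which is the trade-off between the three schemes.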

Page 8: Bitmap Indexes for Relational XML Twig Query Processing

Additional Indexes

• bitAnc: a bit-vector that represents the terminal elements corresponding to a certain path and all their ancestors.
• bitDesc: a bit-vector that represents the terminal elements corresponding to a certain path and all their descendants.

(a) bitPath, bitAnc, and bitDesc for pathId = 2, i.e. /A/A/B (positions 0-15):

bitPath  0010000100001000
bitAnc   1110001100011000
bitDesc  0010000110001110

(b) The subtree covered by the three bit-vectors: a1 (0), a2 (1), b1 (2), a3 (6), b2 (7), e2 (8), a4 (11), b3 (12), c3 (13), d3 (14)
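To make the two definitions concrete, the following sketch derives bitAnc and bitDesc from a bitPath vector. It assumes a parent array indexed by document-order position is available at build time (`PARENT` is reconstructed from the sample tree); the slides do not say how the authors actually compute these vectors.

```python
# Parent position of each of the 16 sample elements (-1 for the root a1).
PARENT = [-1, 0, 1, 1, 3, 3, 0, 6, 7, 6, 9, 0, 11, 12, 12, 11]


def bit_anc(bit_path: str, parent: list[int]) -> str:
    """Terminal elements of a path plus all their ancestors."""
    bits = list(bit_path)
    for pos, bit in enumerate(bit_path):
        if bit == "1":
            p = parent[pos]
            while p != -1:          # walk up to the root
                bits[p] = "1"
                p = parent[p]
    return "".join(bits)


def bit_desc(bit_path: str, parent: list[int]) -> str:
    """Terminal elements of a path plus all their descendants."""
    bits = list(bit_path)
    # Parents precede children in document order, so one pass is enough.
    for pos in range(len(bits)):
        p = parent[pos]
        if p != -1 and bits[p] == "1":
            bits[pos] = "1"
    return "".join(bits)


path_2 = "0010000100001000"  # bitPath for /A/A/B
print(bit_anc(path_2, PARENT))
print(bit_desc(path_2, PARENT))
```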

Page 9: Bitmap Indexes for Relational XML Twig Query Processing

Two Types of Indexes

• Basic index
  - Bit-vectors are built on a single column or a group of columns.
  - Requires labeled values and reading records
• Hybrid index
  - A combination of two different indexes
  - descTag: bitDesc & bitTag
  - bitTwig: bitPath & bitAnc
  - Does not require labeled values to compute twig solutions

Page 10: Bitmap Indexes for Relational XML Twig Query Processing

Identifying Element Relationships with Bit-vectors

• For a query //A//B, can the pairs (a1, b1) and (a2, b2) be solutions?

[Figure: bit-vectors over positions 0-15, shown next to the subtree a1 (0), a2 (1), b1 (2), a3 (6), b2 (7), a4 (11), b3 (12):]

P2: /A/A/B  1110001100011000
P0: /A      1000000000000000
P1: /A/A    1100001000010000

Page 11: Bitmap Indexes for Relational XML Twig Query Processing

Advancing Cursors

• Choose the minimum position among the current 1s as the current element for a query node.
• Check whether a 1 exists in the interval between pos(a) and pos(d) by looking ahead at the next 1.

[Figure: cursors for the query node q: //A over positions 0-15:]

vector       bits                               Current1  Next1
P0: /A       1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0    0         eov
P1: /A/A     0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0    1         6

Curr_q = (0, 0, 1)
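One reading of the interval check is sketched below: an element a at pos(a) in the vector for its path is an ancestor of an element d at pos(d) if the next 1 in a's vector does not fall before pos(d). This assumes a's path is a prefix of d's path and that positions follow document order; the function names are hypothetical and plain strings stand in for real bit-vectors.

```python
def next_one(bits: str, start: int):
    """Position of the first 1 at or after `start`, or None at end of vector."""
    pos = bits.find("1", start)
    return None if pos == -1 else pos


def is_ancestor(anc_bits: str, pos_a: int, pos_d: int) -> bool:
    """True if no other 1 of a's vector lies between pos(a) and pos(d)."""
    if anc_bits[pos_a] != "1" or pos_a >= pos_d:
        return False
    nxt = next_one(anc_bits, pos_a + 1)   # look ahead at the next 1
    return nxt is None or nxt > pos_d


def current_element(cursors):
    """Smallest current position among the vectors feeding one query node."""
    live = [c for c in cursors if c is not None]
    return min(live) if live else None


P0 = "1000000000000000"  # /A
P1 = "0100001000010000"  # /A/A

print(is_ancestor(P0, 0, 2))   # (a1, b1)
print(is_ancestor(P1, 1, 7))   # (a2, b2): a3 at position 6 lies in between
print(current_element([next_one(P0, 0), next_one(P1, 0)]))
```

Under this reading, (a1, b1) qualifies for //A//B while (a2, b2) does not, because a3 starts between a2 and b2.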

Page 12: Bitmap Indexes for Relational XML Twig Query Processing

Optimizations

• Early detection with a bit-vector absence
• Condensing query nodes
  - For path-based partitioning
  - Reduces |INDEX| and |RECORD|
• Skipping reading obsolete records with advance(k)
  - For tag-based and (tag, level)-based partitioning
  - Reduces |RECORD|
• Moving cursors over compressed bit-vectors with no decompression
  - A composite cursor moves over a bit-vector compressed by a run-length encoding scheme.
  - Reduces |INDEX|

[Figure: a twig with query nodes A, B, C, and E condensed into fewer nodes, and an advance(k) example for P: //A/B/C:]

10000000000100000   C_A = 11
00001000010000100   C_B = 4, advance(11)
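The advance(k) example in the figure can be replayed with a small cursor sketch. `Cursor` is a hypothetical class over an uncompressed string; the only behaviour taken from the slide is that advance(k) jumps the cursor to the first 1 at position k or later, so the records in between are never read.

```python
class Cursor:
    """Cursor over a bit-vector; `pos` is the current 1, or -1 at end of vector."""

    def __init__(self, bits: str):
        self.bits = bits
        self.pos = bits.find("1")

    def advance(self, k: int) -> int:
        """Move to the first 1 at position >= k, skipping everything before it."""
        if self.pos != -1 and self.pos < k:
            self.pos = self.bits.find("1", k)
        return self.pos


# Bit-vectors from the figure.
c_a = Cursor("10000000000100000")
c_b = Cursor("00001000010000100")

c_a.advance(1)           # C_A moves from 0 to 11
print(c_a.pos, c_b.pos)  # C_A = 11, C_B = 4
print(c_b.advance(11))   # C_B skips the 1 at position 9
```

Since C_A is at 11, no element before position 11 in the second vector can be a descendant of the current A element, so skipping the 1s at positions 4 and 9 loses nothing.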

Page 13: Bitmap Indexes for Relational XML Twig Query Processing

Compressed Bit-vector

(a) An original bit-vector with 8,000 bits:
    000100000000100000000000000011 00000000000 . . . 00000000000000 0000000000000000000000000000001 00

(b) Grouping in units of 31 bits and merging identical groups:
    31 bits | 256 × 31 bits | 31 bits | 2 bits

(c) Encoding each group as one word (4 bytes on a 32-bit machine):
    000010…010…011 | 100… 0100000000 | 000…001 | 000…000
    uncompressed word (31 literal bits) | compressed word (run-length is 256) | uncompressed word | remaining word

Cursor C = {
    C.position,  // Integer position value (logical address)
    C.word,      // The current word C is located at
    C.bit,       // The position of the bit C is visiting, in C.word
    C.rest       // The bit position in the remaining word
}
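The grouping and merging steps can be sketched as follows. This is a simplified stand-in for the word-aligned run-length scheme on the slide: it keeps each word as a Python tuple rather than packing it into 32 bits, and the names `compress`, `decompress`, and the sample vector are illustrative assumptions.

```python
GROUP = 31  # bits per group, as on the slide


def compress(bits: str):
    """Split into 31-bit groups and merge runs of identical all-0/all-1 groups."""
    words = []
    full = len(bits) - len(bits) % GROUP
    for i in range(0, full, GROUP):
        group = bits[i:i + GROUP]
        if group in ("0" * GROUP, "1" * GROUP):
            if words and words[-1][0] == "fill" and words[-1][1] == group[0]:
                words[-1] = ("fill", group[0], words[-1][2] + 1)  # extend the run
            else:
                words.append(("fill", group[0], 1))
        else:
            words.append(("literal", group))
    if full < len(bits):
        words.append(("rest", bits[full:]))  # leftover bits (< 31)
    return words


def decompress(words) -> str:
    return "".join(
        w[1] * (GROUP * w[2]) if w[0] == "fill" else w[1] for w in words
    )


# A made-up 8,000-bit vector with the same shape as the slide's example.
first = "0001" + "0" * 25 + "11"
last = "0" * 30 + "1"
bits = first + "0" * (GROUP * 256) + last + "00"

words = compress(bits)
print(len(bits), len(words))
```

Here 8,000 bits collapse into four words: a literal word, one fill word with run-length 256, another literal word, and a 2-bit remainder.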

Page 14: Bitmap Indexes for Relational XML Twig Query Processing

Moving a Cursor over a Compressed Bit-vector

Compressed bit-vector:
    000010…010…011 | 100… 0100000000 (run-length is 256) | 000…001 | 000…000 (remaining word)

a) Get the position of the next 1
• Start: C = {31, 0, 31, 0}
• Skip examining 31 × 256 bits.
• Result: C = {7998, 2, 31, 0}

b) Check the bit value at position 3,000
• Start: C = {31, 0, 31, 0}, with distance to move 2,969 = (3000 - 31)
• Since 31 × 256 > 2,969, the bit we are looking for is within word 1.
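Both cursor operations can be approximated without expanding the fill word. The sketch below assumes the compressed vector is a list of tuples (`literal`, `fill`, `rest`), a made-up representation rather than the authors' 32-bit word layout, and it uses 0-based positions, which may differ by one from the slide's numbering.

```python
GROUP = 31

# Compressed form of a made-up 8,000-bit vector shaped like the slide's example.
WORDS = [
    ("literal", "0001" + "0" * 25 + "11"),
    ("fill", "0", 256),              # 256 groups of 31 zero bits
    ("literal", "0" * 30 + "1"),
    ("rest", "00"),
]


def word_length(word) -> int:
    return GROUP * word[2] if word[0] == "fill" else len(word[1])


def bit_at(words, position: int) -> str:
    """Bit value at `position`; a fill word is answered without expanding it."""
    offset = 0
    for word in words:
        length = word_length(word)
        if position < offset + length:
            return word[1] if word[0] == "fill" else word[1][position - offset]
        offset += length
    raise IndexError(position)


def next_one(words, position: int):
    """First 1 at or after `position`, or None; zero fills are skipped whole."""
    offset = 0
    for word in words:
        end = offset + word_length(word)
        if position < end:
            start = max(position, offset)
            if word[0] == "fill":
                if word[1] == "1":
                    return start
            else:
                hit = word[1].find("1", start - offset)
                if hit != -1:
                    return offset + hit
        offset = end
    return None


print(bit_at(WORDS, 3000))   # falls inside the fill word
print(next_one(WORDS, 31))   # skips 31 * 256 zero bits in one step
```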

Page 15: Bitmap Indexes for Relational XML Twig Query Processing

Experiments

• Datasets
  - Synthetic: XMark
  - Real: DBLP, Treebank, Swiss-Prot
• Query sets

Page 16: Bitmap Indexes for Relational XML Twig Query Processing

Statistics of Dataset and Indexes

• The number of distinct paths varies widely across datasets.
• The number of distinct tag names does not differ much.
• Index build time is largely affected by attribute cardinality.
• Index size is smaller than labeled value size in most cases.

Page 17: Bitmap Indexes for Relational XML Twig Query Processing


Query Execution Time

Page 18: Bitmap Indexes for Relational XML Twig Query Processing


Input Data Size

Page 19: Bitmap Indexes for Relational XML Twig Query Processing

Other Features on bitPath

• Merging the bit-vectors used for a path pattern with //-axes and putting the result into the bitmap index for next time
  - For a given path //A//B, the bit-vectors P:/A/A/B and P:/A/B are merged.
  - The merged vector acts like a pre-computed join index.
  - A path pattern with //-axes can then be represented by a single bit-vector.
• Logical operations: OR and NOT are simply supported by bitwise logical operations: &, |, ^
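The merging idea can be sketched with plain integers as bit-vectors. The example below uses //A//C on the sample data, where three stored paths match, and caches the OR-ed result under the pattern. `PATHS` and `PATH_ID_BY_POS` are reconstructed from the sample tables; the regex translation and the cache dictionary are assumptions made for illustration, not the authors' implementation.

```python
import re

# PATH table written root-first, and the pathId of each element by position.
PATHS = {0: "/A", 1: "/A/A", 2: "/A/A/B", 3: "/A/A/C", 4: "/A/A/C/D",
         5: "/A/A/C/E", 6: "/A/A/B/E", 7: "/A/A/D", 8: "/A/A/D/C",
         9: "/A/A/B/C", 10: "/A/A/B/D", 11: "/A/A/E"}
PATH_ID_BY_POS = [0, 1, 2, 3, 4, 5, 1, 2, 6, 7, 8, 1, 2, 9, 10, 11]
N = len(PATH_ID_BY_POS)

# bitPath as integers: the leftmost (most significant) bit is position 0.
BIT_PATH = {pid: 0 for pid in PATHS}
for pos, pid in enumerate(PATH_ID_BY_POS):
    BIT_PATH[pid] |= 1 << (N - 1 - pos)


def pattern_to_regex(pattern: str):
    """Translate a linear pattern such as //A//C into a regex over path strings."""
    parts = []
    for axis, tag in re.findall(r"(//|/)([^/]+)", pattern):
        skip = r"(?:/[^/]+)*" if axis == "//" else ""  # '//' may skip any steps
        parts.append(skip + "/" + re.escape(tag))
    return re.compile("".join(parts))


def merged_vector(pattern: str, cache: dict) -> int:
    """OR together the bitPath vectors of all matching paths and cache the result."""
    if pattern not in cache:
        regex = pattern_to_regex(pattern)
        merged = 0
        for pid, path in PATHS.items():
            if regex.fullmatch(path):
                merged |= BIT_PATH[pid]
        cache[pattern] = merged
    return cache[pattern]


cache = {}
print(format(merged_vector("//A//C", cache), "016b"))
```

A second lookup of the same pattern returns the cached vector directly, which is the sense in which the merged vector behaves like a pre-computed join index.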

Page 20: Bitmap Indexes for Relational XML Twig Query Processing

Twig Queries with Logical Operations

• //A[./B/C or ./B/D]//E
  [Figure: twig A with branches B/(C|D) and E, redrawn with a single node X in place of (C|D)]
  P//A, P//A//B//X ≡ P//A//B//C ∨ P//A//B//D, P//A//E

• //A[./B/not(C)]//E
  [Figure: twig A with branches B/¬C and E]
  P//A, P//A//E, P//A/B (Pⓧ //A/B ⊙A//A/B/C)

Page 21: Bitmap Indexes for Relational XML Twig Query Processing

Conclusions

• We investigated the possibilities of bitmap indexes for XML query processing.
  - Partitioning XML stored in an RDB in various ways
  - Cursor movements do not require decompression of bit-vectors.
• We devised a way to identify element relationships with only a bitmap index, bitTwig.
• Our experiments showed that bitTwig was best for queries against shallow XML documents.
• For deep XML documents, bitTag with advance(k) showed the best performance.
• Future work: evaluating our system with more HTJ algorithms and other indexes

Page 22: Bitmap Indexes for Relational XML Twig Query Processing

Thanks! Questions?