Storage Representations for Set-Oriented Selection Predicates
Karthikeyan Ramasamy
with Jeffrey F. Naughton and David Maier
Object Relational DBMS
• OR-DBMS are gaining acceptance
• Market for OR-DBMS is growing
• Many vendors are working on a version of OR-DBMS
• Main features of OR-DBMS
– Type extensibility
– Collections
Set Valued Attributes
• Many semantic notions of the real world can be described by sets, e.g., a set of courses or a set of products
• Set-valued attributes provide conciseness and ease of expression
Classification of Representations
            Internal    External
Nested      Yes         Yes
Unnested    No          Yes
Nested Internal Representation
• Stored at the end of the tuple
• Requires support for handling large tuples
• Retrieval cost of a tuple increases
• Updates could reorganize the whole tuple
• Might do well when the size of the set is small
Nested Internal Representation
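[Figure: layout of a tuple with attributes A1, A2, A3, followed by the set-valued attribute, which holds a Length field, a Cardinality field, and Element 1, Element 2, ..., Element N.]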
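The layout in this figure can be sketched in Python. This is a minimal illustration, not Paradise's actual on-disk format: the 4-byte little-endian length and cardinality fields, the fixed 20-byte element width, and the names `pack_set`, `unpack_set`, and `pack_tuple` are all assumptions made for the example.

```python
import struct

ELEMENT_SIZE = 20        # assumed fixed element width in bytes
HEADER = struct.Struct("<II")  # assumed header: length in bytes, then cardinality


def pack_set(elements):
    """Serialize a set as: length, cardinality, then fixed-width elements."""
    encoded = []
    for e in sorted(elements):
        raw = e.encode("ascii")
        if len(raw) > ELEMENT_SIZE:
            raise ValueError(f"element longer than {ELEMENT_SIZE} bytes: {e!r}")
        encoded.append(raw.ljust(ELEMENT_SIZE, b"\0"))  # pad to fixed width
    body = b"".join(encoded)
    return HEADER.pack(HEADER.size + len(body), len(encoded)) + body


def unpack_set(buf, offset=0):
    """Read back a set that starts at `offset` inside a tuple buffer."""
    _length, cardinality = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    result = set()
    for i in range(cardinality):
        raw = buf[start + i * ELEMENT_SIZE:start + (i + 1) * ELEMENT_SIZE]
        result.add(raw.rstrip(b"\0").decode("ascii"))
    return result


def pack_tuple(fixed_attrs, elements):
    """Nested internal: the set is appended at the end of the tuple bytes."""
    return fixed_attrs + pack_set(elements)
```

Under these assumptions the whole tuple grows by 8 + 20 × cardinality bytes, which is why large sets call for large-tuple support and make every tuple retrieval more expensive.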
Unnested External Representation
• Set-valued attributes are stored separately in an auxiliary relation
• Set instances are unnested and each element stored as a tuple
• Uses a key - foreign key pair to connect a tuple to its set elements
• Requires a join to assemble the elements
Unnested External Representation
• Example
– Moviegoer(name, street, city, state, zip, {movies})
• Base Relation
– Moviegoer-Base(name, street, city, state, zip, id)
• Set Relation
– Moviegoer-Set(id, movie-name)
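The base/set split above can be tried out with SQLite from Python. This is only a sketch under stated assumptions: the schema is shortened to `name` and `city`, identifiers use underscores because hyphens are not valid in SQL names, and `build_unnested` and `assemble` are hypothetical helper names.

```python
import sqlite3


def build_unnested(moviegoers):
    """moviegoers: list of (name, city, movies) with movies a Python set."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE moviegoer_base (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.execute(
        "CREATE TABLE moviegoer_set ("
        " id INTEGER REFERENCES moviegoer_base(id), movie_name TEXT)"
    )
    for i, (name, city, movies) in enumerate(moviegoers, start=1):
        conn.execute("INSERT INTO moviegoer_base VALUES (?, ?, ?)", (i, name, city))
        # One tuple per set element: this is the "unnesting".
        conn.executemany(
            "INSERT INTO moviegoer_set VALUES (?, ?)",
            [(i, m) for m in sorted(movies)],
        )
    return conn


def assemble(conn):
    """Join base and set relations and regroup the elements per id."""
    rows = conn.execute(
        "SELECT b.id, s.movie_name"
        " FROM moviegoer_base b LEFT JOIN moviegoer_set s ON b.id = s.id"
    )
    sets_by_id = {}
    for tuple_id, movie in rows:
        bucket = sets_by_id.setdefault(tuple_id, set())
        if movie is not None:  # LEFT JOIN keeps moviegoers with empty sets
            bucket.add(movie)
    return sets_by_id
```

The set relation holds one row per element, so its size grows with the total number of elements rather than with the number of moviegoers.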
Nested External Representation
• Set-valued attributes are stored in an auxiliary relation
• Set instances are nested in the auxiliary relation
• Uses key - foreign key
• Number of tuples is the same as in the base relation
• Resorts to a join
Nested External Representation
• Example
– Moviegoer(name, street, city, state, zip, {movies})
• Base Relation
– Moviegoer-Base(name, street, city, state, zip, id)
• Set Relation
– Moviegoer-Set(id, {movies})
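A rough way to imitate this layout is to keep one auxiliary row per moviegoer and store the whole set in a single column. The sketch below uses SQLite from Python. The JSON encoding of the set, the shortened schema, and the helper names `build_nested_external` and `movies_of` are assumptions for illustration. SQLite has no native set type, so this only approximates a nested attribute.

```python
import json
import sqlite3


def build_nested_external(moviegoers):
    """moviegoers: list of (name, movies) with movies a Python set."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE moviegoer_base (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE moviegoer_set ("
        " id INTEGER PRIMARY KEY REFERENCES moviegoer_base(id), movies TEXT)"
    )
    for i, (name, movies) in enumerate(moviegoers, start=1):
        conn.execute("INSERT INTO moviegoer_base VALUES (?, ?)", (i, name))
        # The whole set instance lives in one auxiliary tuple.
        conn.execute(
            "INSERT INTO moviegoer_set VALUES (?, ?)",
            (i, json.dumps(sorted(movies))),
        )
    return conn


def movies_of(conn, name):
    """Join base and set relations to fetch one moviegoer's set."""
    row = conn.execute(
        "SELECT s.movies FROM moviegoer_base b"
        " JOIN moviegoer_set s ON b.id = s.id WHERE b.name = ?",
        (name,),
    ).fetchone()
    return set(json.loads(row[0])) if row else None
```

Both relations end up with the same number of tuples, and a join is still needed whenever the set has to be read together with the base attributes.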
Indexed Variants
• Augmentation with indexes
• Nested Representations
– Unnested and unclustered index
• Unnested Representations
– Clustered index
– Unclustered index
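One reading of an "unnested" index over a nested representation is an index with one entry per set element, each pointing back to the tuples whose set contains it. The Python sketch below illustrates that reading only. It is an assumption rather than a description of the Paradise implementation, and `build_element_index`, `lookup_all`, and `lookup_any` are hypothetical names.

```python
from collections import defaultdict


def build_element_index(sets_by_id):
    """Map each element to the ids of the tuples whose set contains it."""
    index = defaultdict(set)
    for tuple_id, elements in sets_by_id.items():
        for element in elements:
            index[element].add(tuple_id)
    return index


def lookup_all(index, wanted):
    """Conjunctive predicate: ids whose set contains every wanted element."""
    wanted = list(wanted)
    if not wanted:
        raise ValueError("predicate must name at least one element")
    postings = [index.get(e, set()) for e in wanted]
    return set.intersection(*postings)


def lookup_any(index, wanted):
    """Disjunctive predicate: ids whose set contains at least one wanted element."""
    return set().union(*(index.get(e, set()) for e in wanted))
```

With such an index a conjunctive predicate becomes an intersection of id lists and a disjunctive predicate becomes a union, so only qualifying tuples need to be fetched.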
Performance - Settings
• Implementation in Paradise - Set ADT
• Intel Pentium 333 MHz - Solaris 2.6
• Main memory - 128 MB
• Buffer pool size - 32 MB
• Used raw disks of size 4 GB
• Each experiment was run against a cold database
Performance - Experimental Schema
Moviegoer(name, street, city, state, zipcode, {movies})
– Average tuple size 68 bytes
– Number of base relation tuples 10000
– Number of set elements 1000000
– Set element size is 20 bytes
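The figures on this slide imply a few derived quantities, worked out below. This is back-of-the-envelope arithmetic only. It assumes the 68-byte average tuple size excludes the set itself, and it ignores keys, headers, and page overhead, none of which are given here.

```python
BASE_TUPLES = 10_000       # number of base relation tuples (from the slide)
SET_ELEMENTS = 1_000_000   # number of set elements (from the slide)
ELEMENT_BYTES = 20         # set element size (from the slide)
AVG_TUPLE_BYTES = 68       # average tuple size (from the slide)


def workload_summary():
    """Derive average cardinality and raw data volumes from the slide figures."""
    return {
        "avg_cardinality": SET_ELEMENTS // BASE_TUPLES,
        "raw_set_bytes": SET_ELEMENTS * ELEMENT_BYTES,
        "base_bytes": BASE_TUPLES * AVG_TUPLE_BYTES,
    }


if __name__ == "__main__":
    for key, value in workload_summary().items():
        print(f"{key}: {value:,}")
```

So each moviegoer carries 100 movies on average, and the raw element data (about 20 MB) is far larger than the base attributes (about 0.68 MB).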
Performance - Queries
• Queries run
– Conjunctive queries
– Disjunctive queries
– Queries not referring to the set-valued attribute
• Sets are not in the result
• Sets in the result
Performance - Parameters Varied
• Cardinality
• Selectivity of the predicate
• Number of elements in the predicate
• Size of each set element
Conjunctive Queries
SELECT m.name, m.street, m.city, m.state, m.zipcode
FROM Moviegoer m
WHERE { "movieA50061", "movieA50062" } SUBSET OF m.movies
No Set in the Result
SELECT m.name, m.street, m.city, m.state, m.zipcode, m.movies
FROM Moviegoer m
WHERE { "movieA50061", "movieA50062" } SUBSET OF m.movies
Set in the Result
Disjunctive Queries
SELECT m.name, m.street, m.city, m.state, m.zipcode
FROM Moviegoer m
WHERE "movieA50061" IN m.movies OR "movieA50062" IN m.movies
No Set in the Result
SELECT m.name, m.street, m.city, m.state, m.zipcode, m.movies
FROM Moviegoer m
WHERE "movieA50061" IN m.movies OR "movieA50062" IN m.movies
Set in the Result
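The semantics of the conjunctive (SUBSET OF) and disjunctive (IN ... OR ...) predicates in these queries, and of the "set in the result" variants, can be sketched over in-memory rows in Python. The row format (dicts with a `movies` set) and the function name `select_moviegoers` are assumptions for illustration, not how any engine evaluates the queries.

```python
def select_moviegoers(moviegoers, wanted, mode="all", include_set=False):
    """Filter rows by a set predicate on the 'movies' attribute.

    mode="all": conjunctive, every wanted movie must be in the set (SUBSET OF).
    mode="any": disjunctive, at least one wanted movie must be in the set.
    """
    wanted = set(wanted)
    result = []
    for row in moviegoers:
        movies = row["movies"]
        if mode == "all":
            qualifies = wanted <= movies
        elif mode == "any":
            qualifies = not wanted.isdisjoint(movies)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        if qualifies:
            projected = {k: v for k, v in row.items() if k != "movies"}
            if include_set:  # "set in the result" variant
                projected["movies"] = set(movies)
            result.append(projected)
    return result
```

For example, `select_moviegoers(rows, {"movieA50061", "movieA50062"}, mode="all")` mirrors the first conjunctive query, and passing `include_set=True` mirrors the variant that also returns `m.movies`.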
No Set in Result - Varying Cardinality
[Figure: response time (sec) versus cardinality (axis 0 to 120) for six representations: Nested Internal, Indexed Nested Internal, Nested External, Indexed Nested External, Unnested External, Indexed Unnested External.]
Selectivity of 1% for a six-element predicate query
No Set in Result - Varying Selectivity
[Figure: response time (sec) versus selectivity (0.01%, 0.1%, 1%, 10%, 25%, 50%) for six representations: Nested Internal, Indexed Nested Internal, Nested External, Indexed Nested External, Unnested External, Indexed Unnested External.]
Six-element predicate query with cardinality of 100
No Set in Result - Varying Number of Elements in Predicate
Selectivity of 1% with cardinality of 100
[Figure: response time (sec) versus number of elements in the predicate (1, 2, 4, 6) for six representations: Nested Internal, Indexed Nested Internal, Nested External, Indexed Nested External, Unnested External, Indexed Unnested External.]
No Set in Result - Varying Size of Set Element
Selectivity of 1% with cardinality of 100
[Figure: response time (sec) versus size of set element (11, 20, 30) for six representations: Nested Internal, Indexed Nested Internal, Nested External, Indexed Nested External, Unnested External, Indexed Unnested External.]
Queries Not Referring to the Set-Valued Attribute
SELECT m1.name, m1.street, m1.city, m1.state, m1.zipcode
FROM Moviegoer m1, Moviegoer m2
WHERE m1.id = m2.id
Join Query
SELECT m.name, m.street, m.city, m.state, m.zipcode
FROM Moviegoer m
Select Query
Select Query
[Figure: response time (sec) versus cardinality (10, 25, 50, 100) for Unnested External, Nested External, and Nested Internal.]
Conclusions and Future Work
• Nested representations perform better for set oriented selection predicates
• Indexes on nested representations are more effective than indexes on unnested representations
• Evaluation of these representations for nested set joins
• Specialized operators for nested representations
Unnested External Representation
• Ability to handle any cardinality
• Easily slides into an existing relational engine
• Set operations might be inefficient since elements are scattered
• Keys add overhead when set elements are small
• Cardinality explosion
No Set in Result - Cost Breakdown
[Figure: two panels, Nested Internal and Unnested External. Each shows response time (sec) versus cardinality (10, 25, 50, 100), broken down into I/O cost, buffer pool cost, predicate evaluation cost, and other system cost.]
Selectivity of 1 % for Six Element Predicate Query
Conjunctive Queries - Unnested External
SELECT mb.set-id, mb.name, mb.street, mb.city, mb.state, mb.zipcode
FROM Moviegoer-Base mb, Moviegoer-Set ms
WHERE mb.set-id = ms.set-id AND
(ms.movie-name = "movieA50061" OR ms.movie-name = "movieA50062")
GROUP BY mb.set-id, mb.name, mb.street, mb.city, mb.state, mb.zipcode
HAVING count(*) = 2
No Set in the Result
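The GROUP BY / HAVING formulation above can be run against SQLite from Python. This is a hedged sketch: the schema is shortened, identifiers use underscores because hyphens are not valid in SQL names, and `make_db` and `conjunctive_ids` are hypothetical helpers. The count trick is only correct when the set relation holds no duplicate (set_id, movie_name) pairs, which the sketch assumes.

```python
import sqlite3


def make_db(rows):
    """rows: list of (set_id, name, movies) with movies a Python set."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE moviegoer_base (set_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE moviegoer_set (set_id INTEGER, movie_name TEXT)")
    for set_id, name, movies in rows:
        conn.execute("INSERT INTO moviegoer_base VALUES (?, ?)", (set_id, name))
        conn.executemany(
            "INSERT INTO moviegoer_set VALUES (?, ?)",
            [(set_id, m) for m in sorted(movies)],
        )
    return conn


def conjunctive_ids(conn, wanted):
    """Ids of moviegoers whose unnested set contains every wanted movie."""
    wanted = list(dict.fromkeys(wanted))  # drop duplicate predicate elements
    if not wanted:
        raise ValueError("predicate must name at least one movie")
    placeholders = ", ".join("?" for _ in wanted)
    sql = (
        "SELECT mb.set_id"
        " FROM moviegoer_base mb JOIN moviegoer_set ms ON mb.set_id = ms.set_id"
        f" WHERE ms.movie_name IN ({placeholders})"
        " GROUP BY mb.set_id"
        " HAVING count(*) = ?"  # one matching row per wanted movie
        " ORDER BY mb.set_id"
    )
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]
```

The HAVING count has to equal the number of predicate elements, which is why the two-element query on this slide compares against 2.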
Conjunctive Queries - Unnested External
SELECT mb.set-id, mb.name, mb.street, mb.city, mb.state, mb.zipcode
FROM Moviegoer-Base mb, Moviegoer-Set ms1, Moviegoer-Set ms2
WHERE mb.set-id = ms1.set-id AND mb.set-id = ms2.set-id AND
ms1.movie-name = "movieA50061" AND ms2.movie-name = "movieA50062"
No Set in the Result
Conjunctive Queries - Unnested External
INSERT INTO temp
SELECT DISTINCT mb.set-id, mb.name, mb.street, mb.city, mb.state, mb.zipcode
FROM Moviegoer-Base mb, Moviegoer-Set ms
WHERE mb.set-id = ms.set-id AND
(ms.movie-name = "movieA50061" OR ms.movie-name = "movieA50062")
GROUP BY mb.set-id, mb.name, mb.street, mb.city, mb.state, mb.zipcode
HAVING count(*) = 2
SELECT t.name, t.street, t.city, t.state, t.zipcode, ms.movie-name
FROM temp t, Moviegoer-Set ms
WHERE t.set-id = ms.set-id
Set in the Result
Disjunctive Queries - Unnested External
SELECT DISTINCT mb.set-id, mb.name, mb.street, mb.city, mb.state, mb.zipcode
FROM Moviegoer-Base mb, Moviegoer-Set ms
WHERE mb.set-id = ms.set-id AND
(ms.movie-name = "movieA50061" OR ms.movie-name = "movieA50062")
No Set in the Result
Disjunctive Queries - Unnested External
SELECT DISTINCT mb.set-id, mb.name, mb.street, mb.city, mb.state, mb.zipcode, ms2.movie-name
FROM Moviegoer-Base mb, Moviegoer-Set ms1, Moviegoer-Set ms2
WHERE mb.set-id = ms1.set-id AND ms1.set-id = ms2.set-id AND
(ms1.movie-name = "movieA50061" OR ms1.movie-name = "movieA50062")
Set in the Result