E. Bertino, L. Martino: Object-Oriented Database Systems
Chapter 8. Storage Management and Indexing Techniques
Seoul National University
Department of Computer Engineering
OOPSLA Lab.

Table of Contents
- Storage Techniques for Relational DBMS
- Storage Techniques for Objects
- Clustering Techniques
- Indexing Techniques for OODBMS
- Object Identifiers
- Swizzling

Storage Techniques for Relational DBMS
- Disk Organization
- Storing Records in RDBMS
- Addressing Records with a Slot Vector

Disk Organization
- A disk is divided into partitions, partitions into segments, and segments into pages/blocks
- Disk header
  - number of partitions
  - the address and the size of each partition
  - log for recovery in case of a system crash
- Page addresses for each segment are stored in tables
- Page = page header + offsets of objects + objects

[Figure: disk layout. Labels: disk header, partition 1 ... partition N, log; segment 1 ... segment m; page 1 ... page i; within a page: header, array of offsets, adjacent free space, total free space, and stored objects (Z, A, F, B).]

Storing Records in RDBMS
- Fixed-length records
  - normally stored contiguously on the disk
  - all the records of a relation can be stored in a single file
- Variable-length records
  - stored directly on the disk, each with an ID
  - the structure of the ID has a strong effect on retrieval speed
- Structure of the ID in System R
  - high-order bits identify the segment and the page of the file
  - low-order bits identify a record within the page
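
The ID structure described for System R can be sketched with plain bit operations. This is a minimal illustration only: the bit widths below are assumptions chosen for the example, not System R's actual layout, and `pack_rid` / `unpack_rid` are hypothetical helper names.

```python
# Assumed field widths, for illustration only.
SEGMENT_BITS = 8
PAGE_BITS = 16
SLOT_BITS = 8

def pack_rid(segment, page, slot):
    """Pack (segment, page, slot) into one integer; the slot sits in the low-order bits."""
    assert 0 <= segment < (1 << SEGMENT_BITS)
    assert 0 <= page < (1 << PAGE_BITS)
    assert 0 <= slot < (1 << SLOT_BITS)
    return (segment << (PAGE_BITS + SLOT_BITS)) | (page << SLOT_BITS) | slot

def unpack_rid(rid):
    """Recover (segment, page, slot) from a packed ID."""
    slot = rid & ((1 << SLOT_BITS) - 1)
    page = (rid >> SLOT_BITS) & ((1 << PAGE_BITS) - 1)
    segment = rid >> (PAGE_BITS + SLOT_BITS)
    return segment, page, slot

rid = pack_rid(segment=3, page=1200, slot=7)
print(unpack_rid(rid))   # (3, 1200, 7)
```

Because the page is recoverable from the ID by shifting and masking, locating the page needs no index lookup.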

Addressing Records with a Slot Vector
- Advantages
  - as fast as using the complete address of a record
  - the length of records can be changed
  - the records can be relocated
  - often faster than using a purely logical ID

[Figure: slot vector; each SLOT entry points to a RECORD in the page.]
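
A slot vector can be sketched as a small in-memory page. This is a toy model under stated assumptions: `SlottedPage` is a hypothetical class, the page is append-only, and free space is never reclaimed. It only shows why a record can grow or move while its slot number, and therefore its external ID, stays the same.

```python
class SlottedPage:
    """Toy page: a slot vector maps slot numbers to (offset, length) in a byte buffer."""

    def __init__(self, size=256):
        self.data = bytearray(size)
        self.slots = []        # slot number -> (offset, length)
        self.free_start = 0    # next free byte (append-only for simplicity)

    def insert(self, record):
        offset = self.free_start
        if offset + len(record) > len(self.data):
            raise ValueError("page full")
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        self.slots.append((offset, len(record)))
        return len(self.slots) - 1

    def read(self, slot):
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

    def update(self, slot, record):
        """Rewrite a record, relocating it inside the page if it grew."""
        offset, length = self.slots[slot]
        if len(record) > length:
            offset = self.free_start
            if offset + len(record) > len(self.data):
                raise ValueError("page full")
            self.free_start += len(record)
        self.data[offset:offset + len(record)] = record
        self.slots[slot] = (offset, len(record))

page = SlottedPage()
a = page.insert(b"alice")
b = page.insert(b"bob")
page.update(a, b"alice cooper")   # the record grows and moves; slot number a is unchanged
print(page.read(a), page.read(b))
```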

Storage Techniques for Objects
- Structure of Objects
- Access Patterns to Objects
- Approaches to Storage Organization for Objects
- Storage and Variable Length and Large Attributes
- Storage and Inheritance Hierarchy

Structure of Objects
- Storage/memory organization must support
  - objects with both atomic and complex attributes
  - objects with multi-valued attributes
  - objects with variant attributes
  - objects with long field attributes such as multimedia information, texts, images, voice, etc.
- Efficiency of storage organization depends on
  - the structure of objects and their relations
  - the access pattern, which is the way in which the application programs access the objects

Categories of Access Patterns
- Access based on the whole object
  - for applications which execute complex manipulations of objects by means of specialized programs
  - the whole object is copied into the application's memory
  - direct model
- Access based on the attributes of the object
  - appropriate when large objects need to be accessed
  - used to retrieve attributes of objects along the aggregation hierarchy
  - normalized model

Direct Model of Storage Organization (1)
- Objects are stored in the same way in which they are defined in the conceptual schema
  - storage unit = semantic unit
  - objects of the same class are stored in the same file
- Advantages
  - the simplest approach, and the same as the one used in RDBMS
  - transferring a whole object is very efficient
- Disadvantages
  - access to only a set of attributes of an object can be very expensive

Direct Model of Storage Organization (2)
- Situations where the direct model is inefficient
  - variable-length attributes
  - new attributes
  - the majority of attributes have the null value

Normalized Model of Storage Organization
- Decompose an object into atomic components
- Each component is stored in a different file
- The relation between the components is maintained by OIDs

Intermediate Approach
- Complex objects are decomposed
- Components are grouped together according to access patterns, to be stored in the same file
- Problem
  - efficiency depends on having prior knowledge of the exact access pattern of the applications

Variable-length and Large Attributes
- Normalized method
- Property list method
- Stream (or demand-page) mechanism
  - portions of the object can be transferred in increments

Property List (1)
- Sequence of triples <identifier, size, value>
  - identifier: which attribute of the object is stored
  - size: number of bytes stored
  - value: the value (of varying size) of the attribute

Property List (2)
- Advantages
  - supports variable-length attributes
  - objects can have different sets of attributes
  - sparse attributes take no space when absent
  - attributes can be stored in different physical locations
- Disadvantages
  - the whole property list may have to be scanned to find the desired attribute
  - the property list must be transformed into the proper format for the application programming language
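
The triple layout and the scan cost can be made concrete with a short sketch. The byte encoding (2-byte identifier, 4-byte size, little-endian) and the attribute ids are assumptions made for this example; the slides do not prescribe an encoding.

```python
import struct

HEADER = struct.Struct("<HI")   # assumed encoding: 2-byte attribute id, 4-byte size

def encode(attrs):
    """attrs: dict of attribute id -> bytes value. Absent (null) attributes take no space."""
    out = bytearray()
    for attr_id, value in attrs.items():
        out += HEADER.pack(attr_id, len(value)) + value
    return bytes(out)

def lookup(plist, wanted_id):
    """Scan the triples from the start until the wanted attribute is found."""
    pos = 0
    while pos < len(plist):
        attr_id, size = HEADER.unpack_from(plist, pos)
        pos += HEADER.size
        if attr_id == wanted_id:
            return plist[pos:pos + size]
        pos += size
    return None

NAME, PHOTO, NOTES = 1, 2, 3    # hypothetical attribute ids
plist = encode({NAME: b"Jones", NOTES: b"on leave"})
print(lookup(plist, NOTES))     # b'on leave'
print(lookup(plist, PHOTO))     # None
```

The `lookup` loop is the sequential scan listed as a disadvantage: finding the last attribute means stepping over every triple before it.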

Storage and Inheritance Hierarchy
- Attributes of the superclass should be stored
- Single inheritance
  - store the attributes of the superclass first, then those of the subclass
  - variable-length attributes alongside, with the property list
- Multiple inheritance
  - property lists
  - storing objects separately
  - each of the above contains the fields for the superclass, and they are linked to one another

Clustering Techniques
- Clustering in DBMS
- Clustering in RDBMS
- Clustering in OODBMS
- Static Clustering
- Dynamic Clustering
- Clustering for Multiple Relations

Clustering in DBMS
- Focus
  - partitioning objects in the database
  - placing these partitions on disk
- Aim
  - reduce the number of I/O operations on disk
- Considerations
  - structure of the objects
  - access pattern of applications

Clustering Techniques for RDBMS
- Tuples of a relation in the same page segment
  - on the basis of the value of an attribute or of a combination of attributes in a relation
- Tuples of more than one relation in the same segment
  - one or more attributes in common with the same values
  - efficient for processing queries with join operations

Clustering Techniques for OODBMS
- New considerations compared with RDBMS
  - complex objects
  - single or multiple inheritance
  - methods
- Linear clustering sequence for a complex object
  - all the descendant nodes of each node p in the hierarchy are stored immediately after p, in depth-first order
  - efficient for retrieval of an object and all its descendants
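
The linear clustering sequence amounts to a depth-first (pre-order) traversal of the complex object. The sketch below assumes the object forms a tree with no shared components; `clustering_sequence` and the `Plan[p]` component are hypothetical names used only for illustration.

```python
def clustering_sequence(root, children):
    """Return nodes in depth-first pre-order, so each node's descendants follow it directly."""
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(children.get(node, [])))
    return order

# Hypothetical complex object: a project with nested components.
children = {
    "Project[i]": ["Company[k]", "Plan[p]"],
    "Company[k]": ["Division[k]"],
    "Division[k]": ["Person[x]"],
}
print(clustering_sequence("Project[i]", children))
# ['Project[i]', 'Company[k]', 'Division[k]', 'Person[x]', 'Plan[p]']
```

Writing objects to disk in this order keeps every subtree contiguous, which is why fetching an object together with all its descendants touches few pages.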

Basic Options for Clustering for OODBMS
- Proposed by Won Kim in 1990
  - both clustering techniques as in RDBMS
  - clustering all the instances of classes which belong to an aggregation hierarchy
  - clustering all the instances of classes which belong to the inheritance hierarchy
  - combination of the two previous strategies
- The clustering strategies above are static

Static Clustering
- Unchangeable at run-time
- Problems
  - no consideration of the dynamic evolution of objects
  - objects can be shared among several objects
  - the clustering scheme is based on a single access pattern

Dynamic Clustering
- The sequence of creation of objects would NOT be the same as the desired clustering sequence
- Reorganizing and recompacting pages in a cluster
- Types of file reorganization
  - on-line: finding the optimal one is an NP-complete problem
  - off-line: when will the reorganization be done?
- On-line reorganization technique by Chen and Hurson
  - chunks (sets of pages) as the unit of clustering
  - cost model: ratio between the read and write operations

Clustering for Multiple Relations
- Certain relationships can be used more frequently than others
- Directed graph
  - nodes for objects
  - arcs for relationships
  - weights for ordering relationships
- Clustering algorithm with levels, by Chen and Hurson
  - arranges all the nodes of the graph in a linear sequence
  - nodes connected by heavier arcs are placed nearer to each other than others
  - access time is around half that for randomly placed objects

Indexing Techniques for OODBMS
- Indexing Techniques for Aggregation Hierarchy
- Index Structures and Operations
- Comparison of Index Organization
- Indexing Techniques for Inheritance Hierarchy
- Precomputing and Caching

Preliminary Definitions
- Path
  - a branch in an aggregation hierarchy
- Path instantiation
  - a sequence of objects obtained by instantiating the path
- Nested index
  - an index for a direct connection between the starting object and the ending object of the path instantiation
- Path index
  - an index storing the instantiations of a path
  - same index key as the nested index
- Index key

Example of Aggregation Hierarchy
[Figure: aggregation hierarchy over the classes Project, Company, Division, and Person.]

Definition of Path
- Given an aggregation hierarchy H, a path P is defined as $C_1.A_1.A_2.\ldots.A_n$ ($n \ge 1$), where
  - $C_1$ is a class in H
  - $A_1$ is an attribute of class $C_1$
  - $A_i$ is an attribute of class $C_i$ in H, such that $C_i$ is the domain of the attribute $A_{i-1}$ of class $C_{i-1}$ ($1 < i \le n$)
- length(P): the length of the path
- classes(P): the set of classes along the path
- dom(P): the domain of attribute $A_n$ of class $C_n$

Examples of Path
- P1: Project.main_contracting_company.divisions.head.name
  - length(P1) = 4
  - classes(P1) = { Project, Company, Division, Person }
  - dom(P1) = STRING
- P2: Company.divisions.city
  - length(P2) = 2
  - classes(P2) = { Company, Division }
  - dom(P2) = STRING

Definition of Complete Instantiation
- A complete instantiation (CI) is a sequence of objects along the path
- Given the path $P = C_1.A_1.A_2.\ldots.A_n$, a CI is denoted as $O_1.O_2.\ldots.O_{n+1}$, where
  - $O_1$ is an instance of class $C_1$
  - $O_i$ is the value of the attribute $A_{i-1}$ of object $O_{i-1}$
    • $O_i = O_{i-1}.A_{i-1}$ or $O_i \in O_{i-1}.A_{i-1}$ ($1 < i \le n+1$)
- Examples of CI, where the path is given as P1
  - Project[i].Company[k].Division[k].Person[x].Jones
  - Project[j].Company[i].Division[h].Person[y].Smith

Definition of Partial Instantiation
- A partial instantiation (PI) is the part of a CI which ends at the last object of the CI
- Given a path $P = C_1.A_1.A_2.\ldots.A_n$, a PI is denoted as $O_1.O_2.\ldots.O_j$ ($j < n+1$), where
  - $O_1$ is an instance of a class $C_k$ in classes(P) such that $k + j - 1 = n + 1$
  - $O_i$ is the value of attribute $A_{i-1}$ of an object $O_{i-1}$ ($1 < i \le j$)
- Examples of PI, where the path is given as P1
  - Division[k].Person[x].Jones
  - Division[h].Person[y].Smith

Definition of Redundancy
- A PI $O_1.O_2.\ldots.O_j$ is not redundant if there is no CI or PI $O'_1.O'_2.\ldots.O'_k$ with $k > j$ and $O_i = O'_{k-j+i}$ ($i = 1, \ldots, j$)
- Examples of redundant PI
  - Division[k].Person[x].Jones is redundant with respect to Project[i].Company[k].Division[k].Person[x].Jones
  - Division[h].Person[y].Smith is redundant with respect to Project[j].Company[i].Division[h].Person[y].Smith

Definition of Projection of Path
- A projection is the part of a CI or PI which begins at its first object
- <m>(p) denotes the projection of p with length m
  - $P = C_1.A_1.A_2.\ldots.A_n$
  - p is a PI (or CI) of P, $p = O_1.O_2.O_3.\ldots.O_j$ ($j \le n+1$)
  - <m>(p) = $O_1.O_2.\ldots.O_m$ ($m < j$)
- Example
  - <2>(Project[i].Company[k].Division[k].Person[x].Jones) = Project[i].Company[k]

Multi-index
- An index on each of the classes constituting the path
- Given a path $P = C_1.A_1.A_2.\ldots.A_n$, a multi-index is a set of n simple indices $I_1, I_2, \ldots, I_n$
  - $I_i$ is an index defined on $C_i.A_i$, $1 \le i \le n$
- Solving a nested predicate scans n indices
  - first scan the last index $I_n$ on the path
  - the results of the scan using $I_i$ are used as keys for $I_{i-1}$
- Only for reverse traversal scanning strategies
- Low updating cost

Examples of Multi-index
- First index $I_1$ on Project.main_contracting_company
  - (Company[k], {Project[i]})
  - (Company[i], {Project[j], Project[l]})
- Second index $I_2$ on Company.divisions
  - (Division[h], {Company[i]})
  - (Division[i], {Company[i]})
  - (Division[k], {Company[k]})
- Third index $I_3$ on Division.city
  - (Boston, {Division[h]})
  - (New York, {Division[i]})
  - (Los Angeles, {Division[k]})

Example of Using Multi-index
- Query: select all the projects with a main contracting company which has a division in Los Angeles
  - scan index $I_3$ with key value Los Angeles, giving {Division[k]}
  - scan index $I_2$ with key value Division[k], giving {Company[k]}
  - scan index $I_1$ with key value Company[k], giving {Project[i]}
- Result: {Project[i]}
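
The reverse traversal in this example can be sketched as a loop over the indices from last to first. The index contents are copied from the multi-index example; representing each index as a dictionary of sets, and the function name `solve`, are assumptions made for illustration.

```python
# Index contents from the multi-index example; the dict-of-sets layout is an assumption.
I1 = {"Company[k]": {"Project[i]"}, "Company[i]": {"Project[j]", "Project[l]"}}
I2 = {"Division[h]": {"Company[i]"}, "Division[i]": {"Company[i]"}, "Division[k]": {"Company[k]"}}
I3 = {"Boston": {"Division[h]"}, "New York": {"Division[i]"}, "Los Angeles": {"Division[k]"}}

def solve(indices, key):
    """Scan I_n with the key, then feed each result set as keys into I_{n-1}, ..., I_1."""
    keys = {key}
    for index in reversed(indices):
        found = set()
        for k in keys:
            found |= index.get(k, set())
        keys = found
    return keys

print(solve([I1, I2, I3], "Los Angeles"))   # {'Project[i]'}
```

Each pass corresponds to one index scan, so a path of length n costs n scans per query, which is the price paid for the low update cost.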

Join Index
- Used to perform joins efficiently in the relational model
- Binary join index (BJI) for a binary relation (r, s)
  - one index clustered on r
  - the other index clustered on s
- A BJI can be used in a multi-index organization
  - reverse traversal
  - faster forward traversal when access costs to objects are high, since no database access to the objects is needed
  - more suitable for complex queries

Nested Index
- Direct association between the ending object and the starting object of a path
- Given a path $P = C_1.A_1.A_2.\ldots.A_n$, a nested index on P is defined as a set of pairs (O, S)
  - S = { O' such that there is a CI $O_1.O_2.\ldots.O_{n+1}$ where $O' = O_1$ and $O = O_{n+1}$ }
- Examples
  - (Boston, {Project[j]})
  - (New York, {Project[j], Project[k], Project[l]})
  - (Los Angeles, {Project[i]})
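
Building a nested index from complete instantiations is a one-pass grouping. In this sketch the list `CIS` is a set of complete instantiations of Project.main_contracting_company.divisions.city chosen so that the result reproduces the example entries; the tuple layout and the name `build_nested_index` are assumptions.

```python
from collections import defaultdict

# Complete instantiations chosen to match the example entries (assumed tuple layout).
CIS = [
    ("Project[j]", "Company[i]", "Division[h]", "Boston"),
    ("Project[j]", "Company[i]", "Division[i]", "New York"),
    ("Project[k]", "Company[m]", "Division[j]", "New York"),
    ("Project[l]", "Company[i]", "Division[i]", "New York"),
    ("Project[i]", "Company[k]", "Division[k]", "Los Angeles"),
]

def build_nested_index(instantiations):
    """Map each ending object O_{n+1} to the set of starting objects O_1."""
    index = defaultdict(set)
    for ci in instantiations:
        index[ci[-1]].add(ci[0])
    return dict(index)

nested = build_nested_index(CIS)
print(sorted(nested["New York"]))   # ['Project[j]', 'Project[k]', 'Project[l]']
```

The intermediate Company and Division objects are discarded, which is what makes retrieval a single lookup and updates expensive.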

Properties of Nested Index
- Retrieval is quite fast, since only one index is scanned
- Problem with update operations
  - access to several objects is required
  - forward traversal to determine the value of the indexed attribute
  - reverse traversal to determine all instances at the beginning of the path ==> inverse references

Path Index
- Given a key, all the path instantiations ending with it are stored
- Given a path $P = C_1.A_1.A_2.\ldots.A_n$, a path index on P is defined as a set of pairs (O, S), where
  - S = { <j-1>($p_i$) such that $p_i = O_1.O_2.\ldots.O_j$ ($1 \le j \le n+1$) is a CI or a non-redundant PI of P, and $O_j = O$ }
- Examples
  - (Boston, {Project[j].Company[i].Division[h]})
  - (New York, {Project[j].Company[i].Division[i], Project[k].Company[m].Division[j], Project[l].Company[i].Division[i]})
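
A path index keeps the whole projection rather than only the starting object. The sketch below handles complete instantiations only (partial instantiations are left out for brevity), uses instantiations matching the example entries, and introduces the hypothetical helpers `build_path_index` and `select`.

```python
from collections import defaultdict

# Complete instantiations matching the example entries (assumed tuple layout).
CIS = [
    ("Project[j]", "Company[i]", "Division[h]", "Boston"),
    ("Project[j]", "Company[i]", "Division[i]", "New York"),
    ("Project[k]", "Company[m]", "Division[j]", "New York"),
    ("Project[l]", "Company[i]", "Division[i]", "New York"),
]

def build_path_index(instantiations):
    """Map each key O_j to the projections <j-1>(p) of the instantiations ending in it."""
    index = defaultdict(set)
    for p in instantiations:
        index[p[-1]].add(p[:-1])
    return dict(index)

def select(index, key, position):
    """OIDs found at a 1-based position of the instantiations stored under key."""
    return {p[position - 1] for p in index.get(key, set())}

path_index = build_path_index(CIS)
print(sorted(select(path_index, "New York", 1)))   # ['Project[j]', 'Project[k]', 'Project[l]']
print(sorted(select(path_index, "New York", 2)))   # ['Company[i]', 'Company[m]']
```

Position 1 answers the same query as a nested index; position 2 answers a query on Company from the same entry, which a nested index cannot do.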

Properties of Path Index
- Supports nested predicates on all classes along the path
- Updates of a path index
  - only forward traversals are required
- Identical to the nested index when n = 1

Access Relations
- Similar to path indices
  - all instantiations along a path are stored in a relation
- Examples
  - <Project[i], Company[k], Division[k], Los Angeles>
  - <Project[j], Company[i], Division[h], Boston>
  - <Project[j], Company[i], Division[i], New York>
  - <Project[k], Company[m], Division[j], New York>
  - <Project[l], Company[i], Division[h], Boston>
  - <Project[l], Company[i], Division[i], New York>
- Several subpaths can be stored in different relations

Index Structures using B+tree
- Structure of the internal node
  - n records of <key-length, key, pointer>
- A record of a leaf node in a nested index
  - record-length
  - key-length, key-value
  - number of OIDs associated with the key
  - list of OIDs
- A record of a leaf node in a path index
  - record-length
  - key-length, key-value
  - number of path instantiations associated with the key
  - list of path instantiations

Operations with Nested Index
- To solve a predicate against a nested attribute $A_n$ of class $C_1$
  - a single index scan
  - same cost as solving the predicate on a simple attribute of $C_1$
- For an update operation
  - one forward traversal to find the old key value
  - another forward traversal to find the new key value
  - one reverse traversal to find the OID of the associated object

Operations with Path Index
- To solve a predicate against the nested attribute $A_n$ of class $C_i$ ($1 \le i \le n$)
  - one index scan
  - determine the PIs or CIs associated with the key value
  - extract the OIDs occupying the i-th position in them
- For an update operation
  - one forward traversal to find the old path instantiation
  - another forward traversal to find the new path instantiation

Comparisons of Index Organizations (1)
- Degree of reference sharing
  - important in evaluating an index organization
  - a reference is shared when two or more objects refer to the same object
- Retrieval operation
  - the nested index has the lowest cost
  - the path index has a lower cost than the multi-index
  - the nested index has better performance than the path index
  - the path index allows predicates to be solved for all the classes along a path, but the nested index does not

Comparisons of Index Organizations (2)
- Update operation
  - the multi-index has the lowest cost
  - for paths of length 2, the nested index has a slightly lower cost than the path index
  - for paths of length greater than 2, the nested index has a slightly lower cost than the path index if the updates are executed on the first two classes
  - in other cases the nested index involves a significantly higher cost

Indexing Techniques for Inheritance Hierarchies
- Scope of a query
  - only a given class C
  - the class C and the inheritance hierarchy rooted in C
- Solution based on conventional indices
  - construct an index on an attribute for each of the classes of the subgraph
  - scan all these indices
  - perform the union of their results

Inherited Index
- By Won Kim et al. in 1989
  - direct support for queries on an inheritance subgraph
  - one index on the common attributes for all classes
  - an index entry contains the identifiers of all the classes in the hierarchy
- A leaf node record of an inherited index
  - record length, key length, key value, class directory, then for each class: # of OIDs, (OID1, ..., OIDn)
  - class directory: # of classes, class1 offset, ..., classn offset
- More efficient for all queries whose access scope involves a significant subset of the classes in the hierarchy
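
The effect of the class directory can be sketched with a nested dictionary: one entry per key value, subdivided by class. This is an in-memory illustration under assumptions, not the B+tree record layout itself; `InheritedIndex`, the class names, and the OID strings are hypothetical.

```python
from collections import defaultdict

class InheritedIndex:
    """One index over a whole hierarchy: key value -> class name -> set of OIDs."""

    def __init__(self):
        self.entries = defaultdict(lambda: defaultdict(set))

    def insert(self, key, class_name, oid):
        self.entries[key][class_name].add(oid)

    def lookup(self, key, classes):
        """Union of the OIDs for the key, restricted to the classes in the query scope."""
        per_class = self.entries.get(key, {})
        result = set()
        for c in classes:
            result |= per_class.get(c, set())
        return result

# Hypothetical hierarchy: Employee and Manager are subclasses of Person.
idx = InheritedIndex()
idx.insert("Jones", "Person", "oid1")
idx.insert("Jones", "Employee", "oid2")
idx.insert("Jones", "Manager", "oid3")
idx.insert("Smith", "Employee", "oid4")

print(sorted(idx.lookup("Jones", ["Person", "Employee", "Manager"])))   # ['oid1', 'oid2', 'oid3']
print(sorted(idx.lookup("Jones", ["Manager"])))                         # ['oid3']
```

A query over the whole hierarchy needs one key lookup here, where the conventional solution would scan one index per class and union the results.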

Precomputing and Caching
- Index on attributes
- Index on methods
  - precomputing (caching) the results of method invocations
- How to detect when the precomputed method results are no longer valid?
- Dependency information
  - keeps track of which objects and attributes have been used to compute a given method
  - when an object is modified, all the precomputed results of the methods which have used it are invalidated
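
Dependency tracking at the object level can be sketched as a reverse map from each object to the cached results computed from it. `MethodCache`, the method name `total_salary`, and the OIDs are hypothetical; tracking here is per object, not per attribute.

```python
from collections import defaultdict

class MethodCache:
    """Cache of precomputed method results with object-level dependency tracking."""

    def __init__(self):
        self.results = {}                     # (method name, receiver oid) -> value
        self.dependents = defaultdict(set)    # oid -> cache keys computed from it

    def store(self, method, oid, value, used_oids):
        key = (method, oid)
        self.results[key] = value
        for used in used_oids:
            self.dependents[used].add(key)

    def get(self, method, oid):
        return self.results.get((method, oid))

    def object_modified(self, oid):
        """Invalidate every precomputed result that used the modified object."""
        for key in self.dependents.pop(oid, set()):
            self.results.pop(key, None)

cache = MethodCache()
# Hypothetical method: a division's total salary depends on its employees.
cache.store("total_salary", "Division[k]", 300,
            used_oids=["Division[k]", "Person[x]", "Person[y]"])
print(cache.get("total_salary", "Division[k]"))   # 300
cache.object_modified("Person[x]")
print(cache.get("total_salary", "Division[k]"))   # None
```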

Solutions for Dependency Information
- A relation, by Kemper et al. in 1991
  - record <oid_i, method-name, <oid_1, ..., oid_n>>
  - oid_i was used to compute the method method-name with input parameters <oid_1, ..., oid_n>
- By Bertino and Quarati in 1992
  - for local methods, all the dependency information is stored in the object itself
  - for other methods, it is stored in a special object
    • all the objects whose attributes are used in the precomputation of the method hold a reference to the special object

Identification of Attributes used in Precomputing
- Static approach
  - inspection of the method implementation determines all attributes that can possibly be used in its execution
  - the system keeps the list of attributes used in the method
  - on modification of an attribute, the system invalidates a method only if it uses the modified attribute
  - the same method precomputed on different objects may use different sets of attributes
- Dynamic approach
  - attributes are determined only when the method is actually precomputed

Object Identifiers
- An OID is used to
  - refer to an object
  - represent relations between objects
- Physical OID
  - actual address of the object
- Logical OID
  - index from which the address of the object is obtained
- The choice influences the performance of an OODBMS

Types of OID (1)
- Physical address
  - rarely used in OODBMS
- Structured address
  - (segment number, page number) + (logical slot number)
  - retrieves an object with a single page access
  - if the object has been moved to another page, two disk reads are required to retrieve it

Types of OID (2)
- Surrogate OID
  - purely logical
  - not very efficient in object retrieval
  - transformed into an address by a hash function
  - GemStone, POSTGRES
- Typed surrogate OID
  - (Type_ID, OID)
  - similar performance to that of the surrogate OID
  - more difficult to change the type of an object
  - ORION, ITASCA

Length of the OIDs
- Another factor which affects performance
- 32- to 48-bit OIDs
  - affect the overall size of a DB
  - 32-bit OIDs can identify thousands of millions of objects ($2^{32} \approx 4.3 \times 10^9$)
- 64-bit OIDs in the following situations
  - the OID must be unique for the entire life of the object
  - surrogates are generated by a monotonically increasing function
  - distributed environment

Swizzling
- Transformation of an OID into a memory address when the object is retrieved from disk into main memory
- Advantage
  - increases the speed of navigation between objects using OIDs
- Disadvantages
  - costly process
  - not the best solution for infrequently referenced objects

Alternatives to Swizzling (1)
- Tables mapping OIDs to object memory addresses
  - when objects will be swapped out with high probability
  - when the references are not used frequently
  - Objectivity/DB
- Combining swizzling with disk imaging
  - the main memory address is physically written over the field of the object which contains the OID
  - before writing the object back to disk, all the swizzled OIDs must be transformed back into OIDs
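
The swizzle and unswizzle steps can be sketched with Python object references standing in for memory addresses. `Obj`, `swizzle`, `unswizzle`, and the `resident` table are hypothetical names for this illustration; a real system overwrites the OID field with a raw pointer.

```python
class Obj:
    def __init__(self, oid, ref_oid=None):
        self.oid = oid
        self.ref = ref_oid    # an OID on disk, or a direct object reference once swizzled

def swizzle(obj, resident):
    """Replace the OID in obj.ref by a direct reference to the in-memory object."""
    if not isinstance(obj.ref, Obj) and obj.ref in resident:
        obj.ref = resident[obj.ref]

def unswizzle(obj):
    """Restore the OID before the object is written back to disk."""
    if isinstance(obj.ref, Obj):
        obj.ref = obj.ref.oid

resident = {}    # OID -> in-memory object (a resident object table)
company = Obj("Company[k]")
project = Obj("Project[i]", ref_oid="Company[k]")
resident[company.oid] = company
resident[project.oid] = project

swizzle(project, resident)
print(project.ref is company)    # True: navigation no longer needs a table lookup
unswizzle(project)
print(project.ref)               # Company[k]
```

Keeping only the `resident` table and never calling `swizzle` corresponds to the table-mapping alternative: every navigation pays a lookup, but nothing has to be converted back before a write.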

Alternatives to Swizzling (2)
- Maintenance of the OIDs in the swizzled format
  - at the creation of the object
    • it is assigned a fixed address in adjacent segments of virtual memory (VM)
  - at the loading of the object into main memory
    • map the object to the same virtual memory address
    • if impossible, the object on the page is transformed to be placed at another VM address
  - limits the total number of objects in the database to the maximum size of the VM
  - ObjectStore

When to Execute Swizzling
- The first time an application retrieves an object from disk
- The first time a reference has to be followed
- On application request, by an explicit call to the OODBMS at run-time