
E. Bertino, L. Martino: Object-Oriented Database Systems

Chapter 8. Storage Management and Indexing Techniques

Seoul National University

Department of Computer Engineering

OOPSLA Lab.


Table of Contents

• Storage Techniques for Relational DBMS
• Storage Techniques for Objects
• Clustering Techniques
• Indexing Techniques for OODBMS
• Object Identifiers
• Swizzling



Storage Techniques for Relational DBMS

• Disk Organization
• Storing Records in RDBMS
• Addressing Records with a Slot Vector



Disk Organization

Disk → partitions → segments → pages/blocks

Disk header
• number of partitions
• address and size of each partition
• log for recovery in case of a system crash

Page addresses for each segment are stored in tables

Page = page header + offsets of objects + objects



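[Figure: disk layout. The disk consists of a header, partitions 1 to N, and a log; each partition consists of segments 1 to m; each segment consists of pages 1 to i; each page consists of a header, an array of offsets, free space (adjacent free space and total free space), and the stored objects.]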



Storing Records in RDBMS

Fixed-length records
• normally stored contiguously on disk
• all the records of a relation can be stored in a single file

Variable-length records
• stored directly on disk and located through an ID
• the structure of the ID is important for retrieval speed

Structure of the ID in System R
• high-order bits: the segment and the page of the file
• low-order bits: the record within the page



Addressing Records with a Slot Vector

Advantages
• as fast as using the complete address of a record
• the length of a record can change
• records can be relocated
• often faster than using a purely logical ID

[Figure: a SLOT entry of the slot vector pointing to a RECORD in the page]
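
The slot-vector idea can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: the class name `SlottedPage`, the 256-byte page size, and the policy of appending a grown record at the end of the used area are all hypothetical. What it shows is that a record is addressed by its slot number, so it can change length or move inside the page without its ID changing.

```python
class SlottedPage:
    """Toy page: a record is addressed by its slot number; the slot holds (offset, length)."""

    def __init__(self, size=256):          # page size is an arbitrary assumption
        self.data = bytearray(size)
        self.slots = []                     # slot number -> (offset, length)
        self.free = 0                       # first unused byte of the page

    def _append(self, record: bytes) -> int:
        if self.free + len(record) > len(self.data):
            raise MemoryError("page full")
        offset = self.free
        self.data[offset:offset + len(record)] = record
        self.free += len(record)
        return offset

    def insert(self, record: bytes) -> int:
        offset = self._append(record)
        self.slots.append((offset, len(record)))
        return len(self.slots) - 1          # the slot number is the stable part of the ID

    def read(self, slot: int) -> bytes:
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

    def update(self, slot: int, record: bytes) -> None:
        """Rewrite a record; if it no longer fits in place, relocate it and fix the slot."""
        offset, length = self.slots[slot]
        if len(record) <= length:
            self.data[offset:offset + len(record)] = record
        else:
            offset = self._append(record)   # old bytes are simply abandoned in this sketch
        self.slots[slot] = (offset, len(record))


page = SlottedPage()
rid = page.insert(b"Jones")
other = page.insert(b"Smith")
page.update(rid, b"Jones, A.")              # the record grows and moves; rid is unchanged
```

A real page would also reclaim the abandoned space by compaction, which again only requires updating offsets in the slot vector.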



Storage Techniques for Objects

• Structure of Objects
• Access Patterns to Objects
• Approaches to Storage Organization for Objects
• Storage and Variable Length and Large Attributes
• Storage and Inheritance Hierarchy



Structure of objects

Storage/memory organization must support
• objects with both atomic and complex attributes
• objects with multi-valued attributes
• objects with variant attributes
• objects with long-field attributes such as multimedia information (text, images, voice, etc.)

Efficiency of the storage organization depends on
• the structure of objects and their relations
• the access pattern, i.e. the way in which application programs access the objects



Categories of Access Patterns

Access based on the whole object
• for applications that execute complex manipulations of objects by means of specialized programs
• the whole object is copied into the application's memory
• corresponds to the direct model

Access based on the attributes of the object
• appropriate when large objects need to be accessed
• used to retrieve attributes of objects along the aggregation hierarchy
• corresponds to the normalized model



Direct Model of Storage Organization(1)

Objects are stored in the same way in which they are defined in the conceptual schema
• storage unit = semantic unit
• objects of the same class are stored in the same file

Advantages
• the simplest model, and the same as the one used in RDBMS
• transferring a whole object is very efficient

Disadvantages
• access to a set of attributes of an object can be very expensive



Direct Model of Storage Organization(2)

Situations where the direct model is inefficient
• variable-length attributes
• new attributes
• the majority of attributes have the null value



Normalized Model of Storage Organization

• Decompose an object into atomic components
• Each component is stored in a different file
• Relations between the components are maintained by OIDs



Intermediate Approach

• Complex objects are decomposed
• Components are grouped together according to access patterns, to be stored in the same file
• Problem: efficiency depends on having prior knowledge of the exact access pattern of the applications



Variable-length and Large Attributes

• Normalized method
• Property list method
• Stream (or demand-page) mechanism: portions of the object can be transferred in increments



Property list(1)

Sequence of triples <identifier, size, value>
• identifier: which attribute of the object is stored
• size: number of bytes stored
• value: the value (of varying size) of the attribute
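
As a rough illustration of the triple format, the sketch below serializes a property list and looks an attribute up by scanning it. The byte layout (a 2-byte identifier followed by a 4-byte size, little-endian) and the function names `encode` and `find` are assumptions made for the example, not something the slides specify.

```python
import struct

# Assumed header layout: 2-byte attribute identifier, 4-byte value size.
HEADER = struct.Struct("<HI")


def encode(props):
    """Serialize (identifier, value-bytes) pairs as <identifier, size, value> triples."""
    out = bytearray()
    for attr_id, value in props:
        out += HEADER.pack(attr_id, len(value))
        out += value
    return bytes(out)


def find(buf, wanted_id):
    """Scan the list from the start until the wanted identifier is found."""
    pos = 0
    while pos < len(buf):
        attr_id, size = HEADER.unpack_from(buf, pos)
        pos += HEADER.size
        if attr_id == wanted_id:
            return buf[pos:pos + size]
        pos += size                      # skip the value of an unwanted attribute
    return None                          # attribute absent (e.g. null or never set)


# Only the attributes that actually have a value are stored.
plist = encode([(1, b"Jones"), (7, b"Boston")])
city = find(plist, 7)
missing = find(plist, 3)
```

The linear scan in `find` is the cost the property list pays for handling sparse and variable-length attributes.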



Property List(2)

Advantages: the property list handles
• variable-length attributes
• objects with different sets of attributes
• sparse attributes
• attributes stored in different physical locations

Disadvantages
• the whole property list may have to be scanned to find the desired attribute
• the property list must be transformed into the proper format for the application programming language



Storage and Inheritance hierarchy

Attributes of the superclass should be stored

Single inheritance
• store the attributes of the superclass first, then those of the subclass
• variable-length attributes are stored alongside, using the property list

Multiple inheritance
• property lists
• storing the object in separate parts, each containing the fields of one superclass and linked to one another



Clustering Techniques

• Clustering in DBMS
• Clustering in RDBMS
• Clustering in OODBMS
• Static Clustering
• Dynamic Clustering
• Clustering for Multiple Relations



Clustering in DBMS

Focus
• partitioning the objects in the database
• placing these partitions on disk

Aim
• reduce the number of I/O operations on disk

Considerations
• structure of the objects
• access patterns of applications



Clustering Techniques for RDBMS

Tuples of a relation in the same page segment
• on the basis of the value of an attribute, or of a combination of attributes, in the relation

Tuples of more than one relation in the same segment
• the relations have one or more attributes in common, with the same values
• efficient for processing queries with join operations



Clustering Techniques for OODBMS

New considerations compared with RDBMS
• complex objects
• single or multiple inheritance
• methods

Linear clustering sequence for complex objects
• all the descendant nodes of each node p in the hierarchy are stored immediately after p, in depth-first order
• efficient for retrieving an object together with all its descendants
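
The depth-first placement order can be sketched as follows. The node names and the `children` mapping are hypothetical, and the sketch assumes a tree (no component shared by several parents), which is exactly the case where a single linear sequence works well.

```python
def linear_sequence(root, children):
    """Return the storage order: every node is followed by all its descendants."""
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children in reverse so that the first child is placed first.
        stack.extend(reversed(children.get(node, [])))
    return order


# Hypothetical complex object A with components B and C, and so on.
children = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
sequence = linear_sequence("A", children)
```

With this order, reading an object and its whole subtree touches one contiguous run of the sequence.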



Basic Options for Clustering for OODBMS

Proposed by Won Kim in 1990
• both clustering techniques used in RDBMS
• clustering all the instances of classes that belong to an aggregation hierarchy
• clustering all the instances of classes that belong to the inheritance hierarchy
• a combination of the two previous strategies

The clustering strategies above are static



Static Clustering

Unchangeable at run-time

Problems
• no consideration of the dynamic evolution of objects
• objects can be shared among several objects
• the clustering scheme is based on a single access pattern



Dynamic Clustering

The sequence of creation of objects would NOT be the same as the desired clustering sequence.

Reorganizing and recompacting pages in a cluster

Types of file reorganization
• on-line: finding the optimal reorganization is an NP-complete problem
• off-line: when should the reorganization be done?

On-line reorganization technique by Chen and Hurson
• chunks (sets of pages) as the unit of clustering
• cost model
• ratio between the read and write operations



Clustering for Multiple Relations

Certain relationships can be used more frequently than others

Directed graph
• nodes for objects
• arcs for relationships
• weights for ordering the relationships

Clustering algorithm with levels, by Chen and Hurson
• arranges all the nodes of the graph in a linear sequence
• nodes connected by heavier arcs are placed nearer to each other than the others
• access time is around half that for randomly placed objects



Indexing Techniques for OODBMS

• Indexing Techniques for Aggregation Hierarchy
• Index Structures and Operations
• Comparison of Index Organization
• Indexing Techniques for Inheritance Hierarchy
• Precomputing and Caching



Preliminary Definitions

Path
• a branch in an aggregation hierarchy

Path instantiation
• a sequence of objects obtained by instantiating the path

Nested index
• an index providing a direct connection between the starting object and the ending object of a path instantiation

Path index
• an index storing the instantiations of a path
• same index key as the nested index

Index Key



Example of Aggregation Hierarchy

[Figure: aggregation hierarchy over the classes Project, Company, Division and Person]



Definition of Path

Given an aggregation hierarchy H, a path P is defined as $C_1.A_1.A_2.\ldots.A_n$ ($n \ge 1$), where
• $C_1$ is a class in H
• $A_1$ is an attribute of class $C_1$
• $A_i$ is an attribute of a class $C_i$ in H, such that $C_i$ is the domain of the attribute $A_{i-1}$ of class $C_{i-1}$ ($1 < i \le n$)

length(P): the length of the path
classes(P): the set of classes along the path
dom(P): the domain of attribute $A_n$ of class $C_n$



Examples of Path

P1: Project.main_contracting_company.divisions.head.name
• length(P1) = 4
• classes(P1) = {Project, Company, Division, Person}
• dom(P1) = STRING

P2: Company.divisions.city
• length(P2) = 2
• classes(P2) = {Company, Division}
• dom(P2) = STRING



Definition of Complete Instantiation

A complete instantiation (CI) is a sequence of objects along a path

Given the path P = $C_1.A_1.A_2.\ldots.A_n$, a CI is denoted as $O_1.O_2.\ldots.O_{n+1}$, where
• $O_1$ is an instance of class $C_1$
• $O_i$ is the value of the attribute $A_{i-1}$ of object $O_{i-1}$
  • $O_i = O_{i-1}.A_{i-1}$ or $O_i \in O_{i-1}.A_{i-1}$ ($1 < i \le n+1$)

Examples of CI, where the path is P1
• Project[i].Company[k].Division[k].Person[x].Jones
• Project[j].Company[i].Division[h].Person[y].Smith
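
A complete instantiation can be obtained by a forward traversal of the path. The sketch below assumes a toy in-memory database, `db`, mapping each OID to a dictionary of attribute values; the function name `instantiations` and this representation are illustrative only. A multi-valued attribute (a list, set or tuple) produces one instantiation per member, matching the $O_i \in O_{i-1}.A_{i-1}$ case.

```python
def instantiations(db, start_oid, attrs):
    """Follow the attributes of a path from start_oid; return all complete instantiations."""
    paths = [[start_oid]]
    for attr in attrs:
        extended = []
        for path in paths:
            value = db[path[-1]][attr]
            # Single-valued attribute: one successor; multi-valued: one per member.
            members = value if isinstance(value, (list, set, tuple)) else [value]
            for member in members:
                extended.append(path + [member])
        paths = extended
    return paths


# Objects taken from the first CI example of path P1.
db = {
    "Project[i]": {"main_contracting_company": "Company[k]"},
    "Company[k]": {"divisions": ["Division[k]"]},
    "Division[k]": {"head": "Person[x]"},
    "Person[x]": {"name": "Jones"},
}
P1 = ["main_contracting_company", "divisions", "head", "name"]
cis = instantiations(db, "Project[i]", P1)
ci_text = ".".join(cis[0])
```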



Definition of Partial Instantiation

A partial instantiation (PI) is a part of a CI that ends at the last object of the CI

Given a path P = $C_1.A_1.A_2.\ldots.A_n$, a PI is denoted as $O_1.O_2.\ldots.O_j$ ($j < n+1$), where
• $O_1$ is an instance of a class $C_k$ in classes(P) such that $k + j - 1 = n + 1$
• $O_i$ is the value of the attribute $A_{i-1}$ of object $O_{i-1}$

Examples of PI, where the path is P1
• Division[k].Person[x].Jones
• Division[h].Person[y].Smith



Definition of Redundancy

A PI $O_1.O_2.\ldots.O_j$ is non-redundant if there is no CI or PI $O'_1.O'_2.\ldots.O'_k$ with $k > j$ such that $O_i = O'_{k-j+i}$ ($i = 1, \ldots, j$)

Examples of redundant PIs
• Division[k].Person[x].Jones is redundant with respect to Project[i].Company[k].Division[k].Person[x].Jones
• Division[h].Person[y].Smith is redundant with respect to Project[j].Company[i].Division[h].Person[y].Smith



Definition of Projection of Path

A projection of a path instantiation is the part of a CI or PI that begins at its first object

$\langle m \rangle(p)$ denotes the projection of p with length m
• P = $C_1.A_1.A_2.\ldots.A_n$
• p is a PI (or CI) of P: p = $O_1.O_2.O_3.\ldots.O_j$ ($j \le n+1$)
• $\langle m \rangle(p) = O_1.O_2.\ldots.O_m$ ($m < j$)

Example
• $\langle 2 \rangle$(Project[i].Company[k].Division[k].Person[x].Jones) = Project[i].Company[k]



Multi-index

An index is allocated on each of the classes constituting the path

Given a path P = $C_1.A_1.A_2.\ldots.A_n$, a multi-index is a set of n simple indices $I_1, I_2, \ldots, I_n$
• $I_i$ is an index defined on $C_i.A_i$ ($1 \le i \le n$)

Solving a nested predicate scans n indices
• the last index $I_n$ on the path is scanned first
• the results of the scan of $I_i$ are used as keys for $I_{i-1}$

Supports only reverse-traversal scanning strategies

Low update cost



Examples of Multi-index

First index I1 on Project.main_contracting_company
• (Company[k], {Project[i]})
• (Company[i], {Project[j], Project[l]})

Second index I2 on Company.divisions
• (Division[h], {Company[i]})
• (Division[i], {Company[i]})
• (Division[k], {Company[k]})

Third index I3 on Division.city
• (Boston, {Division[h]})
• (New York, {Division[i]})
• (Los Angeles, {Division[k]})



Example of Using Multi-index

Query: select all the projects whose main contracting company has a division in Los Angeles
• scan index I3 with key value Los Angeles → {Division[k]}
• scan index I2 with key value Division[k] → {Company[k]}
• scan index I1 with key value Company[k] → {Project[i]}
• result: {Project[i]}
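
The reverse traversal over the three indices can be written out as a short sketch. The index contents are the ones listed for I1, I2 and I3; representing each index as a Python dictionary and the function name `lookup` are assumptions made for the illustration.

```python
# Each simple index maps a key to the set of OIDs whose attribute has that value.
I1 = {"Company[k]": {"Project[i]"}, "Company[i]": {"Project[j]", "Project[l]"}}
I2 = {"Division[h]": {"Company[i]"}, "Division[i]": {"Company[i]"},
      "Division[k]": {"Company[k]"}}
I3 = {"Boston": {"Division[h]"}, "New York": {"Division[i]"},
      "Los Angeles": {"Division[k]"}}


def lookup(indices, key):
    """Scan the indices from the last to the first, feeding results back as keys."""
    keys = {key}
    for index in reversed(indices):
        found = set()
        for k in keys:
            found |= index.get(k, set())
        keys = found
    return keys


projects = lookup([I1, I2, I3], "Los Angeles")
```

Each loop iteration corresponds to one index scan, so a path of length n costs n scans, which is the price paid for the low update cost.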



Join Index

Used to perform joins efficiently in the relational model

Binary join index (BJI) for a binary relation (r, s)
• one index clustered on r
• the other index clustered on s

A BJI can be used in a multi-index organization
• reverse traversal
• faster forward traversal when the access costs to objects are high, since no database access to the objects is needed
• more suitable for complex queries



Nested Index

Direct association between the ending object and the starting object of a path

Given a path P = $C_1.A_1.A_2.\ldots.A_n$, a nested index on P is defined as a set of pairs (O, S), where
$S = \{\, O' \mid \text{there is a CI } O_1.O_2.\ldots.O_{n+1} \text{ such that } O' = O_1 \text{ and } O = O_{n+1} \,\}$

Examples
• (Boston, {Project[j], Project[l]})
• (New York, {Project[j], Project[k], Project[l]})
• (Los Angeles, {Project[i]})



Properties of Nested Index

Retrieval is quite fast, since only one index is scanned

Problem with update operations: access to several objects is required
• forward traversal to determine the value of the indexed attribute
• reverse traversal to determine all the instances at the beginning of the path ==> inverse references are needed



Path Index

For each key, all the path instantiations ending in that key are stored

Given a path P = $C_1.A_1.A_2.\ldots.A_n$, a path index on P is defined as a set of pairs (O, S), where
$S = \{\, \langle j-1 \rangle(p_i) \mid p_i = O_1.O_2.\ldots.O_j\ (1 \le j \le n+1) \text{ is a CI or a non-redundant PI of } P \text{ and } O_j = O \,\}$

Examples
• (Boston, {Project[j].Company[i].Division[h], Project[l].Company[i].Division[h]})
• (New York, {Project[j].Company[i].Division[i], Project[k].Company[m].Division[j], Project[l].Company[i].Division[i]})



Properties of Path Index

Supports nested predicates on all the classes along the path

Updates of a path index
• only forward traversals are required

Identical to the nested index when n = 1



Access Relations

Similar to path indices
• all the instantiations along a path are stored in a relation

Examples
• <Project[i], Company[k], Division[k], Los Angeles>
• <Project[j], Company[i], Division[h], Boston>
• <Project[j], Company[i], Division[i], New York>
• <Project[k], Company[m], Division[j], New York>
• <Project[l], Company[i], Division[h], Boston>
• <Project[l], Company[i], Division[i], New York>

A path can be split into several subpaths stored in different relations
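
One way to see how the organizations relate is to derive nested-index and path-index entries from the access-relation tuples listed on this slide. The sketch below does that for complete instantiations only (partial instantiations are ignored); the function names and the dictionary representation are assumptions, not part of the original material.

```python
from collections import defaultdict

# Tuples of the access relation: <Project, Company, Division, city>.
access_relation = [
    ("Project[i]", "Company[k]", "Division[k]", "Los Angeles"),
    ("Project[j]", "Company[i]", "Division[h]", "Boston"),
    ("Project[j]", "Company[i]", "Division[i]", "New York"),
    ("Project[k]", "Company[m]", "Division[j]", "New York"),
    ("Project[l]", "Company[i]", "Division[h]", "Boston"),
    ("Project[l]", "Company[i]", "Division[i]", "New York"),
]


def nested_index(rows):
    """Key = last component; value = set of starting objects only."""
    index = defaultdict(set)
    for row in rows:
        index[row[-1]].add(row[0])
    return dict(index)


def path_index(rows):
    """Key = last component; value = set of whole instantiations (without the key)."""
    index = defaultdict(set)
    for row in rows:
        index[row[-1]].add(row[:-1])
    return dict(index)


def oids_at(pindex, key, position):
    """Solve a predicate for the class at a given position along the path."""
    return {inst[position] for inst in pindex.get(key, set())}


nested = nested_index(access_relation)
paths = path_index(access_relation)
companies_in_ny = oids_at(paths, "New York", 1)
```

The nested index keeps only the first component of each tuple, which is why it is smaller and faster to scan but cannot answer predicates for the intermediate classes, whereas `oids_at` can.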



Index Structures using B+-tree

Structure of the internal node
• n records of <key-length, key, pointer>

A record of a leaf node in a nested index
• record-length
• key-length, key-value
• number of OIDs associated with the key
• list of OIDs

A record of a leaf node in a path index
• record-length
• key-length, key-value
• number of path instantiations associated with the key
• list of path instantiations



Operations with Nested Index

To solve a predicate against a nested attribute $A_n$ of class $C_1$
• a single index scan
• the same cost as solving the predicate on a simple attribute of $C_1$

For an update operation
• one forward traversal to find the old key value
• a second forward traversal to find the new key value
• one reverse traversal to find the OIDs of the associated objects



Operations with Path Index

To solve a predicate against the nested attribute $A_n$ of class $C_i$ ($1 \le i \le n$)
• one index scan
• determine the PIs or CIs associated with the key value
• extract the OIDs occupying the i-th position in them

For an update operation
• one forward traversal to find the old path instantiation
• a second forward traversal to find the new path instantiation



Comparisons of Index Organizations(1)

Degree of reference sharing
• important in evaluating an index organization
• a reference is shared when two or more objects refer to the same object

Retrieval operation
• the nested index has the lowest cost
• the path index has a lower cost than the multi-index
• the nested index performs better than the path index
• the path index, unlike the nested index, allows predicates to be solved for all the classes along a path



Comparisons of Index Organizations(2)

Update operation
• the multi-index has the lowest cost
• for paths of length 2, the nested index has a slightly lower cost than the path index
• for paths of length greater than 2, the nested index has a slightly lower cost than the path index if the updates are executed on the first two classes
• in the other cases, the nested index involves a significantly higher cost



Indexing Techniques for Inheritance Hierarchies

Scope of a query
• only a given class C
• the class C and the inheritance hierarchy rooted at C

Solution based on conventional indices
• construct an index on the attribute for each of the classes of the subgraph
• scan all these indices
• take the union of their results



Inherited Index

By Won Kim et al. in 1989
• direct support for queries on an inheritance subgraph
• one index on the common attribute for all the classes
• an index entry contains the identifiers of all the classes in the hierarchy

More efficient for all queries whose access scope involves a significant subset of the classes in the hierarchy

A leaf node of an inherited index
• record length
• key length
• key value
• class directory: number of classes, then (class1, offset) ... (classn, offset)
• for each class: number of OIDs, (OID1, ..., OIDn)
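
A minimal in-memory sketch of the idea: one index for the whole hierarchy, in which each key leads to a small class directory, and each directory entry leads to the OIDs of that class. The class names (Vehicle, Car, Truck), the attribute values, and the `InheritedIndex` API are hypothetical and chosen only for illustration; offsets into a leaf record are replaced by a nested dictionary.

```python
class InheritedIndex:
    """One index on a common attribute for all the classes of a hierarchy."""

    def __init__(self):
        self.entries = {}       # key value -> {class name -> set of OIDs}

    def insert(self, key, cls, oid):
        self.entries.setdefault(key, {}).setdefault(cls, set()).add(oid)

    def search(self, key, classes):
        """Single lookup; the class directory selects the classes in the query scope."""
        directory = self.entries.get(key, {})
        result = set()
        for cls in classes:
            result |= directory.get(cls, set())
        return result


index = InheritedIndex()
index.insert("red", "Vehicle", "Vehicle[1]")
index.insert("red", "Car", "Car[7]")
index.insert("red", "Truck", "Truck[3]")
index.insert("blue", "Car", "Car[9]")

# Scope: the subclasses Car and Truck only.
red_cars_and_trucks = index.search("red", ["Car", "Truck"])
# Scope: the whole hierarchy rooted at Vehicle.
all_red = index.search("red", ["Vehicle", "Car", "Truck"])
```

With one conventional index per class, the second query would need three index scans and a union; here it is a single lookup.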



Precomputing and Caching

Index on attributes

Index on methods
• precomputing (caching) the results of method invocations

How to detect when the precomputed method results are no longer valid?

Dependency information
• keeps track of which objects and attributes have been used to compute a given method
• when an object is modified, all the precomputed results of the methods that used it are invalidated
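
The invalidation mechanism can be sketched as a cache plus a reverse map from (object, attribute) pairs to the cached results that read them. The class `MethodCache`, its method names, and the example method `head_name` are hypothetical; real systems store this dependency information persistently, as described on the following slides.

```python
class MethodCache:
    """Cache of precomputed method results with dependency-based invalidation."""

    def __init__(self):
        self.results = {}       # (oid, method name) -> precomputed value
        self.used_by = {}       # (oid, attribute) -> set of cache keys that read it

    def store(self, key, value, reads):
        """Record a result together with the (oid, attribute) pairs it was computed from."""
        self.results[key] = value
        for dep in reads:
            self.used_by.setdefault(dep, set()).add(key)

    def get(self, key):
        return self.results.get(key)        # None means: recompute

    def on_update(self, oid, attribute):
        """Called when an attribute is modified: drop every result that used it."""
        for key in self.used_by.pop((oid, attribute), set()):
            self.results.pop(key, None)     # may already be gone; that is harmless


cache = MethodCache()
cache.store(("Division[k]", "head_name"), "Jones",
            reads=[("Division[k]", "head"), ("Person[x]", "name")])
before = cache.get(("Division[k]", "head_name"))
cache.on_update("Person[x]", "name")        # the person's name changes
after = cache.get(("Division[k]", "head_name"))
```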



Solutions for Dependency Information

A relation, by Kemper et al. in 1991
• records of the form <oid_i, method-name, <oid_1, ..., oid_n>>
• oid_i was used to compute the method method-name with input parameters <oid_1, ..., oid_n>

By Bertino and Quarati in 1992
• for local methods, all the dependency information is stored in the object itself
• for other methods, it is stored in a special object
  • all the objects whose attributes are used in the precomputation of the method hold a reference to the special object



Identification of Attributes used in Precomputing

Static approach
• inspection of the method implementation determines all the attributes that can possibly be used in its execution
• the system keeps the list of attributes used in the method
• on modification of an attribute, the system invalidates a method only if it uses the modified attribute
• the same method precomputed on different objects may use different sets of attributes

Dynamic approach
• the attributes are determined only when the method is actually precomputed



Object Identifiers

An OID is used to
• refer to an object
• represent relations between objects

Physical OID
• the actual address of the object

Logical OID
• an index from which the address of the object is obtained

The kind of OID influences the performance of an OODBMS



Types of OID(1)

Physical address
• rarely used in OODBMS

Structured address
• (segment number, page number) + (logical slot number)
• an object is retrieved with a single page access
• once the object has been moved to another page, retrieving it requires two disk reads



Types of OID(2)

Surrogate OID
• purely logical
• not very efficient for object retrieval
• transformed into an address by a hash function
• GemStone, POSTGRES

Typed surrogate OID
• (Type_ID, OID)
• performance similar to that of the surrogate OID
• more difficult to change the type of an object
• ORION, ITASCA



Length of the OIDs

Another factor that affects performance

32- to 48-bit OIDs
• affect the overall size of a DB
• 32-bit OIDs can identify several thousand million objects

64-bit OIDs are used in the following situations
• the OID must be unique for the entire life of the object
• surrogates are generated by a monotonically increasing function
• distributed environment
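
As a quick check of the orders of magnitude involved (plain arithmetic, not taken from the slides), the number of distinct identifiers for the OID lengths mentioned here is:

```latex
\[
2^{32} = 4\,294\,967\,296 \approx 4.3 \times 10^{9}, \qquad
2^{48} \approx 2.8 \times 10^{14}, \qquad
2^{64} \approx 1.8 \times 10^{19}
\]
```

A monotonically increasing generator that never reuses values exhausts a 32-bit space after about four thousand million creations, which is one reason for moving to 64 bits.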



Swizzling

Transformation of an OID into a memory address when the object is retrieved from disk into main memory

Advantage
• increases the speed of navigation between objects through OIDs

Disadvantages
• a costly process
• not the best solution for infrequently referenced objects



Alternatives to Swizzling(1)

Tables mapping OIDs to object memory addresses
• suitable when objects will be swapped out with high probability
• suitable when the references are not used frequently
• Objectivity/DB

Combining swizzling with disk imaging
• the main-memory address is physically written over the field of the object that contains the OID
• before the object is written back to disk, all the swizzled OIDs must be transformed back into OIDs



Alternatives to Swizzling(2)

Maintenance of the OIDs in the swizzled format
• at the creation of the object
  • it is assigned a fixed address in adjacent segments of virtual memory (VM)
• at the loading of the object into main memory
  • the object is mapped to the same VM address
  • if that is impossible, the object on the page is transformed to be placed at another VM address
• limits the total number of objects in the database to the maximum size of the VM
• ObjectStore



When to Execute Swizzling

• The first time an application retrieves an object from disk
• The first time a reference has to be followed
• On application request, through an explicit call to the OODBMS at run-time
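
The second option, swizzling when a reference is first followed, can be sketched as below. In Python a direct object reference stands in for a memory address; the classes `Ref` and `Store`, the `loads` counter, and the stored objects are all hypothetical and serve only to make the effect visible.

```python
class Store:
    """Stand-in for the object store; counts how often an OID has to be resolved."""

    def __init__(self, objects):
        self.objects = objects      # OID -> object
        self.loads = 0

    def load(self, oid):
        self.loads += 1             # in a real system: table lookup and possibly disk I/O
        return self.objects[oid]


class Ref:
    """A reference field: holds an OID until first followed, then the object itself."""

    def __init__(self, oid):
        self.oid = oid
        self.target = None          # set once the reference has been swizzled

    def deref(self, store):
        if self.target is None:     # first use: resolve the OID and remember the result
            self.target = store.load(self.oid)
        return self.target

    def unswizzle(self):
        """Before writing back to disk, only the OID must remain in the field."""
        self.target = None
        return self.oid


store = Store({"Person[x]": {"name": "Jones"}})
head = Ref("Person[x]")
first = head.deref(store)["name"]   # resolves the OID (one load)
second = head.deref(store)["name"]  # follows the swizzled reference (no load)
loads_after_two_derefs = store.loads
written_back = head.unswizzle()
```

A reference that is never followed is never swizzled, which avoids the cost for infrequently referenced objects mentioned among the disadvantages.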
