implementation and evaluation of dynamic spatial query...
TRANSCRIPT
Implementation and Evaluation of
Dynamic Spatial Query Language
by
Ju Wu
B.Sc. Shanghai Jiao Tong University, 1988
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF
THE REQUIREMENTS FOR THE DEGREE OF
Master of Science
in the School
of
Computing Science
0 Ju Wu 1992
SIMON FRASER UNIVERSITY
September 1992
All rights reserved. This thesis may not be
reproduced in whole or in part. by photocopy
or other means. without the permission of the author
APPROVAL
Name:
Degree:
Title of Thesis:
Master of Applied Science
Implementation and Evaluation of Dynamic Spatial Query Language
Examining Committee: Dr. Tiko Kameda , Chairman
Date Approved:
I$ M - ~ h h h Luk, Senior Supervisor
Dr. Jiawei Ifan, Supervisor
~r.&#ian Li, External gxaminer
&~~~ 10, lq??
PART I AL COPYR l GHT L i CENSE
I hereby g ran t t o S l m n Fraser U n l v e r s l t y the r i g h t t o lend
my thes i s , proJect o r extended essay ( t h e t i t l e o f which i s shown below)
t o users o f the Simon Fraser U n i v e r s i t y L ib rary , and t o make p a r t i a l o r
s i n g l e copies o n l y f o r such users o r I n response t o a request from the
I i b r a r y o f any o t h e r un i v e r s l t y , o r o the r educational i n s t i t u t ion, on
i t s own behalf o r f o r one o f i t s users. I f u r t h e r agree t h a t permission
f o r m u l t i p l e copying o f t h i s work f o r scho la r l y purposes may be granted
by me o r t h e Dean o f Graduate Studies, It i s understood t h a t copying
o r publication o f t h i s work f o r f i n a n c i a l ga in s h a l l not be al lowed
w i thou t my w r i t t e n permission.
T i t l e o f Thesls/Project/Extended Essay
lm~ lemen ta t i on and Evaluat ion o f Dynamic Apa t ia l Query Language.
Author:
( s igna tu re )
Ju Wu
( name
September 14, 1992
(da te )
ABSTRACT
There have been many problems concerning generic languages for object-oriented
database systems. However, unlike relational systems, in which SQL has become a
standard high level query language with an user-friendly, English-like interface, current
Object-Oriented Database (OODB) systems cannot provide a generic query facility.
Therefore, we propose to construct an application specific query language (ASQL), which
is incorporated with semantics of specific applications or features common to several
classes of applications.
A dynamic spatial query language (DSQL) has been implemented in this thesis, which
is based on a bi-level data model and OODB system. This implementation shows that
how to develop a query processor and optimizer for an ASQL application with impressive
performance. Ad hoc queries are processed using compilation strategy and customized
query optimization techniques. The DSQL system has an X-Window interface. Version
management is implemented to provide a better multi-user environment.
The advantages of OODB systems over the relational systems are also investigated in
this thesis. Experiments are performed on two systems, one using Objectstore (an OODB
system) and the other using Sybase (a relational system). Three different approaches are
tested on the platform of Sybase, they are the Pure Relational (Normalized), the Less
Normalized and the Pre-Loading method. The experimental results demonstrate that the
bi-level data model coupled with OODB systems is well suited in ASQL development
and the impedance mismatch is the bottleneck for the performance of relational systems.
Moreover, the Pre-Loading method provides a reasonable solution to the development of
ASQL to overcome the performance problem brought about by impedance mismatch, but
iii
there are still problems remaining after adding the Pre-Loading method (e.g. there are
no considerations for data recovery and concurrency control).
ACKNOWLEDGMENTS
I am most grateful to my supervisor Dr. Wo-Shun Luk for his thoughtful guidance,
invaluable encouragement and great patience in this thesis research. With his help, I
have really enjoyed in the thesis work, and have greatly learned from this. Many thanks
to Dr. Jiawei Han and Dr. Ze-Nian Li for their great help from the beginning of my
master program. I would like to thank Dr. Tiko Kameda for serving as the Chairman
of my thesis committee.
I am also very grateful to Kathy Peters and Craig Jones for their kindness of reviewing
my thesis.
I would thank my dad, mom and younger sister for their cares and encouragements.
Finally, I would thank my dear Hai-Ping for her greatest love.
Contents
APPROVAL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
Chapter 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
g 1.1 Application-Specific Query Language . . . . . . . . . . . . . 1
g 1.2
g 1.2.1
g 1.2.2
•˜ 1.2.3
g 1.2.4
5 1.3
5 1.4
Chapter 2
g 2.1
g 2.1.1
g 2.1.2
Dynamic Spatial Query Language . . . . . . . . . . . . . . . 3
. . . . . . . . . . . . . . . . . . . . . . . . . . . . Related Work 3
DSQL and Bi-Level Data Model . . . . . . . . . . . . . . . . 4
. . . . . . . . . . . . . . . . . . Relational vs Object-Oriented 5
Query Processing . . . . . . . . . . . . . . . . . . . . . . . . . 10
Thesis Objectives . . . . . . . . . . . . . . . . . . . . . . . . . 11
Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . 12
. . . . . . . . . . . . . . . . Bi-Level Data Model and DSQL 14
Bi-level Data Model . . . . . . . . . . . . . . . . . . . . . . . . 14
Block Object Data Model . . . . . . . . . . . . . . . . . . . . . 17
. . . . . . . . . . . . . . . . . . Geometric Object Data Model 18
g 2.2
g 2.2.1
g 2.2.2
5 2.2.3
g 2.2.4
Chapter 3
$ 3.1
•˜ 3.2
g 3.2.1
5 3.2.2
$ 3.2.3
g 3.3
g 3.4
g 3.4.1
3.4.2
Chapter 4
g 4.1
•˜ 4.2
g 4.3
4.3.1
5 4.3.2
g 4.3.3
DSQL and the Basic Spatial Functions . . . . . . . . . . . . 19
Syntax of DSQL . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Mapping Block Object Level to Geometric Object Level . 25
Experimental Environment and Graphical User lnterface . 28
Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Objectstore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Persistent C++ Programming Language . . . . . . . . . . . 29
Memory Mapped 110 . . . . . . . . . . . . . . . . . . . . . . . 29
Set-Oriented Query Language . . . . . . . . . . . . . . . . . 31
Sybase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Graphical User Interface . . . . . . . . . . . . . . . . . . . . . 32
X Window Interface . . . . . . . . . . . . . . . . . . . . . . . . 32
Graphical Display . . . . . . . . . . . . . . . . . . . . . . . . . 35
Implementation of DSQL . . . . . . . . . . . . . . . . . . . . . 40
Outline of Query Processing . . . . . . . . . . . . . . . . . . . 40
. . . . . . . . . . . . . . . . . . . . . . Query Processing Plan 42
Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Applying Non-spatial Constraints to Spatial Predicates . . 47
Applying Constraint When OR Operations Are Present . . 50
Propagation of Constraints Through Spatial Predicates . . 53
vii
g 4.4
Chapter 5
•˜ 5.1
g 5.2
•˜ 5.2.1
g 5.2.2
g 5.3
5.3.1
•˜ 5.3.2
g 5.4
$ 5.4.1
$ 5.4.2
g 5.5
g 5.5.1
5 5.5.2
g 5.5.3
g 5.5.4
•˜ 5.6
•˜ 5.6.1
5 5.6.2
•˜ 5.6.3
5 5.6.4
Version Control . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
. . . . . . . . . . . . . . . . . . . . Performance Comparison 64
Implementation Effort in DSQL for Relational DBMS . . . 64
Design of Relational Schema . . . . . . . . . . . . . . . . . . 66
. . . . . . . . . . . . . . . . . Pure Relational (PR) Approach 66
Less Normalized (LN) Approach . . . . . . . . . . . . . . . . 67
DSQL Query Processing . . . . . . . . . . . . . . . . . . . . . 68
PR (Normalized) Approach . . . . . . . . . . . . . . . . . . . 69
LN (Less Normalized) Approach . . . . . . . . . . . . . . . . 71
Performance Comparison . . . . . . . . . . . . . . . . . . . . 72
Spatial Query Comparison . . . . . . . . . . . . . . . . . . . . 74
Non-Spatial Query Comparison . . . . . . . . . . . . . . . . . 76
Discussion on Performance Comparison . . . . . . . . . . . 76
Relational Key vs . Object Identifier . . . . . . . . . . . . . . 77
Client Caching and Sewer Caching . . . . . . . . . . . . . . 78
. . . . . . . . . . . . . . . . . . . . . . . . . . . Data Clustering 80
Complexity of Query . . . . . . . . . . . . . . . . . . . . . . . . 83
Relational Approach with PreLoading . . . . . . . . . . . . . 84
. . . . . . . . . . . . . . . . . . . . . . . . . Query Processing 85
. . . . . . . . . . . . . . . . . . . . Performance Comparison 89
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Discussion 90
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary 92
... V l l l
Chapter 6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Appendix A Data Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Appendix B Glossary 106
List of Figures
Figure 2.1
Figure 2.2
Figure 2.3
Figure 3.1
Figure 4.1
Figure 4.2
Figure 4.3
Figure 4.4
Figure 4.5
Figure 4.6
Figure 4.7
Figure 4.8
Figure 4.9
Figure 5.1
Figure 5.2
Figure 5.3
Figure 5.4
Architecture for Implementation of the DSQL . . . . . . . . 15
Block Object Data Model Hierarchy . . . . . . . . . . . . . . 17
Geometric Object Data Model Hierarchy . . . . . . . . . . . 18
Block World . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Flow Chart of ad hoc Query Processing . . . . . . . . . . . 42
Query Processing Plan Algorithm . . . . . . . . . . . . . . . 43
Parse Tree with Proper Symbol Names . . . . . . . . . . . . 44
Parse Tree Before Applying Constraints for 01 . . . . . . . 48
Optimal Parse Tree After Applying Constraints to Spatial
Predicate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Parse Tree with OR Operation Obstacle Applying
Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Optimal Parse Tree After Applying Constraints Through
. . . . . . . . . . . . . . . . . . . . . . . . . . . . OR Operation 52
Parse Tree with More Than One Spatial Predicate . . . . . 56
. . . . . . . . . . . Checkout Branch and Merge of Versions 60
. . . . . . . Schema Definition for Pure Relational Concept 66
Schema Definition for Less Normalized Relational
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Approach 68
. . . . . . . . . . . . . . . . . . Composite Object Clustering 81
Relational Key Links to Object Pointer in Virtual Memory
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (read) 86
Figure 5.5 Indirect Mapping Based vs. Memory Mapped . . . . . . . .91
Figure 5.6 Performance Comparison of 4 Approaches in DSQL . . .93
List of Tables
Table 3.1
Table 3.2
Table 4.3
Table 5.4
Table 5.5
Table 5.6
Table 5.7
Table 5.8
Table 5.9
Table 5.10
Table 5.1 1
Table 5.12
Table 5.13
Display vs. Retrieval (On Sud480 with 64M Memory) . .38
Display vs. Retrieval (On SunISparc 1 with 8M Memory) .38
Comparison of 3 Optimization Strategies in Query
Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 8
Non-spatial Predicate Comparison Based on 500
Cuboids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .73
Spatial Predicate Comparison Based on 500 Cuboids . .73
Warm & Cold Comparison (Select With 1 Spatial
Predicate on Cuboids) . . . . . . . . . . . . . . . . . . . . . .74
Warm & Cold Comparison (Select with 1 Spatial Predicate
on Assorted Blocks) . . . . . . . . . . . . . . . . . . . . . . . . 75
Performance Comparison with Sybase Statistics I10
Turned On . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .75
Warm & Cold Comparison (Select with 1 Non-spatial
Predicate on Cuboid) . . . . . . . . . . . . . . . . . . . . . . .76
Crossing Boundary in Relational Database System . . . .79
Performance Comparison During Initialization . . . . . . . .89
Non-spatial Predicate Comparison Based on 500
Cuboids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .89
Spatial Predicate Comparison Based on 500 Cuboids . .90
xii
Chapter 1 Introduction
1.1 Application-Specific Query Language
A database query language is a non-procedural language designed for the end user
(i.e. casual, non-professional computer user) to manipulate data without resorting to a
procedural language. SQL is a standard query language for relational databases, which
has a user-friendly, English-like appearance. Nonetheless, most end users prefer to work
with programs written by application programmers that run on top of the Relational
Database Management Systems (RDBMS), largely because SQL does not operate fully
at a level of data abstraction preferred by the end user.
The generic query language for Object-Oriented Database Management Systems
(OODBMS) (i.e. not designed for any specific classes of application) is still in the infant
stage. In a sense, the Object-Oriented (00) query languages are extensible, since new
operations may be introduced and encapsulated in messages (or functions), assuming here
the message is a query primitive. While this scheme might work for simple associative
retrieval, it runs into difficulties for complex computations involving objects with different
types. Messages must be carefully designed to encapsulate these operations. But there
is a more fundamental problem for the current approach of 00 query language. Like
SQL, 00 queries involving set operations are called object algebraic operations. But
in typical 00 applications, set operations using generic procedures are often inefficient
and at times might not even be possible. Take, for example, a Geographical Information
System (GIs) as the application domain. Consider the computation task of locating a site
which is 5 miles away from object A and 3 miles away from object B. The set-theoretic
approach will not work, because it involves intersection of two infinite sets of points (i.e.
circles). In addition, the computation must be carried out in consideration of the data
representation of points and circles as defined by the application program.
Based on our experiences, it is difficult for OODBMS' to provide a generic query
facility that caters to the needs of the end user, while, at the same time being well
integrated into the target programming system. For a relational SQL, the same problem
still restricts the end users from adding user defined predicates to SQL for complicated
queries. We believe that the truly user-friendly query language should be built on top of
generic query facility offered by OODBMS', and implemented by what the OODBMS
considers as an application prograrn(s). Just as the view mechanism of a RDBMS, it
provides an additional level of abstraction. But there are many major differences between
a customized query facility and the view mechanism. First of all, the customized query
language can incorporate semantic features of a specific application or those features
common to several classes of applications, in a way that the generic query language of an
OODBMS can not. In fact, this query language should be associated with an application-
oriented data model, where data may be manipulated by this language. Thus, there maybe
many different query languages customized for different application domains. We call
these types of query languages, Application Specific Query Languages. Similar to the
Application Layer in the ISOIOSI Reference Model for a computer network architecture,
an ASQL provides a level of application-oriented abstraction. Further, the customized
software could provide mapping of every possible state of data defined on the application
data model into some state of objects defined on the OODBMS', thus, the end user could
update the associated data model with the query language. In addition, query optimization
is facilitated by the semantic knowledge embedded in the query processor. Despite all
these advantages of ASQL, we argue that is not necessarily difficult to implement, and
the object-oriented concept embodied in OODBMS greatly eases its development.
The goal of this thesis is to design and implement an ASQL, called Dynamic Spatial
Query Language, to convince people of the importance and feasibility of developing an
ASQL.
1.2 Dynamic Spatial Query Language
1.2.1 Related Work
Many researches have done work to develop various types of an ASQL (e.g. spatial
SQL and temporal SQL.)
Most ASQL's that have been developed are on top of relational database systems (e.g.
TQuel is a temporal ASQL developed on Ingres, by extending the traditional relational
query language with some temporal features (or predicates)). [14]
Most of the spatial query languages are using a flat table format without much
consideration of the hierarchical features of spatial data. An improvement on this was
made in the SAND[l] spatial database system, as it stores the spatial attribute in the
table along with the non-spatial ones, by providing a link from the spatial attribute to the
spatial data structure. This still does not use all of the hierarchical structures that can
occur in the applications that we are considering.
Due to the hierarchical structure of the spatial data model, object-oriented database
systems match well with this data model and we may separate the spatial data from non-
spatial ones at a different level. The major differences of our DSQL from other ASQL's
are: Bi-Level Data Model and Object-Oriented Database Platform.
1.2.2 DSQL and Bi-Level Data Model
We present here a customized query language, called DSQL, for manipulating
geometric objects in a three dimensional space. Using DSQL, the end user can retrieve,
update and delete spatial and non-spatial information about objects without any knowledge
of the exact spatial location of the objects. Our application data model domain is a
simplified database, called Block World, which consist of a collection of blocks (e.g.
spheres and cylinders) and cuboids in a 3-D space (e.g. room). Various types of
operations could be applied to the blocks in Block World (e.g. move and rotate). DSQL
can be considered as a high-level interface, whereby, the user can initiate the movement
of the blocks and/or interrogate the spatial and non-spatial characteristics of the blocks[9].
DSQL is an English-like, application-specific query language, associated with spatial
and non-spatial predicates. The spatial predicates (e.g. on-top-of) define a topological
relationship of blocks and the non-spatial predicates (e.g. color) define the characteristics
of the block itself.
The data model of DSQL is a bi-level model, block object data model and geometric
object data model, to make a distinction between user defined objects and the primitive
objects of GIs. Using OODBMS' with some abstraction through its class hierarchy
construct, we are able to provide a more elaborate mapping between objects on the
two levels. Moreover, the end user may also prefer to leave out low-level, non-
essential information about the block objects. Finally, a clearly defined data model at the
OODBMS level (i.e. geometric level) will facilitate the task of maintaining, modifying
and/or extending the DSQL.
Once we build up the generic GIs query language layer, GIs application programmers
will be relieved of the heavy load of manipulating geographical objects. This would allow
the query language to be used either interactively or embedded into host language to set
up a more specific user interface, providing a similar functionality as the relational SQL.
We are trying to provide a development interface for the system programer (i.e. the
user often provides the ASQL syntax and application DB schema, we will automatically
generate the parsing and evaluating program to do the query processing).
1.2.3 Relational vs. Object-Oriented
Since the DSQL system is a typical engineering application, OODBMS' should be
better suited than relational ones, in theory, but we really want to make a comparison
between object-oriented and relational systems on this specific application to show OODB
systems are better suited.
Object-Oriented Database System Most object-oriented database systems are an ex-
tension of existing programming languages, such as C++, to provide persistent, concur-
rency control and other database facilities [2]. They are often called database program-
ming languages. The common features of these languages are: applications are written
in an extended form of an existing programming language, and the language and its
implementation (compiler, preprocessor and execution environment) have been extended
to incorporate database functionality. These systems typically provide a query language,
but both the query language and the programming language execute in the application
program environment, sharing the type system and the data workspace [2]. This archi-
tecture provides a single unique environment for persistent database objects to appear
semi-transparently as transient programming objects.
Object-Oriented vs. Relational Database management systems (DBMSs) based on the
relational data model have dominated the market through the 1980s, which is mostly
because of its simplicity and strong mathematical foundation of the relational model.
However, some of today's data modeling requirements may fail to fit well into the
relational framework [3]. Examples are the databases for Computer Aided Design (CAD),
Geographical Information System (GIs), Office Information System (01s) and Artificial
Intelligence (AI), in which the representation of complex objects, as well as sophisticated
operations is required. Using a relational DBMS in such an environment usually leads
to awkward data decomposition and hence to poor performance (shown in Chapter 7).
The object-oriented data model allows a natural and direct mapping between conceptual
objects and the underlying data model resulting in better performance.
Relational database systems are famous for their simplicity, which makes them
attractive because they are easy for end users to understand and utilize. In addition,
the model results in mathematically simple structures, and this makes possible a concise
and powerful query language that can perform most of the data retrieval and manipulation
that typical business applications require. But just because of its simplicity, the tables and
the query language, it is far beyond the needs to provide even some basic functionality on
complex data models, and thus requires an additional layer to build on top. In this case,
binding from SQL variables to host language variables will be established explicitly,
resulting in an extra cost. The problems with using two language environments have
been widely cited in the database literatures as motivation for database programming
languages. The problems have collectively been called the impedance mismatch between
the application programming language and database query language [lo].
Object-objected database systems are best suited in engineer applications (complex
data model), compared to any other systems. As Cheng and Hurson point out in their
paper, object identifiers (an explicit representation of semantic links among objects) and
complex objects (aggregation of heterogeneous objects) are two important features of the
object-oriented database system [3]. For example in DSQL, a cuboid is a block object
which consists of different kinds of sub-objects, such as center, normal and top face. This
allows navigation through semantic links and retrieval of complex objects. Naturally, the
most important task for an object-oriented system is to traverse the corresponding graph
structure efficiently. Object clustering based on their semantic links becomes a powerful
tool to improve the performance. So it is quite natural that object-oriented database
systems could provide much better performance in the hierarchical structure than in the
relational one, but the object-oriented query language is still in its infant stage. Even
still, it is certain that object-oriented system query languages will closely integrate with
the host language, hence provide a seamless interface.
Performance Comparisons The promise of object-oriented databases has been that
they could potentially provide a much better performance than traditional RDBMS'.
As pointed out in Dual and Damon's paper, this is difficult to test because of a lack
of benchmarks that are capable of comparing between different types of databases [6].
Many benchmarks, like the Sun benchmark[6], did not involve data clustering, which is
the most important factor in performance, and each individual benchmark strictly called
for objects to be randomly selected and operated upon. Objects could only be clustered
with objects of a similar type, which is seldom true in most engineering applications, and
could be kept in the cache over the entire benchmark. In many engineering applications
closely related objects are accessed successively, with greater frequency and to a much
higher degree than are random, disjoint objects [6]. Because of this general usage pattern,
semantically linked objects are often physically clustered together in the database. This
semantic clustering normally results in much higher performance, since many of the
related objects are read ahead and reside in memory improving the overall access times
for related objects.
Performance is a major issue in the acceptance of object-oriented database systems
aimed at engineering applications and the most accurate measure of performance would be
to run an actual application, representing data in the manner best suited to each potential
DBMS. The performance running a DSQL query is obviously a best choice to tell the
efficiency of different platforms. Accordingly, we will obtain the performance benchmark
on a complicated object-oriented system, which will definitely be more meaningful than
those either targeted at relational database usages (Wisconsin benchmark [5 ] ) or based
on simple object systems (Sun benchmark [6]) .
An object-oriented database system is the best suited for GIs applications, therefore
we should build DSQL on top of an OODBMS (Objectstore) which will then provide
maximum functionality and performance. But, since not much effort has been made in the
performance comparison between OODBMS' and RDBMS' based on a real application
system, we are most interested in the result and how to improve the performance on a
relational database system to catch up with an object-oriented one.
DSQL queries with a different degree of complexity, including spatial and non-spatial
predicates, will be posed to each approach. And fairness to each approach is another
major consideration to convince people of the meaningful comparisons.
Improvement of Relational Approaches There are many reasons why object-oriented
systems may have much better performance over relational ones in engineering appli-
cations, some of the reasons are composite object and data clustering, object identifier
and a closely integrated interface from host language environment to persistent data. We
will see from our performance comparison the degree to which we could improve the
relational performance by applying these features and what will be the limitations
Composite object and data clustering is hampered by the relational normalization
rules. In a relational system, due to the purpose of normalization, a composite object
should be decomposed into several tables. But in our application environment, composite
objects are handled as a whole, so join operations will be needed to link all the information
of one composite object from different tables. Joins obviously hamper performance.
Another issue is data clustering. Normally in relational systems, the data are clustered
by tables (i.e. to retrieve one composite object, we may have to access several tables,
and thus more than one disk 110). To solve these problems, the pre-join may be a better
choice. Instead of decomposing the data into several tables, we put all the information
into one (or two) large table(s). With this approach, we do not have to do join in retrieving
records and therefore less disk 110 is possible.
The impedance mismatch is the biggest problem in relational systems. The relational,
persistent, data management language SQL is completely different from the host language,
translation will be needed to convert the programming transient object into a persistent
data representation. Upon receiving the SQL result, an extra copy of data will be required
(i.e. from server buffer to programming variables). Thus two copies of data will be
the worse case (i.e. from server disk to server buffer and from server buffer to client
variables). While, in object-oriented systems, only one copy is needed, from disk to
programming variables, due to no impedance mismatch. We may want to get around
this problem by preloading the object into the client machine once, and manage it by
ourselves in the following access. But to manage the object in the client machine, we
will have to borrow the idea from object identifiers to access the object instead of by the
relational key. A pointer conversion should be provided.
In this thesis, we not only test the performance of a pure relational approach
(normalized), but also apply these two improvements on relational approach to achieve
better performance.
1.2.4 Query Processing
The feasibility of ASQL depends heavily on the efficiency of query processing.
Two kinds of processing strategies are commonly used, interpretation and compilation.
Normally for on-line query processing (interactive mode), the interpretation strategy will
be applied, while for embedded programming, precompilation of embedded queries is
widely used.
We choose the compilation strategy for query processing, simply because we will
later extend DSQL for embedded usage and would not want to see any inefficiency in
query processing. Actually, for an embedded query, compilation strategy could gain more
efficiency, because the processing plan has already generated during the precompiling.
The query preprocessing should only take a very small part of the whole query
processing time, otherwise this application specific query language is totally unacceptable.
People would rather write the program to fulfill certain tasks efficiently than waiting for a
more flexible but time consuming ASQL to be processed. To get around this problem in
our ASQL, we will only generate a very small query processing plan which contains the
function calls to the function implementer and then use the dynamic linking and loading
method to execute these functions to process the query.
Another crucial part in query processing is optimization, a query optimizer, with a
carefully defined rule for optimization, is applied to generate an optimal processing plan.
Usually, a generic query optimizer may not be very desirable, since the optimization
strategies are also application specific. Our goal is to let the user tell the system some
basic optimization rules, and the system could then perform the proper optimization
based on these.
Another important issue in query processing is the display time as the graphical
display becomes more popular in almost all the areas. The graphical display of spatial
objects will be investigated, as the total query processing time is no longer the only
database access time, display time has been proved to take up more time than database
retrieval in some cases. We will also involve investigation of graphical display of object
in this thesis.
1.3 Thesis Objectives
This thesis extends the design of the DSQL with implementation in [4] and provides
a meaningful performance comparison between the object-oriented and the relational
DBMS from a specific application point of view.
The things we want to show in this thesis are:
I.
II.
Propose the methods to develop an ASQL query processor and customized
query optimizer and show how much benefits we can achieved from an ASQL
development.
Make a performance comparison between object-oriented platform (Objectstore)
and a relational one (Sybase), and try to find out the suitable platform for an
ASQL development.
People may worry about the cost for developing an ASQL. With our implementation
efforts in DSQL, we could show people how to develop a query processor and optimizer,
and to justify the ASQL as being easy to implement with impressive performance.
The bi-level data modeling (i.e. geographic and geometric objects) provide a concrete
base for all detailed design and implementation, and seems to be a standard data model for
the generic GIs query. The efficiency and user-friendly features of the DSQL depends
heavily on the success of this data model.
In order to obtain a meaningful benchmark, we transfer our low level database
accessing part from ObjectStore to Sybase, and obtain the comparison results from two
different approaches (Normalized and Less Normalized). Importance of Data Clustering
and Memory Mapped architecture has been carefully studied, and related performance
tests have been done. Based on our experiences, we could show people how much
effort should be made to use relational DBMS for object-oriented usage, and why object-
oriented DBMS' are more desirable in engineering applications.
1.4 Thesis Organization
Chapter 2 introduces the bi-level data model for Block World, and the database
schemata of block objects and geometric objects. Finally, we present the design of the
dynamic spatial query language, based on the bi-level data model.
Since the underlying database is ObjectStore, we will describe in Chapter 3 some of
the related features of our DSQL system development and the graphical user interface
of DSQL.
Implementation issues (i.e. query processing and optimization) are discussed in
Chapter 4. We show how to apply a compilation strategy to query processing and
discuss how to generate and evaluate a processing plan. To achieve better performance,
we emphasize the optimization strategies to obtain an optimal spatial query design. We
also mentioned in this chapter the version control of DSQL.
Object-oriented DBMS' have always claimed to have better performance than rela-
tional ones in engineering applications. GIs application benchmarks are given in Chapter
5. Different relational models are presented for comparison, each with the related per-
formance results. We will discuss two important factors in achieving better performance
from object-oriented DBMS' - Object Identifier and Data Clustering. To improve object
access efficiency through OID, the memory-mapped architecture will also be discussed
in details.
Chapter 6 is the conclusion and future work.
Chapter 2 Bi-Level Data Model and DSQL
In this chapter, we describe the design of an application oriented query language,
called Dynamic Spatial Query Language (DSQL) [9], used for three dimensional (3-D)
spatial databases, which keep spatial and non-spatial information about dynamic objects
in 3-D space. Applications in GIs, robotics and mechanical CAD are examples which
may benefit from this query facility. We create a simplified database, called Block World
which consists of a collection of blocks (e.g. cuboids, cylinders and pyramids) in a 3-D
space (e.g. a room). In the Block World, objects may be subjected to various types of
motion (e.g. move and rotate). The purpose of DSQL is to provide a high-level interface
whereby the user can initiate movement of blocks andlor interrogate the spatial and non-
spatial characteristics of the blocks in the Block World. The chapter mainly describes
the research work in the paper [9].
2.1 Bi-level Data Model
Since the bi-level data model is our foundation for the DSQL design, we will have
a close look at this data model before going into the spatial query language design.
Consider the diagram shown in Figure 2.1. At the top is the end user who can
pose queries using DSQL. The B-Kernel is a software layer consisting of a transient
object manager and a query processor. The transient object manager1 is in charge of
unique identifier generation and management of all block objects. The query processor
generates the query processing plans for user queries and does all the essential mapping
In Objectstore, we wuld use the. facilities of persistent set and unique object pointer to fulfil the task, in Sybase we should
build the manager by ourselves.
of functions in a query that involve spatial relationships and/or computations of block
objects to the geometric object data model. Finally, at the lowest layer is the DBMS
(relational or object-oriented) which includes an additional interface module called the
function implementer. The function implementer assigns the optimal procedures for
carrying out each function in the query processing plan generated by the query processor
of the B-Kernel and executes it. It computes new information and interfaces with the
DBMS for the invocation of procedures and the actual retrieval and storage of objects [9].
End User
Transient 1 I Object I Manager
Query
Processor
Function Implementor
I DBMS I
Object
Model
Geometric J "%? ') Model u
Figure 2.1 Architecture for Implementation of the DSQL
15
The block object data model consists primarily of blocks, a set of semantic spatial
functions and some type hierarchies, through which the topological relationship of block
objects are defined or derived [9]. DSQL is the language for manipulation of these
block objects. The geometric object data model, on the other hand, contains geometric
objects which are the actual spatial representation of the blocks and a set of geometric
functions that do the retrieval, updating and computing for geometric objects. The
DBMS in DSQL could be an object-oriented or relational DBMS. With an OODBMS
(i.e. Objectstore), facilities of its data manipulation could be used directly to construct
the geometric functions, while with a RDBMS (i.e. Sybase) an additional layer for
converting class hierarchies to relational flat tables is needed.
2.1.1 Block Object Data Model
composite object
Block superclass
subclass
ndari ..... Objects dependent objects
( face )
Figure 2.2 Block Object Data Model Hierarchy
The Block level data model, which is defined also in boundary representation, is
considered to be the end user's interface to the whole system. The geometric level's
information (spatial information) is transparent to the end users, except for the colour
attribute of faces (non-spatial information). Using the object-oriented model, the spatial
functions, such as On-Top-Of, could be more abstract and independent of block type.
There are virtual functions in class Block (e.g. Volume and Translate) which means
that there will be a run time binding for the actual implementation of these functions
according to the particular block type; this is the so-called "polymorphism".
The detail of the data structure Block Object are shown in Appendix A.
2.1.2 Geometric Object Data Model
oeo-Ob j superclase
( curve ) dependent objects
Figure 2.3 Geometric Object Data Model Hierarchy
In geometric level object design, one curve is represented by the normal and the
length (e.g. edge or circle) not by two end points. The reason why we decided to use this
representation (plus the inherited center) is that we are trying to make a level abstraction
in each level of boundary (e.g. a cuboid consists of 6 rectangle faces, each rectangle face
consists of 4 edges.) If an edge is represented by two end points, to avoid redundancy,
there should be another two edges sharing the end points with this one. So when we
translatdrotate a rectangle plane, not all of the boundary edges can be translatedhotated
accordingly, because there are shared resources (i.e. no distinction between the face and
its boundary). While in our schema, to translatelrotate a rectangular plane we only need
to translatelrotate the centre and all its boundaries, due to the fact that there is no point
where all the boundaries intersect. But this is still a trade-off, we sacrifice the efficiency
for elegance (e.g. we need an extra 4 vectors space to store a cuboid).
2.2 Design of the DSQL
2.2.1 DSQL and the Basic Spatial Functions
DSQL is the main user interface of our object-oriented data model used for retrieval,
manipulation and computation which could answer spatial and non-spatial queries of a
geographic nature. It is used in posing ad hoc queries for different what-if scenarios which
are crucial in GIs applications wherein the user needs to specify different conditions that
have to be met for a geographic objects to selected. For the design of the Block World
query language, we will use a sort of relational SQL syntax, e.g. insert, delete and select,
except that we change the update to be several commands with spatial features (e.g. move
and rotate). Further we provide several spatial and non-spatial predicates:
I. Spatial Predicates: On-Top-Of, On-Bottom-Of, On-Left-Of, On-Right-Of,
On-FrontOf and On-Back-Of.
II. Non-Spatial Predicates: topcolor, bottomcolor, frontcolor, backcolor, leftcolor,
rightcolor, type and name.
With the above commands and functions (predicates), an application oriented (Block
World) query language will come to life.
In the literature, manipulation of an object is often considered as an operation applied
to the objects. We maintain that even though the operation is applied to an individual
object, the user should be allowed to select the object for operation, not only by its object
ID, but also by some spatial or non-spatial criteria. For example, the user should be able
to specify "the block next to that big green sphere" for motion. This representation is of
course a database viewpoint. Indeed, a traditional database query language (e.g. SQL)
tends to treat the database updating functions in the same manner as retrieval functions.
Thus, we include these functions at the query language level, and the details involved
in updating the spatial location of object are transparent to the user. The issue is how
to design an interface for the user which permits unambiguous mapping into updating
functions at the geometric level. We do so with the concept of a virtual object.
Mathematically, there are two kinds of object motions in the space: translation and
rotation. Loosely speaking, translation of an object results in movement of the object
without modifying the object's orientation. In this thesis, we use a more common term
for this translation (i.e. Move). A complete specification of the motion Move requires
the knowledge of the spatial coordinate to which the centre of the object in question is
moved. This coordinate information can be derived in one of three ways:
I. The value of the coordinate supplied by the user.
II. The position relative to the current position of the object (e.g. move the object
3 metres to the left).
III. A position relative to another object(s) (e.g. the top of a cuboid with a yellow
top).
The first option is ruled out since the user is assumed not to have any knowledge
of any exact location in the 3D space. To describe precisely the new location (andlor
orientation) where the object is to be moved to, we create a virtual object, which is
exactly the same as the object, except that it is assumed to be situated in the desired
new location. Thus, this new location can be specified via a topological relationship
between the virtual object and another object in the Block World, which may even be
the object to be moved.
Rotate, as an object motion, is treated in a similar way to Move. Since any rotation
of an object in space can be considered to be spatially equivalent to one combination
of Move and Rotate about a line through its centre, we need only consider rotations of
a block about the x, y and z-axes through the centre of the block. This means that the
centre of the block will not be moved during the rotation.
2.2.2 Syntax of DSQL
The following is the syntax of our DSQL query language Data Manipulation
Language (DML):
<Command-Name> <Object-Variable> <Command-Modifien
WHERE
<Conditions>
GO
<Command-Name> :: = INSERT I DELETE I SELECT I MOVE I ROTATE
<Object-Variable> :: = ID
<Command-Modifier> :: =
NULL I TO <Object-Variable> I BY NUM METERS I
BY x DEGREES ALONG <coord>
<Predicate-List> :: =
<Predicate> I <Predicate-List> LOGIC-OP <Predicate-List> I
'(' <Predicate-List> ')'
<Expression> :: =
NUM I ID I STRING I FUNCTION-NAME '(' <Function> ')'
<Function> :: = <Object-List> I FUNCTION-NAME '(' <Function> ')'
Other features of the query language that are included in a typical query language
may be included as well. For example, there can be aggregate functions applied to a set
of values, such as Count, Average, Maximum and Minimum. Existential and universal
qualifiers are also allowed.
We also allow the version control command in the DSQL. The user can specify from
which point we would check out a version of Block World, and create a version history
for all the later queries. Further when the user wants to go back to the global version,
just stop the version and do a merge; this will be discussed in Chapter 6.
Version Command:
CHECKOUT
MERGE
The following is the Data Definition Language (DDL):
CREATE ID <Public> <Protected> <Private> <Index> <Supertype> <Subtype>
<Public> :: = PUBLIC: <Attribute> <Method>
<Protected> :: = PROTECTED: <Attribute> <Method>
<Public> :: = PRIVATE: <Attribute> <Method>
<Attribute> :: = ATTRIBUTE ':' d)efn-List>
d)efinition> :: = TYPE ID
<Method> :: = METHOD ':' <Method-List>
<Method> :: = <Function> I VIRTUAL <Function>
With the help of a DDL, the user could create his own class definitions and store
them into a data dictionary. This interface is actually a developer's view of the DSQL.
2.2.3 Examples
Let's consider some examples.
Example 1: Move the cuboid named 'Apollo' to be on top of a cylinder with the
yellow top face.
move 01 to 02
where
type(ol)=Cuboid and name(ol)= "Apollo " and
On_Top_Of(o3)=o2 and
type(o3)=Cylinder and color(top(o3))= "yellow"
go
From Example 1, you may be puzzled at why an additional object is involved. The
point is we need a virtual object to define the destination of the move or insert. The
function On-Top-Of in Example 1, even though it has the same appearance with those
spatial predicates, are actually functions for updating. The semantic meaning of the
update function On_Top_Of(03)=02 is: the virtual object 02 is on top of 03, but it is
to be replaced by source object 01, to put it another way, 01 will be placed to a virtual
position (02) which is on top of 03. The spatial predicate On-Top_Of(o3)=02 only
means 02 is on top of 03.
Example 2: Rotate the cuboid to the left of the block which is on top of a cuboid
with a red top face and to the right of a block with a green left face, around the x axis
by 90 degrees.
rotate 01 by 90 degrees along x
where
type(ol)=Cuboid and On-L~f_Oflo2)=ol and
On-Top-Oflo3)=02, and On-Right-Ofl04)=02 and
color(top(o3))="red" and type(o3)=Cuboid and
color(lefr(o4))= "green "
go
Rotate could be done along any axis.
Example 3: Insert a pyramid to the left ock which is on top of a cuboid with
a red top face and to the right of a block with a green left face.
insert 01
where
type(ol)=pyramid, and length(ol)=lO and width(ol)=8 and
height(ol)=5 and bottom-center=(20.0, 50.8, 0.0) and
color(bottom(ol))= "red" and colorCfmnt(ol))= "blue" and
color(back(ol))="yellow" and color(lefr(ol))="pink" and
color(right(ol))= "black"
go
Insert could specify a relative position, like On-Top-Of, instead of a bottom-center
predicate.
2.2.4 Mapping Block Object Level to Geometric Object Level
There are several types of mapping data modeling constructs between the block object
level and the geometric object level. In other words, the following constructs, at the block
object level, must be mapped into constructs at the geometric object level, and vice versa:
(1) block ID, (2) block, (3) non-spatial function and (4) spatial function and DSQL. For
the rest of this section, we will consider each item individually.
(1) Block ID: A block in the Block World is represented by a Vertex object at the
geometric level, by the notion of block center. Block center is a sort of block ID, which
distinguishes a block from others.
(2) Block: Next, we are going to show how the whole block or its representation at
the geometric level, can be recalled given its block ID. Consider as an example a block
object which is a cylinder. From Example 1, we know that the function Top maps the
block ID into the geometric objects ID of one DISC object. Therefore the top face of the
cylinder is easily located. With the boundary representations of these faces, the block can
be assembled together from these faces, since each face is stored as an independent object
with its own spatial coordinate. Of course, there is an alternative way of representing
components of a composite object (i.e. the position of a component being relative to the
center of the object (or parent component)). The merit of this scheme is that when the
object is moved, and the object is always moved as a whole, only the spatial position of
the centre need be updated. In our scheme, the spatial positions of all the components
must be recomputed. The main advantage of our scheme is that in computing spatial
relationships of the objects, it is very likely that only some components of the objects are
involved, and absolute coordinates of these components are essential to the derivation of
the spatial relationship (e.g. to place a cylinder on top of a cuboid, one need only to
determine a point which is directly above the centre of the top face of the cuboid, at a
distance half as high as the height of the cylinder.) For this computation, the knowledge
of the absolute location of only the top face , not the whole cuboid, is necessary.
(3) Non-Spatial Predicates: All the non-spatial, block level functions that contain
blocks in their inputloutput function arguments will become geometric level functions,
by applying the Center function to all the block types.
(4) Spatial Predicates: Spatial functions at the block level must be translated on
an individual basis by the query processor of the B-Kernel into a program that runs
on the geometric level. Likewise, a DSQL query will be processed by the B-Kernel
to turn it into a program, except that more complicated techniques will be required if
efficiency is an important factor. We will only generate a small processing plan which
contains function calls to a geometric level function implementer, and use a dynamic
linking and loading strategy to execute the calling functions to process the query. This
will be discussed in Chapter 4.
Chapter 3 Experimental Environment and Graphical User Interface
In this chapter, we will describe the experimental environment of ow performance
testing. The database systems used for ow experiments are ObjectStore (OODB) and
Sybase (Extended Relational), which are two or the most popular commercial systems.
3.1 Systems
All the experiments are done using the client-server architecture. The server is a
Sun41780 CPU server and the client is also a Sun41780 server with the display set to a
local workstation. The reason why we picked a CPU server as our client machine is that
it could provide us with larger memory and swap space than some workstations. The
workstation used in our experiments is only for graphic display.
On the server side, we have in total 64M of memory and 128M of swap space. The
network is a Sun NFS running TCPIP. The client machine also has 64M memory and
128M swap space. The workstation is a Sun Sparcl with a colour monitor.
3.2 ObjectStore
ObjectStore (Version 1.2) is the database system on which we first develop the DSQL
system. We will talk about some of its main features.
ObjectStore is an object-oriented database system supporting persistence orthogonal
to type, transaction management and associative queries [l l] . The data model is not in
1NF (first normal form), as objects may have embedded collections. We will mainly
talk about its architecture and associative queries, which are closely related to our DSQL
system.
3.2.1 Persistent C++ Programming Language
ObjectStore is designed to provide a unified programmatic interface (persistent C++)
to both persistently allocated data and transiently allocated data, with object-access speed
for persistent data usually equal to that of an in-memory dereference of a pointer to
transient data [8].
With ObjectStore, neither translating nor copying is needed between disk-resident
data and programming transient data. Persistent data is just like ordinary heap-allocated
data: once a pointer is assigned to it, the user can use it in the ordinary way. This avoids
the problem of so-called impedance mismatch. ObjectStore automatically takes care of
locking, and also keeps track of what has been modified [8].
ObjectStore makes the interface to persistently allocated objects to support all of the
power of the host language (C++). This is in contrast to the traditional data management
capabilities of language such as SQL (incompleteness), which are much less powerful
than a general-purpose programming language.
The only difference between accessing persistent data and transient data is that
persistent data can only accessed within a transaction. While a database is open, transient
data can be accessed anywhere within a program.
3.2.2 Memory Mapped 110
There are a number of very frequent operations in any object-oriented system. Ap-
plications send messages to objects, by presenting the system with the logical identifiers
of the objects. The system must be very efficient in determining and dispatching the cor-
responding methods. Furthermore, the system must determine the location of the objects
very fast, the objects may or may not reside in memory. This in turn means that, if the
objects are on disk, the mapping of logical identifiers of objects to their physical address
must be done very fast. We will see how ObjectStore achieves better performance in
optimizing the very frequent operation of mapping the OID to physical location.
Tom Atwood separates the architectures of current OODB products into two broad
categories :
I. First generation (indirect mapping based).
II. Second generation (memory mapped).
The idea of the first generation architecture is to keep large tables which maintain
the location of the objects on the disk. As you need the objects you go through the
(hash) tables and get the segment containing the objects in to the main memory. But
there are problems:
I. Even if the object is present in the main memory, an access (hashing) takes a
minimum of 6 1 0 instructions to get the object. This is so because every object
is indirectly mapped through a table.
II. To keep pointers to the other objects you need a minimum 64 bits rather than the
normal 32 bits pointers for the main memory access. The extra bits are needed
to indirectly map the objects via the tables.
Being the second generation of OODB, ObjectStore has the memory-mapped architec-
ture, which uses a technique called "pointer swizzling". In this scheme, all intra-segment
memory references are stored as offsets from the segment boundary. All references to
objects outside the segment are "swizzled" so that they generate an operating system
(0s) fault whenever they are accessed. Also, each segment keeps track of all other
segments which can be referenced by a pointer within this segment. All intra-segment
references are done by looking at the offset of the segment in the main memory. How-
ever, any inter-segment reference will generate a fault because of "swizzling", and the
fault handler will determine the location of the object to be accessed. If the object is in
a segment not in main memory, then the segment is brought in the main memory. So,
in this approach, all intra-segment references take one machine instruction (rather than
6-10 for the 1st generation.) and on the average (if objects are clustered properly) the
performance is very fast.
3.2.3 Set-Oriented Query Language
In relational DBMS', queries are expressed in a special language, usually SQL. SQL
has its own variables and expressions, which differ in syntax and semantics from the
host language. Bindings between variables in the two languages must be established
explicitly. ObjectStore queries are more closely integrated with the host language. A
query is simply an expression that operates on one or more collections (e.g. set, list and
bag) and produces a collection or a reference to an object.
ObjectStore provides a collection facility in the form of an object class library.
Collections are abstract structures which assemble arrays in traditional programming
languages, or tables in relational DBMS. Unlike arrays or tables, however, ObjectStore
collections provide a variety of behaviours, including ordered collections (lists) and
collections with or without duplicates (bags and sets).
This collection enables us to attach a set of object pointers to each DSQL predicate
to get the final query result, which is also a set of desired objects.
3.3 Sybase
Sybase is one of the most popular and powerful extended relational database systems.
It has a typical client-server architecture, in which the server takes care of most of the
work in persistent data accessing.
Sybase has its own buffer management which is independent of the buffer manage-
ment of the operating system. It has a quite powerful query optimizer for the general
purpose query optimization. Buffer management and query optimization are the most im-
portant reasons why Sybase can have much faster speed over most of the other relational
systems.
The host language can connect to the Sybase server (remotely) using db-library calls,
where the client program sends the SQL to the server and waits for the result coming
back. Upon receiving the result, the client program will have to bind the information
into the host language data representation. Translation of different data representations
and an extra copy from the Sybase server buffer to client machine buffer seem to be the
bottleneck for complex object processing.
3.4 Graphical User Interface
In this chapter, we will discuss some issues of the graphical user interface in Block
World. The Graphical User Interface (GUI) is one the most popular topic in software
engineering, and it seems that software could not survive without a GUI. Besides our
DSQL based user interface, we build up our GUI using X Windows.
3.4.1 X Window Interface
A very good explanation of the X Window System, taken from Jie Zhang's thesis [19]
is as follows: "The X Window System, or X, is a network transparent window system
and a widely accepted standard for workstations. Because X permits applications to be
device independent, applications need not to be rewritten, recompiled, or even relinked
to work with new display hardwares. X is based on a client-server architecture, in which
the X server maintains a bitmap display screen on a workstation or an X terminal and
distributes inputs from the mouse and keyboard to the X clients. Clients can run locally
on the same machine as the server does. However, they can run transparently on remote
machines, which might or might not have the same architecture and operating systems
as the server's. X is fundamentally defined by a network protocol, the details of which
are masked by an interface library called Xlib."
With the above features of X Windows, we can not only create an interface using Xlib
and X widgets, but we could also combine with other X Window based graphic packages,
from the library level or end user interface level, to build up much more sophisticated
graphical user interface. In our Block World, we combined an X window system with
a graphic package HOOPS (see HOOP User's Guide for details), X Window based, to
build up more a user-friendly interface, shown in Figure 5.1.
1 move 01 to 02 I
01 : type (01 =Pyramid, 02: ontopof(03)=02, 03: color(top(o3))="red"
.... ....... ........ ... ........... ....... ...... ..........
message :
display time = 2.45 aec. query proceasing = 3.25 sec.
Figure 3.1 Block World
In Xlib and X widgets, developing widgets, menu entries, pop-up windows etc. can
be generated with great ease, but are lacking in 3-D features. HOOPS, however, has
3-D capabilities. Once we combine these two to build up the interface, we will have
nice looking widgets and a 3-D display. Our DSQL graphical user interface consists of
a HOOPS display in an X canvas showing Block World scenery, a pop-up window for
query input, a message window and some other control menus. Whenever we type in an
ad hoc query, all the processing results will emerge through the HOOPS display.
Further, we can also combine our system with some other packages (e.g. SAS) but
this connection is not good. Since other systems may not provide a library, they may
be accessible only through a specific interface, you could no longer consider it to be
part of your system. But with an X Windows architecture, a unique X server will take
charge of the display, and you have a window manager standing alone, thus all the X
based graphical packages could be combined together to set up a better interface (e.g.
you could press a button in Block World window to pop-up a SAS pie chart using an
operating system command, even if this package does not reside locally, you could still
use remote procedure calls to start it up and handle the network communication as well.)
From the above examples, we could see that X Windows makes it possible to combine
heterogenous X-based packages to build up more powerful graphical user interfaces. This
seems to be a major trend in the future, because in most of the cases, we would rather
use the off-the-shelf package than build up a system from scratch, even though we will
have to compromise because of the specificity of the interface of heterogenous packages.
3.4.2 Graphical Display
From the HOOPS manual: "HOOPS is an X-based system for creating interactive
graphic applications. Since nearly all graphics applications are database problems,
HOOPS is, first and foremost, a database which stores information about which objects
to draw, where they should be displayed, and how they should be rendered. The system
provides the application program with tools for modifying, querying, searching and
displaying the database. The HOOPS database is organized as a tree-shaped hierarchy,
resembling a file directory tree. Related elements are grouped together in segments,
which are the units of organization within the database. All information stored in the
database can be changed; geometry can be edited, attributes can be modified and the
hierarchy can be reshaped."
In the DSQL system, we provide a user with a view of the whole Block World,
and reflect the changes of the database on the 3-D graphical display, done by HOOPS
(e.g. when you type in a query to move a pyramid on top of a cuboid, you will see the
corresponding changes emerge on the graphical display.)
In query processing, the graphical display update will be done along with the database
update. But there may be problems in the object representation (i.e. the database
and graphical package may have a different representation of the object) In order to
reflect the database information, data conversion is unavoidable. In our Block World,
blocks and geometric objects have a boundary representation, while HOOPS may use
a different representation suited for 3-D display. To display a block, we should first
extract information of the block object in Block World to construct the parameters used
by the display routine in HOOPS, and then have HOOPS decompose the parameters into
its internal representation. Let's see the following example for details.
void HOOPS-define-cuboid
(id, 1, w, h, top-col, botm-col, front-col, back-col, left-col, rightsol) {
create a segment for this id in HOOPS libaray;
for each face f {
// each face is defined by 4 endpoints.
create the child segment for f;
1
1
void HOOPS-display(id, center, normal) {
open the segment of this id in HOOPS libaray;
rotate it according to given normal;
translate it to given position (block center);
1 void HOOPS-create-cuboid
(id, 1, w, h, top-col, botm-col, front-col, back-col, left-col, right-col, cen, n) {
I/ insert definition of cuboid with unique id into HOOPS libaray.
HOOPS-define-cuboid(1, w, h, top-col, botm-col, front-col,
back-col, left-col, right-col);
/I linking the cuboid definition in libaray to display at given position.
HOOPS-display(id, cen, n);
1 void Cuboid::Display () {
HOOPS-create-cuboid((int) this, Length(), Width(), Height(),
top->Color(), bottom->Color(), front->Color(), back->Color(),
left->Color(), right->Color(), block->center, top->Normal());
1
There is a translation needed to pass the block information in GIs to the HOOPS
display, even though HOOPS also use the boundary representation, but we have to
compromise for simple interface in HOOPS display. Another concern is that we will
have to recreate a block in HOOPS, whenever we pose an update query to Block World.
The reason is is that we should ask HOOPS to reflect the current state of our database,
not just show a correct motion on the display. This will create more .overhead for the
display. When we think of the translation and duplication overhead, we naturally may ask
a question as to how we can eliminate this overhead. If we could integrate the HOOPS
database with our own database, the overhead will be gone. We will not go into it here,
and will only show the weight of graphical display in query processing.
Table 3.1 Display vs. Retrieval (On Sun1480 with 64M Memory)
From the above table, you may find it surprising that display time is already
comparable to database retrieval. Thus to improve the performance of the graphical
display is another important issue in query processing.
Table 3.2 Display vs. Retrieval (On SunISparc 1 with 8M Memory)
To provide network transparent and device independent features, X Windows has a
larger overhead than other local window systems (e.g. SunView) and thus requires a more
powerful platform. Since HOOPS is a X-based graphical package, we would like to test
how much the performance drops when we run our system on a less powerful platform - SunISparc 1 with 8M memory. You could see from Table 5.2.3, the performance for the
HOOPS display drops significantly. Combined with the cost in database retrieval2, the
query processing becomes unbearable. Since X Window is network transparent, we could
have a remote X client displaying on the local machine. So, to avoid poor performance,
we suggest people use a more powerful CPU server remotely as the client machine for
those systems requiring a powerful platform (e.g. X Windows and Objectstore).
This will be described in detail in the next chapter.
39
Chapter 4 Implementation of DSQL
In this chapter, we describe the processing of a DSQL query and discuss various
strategies to optimize the performance of queries. Some of the strategies are similar
to those adopted by relational database systems, such as query compilation and select
first [I]. There are other strategies that are specific to the DSQL, or rather to, queries
containing functions. Here we consider how these functions are processed when there
are OR connectives in the WHERE-clause of the DSQL. We also consider the processing
sequence when there are two or more spatial predicates present.
4.1 Outline of Query Processing
Despite the differences in the data models, object-oriented queries may be evaluated
in a manner similar to relational queries. This observation is not really surprising. It
is well known that the object-oriented database equivalent of the relational selection
operation is simply the retrieval of these instances of the target class and the retrieval of
these instances requires retrieval of the instances of other classes, which are recursively
referenced as values of target class attributes.
In DSQL there are five major steps to process a query: (see Fig 4.1)
I. Pose the ad hoc query to YACC (Yet Another Compiler-Compiler) to generate a
parse tree. We use YACC so that we only need to give the BNF of the query
language and it generates the parser programming to do syntax and grammar
checking automatically.
ZZ. Pass the parse tree to the query processor to generate a processing plan. A query
optimizer is involved to obtain an optimal plan for higher performance.
ZZZ. Use the Unix system facilities (dynamic linking and loading) to compile the
processing plan into a shared library (called query.so).
ZV. The file query.so is mapped into a number of function calls to the DBMS.
With Unix dynamic linking and loading facilities (dynamic linking editor), plans
(functions) in shared library (query.so) could be mapped into system function
calls to DBMS.
V. The function calls to the DBMS are processed and the query result is sent back
to the end user.
Applying a compilation strategy for the processing of an on-line ad hoc query is
often considered a difficult task, but with the help of dynamic linking and loading in
Unix, we can process the ad hoc query very efficiently. We have a constant overhead (2
seconds) in query compilation, for most queries.
preprocessor
End User
ad hoc query .................................. .................................. .................................. .................................. ........ =
dynamic linking & loading
Figure 4.1 Flow Chart of ad hoc Query Processing
4.2 Query Processing Plan
The generation of a query processing plan (Step 11) has three major steps:
42
I. Classify predicates by objects: Since each object in the query will have its
corresponding predicates, we have to classify all the query predicates by objects.
11. Optimize the parse tree: The position of predicate nodes in the parse tree will be
adjusted according to a proper optimization strategy. See section 3 for details.
III. Each of the objects in the query are related, there exists a dependency graph for
all the related objects. Query processing order is bottom-up in the dependency
graph.
Classify predicates by objects;
For each object do optimization on its associated predicates tree;
Decide the order to process each object;
Generate the query plan;
1 Figure 4.2 Query Processing Plan Algorithm
Let's give an example of DSQL query and look at the generated processing plan
and how to evaluate it.
Example 4.1
select 01 where
On-Top_Of(o2)=01 and (name(ol)='YpoUo" or type(ol)=Cuboid) or
On-Bottom-Of(oS)=ol and On-LejtPOf(03)=ol and On-Right-Of(o4)=ol and
co1or(front(o2))="red" and
color(top(o3))="yellow" and
color(fefr(o4))="green" and
color(right(oS))="pink"
go
This query will select the cuboids named "Apollo" and on top of a red front face
block, or below a pink right side block and to the left of a yellow top block and to the
right of a green left side block.
First we classify the predicates by object. We may easily see that the first 6 predicates
are associated with block 01, and the remaining ones are corresponding to 02, 03, 04
and 05, respectively.
The parse tree for block 01 will be as following, and will be optimized following
the rules in the next section.
The processing order will be 02*03*04*05*01, and the parse tree is shown in Figure
4.3 will be translated to the processing plan. Now let's look at the processing plan, and
how to evaluate it efficiently.
Figure 4.3 Parse Tree with Proper Symbol Names
n ~ e ( ~ l ) = ~ A p p l l o " type(ol)=Cuboid On~~eft~0f(o3)=olOn~Right~Of(o4)=ol
Processing Plan: void DyConditionO {
frontcolor(p7, 02, "red");
topcolor(p8, 03, "yellow");
leftcolor(p9, 04, "green");
rightcolor(pl0, 05, "pink");
name(p2, ol)="Apollo";
type(p3, ol)=Cuboid;
1
void DyQuery() {
Intersect(p2, p3, p l l , 01);
On-Top-Of(p1, 01, 02);
Intersect(p1, p l l , p12, 01);
OnBottom-Of(p4, 01, 05);
On-Left-Of(p5, 01, 03);
On-RightLOf(p6, 01, 04);
Intersect(p5, p6, p13, 01);
Intersect(p4, p13, p14, 01);
Union(pl2, p14, p15, 01);
1
Evaluation:
void Process() {
foreach (a-block, B1ock::extent)
mapping to DyConditionO with Dynamic Linking Editor;
mapping to DyQueryO with Dynamic Linking Editor;
1
The final result will be associated with root p15.
To evaluate the processing plan efficiently, we try to finish processing all of the
non-spatial predicates in one scan (DyCondition();) of the database (i.e. for each block
in Block World, test with all the non-spatial predicates, and attach it to those objects
with a true value). That is definitely faster than considering these predicates separately.
This processing plan has already been optimized, and we will discuss the optimization
strategy in the next section.
4.3 Optimization
Before we study the details of query optimization, let's look at the spatial predicate
processing function first.
Spatial Predicate: On_Top_Of(o2)=0 1 ==> Function void OnTop-Of(o 1, 02);
void On-Top-Oflpredicate p, block-set 01, block-set 02) {
/I retrieve the set from variable name from symbol table.
foreach (bl, 01)
foreach (b2, 02)
if (bl not on top of b2) then attach bl to p;
}
If we ignore the query optimization, the query can be processed. Let's look at how
the following query can be processed.
select 01 where
On-Top-Of(o2)=01 and name(ol)='54pollo" and
color(top(o2))="red"
go
Assuming the total number of blocks is N, the number of blocks with a red top face
is N16 (6 faces of a cuboid), and the number of blocks named "Apollo" is 1.
The processing plan will be: process 02, find those blocks on top of 02, and intersect
with those blocks named "Apollo".
The processing cost is: N + N*Nl6.
However, if we confine all the possible blocks on top of 02 to have name "Apollo",
then the processing will drop sharply to 2*N.
4.3.1 Applying Non-spatial Constraints to Spatial Predicates
It is well known that for query optimization of relational DBMS' the most important
rule is "select before join". Our spatial predicate (e.g. On-Top_Of(o2)=ol) is a nested
loop, and obviously we would like to attach more constraints to 01 and 02 before entering
the loop.
Let's look at one example of how to apply constraints (non-spatial predicates) to
spatial predicates.
Example 4.2:
select 01
where
On-Top-Of(o2)=01 and color@ont(ol))="red" and
(type(ol))=Pymmid or name(ol)='%pollo" and
color(top(o2))="green"
go
This query is to find all objects that have a red front, green top face, and is a pyramid
or a red front face and is on top of a block with a green top.
Figure 4.4 Parse Tree Before Applying Constraints for 01
First classify the predicates, pl, p2, p3 and p4 to be associated with 01, p5 to 02.
The Logical formula for the predicates of 01 will be like this:
P l A ( ~ 2 V P 3 ) A ~4
Without optimization, the processing plan for the above query would be:
I. Process p5 of dependent object 02.
II. Find 01 satisfying p2 or p3.
III. From 01 in 11, find the subset satisfying pl.
ZV. Intersect 01 with those satisfying p4.
It is obvious we did not apply the constant p4 (color(front(ol))="red") to minimize
the size of 01 before testing the spatial predicate (nested loop). We could involve a
function called LowerDown to apply the constraints to spatial predicates (i.e. lower
down a sub-tree with pure non-spatial predicate(s)). After optimization, the optimal
parse tree will be as follows.
type (01) =Pyramid name (01) =u~pollo"
Figure 4.5 Optimal Parse Tree After Applying Constraints to Spatial Predicate
( ~ 2 V P 3 ) A ~4 A PI
Thus the following processing plan will bring us better performance.
I. Process the predicate p5 of dependent object 02.
11. Find 01 satisfying condition p4 and (p2 or p3).
111. From 01 in I1 find the subset satisfying pl.
More contracts are attached to spatial predicate OnTop-Of(o2)=ol, i.e. the size of
01 is reduced before entering the nested loop in function On-Top-Of(o1, 02).
The optimization rule when spatial predicates are joined with non-spatial ones is:
always try to apply as many as possible non-spatial predicates to constrain the spatial one.
4.3.2 Applying Constraint When OR Operations Are Present
In the last example, there is an OR operation in the query, but it has no effect on the
spatial predicate. Let's look at the following example, and try to find a way to optimize it.
Example 4.3:
select 01
where
(On-Top-Of(o2)=ol and type(ol)=Pyramid or color~ont(ol))="red")
and name(ol)="Apollo" and
color(top(o2))="green"
go
This query is to find those blocks named "Apollo" such that it is either a pyramid on
top of a block with a green top or a block with a red front block.
ontopof(o2)=ol type (01) =Pyramid
Figure 4.6 Parse Tree with OR Operation Obstacle Applying Constraints
The logical formula for this query:
( P I A ~2 V P 3 ) A ~4
The processing plan should be:
I. Process the predicate p5 of dependent object 02.
11. Find all the 01 satisfying p2, and from that obtain the subset satisfying pl.
111. Union 01 from I1 with those satisfying p3.
ZV. Intersect 01 from I11 with those satisfying p4.
The OR operation causes us some trouble in applying the constraint name(o1) =
"Apollo" to spatial predicate On-Top_Of(o2)=ol. However, we still can apply the
constraints using logic computation, and the parse tree will become like this:
Figure 4.7 Optimal Parse ' h e After Applying Constraints Through OR Operation
The new formula will be:
( P I A P2 A p4) V ( ~ 3 A p4)
There are actually two steps in applying constraint name(ol)="Apollo":
I. Rotate the tree about the AND node to raise the OR node one level up. The left
child of the new AND node will be a virtual node, which indicates a linkage to the
original sub-branch of the conjunction. In our example this would correspond
to p4.
II. Using the same method as in the last example, without an OR operator, to apply
constraint, as shown in Figure 4.4.
The processing plan here is quite straightforward:
I. Process the predicate p5 of dependent object 02.
ZI. Find all the 01 satisfying p2 and p4, and from that obtain the subset satisfying pl.
III. Union 01 from I1 with those satisfying p3 and p4.
IV. Intersect 01 from I1 with 01 from 111 to get the final result
The optimization rule is: when spatial predicates are joined with an OR operation,
always try the conjunction form (logical).
4.3.3 Propagation of Constraints Through Spatial Predicates
There are times we will have too many spatial predicates, and few or even no non-
spatial predicates present. How could we still apply constraints to the query processing?
There are normally three methods to process the AND operation "On~Top~Of(o2)=ol
and On-Bottom-Of(o3)=o 1":
1. Parallel (intersection method)
Find the blocks which satisfy On-Top-Of and On-Bottom-Of respectively, and
intersect these two sets of results to get the final query result.
foreach a-block in BlockWorld do
foreach block2 in 02 do
if a-block is on top of block2 {
save a-block to temp-setl;
break;
1 foreach a-block in BlockWorld do
foreach block3 in 03 do
if a-block is on bottom of block3 {
save a-block in temp-set2;
break;
1 result-set = temp-set1 & temp-set2;
2. Sequential
First find the blocks satisfying On-Top-Of, then in the result set, start testing the
On-Bottom-Of predicate to get the final query result.
foreach a-block in BlockWorld do
foreach block2 in 02 do
if a-block is on top of block2 {
save a-block in temp-set;
break;
1 foreach a-block in temp-set do
foreach block3 in 03 do
if a-block is on bottom of bloc
save a-block in result-set;
break;
I
3. Nested
For all the blocks test if it satisfies the On-Top-Of predicate, if so, go on and test
the OnBottom-Of for this object, otherwise, pick the next block to start again.
foreach a-block in BlockWorld do
foreach block2 in 02 do
if a-block is on top of block2 {
foreach block3 in 03 do
if a-block is on bottom of block3 {
save a-block in result-set;
break;
1
break;
1
Let's examine these processing strategies through a concrete example.
Example 4.4:
select 01
where
On-Top_Of(o2)=01 and On-Bottom-Of(o3)=01 and On-LefrfrOf(04)=ol and
color(top(o2))="green" and
nume(o3)='%pollo" and
type(o4)=Pyramid
go
This query is to find the blocks which are on top of blocks with green tops and below
blocks with the name of "Apollo" block and to the left of a pyramid.
Figure 4.8 Parse ' h e with More Than One Spatial Predicate
Parallel Method:
I. Process the predicates p4, p5 and p6 of dependent object 02, 03 and 04,
respectively.
II. Find those objects satisfying pl.
III. Find those satisfying p2.
IV. Find those satisfying p3.
V. 01 should be the intersection with 11, I11 and IV.
In this strategy, the processing time will be at least doubled if we added one more
spatial predicates in the query.
Sequential Method:
As we know from the query, the three predicates are related (i.e. they are all
describing the feature of 01). So, once we finish processing the predicate, we should
reduce the size of 01, not the whole set of blocks (i.e. apply the constraint from first
predicate to the next). Let's assume through one constraint, the size of the possible result
will be reduced by 80%. Using this strategy the total access to Block World (N) will be
N*(0.2N) + (0.2N)*(0.2N) + (0.2*0.2N)*(0.2N), which is much lower than that without
optimization (0.6N*N). The complexity comparison is O(N*lnN) vs. O(N*N). Thus, we
have obtained a better processing plan. It is true that the processing order of these spatial
predicates will have a great influence on the performance, but we will not discuss this
here. The solution for this more or less depends on the statistical information of the
database, which will bring in extra costs for each query processed.
I. Process the dependent object 02, 03, and 04.
ZZ. Find those objects satisfying On-Top_Of(o2)=01.
ZZZ. From I1 find the subset of blocks satisfying On~Bottom~Of(o3)=ol.
ZV. From I11 find the subset of blocks satisfying On-Left-Of(o4)=ol, and this the
final result of 01.
Nested Method:
I. Process the dependent objects 02, 03 and 04.
ZZ. For each a-block in Blockworld, if a-block is on top of one of 02 & below one
of 03 and to the left of one of 04, then put a-block in the result set.
We can also prove the complexity of Nested Method is O(N*lnN), if the mapping
is 1:l. But if the mapping is m:l, the Nested method is obviously worse than the
Sequential method. In our Block World, the mapping for spatial predicates is 1:1, so
these two methods are equivalent in complexity. But due to the reason for keeping
consistency with the non-spatial predicate, we choose the Sequential Method for our
processing strategy.
Table 4.3 Comparison of 3 Optimization Strategies in Query Processing
4.4 Version Control
When CAD applications were first developed, they were intended to support individ-
ual engineers working on distinct projects. Today, as design complexity increases, these
applications more and more need to support cooperative work by a number of engineers
on the same design. As a result, a requirement has emerged for configuration management
facilities that allow earlier states of a design to be recorded, and that allow alternative
designs to be simultaneously available. The conventional locking scheme will make you
wait for your co-worker to finish, because no dirty read is allowed. But with version
control, you and your co-worker could each check out your own version and work on
them simultaneously. There might be incompatible changes, and the merging of these
versions could be a complicated matter, but versioning is more favorable than locking
in some cases, especially in CAD areas. Also with version control, you can easily undo
your present committed version, and return to a previous one.
We have applied version control to our Block World, and thus provide a better multi-
user environment. For example, as shown in Figure 6.1. Two users are cooperating in
building a scenery. They both check out a version in their own workspace, and do the
following queries.
move 01 to 02
where
type(ol)=Pyramici,
On-Top_Of(03)=02,
type(o3)=Cylinder
go
user2:
move 01 to 02
where
type(oZ)=Frustrum,
On-Top_Of(03)=02,
type(o3)=Cuboid
go
After these queries, they decide to merge the two separate versions into a new one.
global workspace vetion 1
user1 workspace Version la user2 workspace Version lb
merged version
Figure 4.9 Checkout Branch and Merge of Versions
60
In Objectstore, each user can create his own version history inside his workspace,
in which everything is private and all the modifications to versions are invisible to other
workspaces. But it is still possible to resolve a trans-version object in other versions of
the same workspace, or resolve the versioned object in the current version from other
workspaces. The first solution will make it possible to traverse the version history of
your own workspace, and the second one will be useful for merging the trans-version
object (i.e. you may have to find out the status of the merged object by consulting all
the current trans-version objects in each branch).
Let's come to the version control of our DSQL. In our system, we will consider the
whole set of blocks as a versionable object, as well as each single block. Upon entering
the system, there will be a private workspace automatically allocated for each user. For
each update query a new version of the scene will be create, which will allow you to
backtrace the version history. So we will have to add two additional operations before
and after the function:
block-config->checkout_branch(user);
query~rocessing();
block-config->checkin();
Where the block-config is a persistent configuration pointer, initialized when the
database is created, which serves to group together objects that are to be treated as a
unit for the purposes of versioning.
When we decide to undo the current version, and backtrace the version history, we
just have to reuse the old versions in the history and forget the current one.
void BlockTrace() {
block-config->predecessor->()->use();
using the old version to modify the Block World scenery.
block-conjig->forget();
J
The most important task in version control is definitely the merge of version branches.
In Block World, we have to create a passive monitoring process to handle the merge,
and end users need only to send a signal (UNIX socket) to wake up the merge process.
Once the merge is done, all the end users will be notified by a UNIX signal or socket,
and they can then carry on work using the new merged version.
void Merge() {
foreach version branch V (workspace=user-x) {
workspace::set~current(global);
configalt = user-x->resolve(blockkconfig);
set-alt = configalt->resolve(extent);
block-config->merge(configalt);
foreach (a-block, extent)
foreach (block-alt, set-alt)
if (same-object(a-block, block-ah))
a-block->Modify(block-ah);
else
InsertToGlobal(block-alt);
workspace: :set-current(user-x);
block-config->forgetbranch();
1
Where the InsertToGlobal(block_alt) function constructs an insert query based on
block-alt and poses the query to DSQL, so that the new block created in version branch
will be inserted into global workspace during the merge.
Since the merging of several versions is a background process, the efficiency is not
that important. It is easy to imagine that we will encounter more problems using the very
sophisticated merging facility provided by the DBMS than some using simple operations.
We devise another merge process which only collects all the valid queries3 executed in
each version branch, and reprocess all the queries in the global workspace. Thus we may
have a bug-free merging process standing alone. And without the needs for graphical
display, the efficiency is not too bad.
The undo queries should not be included.
Chapter 5 Performance Comparison
In this chapter, we will describe some of the variations in implementation of the
DSQL on a relational platform. In each case, the performance is compared to other cases
as well as an OODBMS implementation.
We will discuss in this chapter two major performance problems of the relational
approach in DSQL, table decomposition and impedance mismatch.
5.1 Implementation Effort in DSQL for Relational DBMS
The relational approach for DSQL is quite different from the object-oriented approach.
In the object-oriented approach, important factors, such as object identifier, composite
object and data clustering and caching, are the reasons why object-oriented database
systems could have much better performance. But another more important reason why
a relational database system has very poor performance is the so-called impedance
mismatch for the programming data model and database model, while object-oriented
DBMS' provide a seamless integration for these two, and hence better performance.
The table decomposition is always a controversial topic. The pure relational approach
requires normalization for each table, while in engineering application environments
where complex objects are present, the normalization concept is not favoured. In theory,
less normalized tables will have better clustering for composite objects, and hence better
performance.
The key point in the relational approach is: (1) how to translate the DSQL query into
relational SQL and (2) how to make use of relational tables for persistent data handling.
All the spatial and non-spatial predicates will be implemented as relational db-library calls
operating on flat tables. You may easily notice the impedance mismatch of the application
data model and database data model, i.e. Block and Geometric Objects with hierarchical
structure vs. relational flat and normalized tables. While in the object-oriented approach,
a unique data representation is present both for the application program (transient) data
model and the persistent data model of database.
We have done three kinds of performance comparisons based on Sybase.
I. Pure Relational approach (Strictly Normalized).
II. Less Normalized approach.
III. Relational Approach with PreLoading.
In this chapter we will first introduce two relational approaches, the pure (or strictly
normalized) relational and the less normalized one, and describe our DSQL system based
on these instead of an object-oriented approach. The purpose for the Less Normalized
approach is to apply the complex object clustering in the object-oriented concept to the
relational concept, expecting better performance. Then we will describe the PreLoading
approach to integrate the relational system with all the possible object-oriented features,
such as object identifier, client caching and data clustering for the composite object. And
the most important thing we want to stress, from the performance point of view, is the
effect of the "impedance mismatch".
All the comparisons will be done based on schema design, query processing and
performance benchmarks. And our interpretation of the performance result will be given
for each case. After all these comparison, we will show why we chose an object-oriented
DBMS as our basic platform for DSQL implementation.
5.2 Design of Relational Schema
In this section we will describe the schema both in Pure Relational and Less
Normalized approach, and point out the major difference between them.
Let's use the Cuboid as an example to show this difference:
5.2.1 Pure Relational (PR) Approach
Cuboid Table
Rectangle Table
Figure 5.1 Schema Definition for Pure Relational Concept
The Cuboid Table is from the Block Level, and the Rectangle and Edge Table are
from the Geometric Level. The whole block object is decomposed into three levels
of tables for normalization purpose. The field top in Cuboid links to the field id in
Rectangle and the field edge-1 links to the field id in Edge table. To retrieve all of the
cuboid's information, a join operation will be applied to all three tables on related fields.
To speed up the join operation, primary indexes are on id in each table, and secondary
indexes on the join fields, such as top and edge-I.
5.2.2 Less Normalized (LN) Approach
To obtain high data clustering, table normalization is a main obstacle. We will try to
combine some tables to sacrifice normalization for high clustering. The Face and Curve
tables are the first targets to be combined. After combining these two tables, we will
have two levels of tables for Block and Geometric Objects, respectively.
Cuboid Table
w "BT Combined
Reactangle Table Edge Table
Figure 5.2 Schema Definition for Less Normalized Relational Approach
In this approach, we combine the two tables in the Geometric Level into one, and
obtain better data clustering, which is closer to the composite object clustering in the
object-oriented model. The retrieval for the whole block object will have fewer tables
to be joined, and possibly better performance.
5.3 DSQL Query Processing
The general DSQL query processing steps are: query parsing with YACC, generating
a processing plan, mapping the plan into database function calls and sending back the
result. The difference for each approach is mainly from the execution of database function
calls, including spatial and non-spatial predicates processing. Let's look at the difference
in PR and LN approaches.
Pure Relational and Less Normalized Approach are much like the same in query
processing, except they are different in a Sybase SQL query to retrieve face information.
5.3.1 PR (Normalized) Approach
As we have seen in Chapter 3, the query processing plan will be mapped into the
function implementer built on top of the DBMS. In object-oriented database systems, all
the spatial and non-spatial predicates implemented as member functions for each class
are due to its unique data representation. But in a relational approach things become
much more complicated.
Each predicate processed should have the same basic steps:
I. Construct a corresponding SQL query and pose it to the database server (remote).
11. Wait for the SQL result.
III. Bind the SQL result into the local data structure.
The non-spatial processing is relatively simple, here we only show the SQL query
to the Sybase server. For performance considerations, a stored procedure is used to save
query parsing and compiling
Ex. 5.1: Find all the cuboids with a given colour top.
create procedure TopColor @col varchar(l0) as
select b.id from cuboid b, rectangle f
where
b.top = $id and $color = col
Ex. 5.2: Find all the cuboids with a given name.
create procedure Name @n varchur(l0) as
select id from cuboid where name = n
go
Because of the incompleteness of SQL, we could not count on Sybase to process a
complicated predicate like On-Top-Of. A client program is involved in such processing.
To process a spatial query On-Top-Of(o1, 02), which is basically a join operation.
We first retrieve, from Sybase, the bottom rectangle face of block block-01 in set 01,
then for each block block-02 in set 02 get its top rectangle face from Sybase. If this pair
of top and bottom faces are overlapped and attached to each other, then we could say
block-01 is on top of block-02, and store the block-01 in the result set. After scanning
all the blocks in set 01, we could get a subset of blocks, each on top of at least one
block in set 02. Here is the algorithm.
On-Top-Of (01, 02) {
Lookup the info of 01 and 02 in symbol table;
for each topBlock in set of 01 {
Retrieve the face bottom = GetBoundaryFace (topBlock, "Bottom");
for each bottomBlock in set of 02 {
Retrieve the face top = GetBoundaryFace (bottomBlock, 'Top");
if face bottom is overlapped with face top
record this match, topBlock is selected;
1
1
1
GetBoundaryFace (BlockID, faceside) {
Based on face side and block type construct the SQL;
pose the SQL query to the Sybase server;
bind the SQL result to local data structure;
return the new face;
1
The above functions are the same for both the Pure Relational and the Less Normal-
ized approach, the only differences are in the SQL query.
A join is required to retrieve block faces (e.g. to get one top face we will need to
join the rectangle table and the edge table):
Ex. 5.3: retrieve a top face of cuboid.
create procedure CuboidTop @block-id as
declare @top-id int
select @top-id = top from Cuboid where id = @block-id
select * from quadplane f, edgecircle e
where $id=@top-id and e.id = $el
..... (repeat for other three curves)
go
The reason why we use 4 separate SQL statements instead of one with an OR operator
is that the OR operation will be 100 times worse than 4 single SQL statements, due to
the features of the query optimizer in Sybase.
5.3.2 LN (Less Normalized) Approach
In the Less Normalized Approach, since we have to join the rectangle and edge table
statically (in a table structure), no join operation is required to get a top face.
Ex. 5.4: Retrieve a top face of a cuboid in LN approach.
create procedure LessNoramlizedCuboidTop @block-id as
declare @top-id int
select @top-id = top from Cuboid where id = @block-id
select * from less-normalized-quudplane
where id=@top-id
go
In the new rectangle table, all the information for one face will be stored in one tuple,
and definitely in a page. While in the normalized one, the same amount of information
is separated into two tables, and possibly the query for the rectangle face will require
more disk 110.
5.4 Performance Comparison
In this section we will do the following comparisons:
I. Spatial Predicate comparison in which a single type and assorted types of blocks
are investigated as well as disk UO.
ZZ. Non-spatial Predicate.
Let's first get a overview of all the DSQL operations.
All the query conditions for non-spatial query comparison is:
I. One Non-spatial Predicate: color (top (01)) = "red".
ZZ. Two Non-spatial Predicates: color (top (01)) = "red" and color (front (01)) =
"green".
All the query conditions for spatial query comparison is:
I. One Spatial Predicate: On-Top-Of (02) = 01 and color (front (02)) = "red".
II. Two Spatial Predicates: On-Top-Of (02) = 01 and OnBottom-Of (03) = 01 and
color (front (02)) = "red" and color (top (03)) = "green".
Table 5.4 Non-spatial Predicate Comparison Based on 500 Cuboids
Table 5.5 Spatial Predicate Comparison Based on 500 Cuboids I
Since move, rotate and select are not much different in performance, we will continue
the comparison on only the select DSQL.
5.4.1 Spatial Query Comparison
It is well known that cold and warm comparisons sometimes make a big difference,
especially when client machine caching is applied. The following results are comparisons
between a cold and warm environment.
In the block world, the Warm means we first call one function to access all the
objects in a database once, then test the spatial and non-spatial queries. In the warm
stage, some objects may be cached, some may be swapped out already. The Cold
means we will execute the spatial and non-spatial queries with no objects in the database
cached. The importance for testing both Warm and Cold cases is that they usually make
a nontrivial difference.
Table 5.6 Warm & Cold Comparison (Select With 1 Spatial Predicate on Cuboids)
Since the last comparison is only on Cuboid, we may want to try assorted blocks to
see if it makes a difference. The following example is on 4 kinds of blocks, Cuboid,
Cylinder, Frustrum and Sphere.
Table 5.7 Warm & Cold Comparison (Select with 1 Spatial Predicate on Assorted Blocks)
Based on the above result, people may be shocked at the result, even that they may
expect an OODBMS would be a lot better than a RDBMS in our scenario. But still this
is only one part of the story, for a complicated predicate, the OODBMS does amazingly
well. But what about the simple query with only non-spatial predicates? We will see
the comparison in next topic.
Another surprising result is that the LN Approach is worse off than the PR approach.
What's wrong? The normalized table format with many join operations could be better
than the less normalized table with no join operation, which is completely unbelievable.
We decide to make use of the statistical information in Sybase to investigate the mystery.
Following is the timing and physical disk 110 accumulated from each Sybase SQL query.
time io time io time io time io I I I I
Table 5.8 Performance Comparison with Sybase Statistics I/0 Turned On
75
Normalized
Less Normal
315.7
187.2
1279
1097
1007
635.1
11079
8947
5308
3427
89354
53729
20221
13684
375023
229443
Another surprising result comes out. It is easy to imagine the performance of the
normalized approach to be worse, but it should not be that much different. While this
seems to be an incredible result when turning on the statistics 110 in a less normalized
one. We will try to explain it later.
5.4.2 Non-Spatial Query Comparison
From this result, we could get to know where the object-oriented database system
is best suited. For the simple query or some other business application, object-oriented
databases may not be promising.
Normalized
Less Normal
ObjectStore
5.5 Discussion on Performance Comparison
Object-oriented database systems are best suited for our DSQL for the following
reasons:
Table 5.9 Warm & Cold Comparison (Select with 1 Non-spatial Predicate on Cuboid)
100 200 500
warm
0.09 1
0.13
0.072
warm
0.32
0.46
0.31
warm
1.31
1.79
1.21
cold
0.12
0.17
0.42
loo0
cold
0.49
0.58
0.89
cold
2.13
2.98
4.47
warm
2.79
4.73
2.45
cold
4.41
6.17
9.23
5.5.1 Relational Key vs. Object Identifier
As we mentioned before, two important features of the object-oriented DBMS are
the notions of a complex object and an object identifier.
On the one hand, each object has its own identity represented by a unique OID. On
the other hand, the notion of complex object allows the aggregation of heterogeneous
objects as a single unit. [3]
In the DSQL object-oriented approach, the complex Block object has its sub-
objects (attributes) Boundary Face from Geometric Class, and is linked together as a
graph structure. The references to face object are made explicitly through face OID
links. Further, object-oriented systems, like ObjectStore, have been developed largely
independently of any consideration for very large databases (i.e. they assume that all
objects reside in a large virtual memory). This means that object identifiers have been
used as the sole means of specifying desired objects, which allows the dereference to
Block and Geometric object with great ease and amazing performance.
A major advantage of object-oriented DBMS' is that many relational join operations
can be prevented when retrieving complex objects; instead of joining tuples from different
relations based on some attribute values, objects are simply accessed through their ODs.
In the relational approach of DSQL, when we want to process a spatial query, the
only persistent identifier for Block is block id (i.e. key in relational tables). Given a
blockID, whenever we call GetBoundaryFace(blockID), we have to do 1 select and 4
join operations which is the most expensive SQL operation. While in the object-oriented
approach, given a block object identifier, with pointer swizzling in ObjectStore, we could
get its address in virtual memory at once. Traversing through the Block Object and
Geometric Object hierarchical tree, the information of the face boundary will be obtained
at maximum speed. No join operation is needed, only navigation of a class hierarchy.
There is only one operation like this, the difference may be trivial, but the spatial predicate
is just a sort of join and this operation will be repeated O(N*N) times. So it is not difficult
to explain why an object-oriented database system processing a complicated DSQL is
more than 100 time faster than a relational one.
The accessing of a Block object through the block key, a remote SQL query to
Sybase, and client side data binding emphasize the problem of impedance mismatch in
a relational approach. In comparison, the object access in Objectstore is so natural and
efficient just because of its close integration of database schema and programming data
model, shown in Chapter 2.
5.5.2 Client Caching and Server Caching
People may raise a question immediately after we said the join operation is one of
the major reasons why the relational approach is so slow. What about data caching in a
Sybase server (10M buffer size), -it is highly possible that join operations are done in
the buffer (memory), if the join attributes are properly indexed?
It is true that the relational database system is trying to do all the joins in the buffer
for high performance consideration, and Sybase really does a very nice job in query
optimization. Sybase is a typical client-server architecture, and all the client performance
depends heavily on the server's efficiency. This is a so-called centralized processing,
while the client program is relatively small, requiring very little memory. But one thing
we should notice, the relational database server is developed for a multi-user environment.
Even though its resources are considerably large, each user has a strict limit.
Again, client caching for the relational approach is not suitable in DSQL spatial
predicate processing. Sybase could also provide a client side caching (buffering), when
a large number of resulting tuples are returned from the server, the client could have a
buffer pool for those resulting tuples, and make use of the data in the client buffer pool.
High performance is gained accordingly. But our case is completely different, what we
have is a small piece of returning data and extremely high frequency usage for each SQL
(i.e. this SQL is executed thousands of times in one spatial query processing). Thus
Sybase's client buffering could not bring us high performance.
Let's go back to ObjectStore when client caching is used, the database server is only
involved whenever a page fault occurred in the client program. ObjectStore is a typical
CLIENT-server architecture, and the server's efficiency is no longer a critical factor, as
the client machine is normally workstation. This is basically decentralized processing,
the client program is quite large and requires more memory. When data is fetched into
memory, it is cached in the client machine (local workstation). Since workstations are
mostly for single-users, the application buffer size for client caching is therefore larger
than server caching, even the total buffer size of the server is definitely much larger.
So with DSQL, client caching is also one of the important reasons for ObjectStore
outperforming Sybase. We can see from the following table how many times a boundary
is crossed in spatial predicate query processing.
Table 5.10 Crossing Boundary in Relational Database System
5.5.3 Data Clustering
Object-Oriented vs. Relational From the efficiency point of view, one of the most
important methods is to use the complex object as a unit of clustering in the database, so
that a large collection of related objects may be retrieved efficiently from the database.
The typical operations performed on the hierarchy of complex objects are the navigation
through links and the retrieval of the ancestors/descendants of a given node. [3]
Relational Approach
Block
(composite object)
Vertex Vector
Object-Oriented Approach
one segment
Block
cent n ces (top.
Face
(dependent objects)
Figure 5.3 Composite Object Clustering
The boundary representation of geometric objects is clearly a hierarchical data
81
structure, so for normalization purposes several levels of tables should be created, in
Sybase, sacrificing the benefit of data clustering. While in an OODB (Objectstore), all
the data members objects are clustered during the object construction, and hopefully one
disk I/O could retrieve the complex object if it could fit into one page. When a composite
object's (cuboid) instance is created, all dependent objects (e.g. faces4 and centre) are
created accordingly and clustered in the disk with the cuboid's other data members (e.g.
name). Due to its small size, all the objects, composite and dependent, could be stored in
one page. So the retrieval will be done in one disk I/O, while the non-clustered approach
will require much more YO.
Normalized vs. Less Normalized In the Pure Relational approach of DSQL, we
have the maximum possible clustering in each table for performance reasons, but the
decomposition of three levels of tables for normalization purposes, stop us from any
more performance improvements. That is the reason why we have come up with another
relational approach, Less Normalized, simply by reducing the degree of normalization
(i.e. combine the Face table and Curve one into a larger table and hence better clustering).
But the result is quite unexpected. Why?
Sybase's query optimizer normally will assign more resources to the stored procedure
in a normalized approach, since join operations are involved. However, the stored
procedure in the less normalized approach has no join operation at all, and naturally
a Sybase optimizer will consider it to be a non-critical query, and hence assign less
resources to it. That is the reason why we could not get better performance in the less
normalized approach, as seen in Table 5.9.
Which is also a composite object.
When we turn on the Sybase statistics I/O, a completely different result comes out.
The problem, we suspect, is related to server caching. As we mentioned before, the server
is designed for a multi-user environment, and each client process has a limit of the server
resources. Even though there is only one client process running, the server will still be
unwilling to assign more resources to it than it limits. Due to different requirements,
analyzed by the query optimizer, the stored procedure in the normalized approach will
definitely obtain more resources (e.g. buffer) than that in the less normalized one, and
hence better performance. But when the statistics I/O of Sybase is turned on, for general
purpose, a fixed amount of resource will be assigned to the SQL query. Since the
first stored procedure is much more complicated, 5 select statements, than the second,
the statistics process for it will have a heavier load and may even use up some of the
resources assigned to the first stored procedure. So the first stored procedure will be
delayed by less resources and also more statistics information handling. The second one
is different as it may receive more resources than before, thus better performance.
Irrespective of the actual outcome of the investigation, one thing is quite clear.
The query optimization techniques used by a relational database system have their own
limitation. In the absence of application specific information, it cannot produce an optimal
query processing plan. Secondly, as the server caters to a large number of clients, with
a high variation of workload, its objective of design is to provide the largest possible
throughput for all the transactions, not the performance of a single query.
5.5.4 Complexity of Query
The only part when the object-oriented approach is not satisfactory is the cold stage
for simple (non-spatial predicate) DSQL processing.
The client-SERVER architecture is best suited for simple on-line SQL processing
(e.g. business applications). While the CLIENT-server architecture requires a relatively
longer time for the warm up of less powerful client machine. That is why we get worse
performance of the DSQL in this scenario. But once the client machine has been warmed
up, the performance is greatly improved, and even produce a better performance than
a client-SERVER one.
5.6 Relational Approach with PreLoading
In the above section, we have discussed several factors that contribute to the
performance advantage of the OODB approach. It is clear that we could not improve the
performance of the relational approach by the traditional methods (e.g. pre-join). In this
section, we describe an experiment that attempts to narrow the performance gap.
Borrowing the idea of ObjectStore Memory Mapping and Pointer Swizzling, we
come up with the idea to use client caching by preloading all the data set into the client
machine's virtual memory and handle the object by its virtual memory address instead
of the relational key. This is called PreLoading.
In the Block World, we would like to see that given a block ID which is unique,
we should be able to link it to an object address inside of virtual memory. Whether
this object is in memory or in swap space is clearly beyond our interest, and could be
handled by the 0s. With the help of preloading, all of these can be accomplished, and
hence better performance and more concise coding. Also, in the preloading strategy, data
clustering is achieved by constructing the composite object in the consecutive addresses
in virtual memory, just like what ObjectStore does in data clustering. The retrieval of
one block object may at most require one disk 110.
5.6.1 Query Processing
The schema for PreLoading is like a hybrid of relational and object-oriented ap-
proaches, in a sense that the database server side is still a relational database, while the
client side the DSQL is processed in exactly the same as the OODB approach. A virtual
object-oriented database view is presented to the DSQL OODB query processor. This
virtual DB will provide a block pointer, given a read block command, see Figure 5.4.
virtual-db
Read
blockgt block-id (key)
(virtual me
Figure 5.4 Relational Key Links to Object Pointer in Virtual Memory (read)
A virtual database system can be defined as following:
class virtwl-db {
private:
PointerConverter pointcvt;
MemoryManager memoryMgr;
public:
P Provide a block pointer in client machine, given a block key */ Block* ReadBlock(int block-id);
P Update the block info in database from the block info and key */ void WriteBlock(Block* theBlock, int block-id);
P Insert a new block into database and keep track of its pointer */ void WriteNewBlock(Block* theBlock);
P Delete a block in database, and free its space in client machine */
void DeleteBlock(int block-id);
1
The only difference of the PreLoading method to the object-oriented one is just that
we have to explicitly call the ReadBlock, WriteBlock, WriteNewBlock and DeleteBlock
to do block handling (e.g. if we want to access the name of block, we will do { Block*
a-block = ReadBlock (block-id); a-block->Name() ... ), not just like in Objectstore as
we could do direct access through a Block pointer.)
As you see in Fig. 5.4, a virtual database server is provided for users to read, write,
delete and insert (e.g. read operation will send a block-id (key in Sybase) to virtual-db).
The virtual-db sever will invoke the pointer converter to hash for the object pointer in
virtual memory. Upon success, return the pointer, otherwise the memory manager invokes
db-library calls to pose the SQL (stored procedure) to Sybase to obtain all the necessary
information of the block object, which are a set of records, finally allocate space in the
virtual memory (heap) and recursively bind information to the block (complex object)
and its boundary, finally the pointer converter will maintain the link of the block key
to the block pointer of virtual memory in the hash table. Since it is obvious we could
not allocate space in virtual memory (swap space and memory) all the time, due to the
limitation of swap space, once the swap space has overflowed, first-in-first-out (FIFO)
strategy is used for objects replacement. Thus we create a linkage between the block
key (persistent) and its transient pointer in virtual memory, avoiding accessing Sybase
to retrieve a block all the time.
Block* ReadBlock (int blockid) {
1. Invoke the pointer converter to hash for the block pointer;
2. If success then return the block pointer in virtual memory;
3. Upon failure, invoke the memory manger to send SQL to remote Sybase;
4. Wait for the result of SQL;
5. Allocate virtual memory space for the composite block object.
6. Recursively bind the SQL result to block object, including sub-objects;
7. Invoke the pointer translator to maintain the link (key to VM pointer);
8. Return the pointer;
1 WriteBlock (Block* theBlock, int blockid) {
1. Convert the information of theBlock into SQL queries;
2. Send the SQL to Sybase to update the old block info;
3. Wait for the SQL to finish;
1 WriteNewBlock (Block* theBlock) {
1. Convert the information of theBlock into SQL queries;
2. Send the SQL to Sybase to insert this new block;
3. Wait for the SQL to finish;
4. Keep this new link of key to pointer in hash table;
1 DeleteBlock (int block-id) {
1. Send the SQL to Sybase to delete this block;
2. Wait for the SQL to finish;
Remove the link of key to pointer in hash
Deallocate the space of this block in client
Performance Comparison
table;
machine;
The performance comparisons for the preloading strategy are:
Table 5.1 1 Performance Comparison During Initialization
Table 5.12 Non-spatial Predicate Comparison Based on 500 Cuboids
89
Table 5.13 Spatial Predicate Comparison Based on 500 Cuboids
5.6.3 Discussion
The PreLoading method seems to solve the problem of impedance mismatch between
database schema and programming data model, and thus we could apply all the efficient
methods to data manipulation. Client Caching, OID Accessing, Data Clustering built on
top of a relational database could bring us an amazing improvement of performance with
a reasonable cost. This shows how much the object-oriented concepts are important to
engineering applications.
Our Preloading Strategy for linking block-id (key) to block pointer in virtual memory
also could be considered as an emulation of memory mapped 110 architecture. We are
trying to preload all the objects in the virtual memory at once and upon a page fault the
operating system will take charge of swapping in the page we need. But our strategy
still could match with the real memory mapped architecture for two reasons:
DISK SPACE
Objectstore PreLoading
program heap
Figure 5.5 Indirect Mapping Based vs. Memory Mapped
program heap
Initialization of the DSQL system:
I. Initialization Cost: Create all the blocks in the virtual memory. During initializa-
tion, all the blocks read in from the data space on disk will be placed in virtual
memory, which includes real memory and swap space, so an extra disk copy
(from data space to swap space) is unavoidable. It is obvious that the limit for
virtual memory is the swap space size.
II. Swap Space Size: For PreLoading, obviously the limit for virtual memory is the
swap space size, while no swap space is needed for ObjectStore. An object is
swapped into the memory directly from the data space only when a page fault
occurs, which means a dereference to the object pointer in memory was failed and
the ObjectStore server is awakened to swap in the page with the specific object.
So the size of the virtual memory = real memory + data size, and therefore the
virtual memory should be considered limitless.
On the other hand, PreLoading does have some advantages over the OODB approach:
PreLoading: The swap space is either local or remote. When there is local
swap space available, the object could be retrieved locally without any network
communication delay.
ObjectStore: The data space is either local or remote. When data is stored on a
remote disk, there is an overhead of network communication for object retrieval.
Further, there is an overhead of context switching in awakening the OS server
when a page fault occurs.
For the above reasons, we could understand why some results of the PreLoading
method are a little better than ObjectStore. As seen in Table 5.13. Further, the data
operations of ObjectStore are always done remotely, while for the two other relational
approaches with PreLoading the data operations could be done in the local swap space
without any network communication overhead.
However, the Pre-Loading method could not solve the problems of data recovery
and concurrency control, and to develop a whole database system with data recovery
and concurrency control is too extra work. So OODBMS is best suited for this kind of
application with impressive performance and ease of coding.
5.6.4 Summary
From each of the 4 kinds of implementation approaches for DSQL, it is observed
how important the implementation using an object-oriented concept is.
4 + 4
Spatial Query Non-Spatial Query
Objectstore
PR (Normalized)
LN (Less Normalized)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . PreLoading
Figure 5.6 Performance Comparison of 4 Approaches in DSQL
Chapter 6 Conclusions
Much work has been done in how to process an ad hoc on-line query efficiently,
and how to apply the version control to engineering applications. The query processor
and optimizer were carefully designed to achieve a better performance. The performance
results show that the customized query optimization strategies could bring much better
performance than the generic one supported by the DBMS. It is easily seen from the
DSQL implementation efforts, an ASQL query processor and optimizer are affordable.
A passive monitoring process will provide version management, especially during
the merging operations. We have built a system with an X Window interface along with
a DSQL interface to Block World, HOOPS (X Window based graphic package) is used
for the 3-D display.
With the implementation effort on DSQL, the benchmark obtain from a more mean-
ingful environment for performance comparison which make use of the spatial query
language. Data clustering and memory mapped architecture are important in the perfor-
mance comparison benchmark, which closely matches the two most important notions of
an object-oriented approach - composite object and object identifier, while none of these
are present in the pure relational approach. Comparisons on the absence and presence of
data clustering or memory mapped feature will give another proof on the importance of
these two features to engineering applications.
After finishing the performance comparison, it is easily seen what the major advan-
tages and disadvantages are in using OODBMS' and relational DBMS' for low level
data management. The impedance mismatch is the real bottleneck in the relational ap-
proach, and the normalization is not always a factor for poor performance in engineering
applications. Further, we have successfully applied most the object-oriented features to
relational databases using PreLoading, and gained almost the same performance result
as in ObjectStore. But since Pre-Loading method could not solve all the problems, such
as data recovery and concurrency control, we still consider OODBMS is a better choice
of platform to implement an ASQL.
From all these performance comparisons, it is obvious that in building a GIs on an
OODB is much better than building on a relational one, with regards to performance and
coding5. There are many important reasons why an OODBMS (ObjectStore) is better
suited for our GIs than a RDBMS (Sybase) might be. Speaking generally, the model
supported by ObjectStore often matches more closely to our data model. This means that
there is less overhead for joining things together which may have been split up due to
normalization. Further, there is much less translation between the data on disk and the
host language's data model when ObjectStore is used.
Using this strategy, the preprocessing for the DSQL will be constantly less than 2
seconds, which only occupies 5-10% of the whole query6 processing time. Thus, we
could meet the performance criteria for an ad hoc query's on-line processing, and provide
a unique query processing strategy for both interactive and embedded usages.
From the development of the DSQL query processor, using a Bi-Level data model
and an OODBMS, the development of an ASQL query processor is easy and could bring
the maximum benefits.
Even though the power of client machine really does matter in ObjectStore.
A query with reasonable complexity, i.e. with spatial predicate(s).
In future research, a R-tree index will be created in the spatial predicate processing,
and the investigation will be done in how indexing affects the query performance
and object-oriented and relational comparison. Since the impedance mismatch is the
bottleneck of query processing in relational systems, another interesting issue would be
the crossing boundary in OODB system. More experiments will be done in how to
improve the query processing in OODB systems.
Reference
[I] Aref, W.G. and H. Samet
Optimization Strategies for Spatial Query Processing.
Proceedings of the 17th international conference on VLDB, Sept, 1991.
[2] Cattell, R.G.G.
Object Data Management - Object-Oriented and Extended Relational Database
Systems.
Addison-Wesley Publishing Company, 199 1.
[3] Cheng, J.R. and A.R. Hurson
Effective Clustering of Complex Objects in Object-Oriented Databases.
ACM SIGMOD 91's Conference Proceedings, Vol 20, June 1991.
[4] Choi, A.Y.L. and W.S. Luk
A Bi-Level Object-Oriented Data Model for Geographic Information Systems.
IEEE COMPSAC, Chicago, October, 1990
[5] DeWitt, D. J.
The Wisconsin Benchmark: Past, Present, and Future.
The Benchmark Handbook - for database and transaction processing systems.
Morgan Kaufmann Publishers 199 1.
[6] Dual, J. and C. Damon
A Performance Comparison of Object and Relational Databases Using the Sun
Benchmark.
OOPLSLA '88 Conference Proceedings, SigPlan Notices, Volume 23, Number 11,
November 1988.
Kim, W.
Object-Oriented Databases: Definition and Research Directions
IEEE Transactions on Knowledge and Data Engineering. Vol 2. No. 3. September
1990.
Lamb, C., G. Landis, J. Orenstein, and D. Weinreb
The ObjectStore Database System.
ACM Communication. Vol 34, No. 10, October 1991.
Luk, W.S. and A.Y.L. Choi
Dynamic Spatial Query Language: A Customized Query Language for Object-
Oriented Database Systems.
IEEE COMPSAC, Tokyo, Semptember, 1991.
Moss, J.E.B.
Object-Oriented as Catalyst for Language-Database Intergration.
Technical Report MIT-LCS-260, PhD. Thesis, MIT.
Orenstein J., S. Haradhvala, B. Margulies and D. Sakahara
Query Processing In the ObjectStore Database System.
ACM SIGMOD 92's Conference Proceedings, Volume 21, June 1992.
Rosenberg, J.B.
Geographical Data Structure Compared: A Study of Data Structures Supporting
Region Queries.
IEEE Transactions on Computer-Aided Design, 4, 1, January 1985
Samet, H.
The Design and Analysis of Spatial Data Structure.
Addison- Wesley Publishing Company, 1990
[14] Snodgrass, R.
The Temporal Query Language TQuel.
ACM Transactions on Database Syetms, Vol. 12. No. 2. June 1987.
[15] Stonebraker, M. and L.A. Rowe
The POSTGRES Data Model.
Proceeding of the ACM SIGMOD Conference, Washington D.C. 1986.
[16] Strousstrup, B.
The C++ Programming Language.
Addison-Wesley Publishing Company, 1986.
[17] Winslett, M. and S. Chu
Using a Relational DBMS for CAD Data.
Technical Report UIUCDCD-R-90-1649, CS, U. of Illinois, Urbana, 1990.
1181 Xidak
Orion System Overview
Xidak, Inc., Palo Alto, CA, 1990.
[19] Zhang J.
Graphic User Display in Geographical Information Systems.
Master's thesis, Simon Faser University, 199 1.
Appendix A Data Model
Block Object Data Model:
chss Block {
private:
Vertex* block-center;
char* name;
. . . . . . public:
char* Name();
Bool On-Top-Of(Block*); // spatial predicate.
Bool On~Bottom~Of(Block*);
. . . . . . void Top-Of(Block*); // update finction.
void Bottom-Of(Block*);
. . . . . . virtual void Transhte(float, float, float); // h, Ay, Az.
virtual void Rotate(char, float); // coordinate and degree.
virtual Face* Top(); // retrieve face.
virtual Face* Bottom();
virtual float Volumn();
virtual char* Type();
class Cylinder : public Block {
private:
Face *top, *bottom, *side;
. . . . . . public:
void Translate(float, float, float);
void Rotate(char, float);
Face* Top();
Face* Bottom();
. . . . . . float Volumn();
char* Type();
1
. . . . . .
Geometric Object Data Model:
class Geo-Obj {
private:
Vertex *center;
. . . . . . public:
virtual void Transhte @oat, float, float);
virtual void Rotate(char, ma t ) ;
Vertex* Center();
class Face : public Geo-Obj {
private:
Vector* normal;
char* color;
. . . . . . public:
virtual void Translate @oat, float, float);
virtual void Rotate(char, float);
virtual void Boundary(Set&); // obtains the boudaries of this face.
float Normal();
char* Color();
class Curved-Face : public Face {
private:
. . . . . . public:
virtual void Translate float, float, float);
virtual void Rotate(char, float),.
virtual void Boundary(Set&);
class Cylind-Surface : public Curved-Face{
private:
float Height(), Radius();
Curve *top, *bottom;
. . . . . . public:
void Translate float, float, float),.
void Rotate(char, float);
void Boundary(Set&);
class Curve : public Geo-Obj {
private:
Vector* normal;
float length;
. . . . . . public:
void Translate float, float, float);
void Rotate(char, float);
Vector* Normal();
float Length();
class Edge : public Curve{
private:
void LeftEnd(Vertex&); // obtains the left end of this edge.
void RightEnd(& Vertex);
. . . . . . public:
Appendix B
A1 Artificial Intelligence
ASQL Application Specific Query Language
CAD Computer Aided Design
DB Database
DBMS Database Management System
DDL Data Definition Language
DML Data Manipulation Language
DSQL Dynamic Spatial Query Language
GIs Geographical Information System
GUI Graphical User Interface
110 Inputloutput
LN Less Normalized
OID Object Identifier
OIS Office Information System
00 Object-Oriented
OODB Object-Oriented Database
OODBMS Object-Oriented Database Managament System
OS Operating System
PR Pure Relational
RDBMS Relational Database Management System
SQL Structured Query Language
YACC Yet Another Compiler-Compiler