Formalization of object-oriented database model with rules

S Goutas, P Soupos and D Christodoulakis

The paper investigates knowledge representation in an object-oriented database management system, first within the data model by using rules and second in the computational model by using logic. Issues of structure, integrity, and retrieval are focused on. The proposed system provides object-oriented concepts for describing complex structured data, rules for expressing object-dependent constraints and object associations, and, finally, logic for inference and retrieval.

databases, database management, object-oriented databases, data models, knowledge representation, production rules, query languages, logic programming

In recent years there has been a substantial influx of ideas into database management system (DBMS) technology from object-oriented programming languages, logic programming, and rule-based systems. This has stimulated much research in the area, and a number of object-oriented (OO) models and systems have been put forward 1-4. A proposal that has recently attracted attention is the integration of OO and deductive databases 5. This paper investigates the use of rules in an object-oriented DBMS (OODBMS), focusing on issues of structure, integrity, and retrieval. Although the purpose is not to tackle the issues of an OO knowledge-based management system but rather to deal with the incorporation of rules in an OODBMS, the proposed system can be used as a knowledge base because it provides objects for describing complex structured data, rules for expressing object-dependent constraints and object associations, and, finally, logic for inference and retrieval.

A major issue in OODBMSs nowadays is the degree to which knowledge is explicit or embedded. Knowledge about object types is expressed mainly in the form of methods, and in that sense it is embedded. When, for instance, attributes and methods are used to express relationships between objects, this representation lacks qualities such as naturalness, simplicity, uniformity, and understandability. Rules, however, which are the most popular form of knowledge representation in expert systems, can serve as a vehicle for expressing the functionalities of objects declaratively, with all the attractive features mentioned above.

Computer Engineering Dept and Computer Technology Institute, University of Patras, 26500 Patras, Greece

Another major issue that emerges from knowledge representation is that of integrity constraints. Most existing OO systems 6-9 rely on the inherent constraints 10 of the type system to assure consistency, in the sense that a type constrains the permissible methods that can be applied to an object of that type 11. To overcome the limited constraint specification facility provided by inherent constraints alone, OO systems must be provided with a mechanism for specifying explicit constraints 10. Such constraints explicitly express value ranges, dependences, and any other type of restriction that cannot be expressed by the type system, for any kind of method and not only for updates 12. Here again rules can provide an answer: in a typical 'if X then Y' rule, X must be true in a given situation for action Y to be inferred.

Yet another well-recognised problem is that the role of declaratively specified knowledge in the form of constraints must be re-examined: in an OO environment where any operation may be user defined, the system cannot automatically determine which operations can see the effects of which others, or what the side-effects of a given operation might be 13. This is a result of the absence of an explicit connection between constraints and the operations they pertain to. To minimize checking, as many of the constraints related to an operation as possible should be placed in the same description as the operation itself. The formalism of rules again provides the solution.

Based on the preceding discussion, the authors intend to explore the application of rules to the OO paradigm as a means of expressing knowledge that is uniform throughout the type lattice but needs to be customized for each object type with integrity constraints. This knowledge is not limited to the domain of an object type 2, but also includes associations with other object types. More specifically, the following will be considered: associations between instances, between object types, and between object types in different databases; grouping of instances; creation and deletion of object instances; and any other activity that requires a uniform approach to assure consistency and transparency.

Finally, based on the representation of associations between objects, which, as already mentioned, can be made by rules, the use of a logical query language will be discussed. The combination of logic and object orientation has already attracted attention 4, 14-17 due to the attractive features that both paradigms exhibit. Furthermore, the computational model of OODB systems is an open issue, as it is not included in the list of their essential features, and logic is an attractive possibility. The result is by no means a typed logical system, because data are not simply typed but encapsulated as independent objects that communicate through message passing. Apart from its declarative nature, the query language gains well-defined semantics and a simple notation by using logic; these help to demonstrate that it can lead to a mechanism for automatically creating new associations between objects, based on query monitoring.

vol 33 no 10 december 1991 0950-5849/91/100741-17 © 1991 Butterworth-Heinemann Ltd 741

OVERVIEW OF DATA MODEL BASICS

This section discusses the basic concepts of the data model. The model is composed of what the authors consider to be the essential structural components of representation in application areas such as computer-aided design/computer-aided manufacturing (CAD/CAM), office information, and artificial intelligence (AI). The authors have focused on simplicity and have used as few basic constructs and concepts as possible. After all, an abundance of partly overlapping concepts is one of the major criticisms levelled against OO systems.

Objects

A major issue of OO systems is the separation between type and extent, and various approaches to the problem can be found in the literature 18. In the authors' view, separating type and extent is a more flexible approach, as each type is not forced to have a unique associated extension. More specifically, a type may have no associated extension, thus constituting a structural and behavioural abstraction, or it may have multiple extensions (e.g., intermediate results kept to avoid repeated computation during query processing).

The authors' model distinguishes between two object types, collection objects and type objects. Collection objects are types with an associated extension and thus are equivalent to classes as defined by Buneman and Atkinson 2s. Type objects are types without an associated extension.

Basic principles

The world is viewed as an object lattice. There are three types of objects:

• instances, which represent concrete beings
• collection objects, which describe collections of instances of the same type
• type objects, which describe types

For the rest of the paper all three types are referred to as objects; reference to a specific type is made wherever this is required.

Objects are observable through their properties. Properties are of two kinds: attributes and rules. Attributes are properties to which values can be assigned, while rules are constraints on the allowed values and combinations of values of attributes.

Attributes and rules are not objects, so there is a clear distinction between objects and their properties. Furthermore, properties cannot exist independently of objects. In this sense there is no complete homogeneity. If there were complete homogeneity, rules and attributes would have to be viewed as objects. But if rules are viewed as objects, then they should have rules, and so on. Similarly, if attributes are objects, they should have their own attributes, which are objects, and so on. Therefore, it is necessary to stop arbitrarily at a certain level to break the circularity 19. As a clear distinction is made between objects and their properties, it is obvious where the circularity breaks.

Object types are either basic or user defined. The basic object types constitute the substratum of a specific application by providing its primitive properties. User-defined object types are the means for describing an application from various viewpoints and at various levels of detail.

Object types are defined at distinct levels. At each stage, the existing object types and the basic object types constitute the history of the modelling process. The properties of an object type can be of two kinds: those inherited from the history and those defined at the object type itself.

The instances of a collection object share the same properties with the collection object. As instances represent concrete beings, their attributes have associated values.

The set of instances associated with a collection object constitutes the extension of the collection object. The extension of a type object is determined by its ISA associations 20 with other collection objects; it is called a virtual extension and its component instances virtual instances. In other words, even though type objects serve as a modelling abstraction tool and as such do not carry any real data, they provide a view of any real data that correspond to instantiations of their attributes inherited by collection objects.

Every object has a unique identity, which is independent of the content of the object it identifies. This identity is a user-defined name for type and collection objects, and an alphanumeric identifier generated automatically by the DBMS for instances.

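Formal notation and definitions

Definition 1 Let $\mathcal{T} = \{\tau_1, \tau_2, \ldots, \tau_n\}$ be a set of attributes, $P(\mathcal{T})$ the power set of $\mathcal{T}$, and $T \in P(\mathcal{T})$ a subset of $\mathcal{T}$. For each attribute $\tau_i \in \mathcal{T}$ there is a set $\mathrm{dom}(\tau_i)$, called the domain of $\tau_i$, containing the values $t_{ij}$, $j \in \mathbb{N}$, where $t_{ij}$ is the $j$th value of the attribute $\tau_i$. □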

In fact, the domains of the $\tau_i$ correspond to the domains of the predefined types integer, real, character, string, and Boolean. Furthermore, as explained in the previous subsection, these predefined types are not treated as objects in the context of the data model presented in this paper; this is where the circularity as to what is an object breaks.

742 information and software technology

Definition 2 Let $T$ be a set of attributes. The following definitions are made:

• The state $s(T)$ of $T$ is defined as $s(T) = \{t_{ij} \mid t_{ij} \in \mathrm{dom}(\tau_i),\ i = 1, \ldots, n \wedge j \in \mathbb{N}\}$.
• An action $A: s(T) \to s(T)$ is a mapping on the state $s(T)$ of $T$.
• A condition $c(T')$, $T' \subseteq T$, is a logical expression with variables that correspond to attributes in $T'$.
• The set of basic operations $\mathcal{A}_B$ is the set of all actions that can be applied on all states of attributes in $\mathcal{T}$:

$\mathcal{A}_B = \{A \mid \mathrm{dom}(A) = \bigcup_{T \in P(\mathcal{T})} s(T)\}$.

A rule is the triple $R = (rname, c(T'), A)$, where $T' \subseteq T$, $rname$ is a user-defined character string representing the name of the rule, $c(T')$ is a condition, and $A$ is an action. The set of all rules on $T$ is denoted $N(T)$. A basic rule $R_B$ is a rule such that:

$R_B = (rname, c(T'), A) \mid A \in \mathcal{A}_B$. □

To clarify the above definition, consider the following example:

Let $T$ be the set of attributes $\{x, y\}$ and let $\mathrm{dom}(x) = \mathrm{dom}(\mathbb{I})$ and $\mathrm{dom}(y) = \mathrm{dom}(\mathbb{R})$, where $\mathbb{I}$ is the type integer and $\mathbb{R}$ the type real. Suppose a rule called INCREMENT is required that increments $x$ and $y$ if the value of $x$ is less than the value of $y$. Then $A: (x, y) \mapsto (x + 1, y + 1)$ is the action, $c(x, y): x < y$ is the condition, and $R = (\mathrm{INCREMENT}, c(x, y), A)$ is the rule.
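The INCREMENT example can be sketched in Python as a minimal illustration, assuming a state is a dictionary from attribute names to values and that a rule whose condition fails simply leaves the state unchanged (the paper does not say whether such an attempt is rejected or ignored). The class name `Rule` and its `apply` method are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A state maps attribute names to values, e.g. {"x": 1, "y": 2.5}.
State = Dict[str, float]


@dataclass(frozen=True)
class Rule:
    """A rule R = (rname, c(T'), A) in the sense of Definition 2."""
    rname: str                           # user-defined rule name
    condition: Callable[[State], bool]   # c(T'): predicate over some attributes
    action: Callable[[State], State]     # A: s(T) -> s(T)

    def apply(self, state: State) -> State:
        """Run the action only when the condition holds.

        Assumption: a state that fails the condition is returned unchanged,
        i.e. the action is simply not allowed on that state.
        """
        if self.condition(state):
            return self.action(state)
        return state


# INCREMENT: if x < y then (x, y) -> (x + 1, y + 1)
increment = Rule(
    rname="INCREMENT",
    condition=lambda s: s["x"] < s["y"],
    action=lambda s: {**s, "x": s["x"] + 1, "y": s["y"] + 1},
)

print(increment.apply({"x": 1, "y": 2.5}))  # {'x': 2, 'y': 3.5}
print(increment.apply({"x": 3, "y": 2.0}))  # {'x': 3, 'y': 2.0}
```

Keeping the condition and the action together in one immutable value mirrors the triple of Definition 2: the action cannot be reached without passing the condition.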

The concept of rules specifies 'allowed' operations on specific states as well as the conditions under which these operations are allowed. This enforces integrity constraints on states by allowing certain operations only on certain states, so that the operations always lead to lawful states.

In this definition the notion of basic operations appears as mappings applicable to all states. These basic operations can be regarded as the operations provided by a common, underlying persistence mechanism available to every attribute and set of attributes. Such operations include insert instance, delete instance, and replace the value of an attribute.
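A minimal sketch of such a common persistence mechanism follows. The `Extent` class, its method names, and the `I1`, `I2`, ... identifier format are hypothetical; the point is only that the same insert, delete, and replace operations work on instances of any attribute set.

```python
import itertools
from typing import Any, Dict


class Extent:
    """Hypothetical persistence layer offering the basic operations
    (insert, delete, replace) for instances over any attribute set."""

    def __init__(self) -> None:
        self._instances: Dict[str, Dict[str, Any]] = {}
        self._counter = itertools.count(1)

    def insert(self, values: Dict[str, Any]) -> str:
        """Store a copy of the values and return a system-generated id."""
        iid = f"I{next(self._counter)}"
        self._instances[iid] = dict(values)
        return iid

    def delete(self, iid: str) -> None:
        """Remove an instance; raises KeyError if it does not exist."""
        del self._instances[iid]

    def replace(self, iid: str, attribute: str, value: Any) -> None:
        """Replace the value of an attribute the instance already has."""
        instance = self._instances[iid]
        if attribute not in instance:
            raise KeyError(attribute)
        instance[attribute] = value

    def get(self, iid: str) -> Dict[str, Any]:
        return dict(self._instances[iid])

    def __len__(self) -> int:
        return len(self._instances)


# The same operations serve two unrelated attribute sets.
staff = Extent()
points = Extent()
ann = staff.insert({"name": "Ann", "salary": 3500})
p = points.insert({"x": 1, "y": 2.5})
staff.replace(ann, "salary", 4200)
points.delete(p)
print(staff.get(ann), len(points))  # {'name': 'Ann', 'salary': 4200} 0
```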

Definition 3 Let $T \in P(\mathcal{T})$ be a set of attributes and $r(T) \subseteq N(T)$ a set of rules on $T$. The set of attributes $T$ together with its rules $r(T)$ is called a basic object type $o_B(T)$:

$o_B(T) = (name, T, r(T))$,

where $name$ is a character string unique for every $o_B(T)$. The set $BO(\mathcal{T}) = \{(T, r(T)) \mid T \in P(\mathcal{T}) \text{ and } r(T) \subseteq N(T)\}$ is the set of all basic object types defined over $\mathcal{T}$. □
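Definition 3 can be illustrated with a small Python sketch, assuming that each rule records the attribute set $T'$ its condition refers to. The names `Rule`, `BasicObjectType`, `register`, `B01`, and `POSITIVE_SALARY` are hypothetical; actions are omitted to keep the focus on the constraint $T' \subseteq T$ and on unique names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Rule:
    rname: str
    attrs: FrozenSet[str]                           # T': attributes the condition reads
    condition: Callable[[Dict[str, object]], bool]


@dataclass(frozen=True)
class BasicObjectType:
    """o_B(T) = (name, T, r(T)): a reusable, non-instantiable set of properties."""
    name: str
    attributes: FrozenSet[str]
    rules: FrozenSet[Rule] = frozenset()

    def __post_init__(self) -> None:
        # Every rule must be a rule on T, i.e. refer only to attributes in T.
        for rule in self.rules:
            if not rule.attrs <= self.attributes:
                raise ValueError(f"rule {rule.rname} refers to attributes outside T")


def register(registry: Dict[str, BasicObjectType], bo: BasicObjectType) -> None:
    """Keep names unique across all basic object types."""
    if bo.name in registry:
        raise ValueError(f"duplicate basic object type name {bo.name!r}")
    registry[bo.name] = bo


positive_salary = Rule("POSITIVE_SALARY", frozenset({"salary"}), lambda s: s["salary"] > 0)
registry: Dict[str, BasicObjectType] = {}
register(registry, BasicObjectType("B01", frozenset({"name", "salary"}), frozenset({positive_salary})))
register(registry, BasicObjectType("SYSTEM", frozenset()))
```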

The set of basic object types $BO(\mathcal{T})$ is the set of primitives that can be used when modelling a specific application. The notion of the basic object type is equivalent to that of the mixin flavour of the Flavors system 21. A basic object type cannot be instantiated, but its properties can be combined to construct user-defined object types. Thus properties defined in separate basic object types can be selected independently. For example, the basic operations mentioned before are defined at a special basic object called SYSTEM. The properties of SYSTEM are included in every user-defined object automatically.

$BO(\mathcal{T})$ forms the basis, level 0, on which the construction of new user-defined object types is based. That is, a user-defined object type $O^1$ at level 1 has a set of attributes $K^1$ and a set of rules $r^1(K^{1'})$, $K^{1'} \subseteq K^1$. The set of attributes $K^1$ contains attributes found in basic object types as well as user-defined attributes. Similarly, the set of rules $r^1(K^{1'})$ contains rules of the basic object types and user-defined rules. More formally, consider the following definition.

Definition 4 Let $BO(T_k) = \{o_B(T_k) \mid k = 1, \ldots, l \wedge T_k \in P(\mathcal{T})\}$, $BO(T_k) \subseteq BO(\mathcal{T})$, be a set of basic object types. A user-defined type object, or simply type object, $O^1$ at level 1 is the triple $(name, K^1, r^1(K^{1'}))$, where $name$ is a user-defined character string that is unique in the database and $K^1$, $r^1(K^{1'})$ are defined as follows:

• $K^1 = A_1^1 \cup A_2^1$, where:
  • $A_1^1 \subseteq \bigcup_{k=1}^{l} T_k$
  • $A_2^1 \subseteq \mathcal{T} - \bigcup_{k=1}^{l} T_k$
• $r^1(K^{1'}) = r_1^1 \cup r_2^1$, where:
  • $r_1^1 = \{R_{B_j} = (rname, c_j(L_j^1), A_j) \mid j = 1, \ldots, l,\ L_j^1 \subseteq K^{1'} \text{ and } A_j \in \mathcal{A}_B\}$
  • $r_2^1 = \{R_v = (rname, c_v(L_v^1), A_v) \mid v = 1, \ldots, w,\ L_v^1 \subseteq K^{1'} \text{ and } \mathrm{dom}(A_v) = s(K^1)\}$

and $K^{1'} \subseteq K^1$. □

In the above definition $K^1$, the set of attributes of $O^1$, is defined as the union of the set $A_1^1$ of attributes defined in $BO(T_k)$ and the set $A_2^1$ of user-defined attributes. Similarly, the set of rules $r^1(K^{1'})$ is the union of the set $r_1^1$ of basic rules and the set $r_2^1$ of user-defined rules.
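The two side conditions of Definition 4, $A_1^1 \subseteq \bigcup T_k$ and $A_2^1 \subseteq \mathcal{T} - \bigcup T_k$, can be checked mechanically. The sketch below is one possible reading, with hypothetical basic object types `B01` and `B02` and a hypothetical `budget` attribute; rules are represented only by the attribute sets they refer to.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass
class TypeObject:
    name: str
    attributes: FrozenSet[str]            # K^1 = A_1 ∪ A_2
    rules: Dict[str, FrozenSet[str]]      # rule name -> attributes it refers to


def build_level1(name: str,
                 basic_types: Dict[str, FrozenSet[str]],
                 inherited: Set[str],
                 user_defined: Set[str],
                 rules: Dict[str, FrozenSet[str]]) -> TypeObject:
    """Combine attributes from basic object types (A_1) with new attributes (A_2)."""
    basic_attrs: Set[str] = set().union(*basic_types.values())
    if not inherited <= basic_attrs:               # A_1 ⊆ ∪ T_k
        raise ValueError(f"not in any basic object type: {sorted(inherited - basic_attrs)}")
    if user_defined & basic_attrs:                 # A_2 ⊆ 𝒯 − ∪ T_k
        raise ValueError(f"already defined by a basic object type: {sorted(user_defined & basic_attrs)}")
    attributes = frozenset(inherited | user_defined)
    for rname, attrs in rules.items():             # each rule's L ⊆ K^1' ⊆ K^1
        if not attrs <= attributes:
            raise ValueError(f"rule {rname} refers to attributes outside K^1")
    return TypeObject(name, attributes, dict(rules))


basic = {"B01": frozenset({"name"}), "B02": frozenset({"customer", "deadline"})}
project = build_level1("project", basic, {"name", "customer"}, {"budget"},
                       {"HAS_CUSTOMER": frozenset({"customer"})})
print(sorted(project.attributes))  # ['budget', 'customer', 'name']
```

Note that `deadline` from `B02` is simply not selected: Definition 4 allows a type object to take only some of the attributes offered by the basic object types.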

The definition of a type object at level $m$ is similar to the definition of a type object at level 1. The only difference is that, in the level $m$ definition, the attributes and rules of objects defined at the previous $m - 1$ levels can be used in conjunction with the attributes and rules of the basic objects.

Definition 5 A type object $O^m$ at level $m$ is the triple $(name, K^m, r^m(K^{m'}))$, where:

• $K^m = A_1^m \cup A_2^m$, where:
  • $A_1^m \subseteq \left[\bigcup_{k=1}^{l} T_k\right] \cup \left[\bigcup_{k=1}^{m-1} K^k\right]$
  • $A_2^m \subseteq \mathcal{T} - \bigcup_{k=1}^{m-1} K^k - \bigcup_{k=1}^{l} T_k$
• $r^m(K^{m'}) = r_1^m \cup r_2^m$, where:
  • $r_1^m = \{R_{B_j} = (rname, c_j(L_j^m), A_j) \mid j = 1, \ldots, l,\ L_j^m \subseteq K^{m'} \text{ and } A_j \in \mathcal{A}_B\}$
  • $r_2^m = \{R_v = (rname, c_v(L_v^m), A_v) \mid v = 1, \ldots, w,\ L_v^m \subseteq K^{m'} \text{ and } \mathrm{dom}(A_v) = s(K^m)\}$

and $K^{m'} \subseteq K^m$. □

A type object describes a type that cannot be instantiated, but its properties can be used to describe other types. As already mentioned, in addition to type objects there are also collection objects, which are object types that can be instantiated. Collection objects inherit a set of basic operations that enable their instantiation, whereas type objects do not. All instances of a collection object share the same properties, which are defined at the collection object. Each attribute of an instance has an associated value.

Definition 6 A user-defined collection object, or simply collection object, at level $m$ is similar to a type object of that level augmented with its extension. That is, a collection object $O^m$ at level $m$ is the set

$O^m = \{(name, K^m, r^m(K^{m'})), e(K^m)\}$

where $(name, K^m, r^m(K^{m'}))$ is the definition of $O^m$, $K^m$ and $r^m(K^{m'})$ are defined exactly as in type objects at level $m$, and $e(K^m)$ is the extension of $O^m$, $e(K^m) = \{I_j(K^m) \mid j \in \mathbb{N}\}$, defined as the set of all instances of $O^m$. An instance $I_j(K^m)$ is the set of attributes defined at $O^m$ together with their associated values, $I_j(K^m) = (id_j, \{(\tau_i^m, t_i) \mid i = 1, \ldots, n\})$, where $K^m = \{\tau_1^m, \tau_2^m, \ldots, \tau_n^m\}$, $t_i$ is the value of the attribute $\tau_i^m$, and $id_j$ is an alphanumeric identifier produced automatically by the DBMS, unique for every instance. □
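A collection object and its extension might be sketched as follows, assuming Python's `uuid.uuid4().hex` as the source of system-generated alphanumeric identifiers. The `CollectionObject` class is hypothetical, and the truck object is cut down to three attributes for brevity. The two instances have identical contents but different identities, which is what an identity independent of content implies.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet


@dataclass
class CollectionObject:
    """A collection object: a type definition plus its extension e(K^m)."""
    name: str
    attributes: FrozenSet[str]                                   # K^m
    extension: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def instantiate(self, **values: Any) -> str:
        """Create an instance I_j(K^m): every attribute of K^m gets a value."""
        if set(values) != self.attributes:
            raise ValueError(f"expected exactly the attributes {sorted(self.attributes)}")
        iid = uuid.uuid4().hex      # alphanumeric id, independent of the content
        self.extension[iid] = dict(values)
        return iid


truck = CollectionObject("truck", frozenset({"drive", "steering", "payload"}))
t1 = truck.instantiate(drive="4x4", steering="power", payload=12.5)
t2 = truck.instantiate(drive="4x4", steering="power", payload=12.5)
print(t1 != t2, len(truck.extension))  # True 2
```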

Definition 7 The union of the sets $K^m$ and $r^m(K^{m'})$ of an object $O^m$ is called the set of properties $x(O^m)$ of $O^m$: $x(O^m) = K^m \cup r^m(K^{m'})$. □

Proposition 1 All object types at level m can have properties defined at any of the m levels preceding level m, but they cannot have properties of object types defined at level m or at any subsequent level.

Proof Definitions 4, 5, 6, and 7 lead directly to the above proposition. □
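Proposition 1 can be turned into a simple consistency check over a schema. In this hypothetical sketch, `levels` assigns each object its level and `sources` lists the objects it takes properties from; the level numbers chosen for the Figure 2 objects are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List, Set


def level_violations(levels: Dict[str, int], sources: Dict[str, Set[str]]) -> List[str]:
    """Return 'child <- source' pairs where an object at level m takes properties
    from an object at level m or higher, which Proposition 1 rules out."""
    bad: List[str] = []
    for child, parents in sorted(sources.items()):
        for parent in sorted(parents):
            if levels[parent] >= levels[child]:
                bad.append(f"{child} <- {parent}")
    return bad


levels = {"SYSTEM": 0, "project": 1, "craft": 2, "boat": 3, "plane": 3, "seaplane": 4}
sources = {"project": {"SYSTEM"}, "craft": {"project"}, "boat": {"craft"},
           "plane": {"craft"}, "seaplane": {"plane", "boat"}}
print(level_violations(levels, sources))  # []
sources["boat"] = {"craft", "plane"}      # plane is at the same level as boat
print(level_violations(levels, sources))  # ['boat <- plane']
```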

Theorem 1 Let $\mathcal{O}$ be a set of object types and $I$ the binary relation

$I = \{(O_i^m, O_j^{m+1}) \mid x'(O_i^m) \subseteq x(O_i^m) \text{ and } x'(O_i^m) \subseteq x(O_j^{m+1})\}$

on $\mathcal{O}$. Then $\mathcal{O}$ is a directed acyclic graph under $I$.

Proof Definitions 4, 5, and 6 show that the properties of object types are composed by combining properties that have been defined at existing object types with new properties. The binary relation $I$ consists of pairs of objects that belong to different levels and share common properties. As this relation is antisymmetric, it can be represented by a directed graph. According to Proposition 1, this graph is acyclic. □
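Theorem 1 states that the inheritance structure is acyclic. As an illustration only, the Figure 2 objects can be fed to Python's standard `graphlib.TopologicalSorter`, which raises `CycleError` when a graph contains a cycle; the graph here simply maps each object to the objects it inherits from, which is a simplification of the relation $I$.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, Set

# Each object mapped to the objects it inherits from (ISA or TYPE_OF), after Figure 2.
schema: Dict[str, Set[str]] = {
    "staff": {"design_dept"},
    "secretary": {"staff"},
    "engineer": {"staff"},
    "project": {"design_dept"},
    "engine": {"project"},
    "body": {"project"},
    "vehicle": {"project"},
    "car": {"vehicle"},
    "truck": {"vehicle"},
    "craft": {"project"},
    "boat": {"craft"},
    "plane": {"craft"},
    "seaplane": {"plane", "boat"},
}


def is_acyclic(graph: Dict[str, Set[str]]) -> bool:
    """True if the inheritance graph has no cycle (the property of Theorem 1)."""
    try:
        tuple(TopologicalSorter(graph).static_order())
    except CycleError:
        return False
    return True


# static_order() lists every object after all the objects it inherits from.
order = list(TopologicalSorter(schema).static_order())
print(is_acyclic(schema), order.index("design_dept") < order.index("seaplane"))  # True True
print(is_acyclic({"a": {"b"}, "b": {"a"}}))  # False
```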

Theorem 1 provides the basis for the notion of inheritance, that is, the ability of object types to have properties defined at other object types. From the preceding discussion, it can be seen that the definitions of object types form a multiple inheritance hierarchy. Two well-known inheritance schemes are considered here: first, inheritance that indicates specialization, the so-called ISA 20, 22, 23, and, second, partial inheritance 24, 25, called TYPE_OF, where some properties are inherited and others are suppressed. The formal definitions of ISA and TYPE_OF inheritance follow, using the notation introduced in Definitions 4, 5, and 6.

Definition 8 Let $O^m = (name, K^m, r^m(K^{m'}))$ and $O^{m+1} = (name, K^{m+1}, r^{m+1}(K^{m+1'}))$ be user-defined objects (type or collection) defined at levels $m$ and $m + 1$, respectively. Then:

• $O^{m+1}$ ISA $O^m$ iff $A_1^{m+1} = K^m$ and $r^{m+1}(K^{m+1'}) = r^m(K^{m'})$
• $O^{m+1}$ TYPE_OF $O^m$ iff $A_1^{m+1} \subset K^m$ and $r^{m+1}(K^{m+1'}) \subset r^m(K^{m'})$ □
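Under one reading of Definition 8, where the comparison is made between a parent's properties and the properties a child actually takes from that parent, the two schemes can be told apart as in the sketch below. The function name and the property sets (taken loosely from Figure 2) are hypothetical.

```python
from typing import FrozenSet, Optional


def inheritance_kind(parent_props: FrozenSet[str], inherited: FrozenSet[str]) -> Optional[str]:
    """Classify a parent/child link from the properties the child takes from the parent.

    Assumption: 'inherited' holds only what the child obtains from this parent.
    All of them -> ISA; a non-empty proper subset -> TYPE_OF.
    """
    if inherited == parent_props:
        return "ISA"
    if inherited and inherited < parent_props:
        return "TYPE_OF"
    return None   # nothing inherited, or properties the parent does not have


plane = frozenset({"medium", "propulsion", "wingspan", "fuselage"})
boat = frozenset({"medium", "propulsion", "waterlevel", "hull"})
print(inheritance_kind(plane, plane))                      # ISA
print(inheritance_kind(boat, frozenset({"waterlevel"})))   # TYPE_OF
```

This matches the seaplane example: it takes everything from plane but only `waterlevel` from boat.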

Definitions 4, 5, and 6 distinguish between type objects, which serve as a modelling abstraction apparatus, and collection objects, which are the real data carriers. Nevertheless, type objects should provide a view of the data that belong to their specializations through ISA. So the extension of a type object consists of parts of instances of collection objects. Such parts contain only the attributes and rules inherited from the type object through ISA. To distinguish this type of extension from that of a collection object, it will be called a virtual extension. Similarly, the instances of a type object are called virtual instances.

Definition 9 Let $O^m$ be a type object at level $m$ and $\{C_i^j \mid i = 1, \ldots, l \text{ and } j = m + 1, \ldots, m + k\}$ a set of collection objects such that $C_i^j$ ISA $O^m$, $\forall i = 1, \ldots, l$ and $\forall j = m + 1, \ldots, m + k$. Then $O^m$ has a virtual extension $e(K^m) = \bigcup_{i=1}^{l} \bigcup_{j=m+1}^{m+k} e(K_i^j)$. A virtual instance $I(K^m)$ of $O^m$ is any instance in $e(K^m)$. □
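A virtual extension can be computed by following ISA links only and cutting each instance down to the type object's attributes, as described just before Definition 9. The sketch below assumes hypothetical instance data for the craft hierarchy of Figure 2; TYPE_OF links are deliberately left out of `isa_parents`.

```python
from typing import Any, Dict, Iterable, List, Set

Extension = Dict[str, Dict[str, Any]]   # instance id -> attribute values


def isa_reaches(obj: str, target: str, isa_parents: Dict[str, Set[str]]) -> bool:
    """True if obj is connected to target through a chain of ISA links."""
    stack: List[str] = [obj]
    seen: Set[str] = set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(isa_parents.get(current, ()))
    return False


def virtual_extension(type_name: str, type_attrs: Iterable[str],
                      isa_parents: Dict[str, Set[str]],
                      extensions: Dict[str, Extension]) -> Extension:
    """Union of the extensions of collection objects below type_name via ISA,
    each instance cut down to the attributes of the type object."""
    attrs = list(type_attrs)
    result: Extension = {}
    for collection, instances in extensions.items():
        if collection != type_name and isa_reaches(collection, type_name, isa_parents):
            for iid, values in instances.items():
                result[iid] = {a: values[a] for a in attrs}
    return result


isa_parents = {"boat": {"craft"}, "plane": {"craft"}, "seaplane": {"plane"}}  # TYPE_OF left out
extensions = {
    "boat": {"b1": {"medium": "water", "propulsion": "sail", "waterlevel": 0.8, "hull": "wood"}},
    "plane": {"p1": {"medium": "air", "propulsion": "jet", "wingspan": 30.0, "fuselage": "metal"}},
    "seaplane": {"s1": {"medium": "both", "propulsion": "prop", "wingspan": 12.0,
                        "fuselage": "metal", "waterlevel": 0.5}},
}
print(sorted(virtual_extension("craft", ["medium", "propulsion"], isa_parents, extensions)))
# ['b1', 'p1', 's1']
```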

All the notions discussed so far are depicted in Figure 1. Circles represent type objects, while semicircles represent collection objects. The attributes of the objects are listed next to the objects. As can be seen, both inheritance schemes are depicted. The sea-plane object inherits the waterlevel attribute from boat through TYPE_OF, as can be seen in the definition of sea-plane in Figure 2, and all the attributes of plane through ISA. As already mentioned, the basic object SYSTEM is connected via ISA with every user-defined object automatically; these ISA links are not depicted in the Figure. The properties of the other basic objects can be inherited explicitly using either ISA or TYPE_OF.

[Figure 1 diagram: basic objects B01, B02, B03, B04 and SYSTEM at level 0; user-defined objects at levels 1 to 4, linked by ISA and TYPE_OF, with attributes such as name, customer, drive, steering, drag-coefficient, no._of_passengers, payload, hull, medium, wingspan, fuselage, and float]

Figure 1. Overview of data model basics

Rules

As already mentioned, the degree to which knowledge representation in an OODBMS is declarative or procedural is an important issue. OODB systems tend to be procedural because of concepts such as methods 26. Methods describe how an object carries out its operations in much the same way as procedures do. This is quite adequate for programming languages, where emphasis is placed on what behaviour the objects exhibit, whereas in databases the behaviour of objects extends to other kinds of declarative semantics. More specifically, there is a need to associate constraints with operations declaratively rather than hiding them inside the bodies of methods.

Having taken all that into account, the authors have adopted rules to associate functionality with constraints. They have chosen a basic set of methods that a database object should have, and to each such method they have attached a rule stating the conditions that must be satisfied for the method to preserve the integrity of the data it manipulates. Apart from rules involving basic operations, new rules can be defined by the programmer, incorporating existing or user-defined methods.

Rules have already been defined in definition 2 and, as can be seen in Figure 2, have the following general form:


rule-name: ((condition) (action))

A unique rule-name is required for each rule, so that the rule can be referred to later by the user. Rules are interpreted using the data-driven or forward-chaining approach, where the condition must be matched against the data to infer the method.
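Data-driven evaluation can be illustrated with a naive forward-chaining loop that matches every rule condition against every instance and fires matching actions until nothing changes. This is a sketch, not the paper's interpreter; the PROMOTE and GRADE rules and the `years` and `grade` attributes are hypothetical, chosen so that the firing of one rule enables another.

```python
from typing import Callable, Dict, List, Tuple

Instance = Dict[str, object]
# (rule-name, condition, action); the action updates the instance in place
Rule = Tuple[str, Callable[[Instance], bool], Callable[[Instance], None]]


def forward_chain(instances: List[Instance], rules: List[Rule],
                  max_rounds: int = 100) -> List[str]:
    """Data-driven evaluation: match each condition against the data, fire the
    action where it holds, and repeat until a full round changes nothing."""
    fired: List[str] = []
    for _ in range(max_rounds):
        changed = False
        for rname, condition, action in rules:
            for inst in instances:
                if condition(inst):
                    before = dict(inst)
                    action(inst)
                    if inst != before:          # count only firings that change data
                        fired.append(rname)
                        changed = True
        if not changed:
            return fired
    raise RuntimeError("no fixpoint reached within max_rounds")


staff = [
    {"name": "Ann", "position": "engineer", "years": 6, "grade": 1},
    {"name": "Bob", "position": "engineer", "years": 2, "grade": 1},
]
rules: List[Rule] = [
    ("PROMOTE", lambda i: i["position"] == "engineer" and i["years"] >= 5,
     lambda i: i.update(position="project_leader")),
    ("GRADE", lambda i: i["position"] == "project_leader" and i["grade"] < 2,
     lambda i: i.update(grade=2)),
]
print(forward_chain(staff, rules))              # ['PROMOTE', 'GRADE']
print(staff[0]["position"], staff[0]["grade"])  # project_leader 2
```

A production system would index conditions (for example with a Rete network) instead of rescanning every instance; the naive loop is used here only for clarity.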

The basic operations that are associated with objects and their instances, and that can be used in the action part of a rule, include methods to insert or delete an object instance, to replace the value of an attribute of an object instance, and to group the instances of an object. Grouping the instances of an object benefits retrieval efficiency, provided that it promotes a logical organization of object instances. These predicate-derived groups of instances are called aggregations. The definition of an aggregation, named HIGH_SALARY, can be seen in Figure 2. Furthermore, the set of basic operations includes methods to retrieve all instances of an object, test whether an instance belongs to an object, link instances of two objects, eliminate a connection between two object instances, retrieve all instances that are connected (linked) to a given instance, and test whether a connection exists between two instances. The insert, delete, and replace methods are meaningful only in collection objects, as type objects do not own instances.
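Two of these basic operations, aggregation and linking, are sketched below. The threshold mirrors the HIGH_SALARY definition of Figure 2 (salary >= 4000); the `Links` class, its method names, the symmetric treatment of links, and the `engine-1` identifier are assumptions for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Set

Extension = Dict[str, Dict[str, Any]]


def aggregate(extension: Extension, predicate: Callable[[Dict[str, Any]], bool]) -> Set[str]:
    """Predicate-derived group of instance ids (an aggregation)."""
    return {iid for iid, values in extension.items() if predicate(values)}


class Links:
    """Symmetric links between instance ids: link, unlink, connected, is_linked."""

    def __init__(self) -> None:
        self._adj: Dict[str, Set[str]] = defaultdict(set)

    def link(self, a: str, b: str) -> None:
        self._adj[a].add(b)
        self._adj[b].add(a)

    def unlink(self, a: str, b: str) -> None:
        self._adj[a].discard(b)
        self._adj[b].discard(a)

    def connected(self, a: str) -> Set[str]:
        return set(self._adj.get(a, set()))

    def is_linked(self, a: str, b: str) -> bool:
        return b in self._adj.get(a, set())


staff = {"s1": {"salary": 3500}, "s2": {"salary": 4000}, "s3": {"salary": 5200}}
high_salary = aggregate(staff, lambda v: v["salary"] >= 4000)   # HIGH_SALARY
works = Links()
works.link("s2", "engine-1")
works.link("s3", "engine-1")
works.unlink("s3", "engine-1")
print(sorted(high_salary), works.connected("engine-1"))  # ['s2', 's3'] {'s2'}
```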

ASSOCIATIONS BETWEEN OBJECTS

Relationships between hierarchically structured objects are widely recognised semantic constructs in OO data models. They serve to express interactions between objects in a natural way. As far as the authors know, interactions are dealt with in two ways: either as instance variables and methods of classes, or as concrete objects with a fixed description and references to other objects. The second approach 27-29 is rather static, as relationships are objects and their creation and deletion involve schema evolution issues, but it makes relationships apparent at the conceptual level. The first approach 6-30, on the other hand, is flexible because it is dynamic, but it increases the complexity of representation, as it buries associations and constraints between objects inside the implementation code.

[Figure 2 diagram: schema design_dept, with objects linked by ISA, TYPE_OF, WORKS, and PRIVATE-SECRETARY arcs]

SCHEMA design_dept.

TYPE design_dept: (dept_name: str, objective: (str)).
TYPE staff IS-A design_dept: (name: str, age: int, empl_date: int, salary: int)
    WORKS: (t (RELATE project n))
    HIGH_SALARY: ((salary >= 4000) (AGGREGATE)).
COLLECTION secretary IS-A staff: (languages: int)
    PRIVATE_SECRETARY: ((languages >= 2) (RELATE engineer 1)).
COLLECTION engineer IS-A staff: (degree: str, speciality: str, position: str)
    PRIVATE_SECRETARY: ((position = project_leader) (RELATE secretary 1)).
TYPE project IS-A design_dept: (name: str, customer: str)
    WORKS: (t (RELATE staff n)).
COLLECTION engine IS-A project: (h_power: int, capacity: real, aspiration: str, valves: int, cylinders: int).
COLLECTION vehicle IS-A project PART_OF engine PART_OF body: (drive: str, steering: str, drag_coeff: real).
COLLECTION body ISA project: (door_no: int, weight: real, material: str).
COLLECTION car ISA vehicle: (passeng_no: int).
COLLECTION truck ISA vehicle: (payload: real).
TYPE craft ISA project: (medium: str, propulsion: str).
COLLECTION boat ISA craft: (waterlevel: real, hull: str).
COLLECTION plane ISA craft: (wingspan: real, fuselage: str).
COLLECTION seaplane TYPE_OF boat ISA plane: (waterlevel: float: boat!waterlevel, real).

Figure 2. Definitions of data model basics

An approach that combines the advantages of both of the aforementioned ways of expressing relationships is that of rules because, first, it provides the flexibility of instance variables and methods, given the equivalence between these and attributes and rules, and, second, it provides a simple representation, as rules constitute a formalism for the declarative description of knowledge.

As already mentioned, a rule consists of a condition and an action. The condition of a rule expressing object association specifies the constraints that must be satisfied to have a valid association. These constraints depend to a large extent on the values of the attributes of the objects. Nevertheless, the association itself is based on object identity, which is independent of the content of an object. So provided that changes of the content of the instances of an object type do not violate the condition of an association, this association is not affected by these changes.
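To make the identity argument concrete, here is a minimal Python sketch under assumptions of our own: an instance keeps direct references to its partners (standing in for object identities), and the condition of the association is a predicate over attribute values. Rejecting an update that would violate the condition is also our assumption; the text only says what happens when the condition still holds. All names (`Instance`, `relate`, `update`, `assigned`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based equality and hashing
class Instance:
    attrs: dict
    links: set = field(default_factory=set)  # references to associated instances


def relate(a, b, condition):
    """Associate a with b by reference (identity) if the rule's condition holds."""
    if not condition(a.attrs, b.attrs):
        raise ValueError("condition of the association is violated")
    a.links.add(b)


def update(a, key, value, condition):
    """Change one attribute; reject the change if it breaks an existing association."""
    new_attrs = {**a.attrs, key: value}
    if any(not condition(new_attrs, b.attrs) for b in a.links):
        raise ValueError("update would violate an association")
    a.attrs = new_attrs  # the links themselves are untouched


def assigned(p, f):
    """Illustrative condition, loosely modelled on the ASSIGNED rule of Figure 3."""
    return p["age"] < 45


pilot = Instance({"name": "nick", "age": 30})
flight = Instance({"serial_no": "OA123"})
relate(pilot, flight, assigned)
update(pilot, "name", "nikos", assigned)  # content changes; the link survives
```

Because the link holds the partner object itself rather than a copy of its attribute values, renaming the pilot leaves the association intact, while raising the age past the condition's bound is refused.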

Object sharing is the accessibility of objects by other objects, and in that sense the use of rules to express relationships incorporates the idea of object sharing. More specifically, first, the definition of a rule at a type or collection object includes explicit reference to another type or collection object and, second, the instances of the object where the definition lies can, according to that definition, be associated explicitly, via the object identity, with the instances of the object that appears inside the definition of the rule. Note that this differs from the standard approach, where the attributes of objects are assigned to the object identities of other objects 31.

Associations at object level

The authors view relationships as binary associations between objects, following the example of the binary data model 32. As already mentioned, relationships are represented by rules, and because these rules impose discipline on the relationships, the relationships are called disciplined relationships. There are two types of disciplined relationships, according to whether the association they represent is known to both of the associated objects or to only one of them. In other words, when the definition of an object contains a rule that associates it with another object, and the other object contains a rule that expresses the reverse association, there is a bilateral disciplined relationship (BDR). When there is only one rule expressing the association, there is a unilateral disciplined relationship (UDR).

The knowledge represented by a disciplined relationship extends down to the level of instances, where it is instantiated as references between object instances. One instantiation of a BDR between two instances is a pair of references: one from each instance to the object identity of the other. An instantiation of a UDR is a single reference. These references are transparent to the user. The instantiation of a disciplined relationship between two object instances is called an assertion. The action that maintains the references between the instances of the related objects is called RELATE and has two arguments. Assume that objects $O_1$ and $O_2$ are related by a disciplined relationship R, and consider the rule that defines R at $O_1$. The first argument of RELATE is the name of the object to which $O_1$ is related, that is, $O_2$, and the second argument is the maximum number of instances of $O_2$ that can be associated with an instance of $O_1$ via R.
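The bookkeeping that RELATE performs can be sketched in Python as follows. This is an illustration under our own assumptions, not the authors' implementation: each instance keeps a per-relationship list of references, `max1` and `max2` stand for the second argument of RELATE in the rule at each object (`None` when omitted), and a UDR stores only the origin's reference. `Inst` and `relate` are hypothetical names.

```python
from collections import defaultdict


class Inst:
    """An object instance; refs[R] holds references to instances related via R."""

    def __init__(self, name):
        self.name = name
        self.refs = defaultdict(list)


def relate(i1, i2, rel, max1=None, max2=None, bilateral=True):
    """Create an assertion of relationship `rel` between instances i1 and i2.

    max1 mirrors the second argument of RELATE in the rule at i1's object:
    the maximum number of partners an instance may have via rel (None means
    unrestricted). max2 plays the same role for the rule at i2's object.
    """
    if max1 is not None and len(i1.refs[rel]) >= max1:
        raise ValueError(f"{i1.name} already has {max1} {rel} partner(s)")
    if bilateral and max2 is not None and len(i2.refs[rel]) >= max2:
        raise ValueError(f"{i2.name} already has {max2} {rel} partner(s)")
    i1.refs[rel].append(i2)      # UDR: a single reference
    if bilateral:
        i2.refs[rel].append(i1)  # BDR: plus the reverse reference


eng, sec1, sec2 = Inst("eng"), Inst("sec1"), Inst("sec2")
relate(sec1, eng, "PRIVATE_SECRETARY", max1=1, max2=1)  # one-to-one BDR

m1, w1 = Inst("m1"), Inst("w1")
relate(m1, w1, "PARENT", max1=2, bilateral=False)  # UDR, at most two parents
```

A second secretary cannot then be related to the same engineer, because the engineer's side of the one-to-one BDR is already full.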

Definition 10 Let $r_1(K'_1)$ and $r_2(K'_2)$ be the sets of rules of objects $O_1$ and $O_2$, respectively.

A unilateral disciplined relationship $UDR(O_1,O_2)$ between $O_1$ and $O_2$ is a binary relation

$$UDR(O_1,O_2) = \{(O_1,O_2) \mid \exists P_i \in r_1(K'_1)\}$$

where

$$P_i = (rname_i,\ C_1(K'_1),\ RELATE(O_2, m)), \quad m = 1, 2, \ldots$$

A bilateral disciplined relationship $BDR(O_1,O_2)$ between $O_1$ and $O_2$ is a symmetric binary relation

$$BDR(O_1,O_2) = \{(O_1,O_2), (O_2,O_1) \mid \exists P_i \in r_1(K'_1) \land \exists P_j \in r_2(K'_2)\}$$

where

$$P_i = (rname_i,\ C_1(K'_1),\ RELATE(O_2, m)), \quad m = 1, 2, \ldots$$

$$P_j = (rname_j,\ C_2(K'_2),\ RELATE(O_1, n)), \quad n = 1, 2, \ldots$$

and $rname_i = rname_j$. □

For example, the disciplined relationship PRIVATE SECRETARY in Figure 2 is a one-to-one BDR between the objects secretary and engineer. WORKS is a many-to-many BDR between staff and project. The optional parameter of RELATE has been omitted in both rules, so there is no restriction regarding the associations between instances.
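Definition 10 can be read as a simple check over the rule sets of the objects. The Python sketch below is our own simplification: it keeps only each rule's name and the first argument of RELATE, treats a same-named reverse rule in the target object as making a BDR, and, following the usage described in the next paragraph, classifies self-associations and PART-OF as UDRs. The rule table mixes the relationships mentioned for Figures 2 and 4 and is illustrative.

```python
def classify(rules):
    """Classify disciplined relationships following Definition 10.

    rules maps an object name to a list of (rule_name, relate_target) pairs.
    Returns a set of (kind, rule_name, object_a, object_b) tuples.
    """
    result = set()
    for origin, obj_rules in rules.items():
        for rname, target in obj_rules:
            reverse = (rname, origin) in rules.get(target, [])
            if reverse and origin != target:
                # Report each BDR once, with its endpoints in sorted order
                a, b = sorted((origin, target))
                result.add(("BDR", rname, a, b))
            else:
                result.add(("UDR", rname, origin, target))
    return result


rules = {
    "secretary": [("PRIVATE_SECRETARY", "engineer")],
    "engineer": [("PRIVATE_SECRETARY", "secretary")],
    "staff": [("WORKS", "project")],
    "project": [("WORKS", "staff")],
    "vehicle": [("PART_OF", "engine"), ("PART_OF", "body")],
    "person": [("LIKE", "game"), ("LIKE", "person"), ("PARENT", "person")],
    "game": [("LIKE", "person")],
}
res = classify(rules)
```

The result contains BDRs for PRIVATE_SECRETARY, WORKS, and LIKE between person and game, and UDRs for the self-relationships of person and for the two PART_OF rules of vehicle.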

The type of disciplined relationship used to associate two objects is not arbitrary but depends on the situation under consideration. A UDR is used in two cases: first, when an object is to be associated with itself, like the PARENT relationship given later in Figure 4; second, to represent the PART-OF relationship between a complex object and a component object, which is covered more extensively in the next subsection. In every other case a BDR has to be used.

vol 33 no 10 december 1991 747


Definition 11 Let $DR(O_1,O_2)$ be a disciplined relationship between $O_1$ and $O_2$, $I_1(K_1)$ an instance of $O_1$, and $I_2(K_2)$ an instance of $O_2$.

An assertion $L_{UDR}(I_1,I_2)$ of a unilateral disciplined relationship $UDR(O_1,O_2)$ is the binary relation $L_{UDR}(I_1,I_2) = \{(I_1,I_2)\}$.

An assertion $L_{BDR}(I_1,I_2)$ of a bilateral disciplined relationship $BDR(O_1,O_2)$ is the symmetric binary relation $L_{BDR}(I_1,I_2) = \{(I_1,I_2), (I_2,I_1)\}$. □

From definition 10 it can be seen that, for two objects to be involved in a bilateral relationship, two rules with the same name must be defined, one in each object description. This may appear rather unnatural, but if the complete definition of the relationship were placed in both objects, the constraints on both objects would have to be defined in one rule. That would compromise information hiding, because each of the involved objects would have to know about the other. As far as the user is concerned, relationships are primarily meant to express a bond between two objects, so it is not important to have the full constraint in one place, as constraints are monitored automatically by the system. Furthermore, it is enough for an object, as a viewing mechanism, to contain information about its relationships, with constraints that involve only what has been defined at that object.

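Definition 12 Let $\mathcal{O} = \{O_i \mid i = 1, \ldots, n\}$ be a set of objects and let $AR \subseteq \mathcal{O} \times \mathcal{O}$ be the binary relation $(O_i,O_j) \in AR \iff \exists\, UDR_l(O_i,O_j) \lor \exists\, BDR_l(O_i,O_j)$. An object $O_j \in \mathcal{O}$ is said to be reachable from another object $O_i \in \mathcal{O}$ iff $\exists\, O_{p_1}, \ldots, O_{p_k} \in \mathcal{O}$ such that $AR(O_i,O_{p_1}) \land AR(O_{p_1},O_{p_2}) \land \cdots \land AR(O_{p_k},O_j)$. □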

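Proposition 2 If $O_j$ is reachable from $O_i$, then $O_i$ is reachable from $O_j$, provided that there is no PART-OF relationship between any two consecutive objects in the chain from $O_i$ to $O_j$.

Proof Straightforward: apart from associating an object with itself, a UDR is used only for PART-OF, so every other link in the chain is a BDR, and BDRs are symmetric. □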
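Reachability in the sense of Definition 12 is ordinary graph search over the relation AR. The breadth-first sketch below is illustrative: the function name and the small relation, built from the WORKS BDR and the PART-OF UDR of Figure 2, are our own choices. It also shows why Proposition 2 needs its proviso: the single PART-OF edge breaks symmetry.

```python
from collections import deque


def reachable(ar, start):
    """Objects reachable from `start` via the relation AR of Definition 12.

    ar is a set of (Oi, Oj) pairs: a UDR contributes only its origin->target
    direction, while a BDR contributes both directions.
    """
    seen, queue = set(), deque([start])
    while queue:
        o = queue.popleft()
        for a, b in ar:
            if a == o and b not in seen:
                seen.add(b)
                queue.append(b)
    return seen


# BDR WORKS(staff, project) in both directions; UDR PART-OF from vehicle to engine
AR = {("staff", "project"), ("project", "staff"), ("vehicle", "engine")}
```

Here `engine` is reachable from `vehicle` but not the other way round, whereas `staff` and `project` reach each other.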

Complex objects

Composite objects are an important principle of object orientation and are traditionally based on the notion of nested objects 3,4. This approach is not powerful enough to represent the PART-OF relationship between objects, but an extensive treatment of the subject has been made 32,33. The PART-OF relationship 32 expresses, in a declarative manner, that every instance of an aggregated object has instances of its component objects associated with it. In other words, it expresses two things between an object and its components: first, taxonomy and, second, reference between instances. Reference between an instance of an object and the instances of its components is much more a relationship than a mere constraint imposed through inheritance rules, as it must be clearly specified at any moment which instances are components of which instance. Furthermore, the hierarchical relationship imposed by nesting the components within the parent object tends to cripple the independence of the components: they tend to be treated as attributes of the parent object and not as independent entities that have been brought together. This special behaviour of PART-OF can be better expressed declaratively as a special UDR, referred to as the PART-OF disciplined relationship. It is a UDR in order to maintain taxonomy: if it were a BDR, it would be impossible to distinguish the parent object from the component object.

PART-OF is a special type of disciplined relationship because its semantics is known and not user defined. Otherwise, it is treated by the DBMS as a disciplined relationship, and assertions between the instances of the parent object and the instances of its components can be created explicitly by the user. As PART-OF is a disciplined relationship, it can be inherited, so the structure of a complex object can be inherited by another object. Consider, for example, the definition of the collection object vehicle in Figure 2, where the collection objects engine and body are components of vehicle. No action or condition is specified because the semantics of PART-OF is known to the DBMS and does not need to be defined by the user. The collection object car, which ISA vehicle, inherits the structure of vehicle, and thus engine and body are also PART-OF car. So the following assertions can be made:

PART_OF(car(.name = "uno"), engine(.name = "fire"))
PART_OF(car(.name = "uno"), body(.name = "3-door"))

This representation of PART_OF has advantages over the standard approach in two cases 33. The first is when complex objects are built bottom up, that is, by assembling existing objects into a complex object. The second is when the component objects must be kept after the parent object is deleted.
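A small Python sketch of these two points, under our own assumptions: the ISA hierarchy and the PART-OF declarations of Figure 2 are kept in dictionaries, instances are (object, name) pairs, and PART_OF assertions are pairs of instances. Components are inherited along ISA, assertions link instances that already exist, and deleting the parent removes only its assertions. All names are hypothetical.

```python
isa = {"car": "vehicle", "truck": "vehicle", "vehicle": "project"}
part_of = {"vehicle": ["engine", "body"]}  # PART-OF declared at vehicle (Figure 2)

instances = {("car", "uno"), ("engine", "fire"), ("body", "3-door")}
assertions = set()  # (parent instance, component instance) pairs


def components(obj):
    """Component objects of obj, including those inherited along ISA."""
    comps = []
    while obj is not None:
        comps.extend(part_of.get(obj, []))
        obj = isa.get(obj)
    return comps


def assert_part_of(parent, component):
    """Link two existing instances, building the complex object bottom up."""
    if component[0] not in components(parent[0]):
        raise ValueError(f"{component[0]} is not a component of {parent[0]}")
    assertions.add((parent, component))


def delete_instance(inst):
    """Delete an instance and its PART_OF assertions; its components survive."""
    instances.discard(inst)
    assertions.difference_update({a for a in assertions if inst in a})


assert_part_of(("car", "uno"), ("engine", "fire"))
assert_part_of(("car", "uno"), ("body", "3-door"))
```

Because car inherits the PART-OF structure of vehicle, both assertions are accepted; deleting the car leaves the engine and body instances in place.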

Associations at schema level

Now that all the features of the data model have been defined it is possible to define the schema of a database.

Definition 13 Assume a directed graph $G = (N, E)$, and let:

• the set of nodes $N$ be $N = N_T \cup N_C$, where the type nodes $N_T$ and the collection nodes $N_C$ are represented graphically as circles and semicircles, respectively;

• the set of edges $E$ be $E = E_{UDR} \cup E_{BDR} \cup E_{ISA} \cup E_{TYPE}$, where the set of bilateral relationship edges $E_{BDR}$ consists of pairs of directed opposite edges between pairs of distinct nodes, the set of unilateral relationship edges $E_{UDR}$ consists of single directed edges between pairs of nodes, and the sets of ISA edges $E_{ISA}$ and TYPE_OF edges $E_{TYPE}$ consist of single edges between pairs of nodes.

A graph $G$ in which each type node corresponds to one type object and each collection node to one collection object, and in which ISA edges, TYPE_OF edges, unilateral relationship edges, and bilateral relationship edges correspond one-to-one to ISA relationships, TYPE_OF relationships, UDRs, and BDRs, respectively, is a schema iff the following properties hold:

• There is one and only one node in $N$, called the root node, such that at least one path of ISA and TYPE_OF edges leads to it from every other node in $N$.

• There is no cyclic path of ISA and TYPE_OF edges.

The origin of an $E_{UDR}$ edge is the object type in whose definition the rule that expresses the UDR appears. □
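The two properties of Definition 13 can be checked mechanically. The Python sketch below is one way to do it, under our own assumptions: only ISA and TYPE_OF edges are passed in, as (child, parent) pairs, and the sample edges are read off Figure 2, treating design_dept as the top of this excerpt. The function name is hypothetical.

```python
def is_schema(nodes, hier_edges):
    """Check the two properties of Definition 13 on the ISA/TYPE_OF edges.

    hier_edges holds (child, parent) pairs for ISA and TYPE_OF edges.
    Returns the root node if both properties hold, otherwise None.
    """
    parents = {n: [] for n in nodes}
    for child, parent in hier_edges:
        parents[child].append(parent)

    # Property 2: no cyclic path (depth-first search with three states)
    state = {}

    def has_cycle(n):
        state[n] = "active"
        for p in parents[n]:
            if state.get(p) == "active" or (p not in state and has_cycle(p)):
                return True
        state[n] = "done"
        return False

    if any(n not in state and has_cycle(n) for n in nodes):
        return None

    # Property 1: exactly one node is reached by a path from every other node
    def reached_from(n):
        seen, stack = set(), list(parents[n])
        while stack:
            p = stack.pop()
            if p not in seen:
                seen.add(p)
                stack.extend(parents[p])
        return seen

    reach = {n: reached_from(n) for n in nodes}
    roots = [r for r in nodes if all(r in reach[n] for n in nodes if n != r)]
    return roots[0] if len(roots) == 1 else None


nodes = {"design_dept", "project", "engine", "vehicle", "body", "car", "truck",
         "craft", "boat", "plane", "seaplane"}
edges = {("project", "design_dept"), ("engine", "project"), ("vehicle", "project"),
         ("body", "project"), ("car", "vehicle"), ("truck", "vehicle"),
         ("craft", "project"), ("boat", "craft"), ("plane", "craft"),
         ("seaplane", "plane"), ("seaplane", "boat")}
```

Once the graph is acyclic, at most one node can satisfy the root condition, since two such nodes would have to reach each other.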

A schema is a self-sufficient module that can serve as a component for modular design. That is, an application can be partitioned into interacting, logically independent parts, and each part can then be represented independently by a schema. These separate schemas can then be interconnected via disciplined relationships to form the environment of the application. In such an environment the tight hierarchical organization of objects is not followed between schemas; instead, the looser approach of disciplined relationships is adopted for flexibility without loss of consistency. This modular design approach, with loosely coupled modules, enables the formation of distributed environments in a variety of configurations. For example, a possible configuration is an environment consisting of schemas on different sites of a local area network. Another alternative is an environment distributed over different users of a multi-user system. Any combination of the previous two configurations is also possible.

Definition 14 An environment $E$ is a graph $(V, E_{DR})$, where $V$ is a set of nodes, each corresponding to a schema, and $E_{DR}$ is a set of relationship edges between pairs of schemas in $V$. □

The example in Figure 3 depicts the environment of an airline. There are three schemas, namely personnel, passenger, and flight. The notation flight@flight refers to the object flight in schema flight. Disciplined relationships can also be used to interconnect distinct databases, in the same way as they serve to interconnect different schemas of the same database.

OVERVIEW OF DATA MANIPULATION

The basic idea behind the data model presented in the previous sections is the representation of constraints on objects, and especially on relationships between objects, using simple rules (condition-action pairs) in a way orthogonal to ordinary method development. Relationships are important semantic constructs of the model because they express interactions between objects. Horn clauses 34 are a formalism based on relationships between objects, and as such they have provided a basis for relational databases and database theory, especially for expressing queries and defining views 35,36.


SCHEMA personnel

TYPE personnel: (age: integer, month_flight_hours: integer, name: string, phone: string) ASSIGNED: ((month_flight_hours < 20)(.age < 45) (RELATE flight@flight 1)).

COLLECTION hostess IS_A personnel: (languages: string).

COLLECTION pilot IS_A personnel: (planes: string, flight_hours: integer).

SCHEMA flight

TYPE flight: (plane_type: string, serial_no: string, destination: string, arrival: string, departure: string) ASSIGNED: (t (RELATE personnel@personnel)); BOOKED: (t (RELATE passenger@passenger)).

COLLECTION international IS_A flight: (destination_country: string, time_difference: integer).

COLLECTION local IS_A flight: (destination_city: string).

SCHEMA passenger

TYPE passenger: (name: string, method_of_payment: string, agent: string) BOOKED: (true (RELATE flight@flight)).

COLLECTION business IS_A passenger: (company: string, no_of_flights: integer).

COLLECTION economy IS_A passenger.

Figure 3. Example of environment of airline


TYPE person: (name: string, age: integer, address: string) MIDDLE-AGED: ((age > 40)(age < 65) (AGGREGATE)); LIKE: (t (RELATE game)); LIKE: (t (RELATE self)); PARENT: ((age > 15)(RELATE self 2)).

COLLECTION man IS_A person: (height: real, weight: real).

COLLECTION woman IS_A person: (height: real, weight: real).

COLLECTION game: (name: string, players_no: integer, referee_no: integer) LIKE: (t (RELATE person)).

[Diagram: object types person, man, woman, and game linked by ISA, LIKE, and PARENT; their instances; and the tree of instances linked by PARENT]

Figure 4. Example of environment (a), plus extensional data (b), and tree that contains all instances linked due to PARENT (c)

Bringing the two together, namely the authors' model and Horn clauses, gives a powerful formalism that treats rules in object type definitions as logic predicates. The advantage of such an approach is that, because rules express constraints, all interactions are guaranteed to keep objects in a consistent state. To put it differently, this approach combines imperative and declarative rules. Imperative rules (condition-action pairs), as stated in the preceding sections, allow for the declarative specification of the conditions under which a certain action is to be taken. Declarative rules (Horn clauses), on the other hand, provide a logical declaration of what the user wants rather than a detailed specification of how the results are to be obtained.

The formalism employed is simple and can easily be extended to provide all the features of Prolog. It enables the expression of queries on the intensional data, which comprise base predicates, corresponding to rules of objects, and derived predicates. The extensional data, the facts, are the assertions. Note that the assertions are direct references between object instances. As a consequence, the binary relation that corresponds to a disciplined relationship ranges not over the whole cross-product of the instances of the two objects but over the relatively small subset of instances that are actually related. This has another major advantage: it minimizes the number of joins required to evaluate logic queries, a well recognised source of delay. Figure 4(b) depicts the extensional data pertaining to the environment of Figure 4(a).

The data-manipulation language proposed here does not turn the system into a typed logical system. Rather, it serves the manipulation of data organized according to a model that provides far more than typing, namely encapsulation. All objects are independent, and transactions are carried out through message passing. The objects communicate asynchronously by exchanging messages through the blackboard 37. This process is transparent and is supervised by the administrator, the tool responsible for evaluating queries, which for that purpose produces and receives the messages required to obtain the answers. An overview of this process can be seen in Figure 5.

[Diagram: the schema (objects linked by ISA and TYPE_OF), the blackboard, and the administrator]

Figure 5. Overview of data-manipulation process

Syntax

Consider the environment depicted in Figure 4, where man IS-A person and woman IS-A person, and there are disciplined relationships LIKE(person, person) and LIKE(person, game). Furthermore, because of the ISA, LIKE is inherited by man and woman. A query to find a game that both Mary and John like is:

? LIKE(man(.name = john), game(.name = X)), LIKE(woman(.name = mary), game(.name = X)).

To find the Xs that answer this query it is necessary to find any instance of the object type game that is asso- ciated both with the instance of the object type man where .name = john and with the instance of the object type woman where .name = mary.
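Because assertions are stored as references, the shared variable X amounts to intersecting the sets of game instances reached from each starting instance, with no join over the full cross-product. The Python sketch below illustrates this under our own assumptions: the instance data and identities (m1, w1, g1, g2) are invented, and inheritance of LIKE from person to man and woman is not modelled.

```python
# Hypothetical extensional data: object identity -> (object type, attributes)
instances = {
    "m1": ("man", {"name": "john"}),
    "w1": ("woman", {"name": "mary"}),
    "g1": ("game", {"name": "chess"}),
    "g2": ("game", {"name": "tennis"}),
}
# LIKE assertions as pairs of identities; being a BDR, each is stored both ways
like = {("m1", "g1"), ("m1", "g2"), ("w1", "g2")}
like |= {(b, a) for a, b in like}


def select(obj_type, **attrs):
    """Identities of instances of obj_type whose attributes equal attrs."""
    return {oid for oid, (t, a) in instances.items()
            if t == obj_type and all(a.get(k) == v for k, v in attrs.items())}


def liked_games(oids):
    """Follow LIKE references from the given instances to game instances."""
    return {g for o in oids for (src, g) in like
            if src == o and instances[g][0] == "game"}


# ? LIKE(man(.name = john), game(.name = X)), LIKE(woman(.name = mary), game(.name = X)).
shared = liked_games(select("man", name="john")) & liked_games(select("woman", name="mary"))
answers = {instances[g][1]["name"] for g in shared}
```

With this data, John likes chess and tennis and Mary likes tennis, so the only binding for X is tennis.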

As the above example shows, the only problem introduced by using structured terms in queries is their negative impact on the simplicity of pure Horn clauses, because the syntax must state explicitly that variables and constants refer to particular attributes of particular object types.

Although the query language is logical, it has some features that come from the object-oriented paradigm. So, database queries are questions that can be answered only in the context of an object. To enforce such a constraint, each object type can have a set of stored queries, which are derived predicates. Stored queries are in turn used to answer queries in the same way as rules are in logic programming.

The following definitions describe the structure of facts, queries, and stored queries. Facts correspond to the data in the objects of the database and are represented as ground clauses, with no variables. The predicate names of facts are names of rules found in the definitions of object types.

Definition 15 Let $O_1$, $O_2$ be two object types, $B$ the name of a rule in both $O_1$ and $O_2$ defining a disciplined relationship, and $G$ the name of a rule in $O_1$.

A fact is either:

$$B(O_1(Y_1), O_2(Y_2))$$

or

$$G(O_1(Z))$$

where $Y_1$, $Y_2$, and $Z$ are bound and have the form:

$$\bigwedge_{k=1}^{t} (.x_k \ \mathit{op}\ z_k)$$

where $x_k$ is an attribute of an object $O_i$, $\mathit{op}$ is one of the operators $\le, \ge, =, \ne, >, <$, and $z_k$ is a constant. □

For example, to express the fact that John likes Mary, who lives in Paris, in the database of Figure 4, the disciplined relationship LIKE is used, which is expressed by the rule named LIKE in the definition of the object type person. It is necessary to write:

LIKE(woman(.name = mary, .address = paris), man(.name = john)).

This states that an instance of the object man whose attribute .name has the value john and an instance of the object woman whose attribute .name has the value mary and whose .address is paris are connected by the disciplined relationship LIKE. Note that, as the rule LIKE relates the object type person with itself, there is no need to specify two rules to express the disciplined relationship LIKE.

The fact that John is middle-aged can be stated by using the aggregation MIDDLE-AGED, which is expressed by the rule named MIDDLE-AGED in the definition of person, as:

MIDDLE-AGED(man(.name =john)).
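The bound arguments of a fact are conjunctions of attribute comparisons, and the condition of the MIDDLE-AGED rule has the same shape. The Python sketch below evaluates such selectors against attribute values. The tuple encoding of facts, the operator spellings, and the sample attribute values are our own assumptions.

```python
import operator

# Comparison operators allowed in Definition 15
OPS = {"<=": operator.le, ">=": operator.ge, "=": operator.eq,
       "!=": operator.ne, ">": operator.gt, "<": operator.lt}


def matches(attrs, selector):
    """True if attrs satisfy the conjunction of (.x op z) terms in selector."""
    return all(OPS[op](attrs[x], z) for x, op, z in selector)


# LIKE(woman(.name = mary, .address = paris), man(.name = john)).
fact = ("LIKE",
        ("woman", [("name", "=", "mary"), ("address", "=", "paris")]),
        ("man", [("name", "=", "john")]))

# Condition of the MIDDLE-AGED rule: (age > 40)(age < 65)
middle_aged = [("age", ">", 40), ("age", "<", 65)]

mary = {"name": "mary", "age": 42, "address": "paris"}
john = {"name": "john", "age": 35, "address": "rome"}
```

Here `matches(mary, fact[1][1])` selects Mary as the woman in the LIKE fact, and the same function decides that Mary, but not John, satisfies the MIDDLE-AGED condition.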

Definition 16 Let $\mathcal{O}$ be a set of object types $\mathcal{O} = \{O_v \mid v = 1, \ldots, \mu\}$, $O_v = (K_v, r_v(K'_v))$, $A_i$ the name of a rule in any two objects $O_n$, $O_m$ defining a disciplined relationship, and $G_i$ the name of a rule in $O_n$, where $1 \le n, m \le \mu$.

• The form of a query is:

$$?\ \bigwedge_i F_i$$

• The form of a stored query is:

$$Q(O_l(X_1), O_k(X_2)) \mathrel{:-} \bigwedge_i P_i, \quad 1 \le l, k \le \mu$$

where $P_i$ or $F_i$ can be of the form:

$$A_i(O_n(Y_{i1}), O_m(Y_{i2}))$$

or

$$G_i(O_n(Y_i))$$

$A_i$ is either another stored query or a rule name; $G_i$ is a rule name. If the $X$s and $Y$s are free, they have the form $.x_i = Z_i$, where $x_i$ is an attribute of an object $O_n$ and $Z_i$ is a variable. If they are bound, they have the form given in the previous definition. □

$Q(O_l(X_1), O_k(X_2))$ is the head of the stored query and $\bigwedge_i P_i$ is its body. Note that the heads of stored queries must have exactly two variables to preserve the discipline of pairs imposed by relationships. If more variables were allowed, stored queries could be used to develop a quite different intensional database, altering the character of the database.

As an example, consider a query to find the games liked by men called John who are over 30:


? LIKE(man(.name =john, .age > 30), game(.name = X)).

As another example, consider the following stored query of the object type person, which expresses that men admire any middle-aged woman who likes football:

ADMIRE(man(.name = X), woman(.name = Y)) :- LIKE(woman(.name = Y), game(.name = football)), MIDDLE-AGED(woman(.name = Y)).

To find the women that John admires the following must be written:

? ADMIRE(john,X)

The restriction to two variables in the heads of stored queries does not limit the amount of data shown when the query is asked. For example, ? ADMIRE(john,X) does not mean that the answer to the query will be the value of the attribute name of an instance of the object type woman; rather, it will be the values of all the attributes of that instance.
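The ADMIRE stored query can be mimicked in Python to show both points: X does not occur in the body, so every man admires every qualifying woman, and the answer to ? ADMIRE(john,X) returns the whole instance rather than only its name. The data and the functions `middle_aged` and `admire` are invented for illustration and are not the authors' evaluation procedure.

```python
# Hypothetical extensional data: name -> (object type, attributes)
people = {
    "john": ("man", {"age": 35}),
    "paul": ("man", {"age": 50}),
    "mary": ("woman", {"age": 45}),
    "anna": ("woman", {"age": 30}),
    "rita": ("woman", {"age": 55}),
}
likes = {("mary", "football"), ("anna", "football"), ("rita", "chess")}


def middle_aged(name):
    """Base predicate from the MIDDLE-AGED rule: (age > 40)(age < 65)."""
    return 40 < people[name][1]["age"] < 65


def admire(x=None):
    """ADMIRE(man(.name = X), woman(.name = Y)) :-
           LIKE(woman(.name = Y), game(.name = football)),
           MIDDLE-AGED(woman(.name = Y)).
    X does not occur in the body, so every man admires every qualifying woman.
    """
    men = [n for n, (t, _) in people.items() if t == "man" and (x is None or n == x)]
    women = [y for y, (t, _) in people.items()
             if t == "woman" and (y, "football") in likes and middle_aged(y)]
    return {(m, w) for m in men for w in women}


# ? ADMIRE(john, X): return every attribute of each matching woman
answer = {w: people[w][1] for _, w in admire("john")}
```

Anna likes football but is not middle-aged, and Rita is middle-aged but likes chess, so Mary is the only woman John admires.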

Views

An OO system inherently provides views on its data, as encapsulation is a fundamental feature of such a system. An additional viewing mechanism can add flexibility to the system by allowing the data of an object to be viewed in a variety of ways dictated by security, simplicity, convenience, etc.

Views in relational databases are defined by queries. By contrast, mechanisms suggested for views in OO systems involve defining new objects that inherit parts of the behaviour of other objects to provide a different view on the data of the original object 38. Using a query language to define views is, in principle, more flexible, because the schema of the database is not changed by adding new objects that represent views; it thus provides a mechanism that is orthogonal to data definition.

The viewing mechanism proposed here is simple and employs the query language. Each object type has attached to it certain stored queries, which cannot be inherited by other object types, and certain rules. A stored query involves only object types that are reachable from the object type where the stored query has been defined. A query asked by the user is a message that can be answered only by the object or objects that know how to answer it by making use of their stored queries and rules. Therefore, a set of stored queries can define a view of the instances of an object type or of the instances of a set of related objects. Ultimately, data can be accessed only through views because, as already mentioned, queries are messages that can be answered only by the objects that have the required view.

Definition 17 The potential view of an object type O is the union of the instances of O and the instances of all the object types that are reachable from O via any sequence of disciplined relationships. □

As every query is a question put to an object it has a context that is defined by the potential view of the object. Now a set of stored queries can in turn limit the potential view of an object by allowing a subset of it to be viewed; this is the view of the object.

Definition 18 The view of an object O is a set of stored queries that facilitates access to the instances of the object itself and to the instances of other objects that are reachable from O. []

Evaluation

The keystone of the evaluation technique is the assertion as described in definition 11. There, an assertion is defined as a binary relation between two object instances. An assertion can be thought of as the instance of a relationship that exists between two object types. So the evaluation of queries is actually the traversal of graph structures between related object instances linked via object identity. For example, consider the ANCESTOR stored query, which is defined recursively using the disciplined relationship PARENT(person,person).

ANCESTOR(person(.name = X),person(.name = Y)):- PARENT(person(.name = X),person(.name = Z)), ANCESTOR(person(.name = Z),person(.name = Y)).

ANCESTOR(person(.name = X),person(.name = Y)):- PARENT(person(.name = X),person(.name = Y)).

As PARENT is a disciplined relationship, for every virtual instance of person involved in PARENT there is a tree of linked instances. For example, Figure 4(c) depicts the tree that contains all the instances that are linked due to PARENT. Now, the answer to the query:

? ANCESTOR(person(.name =john),person(.name = X)).

is simply a traversal of the tree whose branches are PARENT assertions. The root of the tree is the instance of person such that .name = john, and the remaining nodes of this tree are the ancestors of john.
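The traversal can be written iteratively, which is also how the recursion of ANCESTOR is removed. In this Python sketch the pair (X, Z) in `parent` means PARENT(person(.name = X), person(.name = Z)), that is, Z is a parent of X; this reading agrees with the statement that the nodes below john are his ancestors. The family data are invented.

```python
# PARENT assertions as (child, parent) pairs
parent = {("john", "bill"), ("john", "ann"), ("bill", "george"), ("ann", "mary")}


def ancestors(name):
    """Answer ? ANCESTOR(person(.name = name), person(.name = X)).

    Each round follows the PARENT references of the instances found in the
    previous round, until no new instances appear.
    """
    found, frontier = set(), {name}
    while frontier:
        step = {z for (x, z) in parent if x in frontier} - found
        found |= step
        frontier = step
    return found
```

Starting from john, the first round reaches bill and ann, and the second reaches george and mary. The loop then stops because no further PARENT assertions lead out of the frontier.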

Before that stage is reached, each query must be analysed so that it contains only base predicates. This is achieved using a variation of the connection graph 39,40, called the modified connection graph (MCG). The connection graph has been used because it transforms recursion into iteration, which in turn enables static determination of data. The statically determined data do not have to be stored separately but can be determined by tracing the appropriate pointers. The modification was made to enhance its use in the query language proposed here.

Definition 19 An MCG is a quintuple $(N, E_u, L, C, E_v)$ where:

• $N$ is a set of nodes; each node represents a variable or constant of a literal.

• $E_u$ is a set of edges connecting corresponding arguments of unifiable literals.

• $L$ is a set of literal partitions such that partition $l$ belongs to $L$ if it contains exactly those terms belonging to a literal.

• $C$ is a set of clause partitions such that partition $c$ belongs to $C$ if it contains exactly those literals belonging to a clause.

• $E_v$ is a set of edges connecting nodes that correspond to occurrences of the same variable in a given clause partition.

[Diagram: MCG with literal partitions for ancestor and parent, connected by numbered edges]

Figure 6. Modified connection graph for ANCESTOR stored query

Edges exist between terms of unifiable literals, clearly indicating substitutions. Edges also exist between all occurrences of the same variable within the same rule. Figure 6 depicts the MCG for the ANCESTOR stored query.
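The components of Definition 19 can be built from the ANCESTOR clauses as in the Python sketch below. It makes simplifications of our own: arguments are reduced to their variables or constants, unifiability is approximated by "body or query literal with the same predicate and arity as a clause head", the query is added as an extra clause without a head, and the edge numbering of Figure 6 is not reproduced. The names are hypothetical.

```python
from itertools import combinations

# Clauses of the ANCESTOR stored query; literal = (predicate, args), head first.
# Upper-case arguments are variables, lower-case ones are constants.
clauses = [
    [("ancestor", ("X", "Y")), ("parent", ("X", "Z")), ("ancestor", ("Z", "Y"))],
    [("ancestor", ("X", "Y")), ("parent", ("X", "Y"))],
]
query = [("ancestor", ("john", "X"))]


def build_mcg(clauses, query):
    """Return (N, E_u, L, C, E_v); a node is (clause, literal, argument position)."""
    all_clauses = clauses + [query]
    N, L, C, E_u, E_v = [], [], [], set(), set()
    for c, clause in enumerate(all_clauses):
        C.append([(c, l) for l in range(len(clause))])  # clause partition
        for l, (_, args) in enumerate(clause):
            terms = [(c, l, a) for a in range(len(args))]
            N.extend(terms)
            L.append(terms)  # literal partition
        # E_v: occurrences of the same variable within this clause
        occ = [((c, l, a), t) for l, (_, args) in enumerate(clause)
               for a, t in enumerate(args) if t[0].isupper()]
        E_v |= {(p, q) for (p, s), (q, t) in combinations(occ, 2) if s == t}
    # E_u: corresponding arguments of body (or query) literals and clause heads
    for c, clause in enumerate(all_clauses):
        start = 0 if c == len(clauses) else 1  # the query has no head
        for l in range(start, len(clause)):
            pred, args = clause[l]
            for h, other in enumerate(clauses):
                head_pred, head_args = other[0]
                if head_pred == pred and len(head_args) == len(args):
                    E_u |= {((c, l, a), (h, 0, a)) for a in range(len(args))}
    return N, E_u, L, C, E_v


N, E_u, L, C, E_v = build_mcg(clauses, query)
```

parent is a base predicate, so no substitution edges leave it; the bound constant john is connected to the X argument of both ancestor heads, which is where a traversal starts.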

The following is an outline of the algorithm that traverses the MCG in the case of one bound argument in the head of a rule.

Algorithm 1

1(a). Pass the bound argument X of the head of a rule to the next term, as indicated by the substitution edges, following a path from X to the free argument Y of the head of the rule. The path should pass through at least one literal partition corresponding to a base predicate; when that happens, the new value yielded by the base predicate is passed on.

1(b). If two or more alternative edges are encountered,
    then if exactly one of the edges leads to the head of a rule
        then follow that edge
    else if more than one edge leads to the head of a rule
        then choose the edge that leads to the head that has the free argument Y
    else if all alternative edges lead to base predicates
        then traverse all the alternative paths; those that reach Y must yield the same value for Y.

2. (* for every path in the following, apply 1(b) *)
    If there is a path that satisfies the condition of step 1 from X to itself
        then follow it and return to step 1 with the new value of X; repeat until there are no more data
    else if there is a path that satisfies the condition of step 1 from Y to itself
        then follow it and, with the new value of Y, return to step 1 with X and Y interchanged; repeat until there are no more data
    else if there are paths from X and from Y leading back to X and Y, respectively
        then follow them concurrently to produce X, Y pairs; repeat until there are no more data.

□

For example, for the MCG of Figure 6 and the query ? ANCESTOR(person(.name = john), person(.name = X)), step 1 of the algorithm yields the sequence of edges 10, 8, 9, 11, and step 2 yields 1, 2, 4, 10, 8, 9, 11; 1, 2, 4, 10, 8, 9, 11; and so on.

The MCG is constructed by the administrator, which in turn traverses it and in the process produces the appropriate messages to the objects in the database. An overview of this process is depicted in Figure 7 using Petri nets, a powerful formalism for describing concurrent systems 41 (Figure 7 is an elaboration of Figure 5). As can be seen, only the object types, and not the individual instances, communicate through the blackboard; the instances are accessed through their object type. Given the volume of instances in the database, this is more efficient than accessing every instance through the blackboard.

DEDUCED RELATIONSHIPS

The well defined semantics and the simple, well understood notation of Horn clauses, combined with the concept of disciplined relationships, led to a scheme for evolving database schemas, in terms of new inter-relations between objects, based on transaction monitoring. Commonly used, simple patterns appearing in queries are identified and mapped on to the database schema and on to the data. The purpose of monitoring queries is primarily to adapt the database schema to the requirements of the end-user. The pattern sought is simple, to avoid an excessive computational burden and to provide a practical identification method. Its form corresponds, via the disciplined relationships involved, to a sequence of inter-related objects, which is in turn replaced by a new disciplined relationship between the first and the last objects of the sequence. The advantage of this approach over saving specific queries 42,43 is that a query pattern provides an abstraction over a number of queries and is more useful in query optimization, more appropriate for inclusion in the database schema, and more efficient with respect to storage.


[Petri net diagram with three parts, OBJECT-TYPE, BLACKBOARD, and ADMINISTRATOR; transitions include reading messages from the blackboard, searching data, putting messages into and fetching them from a buffer, and sending messages and answers to the blackboard]

Figure 7. Overview of process of constructing modified connection graph

The queries are analysed after being transformed into graphs. Each nonrecursive query is represented by a graph after every derived predicate has been substituted with the corresponding conjunction of base predicates. The query graph is then searched for a chain query pattern. A chain query pattern is a subquery consisting of a sequence of predicates such that no more than two variables of the subquery are referred to by predicates outside it, and all the predicates have distinct sets of terms. As will be shown below, such a pattern is a trail of the graph that contains all the edges of the graph or, if the graph has point connectivity one, a trail containing all the edges of a component of the graph augmented with the cutpoint. The subgraphs that contain the edges and nodes of trails with these properties are compared with previously extracted and saved subgraphs to determine whether the resulting deduced relationship will be recorded. If the number of times such a subgraph has been found reaches a threshold value, the predicates of the query corresponding to the subgraph are replaced by a deduced predicate, which in turn can be stored in the database and become a base predicate, referred to as a deduced relationship. In this way, knowledge frequently produced during query evaluation can be saved and used without re-evaluation. It can also be used directly in queries, as it is recorded automatically in the database schema.

754 information and software technology
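The threshold test described above can be sketched as a simple occurrence counter. This is a minimal sketch, not the authors' implementation: it assumes that an extracted pattern can be reduced to a hashable key (here, the hypothetical choice of the tuple of base-predicate names along the trail), whereas the paper compares subgraphs; the class name `PatternMonitor` and the `threshold` value are illustrative.

```python
from collections import Counter
from typing import Set, Tuple

# Hypothetical canonical key for a chain query pattern: the names of the base
# predicates along the trail, in trail order, e.g. ("MARRIAGE", "DRIVES").
PatternKey = Tuple[str, ...]


class PatternMonitor:
    """Counts how often each chain query pattern is extracted from monitored queries."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold              # assumed tuning parameter
        self.counts: Counter = Counter()        # pattern -> number of occurrences
        self.promoted: Set[PatternKey] = set()  # patterns already reported

    def observe(self, pattern: PatternKey) -> bool:
        """Record one occurrence; return True only when the threshold is first reached."""
        self.counts[pattern] += 1
        if pattern not in self.promoted and self.counts[pattern] >= self.threshold:
            self.promoted.add(pattern)  # candidate for a new deduced relationship
            return True
        return False


monitor = PatternMonitor(threshold=3)
results = [monitor.observe(("MARRIAGE", "DRIVES")) for _ in range(4)]
print(results)  # [False, False, True, False]
```

Reporting a pattern only once keeps the user from being asked to name the same deduced relationship repeatedly; a real system would also need a canonical form that treats equivalent subgraphs as equal.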

Definition 20 specifies how a query graph is constructed, and Definition 21 gives the form of a chain query pattern.

Definition 20 Assume a directed graph $G = (N, E)$, where $N$ is the set of nodes and $E$ the set of edges, $E = E_D \cup E_S$, $E_D$ is the set of pairs of opposite edges between pairs of distinct nodes, and $E_S$ is the set of single edges. $G$ is a query graph if:

• Each node corresponds to the set of variables of a query that refer to the same object.

• Each pair of edges in $E_D$ represents a predicate of the query that corresponds to a BDR. Such a pair of edges connects the nodes that correspond to the objects involved in the predicate and is labelled by the name of the BDR. An edge in $E_S$ represents a predicate in the query corresponding to a UDR; it connects the two nodes that correspond to the objects involved in the predicate, and its direction is the same as that of the UDR. Again, the edge is labelled by the name of the UDR. □
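Definition 20 can be illustrated with a small graph builder. This is a minimal sketch under stated assumptions: variables are taken to be already grouped by the object they refer to (so each object name is one node), and the `Predicate`, `QueryGraph` and `build_query_graph` names, as well as treating MARRIAGE as a BDR and DRIVES as a UDR, are hypothetical choices for the example.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]  # (from_node, to_node, label)


@dataclass(frozen=True)
class Predicate:
    name: str            # relationship name, used as the edge label
    first: str           # object referred to by the first term
    second: str          # object referred to by the second term
    bidirectional: bool  # True for a BDR, False for a UDR


@dataclass
class QueryGraph:
    nodes: Set[str] = field(default_factory=set)
    e_d: List[Tuple[Edge, Edge]] = field(default_factory=list)  # pairs of opposite edges
    e_s: List[Edge] = field(default_factory=list)               # single directed edges


def build_query_graph(predicates: List[Predicate]) -> QueryGraph:
    g = QueryGraph()
    for p in predicates:
        # Assumption: variables are already grouped by object, so one object = one node.
        g.nodes.update((p.first, p.second))
        if p.bidirectional:
            # A BDR contributes a pair of opposite edges carrying the BDR's name.
            g.e_d.append(((p.first, p.second, p.name), (p.second, p.first, p.name)))
        else:
            # A UDR contributes one edge in the direction of the relationship.
            g.e_s.append((p.first, p.second, p.name))
    return g


g = build_query_graph([
    Predicate("MARRIAGE", "man", "woman", bidirectional=True),  # hypothetical BDR
    Predicate("DRIVES", "man", "car", bidirectional=False),     # hypothetical UDR
])
print(sorted(g.nodes))  # ['car', 'man', 'woman']
print(g.e_s)            # [('man', 'car', 'DRIVES')]
```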

Definition 21 A chain query pattern (CQP) of a query $?P_1(O_1(Y_1), O_2(Y_2)),\ P_2(O_2(Y_2), O_3(Y_3)),\ \ldots,\ P_n(O_{2n-1}(Y_{2n-1}), O_{2n}(Y_{2n}))$ is a subquery $?P_{m+1}(O_{m+1}(Y_{m+1}), O_{m+2}(Y_{m+2})),\ P_{m+2}(O_{m+2}(Y_{m+2}), O_{m+3}(Y_{m+3})),\ \ldots,\ P_{m+k}(O_{2(m+k)-1}(Y_{2(m+k)-1}), O_{2(m+k)}(Y_{2(m+k)}))$, where $2 \le k$ and $m + k \le n$, the $P$s are base predicates, all the pairs of terms $O_{2(m+j)-1}(Y_{2(m+j)-1}), O_{2(m+j)}(Y_{2(m+j)})$ of the predicates are unique within the subquery, and the variables $O_{m+2}(Y_{m+2}), \ldots, O_{2(m+k)-1}(Y_{2(m+k)-1})$ do not appear in any other predicate of the query outside the subquery. The subquery implies the derived predicate $Q(O_{m+1}, O_{2(m+k)})$, called a deduced predicate, which defines a deduced relationship $Q$ between $O_{m+1}$ and $O_{2(m+k)}$. The deduced relationship $Q$ is the composition $P_{m+1} \circ P_{m+2} \circ \cdots \circ P_{m+k}$ of the relationships that correspond to the predicates of the subquery. □
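The composition $P_{m+1} \circ P_{m+2} \circ \cdots \circ P_{m+k}$ can be made concrete by treating each relationship extension as a set of instance pairs. The following is a minimal sketch, assuming the left-to-right reading implied by the chain (apply $P_{m+1}$ first); the function names and the sample pairs are hypothetical.

```python
from functools import reduce
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (instance of the first object, instance of the second object)


def compose(first: Set[Pair], second: Set[Pair]) -> Set[Pair]:
    """Apply `first`, then `second`: (a, c) is in the result iff some b has
    (a, b) in first and (b, c) in second."""
    by_source: Dict[str, Set[str]] = {}
    for b, c in second:
        by_source.setdefault(b, set()).add(c)
    return {(a, c) for a, b in first for c in by_source.get(b, set())}


def deduced_relationship(chain: List[Set[Pair]]) -> Set[Pair]:
    """Compose P_{m+1}, ..., P_{m+k} left to right; a CQP needs k >= 2."""
    if len(chain) < 2:
        raise ValueError("a chain query pattern has at least two predicates")
    return reduce(compose, chain)


p1 = {("a", "x"), ("b", "y")}
p2 = {("x", "1"), ("y", "2"), ("y", "3")}
p3 = {("1", "u"), ("3", "v")}
print(sorted(deduced_relationship([p1, p2, p3])))  # [('a', 'u'), ('b', 'v')]
```

Indexing `second` by its first component keeps each composition step roughly linear in the sizes of the two relations plus the output, rather than comparing every pair with every other pair.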

As mentioned earlier, a CQP is a trail on a query graph that contains all the edges of the graph or, if the graph has point connectivity one, a trail containing all the edges of a component of the graph augmented with the cutpoint. The algorithm that follows searches a query for a CQP: it constructs the query graph G and looks for a trail of the kind described above.

Algorithm 2
1. Construct the query graph $G$.
2. Find and label all nodes that are connected with more than two relationships.
3. Let $O_i$, $O_j$ be the objects involved in the head of the query.
4. Let $k$ be the number of edges of $G$ that correspond to distinct predicates.
5. If $k \ge 2$ and there is a trail of length $k$ from $O_i$ to $O_j$, or vice versa, in which edges corresponding to the same predicate appear only once, then this trail is a CQP; terminate the algorithm. Else:
5(a). If the point connectivity $\kappa(G)$ of $G$ is 1, then:
5(a)1. Following a path from $O_i$, find the first labelled cutpoint $O_{c1}$ of $G$.
5(a)2. Let $G_{O_i}$ be the component of $G$, derived after the removal of $O_{c1}$ from $G$, that includes $O_i$, and let $G_{O_j}$ be the component of $G$, derived after the removal of $O_{c1}$ from $G$, that includes $O_j$.
5(a)3. Construct two subgraphs $G_1$ and $G_2$ of $G$, where $G_1$ is derived from $G$ after the removal of $G_{O_i}$ and $G_2$ after the removal of $G_{O_j}$.
5(a)4. Let $k_1$ be the number of edges of $G_1$ that correspond to distinct predicates and $k_2$ the number of edges of $G_2$ that correspond to distinct predicates. Let $k_m = \max(k_1, k_2)$ and $k_n = \min(k_1, k_2)$.
5(a)5. If in the subgraph with $k_m$ ($k_m \ge 2$) there is a trail of length $k_m$ from $O_b$ to $O_e$, or vice versa, then this trail is a CQP; terminate. (*$O_b$ corresponds to $O_i$ in $G_{O_i}$ and to $O_{c1}$ in $G_{O_j}$; $O_e$ corresponds to $O_{c1}$ in $G_{O_i}$ and to $O_j$ in $G_{O_j}$*) Else, if in the subgraph with $k_n$ ($k_n \ge 2$) there is a trail of length $k_n$ from $O_b$ to $O_e$, or vice versa, then this trail is a CQP; terminate. Else let $G = G_2$ and return to step 5(a).
Else (if $\kappa(G) \ne 1$), terminate. □
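The two graph tests that Algorithm 2 relies on, whether a trail covering every edge joins two given nodes (step 5) and which nodes are cutpoints (step 5(a)1), can be sketched with standard graph techniques. This is a minimal sketch, not the authors' implementation, and it rests on a loud simplifying assumption: each distinct predicate is treated as one undirected edge, so BDR pairs are collapsed and UDR directions are ignored. Under that assumption, step 5 reduces to the Euler-trail condition (all edges connected, and the odd-degree nodes are exactly the two endpoints), and cutpoints are the articulation points found by a Tarjan-style depth-first search. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Simplifying assumption: one undirected edge per distinct predicate.
Edge = Tuple[str, str]


def has_covering_trail(edges: List[Edge], start: str, end: str) -> bool:
    """True if a trail from `start` to `end` uses every edge exactly once."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    degree: Dict[str, int] = defaultdict(int)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
        degree[u] += 1
        degree[v] += 1
    if start not in adj or end not in adj:
        return False
    # Every node carrying an edge must be reachable from `start`.
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    if seen != set(adj):
        return False
    # Euler-trail condition: the odd-degree nodes are exactly the two endpoints.
    odd = {n for n, d in degree.items() if d % 2 == 1}
    return odd == ({start, end} if start != end else set())


def cutpoints(edges: List[Edge]) -> Set[str]:
    """Articulation points via DFS discovery times and low links."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    disc: Dict[str, int] = {}
    low: Dict[str, int] = {}
    found: Set[str] = set()

    def dfs(u: str, parent) -> None:
        disc[u] = low[u] = len(disc)
        children = 0
        for w in adj[u]:
            if w not in disc:
                children += 1
                dfs(w, u)
                low[u] = min(low[u], low[w])
                if parent is not None and low[w] >= disc[u]:
                    found.add(u)
            elif w != parent:
                low[u] = min(low[u], disc[w])
        if parent is None and children > 1:
            found.add(u)

    for node in list(adj):
        if node not in disc:
            dfs(node, None)
    return found


chain = [("woman", "man"), ("man", "car")]
triangle = [("a", "b"), ("b", "c"), ("c", "a")]
print(has_covering_trail(chain, "woman", "car"))  # True
print(cutpoints(chain))                           # {'man'}
print(has_covering_trail(triangle, "a", "b"))     # False
print(cutpoints(triangle))                        # set()
```

Handling UDR directions properly would require the directed Euler-trail condition (in-degree and out-degree balance), which this sketch deliberately omits.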

Theorem 2 Algorithm 2 finds in a query a CQP of maximum length. □

Proof If the algorithm has terminated successfully, then it has detected a trail that is a CQP because:

• It passes through every edge of the query graph, or of a subgraph, exactly once, so it corresponds to the conjunction of all the predicates that are represented by the edges. The predicates in this conjunction appear only once. Thus the trail corresponds to a subquery with at least two distinct predicates.

• It covers all the edges either of the query graph or of a component of the graph, which by definition has only one node in common with the other components (point connectivity one), so the subquery that corresponds to the trail has no references from predicates outside it.

The CQP found by the algorithm has maximum length for two reasons. First, when the trail covers the whole query graph, the CQP has maximum length because it covers all the edges of the graph. Second, suppose a CQP is found in a component graph and there is a CQP with one or more additional nodes that are not cutpoints of the query graph. Then a cutpoint would be an internal node of the trail, so there would be references from predicates outside the CQP, which contradicts the definition. □

As already mentioned, a deduced predicate $Q(Y_i, Y_k)$ expresses a relationship between the variables $Y_i$ and $Y_k$. Since these variables refer to attributes of objects $O_i$ and $O_m$, respectively, it also expresses a relationship between $O_i$ and $O_m$. So a deduced predicate is saved as a disciplined relationship $Q$ between $O_i$ and $O_m$. The system does not name $Q$ automatically; instead, the user is asked to provide a meaningful name.

The task of forming a deduced relationship comprises the definition of the new relationship and the creation of deduced assertions. The rules of a deduced relationship are called deduced rules and its constraints deduced constraints. The definition is similar to that of any disciplined relationship, the only difference being that its constraints must be extracted from the constraints of the relationships that participate in the query pattern. So the set of predicates pertaining to a deduced rule of an object, which is a subset of the set of all predicates of the query pattern, consists of those predicates that contain attributes 'known' to that object. An attribute can be known to an object either through inheritance or by explicit definition. So if $C$ is the set of all constraints in a query pattern, $C = \bigcup_i C_{O_i}$, and $C_{Q_1}$, $C_{Q_{2n}}$ are the sets of constraints pertaining to the deduced relationship of objects $O_1$ and $O_{2n}$, respectively, then $C_{Q_1} \subseteq C$ and $C_{Q_{2n}} \subseteq C$, where $C_{Q_1} = \{c(z_k) \mid z_k \in T_{O_1}\}$, $C_{Q_{2n}} = \{c(z_l) \mid z_l \in T_{O_{2n}}\}$, and $T_{O_1}$, $T_{O_{2n}}$ are the sets of attributes of objects $O_1$ and $O_{2n}$, respectively. For example, the query pattern MARRIAGE(man, woman), DRIVES(man, car) indicates an indirect relationship Q between woman and car. If this pattern appears frequently, it is worthwhile to establish a direct relationship between the two objects, declaring this interaction in the database schema for direct future use. If the user names Q 'USE', the new deduced disciplined relationship in the database schema is USE(woman, car).
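The extraction of deduced constraints, $C_Q = \{c(z) \mid z \in T_O\}$, can be sketched as a filter over the pattern's constraints, with $T_O$ computed from explicitly defined plus inherited attributes. This is a minimal sketch under assumptions not stated in the text: each constraint refers to a single attribute (following the $c(z)$ notation), inheritance is single and follows IS-A links, and all object names, attribute names, and constraint texts below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Constraint:
    text: str       # illustrative constraint c(z)
    attribute: str  # the attribute z it refers to


def known_attributes(obj: str, own: Dict[str, Set[str]], isa: Dict[str, str]) -> Set[str]:
    """Attributes defined explicitly on `obj` plus those inherited along IS-A links."""
    attrs: Set[str] = set()
    current: Optional[str] = obj
    visited: Set[str] = set()
    while current is not None and current not in visited:
        visited.add(current)
        attrs |= own.get(current, set())
        current = isa.get(current)  # assumption: single inheritance
    return attrs


def deduced_constraints(c_all: List[Constraint], t_obj: Set[str]) -> List[Constraint]:
    """C_Q = { c(z) in C | z in T_O }, keeping the original order."""
    return [c for c in c_all if c.attribute in t_obj]


own = {"VEHICLE": {"plate"}, "CAR": {"seats"}, "WOMAN": {"maiden_name"}, "MAN": {"age"}}
isa = {"CAR": "VEHICLE"}
c_all = [
    Constraint("age >= 18", "age"),
    Constraint("plate is unique", "plate"),
    Constraint("seats <= 9", "seats"),
    Constraint("maiden_name is not null", "maiden_name"),
]
print([c.text for c in deduced_constraints(c_all, known_attributes("CAR", own, isa))])
# ['plate is unique', 'seats <= 9']
print([c.text for c in deduced_constraints(c_all, known_attributes("WOMAN", own, isa))])
# ['maiden_name is not null']
```

Here the constraint on MAN, the intermediate object of the pattern, belongs to neither end object and so does not become a deduced constraint of USE.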

The deduced assertions of a deduced relationship are also formed automatically and consist of all the links that can be formed between the instances of $O_1$ and $O_{2n}$ by tracing the links between the instances of the rest of the objects in the query pattern. Once formed, a deduced relationship can be treated as an ordinary disciplined relationship: it can be used as a base predicate in building queries and in creating new assertions. In that sense, a deduced relationship can have both deduced assertions and ordinary assertions. Deduced assertions always depend on other assertions, and the existence of these assertions is essential for the existence of the deduced assertions. So when an assertion is removed and a deduced assertion depends on it, the deduced assertion is removed as well.
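Tracing links to form deduced assertions, and removing them when a supporting assertion disappears, can be sketched as follows. This is a minimal sketch for a two-step pattern only, under hypothetical names and instance data; it also makes one design choice of its own: a deduced link that can be traced along more than one path is kept while at least one path survives, whereas the text above only states that dependent deduced assertions are removed. Ordinary assertions added directly to the deduced relationship are not modelled.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Base = Tuple[str, str, str]  # (assertion id, from instance, to instance), oriented along the chain
Link = Tuple[str, str]       # deduced link between instances of the end objects


class DeducedAssertions:
    """Deduced links of a two-step pattern, each tied to the base assertions it was traced through."""

    def __init__(self, first: Iterable[Base], second: Iterable[Base]) -> None:
        self.support: Dict[Link, Set[FrozenSet[str]]] = {}
        second_list = list(second)
        for id1, x, y in first:
            for id2, y2, z in second_list:
                if y == y2:  # the two links meet at the same middle instance
                    self.support.setdefault((x, z), set()).add(frozenset({id1, id2}))

    def links(self) -> Set[Link]:
        return set(self.support)

    def remove_base(self, assertion_id: str) -> None:
        """Drop derivations that used the removed assertion; drop links left with none."""
        for link in list(self.support):
            remaining = {d for d in self.support[link] if assertion_id not in d}
            if remaining:
                self.support[link] = remaining
            else:
                del self.support[link]


# Hypothetical instances: MARRIAGE read from woman to man, then DRIVES from man to car.
marriage = [("m1", "mary", "john"), ("m2", "ann", "bob")]
drives = [("d1", "john", "fiat"), ("d2", "bob", "fiat"), ("d3", "bob", "volvo")]
use = DeducedAssertions(marriage, drives)
print(sorted(use.links()))  # [('ann', 'fiat'), ('ann', 'volvo'), ('mary', 'fiat')]
use.remove_base("d3")
use.remove_base("m1")
print(sorted(use.links()))  # [('ann', 'fiat')]
```

Longer patterns could be handled by folding this step along the chain, carrying the union of assertion ids in each derivation.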

CONCLUSIONS

Rules incorporated into object orientation make an effective problem-solving combination that addresses the issues of declarative versus embedded knowledge representation, explicit integrity constraints, complex objects, associations at object and schema level, and schema evolution. Logic provides an attractive computational framework that is compatible with the rules and helps identify users' intentions regarding new associations between objects. These ideas have been discussed within the framework of an integrated OO system, which has been implemented as a prototype under the name MEDOUSA on top of the Joshua artificial-intelligence development environment on a Symbolics Lisp Machine. A new version is currently being implemented in Common Lisp with Flavors. The data definition language of the DBMS under implementation is described in Backus-Naur form in the Appendix.

ACKNOWLEDGEMENT

The authors thank the referees for their constructive comments.

REFERENCES

1 Kim, W and Lochovsky, F (eds) Object-oriented concepts, databases and applications ACM Press (1989)

2 Su, S, Krishnamurthy, V and Lam, H 'An object-oriented semantic association model (OSAM*)' in Kumara, S, Soyster, A L and Kashyap, R L (eds) AI in industrial engineering and manufacturing: theoretical issues and applications American Institute of Industrial Engineering (1988)

3 Bancilhon, F, Briggs, T, Khoshafian, S and Valduriez, P 'FAD: a powerful and simple database language' in Proc. 13th VLDB Conf. Brighton, UK (1987) pp 97-107

4 Aït-Kaci, H and Nasr, R 'Logic and inheritance' in Proc. Symp. Principles of Programming Languages St Petersburg, FL, USA (1986) pp 219-227

5 Ballou, N, Chou, H et al. 'Coupling an expert system shell with an object-oriented database system' J. Object-Oriented Prog. (June/July 1988) pp 12-21

6 Banerjee, J, Kim, W, Kim, H J and Korth, H 'Semantics and implementation of schema evolution in object-oriented databases' in Proc. ACM SIGMOD Conf. San Francisco, CA, USA (1987) pp 311-322

7 Fishman, D, Beech, D et al. 'Iris: an object-oriented database management system' ACM Trans. Office Inf. Syst. Vol 5 No 1 (January 1987) pp 48-69

8 Manola, F and Dayal, U 'PDM: an object-oriented data model' in Proc. 1st Int. Workshop on Object-Oriented Databases Pacific Grove, CA, USA (1986) pp 18-25

9 Maier, D, Stein, J, Otis, A and Purdy, A 'Development of an object-oriented DBMS' (Proc. 1st ACM Conf. OOPSLA) ACM SIGPLAN Notices Vol 21 No 11 (1986) pp 472-482

10 Brodie, M 'Specification and verification of data base semantic integrity' PhD thesis Dept of Computer Science, University of Toronto, Canada

11 Nikhil, R 'Functional databases, functional languages' in Atkinson, M, Buneman, P and Morrison, R (eds) Data types and persistence Springer-Verlag (1988) pp 51~7

12 Agrawal, R and Gehani, N 'ODE (Object Database and Environment): the language and the data model' in Proc. ACM SIGMOD Conf. Portland, OR, USA (1989) pp 36-46

13 Bloom, T and Zdonik, S 'Issues in the design of object oriented programming languages' in Proc. ACM OOPSLA Conf. Orlando, FL, USA (1986) pp 441-451

14 Fukunaga, K and Hirose, S 'An experience with a Prolog-based object-oriented language' in Proc. ACM OOPSLA Conf. Portland, OR, USA (1986) pp 224-231

15 Bancilhon, F 'A logic-programming/object-oriented cocktail' SIGMOD Record Vol 15 No 3 (September 1986) pp 11-21

16 Koschmann, T and Evans, M W 'Bridging the gap between object-oriented and logic programming' IEEE Software (July 1988)

17 Eliëns, A 'Extending Prolog to a parallel object oriented language' in Proc. IFIP Conf. Decentralized Systems Lyon, France (1989) pp 357-371

18 Buneman, P and Atkinson, M 'Inheritance and persistence in database programming languages' in Proc. ACM SIGMOD Conf. Washington, DC, USA (1986) pp 4-15

19 Wand, Y 'A proposal for a formal model of objects' in Kim, W and Lochovsky, F (eds) Object-oriented concepts, databases and applications ACM Press (1989)

20 Mylopoulos, J, Bernstein, P and Wong, H K T 'A language facility for designing database-intensive applications' ACM Trans. Database Syst. Vol 5 No 2 (June 1980) pp 185-207

21 'Reference guide to Symbolics-Lisp' Symbolics manual 996025 (June 1985)

22 Brachman, R 'What IS-A is and isn't: an analysis of taxonomic links in semantic networks' Computer Vol 16 No 10 (October 1983) pp 37-41

23 Korth, H and Silberschatz, A Database system concepts McGraw-Hill (1986)

24 Stroustrup, B The C++ programming language Addison-Wesley (1986)

25 Snyder, A 'Common Objects: an overview' ACM SIGPLAN Notices Vol 21 No 10 (1986) pp 19-28

26 'Report on the object oriented database workshop: semantic aspects' addendum to Proc. OOPSLA '87 Conf. Orlando, FL, USA (1987) pp 45-50

27 De Troyer, O, Keustermans, J and Meersman, R 'How helpful is an object-oriented language for an object-oriented database model?' in Proc. 1st Int. Workshop on Object-Oriented Databases Pacific Grove, CA, USA (1986) pp 124-132

28 Goutas, S, Soupos, P et al. 'The use of the object-oriented approach in the GRASPIN DB' in Proc. 4th Esprit Conf. Brussels, Belgium (1987) pp 361-374

29 Rumbaugh, J 'Relations as semantic constructs in an object- oriented language' in Proc. ACM OOPSLA Conf. Orlando, FL, USA (1986) pp 466-481

30 Carey, M, DeWitt, D and Vandenberg, S 'A data model and query language for EXODUS' in Proc. ACM SIGMOD Conf. Chicago, IL, USA (1988) pp 413-423

31 Kim, W, Chou, H T and Banerjee, J 'Composite object support in an object-oriented database system' in Proc. OOPSLA '87 Conf. Orlando, FL, USA (October 1987) pp 118-125

32 Tsichritzis, D and Lochovsky, F Data models Prentice Hall (1982)

33 Kim, W, Bertino, E and Garza, J F 'Composite objects revisited' in Proc. ACM SIGMOD Conf. Portland, OR, USA (1989) pp 337-347

34 Clocksin, W F and Mellish, C S Programming in Prolog Springer-Verlag (1981)

35 Brodie, M and Jarke, M 'On integrating logic programming and databases' in Kerschberg, L (ed) Proc. 1st Int. Workshop on Expert Database Systems Benjamin/Cummings (1986) pp 191-207

36 Rybinski, H 'On first-order-logic databases' ACM Trans. Database Syst. Vol 12 No 3 (September 1987) pp 325-349

37 Tsichritzis, D, Fiume, E, Gibbs, S and Nierstrasz, O 'KNOs: knowledge acquisition, dissemination and manipulation objects' ACM Trans. Office Inf. Syst. Vol 5 No 1 (January 1987) pp 96-112

38 Manola, F 'Object model capabilities for distributed object management' TM-0149~96-89-165 GTE Laboratories Inc. (30 June 1989)

39 Henschen, L and Naqvi, S 'On compiling queries in recursive first-order databases' J. ACM Vol 31 No 1 (January 1984) pp 47-85

40 Kowalski, R 'A proof procedure using connection graphs' J. ACM Vol 22 No 4 (October 1975) pp 572-595

41 Genrich, H J and Lautenbach, K 'System modelling with high-level Petri nets' Theoret. Comput. Sci. Vol 13 (1981) pp 109-136

42 Nicolas, J M and Yazdanian, K 'An outline of BDGEN: a deductive DBMS' in Proc. IFIP Congress '83 North-Holland (1983)

43 Shmueli, O, Tsur, S and Zfira, H 'Rule support in PROLOG' in Kerschberg, L (ed) Proc. 1st Int. Workshop on Expert Database Systems Benjamin/Cummings (1986) pp 247-269

APPENDIX: DESCRIPTION OF DATA DEFINITION LANGUAGE OF DBMS

schema ::= SCHEMA schema_name object_list
object_list ::= { object_definition }
object_definition ::= keyword1 object_name [{ keyword2 object_name }] : (attribute_list) [{ rule }].
attribute_list ::= { attribute_name : type , }
keyword1 ::= TYPE | COLLECTION
keyword2 ::= IS-A | TYPE_OF | PART_OF
rule ::= name : condition action
condition ::= cond | (not cond)
cond ::= (expr) | (expr op expr) | (t)
action ::= (act_name arg_list)
arg_list ::= { arg }

Boldface type denotes keywords, curly brackets '{ }' indicate repetition, square brackets '[ ]' indicate optional grammar symbols, and the vertical bar '|' separates alternatives.
