sql extensions and algebras for object-oriented query languages · 2002. 12. 3. · sql extensions...

SQL Extensions and Algebras for Object-Oriented Query Languages

Carolina Culik, Randall Mack and Dave Shewchun

Contents

1 Introduction

2 Query Languages
  2.1 What is a Query Language?
    2.1.1 Purpose
    2.1.2 Characteristics
    2.1.3 Design
  2.2 RELOOP
    2.2.1 Syntax
    2.2.2 Accessing Nested Structures
    2.2.3 Constructing New Entities
    2.2.4 Nested Structures
    2.2.5 Some Additional RELOOP Facilities
    2.2.6 Aggregate Operators
  2.3 OSQL
    2.3.1 Syntax
    2.3.2 Referencing Objects Through Keys
    2.3.3 Using Functions in Queries
    2.3.4 Accessing Nested Data
    2.3.5 Aggregate Functions

3 Algebras for Object-oriented Database Systems
  3.1 An Algebra for EXTRA/EXCESS
    3.1.1 The EXTRA Data Model
    3.1.2 The EXCESS Query Language
    3.1.3 The Algebra
    3.1.4 Operators
    3.1.5 Examples
  3.2 An Algebra for ENCORE
    3.2.1 The ENCORE Data Model
    3.2.2 Operators
  3.3 An Algebra for RELOOP
    3.3.1 RELOOP
    3.3.2 The O2 Data Model
    3.3.3 Algebra Operators
    3.3.4 Example
  3.4 An Algebra for OODAPLEX
    3.4.1 OODAPLEX Data Model
    3.4.2 OODAPLEX Queries
    3.4.3 OOAlgebra
    3.4.4 OOAlgebra Operators
    3.4.5 Examples
  3.5 Conclusions

4 Overall Conclusions


1 Introduction

The query language used to access a database and the underlying algebra form one of the most important components of a database management system. We will discuss the issues involved in the design of a query language, examine several SQL-like query languages and review several possible algebras for the object-oriented model.

The emphasis in this paper is on practical, user-friendly models. We chose query languages that attempt to extend SQL to the object-oriented model because of the large population of users already familiar with SQL, as well as its simplicity and formal basis.

2 Query Languages

In this section we introduce the concept of a query language, explaining some of the issues which affect its design and use. We analyze in depth the query languages RELOOP for the O2 [CDLR89] object-oriented database management system (OODBMS) and OSQL [FAC+89] for the Iris OODBMS.

2.1 What is a Query Language?

2.1.1 Purpose

Query languages are used for four main purposes [BCD89].

1. for interactive queries

2. to access the database from a programming language

3. for simple programming tasks

4. to simplify complicated programming tasks

However, these four modes of use do not apply equally to all query languages. In general, object-oriented query languages are intended to merge smoothly with the programming language. This means that a programmer should be able to access the database directly from the host language, removing the need to use the query language for this purpose. On the other hand, traditional SQL embeddings only permit modification of the database through SQL queries, and the lack of close ties between the two languages will preclude using SQL for programming shortcuts.

Despite the variety of uses to which a query language can be put, we will concentrate on its aptness for phrasing interactive queries. From the perspective of object-oriented query languages, that is certainly the most fundamental application. Any other purpose can be seen as more of a useful side-effect.

2.1.2 Characteristics

An ad hoc query language should have several characteristics: it should be possible to express reasonable queries simply and concisely; the run-time system should be able to find a good strategy for obtaining the result; and it should be independent of any specific application. As well, the interactive query language need not be as powerful as a complete programming/query language combination. Simplicity is more important.


2.1.3 Design

There are many issues involved in the design of a query language. A sample of some of the more critical ones might be [BCD89]:

• Should encapsulation be supported?

• What form will the answer take?

• Will the query language be integrated into the host language?

• Should queries be explicitly typed?

From the point of view of an ad hoc query language for an OODBMS, we can provide tentative answers to these questions.

Encapsulation

Encapsulation is the concept of packaging a set of methods with any given item of data, where only the methods will be visible to the user. Essentially this notion is just abstract data types as applied to an object-oriented language, and current wisdom says that the use of abstract data types is a good thing.

From this we can see that for queries buried in a programming language, encapsulation is a must. However, for interactive queries it is not so obvious. The main reason for encapsulation (to enforce modularity in programs) does not apply here. It is also easy to come up with examples where encapsulation would significantly hamper the user. For example [BCD89], a user does not want to query a stack through push and pop operations; they want to be able to see the entire stack.

It appears that the best approach would be to enforce encapsulation for embedded queries, but relax this restriction at the interactive level.

Form of the Answer

There are essentially three options for how to represent/store the value returned from a query.

1. Queries can only return information in the form in which it currently exists in the database.

2. Query results belong to a meta-class which has pre-defined methods permitting displaying and printing.

3. A new class and methods permitting access to all of its fields are dynamically defined for every query result.

Option one is too restrictive for many purposes. Option three requires more overhead than most OODBMSs can spare. This leaves option two as the most reasonable compromise, but re-using the results can cause problems if encapsulation is enforced. We see a tradeoff between encapsulation and the power/efficiency of the language.


Embedding the Query Language

The most important thing to remember about this topic is that one of the major objectives in the design of OODBMSs is to remove the impedance mismatch found with relational systems. Assuming that problem is avoided, one choice remains: whether to make the query language a subset of the programming language, or merely to ensure that it mixes well enough that calls from within the programming language are natural. While either is acceptable, making the query language a subset of the programming language appears preferable. With this approach, the programmer does not need to learn two distinct languages; learning the programming language is sufficient.

Typing

As with encapsulation, we find that explicit typing is good for embedded queries, but bad for ad hoc queries. Explicitly typing embedded queries improves the correctness of a program. For an interactive user, however, correctness is not quite as big an issue. Since the type can be inferred from the structure/class, we do not need to declare an explicit type with the query. This will increase the clarity and ease the creation of new, ad hoc queries.

2.2 RELOOP

The Altair research group in Rocquencourt, France, is currently implementing O2 [CDLR89], an object-oriented database management system (OODBMS). O2 supports two languages, each one adding data description and data manipulation to an existing programming language. These programming languages, when coupled with the database management facilities, solve the traditional impedance mismatch problem. However, they do not provide a convenient way to query a database.

Therefore, to compete with relational systems, the Altair group is implementing RELOOP, an interactive query language designed to allow users to specify reasonable queries easily and concisely.

RELOOP is being implemented on top of the O2 OODBMS. O2 supports all of the standard object-oriented features. It supports complex objects through a minimal set of three constructors: sets, lists and tuples. All of these constructors can be applied to each other. For example, the user can create sets of sets, tuples with set valued fields, etc. O2 also has one additional distinctive feature: its values. Values differ from objects in the following ways [BCD89]:

• Values do not obey object-oriented principles.

• Their structures are very similar to objects, but they are visible to the user whereas objects are encapsulated.

• Values have no identity.

• Values are manipulated by operators, not by methods.

For instance, the ‘.’ operator allows for the extraction of a field from its tuple, as in SQL. Iteration is possible over a set value, and two list values can be concatenated.

A Sample O2 Database Schema


Person = [name: [f-name: String, l-name: String],
          age: Integer,
          address: [city: String, zipcode: Integer]]

Woman = [name: [f-name: String, l-name: String],
         age: Integer,
         sex: String,
         address: [city: String, zipcode: Integer]]

Woman is a subclass of Person.

Man = [name: [f-name: String, l-name: String],
       age: Integer,
       sex: String,
       address: [city: String, zipcode: Integer]]

Man is a subclass of Person.

Family = [name: String,
          mother: Woman,
          father: Man,
          children: {Person}]

All the fields of the above classes can be accessed through methods such as the following.

name: Person ⇒ [first-name: String, last-name: String]
address: Person ⇒ [city: String, zipcode: String]
age: Person ⇒ Integer
is-child-of: Person × Person ⇒ Boolean
spouse: Woman ⇒ Man
spouse: Man ⇒ Woman
name: Family ⇒ String
mother: Family ⇒ Woman
father: Family ⇒ Man
children: Family ⇒ {Person}

Note that these methods allow complete access to all data; thus, there is no reason to violate encapsulation.

The following is a presentation of the features of the RELOOP language [CDLR89]. All example queries access a sample database implemented on the above defined database schema.

2.2.1 Syntax

Since most potential users of the query language know SQL, the designers of RELOOP decided to make the language SQL-like. Consequently, RELOOP queries will feature the select from where block.

We begin with a simple query illustrating the syntactical differences between RELOOP and SQL.

Q1: What are the names of the minors?

select name(p)
from p in Person
where age(p) < 18

Basically, the query can be read and understood much like an SQL query. The select clause defines the result (the names of some persons). The from clause says where to get the information. Note that in this case, “Person” is a special set value, so it is legal to define the variable p as ranging over the set and representing one of its elements. The where clause specifies the conditions that must be satisfied.

The first obvious difference from SQL is the specification of “name(p)” instead of “p.name” in the select clause. The designers believe that the functional notation is easier to read. Another advantage of this notation is that it is used both for operators and methods, so inexperienced users do not need to understand the difference between objects and values.

For instance, to get the first name of a person, we first have to apply the “name” method on a “Person” object, and then use the field extract operator to obtain the result. Clearly, the expression first-name(name(p)) is easier to read and use than the expression name(p).first-name.

Second, RELOOP introduces a new keyword, “in”, in the from clause. Any expression which results in a set value can be used in the from clause. Therefore, in the case of complex expressions, a separator between the variable name and its domain makes the query easier to read.

Finally, a select from where block result will always be a set value. This results from the fact that users are usually trying to retrieve a collection of database entities. However, if the user knows that they are extracting a single item from the database, they may use the RELOOP operator “extract” to get the unique item in a set.
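As a rough analogy (this is Python, not RELOOP), the select from where block of Q1 behaves much like a set comprehension. The Person records, the sample data and the helper extract below are illustrative assumptions and are not part of O2; the semantics given to extract is our reading of "get the unique item in a set".

```python
from dataclasses import dataclass

# Hypothetical stand-in for the Person class of the sample schema.
@dataclass(frozen=True)
class Person:
    f_name: str
    l_name: str
    age: int

# Illustrative data; "Person" in the from clause plays the role of this set.
persons = {
    Person("Ann", "Lee", 12),
    Person("Bob", "Lee", 40),
    Person("Cy", "Ray", 12),
}

def name(p):
    """Functional-style accessor, mirroring name(p) in the query."""
    return (p.f_name, p.l_name)

# Q1: select name(p) from p in Person where age(p) < 18
# The result is a set value, so no "distinct" is needed.
minors = {name(p) for p in persons if p.age < 18}

def extract(s):
    """Return the unique element of a one-element set (assumed semantics)."""
    (item,) = s  # raises ValueError unless the set has exactly one element
    return item

oldest = extract({name(p) for p in persons if p.age >= 40})
```

The comprehension makes the three clauses visible in one expression: the result expression (select), the range variable and its domain (from ... in), and the filter (where).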

2.2.2 Accessing Nested Structures

The following query shows how to access data in a nested structure.

Q2: Find the Families where the husband lives in Waterloo.

select f
from f in Family
where city( address( husband(f))) = "Waterloo"

The expression in the where clause can be seen as a path within the object composition hierarchy. For each family, we apply to it the husband method which returns a Man object. We then apply the address method to the Man object, and then finally the city operator to extract the city field from the address. We can access any data in the database using an expression of this kind. The result of the query is a set of database objects. The next section will illustrate how the user can create new entities in the select clause.


2.2.3 Constructing New Entities

The following query shows how to construct tuple values.

Q3: Find the first name and age of the minors.

select [name: f-name( name(p)), age: age(p)]
from p in Person
where age(p) < 18

The tuple operator [ ] allows the construction of tuple values. It takes as arguments field names and their descriptions. A description can be made through any valid RELOOP expression. Therefore, as we will see later, this feature allows us to build nested structures.

As mentioned earlier, the result of a query is always a set value. Therefore, since the above select clause constructs tuples, the final answer to the query will be of the following type: {[name: String, age: Integer]}
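The result type {[name: String, age: Integer]} can be pictured with a small Python sketch. The namedtuple Row stands in for the constructed tuple value, and the data are invented for illustration; nothing here is O2 or RELOOP syntax.

```python
from collections import namedtuple

# Hypothetical stand-ins: Person models the class, Row models the
# constructed tuple value [name: String, age: Integer].
Person = namedtuple("Person", ["f_name", "l_name", "age"])
Row = namedtuple("Row", ["name", "age"])

persons = [
    Person("Ann", "Lee", 12),
    Person("Ann", "Ray", 12),   # a different person with the same first name and age
    Person("Bob", "Lee", 40),
]

# Q3: select [name: f-name(name(p)), age: age(p)] from p in Person where age(p) < 18
result = {Row(name=p.f_name, age=p.age) for p in persons if p.age < 18}
```

Because the constructed tuples are values without identity and the result is a set, the two minors named Ann yield a single element in this sketch.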

2.2.4 Nested Structures

In this section we will illustrate both how to create nested structures and how to flatten nested ones.

The first query shows how to construct a flat structure from a nested one.

Q4: Find the mother of every child.

select [child: child, mother: mother(f)]
from f in Family
     child in children(f)

An interesting observation is that it is valid to define a variable domain in the from clause through the application of a method on a previously defined variable. In fact, any RELOOP expression which returns a set value can be used for this purpose. This feature allows the user access to all levels of a nested structure, and thus allows easy unnesting.
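The dependent range variable in Q4 corresponds to a nested iteration in which the inner domain is computed from the outer variable. A minimal Python sketch, with class definitions and data that are assumptions made for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    f_name: str
    age: int

@dataclass(frozen=True)
class Family:
    name: str
    mother: Person
    father: Person
    children: frozenset  # models the set-valued field {Person}

ann, bob = Person("Ann", 41), Person("Bob", 43)
cy, di = Person("Cy", 9), Person("Di", 15)
families = {Family("Lee", ann, bob, frozenset({cy, di}))}

# Q4: the domain of "child" is children(f), which depends on the
# previously defined variable f. The nested set is flattened into pairs.
pairs = {(child, f.mother) for f in families for child in f.children}
```

Each element of the result is a flat (child, mother) pair, so one family with two children contributes two elements.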

The last two queries show, conversely, how to build nested structures from flat ones.

Q5: Associate with each woman her minor children’s first names, assuming that a woman may appear in different families.

make-method minors (woman: mom)
    select f-name( name(child))
    from f in Family
         child in children(f)
    where age(child) < 18 and
          mother(f) = mom

select [mother: mom, children: minors(mom)]
from mom in Woman


The sets of minor children that we wish to construct are not database entities; therefore, they must be constructed. One way in which we can accomplish this is to create a method on the Woman class which will collect the appropriate set of children. The above method examines every child in every family and picks those that are under 18 years of age and whose mother is the method’s argument. Once the method is complete, we need only a simple query that constructs a tuple and calls the method from within it. Therefore, the type of the final result is {[mother: Woman, children: {String}]}

In this example, the nesting was done on an object. However, often we work with values instead of objects. Since we cannot use methods on values, we must find another way of formulating the above query.

The following is an alternate RELOOP expression for the above query which can deal with both objects and values.

select [mother: mom,
        children: select f-name( name(child))
                  from f in Family
                       child in children(f)
                  where age(child) < 18 and
                        mother(f) = mom]
from mom in Woman

The only difference between the two formulations is that in the latter we find the body of the method where the method was previously called. It is unclear from the literature written on RELOOP whether recursive calls to methods are permitted.
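The equivalence of the two formulations of Q5 can be checked on a toy example. The sketch below is Python rather than RELOOP; the classes, the sample data and the function name minors are assumptions chosen to mirror the query text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    f_name: str
    age: int

@dataclass(frozen=True)
class Family:
    mother: Person
    children: frozenset

ann, eve = Person("Ann", 41), Person("Eve", 38)
cy, di, al = Person("Cy", 9), Person("Di", 15), Person("Al", 20)
women = {ann, eve}
families = {
    Family(ann, frozenset({cy, al})),
    Family(ann, frozenset({di})),    # the same woman appears in a second family
    Family(eve, frozenset()),
}

def minors(mom):
    """Counterpart of the method body: first names of mom's minor children."""
    return frozenset(
        c.f_name
        for f in families
        for c in f.children
        if c.age < 18 and f.mother == mom
    )

# First formulation: the select clause calls the method.
by_method = {(mom, minors(mom)) for mom in women}

# Second formulation: the method body is written inline as a nested query.
inline = {
    (mom, frozenset(c.f_name
                    for f in families
                    for c in f.children
                    if c.age < 18 and f.mother == mom))
    for mom in women
}
```

Both expressions build a set of (mother, set of first names) pairs, which corresponds to the result type {[mother: Woman, children: {String}]}.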

2.2.5 Some Additional RELOOP Facilities

We conclude our discussion of RELOOP by introducing two of the language’s facilities, the collapse operator and the set constructor. The collapse operator flattens a set of sets.

For example, collapse ({{1,2}, {2,3}}) = {1,2,3}

The set constructor {} builds a set value from its arguments. The following two queries illustrate the uses of these facilities.

Q6: Find the people that have at least one child.

select mother(f)
from f in Family
where children(f) ≠ {}

union

select father(f)
from f in Family
where children(f) ≠ {}

This query first finds all the women and then all the men that have at least one child, and then it merges the results. The union of the results is possible because women and men are both persons.

Using the collapse operator, we can also use a different query formulation to obtain the same result.

collapse (select {mother(f), father(f)}
          from f in Family
          where children(f) ≠ {})

In this query, for each family with at least one child, a set containing a mother and a father is constructed. Therefore, the result of this query is a set of sets. The collapse operator flattens the result into just one set.

An interesting observation is that this query does not use an instruction such as “distinct” to avoid duplicate values. This is because we are manipulating sets in the select clause, and thus duplicates (and order) have no meaning.
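A short Python sketch shows why the union formulation and the collapse formulation of Q6 agree. The function collapse below models our assumed semantics of the RELOOP operator, and the family triples are invented for illustration.

```python
def collapse(set_of_sets):
    """Flatten a set of sets into one set (assumed semantics of collapse)."""
    return frozenset(x for s in set_of_sets for x in s)

# Families as (mother, father, children) triples; names are illustrative.
families = [
    ("Ann", "Bob", frozenset({"Cy"})),
    ("Eve", "Dan", frozenset()),
    ("Fay", "Gus", frozenset({"Hal", "Ida"})),
]

# Formulation 1: union of two select blocks.
mothers = {m for (m, f, kids) in families if kids != frozenset()}
fathers = {f for (m, f, kids) in families if kids != frozenset()}
via_union = mothers | fathers

# Formulation 2: build a set of {mother, father} sets, then collapse it.
via_collapse = collapse(
    {frozenset({m, f}) for (m, f, kids) in families if kids != frozenset()}
)
```

Since every intermediate result is a set, neither formulation needs an explicit duplicate-elimination step.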

2.2.6 Aggregate Operators

Aggregate operators, like those we are familiar with in SQL (sum, avg, ...), are not yet implemented, but will be available in the final version of RELOOP.

2.3 OSQL

Hewlett-Packard Laboratories are currently developing the Iris [FAC+89] object-oriented database management system (OODBMS). Iris is a research prototype of a next generation DBMS, intended to meet the needs of new and emerging applications.

Currently, one of the three interactive interfaces provided by Iris is Object SQL (OSQL) [FAC+89]. OSQL is an object-oriented extension of SQL.

A Sample OSQL Database Schema

Person = (first-name Charstring required,
          last-name Charstring required,
          age Integer,
          city Charstring),

Employee = (first-name Charstring required,
            last-name Charstring required,
            age Integer,
            city Charstring,
            company Charstring,
            salary Integer),

Employee is a subclass of Person.

All the fields in the above classes can be accessed through property functions such as the following.

first-name: Person ⇒ Charstring
last-name: Person ⇒ Charstring
age: Person ⇒ Integer
city: Person ⇒ Charstring
company: Employee ⇒ Charstring
salary: Employee ⇒ Integer

Note that the above property functions allow complete access to all data; thus, there is no reason to violate encapsulation.

The following is a presentation of the features provided by OSQL [FAC+89]. Example queries access a sample database implemented on the above database schema.

2.3.1 Syntax

OSQL basically adheres to the SQL query format, but its syntax differs more from SQL than that of RELOOP. For example, consider the following simple query:

Q1: What are the first names of the minors?

Select distinct first-name(p)
for each Person p
where age(p) < 18;

An obvious difference is that the from keyword has been replaced by for each.

We will see further differences in OSQL syntax in subsequent sections, especially shortcuts we can make in a query formulation when calling functions from the query.

OSQL does provide SQL’s “distinct” operator which eliminates duplicates, while RELOOP does not. (RELOOP select statements automatically eliminate duplicates.)

2.3.2 Referencing Objects Through Keys

Objects may be referenced directly, rather than indirectly through their keys. Interface variables may be bound to objects on creation or retrieval and may then be used to refer to the objects in subsequent statements.

The following OSQL expression creates an instance of the Person object.

Create Person (first-name, last-name, age, city) instance
    p1 (John, Doe, 25, Waterloo);

Q2: Find the age of the person with identifier p1.

Select age(p1);

First we create a new person and then give this person the unique identifier p1. We can then use the identifier when we want to refer to its object as in the above select clause.

Notice that no such feature is provided in RELOOP.


2.3.3 Using Functions in Queries

User-defined functions and Iris system functions may appear in where and select clauses to give concise and powerful retrieval.

For example, let us define the function Marriage as,

Create function Marriage(Person p) ⇒ forward;

(forward specifies that the function implementation will be specified later.) This function returns a compound result: the spouse and the date of the marriage.

We have already seen the use of a function in the select clause in the previous section. The following example shows how a function is invoked in the where clause.

Q3: Find Bob’s wife and the date on which they were married.

Select s, d
for each Person s, Charstring d
where = Marriage(Bob);

Here we are comparing every combination of a Person object and a character string to the result of the Marriage function when applied to the Person object “Bob”. Note that the above example demonstrates that OSQL permits queries to range over infinite domains.

The above query can be abbreviated to:

Select each Person s, Charstring d
where = Marriage(Bob);

Or even more simply to:

Select Marriage(Bob);

The last two query formulations stray from the SQL select from where block. The first abbreviation combines the two keyword phrases Select and for each into one “Select each” phrase. The second abbreviation does not explicitly specify the type of its result, but implies that it will be of the form of the result of the Marriage function.

Notice that these types of syntactical shortcuts are not possible in RELOOP.

2.3.4 Accessing Nested Data

As in RELOOP, the expression in the where clause can be seen as a path within the object composition hierarchy. All data in the database can be accessed using an expression such as the following:

Q4: Find all families where the husband lives in Waterloo.

Select f
for each Family f
where city( husband (f)) = "Waterloo"


2.3.5 Aggregate Functions

Several aggregate functions have already been implemented in OSQL. In previous sections we saw functions defined on individual objects. There is also a need for functions over multi-sets (sets containing duplicates).

The following query illustrates the use of the function “average”.

Q5: Find the average salary of all employees.

Select y
for each Real y
where y = Average (select Salary(x) for each Employee x);

The nested select statement returns a multi-set of values which is used as input to the Average function.

Iris semantics are such that nested select statements and function calls are equivalent in retrievals. Thus, given a function, EmpSal(), that returns the salaries of all employees, the above query could be rewritten by replacing the nested select statement with a call to EmpSal.

As mentioned earlier, aggregate operators have not yet been implemented in RELOOP but will be available in the final version. It will be interesting to see how a query such as the above will be implemented since the RELOOP select statement eliminates duplicates.
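The concern about duplicate elimination can be made concrete with a small numeric example. The sketch below is Python with invented salary figures; it only contrasts multi-set and set inputs to an average and says nothing about how RELOOP will actually implement aggregates.

```python
from statistics import mean

# Illustrative salaries; two employees earn the same amount.
salaries_by_employee = {"e1": 30000, "e2": 30000, "e3": 60000}

# Multi-set semantics (as in the nested OSQL select): duplicates are kept.
as_multiset = list(salaries_by_employee.values())
avg_multiset = mean(as_multiset)   # (30000 + 30000 + 60000) / 3

# Set semantics (as in a RELOOP select block): duplicates are eliminated
# before the aggregate sees them.
as_set = set(salaries_by_employee.values())
avg_set = mean(as_set)             # (30000 + 60000) / 2
```

With these figures the multi-set average is 40000 while the set average is 45000, so an aggregate applied directly to a duplicate-free select result would report the wrong average salary.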

3 Algebras for Object-oriented Database Systems

One of the strong points for the relational database model is the existence of a formal algebra for query processing and optimization. Attempts are now being made to develop a similar algebra for object-oriented database systems. This section will examine four such algebras: an algebra for the EXTRA DDL and EXCESS DML [VD91], an algebra for the ENCORE data model [SZ], an algebra for RELOOP [CDLR89], and OOAlgebra, an algebra for OODAPLEX [Day].

3.1 An Algebra for EXTRA/EXCESS

3.1.1 The EXTRA Data Model

In the EXTRA data model [VD91] complex objects are defined as complex structures. These structures are built from tuple, multiset, and array type constructors. Multisets are sets with duplicates. Arrays are one-dimensional only, and of variable length. Methods can also be defined for a structure as a sequence of EXCESS statements. Object identity is supported with the ref keyword which specifies a reference to an extant object (i.e., an OID). In this way, ref can actually be viewed as another type constructor. Inheritance is handled by allowing one structure to inherit another, which causes all attributes and methods of the structure being inherited to become attributes and methods in the new structure. Any inherited attribute or method can be overridden in the new structure definition. Multiple inheritance is also allowed.


3.1.2 The EXCESS Query Language

The EXCESS query language allows for both querying and updating of the database. Any structure defined in EXTRA can be fully manipulated by EXCESS. EXCESS can also be extended with user-defined functions and operators written in the E language.

As an example, consider the following EXCESS query which finds the names of the children of all employees who work for a department on the second floor.

range of E is Employees
retrieve (C.name) from C in E.kids
where E.dept.floor = 2

Here, Employees has been created in EXTRA to be a set of Employee structures (objects). The “.” notation is used to access the properties of a complex structure.

3.1.3 The Algebra

The EXCESS/EXTRA algebra is formally defined as a set of n-ary operators that operate on a set of objects, and are closed with respect to this set of objects. Elements of this set of objects are called “structures”. A structure contains two components, a schema and an instance. A schema is a digraph in which each node represents one of the four type constructors (multisets, tuples, arrays, or refs) or a simple scalar value, and each edge represents a “component of” relationship. So, a schema is just a digraph representation of a structure defined in the EXTRA DDL. Since the algebra operates on different “sorts” (different type constructors), it is referred to as a “multi-sorted” algebra.

3.1.4 Operators

The EXTRA/EXCESS algebra contains a large number of operators to handle the various features of the EXCESS query language. Instead of having all operators work on sets, as is done in many algebras, different operators are defined for each of the four “sorts” within the algebra.

Multiset Operators

There are eight operators for multisets:

1. Additive Union: Combines two multisets and does not eliminate duplicates.

2. Set Creation: Creates a new multiset with a single member.

3. Set Apply: Apply an algebraic expression to all members of the multiset, replacing them with the result of the expression.

4. Grouping: Partition a multiset based on an algebraic expression.

5. Duplicate Elimination: Convert a multiset to a regular set.

6. Difference: Subtract the cardinality of set members to determine the result cardinality.

7. Cartesian Product: The regular set-theory Cartesian product.


8. Set Collapse: Takes a multiset of multisets and returns the union of the member multisets.
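Several of these multiset operators can be approximated with Python's collections.Counter, which maps each element to its cardinality. The sketch below is an informal model and not the formal definitions of [VD91]; in particular, the helper names set_apply and set_collapse, and the choice of additive union inside set_collapse, are assumptions.

```python
from collections import Counter

a = Counter({"x": 2, "y": 1})   # the multiset {x, x, y}
b = Counter({"x": 1, "z": 1})   # the multiset {x, z}

# Additive union: cardinalities add, so duplicates are kept.
additive_union = a + b

# Difference: cardinalities subtract; elements that reach zero drop out.
difference = a - b

# Duplicate elimination: every element present gets cardinality 1.
deduplicated = Counter(set(a))

def set_apply(expr, ms):
    """Apply expr to every member, preserving multiplicities."""
    out = Counter()
    for elem, count in ms.items():
        out[expr(elem)] += count
    return out

upper = set_apply(str.upper, a)

def set_collapse(members):
    """Combine the member multisets of a multiset of multisets.

    The members are passed as a list because Counter objects are not
    hashable; additive union is assumed for the combination.
    """
    out = Counter()
    for m in members:
        out += m
    return out
```

For example, a + b gives x with cardinality 3, while a - b leaves x and y with cardinality 1 each.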

Tuple Operators

There are four operators for tuples:

1. Projection: The same as the relational projection except it works only on a single tuple.

2. Concatenation: Concatenates two tuples.

3. Field Extraction: Extracts a single field from a tuple. Unlike projection, the result is the field, not a 1-tuple.

4. Creation: Create a new 1-tuple.

Array Operators

There are nine operators for arrays:

1. Creation: Create a single element array.

2. Array Extract: Extract a single element from an array.

3. Subarray: Extract a range of elements from an array and place them in a new array.

4. Array Concatenation

5. Array Apply

6. Array Collapse

7. Array Difference

8. Duplicate Elimination

9. Cartesian Product

The last six operators are the same as the corresponding set operators except that they preserve order.

OID Operators

Recall that EXTRA/EXCESS OIDs are just references to objects which exist in the database independent of the objects which reference them. There are two operators for OIDs:

1. Dereference: Dereference an OID node within a schema. The OID node is replaced by what it referenced.

2. Reference: This operator creates an OID for its operand and then inserts this new OID into a schema.

Predicates

There is one additional operator within the algebra to handle predicates, called COMP. A predicate is specified with the COMP operator, which returns its input exactly if the predicate is true, or nothing if the predicate is false. Predicates may only appear in this COMP operator.


retrieve (TopTen[5].name, TopTen[5].salary)

π_{name,salary}(DEREF(ARR_EXTRACT_5(TopTen)))

Figure 1: EXCESS Query and Algebraic Representation

3.1.5 Examples

The EXTRA/EXCESS algebra will now be illustrated with two examples. For these examples adatabase is being used that contains a set of Student objects named Students, a set of De-partment objects named Departments, a set of Employee objects named Employees, and afixed length array of size ten of Employee objects called TopTen. All of these sets and the arraycontain references to objects, rather than the objects themselves. The Student, Employee, andDepartment objects are all tuples.

Figure 1 shows the first example. It contains an EXCESS query that returns the name andsalary of the 5th element of the TopTen array. Bellow the query is the corresponding algebraicrepresentation.

In this example ARR EXTRACT is the array extract array operator, DEREF is the deref-erence OID operator, and π is the projection tuple operator. This expression extracts the 5thelement of the TopTen array with the array extract operator, dereferences it to obtain the actualEmployee tuple, and then uses projection to obtain the name and salary.

Figure 2 gives a second example. Here the query retrieves the names of the departments of all employees who work in Madison. To simplify the presentation of the algebraic representation, a graphical notation is used. In this notation an arc from A to B is used in place of B(A).

In this example SET_APPLY has been abbreviated as S_A, and TUP_EXTRACT is the tuple extract operator. The first expression in this example (bottommost) converts the multiset of Employee references to a multiset of actual Employee tuples. The second expression selects all tuples such that the employee works in Madison. The third expression dereferences the “dept” attribute of the employee tuples and replaces it with the actual “dept” value. The final expression performs a projection on the “name” attribute to obtain the desired result (department names).
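The four stages of the second example can be sketched as a Python pipeline. The sample data and the `NOTHING` sentinel are hypothetical, and the sketch assumes that the set apply operator simply drops elements for which COMP returns nothing.

```python
NOTHING = object()  # sentinel for the "nothing" returned by a failed COMP

objects = {
    "d1": {"name": "Sales"},
    "d2": {"name": "Research"},
    "e1": {"name": "Ann", "city": "Madison", "dept": "d1"},
    "e2": {"name": "Bob", "city": "Chicago", "dept": "d2"},
    "e3": {"name": "Cy", "city": "Madison", "dept": "d2"},
}
employees = ["e1", "e2", "e3"]  # multiset of references


def deref(oid):
    return objects[oid]


def tup_extract(attr, tup):
    return tup[attr]


def comp(pred, x):
    # Return the input unchanged if the predicate holds, else nothing.
    return x if pred(x) else NOTHING


def set_apply(fn, multiset):
    # Apply fn to every element; elements mapped to NOTHING are dropped.
    return [y for y in map(fn, multiset) if y is not NOTHING]


step1 = set_apply(deref, employees)
step2 = set_apply(
    lambda t: comp(lambda u: tup_extract("city", u) == "Madison", t), step1
)
step3 = set_apply(lambda t: deref(tup_extract("dept", t)), step2)
step4 = set_apply(lambda t: {"name": t["name"]}, step3)
```

Here `step3` follows the algebraic expression literally, so each employee tuple is mapped to its department tuple before the final projection.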

3.2 An Algebra for ENCORE

3.2.1 The ENCORE Data Model

A type in ENCORE [SZ] is an abstract data type. Each type has a name (which is used for type equivalence), a set of properties, and a set of operations. Operations are functions that can be applied to instances of the type. A property is a typed object that can be implemented as an actual value, or as a procedure or function that returns an object of the correct type.

Multiple inheritance is supported by allowing the specification of a set of supertypes in the definition of a type. A type inherits all of the properties and operations of its supertypes. Properties cannot be overridden in inheritance. However, operators can be overridden as long as the operator for the new type has the same argument and output types as the operator from the supertype. All types in ENCORE are subtypes of the type Type.

retrieve (Employees.dept.name) where Employees.city = “Madison”

$\mathrm{S\_A}_{\pi_{name}(\mathrm{INPUT})}$

↑

$\mathrm{S\_A}_{\mathrm{DEREF}(\mathrm{TUP\_EXTRACT}_{dept}(\mathrm{INPUT}))}$

↑

$\mathrm{S\_A}_{\mathrm{COMP}_{\mathrm{TUP\_EXTRACT}_{city}(\mathrm{INPUT}) = \text{``Madison''}}(\mathrm{INPUT})}$

↑

$\mathrm{S\_A}_{\mathrm{DEREF}(\mathrm{INPUT})}$

↑

Employees

Figure 2: EXCESS Query and Algebraic Representation

ENCORE also supports parameterized types, with the tuple and set parameterized types built in.

The objects in ENCORE are simply instances of the type Object or subtypes of the type Object. Type Object defines some basic operations for objects, such as comparison and type of. All objects in ENCORE have a unique identity.

It should be noted that sets and tuples are objects within ENCORE since they are instances of the set and tuple parameterized types. The set and tuple types are subtypes of the top-level type Object.

3.2.2 Operators

The ENCORE algebra has three groups of operators: the basic operators, the set operators, and the object identity operators. Unlike EXTRA/EXCESS, all of the operators in ENCORE operate on collections of objects. Also, while there are twelve operators in total, the group of four basic operators contains the main ones that would be handled by the query optimizer, compared to the large number that the optimizer for EXTRA/EXCESS must deal with.

The group of identity operators in ENCORE is of special note. Queries in ENCORE may produce new objects that did not previously appear in the database. These objects will have their own unique object identities. It is possible that a new group of objects could be created when only a single object is needed. That is, the objects have different identities but are shallow equal. To handle this, the concept of i-equality is introduced, where i indicates how “deep” a comparison should be made. The identity operators make use of this i-equality.

The Basic Operators

There are four basic operators defined in the algebra. For the following definitions, S and R are collections of objects, p is a first-order Boolean predicate, the $A_i$ are attribute names having type $T_i$, and the $f_i$ are functions returning objects of type $T_i$.

1. $\mathrm{Select}(S, p) = \{ s \mid s \in S \wedge p(s) \}$

Select takes a set of objects and returns only those that satisfy the predicate.

2. $\mathrm{Image}(S, f) = \{ f(s) \mid s \in S \}$

Image is similar to the apply operators of EXTRA/EXCESS. It applies a function to each object in the given collection.

3. $\mathrm{Project}(S, \langle (A_1, f_1), \ldots, (A_n, f_n) \rangle) = \{ \langle A_1 : f_1(s), \ldots, A_n : f_n(s) \rangle \mid s \in S \}$

Project is an extension of Image and allows more than one function to be applied to the objects. For each object in S a tuple is returned which contains the result of applying each of the functions ($f_i$) to the object. Note that this operator actually generates new objects (tuples) that become part of the database.

4. $\mathrm{Ojoin}(S, R, A, B, p) = \{ \langle A : s, B : r \rangle \mid s \in S \wedge r \in R \wedge p(s, r) \}$

Ojoin is used to create new relationships in the database. For each pair (s, r) where s is in the collection of objects S and r is in the collection of objects R, it returns a tuple containing s and r if they satisfy the specified predicate. In other words, it generates a set of tuples containing two objects that have a particular relationship. As with Project, these tuples are new objects that become part of the database.
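The four basic operators translate almost directly into Python comprehensions. The sketch below is an assumption-laden illustration: the `Obj` class is a hypothetical stand-in for ENCORE objects with identity, and lists replace sets so that the results have a predictable order.

```python
class Obj:
    """Hypothetical object with identity; attributes given as keywords."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)


def select(S, p):
    return [s for s in S if p(s)]


def image(S, f):
    return [f(s) for s in S]


def project(S, pairs):
    # pairs is a list of (attribute name, function); a new tuple object
    # is created for every s in S.
    return [Obj(**{A: f(s) for A, f in pairs}) for s in S]


def ojoin(S, R, A, B, p):
    # A new two-attribute tuple object is created for every matching pair.
    return [Obj(**{A: s, B: r}) for s in S for r in R if p(s, r)]


people = [Obj(name="Ann", dept="Sales"), Obj(name="Bob", dept="Research")]
depts = [Obj(dname="Sales"), Obj(dname="Research")]

names = image(people, lambda s: s.name)
pairs = ojoin(people, depts, "emp", "dept", lambda s, r: s.dept == r.dname)
```

Because `project` and `ojoin` construct fresh `Obj` instances, each result has its own identity, which is the behaviour that motivates the identity operators described later.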

Set Operators

The algebra contains the following set operators:

1. Union, Difference, Intersection

These are the standard set operators where set membership is determined by object identity.

2. Flatten

This operator takes an object that is of type set of sets and returns an object that is of type set.

3. Nest, UnNest

These operators perform nesting and unnesting of tuples on a single attribute. This is best illustrated with an example. The UnNest operator on B applied to the set $\{\langle A : o_1, B : s_1 \rangle\}$, where $s_1$ is a set containing $o_1$ and $o_2$, would produce the set of tuples $\{\langle A : o_1, B : o_1 \rangle, \langle A : o_1, B : o_2 \rangle\}$. The Nest operator is essentially the inverse operation.
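A minimal Python sketch of Flatten, UnNest, and Nest follows. It assumes that tuples can be modelled as dictionaries, that attribute values are hashable and sortable strings, and that lists may stand in for sets of tuples; none of these choices comes from the ENCORE paper.

```python
def flatten(set_of_sets):
    # Merge a collection of sets into a single set.
    return set().union(*set_of_sets)


def unnest(tuples, attr):
    # One output tuple per member of the set stored under attr.
    return [{**t, attr: member} for t in tuples for member in sorted(t[attr])]


def nest(tuples, attr):
    # Group tuples that agree on every other attribute and collect the
    # values of attr into a set.
    groups = {}
    for t in tuples:
        key = tuple(sorted((k, v) for k, v in t.items() if k != attr))
        groups.setdefault(key, set()).add(t[attr])
    return [{**dict(key), attr: members} for key, members in groups.items()]


nested = [{"A": "o1", "B": {"o1", "o2"}}]
flat = unnest(nested, "B")
```

Applying `nest(flat, "B")` rebuilds the original nested tuple, which illustrates the sense in which Nest is the inverse of UnNest.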


Identity Operators

The algebra contains two identity operators:

1. DupEliminate(S, i)

This operator keeps only one copy of i-equal objects from a collection of objects.

2. Coalesce(S, $A_k$, i)

For a collection of tuple objects, this operator eliminates i-equal duplication in the $A_k$ components (attributes) of the tuples.
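The role of the depth parameter i can be illustrated with a short Python sketch. The definition of i-equality used here is an assumption made for the example (0-equal means identical; i-equal means the same attributes with (i-1)-equal values; atomic values compare by value), and the `Obj` class is hypothetical.

```python
class Obj:
    """Hypothetical object with identity; attributes given as keywords."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)


def i_equal(a, b, i):
    # Assumed definition: atomic values compare by value at any depth.
    if not (isinstance(a, Obj) and isinstance(b, Obj)):
        return a == b
    if a is b:
        return True
    if i == 0:
        return False  # depth 0 accepts identical objects only
    if vars(a).keys() != vars(b).keys():
        return False
    return all(i_equal(vars(a)[k], vars(b)[k], i - 1) for k in vars(a))


def dup_eliminate(S, i):
    # Keep the first representative of every group of i-equal objects.
    kept = []
    for s in S:
        if not any(i_equal(s, k, i) for k in kept):
            kept.append(s)
    return kept


a = Obj(name="Ann")
b = Obj(name="Ann")  # distinct identity, same content
```

With these definitions, `dup_eliminate([a, b, a], 0)` keeps both `a` and `b`, while depth 1 collapses them into a single representative.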

3.3 An Algebra for RELOOP

3.3.1 RELOOP

Since the RELOOP query language has already been discussed in this paper, it is not described again here. Instead, the reader is directed to the appropriate sections earlier in this paper.

3.3.2 The O2 Data Model

The O2 data model [CDLR89] consists of a set of classes. Associated with each class is a type expression. A type expression is recursively defined and contains atomic type expressions (integer, real, etc.), class names, tuple type expressions, and set type expressions. Note that a class name can appear in a type expression, and this is how inheritance is achieved.

An object in O2 is a value corresponding to some type expression (class). Associated with each object is a unique object identifier. For each class, there is an assignment function that returns the set of object identifiers for the objects that are members of that class.

O2 also allows for the definition of methods.

3.3.3 Algebra Operators

There are only a small number of operators in the RELOOP algebra, and all of these operate on sets of tuples (relations). As with OOAlgebra (described in a later section), most of these operators are the same as those found in the relational algebra. The operators are presented formally below. In the following, $a_i$ is an attribute name, $v_i$ is a value, and $r$ or $r_i$ is a relation (set of tuples).

1. Set Operators

The algebra supports the standard set operators union, intersection, and difference.

2. Projection

The projection operator in the RELOOP algebra is the same as that in the relational algebra. It removes all but a specified set of attributes from all tuples in a set.


3. Selection

The selection operator is similar to that from relational algebra. Given a boolean predicate and a set of tuples, it returns the set of tuples for which the predicate is true. The predicate expression has as its argument a set of tuple values, and has the form $E_1 \theta E_2$, where $E_1$ and $E_2$ are expressions and $\theta$ is one of the traditional comparators. Note that a predicate cannot use the boolean operators.

4. Cartesian Product

The Cartesian product operator is similar to that from relational algebra, but behaves slightly differently. A list of attribute names must be given, along with a corresponding list of sets of tuples. The result is a set of tuples containing the cross product of the tuples from the given sets, using the given attribute names. Formally this is specified as:

$\otimes\langle a_1, \ldots, a_n \rangle(r_1, \ldots, r_n) = \{ [a_1 : v_1, \ldots, a_n : v_n] \mid \forall i \in [1, n],\ v_i \in r_i \}$

5. Reduction

The reduction operator is used to remove the values from tuples, discarding the attribute names. The result of the reduction operator on a set of tuples is the set of values from the tuples. Formally this is:

$\triangle(r) = \{ v \mid [a : v] \in r \}$
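The Cartesian product and reduction operators can be sketched in Python as follows. The function names are hypothetical, each $r_i$ is treated as a collection of values (following the formal definition), and lists replace sets so that the output order is predictable.

```python
from itertools import product


def cartesian(names, relations):
    # One tuple [a1: v1, ..., an: vn] for every combination with vi in ri.
    return [dict(zip(names, combo)) for combo in product(*relations)]


def reduction(relation):
    # Strip the attribute name from single-attribute tuples.
    values = []
    for t in relation:
        (value,) = t.values()  # raises ValueError unless exactly one attribute
        values.append(value)
    return values


pairs = cartesian(["a", "b"], [[1, 2], ["x"]])
plain = reduction([{"a": 1}, {"a": 2}])
```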

3.3.4 Example

To illustrate the RELOOP algebra an example will now be given. For this example a database with the following properties is used:

• Person = {i1, i2, i3} where i1, i2, i3 are object identifiers.

• age(i1) = 2, age(i2) = 18 and age(i3) = 19

• first-name(name(i1)) = “John”, first-name(name(i2)) = “Joan”, and first-name(name(i3)) = “Jack”.

• last-name(name(i1)) = “Doe”, last-name(name(i2)) = “Doe”, and last-name(name(i3)) = “Smith”.

Figure 3 gives a query to find the first name and age of all of the adults in the database described above. Below the query is the corresponding algebraic representation.

The first expression in this example will produce the set of all possible tuples with the type {[p:Person, name:String, age:Integer]}. The second expression is a selection operation on the given condition and will produce the following tuples: {[p:i2, name:Joan, age:18], [p:i3, name:Jack, age:19]}

The final expression is a projection on name and age and will produce: {[name:Joan, age:18], [name:Jack, age:19]}


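select [name: first-name(name(p)), age: age(p)]
from p in Person
where age(p) >= 18

$R_1 \leftarrow \otimes\langle p, name, age \rangle(\mathrm{dom}(\mathrm{elt}(\mathrm{type}(Person))), \mathrm{dom}(\mathrm{type}(\text{first-name}(name(p)))), \mathrm{dom}(\mathrm{type}(age(p))))$

$R_2 \leftarrow \sigma\langle age \geq 18 \wedge name = \text{first-name}(name(p)) \wedge age = age(p) \wedge p \in Person \rangle(R_1)$

$R_3 \leftarrow \pi\langle name, age \rangle(R_2)$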

Figure 3: RELOOP Query and Algebraic Representation
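The product, selection, and projection steps of this example can be replayed in Python on the sample database. One assumption is needed to make the sketch executable: the unbounded domains of strings and integers are replaced by the values that actually occur in the database.

```python
from itertools import product

person = ["i1", "i2", "i3"]
age = {"i1": 2, "i2": 18, "i3": 19}
first_name = {"i1": "John", "i2": "Joan", "i3": "Jack"}

# Finite stand-ins (an assumption) for dom(String) and dom(Integer).
strings = sorted(set(first_name.values()))
integers = sorted(set(age.values()))

# R1: every combination of a person, a string, and an integer.
r1 = [
    {"p": p, "name": n, "age": a}
    for p, n, a in product(person, strings, integers)
]

# R2: keep tuples that describe an actual adult in the database.
r2 = [
    t for t in r1
    if t["age"] >= 18
    and t["name"] == first_name[t["p"]]
    and t["age"] == age[t["p"]]
    and t["p"] in person
]

# R3: project on name and age.
r3 = [{"name": t["name"], "age": t["age"]} for t in r2]
```

The intermediate relation `r1` holds 27 tuples even for this tiny database, which hints at why such an expression needs rewriting before it can be evaluated efficiently.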

3.4 An Algebra for OODAPLEX

3.4.1 OODAPLEX Data Model

The OODAPLEX data model [Day] is based on types, entities, and functions. An entity is simply an object in the database and each entity has its own unique identity. Each entity is of a particular type. The definition of a type includes the name of the type and the signatures of its functions (described below). A type also contains an extent which gives the set of instances (entities) of the type in the database.

OODAPLEX makes a distinction between types for which instances are not actually manipulated (called immutable types), such as integers, reals, and booleans, and types for which instances must be explicitly manipulated by users (called mutable types). The aggregate types sets, multisets, and tuples are also supported.

Properties of entities are modeled with functions. A function can accept zero or more input arguments, and produce zero or more output arguments, and the type for all arguments must be specified. The signature of a function is a specification of the function name, and its input and output arguments. A function is considered to be attached to all types which are mentioned in the function’s input arguments.

3.4.2 OODAPLEX Queries

A query in OODAPLEX is simply a function that accepts a set of input arguments, and returns a set of output arguments. These functions effectively map one collection of entities to another collection of entities. Note that the set of input and output arguments is not necessarily given explicitly, but may be deduced from the body of the query (function).

The body of an OODAPLEX query is a set of expressions. These expressions contain variables, constants, function names, and predicates. Looping constructs also exist to handle sets, multisets, and tuples.

OODAPLEX also supports database updates by allowing for creation and destruction of entities, and modification of sets, tuples, and functions.

3.4.3 OOAlgebra

The algebra for OODAPLEX is named OOAlgebra. It is based on, and is very similar to, the PDM algebra. OOAlgebra is also rooted in the traditional relational algebra, and has many similar operators.

OOAlgebra operates on sets, multisets, and tuples. Here, sets are treated as sets of tuples, and can also be referred to as relations.

3.4.4 OOAlgebra Operators

The algebra contains two groups of operators, one for tuples and one for sets. These two groups are described below. In addition to these operators, a large number of aggregate operators such as count, sum, and average are included in the algebra.

Tuple Operators

The following tuple operators exist in the algebra:

1. rename(T, L)

Here, T is a tuple, and L is a list of pairs of labels in the form $[Lin_1 : Lout_1, \ldots, Lin_n : Lout_n]$. The rename operator returns a tuple the same as T but with each label $Lin_i$ changed to $Lout_i$.

2. project(T, [$L_1, \ldots, L_n$])

Project returns a tuple containing only the components from T that are labeled with one of the $L_i$’s.

3. select(T, B)

Here, T is a tuple and B is a boolean-valued function. This operator returns T if B(T) is true, or null otherwise.

4. product(T1, T2)

This operator produces a tuple that is the concatenation of T1 and T2.

5. apply_append

This operator is similar to the apply operators in other algebras in that it applies a function over each tuple of input arguments. However, unlike the apply in other algebras, apply_append does not replace the tuples with the function result, but instead appends the output of the function to the input of the operator.
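The tuple operators listed above can be sketched in Python with dictionaries standing in for labeled tuples. This is an illustration under assumptions: `None` models the null result of select, product assumes the two tuples have disjoint labels, and the function passed to `apply_append` is assumed to return its outputs as a labeled tuple.

```python
def rename(T, L):
    # L maps old labels to new labels; other labels are left unchanged.
    return {L.get(label, label): value for label, value in T.items()}


def project(T, labels):
    return {label: value for label, value in T.items() if label in labels}


def select(T, B):
    # Return T when the boolean function holds, otherwise null (None).
    return T if B(T) else None


def product(T1, T2):
    # Concatenation of two tuples (labels assumed disjoint).
    return {**T1, **T2}


def apply_append(T, f):
    # Append the function's output to the input instead of replacing it.
    return {**T, **f(T)}


t = {"P": "p1", "N": "Ann"}
extended = apply_append(t, lambda x: {"A": 30})
```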

Set Operators

In addition to the basic set operators, union, intersection, and difference, the following set operators also exist in the algebra:

1. rename(S, L)

Here, S is a set of tuples. This is the same as the rename for tuples except that the rename is performed on each member of the set.


major(the s in extent(student) where ssno(s) = “123456789”)

T1 := rename(ssno(P:person, N:string)[P:S])
T2 := apply_append(extent(student)(S:student), T1(S:person, N:string))
T3 := select_tuple(T2(S:student, N:string), equals(N, “123456789”))
T4 := apply_append(T3(S:student, N:string), major(S:student, D:department))
T5 := project(T4(S:student, N:string, D:department), [D])

Figure 4: OODAPLEX Query and OOAlgebra Representation

2. project(S, [L1, . . . , Ln])

This is the same as project for tuples, except that the projection is applied to each member of the set.

3. remove_duplicates(S)

This operator removes duplicates from a multiset.

4. select(S, B)

This is similar to the tuple select operator. The result of the operator is the set of tuples from S for which the boolean expression evaluates to true.

5. select_tuple(S, B)

This operator first performs a select on S. If the result is a singleton set, then the single tuple from the set is returned. Otherwise, an error condition is raised.

6. product(S1, S2)

The result of this operator is a set containing each member of S1 concatenated with each member of S2.
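The set-level operators can be sketched in the same style. Lists of dictionaries stand in for multisets of tuples, and raising `ValueError` is an assumed way of signalling the error condition of select_tuple.

```python
def select(S, B):
    # Keep the tuples of S for which the boolean function B is true.
    return [t for t in S if B(t)]


def select_tuple(S, B):
    # Select, then insist on exactly one matching tuple.
    matches = select(S, B)
    if len(matches) != 1:
        raise ValueError(f"expected exactly one tuple, found {len(matches)}")
    return matches[0]


def remove_duplicates(S):
    # Keep the first occurrence of every tuple in the multiset.
    unique = []
    for t in S:
        if t not in unique:
            unique.append(t)
    return unique


def product(S1, S2):
    # Concatenate every member of S1 with every member of S2.
    return [{**t1, **t2} for t1 in S1 for t2 in S2]


people = [{"N": "Ann", "A": 30}, {"N": "Bob", "A": 25}, {"N": "Ann", "A": 30}]
distinct = remove_duplicates(people)
bob = select_tuple(distinct, lambda t: t["N"] == "Bob")
```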

3.4.5 Examples

The OOAlgebra will now be illustrated through two examples.

Figure 4 gives an OODAPLEX query which returns the department entity for the department the student with social security number “123456789” is majoring in. Below the query is the corresponding OOAlgebra representation.
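The steps T1 to T5 of this first example can be traced with a small Python sketch. The sample entities are invented, OODAPLEX functions are modeled as dictionaries keyed by entity, and the helper signatures are assumptions chosen for brevity.

```python
# Hypothetical data: functions modeled as dictionaries keyed by entity.
student_extent = ["s1", "s2"]
ssno = {"s1": "123456789", "s2": "987654321"}
major = {"s1": "computer_science", "s2": "mathematics"}


def apply_append(tuples, fn, in_label, out_label):
    # Append fn's output to each tuple instead of replacing the tuple.
    return [{**t, out_label: fn[t[in_label]]} for t in tuples]


def select_tuple(tuples, pred):
    matches = [t for t in tuples if pred(t)]
    if len(matches) != 1:
        raise ValueError("expected exactly one matching tuple")
    return matches[0]


def project(t, labels):
    return {label: t[label] for label in labels}


# T1 (renaming the parameter P of ssno to S) is folded into in_label.
t2 = apply_append([{"S": s} for s in student_extent], ssno, "S", "N")
t3 = select_tuple(t2, lambda t: t["N"] == "123456789")
t4 = apply_append([t3], major, "S", "D")[0]
t5 = project(t4, ["D"])
```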

Figure 5 gives a second example. The query in this example returns the name and age of all people in the database. The result is returned as a multiset of name, age pairs.


for each p in extent(person)
    [name(p), age(p)]
end

T1 := apply_append(extent(person)(P:person), name(P:person, N:string))
T2 := apply_append(T1(P:person, N:string), age(P:person, A:integer))
T3 := project(T2(P:person, N:string, A:integer), [N, A])

Figure 5: OODAPLEX Query and OOAlgebra Representation

3.5 Conclusions

Each of the four algebras presented here takes a distinct approach to creating an algebra for an object-oriented database. Two of the algebras, RELOOP and OOAlgebra, are based on the original relational algebra. They take the path of extending and modifying the relational algebra for the object-oriented world. Both contain a fairly small number of operators that are closely related to those found in the relational algebra. While the RELOOP algebra operates only on sets of tuples, the OOAlgebra operators operate on both sets and individual tuples.

The other two algebras, EXTRA/EXCESS and ENCORE, have taken the approach of starting fresh rather than simply modifying the relational algebra. Of these two, EXTRA/EXCESS has the more novel approach. It contains a large number of operators for the different “sorts” supported. Object identity is handled by two special operators, reference and dereference. By comparison, ENCORE also has a fairly large number of operators, except that all of them operate on collections of objects. ENCORE also introduces the concept of i-equality to help solve some of the problems associated with object identity.

One key question about any algebra is how many operators are required. If care is not taken in choosing the operators, the algebra may contain a large number of operators that are never used by the query optimizer. Or, it may lack some operators that would make query optimization much easier. The two relational-based algebras presented here have taken the “safe” approach. They make the assumption that the existing relational algebra contains the correct set of operators, and only make changes as required by the new environment. Of the other two algebras, EXTRA/EXCESS has the greater variety in operators. However, the authors claim that because of the multi-sorted nature of the algebra, only certain operators are applicable at certain times. Thus, operator choice is greatly simplified, and all operators will be used. While ENCORE also contains a large number of operators, there is only a small set of basic operators which would be used by the optimizer, and these are in some ways similar to the relational operators. The additional operators have been added only to address certain problems specific to the algebra.


4 Overall Conclusions

The query language used to access a database and the underlying algebra form one of the most important components of a database management system. Extending SQL into a query language for OODBMSs would both make it easier for the many relational database users to move to the more powerful object-oriented paradigm, and would provide a uniform standard across the database field. Although we have examined several possible candidates for this component, much research still needs to be done before this dream can be realized.

References

[AK89] S. Abiteboul and P.C. Kanellakis. Object Identity as a Query Language Primitive. In Proceedings of the ACM SIGMOD Conference, 1989.

[BCD89] F. Bancilhon, S. Cluet, and C. Delobel. A Query Language for the O2 Object-Oriented Database System. In Proceedings of the Second International Workshop on Database Programming Languages, 1989.

[CDLR89] S. Cluet, C. Delobel, C. Lécluse, and P. Richard. RELOOP, an Algebra Based Query Language for an Object-Oriented Database System. In Proceedings of the Second International Workshop on Database Programming Languages, 1989.

[Day] U. Dayal. Queries and Views in an Object-Oriented Data Model. In Database Programming Languages.

[dB90] Jan Van den Bussche. A Formal Basis for Extending SQL to Object-Oriented Databases. In Bulletin of the European Association for Theoretical Computer Science, number 40. February 1990.

[FAC+89] D.H. Fishman, J. Annevelink, E. Chow, T. Connors, J.W. Davis, W. Hasan, C.G. Hoch, W. Kent, S. Leichner, P. Lyngbaek, B. Mahbod, M.A. Neimat, T. Risch, M.C. Shan, and W.K. Wilkinson. Overview of the Iris DBMS. In Object Oriented Concepts, Databases, and Applications, pages 219–250. 1989.

[Fis88] D.H. Fishman. An Overview of the Iris Object-Oriented DBMS. In Proceedings of the IEEE Spring COMPCON, 1988.

[KL89] M. Kifer and G. Lausen. F-Logic: A Higher-Order Language for Reasoning about Objects, Inheritance, and Scheme. In Proceedings of the ACM SIGMOD Conference, 1989.

[SZ] G.M. Shaw and S.B. Zdonik. An Object-Oriented Query Algebra. In Database Programming Languages.

[VD91] S.L. Vandenberg and D.J. DeWitt. Algebraic Support for Complex Objects with Arrays, Identity, and Inheritance. In Proceedings of the ACM SIGMOD Conference, 1991.
