Mapping an extended ER model to a spatial relational model

Areti Dilo

June 22, 2000

Abstract

Many spatial applications need more complex spatial object types than the basic ones (points, lines, and areas). This research identifies the most important and most frequently needed spatial object types. A set of spatial elements is chosen to cover these needs, and the elements are formalised using topology and graph concepts. The formal specifications of the spatial elements are refined into specifications of spatial data types, which can be used in the conceptual modelling of spatial applications not only as data types for the spatial attributes of spatial objects, but also to set constraints on the extension of object classes. Considering an extended Entity-Relationship model (with specialisation, grouping, etc.) and a nested relational data model, a set of rules is defined for translating a conceptual schema into a relational schema.

Acknowledgment

I would like to express gratitude to The Netherlands Fellowship Programme for providing the fellowship for this study.

Special thanks to my main supervisor, Dr. Rolf de By, for being patient with all my delays in the time schedule, and for clarifying the ideas every time I was feeling lost in all kinds of technical details.

Thanks to all the ITC teachers who helped us to increase our knowledge.

Many thanks to Angela, Tal, Nirvana, Fusun, Vahit, Romina and all our classmates for making the time we spent together so pleasant.

Special thanks to Milton, for his willingness to help with comments and answers to all my questions (small and important ones).

My gratitude to Mohamed, for always being disposed to help with everything.

Arta

Contents

1 Introduction
    1.1 General
    1.2 Problem Definition
    1.3 Research Questions
    1.4 Objectives
    1.5 Approach
    1.6 Structure of the Thesis

2 Literature Review
    2.1 Formal methods
        2.1.1 The Z Notation
    2.2 Conceptual models for GIS
    2.3 Data Modelling

3 Spatial Elements
    3.1 Graph notions
    3.2 Spatial Elements
        3.2.1 Zero-dimensional elements
        3.2.2 One-dimensional elements
        3.2.3 Two-dimensional elements
    3.3 Summary

4 Logical Model
    4.1 Graph Design Schemas
    4.2 Spatial Data Types
    4.3 Mapping Rules

5 Conclusions

Appendix A
    A.1 Topology Concepts
    A.2 Z at work

List of Figures

2.1 Sort and Kind hierarchy of a spatial DB

3.1 Graph hierarchy
3.2 Extended Spatial ER Diagram
3.3 Examples of line
3.4 Examples of line collections
3.5 Examples of planar line collections
3.6 One-dimensional elements hierarchy
3.7 The relation between one-dimensional elements and graphs
3.8 Examples of region
3.9 Examples of region collections
3.10 Two-dimensional elements hierarchy

4.1 A (partial) hierarchy of spatial data types

Chapter 1

Introduction

1.1 General

Conceptual modelling offers important advantages over direct logical design approaches. Users can express their knowledge about the application using concepts that are independent of computer terms. The model is also independent of the software tool with which the application will be implemented. Moreover, because it is closer to a general way of thinking than to the specific field of computer science, a conceptual model makes the application easier to understand. These advantages come at a cost: the conceptual schema must be translated into a logical schema, which is the basis for an actual implementation in software.

The relational model is a firmly established data model, implemented in many commercial relational DBMS packages. It is one of the three major data models currently used in commercially available DBMSs, the other two being the hierarchical model and the network model.

The Entity-Relationship (ER) model is probably the most widely used conceptual model in the field of database design. It accommodates a number of well-chosen primitives for defining the information content of the future database, in a format that is intuitively appealing to both specialists and non-specialists, and it allows downward translation through the conceptual, logical and physical database schemas to obtain a representative relational database schema [8].

1.2 Problem Definition

What is missing in the spatial domain is a conceptual model for spatial applications that is accepted as a (kind of) standard. Previous work has built conceptual models for spatial applications [11], [18], [14] and formalised such models [13], [12], [7]. There is still room for a complete formalisation that separates implementation issues from conceptual ones ([7]), and for a more complete list of the spatial elements needed in many spatial applications. Furthermore, there seems to be no work on a formal translation from a conceptual model to a logical representation of spatial data.

Accepting the ER model as a good solution for the conceptual modelling phase of spatial database design, and an extensible DBMS based on the relational model as the spatial database system, the aim of this research is to find a set of rules for mapping an extended ER diagram to the logical schema.

A spatial database system deals with different kinds (approaches) of spatial data: the field-based approach, the object-based approach, fuzzy objects, etc. This research considers the object-based approach to spatial data.

1.3 Research Questions

To help solve the problem raised above, the following questions should be answered:

• What are the spatial elements that cover the spatial applications (considered here)?

• How can they be described formally?

• How can they be included (fitted) in the standard ER model?

• What is a proper implementation of spatial elements?

• What rules can be formulated for translating from the conceptual model to the logical model?

1.4 Objectives

To reach the aim of the research, first a set of spatial elements will be defined that covers the needs of most spatial applications. Then the formal specification of these spatial elements will be given, and the role they play in the ER model will be defined. Accepting a non-first normal form (nested) relational model as the data model of the target system (where the spatial data will be implemented), a concrete specification of the spatial elements will be given, which is the basis for their implementation as (complex) data types in a relational database system. Considering the (data) structure of these spatial elements and some common (basic) relationships between them, a set of rules will be defined for translating an ER diagram into a relational database schema.

1.5 Approach

The work in this thesis proceeds in the following order:

• Defining the spatial elements that can be used to build a conceptual schema for spatial applications.

• Looking at some commercial spatial database systems: what spatial elements do they offer, and do the elements defined here cover them?

• Finding a proper formalisation of the spatial elements and defining their role in the ER model.

• Refining the formal specifications of the spatial elements into more concrete representations, which are closer to (computer) implementation.

• Defining rules for mapping from the extended ER model to the logical model.

1.6 Structure of the Thesis

The thesis consists of five chapters. This chapter has introduced the aim of the research, its objectives and the approach followed. Chapter 2 gives a short introduction to Z, the formal language used for the specifications in the following chapters, and then summarises some of the papers dealing with spatial data modelling and data modelling in general. Chapter 3 is dedicated to the formalisation of spatial elements. Chapter 4 refines the specifications given in Chapter 3 into implementation schemas and indicates how they can be used in modelling spatial applications. Chapter 5 closes the thesis with some conclusions.

Chapter 2

Literature Review

This chapter consists of three parts. Section 2.1 discusses the importance of formal methods in software engineering, and Section 2.1.1 describes the Z notation, a formal language used for formal specifications. Section 2.2 gives some ideas on what should be done to apply formal methods in GIS and describes work done in conceptual modelling of spatial applications. Section 2.3 summarises the ideas given in some papers on data modelling.

2.1 Formal methods

For many years, the software industry has been producing software with much of the effort spent on programming. Still, the vast majority of computer code is handcrafted from raw programming languages by artisans using techniques they neither measure nor are able to repeat consistently [9]. This brings two problems: errors in the written code, which can only be detected by testing the final software, and the overhead of producing (probably different) code for the solution of similar problems.

Testing, the remedy for the first problem, can only show that there are bugs in the program (if the testing is good enough to find some of them); it can never prove that there are none. Putting the program design in algebraic form makes it possible to avoid serious mistakes, because the correctness of the design can then be checked (proved).

The second problem, duplication of written code, can be solved by first structuring the problem into small parts, writing code for each part, and then using technology that supports interchangeability of software parts to put together existing and newly written code.

Formal methods help to solve or reduce these problems. They help in writing robust computer programs because they provide

• techniques for structuring the problem (that the program is supposed to solve), and

• an output that is concise, precise and unambiguous.

They consist of:

• Formal specifications, which tell what the system (the solution of the problem) should do.

• Verified design, which tells how the system is going to do the job.

The methodology underlying formal methods is that one first specifies the behaviour of a piece of software, then the software is written, and then one proves whether or not the actual implementation meets its (formal) specification. This final aspect of formal methods is known as verified design. The term applies to the relation between a formal specification and the software component that is written to meet that specification. Clearly, we want the software component to satisfy its specification; proof techniques have been developed to enable someone to prove that a software component meets its specification [10].

To write formal specifications we use formal languages. The branches of mathematics most used in formal languages are set theory and first-order logic. The formal language used here is the Z notation, the ingredients of which are:

• some basic mathematical types (e.g., set, relation, etc.) and some operators defined on them. Other types can be defined using the basic types. The language provides tools for this.

• some more complex structures that are capable of carrying semantics (e.g., schemas) or defining rules or properties (e.g., axioms, general definitions).

• some mathematical laws, rules and proof methods (e.g., mathematical induction) that make it possible to reason effectively about the way a specified system will behave.

What is the output of formal methods when a system is described using the Z notation? These methods define:

• states in which the system can be, and

• operations on states.

Both can be described by Z schemas. The description of the system is given at an abstract level by abstract schemas and at a concrete level by concrete schemas, which are closer to computer concepts. Refinement is needed to show the correspondence between the two levels, abstract schemas and concrete ones. It is a process orthogonal to the system description levels (abstract, concrete, and any intermediate levels that may be needed) and has two parts: data refinement and operation refinement.

Formal specifications sit at a middle stage of the software life cycle. They come after the requirements document is written in a natural language (like English or Spanish), and before the system is implemented in a programming language. They are not a replacement for the natural language description of the system, only a complement to it. Because they are concise and unambiguous, they can serve as a reliable reference point for those who write the customer requirements, those who implement the program, those who test the results, and those who write the reference manuals of the system [9].

The next section first describes a standard collection of mathematical symbols, then the features of the Z language, and finally data and function refinement. An example, the implementation of a Birthday Book, is used to illustrate the use of Z. Some definitions needed for it are given in this section, and a more complete solution is given in Appendix A.2.¹ The example shows how schema calculus can be used to modularise a specification and how data refinement is used to relate specifications and designs.

2.1.1 The Z Notation

One of the key ideas of Z is that the specification and implementation should be kept separate. The specification should precisely state what the eventual piece of software should do and not how it is to go about achieving its task. Separating specifications and implementations results in a separation of the often conflicting tasks of correctly solving the problem in hand and of building an efficient piece of software. Something that is important to realise is that writing a formal specification is quite different from writing a computer program [10].

A Z specification is a combination of mathematical language statements that strictly define or prove system properties, and natural language statements that describe what is said or done by the mathematics used.

Basic Notions

Some of the mathematical concepts used in the Z language are sets (special sets offered by Z are the naturals, denoted by ℕ, and the integers, denoted by ℤ), bags, sequences, relations, functions, etc. Built upon those are other concepts such as the Cartesian product (X × Y of X and Y); the power set (denoted by ℙ X, the set of all subsets of X); finiteness, applicable to sets (𝔽 X denotes the set of finite subsets of X), together with the # operator for the number of elements in a finite set; and nonemptiness, applicable to sets, bags, sequences (e.g., seq₁ X is the set of nonempty finite sequences over X), functions, etc. Definitions, properties and operations for some of these are given below (taken from [9]).

Relations: A relation R from (the set) X to (the set) Y is a set of pairs (x, y) where x ∈ X and y ∈ Y, thus R ⊆ X × Y. The element (x, y) ∈ R can also be written as x ↦ y ∈ R. X ↔ Y denotes the set of all relations between X and Y.

¹ The example and its solution are taken from [22].

The domain of a relation R is the subset of X in which each element is related (by R) to at least one element of Y. Its definition in Z is

dom R = {x ∈ X; y ∈ Y | x ↦ y ∈ R • x}

The range of a relation R is the subset of Y in which each element is related to at least one element of X. Its definition in Z is

ran R = {x ∈ X; y ∈ Y | x ↦ y ∈ R • y}

Domain restriction (over A ⊆ X) is again a relation: all the pairs of R whose first members are elements of A. Its definition in Z is

A ◁ R = {x ∈ X; y ∈ Y | x ↦ y ∈ R ∧ x ∈ A • x ↦ y}

Range restriction (over B ⊆ Y) is a relation that has all the pairs of R whose second members are elements of B. The definition in Z is

R ▷ B = {x ∈ X; y ∈ Y | x ↦ y ∈ R ∧ y ∈ B • x ↦ y}

Domain co-restriction (over A) is a relation that has all the pairs of R whose first members are not elements of A. (Range co-restriction is defined in an analogous way.)

A ⩤ R = {x ∈ X; y ∈ Y | x ↦ y ∈ R ∧ x ∉ A • x ↦ y}

The inverse of R is the relation R∼ : Y ↔ X, defined from R by

R∼ = {x ∈ X; y ∈ Y | x ↦ y ∈ R • y ↦ x}

The identity relation is id X = {x : X • x ↦ x}

The relational composition of R : X ↔ Y and S : Y ↔ Z is

R ⨾ S = {x ∈ X; y ∈ Y; z ∈ Z | x ↦ y ∈ R ∧ y ↦ z ∈ S • x ↦ z}

Backward composition: with R and S as above, S ∘ R = R ⨾ S.

Overriding: If R, S : X ↔ Y then R ⊕ S is a relation that agrees with R everywhere outside the domain of S, but agrees with S where S is defined:

R ⊕ S = (dom S ⩤ R) ∪ S

Relational image: If R : X ↔ Y and A ⊂ X, then R⦇A⦈ = ran(A ◁ R) is the relational image of A through R.
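The relation operators defined above can be tried out on small finite sets. The following is a minimal sketch, assuming a relation is modelled as a Python set of pairs; the function names (dres, rres, ndres, compose, override, image) are chosen for this illustration only and are not part of Z or of the thesis.

```python
# Illustrative sketch: a relation is a Python set of (x, y) pairs.
# All function names are assumptions made for this example.

def dom(r):
    """Domain: the first members of all pairs."""
    return {x for x, _ in r}

def ran(r):
    """Range: the second members of all pairs."""
    return {y for _, y in r}

def dres(a, r):
    """Domain restriction: pairs whose first member is in a."""
    return {(x, y) for x, y in r if x in a}

def rres(r, b):
    """Range restriction: pairs whose second member is in b."""
    return {(x, y) for x, y in r if y in b}

def ndres(a, r):
    """Domain co-restriction: pairs whose first member is not in a."""
    return {(x, y) for x, y in r if x not in a}

def inverse(r):
    """Inverse: every pair flipped."""
    return {(y, x) for x, y in r}

def compose(r, s):
    """Forward composition: x is related to z through a shared y."""
    return {(x, z) for x, y in r for y2, z in s if y == y2}

def override(r, s):
    """Overriding: s wins on its own domain, r is kept elsewhere."""
    return ndres(dom(s), r) | s

def image(r, a):
    """Relational image of the set a through r."""
    return ran(dres(a, r))

R = {(1, "a"), (2, "b"), (2, "c")}
S = {(2, "z"), (3, "d")}
T = {("a", 10), ("c", 30)}

assert dom(R) == {1, 2}
assert image(R, {2}) == {"b", "c"}
assert override(R, S) == {(1, "a"), (2, "z"), (3, "d")}
assert compose(R, T) == {(1, 10), (2, 30)}
```

Each function mirrors one set comprehension from the definitions, so the Python comprehension and the Z set expression can be read side by side.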

Functions: A partial function from X, called the source, to Y, the target, is a relation that maps each element of X to at most one element of Y. The set of all partial functions from X to Y is²

X ⇸ Y == {f : X ↔ Y | ∀ x ∈ X; y₁, y₂ ∈ Y • x ↦ y₁ ∈ f ∧ x ↦ y₂ ∈ f ⇒ y₁ = y₂}

² See Abbreviations in the next subsection.

A total function is a partial function in which each element of the source is mapped to exactly one element of the target. The set of all total functions from X to Y is

X → Y == {f : X ⇸ Y | dom f = X}

If x ↦ y ∈ f then y is denoted by f(x). All operations defined on relations are also valid on functions (as they are relations with some specific properties). Some properties of functions are:

Injections: If each element of the domain is mapped to a different element of the target, the function is said to be injective. Partial injective functions have the symbol ⤔, and total injective functions have the symbol ↣.

Surjections: If the range of the function is the whole target, the function is said to be surjective. Partial surjections have the symbol ⤀, total surjections have the symbol ↠.

Bijections: A function that is both surjective and injective is called bijective. The symbol for bijection is ⤖.

Finite functions are defined as

X ⇻ Y == {f : X ⇸ Y | dom f ∈ 𝔽 X}
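The function properties above can be checked mechanically on finite examples. This is a small sketch under the same assumption as before (a function is a set of pairs); the predicate names are hypothetical and chosen only for this illustration.

```python
# Illustrative sketch: a function is a set of (x, y) pairs.
# Predicate names are assumptions made for this example.

def is_partial_function(f):
    """True if no source element is paired with two different targets."""
    seen = {}
    for x, y in f:
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True

def is_total(f, source):
    """True if f is a function whose domain is the whole source."""
    return is_partial_function(f) and {x for x, _ in f} == set(source)

def is_injective(f):
    """True if f is a function and no target value is hit twice."""
    targets = [y for _, y in f]
    return is_partial_function(f) and len(targets) == len(set(targets))

def is_surjective(f, target):
    """True if f is a function whose range is the whole target."""
    return is_partial_function(f) and {y for _, y in f} == set(target)

f = {("ann", 1), ("bob", 2)}
g = {("ann", 1), ("bob", 1)}
h = {(1, "a"), (1, "b")}      # 1 is mapped to two targets

assert is_partial_function(f) and not is_partial_function(h)
assert is_total(f, {"ann", "bob"})
assert not is_total(f, {"ann", "bob", "cy"})
assert is_injective(f) and not is_injective(g)
assert is_surjective(f, {1, 2}) and not is_surjective(f, {1, 2, 3})
```

Note that totality and surjectivity depend on the declared source and target sets, which is why those two predicates take an extra argument while injectivity does not.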

Sequences: A sequence is an ordered collection of objects (it can be empty), e.g., ⟨a, c, f, d, c⟩ is a sequence. If X is a set, then the set of all finite sequences of objects from X is defined by

seq X == {s : ℕ ⇻ X | ∃ n ∈ ℕ • dom s = 1 .. n}

where 1 .. n = {i ∈ ℕ | 1 ≤ i ≤ n}. Some operations on sequences are:

Concatenation: If s, t ∈ seq X, then s ⁀ t ∈ seq X denotes the concatenation of s and t, i.e., it adds t at the end of s.

Head and Tail: If s ∈ seq₁ X, then head s is the first element of s, and tail s is the remaining part.

Length: If s ∈ seq X, then #s denotes the length of s, that is, the largest integer for which the associated partial function is defined.
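Because a sequence is defined as a finite function whose domain is 1 .. n, it can be modelled directly as a mapping from indices to elements. The sketch below assumes a Python dict as that mapping; make_seq, is_seq, concat, head and tail are hypothetical helper names for this illustration.

```python
# Illustrative sketch: a sequence is a dict from 1..n to elements.
# Helper names are assumptions made for this example.

def make_seq(*items):
    """Build the finite function {1: x1, 2: x2, ...}."""
    return {i: x for i, x in enumerate(items, start=1)}

def is_seq(s):
    """A finite function is a sequence iff its domain is exactly 1..n."""
    return set(s) == set(range(1, len(s) + 1))

def concat(s, t):
    """Concatenation: shift the indices of t by the length of s."""
    n = len(s)
    return {**s, **{n + i: x for i, x in t.items()}}

def head(s):
    """First element of a nonempty sequence."""
    if not s:
        raise ValueError("head is only defined on nonempty sequences")
    return s[1]

def tail(s):
    """Everything after the first element, re-indexed from 1."""
    if not s:
        raise ValueError("tail is only defined on nonempty sequences")
    return {i - 1: x for i, x in s.items() if i > 1}

s = make_seq("a", "c", "f")
t = make_seq("d", "c")

assert concat(s, t) == make_seq("a", "c", "f", "d", "c")
assert head(s) == "a"
assert tail(s) == make_seq("c", "f")
assert len(concat(s, t)) == len(s) + len(t)
assert is_seq({}) and not is_seq({1: "a", 3: "b"})
```

The length of the sequence is simply the size of the dict, which matches the reading of #s as the largest index for which the function is defined.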

Bags: A bag is an unordered collection of objects in which the multiplicities are important, e.g., ⟦a, c, f, d, c⟧ is a bag, which is the same as ⟦a, c, c, d, f⟧. If X is a set, then the set of all bags with elements from X is defined as

bag X == X ⇸ ℕ₁   (ℕ₁ = ℕ \ {0})

The number of times an element x : X appears in a bag B : bag X is count B x (B ♯ x is another notation). The fact that an element x : X is in the bag B is denoted by x ⋿ B, and the following equivalences are true:

∀ x : X; B : bag X • x ⋿ B ⇔ x ∈ dom B and x ⋿ B ⇔ B ♯ x > 0
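A bag, being a partial function from elements to positive counts, corresponds closely to Python's collections.Counter. The sketch below is only an illustration of the definitions above; count and in_bag are hypothetical helper names.

```python
from collections import Counter

# Illustrative sketch: a bag is a Counter mapping elements to positive counts.
# Helper names are assumptions made for this example.

def count(bag, x):
    """Number of occurrences of x in the bag (0 if absent)."""
    return bag[x]

def in_bag(x, bag):
    """Bag membership: x belongs to the domain of the bag."""
    return x in bag

b1 = Counter(["a", "c", "f", "d", "c"])
b2 = Counter(["a", "c", "c", "d", "f"])

# Order does not matter, multiplicities do.
assert b1 == b2
assert b1 != Counter(["a", "c", "f", "d"])
assert count(b1, "c") == 2
assert count(b1, "z") == 0

# Membership coincides with a positive count, as in the equivalences above.
for x in ["a", "c", "d", "f", "z"]:
    assert in_bag(x, b1) == (count(b1, x) > 0)
```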

Definitions

To illustrate the definitions whose syntax is given here, we use examples from the Birthday Book. (The Birthday Book is a system that records people’s birthdays, allowing us to add new people and their birthdays, and to ask for the birthdays of people who are already in the book.) A definition can be one of the following:

A declaration that introduces a new type or a new variable of an existing or previously declared type. We will need a type NAME for names of persons, and a type DATE for their birthdays in our Birthday Book. To declare them we write:

[NAME, DATE]

To declare a variable that holds the names of persons with birthdays recorded (thus known is a set of names), we state

known : ℙ NAME

An abbreviation introduces a new name for some expression. It is of the form

symbol == expression

e.g.,

Month == {Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec}

or of a generic form, which introduces a family of symbols:

symbol parameter == expression

e.g., the empty set ∅, which is a subset of every set. For a set X it is defined as

∅[X] == {x : X | false}

Because the predicate part on the right side is always false, no element x : X can satisfy it.

An axiom states a property that always holds. It is of the form:

│ x : X
├──────────
│ P(x)

or of a generic form, based on a type (or some types), e.g., the definition of projection functions for ordered pairs:

╒═ [X, Y] ══════════════
│ first : X × Y → X
│ second : X × Y → Y
├───────────────────────
│ ∀ x : X; y : Y • first(x, y) = x ∧ second(x, y) = y
└───────────────────────

A free type: In our Birthday Book example, we will look up the birthday of a person, and it may be that the person is not yet in our book. For this special case (and probably others) it is good to have a variable result that tells us what the situation is. The type of this variable can be a free type, REPORT, for which the declaration is:

REPORT ::= ok | already_known | not_known

If we then declare the variable result : REPORT, it can assume just three values: ok, already_known, not_known.

The notation for free type definitions adds nothing to the power of the Z language, but it makes it easier to describe recursive structures such as lists and trees [22].
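For readers more familiar with programming languages, a non-recursive free type such as REPORT behaves much like an enumeration. The following sketch assumes Python's enum module; the find_birthday function and the use of a dict for the book are hypothetical and only serve to show where a report value would be returned.

```python
from enum import Enum

# Illustrative sketch: the free type REPORT as an enumeration.
class Report(Enum):
    OK = "ok"
    ALREADY_KNOWN = "already_known"
    NOT_KNOWN = "not_known"

def find_birthday(birthday, name):
    """Hypothetical lookup: return (date or None, report) for a dict-based book."""
    if name in birthday:
        return birthday[name], Report.OK
    return None, Report.NOT_KNOWN

book = {"ann": "25-Mar"}

assert len(Report) == 3
assert find_birthday(book, "ann") == ("25-Mar", Report.OK)
assert find_birthday(book, "bob") == (None, Report.NOT_KNOWN)
```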

Schemas, whose definition has a name, a declaration part, and a predicate part. A schema is used to describe state information, operations on states, and initialisation. To describe the state space of our system, the Birthday Book, we write the schema:

┌─ BirthdayBook ─────────────
│ known : ℙ NAME
│ birthday : NAME ⇸ DATE
├────────────────────────────
│ known = dom birthday
└────────────────────────────

Here birthday is a function which, when applied to a certain name (that is, an element of the set known), gives the birthday associated with that name.
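The state schema can be mirrored in code as a record whose fields are the declared variables and whose predicate is a checkable invariant. This is a minimal sketch, assuming names and dates are plain strings and the partial function is a dict; the class layout is an illustration, not the implementation developed in the thesis.

```python
from dataclasses import dataclass, field

# Illustrative sketch of the BirthdayBook state.
# NAME and DATE are assumed to be plain strings here.
@dataclass
class BirthdayBook:
    known: set = field(default_factory=set)        # set of names
    birthday: dict = field(default_factory=dict)   # partial function name -> date

    def invariant(self):
        """The schema predicate: known is exactly the domain of birthday."""
        return self.known == set(self.birthday)

empty = BirthdayBook()
good = BirthdayBook(known={"ann"}, birthday={"ann": "25-Mar"})
bad = BirthdayBook(known={"ann", "bob"}, birthday={"ann": "25-Mar"})

assert empty.invariant()
assert good.invariant()
assert not bad.invariant()   # bob is known but has no recorded birthday
```

In Z, only bindings that satisfy the predicate belong to the schema at all; the sketch makes the same condition explicit as a method so that invalid states such as bad can be detected.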

A schema can also be written in horizontal form:

SchemaName = [declaration | predicate]

A schema can also be defined in terms of its signature and its property. The signature introduces the schema variables together with their types. A declaration is a signature together with implicit predicates. The property constrains the variables and describes the relationship between them. The property includes the explicit predicate ‘below the line’, and any implicit predicates concealed within the types used in the declaration. The schema property is also called the schema invariant. The relationship known = dom birthday in the BirthdayBook schema is an invariant of the system.

A schema may be used wherever a declaration is expected. It may be used as a predicate, and it may also be used as a type, in the same way as mathematical types. A schema S represents a set, the set of all its bindings. A binding is an assignment of values to a schema’s components such that they obey its predicate. If a variable is declared with a schema type s : S, then the variable’s value is one of these bindings. This is exactly the same as saying that if a variable is declared x : ℕ, then x has the value of one of the members of ℕ. A schema binding is denoted by θS, S being the schema name [3].

Schemas may be composed, using schema calculus (schema operators), to form specifications of new states and operations. This has two benefits:

• the operations can be broken into small parts that are more easily understood,

• extensive reuse of schemas becomes possible.

Schema operators are decoration (whose symbol is ′), conjunction (∧), disjunction (∨), negation (¬), quantification (∀ for all, ∃ exists, ∃₁ exists exactly one), hiding (\), composition (⨾), and precondition (the symbol is ‘pre’).

We use decoration to indicate the after variables of an operation, where the before variables are undecorated [3]; e.g., known′ is the state of known after adding another person to the Birthday Book (which results in adding another name to the set known). Using decoration we can define some common schemas, the Delta (∆) schema and the Xi (Ξ) schema, which we will use in our example later.

∆BirthdayBook = BirthdayBook ∧ BirthdayBook′

The ∆ is part of the name of the schema and is used to indicate change of state [3].

┌─ ΞBirthdayBook ─────────────────
│ ∆BirthdayBook
├─────────────────────────────────
│ θBirthdayBook = θBirthdayBook′
└─────────────────────────────────

The Ξ is part of the schema name and is used to indicate no change of state [3], i.e., the values of BirthdayBook’s components are the same before and after the operation. Examples of conjunction, disjunction, and quantification will be given later.

To show how hiding works, let us take the schema S

S = [x : X; y : Y | P]

S \ (x) (read S hiding x) is the schema:

S \ (x) = [y : Y | ∃ x : X • P]

The operation pre is applied to operation schemas. The precondition of an operation is a schema that characterises the collection of ‘before’ states for which some ‘after’ states can be shown to exist [9]. The precondition of an operation can be calculated. The purpose of the precondition calculation is to check that the operation is valid. There must be at least one before state in which the operation is applicable. If there is an inconsistency in the definition, the precondition is false, and hence there are no appropriate before states. So, in a precondition calculation the aim is to determine what must be true of the before state and the operation inputs to achieve a satisfactory outcome. This is done by hiding the outputs and after variables [3].

pre Operation = ∃ State′ • Operation \ outputs
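On a finite state space the precondition can be computed by brute force: a before state belongs to the precondition exactly when some after state and output satisfy the operation. The sketch below uses a toy bounded counter that is not taken from the thesis; STATES, OUTPUTS, inc and broken are all assumptions made for the illustration.

```python
from itertools import product

# Illustrative sketch on a toy system (not from the thesis):
# the state is a counter x constrained to 0..3.
STATES = range(0, 4)
OUTPUTS = ("ok",)

def inc(x, x_after, report):
    """Operation as a predicate over before state, after state and output."""
    return x_after == x + 1 and report == "ok"

def broken(x, x_after, report):
    """An inconsistent operation: no after state can satisfy both conjuncts."""
    return x_after == x + 1 and x_after == x

def pre(op):
    """Before states for which some after state and output satisfy op."""
    return {
        x
        for x in STATES
        if any(op(x, xa, out) for xa, out in product(STATES, OUTPUTS))
    }

# Incrementing is impossible at the upper bound, so 3 is excluded.
assert pre(inc) == {0, 1, 2}
# An inconsistent definition has an empty (false) precondition.
assert pre(broken) == set()
```

Quantifying over the after state and the output inside any(...) plays the role of hiding them; what remains is a condition on the before state alone.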

Refinement

The meaning of refinement is ‘to make more concrete’. Refinement is the process of turning a more abstract specification into a more concrete one [3]. Typically refinement is performed in a number of steps, in which we move gradually towards the concrete design of the system, proving in each step that the lower level (the more concrete one) is a correct representation of the higher level (the more abstract one) from which it is derived. What we get at the end of the refinement process is a system design that is closer to the level of the programming language. Efficiency issues, such as the space/time trade-off, should be taken into consideration in the system design.

Refinement has two parts, data refinement and function (operation) refinement.

Data Refinement [3]: In data refinement, the abstract data type in the abstract specification is related to the concrete data type in the concrete specification (design). This is done by means of a retrieve schema that shows the relationship between the abstract and the concrete state items in logical terms. This relationship allows us to retrieve the abstract state from the concrete one.

Function Refinement [3]: In moving toward a more concrete description of the system, the operations change from describing the ‘what’ of the operation to the ‘how’. To be able to demonstrate the refinement, we have to show the following:

• Correct initial concrete state. Each possible initial concrete state (subscript c) must represent a possible initial abstract state (subscript a). The concrete version should not allow starting points that the abstract specification forbids.

∀ State′_c • Init_c ⇒ ∃ State′_a • Init_a ∧ Retrieve′

• Correct operation refinement. Whenever the abstract operation terminates, so should the concrete operation. In other words, if we are in a state in which the abstract operation is guaranteed to terminate and we apply the retrieve relation, we will be in a state in which the concrete operation is guaranteed to terminate.

∀ State_a; State_c • pre Op_a ∧ Retrieve ⇒ pre Op_c

• Correct concrete operation. If the abstract operation terminates, then so should the concrete one, and the state in which the concrete operation terminates should represent a possible abstract state in which the abstract operation could terminate. In other words, we can either start in the precondition of the abstract operation, perform the retrieve operation to reach the concrete state and then perform the concrete operation, or we can first perform the abstract operation and then apply the retrieve relation.

∀ State_a; State_c; State′_c • pre Op_a ∧ Retrieve ∧ Op_c ⇒ ∃ State′_a • Op_a ∧ Retrieve′
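The three proof obligations can be checked exhaustively when the state spaces are small and finite. The sketch below is an illustration under stated assumptions, not the refinement carried out in the thesis: the abstract state is a set of names, the concrete state is a duplicate-free tuple of names, and a hypothetical Add operation inserts a name that is not yet present.

```python
from itertools import combinations, permutations

# Illustrative sketch on a toy model (all names are assumptions).
NAMES = ("ann", "bob")

# Abstract states: sets of names. Concrete states: duplicate-free tuples.
A_STATES = [frozenset(c) for r in range(len(NAMES) + 1)
            for c in combinations(NAMES, r)]
C_STATES = [p for r in range(len(NAMES) + 1)
            for p in permutations(NAMES, r)]

def retrieve(a, c):
    """The concrete tuple represents the abstract set of its elements."""
    return a == frozenset(c)

def init_a(a):
    return a == frozenset()

def init_c(c):
    return c == ()

def op_a(a, n, a2):
    """Abstract Add: n is new, and the after state is a with n included."""
    return n not in a and a2 == a | {n}

def op_c(c, n, c2):
    """Concrete Add: n is new, and it is appended at the end."""
    return n not in c and c2 == c + (n,)

def pre(op, states, s, n):
    """Precondition: some after state satisfies the operation."""
    return any(op(s, n, s2) for s2 in states)

def initial_ok():
    return all(any(init_a(a) and retrieve(a, c) for a in A_STATES)
               for c in C_STATES if init_c(c))

def applicability_ok():
    return all(pre(op_c, C_STATES, c, n)
               for a in A_STATES for c in C_STATES for n in NAMES
               if pre(op_a, A_STATES, a, n) and retrieve(a, c))

def correctness_ok():
    return all(any(op_a(a, n, a2) and retrieve(a2, c2) for a2 in A_STATES)
               for a in A_STATES for c in C_STATES
               for c2 in C_STATES for n in NAMES
               if pre(op_a, A_STATES, a, n) and retrieve(a, c)
               and op_c(c, n, c2))

assert initial_ok() and applicability_ok() and correctness_ok()
```

Each of the three functions follows the shape of one obligation: the outer all(...) with its if filter is the universally quantified implication, and the inner any(...) is the existential over abstract after states. Exhaustive checking of this kind is only feasible for toy models; for real specifications the obligations are discharged by proof.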

2.2 Conceptual models for GIS

What should be done for the application of formal methods to GIS are: Choice of an on-tology, Choice of paradigm of formalisation, and Choice of formal languages and reasoningtechniques [4].Finite and discrete representations of infinite and continuous domains are achieved bymeans of abstraction and discretisation. One way of representing infinite and continuousdomains of individuals is to represent classes of individuals rather than individuals. Indi-viduals in the same class are considered to be equivalent. Equivalence classes of individualspartition the domain of individuals. The difficult problem is to define the classes of equiv-alence in such a way that the structural properties of the original domain of individuals arepreserved. This can be achieved if structures that govern the domain of individuals areused to define the classes of equivalent individuals [4].Such kind of structures can be achieved by :The geometric paradigm of formalisation which define the equivalence based on proper-ties and relations that remain invariant under particular classes of transformations; Theanalytic paradigm of formalisation defines the equivalence based on relation of order withrespect to a frame of reference; The qualitative paradigm of formalisation is based on land-mark values (individuals which represent significant changes) and order relations betweenthem which qualitatively structure the domain. Often more than one paradigm of for-malisation is used depending on the structural properties of the class of intended model[4].Until the beginning of the last century, Euclidean geometry was the only form of geometry.Then hyperbolic, or Lobachevskian geometry, and elliptic, or Riemannian geometry wereconstructed. As more than one geometric theory was designed, the problem became thedetermination of what made a theory of space a geometry. What is the essence of geome-try? 
Geometry abstracts from particular location and considers properties of geometric figures that are independent of particular location, i.e., invariant under a certain group of transformations. Geometric figures are equivalence classes of geometric individuals or configurations of geometric individuals that can be made to coincide by transformations belonging to such a group (e.g., triangles, configurations of triangles, polygons, configurations of polygons). Transformations are operations on geometric individuals that change certain properties and leave others invariant [5]. For example, Euclidean geometry deals with properties such as the length of a line, the sizes of angles, etc., all of which remain unchanged under the transformations of rotations and translations. This definition of geometry also includes areas of mathematics such as graph theory and topology, which have a geometric component. Typically each group of transformations defines a set of properties that remain invariant and thus creates a geometry that can be formally defined and studied [16].

Location is a relation between spatial individuals and a frame of reference. Location within a frame of reference allows abstracting from individuals using equivalence classes with respect to identity of location. Individuals that have the same location in a particular frame of reference form an equivalence class. An important issue is to evaluate which


geometric properties of the domain of individuals are preserved in the domain of equivalence classes of individuals with respect to location [5].

Geometry can be represented in terms of locations in ℝⁿ. Analytical geometry is a mathematical theory of geometric properties of sets of points based on finite representations of their locations in a frame of reference given by a system of n axes of directed real lines and the corresponding coordinate space ℝⁿ [5].

In [15], spatial objects are represented by inequalities, and the paper discusses the advantages and disadvantages of a linear constraint representation compared to a pointer-based vector representation in terms of storage and processing time of operations.

In [14], [20], [21], [23], [24] space is modelled as a subset of ℝ² and the objects defined on that space are zero-, one- and two-dimensional. They consider different levels of granularity (different scales) for the objects in a single conceptual schema element, i.e., a land parcel can be seen as a point or a region, depending on the current scale of the application. So, a land parcel object can be a zero- or a two-dimensional object.

The position of a spatial object in space is made up of four components that define it fully and non-redundantly: shape, size, orientation and centroid [20].

Spatial objects have descriptive attributes and spatial attributes. Spatial attributes are properties of the embedding space that indirectly become properties of the spatial objects via their position in space, i.e., the spatial objects inherit them from space. The spatial attributes of objects may be captured independently of the objects using so-called fields (also called layers).
Layers are of one of two types: those that are continuous functions, e.g., “temperature” or “erosion”, and those that are discrete functions, e.g., “county division” represented as regions [24].

It is frequently necessary to capture the position of spatial objects in the database. The first step to support this is to provide means for representing the space in which the objects are embedded. The next is to provide means for indicating that the objects’ position in this space is to be captured. For this purpose the following special entity and relationship sets are introduced.

• The special entity sets SPACE, GEOMETRY, POINT, LINE, REGION. Entity set GEOMETRY captures the geometrical position of the entity set and can be POINT, LINE, REGION, or any other geometric type (or geometry).

• The special relationship set “is located at” that associates a spatial entity set with its geometry. The cardinality of this set is 1:M because a spatial entity may have more than one geometry when multiple granularities are employed. The relationship set “belongs to” between GEOMETRY and SPACE with cardinality M:1 is also included.
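The two relationship sets in the list above can be illustrated with a minimal Python sketch. It is not taken from [24]: the class names mirror the entity sets, while the `granularity` field and the helper `geometry_at` are hypothetical additions used only to show why "is located at" is 1:M.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Space:
    name: str


@dataclass(frozen=True)
class Geometry:
    kind: str         # "POINT", "LINE", "REGION", or another geometric type
    space: Space      # "belongs to": every geometry lies in exactly one space (M:1)
    granularity: str  # hypothetical attribute: the scale this geometry is used at


@dataclass
class SpatialEntity:
    name: str
    # "is located at": one entity, possibly many geometries (1:M)
    geometries: list = field(default_factory=list)

    def locate_at(self, geometry: Geometry) -> None:
        self.geometries.append(geometry)

    def geometry_at(self, granularity: str) -> Geometry:
        """Hypothetical helper: pick the geometry used at a given scale."""
        return next(g for g in self.geometries if g.granularity == granularity)


plane = Space("plane")
parcel = SpatialEntity("land parcel")
parcel.locate_at(Geometry("POINT", plane, "small scale"))
parcel.locate_at(Geometry("REGION", plane, "large scale"))
```

Here the same land parcel is a point at one scale and a region at another, and both geometries belong to the same space.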

The spatial attributes of entities are calculated via the relationship “belongs to” [24].

MADS, as described in [18] and [19], is a conceptual model for spatial data. MADS offers a set of spatial abstract types, organised in a generalisation hierarchy. This hierarchy can be changed according to the needs of the application, by creating new subtypes, or grouping


some existing types into a new supertype. Every spatial type has associated methods for defining and manipulating instances of that type. The most general type is Geo; its subtypes are Simple Geo and Complex Geo. Subtypes of Simple Geo are Point, Line, Simple Area. The type Line is a subtype of Oriented Line type. Subtypes of Complex Geo are Point Set, Line Set, and Complex Area. Line Set is a subtype of Oriented Line Set. If an object type has type Geo, the precise type of every instance will be defined at the moment of its creation.

The spatiality of an object is described by a predefined attribute, geometry, that is a grouping of shape (e.g., Point, Line, Area, or Simple Geo) and location, which can be given in absolute coordinates or relative to other known locations. The domain of values of geometry is one of the spatial abstract types given above. An element can be described as a spatial object or a spatial attribute, depending on the application. A spatial attribute is a simple attribute, single-valued or multivalued, derived or not, whose domain is a spatial abstract type. Spatial integrity constraints may be associated with spatial entities or spatial attributes.

A spatial relationship may be of different types, e.g., topological, orientation, metrical, or spatial aggregation. Spatial relationships can be deduced from the spatiality of objects; thus, these relationships implicitly exist and are accessible through GIS functions. But in MADS it is possible to define them explicitly, giving the possibility to attach to them attributes or methods, or to give them a special semantics, complementing the semantics they have from GIS functions. MADS provides topological relationships and spatial aggregation. Any other spatial relationship type may be explicitly declared with the methods attached to the spatial abstract types.
The predefined relationship types in MADS are: disjunction, adjacency, crossing, overlapping, inclusion, and equality.

Aggregation can be spatial or thematic. It is a binary link directed from the composite to the component object. An object type composed of several object types is represented using several aggregation links, one for each component type. It is common that some attributes of the composite and component objects are related. These dependencies are represented either by derived attributes, or by integrity constraints.

Generalisation links may relate to spatial or non-spatial object types. Inheritance can be adjusted by using either refinement or redefinition. Refinement is useful each time a property of an object (thematic attribute or geometry) has a smaller domain in a subtype than in a supertype. Redefining an inherited attribute aims at a different objective. Redefinition creates a new attribute in the subtype, with the same name. Thus, redefinition of geometry makes it possible to associate different geometries to the same object. This can be used for multiple representations. Multiple inheritance is another kind of generalisation link that solves the problem of sharing the same object by several object types [18].

Spatial partitions are discussed in [12]. Partitions are a central spatial concept for organising our perception and understanding of space. They enable us to consider the attributes of single points (space-based view); they also provide access to collections of points having equal attributes (object-based view). Thus, the model closes the gap between these two views of space.

In set theory, a partition is a complete decomposition of a set S into non-empty, disjoint


subsets {S_i | i ∈ I}, called blocks. A partition can be seen as a total surjective function π : S → I. A spatial partition can be defined as a set-theoretic partition of the plane, or as a function π : ℝ² → I.

From an application point of view, different blocks of a spatial partition are marked (labelled). Thus, a partition model should consider point sets together with the associated values. The set of values that are used for labelling in a specific partition defines, in a way, the type of the partition. The spatial partitions of a type A are functions of type π : ℝ² → A, where A, in contrast to I, has some semantics. To ensure that π is a total function, it is assumed that every label type A contains an element ⊥_A (called undefined or unknown), and the outside area of a partition is labelled by ⊥_A.

Blocks of a spatial partition are called regions. Regions that actually appear in applications are regular, without cuts or punctures, and without isolated points or lines. So, the interior of regions of a partition is required to be a regular open set. Since points on the boundary cannot be assigned uniquely to either adjacent region, they are mapped to the set of values given by the labels of all their adjacent regions.

The definition of spatial partitions is given in two steps. First, a spatial mapping of type A is defined as a total mapping π : ℝ² → A ∪ ℙ A. The range of a spatial mapping π is the set of labels actually used by π and is denoted by range(π). The blocks of a spatial mapping π are maximal point sets that are mapped to the same value. (If f : X → A then ∀ a ∈ A • f⁻¹(a) = {x ∈ X | f(x) = a}. When f⁻¹ is applied to a set it yields a set of sets.) The common label of a block b of π is denoted by π[b], that is π(b) = {l} ⇒ π[b] = l. The cardinality of block labels identifies different parts of a partition: the interior and the boundary.
A region of π is any block of π that is mapped to a single element of A, and a border of π is given by a block that is mapped to a set of A-values. The interior of π is the union of all its regions, and the boundary of π is the union of all its border blocks. Let π be a spatial mapping of type A; then ρ(π) := π⁻¹(range(π) ∩ A) are the regions of the partition, and β(π) := π⁻¹(range(π) ∩ ℙ A) are the borders.

Then a spatial partition is defined by topologically constraining regions to regular point sets and by semantically constraining boundary labels to those of adjacent regions. Thus, a spatial partition of type A is a spatial mapping π of type A, such that

$$\forall\, r \in \rho(\pi) : r \text{ is a regular point set, and}$$
$$\forall\, b \in \beta(\pi) : \pi[b] = \{\pi[r] \mid r \in \rho(\pi) \land b \subseteq \partial r\}$$
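The notions of block, region and border can be made concrete with a small discrete sketch. It is only an analogue of the definition in [12], under loudly stated assumptions: the plane is replaced by a handful of integer points on a line, a label from A is a plain string, a set of A-values is a `frozenset`, adjacency means "neighbouring integer", and regularity of regions is not modelled at all. All function names are hypothetical.

```python
from collections import defaultdict


def blocks(mapping):
    """Inverse image of every label in use: one block of points per label."""
    inverse = defaultdict(set)
    for point, label in mapping.items():
        inverse[label].add(point)
    return dict(inverse)


def regions(mapping):
    """Blocks mapped to a single element of A (here: a string)."""
    return {l: b for l, b in blocks(mapping).items() if not isinstance(l, frozenset)}


def borders(mapping):
    """Blocks mapped to a set of A-values (here: a frozenset of strings)."""
    return {l: b for l, b in blocks(mapping).items() if isinstance(l, frozenset)}


def border_labels_ok(mapping):
    """Discrete stand-in for the second condition: every border point must
    carry exactly the labels of the regions it touches."""
    for labels, block in borders(mapping).items():
        for p in block:
            touching = {
                mapping[q]
                for q in (p - 1, p + 1)  # assumed adjacency on the integer line
                if q in mapping and not isinstance(mapping[q], frozenset)
            }
            if touching != set(labels):
                return False
    return True


# Two regions separated by a single border point.
pi = {0: "forest", 1: "forest", 2: frozenset({"forest", "lake"}), 3: "lake", 4: "lake"}
```

With this mapping, `regions(pi)` yields the two blocks labelled "forest" and "lake", `borders(pi)` yields the single border block, and `border_labels_ok(pi)` holds because the border point carries the labels of both neighbouring regions.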

The set of all spatial partitions of type A is denoted by [A], and [A] ⊆ ℝ² → A ∪ ℙ A.

The partition boundary can be seen as an undirected planar graph. From this point of view, borders can be discriminated further using the cardinality of their labels: an edge block is mapped to a two-element subset of A and defines a border curve between two regions. A vertex block is mapped to a subset of A with three or more elements; a vertex block is a singleton point set and describes a location where more than two regions of a partition meet [12].

Three basic operators on partitions, Intersection, Relabel and Refine, are formally defined in [12], and it is shown that operations that arise generally in spatial analysis, cartography,


etc., such as overlay, reclassify, merge, cover and clipping, can be realised by these three basic operators.

Spatial types, together with some Base, Time, Temporal and Range types, are introduced in [13]. The spatial types, used in the design of wider spatio-temporal types, are point; points, as finite sets of points; line, as finite sets of curves that intersect each other only at their ends; and region, as regular point sets. A curve is given as the range of a continuous function from a closed interval in ℝ to ℝ²; it is not self-intersecting, but can be looped, and a condition for the uniqueness of representation of the curve by the function is given.

In the geometry object model of [7], a GeometryCollection is an object type that is a collection of one or more geometric types. Its subtypes are as follows. Point and MultiPoint, the latter being the type of finite collections of points. Curve is given as the homeomorphic image of a closed real interval; it can be simple (the line is not self-intersecting) and looped (the beginning and end of the line are the same point), although the homeomorphism excludes the looped curve. LineString is a curve obtained by linear interpolation between its representing points; Line is a line string with exactly two points; LinearRing is a simple looped curve. A MultiCurve is a collection of curves, and a subtype of it is MultiLineString. A Surface can be a three-dimensional as well as a planar surface. A Polygon is a planar surface that is a regular closed set; its interior is a connected set and its frontier consists of a set of linear rings. MultiSurface is a collection of surfaces, and MultiPolygon is a collection of polygons that can intersect each other only in a finite set of points from their frontiers.

2.3 Data Modelling

The IFO model described in [1] is a formal semantic database model whose primary focus is on the structural component of the data model (the other components being data manipulation and integrity specification).

Four fundamental principles of semantic database modelling are identified. The most basic is that data about objects and relationships between them should be modelled in a direct manner. As first introduced by the Entity-Relationship model, such “object-based” modelling allows database designers and users to think in terms of objects without the indirection resulting from the symbolic identifiers necessitated by records and pointers. As highlighted in the Functional Data Model (FDM), a second basic principle of semantic modelling is that many (if not most) of the relationships recorded in a database are functional in nature. A third basic principle is the significance of the so-called “ISA” relationships, which specify that one set of objects must be a subset of another set of objects. The final principle of semantic data modelling is to provide a hierarchical mechanism for building object types out of other object types, like aggregation and grouping.

The IFO model is a mathematically defined database model that incorporates the four principles within a coherent, graph-based representational framework. The presentation of the model is done in four steps. First, types are introduced that model the structure of objects arising in database applications. Second, fragments are built from types, and are


used to represent functional relationships in the IFO model. Third, it is described how ISA relationships between various objects of the schema are incorporated. Finally, all the pieces are put together to form IFO schemas.

Types can be atomic or nonatomic and they are defined as tree structures. There are three kinds of atomic types: printable types are predefined types like STRING, INTEGER, BOOLEAN, PICTURE, etc.; abstract types correspond to objects in the world that have no underlying structure, relative to the point of view of the application, e.g., the type PERSON is a typical one; free types correspond intuitively to entities obtained via an ISA relationship, e.g., STUDENTs are a subclass of PERSONs, and STUDENT is a free type.

Nonatomic types are built from the previous ones using two mechanisms. “Collection” or grouping forms finite sets of objects, which are represented by star-vertices (⊛-vertex) in the IFO schema: if m ≥ 0 and O1, ..., Om are objects, then {O1, ..., Om} is a set object. “Aggregation” or composition is a Cartesian product and is represented by cross-vertices (⊗-vertex): if n > 0 and O1, ..., On are objects, then [O1, ..., On] is a tuple object. The two constructs corresponding to star and cross vertices can be applied recursively in any order.

Fragments are used to represent functional relationships but, unlike in the FDM, the IFO model distinguishes between vertices serving the role of domain and those serving the role of range. Another difference is that nested functions can be modelled in the IFO model (but not in FDM).

An ISA relationship from a type SUB to a type SUPER indicates that each object associated with SUB is associated with the type SUPER, and functions of SUPER are inherited by SUB. Two types of ISA relationships are distinguished in an IFO model: specialisation and generalisation.
Specialisation can be used to define possible roles for members of a given type, e.g., subtypes EMPLOYEE and STUDENT are specialisations of PERSON. An object may change such roles without changing its underlying or fundamental identity. A generalisation represents situations where distinct, pre-existing types are combined to form new virtual types, e.g., types CAR and MOTOR-BOAT can be combined to form VEHICLE. In such situations it is not allowed for an object of one subtype to migrate into another subtype. Also, it is common to require that a generalised supertype be covered by its subtypes.

Two simple constraints on ISA relationships can be incorporated into IFO schemas: subtypes forming a generalised type can be forced to be disjoint, and specialisations of a supertype can be required to cover the supertype.

IFO schemas can be built in a top-down design fashion, beginning with the specification of the major object types arising in the application environment, then specifying subsidiary object types, either constructed or defined as sub- or supertypes, and finally specifying the functions of all object types of the schema. Five rules are imposed on the ISA relationships of an IFO schema.

To characterise very precisely the types occurring locally in a schema instance, the concept of “derived type” is introduced. The derived types are a family of tree structures that start with basic types (printable and abstract), and permit the application of three constructs ⊗, ⊛ and ⊕, which represent aggregation, grouping, and generalisation. The type T formed


from ⊕ of two types T1 and T2 will have a domain equal to the union of the domains of T1 and T2. An order relation and an equivalence relation are defined on the derived types.

The paper also focuses on the semantics of updates in the IFO model. In particular, it allows a careful examination of the different ways in which a modification of the data associated with one part of a database schema can affect data associated with other parts of the schema. A fundamental observation is that local update semantics can be specified separately for each construct of the model, and then combined in a natural manner to form a well-defined global semantics [1].

The spatial data model described in [11] is an integration of functional data modelling concepts with order-sorted algebras. The novelties of this model are the modelling and querying of networks and heterogeneous collections of spatial objects. Graphs are used for modelling networks, but they are also presented as a modelling tool to describe relationships between objects, thus making available all the efficient graph algorithms in data queries.

The authors use a multilevel order-sorted algebra to put all the concepts they introduce and describe in a common formalism. The classical relational algebra is a one-sorted algebra, its domain being a set of relations, with a collection of operations (functions like join, project) defined on this domain. A many-sorted algebra provides for a well-structured type system and integration of arithmetic or aggregate functions, and generally ADT functions. Order-sorted algebra allows for subtype hierarchies and inheritance, by means of a partial order on algebra sets (sort carriers), which implies that functions defined on one set can be applied to elements of a subset. The model describes complex object types that are built from simpler ones by means of type constructors that cannot be described by means of a one-level algebra. The model supports polymorphic functions, and this polymorphism cannot be modelled by only one level of algebra.
Parametric polymorphism requires sets of types over which the (polymorphic) function is defined, and this is realised by kinds, which are simply sets of types (sorts of the first level). The first-level algebra describes basic types, which are the sorts of this algebra, and operations on sorts. The second-level algebra describes kinds, whose carriers are sets of sorts of the first-level algebra. Operations of the second-level algebra are the type constructors; their associated functions are mappings between kinds, i.e., they map one or more sorts of one kind to a sort of another kind. This second level is called kind algebra. The types gained from applying type constructors of the second level are then used at the first level. A third-level algebra (called class algebra) is mentioned to introduce some operations on complex (structured) types.

The data model is developed in three steps, by introducing first the data types, then the object types, and finally the structures.

Data types are a collection of standard data types (BOOL, STR, INT, REAL) as well as the geometric types (POINT, LINE, REG) and some other types shown in figure 2.1.a, which gives the sort hierarchy.

Standard and geometric types are the basic data types (sorts), and are the leaves of the tree in the sort hierarchy. The notation ⟨·⟩ is used to refer to the domain of a type (carrier set of the sort), e.g., ⟨INT⟩ denotes the carrier set of sort INT. The carrier of an internal node is defined to be the union of the carriers of its children, e.g., ⟨NUM⟩ = ⟨INT⟩ ∪ ⟨REAL⟩.


This meets the subtype constraint of order-sorted algebra.

The set of kinds S = {NUM, ORD, GEO, DATA} is introduced at the second-level algebra. A kind hierarchy is also defined for data types; it is given in figure 2.1.b as part of the complete type system hierarchy. The notation ⟨⟨·⟩⟩ is used to denote the carrier of kinds. The carrier of a kind contains just the sorts that are descendants in the sort hierarchy, e.g., ⟨⟨GEO⟩⟩ = {GEO, POINT, EXT, LINE, REG}.

The introduction of kinds makes the definition of polymorphic functions possible, e.g., the spatial operation ‘inside’ can be defined as inside : GEO × REG → BOOL, where GEO can be any spatial type POINT, LINE, or REG.

The type constructor ⊕ on data types with the signature

$$\oplus : \mathrm{DATA} \times \mathrm{DATA} \rightarrow \mathrm{DATA}$$

is defined to return for any two sorts in ⟨⟨DATA⟩⟩ their smallest common supersort, e.g., LINE ⊕ REG = EXT, and POINT ⊕ LINE = GEO.

Object types are totally dependent on a specific application. Sorts represent object classes and operations represent functions applicable to objects. The object sorts model the objects stored in the database, and they all make up a kind BASEOBJ. For each sort in BASEOBJ, the carrier is in principle a set of object identifiers.

Two type constructors, ⊕ (union) and ⊗ (aggregation), are used to build other object types (objects of these types are called “potential objects” and they are not stored in the database). The sort s ⊕ t resulting from the union operation represents a “collection of objects” which are of type s or t. In the object sort hierarchy, the constructed sort is a supersort of the operands to the constructor. The ⊗ operation allows us to build a sort for “aggregation objects”. The sort constructed by the aggregation operation is a subsort of the operands to the constructor.

Structured types: The two fundamental structures available in this model are sequences and graphs, which are introduced through the constructors seq and graph, respectively. The (second-level) signature of the seq constructor, used to build sorts of kind SEQ, is seq : ANY → SEQ, and the set of sorts obtained by applying the seq constructor to the sorts in ⟨⟨ANY⟩⟩ (the carrier of kind SEQ) is:

$$\langle\langle \mathrm{SEQ} \rangle\rangle = \{ seq(BOOL),\ seq(INT),\ \ldots,\ seq(POINT),\ \ldots \}$$
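The behaviour of the ⊕ constructor on data types, described earlier as returning the smallest common supersort, can be sketched as a walk up a tree of sorts. The sketch is illustrative only. The GEO and NUM subtrees follow the carriers quoted in the text; placing GEO and NUM directly under a root sort DATA is an assumption, since figure 2.1.a is not reproduced here, and the names `PARENT`, `supersorts` and `oplus` are hypothetical.

```python
# Parent links of an assumed fragment of the sort hierarchy.
PARENT = {
    "POINT": "GEO", "EXT": "GEO",   # from <<GEO>> = {GEO, POINT, EXT, LINE, REG}
    "LINE": "EXT", "REG": "EXT",    # from LINE (+) REG = EXT
    "INT": "NUM", "REAL": "NUM",    # from <NUM> = <INT> u <REAL>
    "GEO": "DATA", "NUM": "DATA",   # assumption: both sit directly under DATA
    "DATA": None,
}


def supersorts(sort):
    """The sort itself followed by its ancestors, nearest first."""
    chain = []
    while sort is not None:
        chain.append(sort)
        sort = PARENT[sort]
    return chain


def oplus(s, t):
    """Smallest common supersort of s and t in the tree above."""
    ancestors_of_t = set(supersorts(t))
    for candidate in supersorts(s):
        if candidate in ancestors_of_t:
            return candidate
    raise ValueError("no common supersort")
```

Under these assumptions `oplus("LINE", "REG")` gives `"EXT"` and `oplus("POINT", "LINE")` gives `"GEO"`, matching the two examples in the text.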

A database could be modelled as an object hierarchy, together with functions applicable to objects, describing attributes and relationships. The model introduces graphs as another modelling tool, which means that the user can define some part of the database explicitly as a graph structure.

Any three object sorts s, t, u of kind BASEOBJ can be selected and, by applying the graph constructor to them, a type graph(s, t, u) can be defined:

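$$
\begin{aligned}
\forall\, s, t, u \in \langle\langle \mathrm{BASEOBJ} \rangle\rangle :\ & \langle graph(s, t, u) \rangle := \{ (N, E, XP, \varepsilon, \pi) \mid \\
& \text{(i) } N \subseteq \langle s \rangle,\ E \subseteq \langle t \rangle,\ XP \subseteq \langle u \rangle, \\
& \text{(ii) } \varepsilon : E \rightarrowtail N \times N \text{ (no two edges between the same nodes)}, \\
& \text{(iii) } \pi : XP \rightarrow E^{*} \text{, whose range contains only simple paths of the graph } (N, \varepsilon(E)) \}
\end{aligned}
$$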
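Conditions (ii) and (iii) of this definition can be checked mechanically. The following Python sketch is one possible reading, not the formulation of [11]: edges are assumed to be directed (ε maps an edge to an ordered pair of nodes), a "simple path" is assumed to be a head-to-tail chain of edges that visits no node twice, condition (i) on carriers is left out, and the function names are hypothetical.

```python
def is_simple_path(edge_seq, eps):
    """True if the edges chain head-to-tail and no node is visited twice."""
    if not edge_seq:
        return False  # assumption: an explicit path has at least one edge
    nodes = [eps[edge_seq[0]][0]]
    for e in edge_seq:
        start, end = eps[e]
        if start != nodes[-1]:
            return False
        nodes.append(end)
    return len(set(nodes)) == len(nodes)


def is_graph_value(N, E, XP, eps, pi):
    """Check conditions (ii) and (iii) for a candidate tuple (N, E, XP, eps, pi)."""
    # (ii) eps is total on E, maps into N x N, and is injective
    if set(eps) != set(E):
        return False
    if any(a not in N or b not in N for a, b in eps.values()):
        return False
    if len(set(eps.values())) != len(eps):
        return False
    # (iii) pi is total on XP and every image is a simple path over E
    if set(pi) != set(XP):
        return False
    return all(
        all(e in E for e in path) and is_simple_path(path, eps)
        for path in pi.values()
    )


# Hypothetical highway network: three locations, two sections, one highway.
N = {"A", "B", "C"}
E = {"s1", "s2"}
XP = {"H1"}
eps = {"s1": ("A", "B"), "s2": ("B", "C")}
pi = {"H1": ["s1", "s2"]}
```

The sample tuple satisfies both conditions; mapping two sections to the same pair of nodes, or listing the sections of a highway in an order that does not chain, makes the check fail.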


The idea is that in a given graph of type graph(HLocation, Section, Highway) an HLocation object is associated with each node, a Section object is associated with each edge, and for each Highway in XP there is a path in this graph associated with it by π. It is assumed that an object type used in such a graph definition is ‘devoted to’ this graph instance, i.e., that every object in the object type used for nodes is automatically a node of this graph instance.

The constructors node, edge, and xpath are in fact selectors: they extract from a graph sort the sorts it was constructed from:

node(graph(s, t, u)) = s, edge(graph(s, t, u)) = t, xpath(graph(s, t, u)) = u.

These constructors map to the kind COMPOBJ (graph component object), whose elements can be treated like any other object sort; it is therefore a subkind of BASEOBJ. The last constructor is path, the application of which restricts the carrier of a given graph type. Hence, ∀ G ∈ ⟨⟨GRAPH⟩⟩ : ⟨path(G)⟩ ⊆ ⟨G⟩, which allows a subsort relationship to be defined in the sort hierarchy. In other words, any path can also be viewed as a graph, which means it inherits all operations defined on graphs [11].

TM, presented in [2], is a typed language with object-oriented features such as attributes and methods in the presence of subtyping, and FM is the formal theory on which it is based. The paper introduces two important features in conceptual database modelling: predicative description of sets, and static constraints of different granularity (object level, class level, database level). The formal theory FM is based on the Cardelli type theory.

TM can handle expressions that denote enumerated sets, and set expressions that are formed by set comprehension and have the form {x : σ | φ(x)}, where φ(x) is a boolean expression and σ a type in TM.
A set expression of this form is called a predicative set.

Types are basic types such as integer, real, etc., power types, and record types such as ⟨age : integer, name : string⟩.

The boolean expressions of TM are: constants, true and false; logical formulas, ¬(e), (e ⇒ e′), (e ∧ e′), (e ∨ e′), (e ⇔ e′), and expressions involving the quantifiers ∀ and ∃, where e and e′ are boolean expressions, or expressions built up from arithmetical relations like ≤, >, etc.; and special boolean expressions, e isa e′, e = e′, e ∈ e′ (e in e′), e ε e′ (e sin e′), and e ⊂ e′ (e subset e′), where e and e′ are TM expressions.

Expressions are: constants such as 1_integer, 2.0_real; variables such as x_integer; records; or projections such as ⟨age = 3, name = “John”⟩ · name (which evaluates to “John”).

The set of types is equipped with a subtyping relation ≤ that is a partial order on the set of types. The typing rules are extended such that e : σ, σ ≤ τ ⇒ e : τ. This is called subtyping. The subtyping relation introduces polymorphism in the language, in the sense that expressions can have more than one type. There is a way of attaching a unique type to a correctly typed expression, which is called minimal typing. An expression e has minimal type τ if e : τ ∧ (¬ ∃ σ | e : σ ∧ σ ≤ τ ∧ σ ≠ τ). The notation e :: τ is used to show that τ is the minimal type of e.

If σ is a type then ℙ σ denotes the powertype of σ, which is the collection of all sets of expressions e such that e : σ. An expression e is called a set if it has a powertype as its type, i.e., e : ℙ σ for some type σ.


The methodology adopted in the paper to describe the set of allowed states of a database uses three levels:

1. The object level, in which the object types of interest are described as well as, for each object type, the set of allowed objects of that type. For a class C, at the object level, C’s object type is denoted by γ. C_Universe :: ℙ γ denotes the set of allowed objects of class C.

$$C\_Universe = \{ x : \gamma \mid \varphi(x) \}$$

The predicate φ(x) determines which objects of type γ are allowed objects.

2. The class extension level, in which the set of allowed class extensions for each class is described. At the class extension level, C_ClassUniverse :: ℙℙ γ denotes the set of allowed class extensions. (An element of C_ClassUniverse is thus a possible class extension of class C.)

$$C\_ClassUniverse = \{ X : \mathbb{P}\gamma \mid X \subseteq C\_Universe \land \varphi'(X) \}$$

The predicate φ′(X) is used to state constraints on the class extension (for instance, more than ten objects should be in any extension of that class).

3. The database level, in which the set of allowed database states is described. At the database level, DatabaseUniverse :: ℙ⟨C1 : ℙ γ1, ..., Cn : ℙ γn⟩ denotes the collection of allowed database states.

$$DatabaseUniverse = \{ DB : \langle C_1 : \mathbb{P}\gamma_1, \ldots, C_n : \mathbb{P}\gamma_n \rangle \mid \bigwedge_{i=1}^{n} DB \cdot C_i \in C_i\_ClassUniverse \land \Phi(DB) \}$$

By the generalised conjunction, this definition first of all requires each class extension in an allowed database state to be an allowed class extension. Furthermore, it may pose additional requirements on the database state by means of Φ(DB), like referential integrity between distinct class extensions.
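The three levels can be mirrored by three nested membership tests. The sketch below is not TM code and does not come from [2]; the classes `Person` and `Job`, and the concrete constraints chosen for φ, φ′ and Φ, are hypothetical examples picked only to show how each level builds on the previous one.

```python
from typing import NamedTuple


class Person(NamedTuple):
    name: str
    age: int


class Job(NamedTuple):
    holder: str  # name of a Person
    title: str


def in_person_universe(p):
    """Object level, phi(x): an example constraint on a single object."""
    return 0 <= p.age <= 150


def in_person_class_universe(extension):
    """Class level: X is a subset of the universe and phi'(X) holds
    (here phi' is an example key constraint: names are unique)."""
    names = [p.name for p in extension]
    return all(in_person_universe(p) for p in extension) and len(names) == len(set(names))


def in_job_class_universe(extension):
    """Class level for Job, with a trivial object constraint folded in."""
    return all(j.title != "" for j in extension)


def in_database_universe(db):
    """Database level: every extension is allowed and Phi(DB) holds
    (here Phi is an example referential integrity constraint)."""
    extensions_ok = (
        in_person_class_universe(db["Person"]) and in_job_class_universe(db["Job"])
    )
    known_names = {p.name for p in db["Person"]}
    referential = all(j.holder in known_names for j in db["Job"])
    return extensions_ok and referential


db = {
    "Person": {Person("Ann", 31), Person("Bob", 45)},
    "Job": {Job("Ann", "surveyor")},
}
```

A state can fail at any level: an object with a negative age fails φ, two persons with the same name fail φ′, and a job held by an unknown person fails Φ even though each class extension is allowed on its own.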


Figure 2.1: (a) Sort hierarchy for data types, (b) Complete type system at kind level


Chapter 3

Spatial Elements

In this chapter, we formally present spatial elements that can be used in a conceptual schema for spatial applications, and the relationships between them. We first introduce graphs and graph concepts that can be used in data modelling in general.¹ This is done in section 3.1. We use these concepts in section 3.2 to represent or define the spatial elements. Section 3.3 gives a short summary of all spatial elements and the relationships between them.

3.1 Graph notions

A graph, G, is an ordered triple (V(G), E(G), ψ_G) consisting of a nonempty set V(G) of vertices, a set E(G), disjoint from V(G), of edges, and an incidence function ψ_G that associates with each edge of G an unordered pair of (not necessarily distinct) vertices of G. If e is an edge and u and v are vertices such that ψ_G(e) = {u, v}, then e is said to join u and v; the vertices u and v are called the ends of e. (An edge with distinct ends is a link and one with identical ends is a loop.) The ends of an edge are said to be incident with the edge and vice versa. Two vertices which are incident with a common edge are adjacent, as are two edges which are incident with a common vertex.

Graphs built from two given sets, V and E, are:

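$$
\begin{aligned}
& [V, E] \\
& GRAPH : \mathbb{P} V \times \mathbb{P} E \times (E \rightharpoonup \mathbb{P}_1 V) \\
& \forall\, G : GRAPH,\ \forall\, V' : \mathbb{P} V,\ \forall\, E' : \mathbb{P} E,\ \forall\, \psi : E \rightharpoonup \mathbb{P}_1 V \mid G = (V', E', \psi) \bullet \\
& \qquad \mathrm{dom}\, \psi = E' \land \forall\, e : E' \bullet (\exists\, v_1, v_2 : V' \bullet \psi(e) = \{v_1, v_2\})
\end{aligned}
$$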
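As a concrete reading of this definition, the following Python sketch checks whether a triple of vertices, edges and an incidence function forms a graph. It is a sketch under assumptions: vertices and edges are represented as Python sets of strings, the incidence function as a dict from edge to a `frozenset` of one or two vertices, and the function names are hypothetical.

```python
def is_graph(vertices, edges, psi):
    """Check the constraints on a triple (V', E', psi)."""
    if not vertices:              # the vertex set is nonempty
        return False
    if vertices & edges:          # vertices and edges are disjoint
        return False
    if set(psi) != edges:         # dom psi = E'
        return False
    # every edge has one end (a loop) or two ends (a link), all in V'
    return all(1 <= len(ends) <= 2 and ends <= vertices for ends in psi.values())


def is_loop(psi, e):
    return len(psi[e]) == 1


def adjacent(psi, u, v):
    """Two vertices are adjacent if some edge is incident with both."""
    return any(ends == frozenset({u, v}) for ends in psi.values())


V = {"a", "b", "c"}
E = {"e1", "e2", "e3"}
psi = {
    "e1": frozenset({"a", "b"}),  # link joining a and b
    "e2": frozenset({"b", "c"}),  # link joining b and c
    "e3": frozenset({"c"}),       # loop at c
}
```

Using a `frozenset` for the ends makes the pair unordered and lets a loop collapse naturally to a one-element set, which is the role played by the non-empty subsets of V in the definition.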

Graphs can be represented graphically by a diagram: each vertex is indicated by a point, and each edge by a line joining the points which represent its ends.

We can find the components of a given graph by:

¹ Definitions of graph concepts are taken from [6].


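$$
\begin{aligned}
& [V, E] \\
& vertices : GRAPH[V, E] \rightarrow \mathbb{P} V \\
& edges : GRAPH[V, E] \rightarrow \mathbb{P} E \\
& incf : GRAPH[V, E] \rightarrow (E \rightharpoonup \mathbb{P}_1 V) \\
& \forall\, G : GRAPH[V, E] \bullet (vertices\ G = first\ G \land edges\ G = second\ G \land incf\ G = third\ G)
\end{aligned}
$$

Here first, second and third are projection functions on a Cartesian product. We will use a shorter notation for vertices G, edges G, and incf G:²

$$V_G == vertices\ G, \qquad E_G == edges\ G, \qquad \psi_G == incf\ G$$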

For every graph we can define its incidence matrix as

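$$
\begin{aligned}
& [V, E] \\
& inc : GRAPH[V, E] \rightarrow (V \times E \rightharpoonup \mathbb{N}) \\
& \forall\, G : GRAPH[V, E] \bullet \mathrm{dom}\ inc\ G = V_G \times E_G \ \land \\
& \qquad \forall\, v : V_G,\ \forall\, e : E_G \bullet inc\ G(v, e) =
\begin{cases}
0 & \text{if } v \notin \psi_G(e) \\
1 & \text{if } v \in \psi_G(e) \land \#\psi_G(e) = 2 \\
2 & \text{if } v \in \psi_G(e) \land \#\psi_G(e) = 1
\end{cases}
\end{aligned}
$$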

The degree of a vertex v in G is the number of edges incident with v, each loop counting as two edges. Thus, for every vertex of a graph we can define its degree as

[V, E]
degree : GRAPH[V, E] → (V ⇸ ℕ)

∀ G : GRAPH[V, E] • dom degree G = VG ∧
    ∀ v : VG • degree G(v) = Σ_{e ∈ EG} inc G(v, e)

The following statement is true: ∀ G : GRAPH[V, E] • Σ_{v ∈ VG} degree G(v) = 2 · #EG.
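The incidence matrix and the degree function can be sketched in Python. This is only an illustration under an assumed encoding that is not part of the thesis: the incidence function is a dictionary from edge names to the set of their ends, and the example graph (names `V`, `PSI`, `DEG`) is hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Iterable

# Assumed encoding (not from the thesis): the incidence function psi maps each
# edge name to the frozenset of its ends (a single end for a loop).
Psi = Dict[Hashable, FrozenSet[Hashable]]


def inc(psi: Psi, v: Hashable, e: Hashable) -> int:
    """Incidence-matrix entry: 0 if v is not an end of e, 1 for a link, 2 for a loop."""
    ends = psi[e]
    if v not in ends:
        return 0
    return 1 if len(ends) == 2 else 2


def degree(vertices: Iterable[Hashable], psi: Psi) -> Dict[Hashable, int]:
    """Degree of every vertex, computed as the row sum of the incidence matrix."""
    return {v: sum(inc(psi, v, e) for e in psi) for v in vertices}


# Hypothetical example: a triangle a-b-c with an extra loop at c.
V = {"a", "b", "c"}
PSI: Psi = {
    "e1": frozenset({"a", "b"}),
    "e2": frozenset({"b", "c"}),
    "e3": frozenset({"a", "c"}),
    "e4": frozenset({"c"}),  # loop: counted twice in the degree of c
}
DEG = degree(V, PSI)
```

In this example the degrees sum to twice the number of edges, as the statement above requires.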

A walk in G is a finite non-empty sequence W = 〈v0, e1, v1, ..., ek, vk〉, whose terms are alternately vertices and edges, such that, for 1 ≤ i ≤ k, the ends of ei are vi−1 and vi. If the edges e1, e2, ..., ek of a walk are distinct, W is called a trail. If, in addition, the vertices v0, v1, ..., vk are distinct, W is called a path (a (v0, vk) path). To give the definitions of walk, trail and path we need alternating sequences whose elements are drawn from two sets, such that all the elements in odd positions are from one set and all the elements in even positions are from the other set.

altseq[X, Y] == {s : ℕ₁ ⇸ X ∪ Y | ∃ n : ℕ • dom s = 1 .. (2n + 1) ∧
    (∀ m : ℕ | 2m + 1 ≤ 2n + 1 • s(2m + 1) ∈ X) ∧
    (∀ m : ℕ₁ | 2m < 2n + 1 • s(2m) ∈ Y)}

² When there is no confusion about V and E we will not associate the function names with them, e.g. we will write incf instead of incf[V, E].

Now we can give the definition of walks, trails and paths in a graph G .

[V, E]
walks : GRAPH[V, E] → ℙ altseq[V, E]
trails : GRAPH[V, E] → ℙ altseq[V, E]
paths : GRAPH[V, E] → ℙ altseq[V, E]

∀ G : GRAPH[V, E] •
    walks G = {s : altseq[VG, EG] | ∃ k : ℕ • #s = 2k + 1 ∧
        ∀ i : 1 .. k • ψG(s(2i)) = {s(2i − 1), s(2i + 1)}}
    trails G = {w : walks G | ∃ k : ℕ • #w = 2k + 1 ∧
        ∀ i, j : 1 .. k | i ≠ j • w(2i) ≠ w(2j)}
    paths G = {t : trails G | ∃ k : ℕ • #t = 2k + 1 ∧
        ∀ i, j : 0 .. k | i ≠ j • t(2i + 1) ≠ t(2j + 1)}

A walk is closed if it has a positive length and its origin and terminus are the same. A closed trail whose origin and internal vertices are distinct is a cycle.

[V, E]
cycles : GRAPH[V, E] → ℙ altseq[V, E]

∀ G : GRAPH[V, E] •
    cycles G = {t : trails G | ∃ k : ℕ₁ • #t = 2k + 1 ∧ t(1) = t(2k + 1) ∧
        (∀ i, j : 1 .. k | i ≠ j • t(2i + 1) ≠ t(2j + 1))}
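The four predicates above (walk, trail, path, cycle) can be checked mechanically on an alternating sequence. The sketch below assumes the same hypothetical encoding as before (edge name mapped to the set of its ends) and uses 0-based Python lists, so vertices sit at even indices and edges at odd indices; the function names are illustrative, not taken from the thesis.

```python
from typing import Dict, FrozenSet, Hashable, Sequence, Set

Psi = Dict[Hashable, FrozenSet[Hashable]]


def is_walk(vertices: Set[Hashable], psi: Psi, s: Sequence[Hashable]) -> bool:
    """s = [v0, e1, v1, ..., ek, vk]: odd length, and each edge joins its neighbours."""
    if len(s) % 2 == 0:
        return False
    if any(v not in vertices for v in s[0::2]):
        return False
    return all(
        psi.get(s[i]) == frozenset({s[i - 1], s[i + 1]})
        for i in range(1, len(s), 2)
    )


def is_trail(vertices, psi, s) -> bool:
    """A walk whose edges are pairwise distinct."""
    edges = list(s[1::2])
    return is_walk(vertices, psi, s) and len(set(edges)) == len(edges)


def is_path(vertices, psi, s) -> bool:
    """A trail whose vertices are pairwise distinct."""
    verts = list(s[0::2])
    return is_trail(vertices, psi, s) and len(set(verts)) == len(verts)


def is_cycle(vertices, psi, s) -> bool:
    """A closed trail of positive length whose vertices v1..vk are distinct."""
    inner = list(s[2::2])
    return (
        is_trail(vertices, psi, s)
        and len(s) >= 3
        and s[0] == s[-1]
        and len(set(inner)) == len(inner)
    )


# Hypothetical example graph: triangle a-b-c with a loop e4 at c.
V = {"a", "b", "c"}
PSI: Psi = {
    "e1": frozenset({"a", "b"}),
    "e2": frozenset({"b", "c"}),
    "e3": frozenset({"a", "c"}),
    "e4": frozenset({"c"}),
}
```

For instance, `["a", "e1", "b", "e2", "c"]` is a path, `["a", "e1", "b", "e2", "c", "e3", "a"]` is a cycle but not a path, and the loop `["c", "e4", "c"]` is a cycle with k = 1.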

A graph H is a subgraph of G (written H ⊆ G) if VH ⊆ VG, EH ⊆ EG, and ψH is the restriction of ψG to EH. Subgraphs of a given graph G are defined by

[V, E]
subgraphs : GRAPH[V, E] → ℙ GRAPH[V, E]

∀ G : GRAPH[V, E] • subgraphs G =
    {H : GRAPH[V, E] | VH ⊆ VG ∧ EH ⊆ EG ∧ ψH = EH ◁ ψG}

Subgraphs of a graph whose vertices and edges are terms of a path are

[V, E]
SPaths : GRAPH[V, E] → ℙ GRAPH[V, E]

∀ G : GRAPH[V, E] •
    SPaths G = {p : paths G, H : subgraphs G | ∃ k : ℕ • #p = 2k + 1 ∧
        VH = {i : 0 .. k • p(2i + 1)} ∧ EH = {i : 1 .. k • p(2i)} ∧
        (∀ i : 1 .. k • ψH(p(2i)) = {p(2i − 1), p(2i + 1)}) • H}

PATH[V, E] == ⋃_{G ∈ GRAPH[V, E]} SPaths G is a subtype of GRAPH[V, E].

Similarly we will define subgraphs of a graph whose vertices and edges are terms of a cycle, and we will call them SCycles.

[V, E]
SCycles : GRAPH[V, E] → ℙ GRAPH[V, E]

∀ G : GRAPH[V, E] •
    SCycles G = {c : cycles G, H : subgraphs G | ∃ k : ℕ₁ • #c = 2k + 1 ∧
        VH = {i : 0 .. k • c(2i + 1)} ∧ EH = {i : 1 .. k • c(2i)} ∧
        (∀ i : 1 .. k • ψH(c(2i)) = {c(2i − 1), c(2i + 1)}) • H}

CYCLE[V, E] == ⋃_{G ∈ GRAPH[V, E]} SCycles G is another subtype of GRAPH[V, E].

Suppose V′ is a nonempty subset of the vertices VG of a graph G. The subgraph of G whose vertex set is V′ and whose edge set is the set of edges of G that have both ends in V′ is called the subgraph of G induced by V′, and it is denoted by G[V′].

[V, E]
_[_] : GRAPH[V, E] × 𝔽 V ⇸ GRAPH[V, E]

∀ G : GRAPH[V, E], ∀ V′ : 𝔽 V | V′ ⊂ VG •
    G[V′] = (V′, dom(ψG ▷ 𝔽₁ V′), ψG ▷ 𝔽₁ V′)

Two vertices u and v of G are said to be connected if there is a (u, v) path in G. Connection is an equivalence relation on the vertex set VG of G. Thus there is a partition of VG into nonempty subsets V1, V2, ..., Vn such that two vertices u and v are connected if and only if both u and v belong to the same set Vi. The subgraphs G[V1], G[V2], ..., G[Vn] are called the components of G. If G has exactly one component, G is connected; otherwise G is disconnected. Thus, a graph G is connected if every two vertices of it are connected. The set of connected graphs CGRAPH[V, E] is

CGRAPH[V, E] == {G : GRAPH[V, E] | ∀ u, v : VG •
    (∃ p : paths G • p(1) = u ∧ p(#p) = v)}

Suppose E′ is a nonempty subset of the edge set of a graph G. The subgraph of G obtained by deleting the edges of E′ is G − E′:

[V, E]
_ − _ : GRAPH[V, E] × 𝔽 E ⇸ GRAPH[V, E]

∀ G : GRAPH[V, E], ∀ E′ : 𝔽 E | E′ ⊂ EG • G − E′ = (VG, EG \ E′, (EG \ E′) ◁ ψG)

A cut edge of G is an edge e ∈ EG such that the number of components of G − {e} is greater than the number of components of G. If the degree of every vertex v ∈ VG of a graph G is even, G has no cut edges.


Figure 3.1: Graph hierarchy

An acyclic graph is one that contains no cycles. A tree is a connected acyclic graph.

TREE[V, E] == {G : CGRAPH[V, E] | cycles G = ∅}
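Components, cut edges and trees can be illustrated with a short Python sketch. It assumes the hypothetical dictionary encoding of the incidence function (edge name mapped to the set of its ends), and the tree test relies on the standard characterisation that a connected graph is acyclic exactly when it has one edge fewer than it has vertices; none of the function names come from the thesis.

```python
from typing import Dict, FrozenSet, Hashable, List, Set

Psi = Dict[Hashable, FrozenSet[Hashable]]


def components(vertices: Set[Hashable], psi: Psi) -> List[Set[Hashable]]:
    """Partition the vertex set into classes of mutually connected vertices."""
    adj = {v: set() for v in vertices}
    for ends in psi.values():
        for u in ends:
            adj[u] |= ends
    remaining = set(vertices)
    parts = []
    while remaining:
        stack = [remaining.pop()]
        comp = set()
        while stack:  # depth-first search from an unvisited vertex
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        remaining -= comp
        parts.append(comp)
    return parts


def cut_edges(vertices: Set[Hashable], psi: Psi) -> Set[Hashable]:
    """Edges whose deletion increases the number of components."""
    base = len(components(vertices, psi))
    result = set()
    for e in psi:
        rest = {x: ends for x, ends in psi.items() if x != e}
        if len(components(vertices, rest)) > base:
            result.add(e)
    return result


def is_tree(vertices: Set[Hashable], psi: Psi) -> bool:
    """Connected and acyclic, tested as: one component and #E = #V - 1."""
    return len(components(vertices, psi)) == 1 and len(psi) == len(vertices) - 1


# Hypothetical examples on the vertex set {a, b, c}.
V3 = {"a", "b", "c"}
PATH_PSI: Psi = {"e1": frozenset({"a", "b"}), "e2": frozenset({"b", "c"})}
TRI_PSI: Psi = dict(PATH_PSI, e3=frozenset({"a", "c"}))
```

In the path example every edge is a cut edge and the graph is a tree; in the triangle all degrees are even and no edge is a cut edge.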

The following statement is valid for trees: ∀ G : TREE[V, E] • #EG = #VG − 1.

A path is a tree whose vertices have degree at most 2. There are exactly two vertices with degree 1 in a path. A cycle is a connected graph whose vertices all have degree 2. Thus, PATH[V, E] ⊂ TREE[V, E] and CYCLE[V, E] ⊂ CGRAPH[V, E].

Another property of graphs (orthogonal to connectivity) is planarity. A graph is said to be embeddable in the plane, or planar, if it can be drawn in the plane so that its edges intersect only at their ends. Such a drawing of a planar graph G is called a planar embedding of G. A planar embedding G̃ of G can itself be regarded as a graph; the vertex set of G̃ is the set of points representing vertices of G, the edge set of G̃ is the set of lines representing edges of G, and a vertex of G̃ is incident with all the lines of G̃ that contain it. A planar embedding of a planar graph is sometimes called a plane graph. A plane graph carries the geometry of the plane.

We will give the definition of plane graphs, PLGRAPH, when we talk about one-dimensional elements (because this definition needs topological concepts). Connectivity, defined for graphs in general, is valid for plane graphs, and all the other concepts derived from it, e.g. tree, path and cycle, given above, are also valid for plane graphs. Figure 3.1 gives the hierarchy of graph types introduced until now together with their analogous planar types. PGRAPH are planar graphs, PCGRAPH are the connected planar graphs, PCYCLE are the planar cycles, PTREE are the planar trees, and PPATH are the planar paths. (For readability we omit V and E from the notation of graph types in figure 3.1.)

A directed graph (or digraph) D is an ordered triple (VD, AD, ψD) consisting of a nonempty set VD of vertices, a set AD, disjoint from VD, of arcs, and an incidence function ψD that associates with each arc of D an ordered pair of (not necessarily distinct) vertices of D. If a is an arc of D and u, v are vertices of D such that ψD(a) = (u, v), then u is the tail of a and v is its head. Digraphs on V, A are defined by

[V, A]
DIGRAPH : 𝔽 V × 𝔽 A × (A ⇸ V × V)

∀ D : DIGRAPH, ∀ V′ : 𝔽 V, ∀ A′ : 𝔽 A, ∀ ψ : A ⇸ V × V | D = (V′, A′, ψ) •
    dom ψ = A′ ∧ ∀ a : A′ • (∃ v1, v2 : V′ • ψ(a) = (v1, v2))

We can define the components of a digraph in the same way we defined the components of a graph, and for a given digraph D we denote by VD the set of its vertices, by AD the set of its arcs, and by ψD its incidence function. The incidence matrix of a digraph is defined as

[V, A]
incd : DIGRAPH[V, A] → (V × A ⇸ ℤ)

∀ D : DIGRAPH[V, A] • dom incd D = VD × AD ∧
    ∀ v : VD, ∀ a : AD • incd D(v, a) =
        1 if v is tail of a
        −1 if v is head of a
        0 otherwise

The indegree d−D(v) of a vertex v in a digraph D is the number of arcs with head v, and the outdegree d+D(v) of a vertex v in D is the number of arcs with tail v.

[V, A]
d− : DIGRAPH[V, A] → (V ⇸ ℕ)
d+ : DIGRAPH[V, A] → (V ⇸ ℕ)

∀ D : DIGRAPH[V, A] • dom d−D = VD ∧ dom d+D = VD ∧
    ∀ v : VD • d−D(v) = −Σ_{a ∈ AD} min{0, incd D(v, a)} ∧
               d+D(v) = Σ_{a ∈ AD} max{0, incd D(v, a)}
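The digraph incidence matrix and the two degree functions can be sketched as follows. The encoding (arc name mapped to a (tail, head) pair) and the example arcs are assumptions made for illustration; the sketch also assumes a loop-free digraph, since the case distinction above does not say how an arc whose tail and head coincide is to be counted.

```python
from typing import Dict, Hashable, Iterable, Tuple

# Assumed encoding (not from the thesis): each arc name maps to (tail, head).
PsiD = Dict[Hashable, Tuple[Hashable, Hashable]]


def incd(psi: PsiD, v: Hashable, a: Hashable) -> int:
    """Digraph incidence entry: 1 if v is the tail of a, -1 if the head, else 0."""
    tail, head = psi[a]
    if v == tail:
        return 1
    if v == head:
        return -1
    return 0


def indegree(vertices: Iterable[Hashable], psi: PsiD) -> Dict[Hashable, int]:
    """Number of arcs with head v: minus the sum of the negative entries."""
    return {v: -sum(min(0, incd(psi, v, a)) for a in psi) for v in vertices}


def outdegree(vertices: Iterable[Hashable], psi: PsiD) -> Dict[Hashable, int]:
    """Number of arcs with tail v: the sum of the positive entries."""
    return {v: sum(max(0, incd(psi, v, a)) for a in psi) for v in vertices}


# Hypothetical loop-free example: x -> y, y -> z, x -> z.
VD = {"x", "y", "z"}
PSI_D: PsiD = {"a1": ("x", "y"), "a2": ("y", "z"), "a3": ("x", "z")}
D_IN = indegree(VD, PSI_D)
D_OUT = outdegree(VD, PSI_D)
```

Both degree sums equal the number of arcs, because every arc contributes exactly one head and one tail.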

A directed walk in D is a finite non-null sequence W = 〈v0, a1, v1, ..., ak, vk〉, whose terms are alternately vertices and arcs, such that for i ∈ 1 .. k the arc ai has head vi and tail vi−1. A directed trail is a directed walk that is a trail; directed paths and directed cycles are similarly defined.

[V, A]
diwalks : DIGRAPH[V, A] → ℙ altseq[V, A]
ditrails : DIGRAPH[V, A] → ℙ altseq[V, A]
dipaths : DIGRAPH[V, A] → ℙ altseq[V, A]
dicycles : DIGRAPH[V, A] → ℙ altseq[V, A]

∀ D : DIGRAPH[V, A] •
    diwalks D = {s : altseq[VD, AD] | ∃ k : ℕ • #s = 2k + 1 ∧
        ∀ i : 1 .. k • ψD(s(2i)) = (s(2i − 1), s(2i + 1))}
    ditrails D = {w : diwalks D | ∃ k : ℕ • #w = 2k + 1 ∧
        ∀ i, j : 1 .. k | i ≠ j • w(2i) ≠ w(2j)}
    dipaths D = {t : ditrails D | ∃ k : ℕ • #t = 2k + 1 ∧
        ∀ i, j : 0 .. k | i ≠ j • t(2i + 1) ≠ t(2j + 1)}
    dicycles D = {t : ditrails D | ∃ k : ℕ₁ • #t = 2k + 1 ∧ t(1) = t(2k + 1) ∧
        (∀ i, j : 1 .. k | i ≠ j • t(2i + 1) ≠ t(2j + 1))}

Using the directed paths we can define DITREE[V, A], DIPATH[V, A] and DICYCLE[V, A] as subtypes of DIGRAPH[V, A].

DITREE[V, A] == {D : DIGRAPH[V, A] | ∃ r : VD • d−D(r) = 0 ∧
    (∀ v : VD \ {r} • d−D(v) = 1 ∧ (∃ p : dipaths D • p(1) = r ∧ p(#p) = v))}

DIPATH[V, A] == {D : DITREE[V, A] | ∀ v : VD • d+D(v) ≤ 1}

DICYCLE[V, A] == {D : DIGRAPH[V, A] | (∀ v : VD • d+D(v) = 1 ∧ d−D(v) = 1) ∧
    (∀ u, v : VD • ∃ p : dipaths D • p(1) = u ∧ p(#p) = v)}

Vertex r in the definition of a directed tree is the root of the ditree: the path connecting the root and any vertex of the ditree is unique. The path connecting any two vertices of a dicycle is also unique.

With each digraph D, we can associate a graph G on the same vertex set; corresponding to each arc of D is an edge of G with the same ends. This graph is the underlying graph of D. (Conversely, given any graph G, we can obtain a digraph from G by specifying, for each link, an order of its ends. Such a digraph is called an orientation of G.)

We postulate a function atoe from the set of arcs A to the set of edges E that associates with each arc of A an edge, which is the arc without the direction. For example, if arcs represent relationships between some source and target elements, then atoe will associate with every arc an edge that is the relationship between the source and target of that arc, without making a distinction between those two elements. We will use atoe to build the function ugraph, which associates a graph with every digraph.

[V, A, E]
atoe : A ⇸ E
ugraph : DIGRAPH[V, A] → GRAPH[V, E]

∀ D : DIGRAPH[V, A] • ugraph D =
    (VD, ran(AD ◁ atoe), {a : AD • (atoe(a), {first ψD(a), second ψD(a)})})
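The construction of the underlying graph can be sketched in Python. The dictionary `ATOE` plays the role of the postulated function atoe and, like the example digraph, is a hypothetical illustration; arcs map to (tail, head) pairs and edges to the set of their ends.

```python
from typing import Dict, FrozenSet, Hashable, Set, Tuple

PsiD = Dict[Hashable, Tuple[Hashable, Hashable]]
Psi = Dict[Hashable, FrozenSet[Hashable]]


def ugraph(
    vertices: Set[Hashable], psi_d: PsiD, atoe: Dict[Hashable, Hashable]
) -> Tuple[Set[Hashable], Set[Hashable], Psi]:
    """Underlying graph: same vertices, one edge per arc, ends without order."""
    edges = {atoe[a] for a in psi_d}
    psi = {atoe[a]: frozenset(psi_d[a]) for a in psi_d}
    return set(vertices), edges, psi


# Hypothetical digraph x -> y, y -> z and a hypothetical arc-to-edge mapping.
VD = {"x", "y", "z"}
PSI_D: PsiD = {"a1": ("x", "y"), "a2": ("y", "z")}
ATOE = {"a1": "e1", "a2": "e2"}
VG, EG, PSI_G = ugraph(VD, PSI_D, ATOE)
```

The ordered pair of each arc becomes an unordered set of ends, which is all that the direction-forgetting step amounts to in this encoding.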


If S and T are subsets of V and D is a digraph on V, A, we denote by (S, T)D the set of arcs from AD that have their tails in S and their heads in T.

[V, A]
(_, _)_ : 𝔽 V × 𝔽 V × DIGRAPH[V, A] → 𝔽 A

∀ S, T : 𝔽 V, ∀ D : DIGRAPH[V, A] • (S, T)D = dom(ψD ▷ (S × T))

A network N is a digraph D (the underlying digraph of N) with two distinguished subsets of vertices, X and Y, and a non-negative integer-valued function c defined on its arc set AD; the sets X and Y are assumed to be disjoint and nonempty. The vertices in X are the sources of N and those of Y are the sinks of N. They correspond to production centres and markets respectively. Vertices which are neither sources nor sinks are called intermediate vertices; the set of such vertices will be denoted by I. The function c is the capacity function.

[V, A]
NETWORK : ℙ(DIGRAPH[V, A] × 𝔽₁ V × 𝔽₁ V × (A ⇸ ℕ))

∀ N : NETWORK • N = (D, X, Y, c) ⇒ (dom c = AD ∧
    X ∩ Y = ∅ ∧ X ∪ Y ⊂ VD ∧ (VD \ X, X)D = ∅ ∧ (Y, VD \ Y)D = ∅)

The last two predicates state that there are no arcs coming into X from outside X (X is the set of sources), and there are no arcs going from Y to vertices not in Y (Y is the set of sinks). We will denote

DN == first N ∧ VN == VDN ∧ AN == ADN ∧ ψN == ψDN ∧
XN == second N ∧ YN == third N ∧ IN == VN \ (XN ∪ YN) ∧ cN == fourth N

To define a flow in a network we will first need some notation. If f is a real-valued function defined on the arc set AN of N, and if K ⊂ AN, we denote Σ_{a ∈ K} f(a) by f(K). Furthermore, if S ⊂ VN and K is a set of arcs of the form (S, VN \ S)DN, we shall write f+N(S) for f((S, VN \ S)DN), f−N(S) for f((VN \ S, S)DN), f−(v) for f−({v}), and f+(v) for f+({v}) [6].

A flow in a network N is an integer-valued function fN defined on AN such that:

0 ≤ fN(a) ≤ c(a) for all a ∈ AN, and f+N(v) = f−N(v) for all v ∈ IN.

The value fN(a) of fN on an arc a can be likened to the rate at which material is transported along a under the flow fN. The upper bound, called the capacity constraint, imposes that the rate of flow along an arc cannot exceed the capacity of the arc. The other condition, called the conservation condition, requires that, for any intermediate vertex v, the rate at which the material is transported to v is equal to the rate at which the material is transported out of v.

To formally write a flow we have to formalise the notations given above.

[V, A]
f : A → R
f+ : NETWORK[V, A] → (V ⇸ R)
f− : NETWORK[V, A] → (V ⇸ R)

∀ N : NETWORK[V, A] • dom f+N = VN ∧ dom f−N = VN ∧
    ∀ v : VN • f+N(v) = Σ_{a ∈ ({v}, VN \ {v})DN} f(a) ∧
               f−N(v) = Σ_{a ∈ (VN \ {v}, {v})DN} f(a)

We are using the notations f+N and f−N for f+(N) and f−(N), respectively. We will do the same in the next definition of a flow in a network N; we will write flowN for flow(N).

[V, A]
flow : NETWORK[V, A] → (A ⇸ ℕ)

∀ N : NETWORK[V, A] • dom flowN = AN ∧
    ∀ a : AN • flowN(a) ≤ cN(a) ∧
    ∀ v : IN • flow+N(v) = flow−N(v)
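The two conditions on a flow, the capacity constraint and the conservation condition, can be checked directly. The following Python sketch assumes a hypothetical encoding in which arcs map to (tail, head) pairs and capacities and flow values are dictionaries over arc names; the small network is made up for illustration.

```python
from typing import Dict, Hashable, Set, Tuple

Arcs = Dict[Hashable, Tuple[Hashable, Hashable]]


def is_flow(
    vertices: Set[Hashable],
    arcs: Arcs,
    sources: Set[Hashable],
    sinks: Set[Hashable],
    cap: Dict[Hashable, int],
    f: Dict[Hashable, int],
) -> bool:
    """Check 0 <= f(a) <= c(a) on every arc and f+(v) = f-(v) on intermediates."""
    if any(not (0 <= f[a] <= cap[a]) for a in arcs):
        return False

    def f_plus(v):  # total flow on arcs leaving v for another vertex
        return sum(f[a] for a, (t, h) in arcs.items() if t == v and h != v)

    def f_minus(v):  # total flow on arcs entering v from another vertex
        return sum(f[a] for a, (t, h) in arcs.items() if h == v and t != v)

    intermediate = set(vertices) - set(sources) - set(sinks)
    return all(f_plus(v) == f_minus(v) for v in intermediate)


# Hypothetical network: source s, intermediate vertex m, sink t.
VN = {"s", "m", "t"}
AN: Arcs = {"sm": ("s", "m"), "mt": ("m", "t"), "st": ("s", "t")}
CAP = {"sm": 3, "mt": 2, "st": 1}
GOOD = {"sm": 2, "mt": 2, "st": 1}
LEAKY = {"sm": 3, "mt": 2, "st": 0}     # violates conservation at m
TOO_MUCH = {"sm": 3, "mt": 3, "st": 0}  # violates the capacity of mt
```

Only `GOOD` satisfies both conditions; `LEAKY` breaks conservation at the intermediate vertex and `TOO_MUCH` exceeds a capacity.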

3.2 Spatial Elements

The space we will deal with is the two-dimensional space R². Together with the usual metric, Euclidean distance, it forms a metric space (R², ρ). The metric topology generated by the usual metric in R² is called the usual topology (let us denote it τρ). (R², τρ) is a topological space. Spatial elements that we will introduce are subsets of R², and they are subspaces of R² with the relative topology derived from the usual topology. (Any other topology would also be valid for all the definitions and reasoning given below.)³

Properties of space can be described as functions from R² to the domain of values of the property. The property domain may contain values that are measurements belonging to one of the following types: nominal (the values are qualitative and not quantitative ones), ordinal, interval, or ratio [27]. Nominal values can be represented as enumerated types, and this makes the property domain a subset of the real numbers in all cases. Generally we deal only with subsets of space, and to make the function a total function we can extend the property domain with an undefined value, ⊥. Then a property in space can be given as a function f : R² → R ∪ {⊥}, and the property domain is ran f.

A field is (generally) a continuous function from space to an interval of R extended with ⊥; the range of the function is an uncountable set. To get a discrete representation of a field, we try to discretise the range of function values, following the idea that close points in space will have close function values. When ran f is a finite set, we can follow another approach. A total function f : X → Y is completely defined if we can define for every y ∈ ran(f) the set f⁻¹(y) ⊂ X. If the images by f⁻¹ of the elements of ran f are (somehow) regular, e.g. linear features, collections of points, or areas (and the rest of the space has the value undefined), then we have the object representation of space, and we can treat them as functions from the property domain to a set of spatial objects. What we need to do is define what these spatial objects are and how they can be represented. This is our concern in the coming sections.

³ Definitions of topology terms, taken from [26], are given in Appendix A.1.

Figure 3.2: Extended Spatial ER Diagram

We will give the description and the formalisation of the spatial elements using concepts from topology and the graph concepts introduced before. We will define zero-, one-, and two-dimensional primitives, and then with the help of the set constructor (called grouping in some papers, or the star-vertex in the IFO model) we will build new elements from these primitives, which are collections of zero-, one-, and two-dimensional primitives. Considering the frequent use of some special collections of spatial objects, such as paths (e.g. for highways), directed trees in hydrological models, spatial partitions in the administrative division of a country, etc., we need other elements for these collections. We can obtain such elements by setting constraints on the collections. Then, by adding other constraints to the newly created elements, we can get specialisations of these elements.

Figure 3.2 gives a spatial ER diagram (as given in [27]) extended with the elements that will be introduced in this section. In the diagram, rectangles are used for the entity types, diamonds for relationship types and ellipses for the attributes; cardinality constraints are given as a pair of “min .. max” values that denote the participation constraint and the cardinality ratio, respectively (the notation is according to [8]); arrows show subtyping and ⊕⊗ is used for the set constructor. A brief summary of this diagram will be given at the end of the chapter.

The spatial elements will be presented in the following order: first, the zero-dimensional primitive and collections of them will be introduced in section 3.2.1; then, the one-dimensional primitive and the other one-dimensional elements built up using the set constructor and the addition of constraints will be defined in section 3.2.2; finally, the two-dimensional primitive and two-dimensional elements (built up in the same way as the one-dimensional ones) will be given in section 3.2.3. The relation between higher-dimensional primitives and lower-dimensional ones will also be described.

3.2.1 Zero-dimensional elements

Zero-dimensional elements of R2 are Point and PointSet .

• A zero-dimensional primitive, point, is an element of R². A point is a suitable representation for an object for which only the position, not the extent, is of interest [13]. The set of all points is Point == R².

• PointSet is the set of finite collections of points. Thus PointSet == 𝔽 Point, that is, PointSet = 𝔽 R².

3.2.2 One-dimensional elements

One-dimensional elements are the two analogues of the zero-dimensional elements, Line and LineSet, and some others that will be defined by adding constraints to the elements of LineSet. There are cases when we are interested in the direction of movement on a line, and other cases when we just want the set of points that make up a line. For this reason, we will make the distinction between a directed line and a line.

• A one-dimensional primitive, line, is a continuous non-self-intersecting curve. It can be a looped curve (the begin and end of the curve are the same point). A line is (in most cases) an abstraction for ways of moving through space, or for connections through space (roads, rivers, electricity networks, gas lines, etc.) [13]. Figure 3.3.a gives an example of a line, 3.3.b is a looped line and 3.3.c is not an accepted line.

To give the definitions of a directed line and a line we need to define intervals in R.


Figure 3.3: — (a) a line, (b) a looped line, (c) not a line.

[_, _] : R × R ⇸ ℙ R
[_, _[ : R × R ⇸ ℙ R
]_, _[ : R × R ⇸ ℙ R

∀ p, q : R | p < q •
    [p, q] = {r : R | p ≤ r ≤ q}
    [p, q[ = {r : R | p ≤ r < q}
    ]p, q[ = {r : R | p < r < q}

We will now give the definitions of directed lines and lines using continuous functions on I, the closed unit interval in R.

C(I, R²) == {f : [0, 1] → R² | f is continuous}
DLine == {f : C(I, R²) | [0, 1[ ◁ f : [0, 1[ ↣ R² ∧ f(1) ∉ f⦇]0, 1[⦈}

Line == {d : DLine • ran(d)}

A directed line is a function d from the unit interval in R to a point set l ⊂ R². The point set l is a line (an element of Line). The function d lifts the order of the interval [0, 1] to the subset l of R². This order gives a direction of movement along the line l, which is the reason we call d a directed line. DLine is the set of directed lines.

The functions FNode and TNode define the begin and end node (respectively) of a directed line.

FNode : DLine → Point
TNode : DLine → Point

∀ d : DLine • FNode(d) = d(0) ∧ TNode(d) = d(1)

For every line, there are only two functions (directed lines) whose range is this line.

∀ l : Line, ∀ d, d′ : DLine | d ≠ d′ ∧ ran d = l ∧ ran d′ = l •
    (∀ λ : [0, 1] • d′(λ) = d(1 − λ))

For distinct directed lines d, d′ whose range is the line l, it is true that

d(0) = d′(1) ∧ d(1) = d′(0), which implies {d(0), d(1)} = {d′(0), d′(1)}


Figure 3.4: — (a) a collection of lines, (b) a tree, (c) a path, (d) a cycle.

Thus, we can define the set of nodes of a line l as

Node : Line → 𝔽₁ Point

∀ l : Line,∀ d : DLine | l = ran(d) • Node(l) = {d(0), d(1)}

The boundary of a line l is the set Node(l).

• DLineSet is built from DLine in the same way PointSet is built from Point. Thus, DLineSet == 𝔽 DLine is the set of finite collections of directed lines.

From DLineSet we can build LineSet, whose elements are the collections of ranges of the directed lines that are components of DLineSet elements. First, we define a function lines:

lines : DLineSet ⇸ 𝔽 Line

∀ ds : DLineSet • lines(ds) = {d : ds • ran(d)}

then we build LineSet == lines⦇DLineSet⦈. So, LineSet = 𝔽 Line. Figure 3.4.a gives an example of a collection of lines.

For a collection of directed lines we can find all the begin and end nodes of its components.

FNodes : DLineSet → 𝔽 Point
TNodes : DLineSet → 𝔽 Point

∀ ds : DLineSet • FNodes(ds) = ⋃_{d ∈ ds} {FNode(d)} ∧
                  TNodes(ds) = ⋃_{d ∈ ds} {TNode(d)}

We can define the set of nodes of a collection of lines as the union of (the sets of) nodes of its line components.

Nodes : LineSet → 𝔽 Point

∀ ls : LineSet • Nodes(ls) = ⋃_{l ∈ ls} Node(l)


For any collection of directed lines and its corresponding collection of lines we have

∀ ls : LineSet, ∀ ds : DLineSet | ls = lines(ds) •
    Nodes(ls) = FNodes(ds) ∪ TNodes(ds)
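Directed lines, their begin and end nodes, and the reversal d′(λ) = d(1 − λ) can be illustrated with a small Python sketch. It assumes, purely for illustration, that a directed line is given as a piecewise-linear function built by the hypothetical helper `polyline`; the thesis itself allows any continuous non-self-intersecting curve.

```python
from typing import Callable, FrozenSet, Iterable, Sequence, Set, Tuple

Point = Tuple[float, float]
DLine = Callable[[float], Point]  # a function from [0, 1] to the plane


def polyline(points: Sequence[Point]) -> DLine:
    """Hypothetical helper: piecewise-linear directed line through the given points."""
    n = len(points) - 1  # number of segments, assumed >= 1

    def d(lam: float) -> Point:
        if lam >= 1.0:
            return (float(points[-1][0]), float(points[-1][1]))
        i = int(lam * n)   # index of the segment containing lam
        t = lam * n - i    # local parameter within that segment
        (px, py), (qx, qy) = points[i], points[i + 1]
        return (px + t * (qx - px), py + t * (qy - py))

    return d


def fnode(d: DLine) -> Point:
    return d(0.0)


def tnode(d: DLine) -> Point:
    return d(1.0)


def reverse(d: DLine) -> DLine:
    """The other directed line with the same range: lam -> d(1 - lam)."""
    return lambda lam: d(1.0 - lam)


def node(d: DLine) -> FrozenSet[Point]:
    """Node set of the underlying line: one point for a looped line, else two."""
    return frozenset({fnode(d), tnode(d)})


def nodes(ds: Iterable[DLine]) -> Set[Point]:
    """FNodes(ds) united with TNodes(ds)."""
    ds = list(ds)
    return {fnode(d) for d in ds} | {tnode(d) for d in ds}


# Hypothetical examples: an open line and a looped (triangular) line.
OPEN = polyline([(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)])
LOOP = polyline([(2.0, 1.0), (3.0, 2.0), (3.0, 0.0), (2.0, 1.0)])
```

Reversing a directed line swaps its begin and end nodes but leaves the node set unchanged, which is why Node can be defined on lines rather than on directed lines.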

We will define the other one-dimensional elements using the graph specifications given in the previous section. We will first give another representation of collections of lines as graphs built on the sets V = Point and E = Line, and we will use this representation to give the specifications of the subtypes of LineSet. Later we will give another representation for collections of directed lines as directed graphs built on V = Point and A = DLine. Then, similarly to LineSet, we will define subtypes of DLineSet.

Graphs built on sets Point and Line are:

LGRAPH : GRAPH[Point, Line]

∀ G : LGRAPH • VG = Nodes(EG) ∧ ψG = EG ◁ Node

Then with every ls : LineSet we can associate a graph such that its edges are the lines of ls, its vertices are the nodes of ls, and the incidence function associates with every line of ls the set of its nodes. The function graph builds this relation.

graph : LineSet → LGRAPH

∀ ls : LineSet • graph(ls) = (Nodes(ls), ls, ls ◁ Node)
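The function graph can be sketched in Python under a simplifying assumption that is not part of the thesis: each line is represented by a tuple of points (a polyline), of which only the first and last matter for the graph. A looped line then has a single node and contributes two to its degree.

```python
from typing import Dict, FrozenSet, Set, Tuple

Point = Tuple[float, float]
Line = Tuple[Point, ...]  # assumed representation: a polyline as a tuple of points


def node(line: Line) -> FrozenSet[Point]:
    """The set of end points of a line (a single point for a looped line)."""
    return frozenset({line[0], line[-1]})


def graph(ls: Set[Line]) -> Tuple[Set[Point], Set[Line], Dict[Line, FrozenSet[Point]]]:
    """graph(ls) = (Nodes(ls), ls, Node restricted to ls)."""
    psi = {l: node(l) for l in ls}
    vertices: Set[Point] = set()
    for ends in psi.values():
        vertices |= ends
    return vertices, set(ls), psi


def deg(ls: Set[Line]) -> Dict[Point, int]:
    """Degree of each node in graph(ls), a looped line counting twice."""
    vertices, _, psi = graph(ls)
    return {
        p: sum((2 if len(ends) == 1 else 1) for ends in psi.values() if p in ends)
        for p in vertices
    }


# Hypothetical collection: two consecutive segments and a loop at (2, 0).
L1: Line = ((0.0, 0.0), (1.0, 0.0))
L2: Line = ((1.0, 0.0), (2.0, 0.0))
L3: Line = ((2.0, 0.0), (3.0, 1.0), (3.0, -1.0), (2.0, 0.0))
LS = {L1, L2, L3}
NODES, EDGES, PSI = graph(LS)
DEG = deg(LS)
```

The vertices of the resulting graph are exactly the nodes of the collection, so graph-theoretic notions such as tree, path and cycle carry over to collections of lines.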

Using this representation of line collections we can define subtypes on them.

• Tree == {ls : LineSet | graph(ls) ∈ LGRAPH ∩ TREE[Point, Line]}
  Figure 3.4.b gives an example of a tree.

• Path == {ls : LineSet | graph(ls) ∈ LGRAPH ∩ PATH[Point, Line]}
  Figure 3.4.c gives a path.

• Cycle == {ls : LineSet | graph(ls) ∈ LGRAPH ∩ CYCLE[Point, Line]}
  Figure 3.4.d shows a cycle.

We will now define the plane graphs:

PLGRAPH == {G : LGRAPH | (∀ li, lj : EG | li ≠ lj • li ∩ lj = Node(li) ∩ Node(lj))}

• A collection of lines that intersect each other (only) at their ends is

PGraph == {ls : LineSet | graph(ls) ∈ PLGRAPH}

Figure 3.5.a shows a plane graph. (We are using the same name for a graph and a collection of lines, assuming that the context will make clear which we are referring to.)

Figure 3.5: — (a) a plane graph, (b) a partition boundary, (c) a plane cycle, (d) a regionboundary.

Every node of Nodes(ls) is a vertex of the graph graph(ls), thus we can talk about the degree of a node as the degree of the vertex in graph(ls).

deg : LineSet → (Point ⇸ ℕ)

∀ ls : LineSet • dom deg ls = Nodes(ls) ∧
    ∀ p : Nodes(ls) • deg ls(p) = degree graph(ls)(p)

• The spatial partition is a central concept in our perception of space. The boundary of a spatial partition is a plane graph whose edges are the border curves between two regions, and whose vertices are the points where more than two regions of the partition meet [12]. Considering the fact that every looped line has a node whose degree is two, a partition boundary is a plane graph without cut edges whose only nodes with degree two are the nodes of looped lines. A plane graph ls such that no line of it is a cut edge in graph(ls), and the only nodes of degree two are the nodes of looped lines, is:

PBound == {ls : PGraph | no cut edges in graph(ls) ∧
    ∀ p : Nodes(ls) | deg ls(p) = 2 • ∃ l : ls • Node(l) = {p}}

Figure 3.5.b gives a partition boundary (an element of PBound).

• We will define plane cycles using cycles and plane graphs.

PCycle == Cycle ∩ PGraph

Figure 3.5.c gives a plane cycle.

A Jordan curve is a continuous non-self-intersecting curve whose origin and terminus coincide. A Jordan curve J partitions the rest of the plane into two disjoint open sets called the interior and the exterior of J, denoted by Int J and Ext J, respectively. Uρ(O, 1) is the unit disk in R², where O : Point and ρ is the Euclidean distance. To say that a subset A ⊂ R² is homeomorphic with the unit disk we will write A ∼ Uρ(O, 1). For all Jordan curves Int J ∼ Uρ(O, 1).

Figure 3.6: (a) The hierarchy of directed line collections, (b) The hierarchy of line collections

The union of lines of a plane cycle constitutes a Jordan curve:

J : PCycle → Line

∀ c : PCycle • J(c) = ⋃_{l ∈ c} l

For any plane cycle c, J(c) is a Jordan curve and Int J(c) ∼ Uρ(O, 1).

• Another subtype of plane graphs is RegBound, which we will need for the two-dimensional element region; the boundary of a region is of type RegBound.

RegBound == {cs : 𝔽 PCycle, ls : PGraph | ls = ⋃_{c ∈ cs} c ∧
    ∃ c0 : cs • (∀ c : cs \ {c0} • Int J(c) ⊂ Int J(c0) ∧
        ∀ c′ : cs \ {c0, c} • Int J(c) ∩ Int J(c′) = ∅) ∧
    #SCycles graph(ls) = #cs • ls}

(c0 is the outer cycle.) The reasoning behind the constraints put on a region boundary will be given when the two-dimensional primitive region is introduced. Figure 3.5.d gives an example of a region boundary.

From the hierarchy of graph elements we can derive a hierarchy on one-dimensional elements. Figure 3.6.b gives the hierarchy of line collection types.

Directed graphs on sets Point and DLine are:

DiLGRAPH : DIGRAPH[Point, DLine]

∀ D : DiLGRAPH • VD = FNodes(AD) ∪ TNodes(AD) ∧
    ψD = {a : AD • (a, (FNode(a), TNode(a)))}

Figure 3.7: The relation between one-dimensional elements and graphs

For every ds : DLineSet we can build a digraph that has as its set of arcs the directed lines of ds, as its set of vertices the begin and end nodes of the directed lines from ds, and whose incidence function associates with every d ∈ ds the begin and end node of d.

digraph : DLineSet → DiLGRAPH

∀ ds : DLineSet • digraph(ds) = (FNodes(ds) ∪ TNodes(ds), ds, ψds) ∧
    ∀ d : ds • ψds(d) = (FNode(d), TNode(d))

Subtypes of DLineSet can be defined using subtypes of directed graphs.

• DTree == {ds : DLineSet | digraph(ds) ∈ DiLGRAPH ∩ DITREE[Point, DLine]}

• DPath == {ds : DLineSet | digraph(ds) ∈ DiLGRAPH ∩ DIPATH[Point, DLine]}

• DCycle == {ds : DLineSet | digraph(ds) ∈ DiLGRAPH ∩ DICYCLE[Point, DLine]}

• A collection of directed lines whose ranges intersect each other only at their ends is

DPGraph == {ds : DLineSet | digraph(ds) ∈ DiLGRAPH ∧
    ∀ d, d′ : ds, ∀ a, b : [0, 1] | d ≠ d′ ∧ d(a) = d′(b) • a, b ∈ {0, 1}}

These subtypes of DLineSet are connected to the subtypes of LineSet by the function lines. Tree, Path, Cycle, and PGraph are the relational images by lines of DTree, DPath, DCycle, and DPGraph, respectively, e.g. Tree = lines⦇DTree⦈. The hierarchy of directed lines is analogous to the hierarchy of lines. The hierarchy of types introduced here is given in figure 3.6.a.

Figure 3.7 gives the relationship between collections of lines, collections of directed lines, graphs, and digraphs.

3.2.3 Two-dimensional elements

Elements that will be introduced here are regions, finite collections of regions, (quasi-)disjoint regions, and spatial partitions.

A plane graph G partitions the rest of the plane into a number of connected regions; the closures of these regions are called the faces of G. Each plane graph has exactly one unbounded face, called the exterior face. The definition that we will give for a region and a spatial partition is the formalisation of this statement.

• A face is the two-dimensional primitive that we call region. A region is the abstraction of an object for which the position and the extent are relevant [13]. A region is a regular closed set (i.e., it is a set without isolated points or lines, dangling lines, cuts or punctures) of which the interior is a connected set. The set given in figure 3.8.a is not a region because it contains a cut and a dangling line, 3.8.b and 3.8.c are regions (3.8.c is an example of an exterior face that is an unbounded region), but 3.8.d and 3.8.e are not regions because the interior of the sets is not connected.

Figure 3.8: — (a) not a region, (b), (c) regions, (d), (e) not regions

To give the definition of a region we will use sets homeomorphic to the unit disk in R² (we will omit the distance ρ from its notation). The frontier of a set A is denoted by ∂A, its interior is denoted by A°, and its closure is denoted by Ā. The frontier of R² is the empty set. We will use the notation β(A) for the set of lines that constitutes the frontier of A, and we will call it the boundary of A.

Region == {r : ℙ Point, D : iseq₁ ℙ Point | (D(1) ∼ U(O, 1) ∨ D(1) = Point) ∧
    ∀ j : 2 .. #D • D(j) ⊂ D(1) ∧ D(j) ∼ U(O, 1) ∧
    ∀ j, k : 2 .. #D | j ≠ k • D(j) ∩ D(k) = ∅ ∧
    ∀ j, k : dom D | j ≠ k •
        ∂D(j) ∩ ∂D(k) = Nodes(β(D(j))) ∩ Nodes(β(D(k))) ∧
    #SCycles graph(β(r)) = #D ∧ r = D(1) \ ⋃_{j ∈ 2 .. #D} D(j) • r}

D(1) is the whole R², or a set homeomorphic to the closed unit disk. The last predicate states that a region r is a set D(1) with (possibly) a finite number of holes in it, the D(j)’s. When D(1) = R² the region is unbounded, otherwise it is bounded.

The unit disk in R² is a regular open set, and the sets D(j) (j : dom D \ {1}) are regular open sets, being homeomorphic to the unit disk. The union of two regular open sets is not always a regular open set. It is true that (writing cl for the closure of a compound expression):

A = (Ā)° ∧ B = (B̄)° ∧ x ∈ (cl(A ∪ B))° \ (A ∪ B) ⇒ x ∈ ∂A ∩ ∂B.

If ∂A ∩ ∂B : 𝔽 Point then the union A ∪ B is a regular open set. By mathematical induction this can be proved for the union of a finite number of regular open sets. Thus the union of the D(j)’s (j : dom D \ {1}) is a regular open set, because we ask for their boundaries to meet in a finite set of points.

A closed unit disk is a regular closed set, and R² is regularly closed. This implies that D(1) is regularly closed. A region r is the difference of D(1) with the union of the D(j)’s (j : dom D \ {1}), that is, the intersection of D(1) with the complement of the union. The complement of a regular open set is a regular closed set, thus a region r is the intersection of two regular closed sets, which is not always a regular closed set. It is true that:

E = cl(E°) ∧ F = cl(F°) ∧ x ∈ (E ∩ F) \ cl((E ∩ F)°) ⇒ x ∈ ∂E ∩ ∂F.

If ∂E ∩ ∂F : 𝔽 Point and E ∩ F has no isolated points, then the intersection of E and F is a regular closed set. We ask for the intersection of ∂D(1) with any other ∂D(j) to be a finite set of points; then the intersection of the frontier of D(1) with the frontier of the union of the other D(j)’s is again a finite set of points, because the frontier of a union is a subset of the union of the frontiers, the finite union of finite sets is a finite set, and any subset of a finite set is also finite. The union of the D(j)’s (j : dom D \ {1}) is a subset of D(1) and no D(j) (j : dom D) has isolated points, thus the difference has no isolated points. This makes r a regular closed set.

The boundary of a region r is the union of the boundaries of the D(j)-sets (j : dom D) that build it up: β(r) = ⋃j∈dom D β(D(j)). We (also) want the interior of a region r to be a connected set.

β(r) has (at least) as many cycles as there are D sets that form the region r. If there were other cycles in graph(β(r)), they would define other open sets that would be disconnected from the rest of the region. This is why we require the number of cycles in graph(β(r)) to be equal to the number of D sets that form the region r. This also implies that the frontiers of every pair of D sets can meet in at most one point, which is an assertion made about polygons (a subtype of region) in [7].

A Jordan curve (uniquely) defines its interior, which is a point set that is homeomorphic to the unit disk in ℝ². Thus, a set D(j) is defined by its boundary β(D(j)), and a region is defined by the boundaries of the D(j) sets that make it up; this boundary is of type RegBound.

The boundary of a set D, homeomorphic to the unit disk, is of type PCycle, thus β(r) is the union of one or more plane cycles. For all the sets D(j) that make up the region r we want the meeting points of their frontiers to be nodes of the cycles forming the boundaries of these sets D(j), so β(r) : PGraph. Int J (D(1)) includes the interiors of the other cycles, and the predicate on the number of cycles in β(r) is also satisfied. All these make β(r) : RegBound.


Figure 3.9: — (a) a collection of regions, (b) quasi-disjoint regions, (c) spatial partition

An alternate definition of a region would be from its boundary: as the set difference of the closure of the interior of the outer cycle and the interiors of the other cycles.

• RegionSet is built from Region in the same way PointSet and LineSet are built from Point and Line, respectively. Thus RegionSet == 𝔽 Region. Its elements are finite collections of regions. Figure 3.9.a gives a collection of three regions: two of them overlap, two of them touch each other, and the remaining pair is disjoint.

• (Quasi-)disjoint regions are a collection of regions whose interiors are pairwise disjoint and which intersect each other in a finite (possibly empty) set of points from their frontiers.

DisjointRegs == {rs : RegionSet | ∀ r, r′ : rs • r ≠ r′ ⇒
    (r◦ ∩ r′◦ = ∅ ∧ r ∩ r′ = Nodes(β(r)) ∩ Nodes(β(r′)))}

The last predicate assures us that the meeting points of regions are nodes of the line sets constituting their boundaries. Figure 3.9.b gives an example of quasi-disjoint regions.

• A spatial partition is the separation of the plane into regions whose interiors are disjoint, but which share with each other (part of) their frontiers.

SpPartition == {rs : RegionSet |
    (∀ r, r′ : rs | r ≠ r′ • r◦ ∩ r′◦ = ∅ ∧
        r ∩ r′ = ⋃l∈β(r)∩β(r′) l ∪ ⋃p∈Nodes(β(r))∩Nodes(β(r′)) p) ∧
    ⋃r∈rs r = ℝ²}

The last predicate states that the union of the regions of a spatial partition is the whole plane. In every spatial partition there is only one exterior region (unbounded set). The intersection of two regions in a spatial partition is the intersection of their boundaries. The predicate on the intersection of two regions forces the splitting of a line from the boundary of a region at the point where it touches the frontier of another region, or at the ends of a common part.

Figure 3.10: Two-dimensional elements hierarchy

Figure 3.9.c gives an example of a spatial partition. The exterior region is shown in gray; the other regions are white. Figure 3.9.d gives two regions A and B that are part of a spatial partition. Their boundaries are β(A) = {a, f, g, d, i} and β(B) = {a, b, c, d, e}, having the two lines a and d in common. The intersection of their frontiers is ∂A ∩ ∂B = a ∪ d ∪ {3}, that is, two lines and one node. This is not a very common example, but it explains the expression used in the second predicate of the definition of spatial partitions.

The boundary of a spatial partition is

Bound : SpPartition � PGraph

∀ S : SpPartition • Bound(S) = ⋃r∈S β(r)

There are two possibilities for a pair of lines in Bound(S): either they are both in the boundary of one region, in which case they can intersect each other only at their nodes; or they are in the boundaries of two different regions, and, being different, they can intersect each other only at their nodes, because the frontiers of two regions intersect each other at the nodes of their boundary lines (or in lines from the boundary, which is not the case here because the lines are different). Thus, two lines in Bound(S) can intersect only at their ends, i.e., Bound(S) : PGraph. Every line in Bound(S) is in the boundary of a region of S, and as such it cannot be a cut edge (in graph(Bound(S))). If the only nodes with degree two are the nodes of looped lines, then Bound(S) : PBound.

The hierarchy of two-dimensional elements is given in figure 3.10.

Given a plane graph G, one can define another graph G∗ as follows: corresponding to each face f of G there is a vertex f∗ of G∗, and corresponding to each edge e of G there is an edge e∗ of G∗; two vertices f∗ and g∗ are joined by the edge e∗ in G∗ if and only if their corresponding faces f and g are separated by the edge e. The graph G∗ is called the dual of G. The dual of a plane graph is a planar graph [6].

The dual of the graph of a spatial partition boundary represents the adjacency relationship between regions; each edge of the dual graph stands for an adjacency relationship between two region-vertices. This translates region adjacency problems into graph problems, for which fast algorithms exist.
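As a rough illustration of how the dual turns region adjacency into a graph problem, the following Python sketch builds a dual from a table that records, for each boundary line, the two regions it separates. The function names and the input layout are assumptions made for this example only; they are not part of the specification.

```python
from collections import defaultdict


def dual_graph(separates):
    """Build the dual of a plane graph.

    `separates` maps each edge (boundary line) to the pair of faces
    (regions) it separates. This input layout is an assumption of the
    sketch, not part of the formal model.
    """
    vertices = set()
    dual_edges = {}
    for edge, (f, g) in separates.items():
        vertices.update((f, g))
        dual_edges[edge] = frozenset((f, g))  # the dual edge joins f and g
    return vertices, dual_edges


def neighbours(dual_edges):
    """Adjacency lists of the dual: region -> set of adjacent regions."""
    adj = defaultdict(set)
    for ends in dual_edges.values():
        if len(ends) == 2:  # a line between two distinct regions
            f, g = ends
            adj[f].add(g)
            adj[g].add(f)
    return dict(adj)


# Two regions A and B inside an exterior region X (hypothetical data).
separates = {"a": ("A", "B"), "b": ("A", "X"), "c": ("B", "X")}
vertices, dual_edges = dual_graph(separates)
adjacency = neighbours(dual_edges)
```

Once the adjacency lists exist, questions such as "which regions touch A along a line" reduce to dictionary lookups or standard graph searches.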


Two regions of a spatial partition S are adjacent if they share a line l ∈ Bound(S). A line from Bound(S) is in the frontier of exactly two regions r, r′ ∈ S, that is

∀ S : SpPartition • ∀ l : Bound(S) •
    ∃ r : S • ∃₁ r′ : S • (l ∈ β(r) ∩ β(r′) ∧ ∀ r′′ : S \ {r, r′} • l ∩ β(r′′) = ∅)

To any line in the boundary of a spatial partition we can join these two adjacent regions.

Adj == {S : SpPartition, ls : PGraph, r : Region, l : Line |
    ls = Bound(S) ∧ l ∈ ls ∧ r ∈ S • (l, {l : β(r) • r})}

Adj is a partial function, and ADJ == ran Adj is the adjacency relationship in Region. Using this relationship we will define dual graphs of (only a subset of) plane graphs.

Dual : PGraph � PGRAPH [Region,Adj ]

∀ ls : PGraph,∀ S : SpPartition | ls = Bound(S ) •Dual(ls) = (S , ls � Adj , (ls � Adj )� second)

The function Dual is defined on dom Dual = Bound⦇SpPartition⦈. The boundary of a spatial partition forms a plane graph (with no cut edges), and its dual is a graph whose vertices are the regions of the spatial partition and whose edges are the adjacency relationships between these regions. Thus, for every S : SpPartition the dual of G = graph(Bound(S)) is G∗ = Dual(Bound(S)).

3.3 Summary

The diagram of figure 3.2 is (more or less) a complete diagram of the types presented here and their relationships. Some types or some parts of this diagram will be used in the conceptual schemas of spatial applications to represent spatial objects that are relevant for the specific application. The diagram puts in a single schema the relationships between zero-, one-, and two-dimensional primitives and their collections, the hierarchies on one- and two-dimensional collections (given in figures 3.6 and 3.10), and the relationships between higher-dimensional and lower-dimensional primitives: Node, DLine, and Region. The diagram also gives the relationship between directed lines and lines. Because the hierarchy on DLineSet is analogous to the hierarchy on LineSet, it is not presented in this diagram. The relationships BEGIN OF and END OF between DLine and Node are realised by the functions FNode and TNode, respectively. The relationship RANGE between DLine and Line is achieved by ran.

The types introduced here are points and finite collections of points, Point and PointSet, respectively.

Linear features in the plane are lines and directed lines, which are the elements of types Line and DLine. A line is a point set, and a directed line is a point set with an order defined on it. The types LineSet and DLineSet are introduced to describe collections of lines and directed lines, respectively. A collection of lines connected to each other in a linear fashion is a path (an element of Path). When branching is allowed in such collections, trees are obtained, and their type is Tree. If a linear collection is closed, i.e., the beginning of the first element coincides with the end of the last element, a Cycle is obtained. Analogous collections of directed lines, in which all the components follow the same direction, are DPath, DTree, and DCycle. Collections of lines whose components intersect each other only at their ends are of type PGraph. Cycles that satisfy this condition are elements of PCycle. Analogously to PGraph, DPGraph is defined as a subtype of DLineSet.

An areal feature is a region, which is a regular point set (possibly) with holes. Collections of regions are of type RegionSet; the regions of a region set can overlap. Collections of regions that are quasi-disjoint form the type DisjointRegs. A collection of regions that partitions the plane is of type SpPartition. To describe the boundary of a region, the type RegBound is introduced among the linear features; RegBound is a subtype of PGraph. When the frontier of any region of a spatial partition is split up only at the end points of a common part with another region, or at the touching point with another region, the collection of boundary lines of all regions (of the spatial partition) is of type PBound. PBound is also a subtype of PGraph.


Chapter 4

Logical Model

In this chapter we will refine the abstract specifications of spatial elements given in Chapter 3 into concrete specifications that can serve as a basis for defining the structure of new data types. The chapter has three parts: section 4.1 is dedicated to the definition of concrete schemas of graphs; section 4.2 continues with the refinement of the abstract specifications of spatial elements, whose schemas can serve for the definition of (some) spatial data types; section 4.3 deals with the possibilities that these spatial data types offer to better express and control the relationships between spatial entities in a spatial database. We assume that our data types will be implemented in an extensible RDBMS whose data model is NF² (non-first normal form), i.e., it allows set and composite types, supports subtyping (specialisation and generalisation), and offers data structures like lists, trees, etc. We will use database terminology to describe concrete schemas, such as tables and columns (informal terms), or tuples and attributes (formal ones).

4.1 Graph Design Schemas

We will begin by refining the abstract schema of a graph; then, based on this schema, we will give the definition of concrete schemas for the other graph features: trees, paths and cycles. We will do the same for directed graphs and their subtypes: ditrees, dipaths, and dicycles. In some cases the refinement process will take more than one step. We will use this notation for the refined schemas: the name of each intermediate schema will be the name of its abstract schema preceded by an e (the refinement of every specification schema given in 3.1 will be a one- or two-step process); each final design schema will have the name of its abstract schema preceded by c. (We will give the retrieve schemas, the relation between the abstract and concrete schemas, and the proofs for the correct refinement in only a few cases.)

A graph can be represented as a set of free vertices (that are not incident with any edge)and a table that holds the incidence function (a column for the edges and a column for theset of vertices incident with the edges).


[V ,E ]cGRAPH : �V × (E ��1 V )

∀ G : cGRAPH | G = (W, t) •
    W ∩ ⋃s∈ran t s = ∅ ∧ ∀ e : dom t • #t(e) ≤ 2
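The two cGRAPH predicates can be checked mechanically. The sketch below is only an illustration in Python: it assumes the pair (W, t) is held as a set of free vertices and a dictionary from edges to sets of end vertices, and the name is_cgraph is invented for this example.

```python
def is_cgraph(free_vertices, table):
    """Check the two cGRAPH predicates on a pair (W, t).

    free_vertices: set of vertices incident with no edge (W).
    table: dict mapping each edge to the non-empty set of its end
           vertices (t). Both layouts are assumptions of this sketch.
    """
    incident = set().union(*table.values())
    # W must not contain any vertex that is incident with an edge.
    free_are_free = free_vertices.isdisjoint(incident)
    # Every edge has one end vertex (a loop) or two.
    at_most_two_ends = all(1 <= len(ends) <= 2 for ends in table.values())
    return free_are_free and at_most_two_ends


table = {"e1": {"a", "b"}, "e2": {"b", "c"}, "loop": {"c"}}
```

With this table, a free vertex such as "z" is accepted, while listing "a" as free is rejected because "a" is an end of e1.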

cgraph relates the abstract specification of graphs, GRAPH[V, E], to the concrete specification cGRAPH[V, E].

[V, E]
cgraph : GRAPH[V, E] ⤖ cGRAPH[V, E]

∀ G : GRAPH[V, E], ∀ cG : cGRAPH[V, E] | cG = cgraph(G) •
    first cG = V_G \ ⋃e∈E_G ψ_G(e) ∧ second cG = ψ_G

To show that cgraph is a total function we should prove that (G, cG1) ∈ cgraph and (G, cG2) ∈ cgraph imply cG1 = cG2, and that for every G : GRAPH[V, E] there exists cG : cGRAPH[V, E] such that (G, cG) ∈ cgraph. An element of cGRAPH[V, E] is an ordered pair, thus cG1 = (W1, t1) and cG2 = (W2, t2), and they are equal if their co-ordinates are equal. This is directly derivable from the definition of cgraph, as is the fact that every G : GRAPH[V, E] can be related to an element of cGRAPH[V, E] by cgraph.

For every cG = (W, t) of type cGRAPH[V, E], G = (W ∪ ⋃s∈ran t s, dom t, t) is of type GRAPH[V, E] and cG = cgraph G. This makes cgraph a surjection. Let G1 = (V1, E1, ψ1) and G2 = (V2, E2, ψ2) be of type GRAPH[V, E] and let cG1 = (W1, t1), cG2 = (W2, t2) be their images under cgraph. G1 ≠ G2 implies that they differ in at least one of their co-ordinates. If E1 ≠ E2 or ψ1 ≠ ψ2 then t1 ≠ t2, which implies cG1 ≠ cG2. Otherwise V1 ≠ V2; ψ1 = ψ2 implies that the sets of non-free vertices of G1 and G2 are equal, which implies W1 ≠ W2. Thus cgraph is also injective.

To write the degree of a vertex in a graph we will need a function that converts from boolean values to integer values:

BtoI == {(false, 0), (true, 1)}

For a graph G = (W, t) we can define the incidence matrix cInc G as a partial function from the Cartesian product of vertices and edges to the natural numbers, such that for any vertex v and any edge e:

cInc G(v, e) =
    0   if v ∉ t(e)
    1   if v ∈ t(e) and #t(e) = 2
    2   if v ∈ t(e) and #t(e) = 1

The function cInc G(v, e) can be written

cInc G(v, e) =
    0            if v ∉ t(e)
    3 − #t(e)    if v ∈ t(e)

or cInc G(v, e) = (3 − #t(e)) · BtoI(v ∈ t(e))


The complete definition of the incidence matrix (its concrete schema) is:

[V, E]
cInc : cGRAPH[V, E] → ((V × E) ⇸ ℕ)

∀ G : cGRAPH[V, E] | G = (W, t) •
    dom cInc G = (⋃s∈ran t s) × dom t ∧
    ∀ v : ⋃s∈ran t s, ∀ e : dom t • cInc G(v, e) = (3 − #t(e)) · BtoI(v ∈ t(e))

Then for a cGRAPH we can define the degree of each of its vertices:

[V, E]
cDeg : cGRAPH[V, E] → (V ⇸ ℕ)

∀ G : cGRAPH[V, E] | G = (W, t) •
    dom cDeg G = W ∪ ⋃s∈ran t s ∧
    ∀ v : W • cDeg G(v) = 0 ∧
    ∀ v : ⋃s∈ran t s • cDeg G(v) = ∑e∈dom t (3 − #t(e)) · BtoI(v ∈ t(e))

The property of graphs that the sum of the vertex degrees is twice the number of edges is automatically satisfied.

The function cwalks defines walks in a cGRAPH[V, E]:

[V, E]
cwalks : cGRAPH[V, E] → ℙ altseq[V, E]

∀ G : cGRAPH[V, E] | G = (W, t) •
    cwalks G = {s : altseq[⋃s∈ran t s, dom t] | ∃ k : ℕ • #s = 2k + 1 ∧
        ∀ i : 1 . . k • t(s(2i)) = {s(2i − 1), s(2i + 1)}}

ctrails, cpaths, ccycles are defined from cwalks in the same way trails, paths, cycles are defined from walks.

The function subgs builds the subgraphs of a given graph of type cGRAPH[V, E]:

[V ,E ]subgs : cGRAPH [V ,E ]"� cGRAPH [V ,E ]

∀G : cGRAPH [V ,E ] •subgs G = {H : cGRAPH [V ,E ] | first H ⊆ first G ∧ second H ⊆ second G}

Subgraphs of a graph whose vertices and edges form paths or cycles can be defined analogously from subgs, cpaths, and ccycles.

The concrete schema of connected graphs cGRAPH[V, E] is:

eCGRAPH[V, E] == {G : cGRAPH[V, E] | first G = ∅[V] ∧
    ∀ u, v : ⋃s∈ran second G s • ∃ p : cpaths G • p(1) = u ∧ p(#p) = v}


A tree is a connected graph (which means no free vertices) whose number of edges is one less than the number of its vertices. The refined TREE[V, E] is:

[V ,E ]cTREE : E ��1 V

∀ T : cTREE, ∀ G : cGRAPH[V, E] | G = (∅[V], T) •
    #⋃s∈ran T s = #T + 1 ∧
    ∀ u, v : ⋃s∈ran T s • (∃ p : cpaths G • p(1) = u ∧ p(#p) = v)

T is a subset of E × �1 V (from the definition of partial functions). Its cardinality is equal to # dom T, the number of tree edges, because it is a function. The first predicate uses this equality to express the relation between the number of edges and the number of vertices in a tree. The second predicate ensures connectivity.
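To make the two tree predicates concrete, here is a small Python sketch that checks them on an edge table. The dictionary layout (edge to set of end vertices) and the name is_ctree are assumptions of the example; connectivity is tested with a plain graph search rather than by enumerating paths as the schema does.

```python
def is_ctree(table):
    """Check the two cTREE predicates on an edge table T.

    table: dict mapping each edge to the frozenset of its end vertices
    (an assumed layout for this sketch).
    """
    vertices = set().union(*table.values())
    # First predicate: number of vertices = number of edges + 1.
    if len(vertices) != len(table) + 1:
        return False
    # Second predicate: every pair of vertices is joined by a path,
    # checked here by a depth-first search from an arbitrary vertex.
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for ends in table.values():
            if v in ends:
                for w in ends - seen:
                    seen.add(w)
                    stack.append(w)
    return seen == vertices


star = {"e1": frozenset("ab"), "e2": frozenset("ac"), "e3": frozenset("ad")}
triangle = {"e1": frozenset("ab"), "e2": frozenset("bc"), "e3": frozenset("ca")}
# A triangle plus a separate edge: the counts match, connectivity fails.
split = dict(triangle, e4=frozenset("xy"))
```

The last example shows why both predicates are needed: the edge count alone accepts a disconnected graph that contains a cycle.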

Graphs and subgraphs whose table elements are terms of a path are ePATH[V, E], and graphs whose table elements are terms of a cycle are eCYCLE[V, E].

ePATH [V ,E ] == {G : cGRAPH [V ,E ], p : cpaths G | ∃ k : • #p = 2k + 1 •(�[V ], {i : 1 . . k • (p(2i), {p(2i − 1), p(2i + 1)})})}

eCYCLE [V ,E ] == {G : cGRAPH [V ,E ], c : ccycles G | ∃ k : • #c = 2k + 1 •(�[V ], {i : 1 . . k • (c(2i), {c(2i − 1), c(2i + 1)})})}

Paths and cycles can be refined as sequences of E × �1 V that satisfy some conditions.cPATH [V ,E ] is such a representation of paths.

[V ,E ]cPATH : seq(E × �1 V )

∀P : cPATH • ran P ∈ E ��1 V ∧ ∀ i : dom P • #second P(i) = 2 ∧∀ i : 1 . .#P − 1 • second P(i) ∩ second P(i + 1) 6= � ∧∀ i : dom P || i − j |6= 1 • second P(i) ∩ second P(j ) = �

The first predicate ensures that a unique set of vertices is associated with every edge. The second predicate states that there are exactly two vertices associated with each edge. The third predicate states that the edges of two subsequent sequence members share vertices, which ensures connectedness, and the last predicate states that this is true only for subsequent sequence members, which prevents cycles.
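The four cPATH predicates can be read as a validation routine. The Python sketch below assumes a path is given as a list of (edge, set of end vertices) pairs; the function name is_cpath is invented for this illustration.

```python
def is_cpath(seq):
    """Check the four cPATH predicates on a list of (edge, ends) pairs."""
    table = {}
    for edge, ends in seq:
        # 1. each edge is associated with a unique set of vertices
        if table.setdefault(edge, ends) != ends:
            return False
        # 2. exactly two vertices per edge
        if len(ends) != 2:
            return False
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            shared = seq[i][1] & seq[j][1]
            # 3. subsequent members share a vertex
            if j - i == 1 and not shared:
                return False
            # 4. members that are not subsequent share nothing
            if j - i > 1 and shared:
                return False
    return True


path = [("e1", frozenset("ab")), ("e2", frozenset("bc")), ("e3", frozenset("cd"))]
closed = [("e1", frozenset("ab")), ("e2", frozenset("bc")), ("e3", frozenset("ca"))]
```

The closed sequence fails only the last predicate, because its first and third edges meet in the vertex a.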

∃ G : cGRAPH[V, E] • G = (∅[V], ran P) is a statement that is equivalent to the first and second predicates. The last predicate can be replaced with the following statement: #⋃i∈dom P second P(i) = #P + 1.

cCYCLE [V ,E ] is a representation of cycles.


[V ,E ]cCYCLE : seq(E × �1 V )

∀C : cCYCLE • ran C ∈ E ��1 V ∧ ∀ i : dom C • #second C (i) ≤ 2 ∧second C (1) ∩ second C (#C ) 6= � ∧∀ i : 1 . .#C − 1 • second C (i) ∩ second C (i + 1) 6= � ∧∀ i : dom C || i − j |6= 1 ∧| i − j |6= #C − 1 • second C (i) ∩ second C (j ) = �

The first and second predicates ensure that every edge is associated with at most two vertices (loops are allowed). The third predicate states that the edge of the first sequence member shares vertices with the edge of the last sequence member. The fourth predicate states that the edge of each sequence member shares vertices with the edge of the next sequence member. The last predicate states that only the edges of consecutive members, or the edges of the first and the last member, share vertices.

The first, second and last predicates can be replaced with the following:

∃ G : cGRAPH[V, E] • G = (∅[V], ran C) ∧
    ∀ v : ⋃i∈dom C second C(i) • cDeg G(v) = 2
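A cycle check based on the degree condition can be sketched as follows. The Python code assumes the sequence is a list of (edge, set of end vertices) pairs and uses an invented name, is_ccycle; it combines the third and fourth cCYCLE predicates with the requirement that the sequence forms a cGRAPH in which every vertex has degree two.

```python
def is_ccycle(seq):
    """Check a candidate cycle given as a list of (edge, ends) pairs."""
    if not seq:
        return False
    table = {}
    for edge, ends in seq:
        # the sequence must form a cGRAPH table: one or two ends per
        # edge, and one set of ends per edge
        if not 1 <= len(ends) <= 2 or table.setdefault(edge, ends) != ends:
            return False
    # the first and the last member share a vertex
    if not seq[0][1] & seq[-1][1]:
        return False
    # every member shares a vertex with the next one
    if any(not seq[i][1] & seq[i + 1][1] for i in range(len(seq) - 1)):
        return False
    # every vertex has degree two (a loop contributes 2)
    degree = {}
    for ends in table.values():
        for v in ends:
            degree[v] = degree.get(v, 0) + 3 - len(ends)
    return all(d == 2 for d in degree.values())


triangle = [("e1", frozenset("ab")), ("e2", frozenset("bc")), ("e3", frozenset("ca"))]
open_path = [("e1", frozenset("ab")), ("e2", frozenset("bc")), ("e3", frozenset("cd"))]
loop = [("e1", frozenset("a"))]
```

A single loop is accepted, which matches the remark that loops are allowed.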

An efficient digraph representation is a set of free vertices and a table with three columns: one for the arcs, one for the tails of the arcs, and the last for the heads of the arcs.

[V ,A]cDIGRAPH : �V × �(A× V × V )

∀ D : cDIGRAPH | D = (W, T) •
    W ∩ (second⦇T⦈ ∪ third⦇T⦈) = ∅ ∧
    ∀ t, t′ : T | t ≠ t′ • first t ≠ first t′

The last predicate makes T a function from the set of arcs to the Cartesian product of the vertex set with itself (with the difference that a function is of type ℙ(A × (V × V))). Later we will refer to an element of the first column in T as an arc of D; an element of the second column will be called a tail, and an element of the third column will be called a head. We will refer to heads, tails and free vertices as vertices of D.

The relation between the abstract specification of digraphs and their concrete representation is given by:

[V, A]
cdigraph : DIGRAPH[V, A] ⤖ cDIGRAPH[V, A]

∀ D : DIGRAPH[V, A], ∀ cD : cDIGRAPH[V, A] | cD = cdigraph(D) •
    first cD = V_D \ (first⦇ran ψ_D⦈ ∪ second⦇ran ψ_D⦈) ∧
    second cD = {a : A_D • (a, first ψ_D(a), second ψ_D(a))}

Similarly as for cgraph, it can be proven that cdigraph is a bijection.


For D = (W ,T ) of type cDIGRAPH [V ,A], its incidence matrix would be:

∀ r : T, ∀ (v, a) : dom cIncd D | a = first r •

cIncd D(v, a) =
     1   if v = second r and v ≠ third r
    −1   if v ≠ second r and v = third r
     0   otherwise

The function cIncd D can be written

cIncd D(v, a) = BtoI(v = second r) − BtoI(v = third r).

The (complete) concrete schema of the incidence matrix is:

[V, A]
cIncd : cDIGRAPH[V, A] → (V × A ⇸ ℤ)

∀ D : cDIGRAPH[V, A] | D = (W, T) •
    dom cIncd D = (second⦇T⦈ ∪ third⦇T⦈) × first⦇T⦈ ∧
    ∀ r : T, ∀ v : second⦇T⦈ ∪ third⦇T⦈, ∀ a : first⦇T⦈ | a = first r •
        cIncd D(v, a) = BtoI(v = second r) − BtoI(v = third r)

The functions cd−D and cd+D are the concrete representations of the functions d−D and d+D, which calculate the indegree and outdegree of the vertices in a digraph.

[V, A]
cd− : cDIGRAPH[V, A] → (V ⇸ ℕ)
cd+ : cDIGRAPH[V, A] → (V ⇸ ℕ)

∀ D : cDIGRAPH[V, A] | D = (W, T) •
    dom cd−D = W ∪ second⦇T⦈ ∪ third⦇T⦈ ∧ dom cd+D = dom cd−D ∧
    (∀ v : W • cd−D(v) = 0 ∧ cd+D(v) = 0) ∧
    ∀ v : second⦇T⦈ ∪ third⦇T⦈ •
        cd−D(v) = ∑r∈T max{0, BtoI(v = third r) − BtoI(v = second r)} ∧
        cd+D(v) = ∑r∈T max{0, BtoI(v = second r) − BtoI(v = third r)}
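The indegree and outdegree sums can be evaluated directly on a three-column table. The Python sketch below assumes the table is a set of (arc, tail, head) tuples and the free vertices are a separate set; the name in_out_degrees is hypothetical.

```python
def in_out_degrees(free_vertices, table):
    """Indegree and outdegree of every vertex of a digraph (W, T).

    table: set of (arc, tail, head) tuples (assumed layout).
    The sums mirror the max{0, ...} terms of the schema, so a loop arc
    (tail equal to head) contributes to neither degree.
    """
    indeg = {v: 0 for v in free_vertices}
    outdeg = {v: 0 for v in free_vertices}
    vertices = {t for _, t, _ in table} | {h for _, _, h in table}
    for v in vertices:
        indeg[v] = sum(max(0, int(v == h) - int(v == t)) for _, t, h in table)
        outdeg[v] = sum(max(0, int(v == t) - int(v == h)) for _, t, h in table)
    return indeg, outdeg


table = {("a1", "u", "v"), ("a2", "v", "w"), ("a3", "u", "w")}
indeg, outdeg = in_out_degrees({"x"}, table)
```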

The second and third columns of a digraph table hold, respectively, the tails and heads of the arcs of the first column. A vertex can appear several times in the second column, which means it can be the tail of many arcs, and it can appear several times in the third column, which means it can be the head of many arcs. Both columns, the second and the third, are bags of vertices. The multiplicity of any element in the second column is the number of arcs for which it is a tail. The multiplicity of an element in the third column is the number of arcs for which it is the head. The following functions Heads and Tails build bags on V in which the count of each element is this multiplicity.


[V ,A]Heads : �(A× V × V )� bag VTails : �(A× V × V )� bag V

dom Tails = {T : �(A× V × V ) | (∀ r , r ′ : T | r 6= r ′ • first r 6= first r ′)} ∧dom Heads = dom Tails ∧∀T : dom Tails • Tails T = �r : T • second r� ∧ Heads T = �r : T • third r�

The semantics of the bag expressions is¹:

∀ T : dom Tails •
    ∀ v : dom Tails T • Tails T ♯ v = ∑r∈T BtoI(v = second r) ∧
    ∀ v : dom Heads T • Heads T ♯ v = ∑r∈T BtoI(v = third r)
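Bags of tails and heads correspond closely to multisets in a programming language. As an illustration only, the Python sketch below uses collections.Counter for the bags; the table layout (a set of (arc, tail, head) tuples) and the lower-case function names are assumptions of the example.

```python
from collections import Counter


def tails(table):
    """Bag of tails: each vertex counted once per arc it is the tail of."""
    return Counter(t for _, t, _ in table)


def heads(table):
    """Bag of heads: each vertex counted once per arc it is the head of."""
    return Counter(h for _, _, h in table)


table = {("a1", "u", "v"), ("a2", "v", "w"), ("a3", "u", "w")}
tail_bag = tails(table)
head_bag = heads(table)
```

The count of a vertex in each bag is the same sum of BtoI terms that defines the bag semantics.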

The concrete schema of directed walks (in a digraph) is:

[V, A]
cdws : cDIGRAPH[V, A] → ℙ altseq[V, A]

∀ D : cDIGRAPH[V, A] | D = (W, T) •
    cdws D = {s : altseq[second⦇T⦈ ∪ third⦇T⦈, first⦇T⦈] | ∃ k : ℕ • #s = 2k + 1 ∧
        {i : 1 . . k • (s(2i), s(2i − 1), s(2i + 1))} ⊆ T}

The functions cdts, cdps, and cdcs build ditrails, dipaths and dicycles in a digraph, and they are defined from cdws in the same way their abstract equivalents ditrails, dipaths and dicycles are defined from diwalks.

A directed tree can be defined recursively using free types in Z.

Similarly to the computer representations (concrete schemas) of paths and cycles, we will define computer representations of dipaths and dicycles.

[V, A]
cDIPATH : seq(A × V × V)

∀ D : cDIPATH •
    ∀ i, j : dom D | i ≠ j • first D(i) ≠ first D(j) ∧
    ∀ i : 1 . . #D − 1 • third D(i) = second D(i + 1) ∧
    #(dom Tails D ∪ dom Heads D) − 1 = #D

From the type definition of cDIPATH it follows that there are no free vertices in dipaths. The first predicate ensures that an arc is associated with only one pair of vertices. The second predicate states that the head of a sequence member is the tail of the subsequent member. This ensures that D is connected in a linear fashion. The last predicate states that the number of arcs in D is one less than the number of vertices. This predicate (considering connectivity) ensures the acyclicity of D.

¹ We will not write the type parameters of a function when it is clear from the context what the parameters are, e.g., we will write Tails T instead of Tails[V, A] T.

[V, A]
cDICYCLE : seq(A × V × V)

∀ D : cDICYCLE •
    ∀ i, j : dom D | i ≠ j • first D(i) ≠ first D(j) ∧
    ∀ i : 1 . . #D − 1 • third D(i) = second D(i + 1) ∧
    second D(1) = third D(#D) ∧ #(dom Tails D ∪ dom Heads D) = #D

As for dipaths, it follows from the cDICYCLE type that there are no free vertices in dicycles. The first and second predicates are the same as the first and second ones in the definition of computer dipaths. The third predicate guarantees the closing of the dicycle: the tail of the first sequence member is the head of the last one. The last predicate states that the number of vertices is equal to the number of arcs, and it ensures that only subsequent arcs, or the first and the last arc, are adjacent.
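The predicates of cDIPATH and cDICYCLE differ only in the closing condition and the vertex count, which a short sketch makes visible. The Python code below assumes a sequence is a list of (arc, tail, head) tuples, following the column order of the digraph table; all function names are invented for the example.

```python
def _vertices(seq):
    """All tails and heads occurring in the sequence."""
    return {t for _, t, _ in seq} | {h for _, _, h in seq}


def _distinct_and_chained(seq):
    """Arcs are pairwise distinct and each head is the next tail."""
    arcs = [a for a, _, _ in seq]
    distinct = len(set(arcs)) == len(arcs)
    chained = all(seq[i][2] == seq[i + 1][1] for i in range(len(seq) - 1))
    return distinct and chained


def is_cdipath(seq):
    """Dipath: chained arcs, one more vertex than arcs."""
    return _distinct_and_chained(seq) and len(_vertices(seq)) - 1 == len(seq)


def is_cdicycle(seq):
    """Dicycle: chained arcs, closed, as many vertices as arcs."""
    return (bool(seq)
            and _distinct_and_chained(seq)
            and seq[0][1] == seq[-1][2]
            and len(_vertices(seq)) == len(seq))


dipath = [("a1", "u", "v"), ("a2", "v", "w")]
dicycle = [("a1", "u", "v"), ("a2", "v", "w"), ("a3", "w", "u")]
```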

4.2 Spatial Data Types

The spatial elements given formally in chapter 3 will be implemented as abstract data types. (An abstract data type generally consists of a structure definition as well as a definition of operators that are exclusively applicable to this type [25].) What we give here is a basis for building the data structure of these types. The other important part of an abstract data type, the operators, can be defined later considering the structure of the data type.

Having the formal specification of spatial elements, we can proceed with their refinement. As for the abstract specifications of spatial elements, we will find the design schemas of graph types quite useful. We will not pay much attention to the implementation schemas of points and lines, and pass them by with just a few words about their existing implementation. We will concentrate more on the types defined for collections of lines and collections of directed lines, and we will say a few words about the implementation of two-dimensional elements.

In presenting the concrete representations of spatial elements, we follow the order in which their abstract schemas were given. Assuming that there is a finite representation of real numbers in Z, let us call it Q, it is obvious that:

cPoint == Q × Q and cPointSet == 𝔽 cPoint

An implementation of directed lines is a sequence of points, cDLine == seq₁ cPoint, and a function that defines the interpolation between these points, such that the curves obtained from the interpolation between subsequent points do not intersect each other, except for subsequent curves, which meet in their common point from the sequence.

Lines are defined as the sets of points obtained from the interpolation on directed lines. cLine is the spatial type for lines and it is derived from the elements of cDLine, i.e., its elements are not stored, but calculated from the stored directed lines. The simplest (and most commonly used) case is linear interpolation. Considering linear interpolation as the (only) interpolation method used, lines would be:

cLine == {d : cDLine •
    (d, ⋃i∈1..#d−1 {λ : [0, 1] • (x_i · (1 − λ) + x_{i+1} · λ, y_i · (1 − λ) + y_{i+1} · λ)})}
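Since the point set of a line is infinite, an implementation can only sample it. The following Python sketch evaluates the linear interpolation formula at evenly spaced values of λ on each segment; the function name interpolate and the samples_per_segment parameter are assumptions made for this illustration.

```python
def interpolate(dline, samples_per_segment=4):
    """Sample the linear interpolation of a directed line.

    dline: non-empty list of (x, y) points, in order.
    Returns the sampled points; every stored point appears exactly once.
    """
    points = []
    for (x0, y0), (x1, y1) in zip(dline, dline[1:]):
        for k in range(samples_per_segment):
            lam = k / samples_per_segment  # lambda in [0, 1)
            points.append((x0 * (1 - lam) + x1 * lam,
                           y0 * (1 - lam) + y1 * lam))
    points.append(dline[-1])  # lambda = 1 on the last segment
    return points


samples = interpolate([(0, 0), (2, 0), (2, 2)], samples_per_segment=2)
```

With two samples per segment, the result contains the three stored points and the midpoint of each segment.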

The begin and end node of a directed line will be:

cFNode == {d : cDLine, p : cPoint | p = d(1) • (d, p)}
cTNode == {d : cDLine, p : cPoint | p = d(#d) • (d, p)}

The function cNode associates every (directed) line with its ends:

cNode == {d : cDLine • (d , {cFNode(d), cTNode(d)})}.

Before proceeding with the concrete schemas for collections of lines, we make a general remark on all the collection types: collections of points, collections of lines, directed lines, regions and their subtypes. The idea of their implementation is a structure that stores references to the components that constitute the collection. The components themselves, i.e., their geometry, will be stored elsewhere. The data structure of an abstract type for a collection should provide for a fast check of (mostly topological) relationships between collection components. For the collection-of-lines types, the references will be made to the directed lines that define the line components (because there are no stored lines).

Collections of lines are:

cLineSet == {L : cDLine ��1 cNode | L = dom L� cNode}.

The function cDg calculates the degree of every node of a collection of lines.

cDg : cLineSet → (cPoint ⇸ ℕ)

∀ L : cLineSet •
    dom cDg L = ⋃l∈dom L L(l) ∧
    ∀ p : ⋃l∈dom L L(l) • cDg L(p) = ∑l∈dom L (3 − #L(l)) · BtoI(p ∈ L(l))

The refined Tree is eTree, which is a collection of lines that have a tree shape. Its definition uses the properties of the refined (graph) tree. (Its implementation could be a tree structure.)

eTree == {L : cLineSet, G : cGRAPH[cPoint, cLine] | G = (∅[cPoint], L) ∧
    ∀ u, v : ⋃l∈dom L L(l) • (∃ p : cpaths G • p(1) = u ∧ p(#p) = v) ∧
    #L + 1 = #⋃l∈dom L L(l) • L}


The concrete representation of collections of lines that form a path is cPath. It is the refined Path and its definition makes use of the properties of cPATH[V, E].

cPath : seq(cDLine × �1 cPoint)

∀ P : cPath • ran P ∈ cLineSet ∧
    ∀ i : 1 . . #P − 1 • second P(i) ∩ second P(i + 1) ≠ ∅ ∧
    #P + 1 = #⋃i∈dom P second P(i)

The concrete representation of collections of lines that form a cycle is:

cCycle : seq(cDLine × �1 cPoint)

∀ C : cCycle • ran C ∈ cLineSet ∧
    second C(1) ∩ second C(#C) ≠ ∅ ∧
    ∀ i : 1 . . #C − 1 • second C(i) ∩ second C(i + 1) ≠ ∅ ∧
    #C = #⋃i∈dom C second C(i)

The last predicate in the cCycle definition can be replaced by:

∀ p : ⋃i∈dom C second C(i) • cDg C(p) = 2

The concrete representation of collections of lines that intersect only at their nodes is:

cPGraph == {L : cLineSet |
    ∀ d, d′ : dom L | d ≠ d′ • cLine(d) ∩ cLine(d′) = L(d) ∩ L(d′)}

A concrete schema cRegBound for region boundaries will include functions that involve geometric calculations for determining whether the cycles that constitute a cRegBound are inside or outside each other.

The concrete schema of a partition boundary will define a spatial type cPBound such that the only nodes with degree two are the nodes of the looped lines (in the collection of lines that constitute the partition boundary) and there are no lines whose adjacent regions are the same.

The set cDLineSet is the concrete representation of collections of directed lines.

cDLineSet : �(cDLine × cPoint × cPoint)

∀ D : cDLineSet • ∀ d, d′ : D | d ≠ d′ • first d ≠ first d′ ∧
    ∀ d : D • second d = cFNode(first d) ∧ third d = cTNode(first d)

The concrete representation of collections of directed lines that form a directed path is cDPath. Its definition is based on the properties of cDIPATH[V, A].

cDPath : seq(cDLine × cPoint × cPoint)

∀ P : cDPath • ran P ∈ cDLineSet ∧
    ∀ i : 1 . . #P − 1 • third P(i) = second P(i + 1) ∧
    #(dom Tails P ∪ dom Heads P) − 1 = #P


The first predicate is equivalent to the following statements:

∀ i, j : dom P | i ≠ j • first P(i) ≠ first P(j) and
∀ i : dom P • second P(i) = cFNode(first P(i)) ∧ third P(i) = cTNode(first P(i))

cDCycle is the refinement of DCycle.

cDCycle : seq(cDLine × cPoint × cPoint)

∀ C : cDCycle • ran C ∈ cDLineSet ∧
    second C(1) = third C(#C) ∧
    ∀ i : 1 . . #C − 1 • third C(i) = second C(i + 1) ∧
    #(dom Tails C ∪ dom Heads C) = #C

A cRegion will be defined as a boundary that defines the region and a (label) point that is inside the area enclosed by the boundary, which means it is a subset of the Cartesian product cPoint × cRegBound that satisfies the constraint that the point lies inside the area defined by the boundary. The collections of regions and their subtypes can then be defined using the concrete schema of a region and the constraints of the collection type.

4.3 Mapping Rules

The spatial data types introduced in the previous section can be used to describe the shape of entities in the conceptual model of a spatial application; in other words, an entity type will have an attribute that describes its geometry, and the domain of this attribute will be one of the spatial types given in section 4.2. This means they will work at the tuple level, setting constraints that define which tuples are allowed. On the other hand, collection types can be used at the relation level, setting constraints that define the allowed extensions of a possible relation instance (for this entity type).

Before continuing with the translation rules, we give a table of (topological) relationships between spatial primitives: points, lines, and regions, which will be needed to formulate the rules.²

First type | Second type | Topological relationships
cPoint     | cPoint      | =; ≠
cPoint     | cLine       | ∈; ∉
cPoint     | cRegion     | inside; in the boundary; outside
cLine      | cLine       | intersect; share part; disjoint
cLine      | cRegion     | inside; outside; intersects the boundary; is part of the boundary; shares part with the boundary; meets the boundary
cRegion    | cRegion     | the 9-intersection matrix: disjoint, contains, contained, equals, meets, covers, covered by, overlaps

² A complete list of topological relationships between lines and regions can be found in [17]. The list given here suffices for our purpose.


All entity types for which the shape is described by a collection type will be translated into a relation that holds (the attributes of) the entity itself, and another relation that holds the geometry of all the components of the (collection) entities.

A spatial entity type that has a shape of type cPoint will be translated into a single relation.

A spatial entity type with cLine (/cDLine) shape, with a cLineSet (/cDLineSet) extension constraint, and no ‘share part’ intrarelational constraint (a relationship that is defined only between some entities and cannot be defined in the whole entity type extension) will be translated into a single relation schema, which means the geometry will be stored together with the other attributes; otherwise there will be a separate relation for storing the geometry.

A spatial entity type with cRegion shape, a cRegionSet extension constraint, and no ‘meets’, ‘covers’ or ‘covered by’ intrarelational constraint will be translated into a single relation; otherwise the geometry should be kept in a separate relation.

The existence in the conceptual schema of the following relationships also affects the translation to the relational schema: ∈ between cPoint and cLine; ‘in the boundary’ between cPoint and cRegion; and ‘is part of the boundary’, ‘shares part with the boundary’, ‘meets the boundary’ between cLine and cRegion.

Because of different implementation schemas, the hierarchy of spatial data types is not the same as the hierarchy of spatial elements. Figure 4.1 gives a partial hierarchy of spatial data types, including two generalised types: Geo, which indicates any spatial type, and Ext, which is any of the types of one- or two-dimensional elements. The other types, cPath, cCycle, cDPath, cDCycle (not included in the figure for lack of space), are directly under the Ext type in the hierarchy of types.

Figure 4.1: A (partial) hierarchy of spatial data types
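The translation rules of this section, as far as they concern the choice between a single relation and a separate geometry relation, can be sketched as a small decision function. This is a minimal sketch under stated assumptions, not a definitive implementation: the class SpatialEntityType, its field names, and the function needs_geometry_relation are hypothetical, and the rules about relationships between entity types (∈, ‘in the boundary’, etc.) are not covered.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class SpatialEntityType:
    """Hypothetical summary of a spatial entity type in the conceptual schema."""
    name: str
    shape: str                                  # "cPoint", "cLine", "cDLine" or "cRegion"
    shape_is_collection: bool = False           # True if the shape is a collection type
    extension_constraint: Optional[str] = None  # e.g. "cLineSet", "cRegionSet"
    intrarelational: FrozenSet[str] = frozenset()


def needs_geometry_relation(e: SpatialEntityType) -> bool:
    """Return True if the geometry goes into a separate relation."""
    if e.shape_is_collection:
        # Collection shapes: one relation for the entity, one for the components.
        return True
    if e.shape == "cPoint":
        return False
    if e.shape in ("cLine", "cDLine"):
        # Single relation only with the matching set constraint and no 'share part'.
        single = (e.extension_constraint == e.shape + "Set"
                  and "share part" not in e.intrarelational)
        return not single
    if e.shape == "cRegion":
        blocking = {"meets", "covers", "covered by"}
        single = (e.extension_constraint == "cRegionSet"
                  and not (blocking & e.intrarelational))
        return not single
    raise ValueError(f"unknown shape type: {e.shape}")


road = SpatialEntityType("Road", "cLine", extension_constraint="cLineSet")
parcel = SpatialEntityType("Parcel", "cRegion", extension_constraint="cRegionSet",
                           intrarelational=frozenset({"meets"}))
```

With these example entity types (also hypothetical), Road keeps its geometry with its other attributes, whereas Parcel needs a separate geometry relation because of the ‘meets’ constraint.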


Chapter 5

Conclusions

Based on topology concepts, we formally defined zero-, one-, and two-dimensional primitives: Point, Line, and Region. A distinction was made between a directed line and a line: a directed line is an ordered set.

Using the set constructor, collections of points, collections of lines, directed lines, and collections of regions were built. To build special collections of linear objects, i.e., (lines / directed lines) collections that satisfy some constraints, we used graph concepts. First we presented a formal definition of graphs, directed graphs (digraphs), paths, cycles, trees, dipaths, dicycles and ditrees. We gave another representation for collections of lines as graphs, and collections of directed lines as digraphs. Using these representations and graph concepts, we introduced the other one-dimensional elements. Topological constraints were imposed on collections of regions to build new two-dimensional elements.

We defined the relation between spatial primitives and the lower-dimensional primitives. Thus the boundary of a line is a set of points that are the nodes of this line; a directed line has a pair of points as its begin and end node; a region boundary is a collection of lines that satisfy some constraints. RegBound is the type that describes a region boundary.

We refined the formal specifications of the spatial elements introduced, to get schemas that are closer to computer representations. Thus, a point is a pair of real numbers, and a collection of points is a set of pairs. A directed line is a sequence of pairs, and a line is calculated from that sequence using an interpolation function. Collections of lines, collections of directed lines, and their subtypes are structures that store references to their components and take care of the topological relationships between the components. For collections of lines and their subtypes, the references are made to directed lines that resolve the lines (because the lines are not stored, but calculated from the stored directed lines). A region is resolved (determined) from its boundary; thus, it is represented by the (structure of the) collection of lines that makes up its boundary. The types representing collections of regions are structures that store references to the region components and enforce constraints on the collection.

The adjacency relationship between regions was represented as a graph, which translates problems related to adjacency relationships into graph problems.

The (concrete) schemas of spatial elements can be used to define the (data) structure of abstract data types, and the operators can be defined based on this structure. The abstract data types are then used in the conceptual model to describe spatial attributes of (spatial) entity types. The data types of collections of (spatial) elements can also be used in the conceptual model to impose constraints over the extension of a spatial entity type. These spatial data types can be implemented in a nested (NF2) relational model that supports structures such as lists or trees.

Using the topological relationships between spatial primitives and the constraints imposed on the spatial attribute (by its data type) and on the relation extension, we defined a set of rules for translating from an (extended) Entity-Relationship schema to a relational schema.

As a final remark, the formal specifications of graph elements define type constructors, e.g., graph constructor, path constructor, etc., which we applied to build types representing (special) collections of lines / directed lines.


Appendix A

A.1 Topology Concepts

Definition We say two sets A and B intersect each other if A ∩ B ≠ ∅. Otherwise, A and B are disjoint.

Definition A metric space is an ordered pair (M, ρ) consisting of a set M together with a function ρ : M × M → ℝ satisfying, for x, y, z ∈ M:

1. ρ(x , y) ≥ 0,

2. ρ(x , x ) = 0; ρ(x , y) = 0 implies x = y ,

3. ρ(x , y) = ρ(y , x ),

4. ρ(x , y) + ρ(y , z ) ≥ ρ(x , z ) (triangle inequality).

The function ρ is called a metric on M. Functions ρ : M × M → ℝ (which are potential metrics, but which have not yet been tested) are called distance functions.

Fact The distance function

$$\rho\bigl((x_1, \ldots, x_n), (y_1, \ldots, y_n)\bigr) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2}$$

is called the usual metric in ℝⁿ. ℝⁿ together with the usual metric is a metric space.
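To make the four metric axioms concrete, the usual metric can be written down and the axioms spot-checked on a finite sample of points. This is only a sketch: the names usual_metric and satisfies_metric_axioms and the tolerance parameter are assumptions of the example, and a check on finitely many points can refute a candidate distance function but never proves that it is a metric.

```python
import math
from itertools import product
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def usual_metric(x: Point, y: Point) -> float:
    """Euclidean distance between two points of the same dimension."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def satisfies_metric_axioms(rho: Callable[[Point, Point], float],
                            points: Sequence[Point],
                            tol: float = 1e-12) -> bool:
    """Spot-check axioms 1 to 4 on every triple drawn from `points`."""
    for x, y, z in product(points, repeat=3):
        if rho(x, y) < 0:                          # 1. non-negativity
            return False
        if rho(x, x) != 0:                         # 2. rho(x, x) = 0 ...
            return False
        if rho(x, y) == 0 and x != y:              #    ... and only then
            return False
        if abs(rho(x, y) - rho(y, x)) > tol:       # 3. symmetry
            return False
        if rho(x, y) + rho(y, z) < rho(x, z) - tol:  # 4. triangle inequality
            return False
    return True


sample = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (-2.0, 5.0)]


def squared_gap(x: Point, y: Point) -> float:
    """A distance function that is not a metric (triangle inequality fails)."""
    return (x[0] - y[0]) ** 2
```

On the sample, usual_metric passes the check, whereas squared_gap on the points (0), (1), (2) fails the triangle inequality, since 1 + 1 < 4.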

Definition A metric ρ on M is bounded iff for some constant A, ρ(x, y) ≤ A for all x and y in M.

Definition Let (M, ρ) be a metric space, x a point of M. For ε > 0, we define

U_ρ(x, ε) = {y ∈ M | ρ(x, y) < ε}

called the ε-disk about x. When there is no confusion about ρ, we can abbreviate U_ρ(x, ε) to U(x, ε).

Definition A set E in a metric space (M, ρ) is open iff for each x ∈ E there is an ε-disk U(x, ε) about x contained in E. A set is closed iff it is the complement of an open set. Evidently, a set F is closed iff whenever every disk about x meets F, then x ∈ F.

Definition If (M, ρ) and (N, σ) are metric spaces, a function f : M → N is continuous at x in M iff for each ε > 0, there is some δ > 0 such that σ(f(x), f(y)) < ε whenever ρ(x, y) < δ.

Theorem The open sets in a metric space (M , ρ) have the following properties:

1. Any union of open sets is an open set.

2. Any finite intersection of open sets is an open set.

3. ∅ and M are both open.

Definition A topology on a set X is a collection τ of subsets of X, called the open sets, satisfying:

1. any union of elements of τ belongs to τ,

2. any finite intersection of elements of τ belongs to τ,

3. ∅ and X belong to τ.

We say (X, τ) is a topological space, sometimes abbreviated “X is a topological space” when no confusion can result about τ.

Definition If X is a topological space and E ⊂ X, we say E is closed iff X − E is open.

Definition If (X, τ) is a topological space and A ⊂ X, the collection τ′ = {G ∩ A | G ∈ τ} is a topology for A, called the relative topology for A. The fact that a subset of X is being given this topology is signified by referring to it as a subspace of X.

Definition Let (M, ρ) be a metric space. Then (by Theorem ...) the open sets in M form a topology on M, called the metric topology τ_ρ.

Definition If X is a topological space and E ⊂ X, the closure of E in X is the set

$$\overline{E} = \mathrm{Cl}_X(E) = \bigcap \{K \subset X \mid K \text{ is closed and } E \subset K\}$$

$\overline{E}$ is closed, and it is the smallest closed set containing E.

Definition If X is a topological space and E ⊂ X, the interior of E in X is the set

$$E^\circ = \mathrm{Int}_X(E) = \bigcup \{G \subset X \mid G \text{ is open and } G \subset E\}$$

$E^\circ$ is open, and it is the largest open set contained in E.

Definition If X is a topological space and E ⊂ X, the frontier of E in X is the set

$$\partial E = \mathrm{Fr}_X(E) = \overline{E} \cap \overline{X - E}$$

The frontier of E is a closed set.

Theorem For any subset E of a topological space X:

1. $\overline{E} = E \cup \mathrm{Fr}(E)$

2. $E^\circ = E - \mathrm{Fr}(E)$

3. $X = E^\circ \cup \mathrm{Fr}(E) \cup (X - E)^\circ$
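For a finite set X, the definitions of closure, interior and frontier can be evaluated directly, which gives a way to see the theorem above at work on a small example. The sketch below is an illustration under assumptions: the function names and the example topology on {1, 2, 3} are chosen for the example, and the axiom check relies on the topology being finite, so that closure under pairwise unions and intersections suffices.

```python
from itertools import combinations
from typing import FrozenSet, Set

Subset = FrozenSet[int]


def is_topology(X: Subset, tau: Set[Subset]) -> bool:
    """Check the three topology axioms for a finite collection tau."""
    if frozenset() not in tau or X not in tau:
        return False
    return all((a | b) in tau and (a & b) in tau
               for a, b in combinations(tau, 2))


def interior(E: Subset, tau: Set[Subset]) -> Subset:
    """Union of all open sets contained in E."""
    result: Subset = frozenset()
    for G in tau:
        if G <= E:
            result |= G
    return result


def closure(E: Subset, X: Subset, tau: Set[Subset]) -> Subset:
    """Intersection of all closed sets (complements of open sets) containing E."""
    result = X
    for G in tau:
        K = X - G
        if E <= K:
            result &= K
    return result


def frontier(E: Subset, X: Subset, tau: Set[Subset]) -> Subset:
    """Closure of E intersected with the closure of its complement."""
    return closure(E, X, tau) & closure(X - E, X, tau)


X = frozenset({1, 2, 3})
tau = {frozenset(), frozenset({1}), frozenset({1, 2}), X}
E = frozenset({2})
```

In this example the closure of E = {2} is {2, 3}, its interior is empty, and its frontier is {2, 3}, in agreement with items 1 and 2 of the theorem.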

Definition An open subset G in a topological space is regularly open iff G is the interior of its closure, i.e. $G = (\overline{G})^\circ$. A closed subset C is regularly closed iff it is the closure of its interior, i.e. $C = \overline{C^\circ}$.

Facts

1. The complement of a regularly open set is a regularly closed set and vice versa.

2. The intersection, but not necessarily the union, of two regularly open sets is regularly open.

3. The union, but not necessarily the intersection, of two regularly closed sets is regularly closed.

Definition If X is a topological space and x ∈ X, a neighbourhood (abbreviated nhood) of x is a set U which contains an open set V containing x. Thus, evidently, U is a nhood of x iff x ∈ U°.

Definition Let X and Y be topological spaces and let f : X → Y. Then f is continuous at x₀ ∈ X iff for each nhood V of f(x₀) in Y, there is a nhood U of x₀ in X such that f(U) ⊂ V. We say f is continuous on X iff f is continuous at each x₀ ∈ X.

Definition If X and Y are topological spaces, a function f : X → Y is a homeomorphism iff f is bijective and continuous and f⁻¹ is also continuous. In that case we say that X and Y are homeomorphic.

Definition A space X is disconnected iff there are disjoint nonempty open sets H and K in X such that X = H ∪ K. When no such disconnection exists, X is connected.

A.2 Z at work

To add persons to our Birthday Book, we use the schema:

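┌─ AddBirthday ────────────────────────────
│ ΔBirthdayBook
│ name? : NAME
│ date? : DATE
├──────────────────────────────────────────
│ name? ∉ known
│
│ birthday′ = birthday ∪ {name? ↦ date?}
└──────────────────────────────────────────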


ΔBirthdayBook indicates that the state will change after this operation. A trailing ? is the convention for naming input variables. name? ∉ known is a precondition for the success of the operation (and it is reasonable, because each person can only have one birthday). What we expect to happen is that the set of names known to the system is augmented with the new name.

known ′ = known ∪ {name?}

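It can be proved that this happens (using the specification of AddBirthday).

To find the birthday of a person, we use the schema:

┌─ FindBirthday ───────────────────────────
│ ΞBirthdayBook
│ name? : NAME
│ date! : DATE
├──────────────────────────────────────────
│ name? ∈ known
│
│ date! = birthday(name?)
└──────────────────────────────────────────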

ΞBirthdayBook indicates that the state of the system does not change after this operation. A trailing ! is the convention for naming output variables. The precondition for success of the operation is that name? ∈ known.

We also need to specify an initial state of our system, which naturally is an empty Birthday Book, which means known is an empty set. The schema below specifies this initial state.

┌─ InitBirthdayBook ───────────────────────
│ BirthdayBook
├──────────────────────────────────────────
│ known = ∅
└──────────────────────────────────────────

Until now, we have not considered error situations, e.g. we try to add a person who is already in the book, or we look for the birthday of a person who is not in the book. To include those cases as well, we should change our specification (in fact we will only add some other schemas and then combine them). First, let us declare a schema that reports a successful operation.

┌─ Success ────────────────────────────────
│ result! : REPORT
├──────────────────────────────────────────
│ result! = ok
└──────────────────────────────────────────

The schema AddBirthday ∧ Success describes an operation which, for correct input, both acts as described by AddBirthday and produces the result ok.

We will show here a complete specification only for the AddBirthday operation. (An analogous specification can be given for the FindBirthday operation.) We now declare a schema that deals with the case when we try to add an existing name.

┌─ AlreadyKnown ───────────────────────────
│ ΞBirthdayBook
│ name? : NAME
│ result! : REPORT
├──────────────────────────────────────────
│ name? ∈ known
│
│ result! = already_known
└──────────────────────────────────────────

The complete (robust) RAddBirthday operation will be

RAddBirthday = (AddBirthday ∧ Success) ∨ AlreadyKnown.
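To see how the schemas combine, the abstract Birthday Book can be mimicked in a few lines of Python. This is only a sketch of the specification's behaviour, not an implementation prescribed by it: the class and method names are hypothetical, NAME and DATE are represented by strings as an assumption, and the state invariant known = dom birthday is kept by deriving known from the birthday mapping.

```python
from typing import Dict, Set


class BirthdayBook:
    """Abstract state: a partial function birthday from names to dates."""

    def __init__(self) -> None:
        # InitBirthdayBook: known is empty, hence birthday is empty.
        self.birthday: Dict[str, str] = {}

    @property
    def known(self) -> Set[str]:
        # Invariant of the abstract state: known = dom birthday.
        return set(self.birthday)

    def add_birthday(self, name: str, date: str) -> None:
        """AddBirthday: only defined when name? is not in known."""
        if name in self.known:
            raise ValueError("precondition name? not in known is violated")
        self.birthday[name] = date

    def find_birthday(self, name: str) -> str:
        """FindBirthday: only defined when name? is in known."""
        if name not in self.known:
            raise KeyError(name)
        return self.birthday[name]

    def r_add_birthday(self, name: str, date: str) -> str:
        """RAddBirthday = (AddBirthday and Success) or AlreadyKnown."""
        if name in self.known:
            return "already_known"   # AlreadyKnown: state unchanged
        self.add_birthday(name, date)
        return "ok"                  # AddBirthday and Success


book = BirthdayBook()
first = book.r_add_birthday("Ann", "25-Mar")
second = book.r_add_birthday("Ann", "01-Jan")
```

In this run, first is the report ok and second is already_known, and the second call leaves the stored date for Ann untouched.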

The operation RAddBirthday terminates whatever its input is. If the input name? is already known, the state of the system does not change, and the result already_known is returned; otherwise, the new birthday is added to the database as described by AddBirthday, and the result ok is returned.

What we have done until now is the abstract description of our Birthday Book. We should now think of a concrete description of the system. Using arrays to store the names and their birthdays seems a good idea. So the variables

names : array [1 . . ] of NAME;
dates : array [1 . . ] of DATE;

can be a suitable implementation in a programming language (for simplicity, let us not discuss the dimension of the arrays). The name and the birthday of a person have the same index in the two arrays. A mathematical approximation of an array is a function from ℕ₁ (the naturals without 0) to the desired type.

names : ℕ₁ → NAME
dates : ℕ₁ → DATE

Then the concrete state space can be defined as

┌─ BirthdayBook1 ──────────────────────────
│ names : ℕ₁ ↣ NAME
│ dates : ℕ₁ → DATE
│ hwm : ℕ
└──────────────────────────────────────────

The injection names assures us that there are no repetitions among the elements of the names array. hwm (which stands for ‘high water mark’) shows how much of the arrays is in use.


The retrieve schema that defines the relationship between the abstract and the concrete schema is

┌─ Abs ────────────────────────────────────
│ BirthdayBook
│ BirthdayBook1
├──────────────────────────────────────────
│ known = { i : 1 .. hwm • names(i) }
│
│ ∀ i : 1 .. hwm • birthday(names(i)) = dates(i)
└──────────────────────────────────────────

This schema relates the variables of the abstract schema, known and birthday, to the variables of the concrete schema, names, dates and hwm. The first predicate says that the set known consists of just those names which occur somewhere among names(1), . . . , names(hwm). The second predicate says that the birthday for names(i) is the corresponding element dates(i) of the array dates.

The schema AddBirthday1 adds another person to the book. To add a new name, we increase hwm by one, and fill in the name and date in the arrays:

┌─ AddBirthday1 ───────────────────────────
│ ΔBirthdayBook1
│ name? : NAME
│ date? : DATE
├──────────────────────────────────────────
│ ∀ i : 1 .. hwm • name? ≠ names(i)
│
│ hwm′ = hwm + 1
│ names′ = names ⊕ {hwm′ ↦ name?}
│ dates′ = dates ⊕ {hwm′ ↦ date?}
└──────────────────────────────────────────
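The concrete state, the operation AddBirthday1 and the retrieve relation Abs can be tried out together in a short sketch. The code makes several assumptions that are not in the specification: the arrays are modelled as Python dictionaries indexed from 1 (mirroring functions from ℕ₁), NAME and DATE are strings, the state starts with hwm = 0, and the names BirthdayBook1, add_birthday1 and retrieve are chosen for the example.

```python
from typing import Dict, Set, Tuple


class BirthdayBook1:
    """Concrete state: two arrays and a high water mark."""

    def __init__(self) -> None:
        # Assumed starting point: hwm = 0, so no array element is in use.
        self.names: Dict[int, str] = {}
        self.dates: Dict[int, str] = {}
        self.hwm: int = 0

    def add_birthday1(self, name: str, date: str) -> None:
        """AddBirthday1: requires name? to differ from names(1..hwm)."""
        if any(self.names[i] == name for i in range(1, self.hwm + 1)):
            raise ValueError("precondition of AddBirthday1 is violated")
        self.hwm += 1                  # hwm' = hwm + 1
        self.names[self.hwm] = name    # names' = names (+) {hwm' |-> name?}
        self.dates[self.hwm] = date    # dates' = dates (+) {hwm' |-> date?}


def retrieve(c: BirthdayBook1) -> Tuple[Set[str], Dict[str, str]]:
    """Abs: compute the abstract (known, birthday) a concrete state represents."""
    indices = range(1, c.hwm + 1)
    known = {c.names[i] for i in indices}
    birthday = {c.names[i]: c.dates[i] for i in indices}
    return known, birthday


concrete = BirthdayBook1()
known_before, birthday_before = retrieve(concrete)
concrete.add_birthday1("Ann", "25-Mar")
known_after, birthday_after = retrieve(concrete)
```

Comparing the retrieved abstract states before and after the call shows, for this one run, the relation that the abstract AddBirthday prescribes: birthday_after equals birthday_before extended with the new pair.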

AddBirthday1 is a correct implementation of AddBirthday, because of the following two facts:

1. Whenever AddBirthday is legal in some abstract state, the implementation AddBirthday1 is legal in any corresponding concrete state, i.e. it is a correct operation refinement.

2. The final state which results from AddBirthday1 represents an abstract state which AddBirthday could produce, i.e. it is a correct concrete operation.

The operation AddBirthday is legal exactly if its pre-condition name? ∉ known is satisfied. If this is so, the predicate

known = { i : 1 .. hwm • names(i) }

from Abs tells us that name? is not one of the elements names(i):

∀ i : 1 .. hwm • name? ≠ names(i)

This is the pre-condition of AddBirthday1. That means AddBirthday1 is legal whenever AddBirthday is legal.

To prove the second fact, we need to think about the concrete states before and after an execution of AddBirthday1, and the abstract states they represent according to Abs. The two concrete states are related by AddBirthday1, and we must show that the two abstract states are related as prescribed by AddBirthday:

birthday′ = birthday ∪ {name? ↦ date?}

The above equality is on functions that are both of the type NAME ⇸ DATE. To prove that two functions are equal, we should prove that their domains are the same, and that they map each element of the domain to the same element in the target.

The domains of these two functions are the same:

dom birthday′
  = known′                                               [invariant after]
  = { i : 1 .. hwm′ • names′(i) }                        [from Abs′]
  = { i : 1 .. hwm • names′(i) } ∪ {names′(hwm′)}        [since hwm′ = hwm + 1]
  = { i : 1 .. hwm • names(i) } ∪ {name?}                [since names′ = names ⊕ {hwm′ ↦ name?}]
  = known ∪ {name?}                                      [from Abs]
  = dom birthday ∪ {name?}.                              [invariant before]

To prove that the mapping is done correctly, we separate the proof into two parts.

For all i in the range 1 .. hwm,

names′(i) = names(i) ∧ dates′(i) = dates(i).

For any i in this range,

birthday′(names′(i))
  = dates′(i)                  [from Abs′]
  = dates(i)                   [dates unchanged]
  = birthday(names(i)).        [from Abs]

For the new name, stored at index hwm′ = hwm + 1,

birthday′(names′(hwm′))
  = dates′(hwm′)               [from Abs′]
  = date?.                     [spec. of AddBirthday1]

So the two functions birthday′ and birthday ∪ {name? ↦ date?} are equal, and the abstract states before and after the operation are guaranteed to be related as described by AddBirthday.

The concrete operation for FindBirthday is

┌─ FindBirthday1 ──────────────────────────
│ ΞBirthdayBook1
│ name? : NAME
│ date! : DATE
├──────────────────────────────────────────
│ ∃ i : 1 .. hwm • name? = names(i) ∧ date! = dates(i)
└──────────────────────────────────────────

The predicate says that there is an index i at which the names array contains the input name?, and the output date! is the corresponding element of the array dates. For this to be possible, name? must in fact appear somewhere in the array names: this is the pre-condition of the operation.

Since neither the abstract nor the concrete operation changes the state, there is no need to check that the final concrete state is acceptable, but we need to check that the pre-condition of FindBirthday1 is sufficiently liberal, and that the output date! is correct. The pre-conditions of the abstract and concrete operations are in fact the same: that the input name? is known. The output is correct because for some i, name? = names(i) and date! = dates(i), so

date!

= dates(i) [spec. of FindBirthday1]

= birthday(names(i)) [from Abs ]

= birthday(name?). [spec. of FindBirthday1]
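The existential predicate of FindBirthday1 corresponds to a linear search over the part of the arrays that is in use. The following sketch shows one possible reading, assuming (as an illustration only) that the arrays are dictionaries indexed from 1 and that NAME and DATE are strings; the function name find_birthday1 is hypothetical.

```python
from typing import Dict


def find_birthday1(names: Dict[int, str], dates: Dict[int, str],
                   hwm: int, name: str) -> str:
    """Return dates(i) for an index i in 1..hwm with names(i) = name?."""
    for i in range(1, hwm + 1):
        if names[i] == name:
            return dates[i]
    # Pre-condition violated: name? does not occur among names(1..hwm).
    raise KeyError(name)


names = {1: "Ann", 2: "Bob", 3: "Eve"}
dates = {1: "25-Mar", 2: "03-Dec", 3: "17-Jul"}
bob_date = find_birthday1(names, dates, 3, "Bob")
```

Because names is an injection, at most one index can match, so returning at the first match loses nothing. Entries above hwm are ignored: calling the function with hwm = 2 and the name Eve raises KeyError.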

Let us define now the schema for the initial state of the program:

┌─ InitBirthdayBook1 ──────────────────────
│ BirthdayBook1
├──────────────────────────────────────────
│ hwm = 0
└──────────────────────────────────────────

This correctly represents the initial abstract state (it is a correct initial concrete state), because

known
  = { i : 1 .. hwm • names(i) }      [from Abs]
  = { i : 1 .. 0 • names(i) }        [from InitBirthdayBook1]
  = ∅.                               [since 1 .. 0 = ∅]

Bibliography

[1] Serge Abiteboul and Richard Hull. IFO: A Formal Semantic Database Model. In Gio Wiederhold, editor, ACM Transactions on Database Systems, pages 525–565. Association for Computing Machinery, December 1987.

[2] Herman Balsters, Rolf de By, and Roberto Zicari. Typed Sets as a Basis for Object-Oriented Database Schemas. In Oscar M. Nierstrasz, editor, Proc. European Conf. on Object-oriented Programming (ECOOP’93), pages 161–184. Springer-Verlag, 1993.

[3] Rosalind Barden, Susan Stepney, and David Cooper. Z in Practice. Prentice Hall, Cambridge, England, August 1994.

[4] T. Bittner and A.U. Frank. An Introduction to the Application of Formal Theories to GIS. In F. Dollinger and J. Strobl, editors, Informationsverarbeitung IX (AGIT), pages 11–22. Institut fuer Geographie and Universität Salzburg and Salzburger Geographische Materialien, 1997.

[5] T. Bittner and A.U. Frank. On Representing Geometries of Geographic Space. In T.K. Poiker and N. Chrisman, editors, 8th Int. Symposium on Spatial Data Handling, pages 111–122. International Geographic Union, July 1998.

[6] J.A. Bondy and U.S.R. Murty. Graph Theory with Applications. The Macmillan Press Ltd, University of Waterloo, Ontario, Canada, 1976.

[7] Open GIS Consortium. Open GIS Simple Feature Specification for OLE/COM. Technical Report Document 99-050, Open GIS Project, 1999.

[8] Rolf A. de By. Database Design. Course Notes. International Institute for Aerospace Survey and Earth Sciences (ITC), 1998.

[9] Rolf A. de By and Rom Langerak. Specificatie-Methoden. Universiteit Twente, Faculteit der Informatica, August 1996.

[10] Antoni Diller. An Introduction to Formal Methods. John Wiley & Sons, School of Computer Science, University of Birmingham, February 1990.


[11] Martin Erwig and Ralf Hartmut Güting. Explicit graphs in a functional model for spatial databases. IEEE Transactions on Knowledge and Data Engineering, 6(5):787–804, October 1994.

[12] Martin Erwig and Markus Schneider. Partition And Conquer. Technical Report CH-97-07, Chorochronos, 1997.

[13] Ralf Hartmut Güting, Michael H. Böhlen, Martin Erwig, Christian S. Jensen, Nikos A. Lorentzos, Markus Schneider, and Michalis Vazirgiannis. A Foundation for Representing and Querying Moving Objects. Technical Report CH-98-03, Chorochronos, 1998.

[14] Thanasis Hadzilacos and Nectaria Tryfona. An Extended Entity-Relationship Model for Geographic Application. SIGMOD Record, 26(3), September 1997.

[15] P. Haunold, A.U. Frank, S. Grumbach, G. Kuper, and Z. Lacroix. Geometric Objects Represented by Inequalities. In F. Dollinger and J. Strobl, editors, Angewandte Geographische Informationsverarbeitung IX (AGIT), pages 77–86. Institut fuer Geographie, Universitaet Salzburg, Salzburger Geographische Materialien, July 1997.

[16] D.M. Mark and A.U. Frank. Experiential and Formal Models of Geographic Space. Environment and Planning, Series B(23), 1996.

[17] Martien Molenaar. An Introduction to the Theory of Spatial Object Modelling. Taylor & Francis, International Institute for Aerospace Survey and Earth Sciences (ITC), Enschede, The Netherlands, 1998.

[18] C. Parent, S. Spaccapietra, E. Zimányi, P. Donini, C. Plazanet, and C. Vangenot. Modeling Spatial Data in the MADS Conceptual Model. In Proceedings of the 8th Int. Symp. on Spatial Data Handling, SDH’98, July 1998.

[19] C. Parent, S. Spaccapietra, E. Zimányi, P. Donini, C. Plazanet, C. Vangenot, N. Rognon, J. Pouliot, and P. A. Crausaz. MADS: un modèle conceptuel pour des applications spatio-temporelles. Revue Internationale de Géomatique, 7(3-4), 1997.

[20] Dieter Pfoser and Nectaria Tryfona. Requirements, Definitions, and Notations for Spatiotemporal Application Environments. Technical Report CH-98-09, Chorochronos, 1998.

[21] Rosanne Price, Nectaria Tryfona, and Christian S. Jensen. A Conceptual Modeling Language for Spatiotemporal Applications. Technical Report CH-99-20, Chorochronos, 1999.

[22] J.M. Spivey. The Z Notation. Prentice Hall, Oxford University Computing Laboratory, September 1988.

[23] Nectaria Tryfona and Thanasis Hadzilacos. Evaluation of Database Modeling Methods for Geographic Information Systems. Technical Report CH-97-05, Chorochronos, 1997.


[24] Nectaria Tryfona and Christian S. Jensen. Conceptual Data Modeling for Spatiotemporal Applications. Technical Report CH-98-08, Chorochronos, 1998.

[25] Gottfried Vossen. Data Models, Database Languages and Database Management Systems. Addison-Wesley, Rheinisch-Westfälische Technische Hochschule Aachen, 1990.

[26] Stephen Willard. General Topology. Addison-Wesley, University of Alberta, April 1970.

[27] Michael F. Worboys. GIS: A Computing Perspective. Taylor & Francis, Department of Computer Science, University of Keele, Keele, UK, 1998.
