ralph-johan back | mikołaj olszewski | damián...

24
Deseo Meeting Scheduler: Database Schema with Verified Business Logic Ralph-Johan Back | Mikołaj Olszewski | Damián Soriano TUCS Technical Report No 937, March 2009

Upload: vannga

Post on 29-Mar-2018

215 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints

Deseo Meeting Scheduler:

Database Schema with Verified

Business Logic

Ralph-Johan Back | Mikołaj Olszewski | Damián Soriano

TUCS Technical Report No 937, March 2009

Page 2: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints

TUCS Technical Report

No 937, March 2009

Deseo Meeting Scheduler: Database Schema with

Verified Business Logic

Ralph-Johan Back Åbo Akademi University, Department of Computer Science

Mikołaj Olszewski

Åbo Akademi University, Department of Computer Science

TUCS Graduate School

Damián Soriano

Åbo Akademi University, Department of Computer Science

National Unversity of Rosario, Faculty of Exact Sciences, Engineering and Surveying

Page 3: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints

Abstract

Database applications are the significant part of the software market. With the

expansion of Web 2.0 and web services the need for a well defined database schema and

its correct behaviour is crucial. Techniques of formal verification are usually expensive,

time consuming and require mathematical skills and knowledge, therefore are not used

in the development of simple, not critical systems. A lightweight formal verification

method of database schema and its integrity constraints is needed in order to be

applicable to such systems. There exist such methods for object-oriented software, but

they are not directly applicable to database development. In this paper we try to address

this issue. We present a case study we used to develop a lightweight formal verification

method for database application. The technique we propose is based on simple

mathematical concepts and utilises automated theorem proving system, PVS.

Keywords: formal methods, theorem proving, verification, database applications, PVS

TUCS Laboratory

Software Construction Laboratory

Page 4: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints

1

1. Introduction

A typical software project starts with a set of informally stated properties

(requirements), which evolve to semi-formally or formally stated features and

constraints of the developed system. These in sequence are transformed into a model of

the data that is implemented to produce an executable code. The specification is then

used to create a test suite that the program is run against in order to ensure it meets the

requirements. Often mistakes, misunderstandings or bugs are found during testing, what

results in repeated implementation and testing phases.

As opposed to that approach, there are several methods that aim at producing

software that is correct by construction. Most of them require the developers to start

with a formal specification that is written in a more or less mathematical language. The

specification is used to construct a program that is correct in terms of behaviour, but the

final step, the translation from specification to executable code is done mostly manually

and thus is prone to errors.

The concept of constructing correct software is not new. It has been considered

by Dijkstra [1], van Emden [2], Reynolds [3] or Back [4] in different forms, where

software means a complete computer program that can be executed. A relational

database, that is more a set of structural and behavioural properties, is not executable

and therefore this approach can be applied only to some aspects of the database, like

stored procedures and triggers, and not the database as a whole. However, there are

several techniques that map databases to object-oriented software, with Active Record

[5] being one of the most significant. Such patterns in theory allow using the approaches

from object-oriented systems to construct correct by construction databases, but the

correctness only concerns the structure of the database.

Automated theorem proving is often used as a support for formal specification

and verification due to the fact that even the simplest specification can lead to complex

theorems. PVS Specification and Verification System [6] is a language integrated with

support tools and a theorem proving program and can be used to formally state the

properties of the system and prove them correct. PVS does not place additional

requirements on the system to be verified; therefore it is suitable for any system

development. It has been successfully used in the field of invariant based programming

[7], especially in the SOCOS programming platform [8].

Recently verifying entity relationship models in PVS has been investigated by

V. Choppella et al. [9] The authors focused on formally specifying entity relationship

diagrams, an established way of doing data modelling. There are also established

methods of verifying the database invariants at the level of transactions [10]. Our work

attempts to combine and extend both these solutions. We are investigating a method of

possibly automated verification of the database integrity constraints in order for the

database to provide correct functionality for client applications.

The rest of the paper is organised as follows. In the next section we describe the

case study application used to form our theory and briefly cover principles of database

applications design. In section 3 we develop a formal model based on an example table

from our case study. We show how properties of the system can be proven in section 4.

We end the paper with conclusions and directions for our future work.

Comment [RB1]: Don‟t get this

Comment [RB2]: The paper requires a

short introduction to the data base language

and to PVS. Otherwise it becomes unreadable. These should be part of a

preliminary section, where all notation

introduced in PVS or SQL is briefly explained.

Comment [RB3]: Need to rework the intro, it does not really introduce the topick

and the reference to existing work is too scarce and missplaced.

Page 5: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints

2

2. Database application development

The process of developing database applications does not differ much from

developing regular, non-database software. There is an extra step of database

development and some extra effort in the testing and integration phases. In smaller

projects, the databases needed for the application are usually defined before the client

application is created. This means that the application software can use the databases

from the start.

2.1. Requirements analysis

The Deseo Meeting Scheduler is an application that allows its users to schedule

meetings. The users access a central database by connecting with client applications

(web interface or standalone software) in order to create and manage meetings. The

lifecycle of a meeting is presented in Figure 1.

Figure 1. Lifecycle of a meeting.

The requirement analysis phase focuses mostly on the general aspect of the data

to be stored by the database. The initial specification is examined in order to identify

logical entities and the dependencies between them. In the case of Deseo this leads us to

the preliminary entity relationship diagram shown in Figure 2.

2.2. Data modelling

The outcome of the requirements analysis phase, the initial entity relationship

diagram, is used in the data modelling phase. The entities and dependencies are mapped

into corresponding tables and relations, to which attributes are added. More detailed

Create meeting

•The user creating a meeting becomes its host

Set meeting time proposals

•Each proposal defines a starting time for the meeting

Invite other users

•Invited users can give positive or negative answers to the proposals

Wait for answers

•Positive answer means that the user can attend the meeting

Confirm the meeting

•All invited users have agreed to some proposal

Page 6: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints

3

analysis of the initial requirements then enables us to derive constraints and integrity

rules for the system [11]. These are presented in Figure 3.

Figure 2. Relations between entities.

Figure 3. Entity constraints.

•A user can be the host of a meeting

Users

•Each meeting must be hosted by a user

•A meeting can have only one host

Meetings

•An invitiation is issued to a user

•A user can have only one invitation for a meeting

•Once issued the meeting invitation cannot change

•The invitation can be cancelled by the host or by the invited user at any time

Invitations

•A proposal refers to a meeting

•A proposal defines the starting time of the meeting

•Different proposals for the same meeting must define different starting time

Proposals

•An answer reflects the invited users' attitute towards proposals

•The answer can be positive or negative

•A positive answer can be given only if the user has no other conflicting meetings

Answers

Page 7: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints

4

The above constraints can be divided into two equally important groups. Some

constraints describe the structure of the model – they provide details of the relations

between the entities and the basic types of the data. The remaining properties define the

general behaviour of the database, i.e. the responses the database should make after or

before certain operations take place. This division is clearly visible in the development

process of any database application and influences both the design and the formal

modelling we describe in the following parts of this paper.

2.3. Database design

An important decision that has to be made is how the functionality should be

split between the data base server and the data base client. This decision has a major

impact on the project and is usually very hard to change later. Therefore, it should be

made as early in the development process as possible and such that the probability of

change is small [11].

There are two extreme approaches to the design of the database applications.

One is to reduce the role of a database to only storing the data, while its validation is

performed entirely by client applications. The alternative is that the database also

performs all the computations related to the business logic of the software, so the role of

the client application is restricted to user interaction. In practice, we usually aim for a

solution somewhere between these two extremes, so that the database ensures some

essential data integrity, while client applications perform the remaining checks.

Our case study was designed so that the crucial part, i.e. scheduling of meetings,

is done by the database, whereas user authorisation is left for implementation at the

client application site. This division allows us to prove the core business logic without

depending on the client software. We decided to use PostgreSQL [12] as the database

management system, because it has good support for stored procedures, a feature that

we will need quite a lot.

2.4. Structured Query Language (SQL)

Relational database management systems provide a way to store data and related

procedures. In order to obtain them from the database, the client application must use

Structured Query Language. The language grammar consists of five parts:

clauses;

expressions that produce either simple (scalar) values or tables (collections

of rows and columns);

predicates, which specify conditions that can be evaluated to three-valued

logic and that affect the results of queries and statements;

queries that retrieve data from the database;

statements, which may have a persistent effect on database or the data.

The elements of the language are presented in Figure 4Figure 4 [13]. Although the

language is an ISO standard, due to the size of the standard and its complexity, majority

of relational database management systems provide what is a reasonable

implementation of the standard.

Comment [RB4]: I don‟t see this

division in the design up till now.

Page 8: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints

5

Figure 4. Elements of SQL language.

The statement presented on Figure 4Figure 4 is an update query – executing this

statement will change the contents of table country. More precisely, a value of column

population will increase by 1 if value of column name equals to a string „USA‟.

3. Formal modelling

The design of the database follows the principles of Stepwise Feature

Introduction [14] – the functionality is introduced to the database in small steps

(refinements), typically one feature at time. The method we present supports this

approach – the model can be proven correct at any time of the design process. The

refined version of the operation, together with its new constraints and triggers, replaces

the more general one; thus it is possible to reuse the old proofs and focus verification

effort on newly added features.

We illustrate our formalisation technique with an example from the case study.

In the rest of this section we operate on the answers table due to a number of properties

it has. Firstly, it references two other tables, proposals and users. Secondly, the table

plays an important role in preserving an invariant of the system, as positive answers can

be given only provided that there are no conflicting meetings. Furthermore, only invited

users are allowed to give answers. And finally, it has a non-trivial behavioural

dependency, as answers must be removed when the invitation for the related meeting is

removed.

3.1. Structure of the database

The structure of the database is given by the collection of tables (i.e., the data

model). The SQL code that constructs the answers table is presented in Listing 1Listing

1.

1. CREATE TABLE answers ( 2. id integer primary key, 3. user_id integer REFERENCES users(id) on delete cascade not null, 4. proposal_id integer REFERENCES proposals(id) on delete cascade not

null, 5. ok boolean default false 6. );

Listing 1. SQL code that constructs answers table.

Formatted: Font color: Text 1

Page 9: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints

6

The primary key (defined in line 2) id identifies uniquely each row of the table.

The field can hold integer numbers, and the value of the field does not change once set

for a record. We usually associate a sequence of numbers with the primary key field,

such that every time a new record is inserted, the new value for the id field is obtained

from it. There are in principle no restrictions on the type of this field – it can store any

type of data, as long as no two different rows have the same key. It is also possible to

change the value of the primary key. Relational database management systems store the

records as (unordered) sets.

The answers table also references users (line 3) and proposals (line 4), as an

answer must contain the information about the user that gives it and the proposal it

concerns. The on delete cascade directive ensures that the answer will be deleted

automatically when the referenced row (either in users or proposals table) is removed.

The not null constraint requires that the newly created or updated invitation does

refer to valid rows in the other tables (users table or proposals table).

The primary key is used to identify the record uniquely, so it can be used to

obtain the values of the fields in the record. Thus we can model a field in a table as a

function from the primary key to the value of the field. This gives us the following

formal specification of the answers table, expressed in PVS (Listing 2Listing 2).

1. answerId: TYPE+ 2. answer_id_type: TYPE = set[answerId] 3. answer_user_id_type: TYPE = [answerId -> userId] 4. answer_proposal_id_type: TYPE = [answerId -> proposalId] 5. answer_ok_type: TYPE = [answerId -> bool] 6. 7. answer_table_type: TYPE = [# 8. idnt: answer_id_type, 9. user_id: answer_user_id_type, 10. proposal_id: answer_proposal_id_type, 11. ok: answer_ok_type 12. #]

Listing 2. The PVS model for the answers table.

In line 1 we introduce a type for the values of primary key. Line 2 defines a

primary key of the table as a set. This is the set of all primary key values that denote an

existing row in the table. It is important to note that our model does not enforce the type

of value for the primary key. Lines 3 and 4 introduce fields referring to primary keys of

other tables, users and proposals respectively. In line 5 we declare the Boolean field

holding the user‟s attitude towards the referenced proposal. Lines 7-12 aggregate all the

fields to create a table type that is used in the rest of the model.

All tables in the database are descried formally in a similar way. We then declare

the database schema as a PVS record, where each field corresponds to one table (Listing

3Listing 3).

1. database_type: TYPE [# 2. users: user_table_type, 3. meetings: meeting_table_type, 4. invitations: invitation_table_type, 5. proposals: proposal_table_type, 6. answers: answer_table_type

Page 10: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints

7

7. #] Listing 3. The declaration of the database in PVS.

3.2. Modelling simple operations

The structure of the database in itself provides only a model for the data. The

dependencies between entities are stated and the type of data is set. However, the

structure does not provide the behavioural properties and constraints that we demand

from our system, these are described by the operations performed on the data. All of

them, regardless of how complex they are, are translated to a sequence of four basic

operations: select, update, delete and insert (also known as CRUD, for create, read,

update, and destroy [15][16]). We have to formally define these basic operations as

well, so that it is possible to state properties of the system, reason about the correctness

of both specification and implementation and prove that the properties hold when the

basic operations are executed.

3.2.1. Select

The select operation is used to read data from the table. The use of a predicative

part of the query (also called the where clause) allows the query to return only those

rows, for which the predicate is true. Since the query operates on a specific table, the

returned result depends on the type of the table in question. The selection predicate is

described as shown in Listing 4Listing 4: it is a predicate on the fields of the answers

table.

1. selector_answer_type: TYPE = 2. [[answerId, userId, proposalId, bool] -> bool]

Listing 4. The selector for the answers table.

With the selector we can model the operation that selects proper rows from the

table (Listing 5Listing 5). The return value of the operation matches the type of the table

the query is executed on. Note that contrary to the SQL behaviour, it is not possible to

limit the query to return only certain columns.

1. select_answer(db: database_type, selector: selector_answer_type): 2. answer_table_type = 3. (# idnt := {x: answerId | 4. member(x, idnt(answers(db))) AND 5. selector(x, user_id(answers(db))(x), 6. proposal_id(answers(db))(x), 7. ok(answers(db))(x))}, 8. user_id := user_id(answers(db)), 9. proposal_id := proposal_id(answers(db)), 10. ok := ok(answers(db)) 11. #)

Listing 5. The PVS model for the select operation on the answers table.

The function returns the set of row identifiers in the answers table of the data

base (answers(db)), for which the selector predicate is satisfied (lines 3-7). The value of

the primary key is used to determine the values of the remaining columns (lines 8-10) in

the returned rows. The model thus reflects the behaviour of the underlying SQL query.

Comment [Miki5]: The value of primary key is the only one limited. The

functions are just copied, so in theory it is possible to obtain every value.

Comment [RB6]: This does not sound right. In stead: the other fields are assigned

the corresponding access functions from the input data base. (Not very clear this either).

Page 11: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints

8

Whenever there is a need for selecting some records, we first need to construct a

selector, before calling the select function, and then pass the selector to the select

operation. For example, selecting all answers by one specific user U can be done with

the code shown in Listing 6Listing 6. The selector is defined in lines 1-2; the Boolean

condition in line 2 exactly matches the condition we want the rows to satisfy. Line 3

executes the select function.

1. selector(ans: answerId, usr: userId, pr: proposalId, ok: bool): bool = 2. (usr = U) 3. select_answer(db, selector)

Listing 6. Example use of the selector and the select function.

1.2.2. Delete

Deletion removes the rows from the specified table. Again, a predicate is used to

determine the conditions that must be fulfilled in order for the row to be deleted. After

the deletion has taken place, the database is in the new state – the rows affected by the

query are removed. The formal model of delete is shown in Listing 7Listing 7.

1. delete_answer(db: database_type, selector: selector_answer_type): 2. database_type = 3. LET new_answers = (# 4. idnt := {x: answerId | member(x, idnt(answers(db))) AND 5. NOT selector(x, user_id(answers(db))(x), 6. proposal_id(answers(db))(x), 7. ok(answers(db))(x))}, 8. user_id := user_id(answers(db)), 9. proposal_id := proposal_id(answers(db)), 10. ok := ok(answers(db)) #) 11. IN db WITH [answers := new_answers]

Listing 7. The PVS model for the delete operation on the answers table.

The model captures the behaviour delete in SQL. After the removal (lines 3-10)

the table contains all the existing rows for which the selector is not true (lines 4-7). In

the new database state only the answers table is affected (line 11). Other tables are left

intact, as none of them depend on answers (see diagram in Figure 2). If this would not

be the case (for example when deleting a record in meetings table), then all referencing

tables would have to be included in the new database state [17].

As with the select operation, a deletion consists of first constructing a selector

and then executing the operation. For example, to delete all answers that were given to

the proposal P, one has to write the code given in Listing 8Listing 8.

1. select(ans: answerId, usr: userId, pr: proposalId, ok: bool): bool = 2. (pr = P) 3. delete_answer(db, select)

Listing 8. Example use of the delete function.

Page 12: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints

9

1.2.3. Update

The update operation differs from the two previously described, as it not only

selects proper rows, but also updates them respectively. The updater function that

specifies how the values of the rows should be changed is needed in addition to the

selector. The updater for the answers table is given in Listing 9Listing 9.

1. updater_answer_type: TYPE = 2. [[answerId, userId, proposalId, bool] -> 3. [userId, proposalId, bool]]

Listing 9. The updater function for the answers table.

The updater returns a tuple of new field values for a given record. It is executed

for every row returned by the selector. Notice that the updater function is defined so that

it does not alter the value of the primary key. Our model disallows changing the primary

key to prevent inconsistencies and difficulties that are hard to track. Such a practice is

considered “harmful” and is almost never done in real-life projects.

As with the delete function the database is in the new state after the operation is

performed. The old rows are updated with new values, in accordance with the updater

function. The model of the updating function for table answers is given in Listing

10Listing 10.

1. update_answer(db: database_type, selector: selector_answer_type, 2. updater: updater_answer_type): database_type = 3. LET cond1(x: answerId): bool = 4. ...reference integrity check... , 5. new_user_id(x: answerId): userId = 6. ...updating user_id... , 7. new_proposal_id(x: answerId): proposalId = 8. ...updating proposal_id... , 9. new_ok(x: answerId): bool = 10. ...updating ok... , 11. new_answers = (# idnt := idnt(answers(db)), 12. user_id := new_user_id, 13. proposal_id := new_proposal_id, 14. ok := new_ok #), 15. db1 = db WITH [answers := new_answers] 16. IN db1

Listing 10. The PVS model for the update query on the table answers.

The structure of the formal operation resembles the behaviour of the SQL

statement. The update must preserve basic integrity constraints, especially the validity

of the relations between tables (lines 3-4, detailed in Listing 11), before the functions

that correspond to the fields of the table are updated (lines 5-10). Once this is done, the

new state of the database is created (line 15), where the contents of the answers table

are updated (lines 11-14) with new functions.

1. LET cond1(x: answerId): bool = 2. LET old_usr = user_id(answers(db))(x), 3. old_prop = proposal_id(answers(db))(x), 4. old_ok = ok(answers(db))(x),

Comment [RB7]: Puh. This is

complicated. Looks right, but surely it can

be expressed more concisely.

Page 13: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints

10

5. new_usr = proj_1(updater(x, old_usr, old_prop, old_ok)), 6. new_prop = proj_2(updater(x, old_usr, old_prop, old_ok)), 7. IN member(new_usr, idnt(users(db))) AND 8. member(new_prop, idnt(proposals(db)))

Listing 11. Reference checking for table answers.

The condition for reference integrity is satisfied for a given an answer, if its user

and proposal fields point to existing objects in the database (lines 7-8). We use the

projection functions built in PVS, named proj_x (lines 5-6) that return xth

parameter

from those passed to them to narrow the results of the updater only to a specific column.

In case the condition is not satisfied, updating has no effect, as shown in Listing 12.

Similar code is used to update values of other fields (proposal_id and ok) of this table. 1. (LET) new_user_id(x: answerId): userId = 2. LET usr = user_id(answers(db))(x), 3. prop = proposal_id(answers(db))(x), 4. okk = ok(answers(db))(x) 5. IN (COND NOT selector(x, usr, prop, okk) -> usr, 6. NOT cond1(x) -> usr, 7. ELSE -> proj_1(updater(x, usr, prop, okk)) 8. ENDCOND)

Listing 12. Updating a field in table answers.

Listing 13Listing 13 gives an example of how the function can be used. Assume

we want to change the answers made by the user U so that all positive answers are

negative. Apart from a selector (lines 3-4) we create the updater function (lines 5-6)

before executing the update operation (line 8).

1. U: userId 2. 3. selector(ans: answerId, usr: userId, pr: proposalId, ok: bool): bool = 4. usr = U AND ok 5. updater(ans: answerId, usr: userId, pr: proposalId, ok: bool): 6. [userId, meetingId, proposalId, bool] = (usr, pr, false) 7. 8. update_invitation(db, selector, updater)

Listing 13. Example of the update function.

1.2.4. Insert

Adding a record to a database does not require selectors or updaters, as it

requires the data of a new row and simply adds it to the database. Inserting must take

care of initial integrity constraints, as the update operation, and ensure that the record to

be inserted has a new, unused primary key value. We define the insert operation as in

Listing 14Listing 14.

1. insert_answer(db: database_type, ans: answerId, usr: userId, 2. prop: proposalId, ok: bool): database_type = 3. LET meeting = meeting_id(proposals(db))(prop), 4. new_answers = (# 5. idnt := idnt(answers(db)) WITH [ans := TRUE],

Page 14: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints

11

6. user_id := user_id(answers(db)) WITH [ans := usr], 7. proposal_id := proposal_id(answers(db)) WITH [ans := prop], 8. ok := ok(answers(db)) WITH [ans := ok] #), 9. db1 = (COND member(ans, idnt(answers(db))) -> db, 10. NOT member(usr, idnt(users(db))) -> db, 11. NOT member(prop, idnt(proposals(db))) -> db, 12. ELSE -> db WITH [answers := new_answers] 13. ENDCOND) 14. IN db1

Listing 14. The PVS model for the insert operation on answers table.

The insertion function checks whether the new identifier is unused (line 9) and

that the user and proposal of new answer exist in the database (lines 10-11). In case of

not fulfilling the initial constraints no change to the database is made. Otherwise, the

new state of the database is returned, with the answers table containing the new row

(lines 4-8, 12-14).

1.3. Modelling complex operations

Complex constraints are maintained by performing a fixed sequence of simple

commands. This typically happens when an operation modifies the data. Such a fixed

sequence is triggered before or after the operation itself, and is therefore known as a

trigger. There can be any number of triggers for any given operation. Virtually any data

constraint can be preserved by using trigger – the data not meeting the condition is not

processed and the execution of operation is abandoned.

Often triggers are used implicitly, without stating them in code. For example, the

SQL directive on delete cascade that can be used in row declaration is executed in

response to the delete operation. Although being part of the de facto standard syntax, its

behaviour is converted to a trigger function.

Triggers are an integral part of a database that is transparent to clients, so the

client application is not altered when the triggers change. Triggers are used in our case

study to maintain data integrity and to provide desired functionality. In order to reason

about the correctness of our software, we need to formally model triggers, so that

theorems corresponding to different properties of the system can be stated and proven.

3.3.1. Triggers in delete

In the case study one of the functional requirements states that whenever an

invitation to a meeting is deleted, all answers made by the invited user to the proposals

concerning the meeting should be deleted. The dependency is not direct, as the answers

are not pointing to the invitations, but to the users (this follows the intuitive

understanding – it is the user who answers, not the invitation). Such a condition cannot

be expressed with the standard declarative SQL syntax, so triggers have to be used.

Otherwise, the client application would have to implement this behaviour. This

functionality is provided by the trigger presented in Listing 15Listing 15.

1. create function delete_answers_uninvited() returns trigger as $dai$ 2. begin

Comment [RB8]: rephrase the sentence.

Comment [RB9]: Not visible?

Page 15: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints

12

3. DELETE FROM answers 4. WHERE user_id = OLD.user_id AND proposal_id IN 5. (SELECT id FROM proposals 6. WHERE meeting_id = OLD. meeting_id); 7. RETURN NULL; 8. end; 9. $dai$ language 'plpgsql'; 10. create trigger tr_delete_answers_uninvited after delete on invitations 11. for each row execute procedure delete_answers_uninvited();

Listing 15. The trigger for deleting answers related to the removed invitation.

Nested selection (lines 5-6) is used to delete only those answers related to the

proposal that concerns the meeting of the invitation that are given by the user the

invitation was issued to (lines 3-4). Since the trigger is executed after the deletion takes

place, the value it returns is ignored. Returning NULL value (line 7) in this case is

considered to be a good practice of programming.

The trigger enhances the functionality of the original delete operation on the

invitations table. In other words, deleting a row in the invitations table automatically

removes all answers that are related to it through the user and meeting fields. Once the

trigger is created, it is not possible to delete only the invitation row without also altering

the other tables. Therefore, we decided to model the trigger in PVS as if it was part of

the delete function. The structure of a formal model is as shown in Listing 16Listing 16.

1. delete_invitation(db: database_type, 2. selector: selector_invitation_type): 3. database_type = 4. LET new_invitations = 5. ...delete invitations based on selector (as in Listing 7Listing

7)..., 6. db1 = db WITH [invitations := new_invitations], 7. del_rows = select_invitation(db, selector), 8. sel1(ans: answerId,usr: userId,prop: proposalId,ok: bool): bool = 9. ...select answers related to deleted invitations... 10. db2 = delete_answer(db1, sel1) 11. IN db2

Listing 16. The PVS model of modified delete operation on table invitations.

The invitations are deleted based on the selector passed to the operation. It is

achieved similarly to what we shown in Listing 7Listing 7. Subsequently we store the

deleted rows into a variable (line 7) that is used to construct a selector (lines 8-9, Listing

17) responsible for deleting related answers (line 10).

1. (LET) sel1(ans: answerId, usr: userId, 2. prop: proposalId, ok: bool): bool = 3. EXISTS (x: invitationId): 4. LET sel2(proposal: proposalId, meeting: meetingId, 5. start: nat, confirmed: bool): bool = 6. meeting = meeting_id(del_rows)(x), 7. related_proposals = select_proposal(db1, sel2) 8. IN member(x, idnt(del_rows)) AND 9. usr = user_id(del_rows)(x) AND 10. member(prop, idnt(related_proposals))

Comment [RB10]: this code is really hard to read. What is OLD.user_id

Page 16: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints

13

Listing 17. Formal model of a trigger that deletes answers related to invitation.

We construct a selector sel1 for the answers table to indicate answers that relate

to the deleted invitation (condition in line 8). In addition, the answer must be made by

the user the removed invitation was issued to (line 9) and must concern a proposal (line

10) that refers to the meeting of the invitation (selector sel2 defined in lines 4-6 and

used in line 7).

Having proven this function with a number of theorems that check the desired

behaviour, we can assume that the database state is consistent after deleting the

invitation.

3.3.2. Triggers in update

Whenever the users change their answer to some proposal from negative to

positive, a check must be made in order to ensure there are no meetings that may

conflict with the proposal the answer refers to. By conflict we understand another

meeting the answering user is invited to or is a host of, that is already confirmed and the

time of that meeting overlaps with the time of the proposal the answer refers to. This

concept of conflict is essential to the behaviour of the software, as it disallows users to

agree to two meetings that are held at the same time. The SQL code for the trigger is

shown in Listing 18Listing 18.

1. create function disallow_agreeing_collision() returns trigger as $dac$ 2. declare 3. clm boolean; 4. prop timestamp; 5. dur integer; 6. begin 7. IF NEW.ok THEN 8. SELECT INTO prop,dur p.starts,m.duration FROM proposals p,meetings m 9. WHERE p.id=NEW.proposal_id AND m.id=p.meeting_id; 10. SELECT INTO clm collides(prop,dur,NEW.user_id); 11. IF clm THEN 12. RAISE notice 'cannot_agree_collision'; 13. RETURN NULL; 14. END IF; 15. END IF; 16. RETURN NEW; 17. end; 18. $dac$ language 'plpgsql'; 19. 20. create trigger tr_disallow_agreeing_collision 21. before insert or update on answers 22. for each row execute procedure disallow_agreeing_collision();

Listing 18. The trigger that disallows users to agree to a colliding proposal.

In the code we first collect the information about the proposal and the meeting

the answer refers to (lines 8-9) and then call the external function that checks for

overlapping meetings (line 10). In case a collision has been found, no update occurs

(lines 11-14). Note that the body of the function is executed only if the answer is

positive (line 7). As with the previous situation, the trigger extends the original update

Comment [RB11]: Premture to talk about this yet.

Page 17: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints

14

function, this time placing additional condition that must be fulfilled before the

operation can be performed. We use the same approach as previously by incorporating

the trigger into the operation, as generally shown in Listing 19Listing 19.

1. update_answer(db: database_type, selector: selector_answer_type, 2. updater: updater_answer_type): database_type = 3. LET cond1(x: answerId): bool = 4. ...reference integrity check... 5. cond3(x: answerId): bool = 6. ...collision check... 7. new_user_id(x: answerId): userId = 8. ...updating user_id... 9. new_proposal_id(x: answerId): proposalId = 10. ...updating proposal_id... 11. new_ok(x: answerId): bool = 12. ...updating ok... 13. new_answers = (# idnt := idnt(answers(db)), 14. user_id := new_user_id, 15. proposal_id := new_proposal_id, 16. ok := new_ok #), 17. db1 = db WITH [answers := new_answers] 18. IN db1

Listing 19. The PVS model of modified update operation on table answers.

The code structure much resembles the one presented in Listing 10Listing 10. It

contains an additional condition (lines 5-6) that checks whether the answer is colliding

with already confirmed meeting. This condition, seen in details in Listing 20,

corresponds to the SQL code presented in Listing 18Listing 18.

1. (LET) cond3(x: answerId): bool = 2. LET old_usr = user_id(answers(db))(x), 3. old_prop = proposal_id(answers(db))(x), 4. old_ok = ok(answers(db))(x), 5. new_usr = proj_1(updater(x, old_usr, old_prop, old_ok)), 6. new_prop = proj_2(updater(x, old_usr, old_prop, old_ok)), 7. new_ok = proj_3(updater(x, old_usr, old_prop, old_ok)), 8. meet = meeting_id(proposals(db))(new_prop) 9. IN NOT collides(starts(proposals(db))(new_prop), 10. duration(meetings(db))(meet), 11. new_usr, db) 12. OR NOT new_ok

Listing 20. Condition checking whether an answer collides with already confirmed meetings.

By analysing the code of the condition we can see it evaluates to true whenever

the answer is negative (line 12) or there is no collision with the proposal the answer

refers to (lines 9-11). To simplify the code we also use local variable meet that is the

direct reference to a meeting the answer refers to through table proposals (line 8). The

condition is evaluated before the value of the fields is changed, as shown in Listing 21.

1. (LET) new_proposal_id(x: answerId): proposalId = 2. LET usr = user_id(answers(db))(x), 3. prop = proposal_id(answers(db))(x),

Page 18: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints

15

4. okk = ok(answers(db))(x) 5. IN (COND NOT selector(x, usr, prop, okk) -> prop, 6. NOT cond1(x) -> prop, 7. NOT cond3(x) -> prop, 8. ELSE -> proj_2(updater(x, usr, prop, okk)) 9. ENDCOND),

Listing 21. Updating the values of a column with additional constraint.

When compared with previously described situation (Listing 12) it can be seen

that the non-colliding constraint is added as a new branch in the evaluation part of the

code (line 7). Thus, the formal model of the operation follows the behaviour of its SQL

correspondence.

3.3.3. Triggers in insert

The requirements of the system state that only the invited users are allowed to

give answers to proposals. In other words, before an answer is inserted into the

database, it has to be checked whether the user that gives the answer to a proposal is

invited to a meeting the proposal refers to. This complex condition must be ensured by a

trigger. Its SQL code is presented in Listing 22Listing 22.

1. create function disallow_uninvited_answers() returns trigger as $dua$ 2. declare 3. meeting integer; 4. zmp1 integer; 5. begin 6. SELECT INTO meeting meeting_id FROM proposals 7. WHERE id=NEW.proposal_id; 8. SELECT INTO zmp1 user_id FROM invitations 9. WHERE meeting_id=meeting AND user_id=NEW.user_id; 10. IF NOT FOUND THEN 11. RAISE notice 'cannot_answer_if_not_invited'; 12. RETURN NULL; 13. END IF; 14. RETURN NEW; 15. end; 16. $dua$ language 'plpgsql'; 17. 18. create trigger tr_disallow_uninvited_answers before insert on answers 18.19. for each row execute procedure disallow_uninvited_answers();

Listing 22. The trigger that disallows inserting new answers by uninvited users.

The information about the meeting the answer‟s proposal refers to (lines 6-7) is

used to find an invitation to that meeting issued to the answering user (lines 8-9). If such

invitation is not found, the insert operation is abandoned (lies 10-13). The trigger can be

modelled in PVS as shown in Listing 23Listing 23.

1. insert_answer(db: database_type, ans: answerId, usr: userId, 2. prop: proposalId, ok: bool): database_type = 3. LET meeting = meeting_id(proposals(db))(prop), 4. cond2 = ok AND collides(starts(proposals(db))(prop), 5. duration(meetings(db))(meeting), usr, db),

Page 19: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints

16

6. aux_sel2(i: invitationId, u: userId, m: meetingId): bool = 7. (m = meeting AND u = usr), 8. zmp2 = select_invitation(db, aux_sel2), 9. cond3 = empty?(idnt(zmp2)), 10. new_answers = (# 11. idnt := idnt(answers(db)) WITH [ans := TRUE], 12. user_id := user_id(answers(db)) WITH [ans := usr], 13. proposal_id := proposal_id(answers(db)) WITH [ans := prop], 14. ok := ok(answers(db)) WITH [ans := ok] 15. #), 16. db1 = (COND member(ans, idnt(answers(db))) -> db, 17. NOT member(usr, idnt(users(db))) -> db, 18. NOT member(prop, idnt(proposals(db))) -> db, 19. cond1 -> db, 20. cond2 -> db, 21. cond3 -> db, 22. ELSE -> db WITH [answers := new_answers] 23. ENDCOND) 24. IN db1

Listing 23. The PVS model of modified insert operation.

The trigger is, as previously, merged (lines 6-9, 20) into the model of the

underlying operation. The code also contains a check for colliding meetings (lines 3-5),

similar to the one described in the previous subsection. It can be seen that the

constraints are introduced into the model the same way regardless of the operation: a

corresponding condition is stated and the operation takes place only if no constraint is

violated.

4. Verifying properties

In previous chapter we have presented how to construct a formal model that

corresponds to the database schema. In order to prove that the system behaves correctly

we need to state theorems that correspond to different properties and prove them in

PVS. For example, one property of the system states that users cannot agree to

proposals that collide with other meetings they attend or host. In order to verify this

property several theorems have to be constructed and proven. The one presented in

Listing 24Listing 24 states that inserting a positive answer that conflicts with some

other meeting is not possible. In other words, if there were no answers that violate this

constraint before the insertion, the operation itself does preserve this property.

1. insert_answer_th9: THEOREM 2. FORALL (db: database_type, ans: answerId, usr: userId, 3. prop: proposalId, ok: bool): 4. LET new_db = insert_answer(db, ans, usr, prop, ok) 5. IN (EXISTS (p: proposalId): member(p, idnt(proposals(db))) 6. AND 7. user_id(meetings(db))(meeting_id(proposals(db))(p)) = usr 8. AND 9. confirmed(proposals(db))(p) = TRUE 10. AND

Page 20: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints

17

11. starts(proposals(db))(prop) < starts(proposals(db))(p) + 12. duration(meetings(db))(meeting_id(proposals(db))(p)) 13. AND 14. starts(proposals(db))(prop) > starts(proposals(db))(p)) 15. AND 16. ok = TRUE 17. => new_db = db

Listing 24. The theorem that disallows inserting answers that conflict with other meetings.

This theorem was proven by using a basic (grind) strategy in PVS. In total four

similar theorems are needed to prove the property correct – however, all of them are

proven automatically by PVS.

Our system also disallows issuing an invitation to the host of the meeting. This

constraint can be ensured by two SQL triggers, one executed before inserting new

invitation and one before updating an existing invitation. In other words, the host not

being invited to a meeting is an invariant, in particular over the insert operation. By

proving this fact we gain the confidence that no invitation can be issued to the host of a

meeting. To accomplish this we first define a predicate that corresponds to the property,

as stated in Listing 24Listing 24.

1. not_inv_host(db: database_type): bool = 2. FORALL (i:invitationId): member(i, idnt(invitations(db))) => 3. user_id(invitations(db))(i) /= 4. user_id(meetings(db))(meeting_id(invitations(db))(id))

Listing 25. The predicate corresponding to a property of the system.

This predicate is used to form a theorem presented in Listing 26Listing 26. It

states that the host not being invited to any meeting is an invariant over the operation of

adding new invitations. In other words, adding new invitation preserves the property. In

the theory our model forms this theorem was proven simply by expanding the definition

of the predicate and then applying basic (grind) strategy in PVS.

1. insert_invitation_th7: THEOREM 2. FORALL (db:database_type, inv: invitationId, 3. usr: userId, meet: meetingId): 4. LET new_db = insert_invitation(db, inv, usr, meet) 5. IN 6. not_inv_host(db) => not_inv_host(new_db)

Listing 26. The theorem stating that the predicate is an invariant of the insert operation.

To ensure that there is no invitation issued to the host we need to prove that this

invariant holds also over the update operation. However, another property of our system

states that an invitation cannot be modified once issued. Therefore, this invariant is also

preserved by the update on the invitations table.

Similar theorems have been stated and proven for all other properties of the

system. The majority of theorems describe the desired behaviour of operations that can

be performed on database tables and is relatively simple to prove. They can be

considered as equivalent to unit tests in the SQL development. By proving them we

ensure that the operations behave as expected and yield desired results.

Page 21: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints

18

5. Conclusions and future work

By using our approach we have verified the database schema of Deseo Meeting

Scheduler with its structural and behavioural features. We have stated and proven 167

theorems and lemmas representing properties of the system and gained the confidence

that the system works according to the specification.

Main advantage of our formalisation method is that the theorems can be proven

with straightforward PVS proving strategies. Most of the theorems in the case study

were proven with (grind) command, and only a few required a more dedicated

approach. During the work we have noticed that the formation of a model and the

stating of the theorems is, to some degree, repetitive and possibly could be automated.

Our current research efforts are directed towards finding such generalisation scheme.

Acknowledgements

We would like to thank Prof. Iván Porres, Dr Viorel Preoteasa and M.Sc. Johannes

Eriksson for collaboration on the Deseo Meeting Scheduler application and for fruitful

discussions on the topics of formal verification and modelling.

References

1. Dijkstra Edsger, A constructive approach to the problem of program correctness. BIT

(1968).

2. van Emden M., Programming with verification conditions. In: IEEE Transactions on

Software Engineering, (1979).

3. Reynolds J., Programming with transition diagrams. In: Gries D., Programming

methodology, Springer-Verlag, Berlin (1978).

4. Back Ralph-Johan, Invariant based programs and their correctness. In: Biermann

W., Guiho G., Kodratoff Y., Automatic Program Construction Techniques, MacMillan

Publishing Company (1983).

5. Fowler Martin, Rice David, Patterns of Enterprise Application Architecture.

Addison-Weasley (2003).

6. PVS Verification and Specification System, http://pvs.csl.sri.com/

7. Back Ralph-Johan, Invariant based programming. In: Donatelli Susanna, Thiagarajan

P., Lecture Notes in Computer Science, Springer (2006).

8. Back Ralph-Johan, Eriksson Johannes, Myréen Magnus, Testing and Verifying

Invariant Based Programs in the SOCOS Environment. TUCS Technical Report (2006)

9. Choppella Venkatesh, Sengupta Arijit, Robertson Edward, Johnson Steven,

Preliminary Explorations in Specifying and Validating Entity-Relationship Models in

PVS. AFM, Atlanta, USA (2007)

10. Sheard T., Stemple D., Automatic verification of database transaction safety, ACM

Transactions on Database Systems, (1989), pp.322-368.

11. Pląska Marta, Development of Database Applications (M.Sc. Thesis). University of

Gdańsk, Gdańsk (2006).

12. PostgreSQL, http://www.postgresql.org

13. Wikipedia, http://en.wikipedia.org/wiki/SQL

Page 22: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints

19

14. Back Ralph, Software Construction by Stepwise Feature Introduction. In: ZB 02:

Proceedings of the 2nd International Conference of B and Z Users of Formal

Specification and Development in Z and B, Springer-Verlag (2002).

15. Kilov H., From semantic to object-oriented data modelling, Proceedings of the First

International Conference on Volume, (1990).

16. Kilov Haim, Business Specifications: The Key to Successful Software Engineering.

Prentice Hall (1998).

17. Back Ralph-Johan, Olszewski Mikołaj, Soriano Damián, Deseo Meeting Scheduler:

Towards Automated Verification of Database Integrity Constraints, Automated

Program Verification 2009. ETH Zurich, Rio Cuarto, Argentina (2009)

Page 23: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints

ISBN 978-952-12-2275-7

ISSN 1239-1891

Page 24: Ralph-Johan Back | Mikołaj Olszewski | Damián Sorianotucs.fi/publications/attachment.php?fname=TR937.full.pdf · Ralph-Johan Back | Mikołaj Olszewski ... The above constraints