Deseo Meeting Scheduler:
Database Schema with Verified
Business Logic
Ralph-Johan Back | Mikołaj Olszewski | Damián Soriano
TUCS Technical Report No 937, March 2009
Ralph-Johan Back
Åbo Akademi University, Department of Computer Science
Mikołaj Olszewski
Åbo Akademi University, Department of Computer Science
TUCS Graduate School
Damián Soriano
Åbo Akademi University, Department of Computer Science
National University of Rosario, Faculty of Exact Sciences, Engineering and Surveying
Abstract
Database applications are a significant part of the software market. With the
expansion of Web 2.0 and web services, a well-defined database schema and its
correct behaviour are crucial. Techniques of formal verification are usually expensive
and time consuming and require mathematical skills and knowledge; they are therefore
not used in the development of simple, non-critical systems. A lightweight formal
verification method for a database schema and its integrity constraints is needed in
order to be applicable to such systems. Such methods exist for object-oriented software,
but they are not directly applicable to database development. In this paper we try to
address this issue. We present a case study that we used to develop a lightweight formal
verification method for database applications. The technique we propose is based on
simple mathematical concepts and utilises an automated theorem proving system, PVS.
Keywords: formal methods, theorem proving, verification, database applications, PVS
TUCS Laboratory
Software Construction Laboratory
1. Introduction
A typical software project starts with a set of informally stated properties
(requirements), which evolve into semi-formally or formally stated features and
constraints of the developed system. These in turn are transformed into a model of
the data, which is implemented to produce executable code. The specification is then
used to create a test suite that the program is run against in order to ensure that it
meets the requirements. Mistakes, misunderstandings or bugs are often found during
testing, which results in repeated implementation and testing phases.
As opposed to that approach, there are several methods that aim at producing
software that is correct by construction. Most of them require the developers to start
with a formal specification that is written in a more or less mathematical language. The
specification is used to construct a program that is correct in terms of behaviour, but the
final step, the translation from specification to executable code, is done mostly manually
and is thus prone to errors.
The concept of constructing correct software is not new. It has been considered
in different forms by Dijkstra [1], van Emden [2], Reynolds [3] and Back [4], where
software means a complete computer program that can be executed. A relational
database is rather a set of structural and behavioural properties and is not executable.
This approach can therefore be applied only to some aspects of the database, such as
stored procedures and triggers, and not to the database as a whole. However, there are
several techniques that map databases to object-oriented software, with Active Record
[5] being one of the most significant. Such patterns in theory allow the approaches
from object-oriented systems to be used to build databases that are correct by
construction, but the correctness only concerns the structure of the database.
Automated theorem proving is often used to support formal specification
and verification, because even the simplest specification can lead to complex
theorems. The PVS Specification and Verification System [6] is a language integrated
with support tools and a theorem prover, and can be used to formally state the
properties of a system and prove them correct. PVS does not place additional
requirements on the system to be verified; it is therefore suitable for the development
of any system. It has been successfully used in the field of invariant-based programming
[7], especially in the SOCOS programming platform [8].
Recently, verifying entity relationship models in PVS has been investigated by
V. Choppella et al. [9]. The authors focused on formally specifying entity relationship
diagrams, an established way of doing data modelling. There are also established
methods of verifying database invariants at the level of transactions [10]. Our work
attempts to combine and extend both these solutions. We are investigating a method for
verifying database integrity constraints, automatically where possible, so that the
database provides correct functionality for client applications.
The rest of the paper is organised as follows. In the next section we describe the
case study application used to form our theory and briefly cover principles of database
applications design. In section 3 we develop a formal model based on an example table
from our case study. We show how properties of the system can be proven in section 4.
We end the paper with conclusions and directions for our future work.
2. Database application development
The process of developing database applications does not differ much from
developing regular, non-database software. There is an extra step of database
development and some extra effort in the testing and integration phases. In smaller
projects, the databases needed for the application are usually defined before the client
application is created. This means that the application software can use the databases
from the start.
2.1. Requirements analysis
The Deseo Meeting Scheduler is an application that allows its users to schedule
meetings. The users access a central database by connecting with client applications
(web interface or standalone software) in order to create and manage meetings. The
lifecycle of a meeting is presented in Figure 1.
Figure 1. Lifecycle of a meeting.
•Create meeting: the user creating a meeting becomes its host
•Set meeting time proposals: each proposal defines a starting time for the meeting
•Invite other users: invited users can give positive or negative answers to the proposals
•Wait for answers: a positive answer means that the user can attend the meeting
•Confirm the meeting: all invited users have agreed to some proposal
The requirement analysis phase focuses mostly on the general aspect of the data
to be stored by the database. The initial specification is examined in order to identify
logical entities and the dependencies between them. In the case of Deseo this leads us to
the preliminary entity relationship diagram shown in Figure 2.
2.2. Data modelling
The outcome of the requirements analysis phase, the initial entity relationship
diagram, is used in the data modelling phase. The entities and dependencies are mapped
into corresponding tables and relations, to which attributes are added. More detailed
analysis of the initial requirements then enables us to derive constraints and integrity
rules for the system [11]. These are presented in Figure 3.
Figure 2. Relations between entities.
Figure 3. Entity constraints.
Users
•A user can be the host of a meeting
Meetings
•Each meeting must be hosted by a user
•A meeting can have only one host
Invitations
•An invitation is issued to a user
•A user can have only one invitation for a meeting
•Once issued the meeting invitation cannot change
•The invitation can be cancelled by the host or by the invited user at any time
Proposals
•A proposal refers to a meeting
•A proposal defines the starting time of the meeting
•Different proposals for the same meeting must define different starting times
Answers
•An answer reflects the invited users' attitude towards proposals
•The answer can be positive or negative
•A positive answer can be given only if the user has no other conflicting meetings
The above constraints can be divided into two equally important groups. Some
constraints describe the structure of the model – they provide details of the relations
between the entities and the basic types of the data. The remaining properties define the
general behaviour of the database, i.e. the responses the database should make after or
before certain operations take place. This division is clearly visible in the development
process of any database application and influences both the design and the formal
modelling we describe in the following parts of this paper.
2.3. Database design
An important decision that has to be made is how the functionality should be
split between the database server and the database client. This decision has a major
impact on the project and is usually very hard to change later. Therefore, it should be
made as early in the development process as possible and such that the probability of
change is small [11].
There are two extreme approaches to the design of database applications.
One is to reduce the role of a database to only storing the data, while its validation is
performed entirely by client applications. The alternative is that the database also
performs all the computations related to the business logic of the software, so the role of
the client application is restricted to user interaction. In practice, we usually aim for a
solution somewhere between these two extremes, so that the database ensures some
essential data integrity, while client applications perform the remaining checks.
Our case study was designed so that the crucial part, i.e. the scheduling of meetings,
is done by the database, whereas user authorisation is left to be implemented on the
client application side. This division allows us to prove the core business logic without
depending on the client software. We decided to use PostgreSQL [12] as the database
management system because it has good support for stored procedures, a feature that
we rely on heavily.
2.4. Structured Query Language (SQL)
Relational database management systems provide a way to store data and related
procedures. To access them, the client application must use the Structured Query
Language (SQL). The language grammar consists of five parts:
•clauses;
•expressions, which produce either simple (scalar) values or tables (collections
of rows and columns);
•predicates, which specify conditions that are evaluated in three-valued
logic and that affect the results of queries and statements;
•queries, which retrieve data from the database;
•statements, which may have a persistent effect on the database or the data.
The elements of the language are presented in Figure 4 [13]. Although the
language is an ISO standard, the size and complexity of the standard mean that the
majority of relational database management systems provide only a reasonable
approximation of it.
Figure 4. Elements of SQL language.
The statement presented in Figure 4 is an update query: executing this
statement changes the contents of the table country. More precisely, the value of the
column population is increased by 1 in every row where the column name equals the string 'USA'.
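Figure 4 itself is not reproduced here, but the effect of the statement it describes can be tried out directly. The following is a minimal sketch that runs an equivalent update through Python's built-in sqlite3 module. The table layout, the sample rows and the use of SQLite instead of PostgreSQL are assumptions made only for illustration; the statement text is reconstructed from the description above.

```python
import sqlite3

# Assumed table layout and sample data; only the UPDATE statement itself
# follows the description in the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (name TEXT PRIMARY KEY, population INTEGER)")
conn.executemany(
    "INSERT INTO country (name, population) VALUES (?, ?)",
    [("USA", 100), ("Finland", 5)],
)

# Statement: increase population by 1 in every row whose name equals 'USA'.
# The WHERE part is the predicate; rows that do not satisfy it are untouched.
conn.execute("UPDATE country SET population = population + 1 WHERE name = 'USA'")

populations = dict(conn.execute("SELECT name, population FROM country"))
print(populations)
```

Only the row selected by the predicate changes; the other row keeps its value.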
3. Formal modelling
The design of the database follows the principles of Stepwise Feature
Introduction [14] – the functionality is introduced to the database in small steps
(refinements), typically one feature at a time. The method we present supports this
approach – the model can be proven correct at any time of the design process. The
refined version of the operation, together with its new constraints and triggers, replaces
the more general one; thus it is possible to reuse the old proofs and focus verification
effort on newly added features.
We illustrate our formalisation technique with an example from the case study.
In the rest of this section we operate on the answers table due to a number of properties
it has. Firstly, it references two other tables, proposals and users. Secondly, the table
plays an important role in preserving an invariant of the system, as positive answers can
be given only provided that there are no conflicting meetings. Furthermore, only invited
users are allowed to give answers. And finally, it has a non-trivial behavioural
dependency, as answers must be removed when the invitation for the related meeting is
removed.
3.1. Structure of the database
The structure of the database is given by the collection of tables (i.e., the data
model). The SQL code that constructs the answers table is presented in Listing 1.
1. CREATE TABLE answers (
2.   id integer primary key,
3.   user_id integer REFERENCES users(id) on delete cascade not null,
4.   proposal_id integer REFERENCES proposals(id) on delete cascade not null,
5.   ok boolean default false
6. );
Listing 1. SQL code that constructs answers table.
The primary key id (defined in line 2) uniquely identifies each row of the table.
The field holds integer numbers, and in our design its value does not change once set
for a record. We usually associate a sequence of numbers with the primary key field,
so that every time a new record is inserted, the new value for the id field is obtained
from the sequence. SQL itself places no restrictions on the type of this field: it can
store any type of data, as long as no two different rows have the same key. SQL also
allows the value of a primary key to be changed. Relational database management
systems store the records as (unordered) sets.
The answers table also references users (line 3) and proposals (line 4), as an
answer must contain information about the user that gives it and the proposal it
concerns. The on delete cascade directive ensures that the answer will be deleted
automatically when the referenced row (in either the users or the proposals table) is
removed. Together with the references, the not null constraint requires that a newly
created or updated answer refers to valid rows in the other tables (users and proposals).
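The behaviour of these two directives can be observed on a small example. The sketch below recreates the answers table in an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for PostgreSQL here, the users and proposals tables are reduced to their primary keys, and the sample rows are hypothetical; note also that SQLite enforces foreign keys only after the PRAGMA shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE proposals (id INTEGER PRIMARY KEY);
    CREATE TABLE answers (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
        proposal_id INTEGER NOT NULL REFERENCES proposals(id) ON DELETE CASCADE,
        ok BOOLEAN DEFAULT 0
    );
    INSERT INTO users (id) VALUES (1), (2);
    INSERT INTO proposals (id) VALUES (10);
    INSERT INTO answers (id, user_id, proposal_id) VALUES (100, 1, 10), (101, 2, 10);
""")

# Deleting a user cascades to the answers that reference it.
conn.execute("DELETE FROM users WHERE id = 1")
remaining = [row[0] for row in conn.execute("SELECT id FROM answers ORDER BY id")]

# An answer that points to a non-existing user is rejected.
try:
    conn.execute("INSERT INTO answers (id, user_id, proposal_id) VALUES (102, 99, 10)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(remaining, rejected)
```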
The primary key is used to identify the record uniquely, so it can be used to
obtain the values of the fields in the record. Thus we can model a field in a table as a
function from the primary key to the value of the field. This gives us the following
formal specification of the answers table, expressed in PVS (Listing 2).
1. answerId: TYPE+
2. answer_id_type: TYPE = set[answerId]
3. answer_user_id_type: TYPE = [answerId -> userId]
4. answer_proposal_id_type: TYPE = [answerId -> proposalId]
5. answer_ok_type: TYPE = [answerId -> bool]
6.
7. answer_table_type: TYPE = [#
8.   idnt: answer_id_type,
9.   user_id: answer_user_id_type,
10.  proposal_id: answer_proposal_id_type,
11.  ok: answer_ok_type
12. #]
Listing 2. The PVS model for the answers table.
In line 1 we introduce a type for the values of primary key. Line 2 defines a
primary key of the table as a set. This is the set of all primary key values that denote an
existing row in the table. It is important to note that our model does not enforce the type
of value for the primary key. Lines 3 and 4 introduce fields referring to primary keys of
other tables, users and proposals respectively. In line 5 we declare the Boolean field
holding the user's attitude towards the referenced proposal. Lines 7-12 aggregate all the
fields to create a table type that is used in the rest of the model.
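The idea of treating each field as a function from the primary key to a value can be illustrated outside PVS. The following Python sketch is an analogy only, not part of the PVS model: functions are approximated by dictionaries (which are partial, whereas PVS functions are total), and the helper row and the sample data are hypothetical.

```python
from typing import NamedTuple

class AnswerTable(NamedTuple):
    idnt: frozenset      # primary keys of the rows that currently exist
    user_id: dict        # answer id -> user id
    proposal_id: dict    # answer id -> proposal id
    ok: dict             # answer id -> bool

def row(table, key):
    """Return the row identified by key as a tuple of its field values."""
    if key not in table.idnt:
        raise KeyError(f"no row with primary key {key}")
    return (key, table.user_id[key], table.proposal_id[key], table.ok[key])

# Hypothetical sample data. Key 3 has field values but is not in idnt,
# so it does not denote an existing row.
answers = AnswerTable(
    idnt=frozenset({1, 2}),
    user_id={1: 10, 2: 11, 3: 12},
    proposal_id={1: 20, 2: 20, 3: 21},
    ok={1: True, 2: False, 3: True},
)

print(row(answers, 1))
```

The set idnt alone decides which rows exist; the field functions may be defined for other keys without those keys being part of the table.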
All tables in the database are described formally in a similar way. We then declare
the database schema as a PVS record, where each field corresponds to one table (Listing 3).
1. database_type: TYPE = [#
2.   users: user_table_type,
3.   meetings: meeting_table_type,
4.   invitations: invitation_table_type,
5.   proposals: proposal_table_type,
6.   answers: answer_table_type
7. #]
Listing 3. The declaration of the database in PVS.
3.2. Modelling simple operations
The structure of the database in itself provides only a model for the data. The
dependencies between entities are stated and the type of data is set. However, the
structure does not provide the behavioural properties and constraints that we demand
from our system; these are described by the operations performed on the data. All of
them, regardless of how complex they are, are translated to a sequence of four basic
operations: select, update, delete and insert (also known as CRUD, for create, read,
update, and destroy [15][16]). We have to formally define these basic operations as
well, so that it is possible to state properties of the system, reason about the correctness
of both specification and implementation and prove that the properties hold when the
basic operations are executed.
3.2.1. Select
The select operation is used to read data from the table. The use of a predicative
part of the query (also called the where clause) allows the query to return only those
rows, for which the predicate is true. Since the query operates on a specific table, the
returned result depends on the type of the table in question. The selection predicate is
described as shown in Listing 4: it is a predicate on the fields of the answers
table.
1. selector_answer_type: TYPE =
2.   [[answerId, userId, proposalId, bool] -> bool]
Listing 4. The selector for the answers table.
With the selector we can model the operation that selects proper rows from the
table (Listing 5). The return value of the operation matches the type of the table
the query is executed on. Note that contrary to the SQL behaviour, it is not possible to
limit the query to return only certain columns.
1. select_answer(db: database_type, selector: selector_answer_type):
2.     answer_table_type =
3.   (# idnt := {x: answerId |
4.        member(x, idnt(answers(db))) AND
5.        selector(x, user_id(answers(db))(x),
6.                 proposal_id(answers(db))(x),
7.                 ok(answers(db))(x))},
8.      user_id := user_id(answers(db)),
9.      proposal_id := proposal_id(answers(db)),
10.     ok := ok(answers(db))
11.  #)
Listing 5. The PVS model for the select operation on the answers table.
The function returns the set of row identifiers in the answers table of the
database (answers(db)) for which the selector predicate is satisfied (lines 3-7). The
remaining fields are copied unchanged from the input database as functions (lines 8-10);
the values of the columns in a returned row are obtained by applying these functions to
the row's primary key. The model thus reflects the behaviour of the underlying SQL query.
Whenever we need to select some records, we first construct a selector and then
pass it to the select operation. For example, selecting all answers by one specific
user U can be done with the code shown in Listing 6. The selector is defined in
lines 1-2; the Boolean condition in line 2 exactly matches the condition we want the
rows to satisfy. Line 3 executes the select function.
1. selector(ans: answerId, usr: userId, pr: proposalId, ok: bool): bool =
2.   (usr = U)
3. select_answer(db, selector)
Listing 6. Example use of the selector and the select function.
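To make the behaviour of the select model concrete, it can be mirrored in an executable form. The sketch below is a Python analogy of Listings 5 and 6, not the PVS model itself: fields are dictionaries instead of total functions, the function takes the answers table directly rather than the whole database, and the sample data and the value of U are hypothetical.

```python
from typing import NamedTuple

class AnswerTable(NamedTuple):
    idnt: frozenset      # primary keys of existing rows
    user_id: dict        # answer id -> user id
    proposal_id: dict    # answer id -> proposal id
    ok: dict             # answer id -> bool

def select_answer(answers, selector):
    """Keep the ids of existing rows that satisfy selector; copy the field maps."""
    chosen = frozenset(
        x for x in answers.idnt
        if selector(x, answers.user_id[x], answers.proposal_id[x], answers.ok[x])
    )
    # Only idnt is narrowed; the field maps are passed on unchanged.
    return answers._replace(idnt=chosen)

answers = AnswerTable(
    idnt=frozenset({1, 2, 3}),
    user_id={1: 10, 2: 11, 3: 10},
    proposal_id={1: 20, 2: 20, 3: 21},
    ok={1: True, 2: False, 3: False},
)

U = 10  # hypothetical user id
by_user = select_answer(answers, lambda ans, usr, pr, ok: usr == U)
print(sorted(by_user.idnt))
```

As in the model, the result has the same type as the table itself, and the field maps of the result are the very same objects as those of the input.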
3.2.2. Delete
Deletion removes rows from the specified table. Again, a predicate is used to
determine the conditions that must be fulfilled in order for a row to be deleted. After
the deletion has taken place, the database is in a new state in which the rows affected
by the query are removed. The formal model of delete is shown in Listing 7.
1. delete_answer(db: database_type, selector: selector_answer_type):
2.     database_type =
3.   LET new_answers = (#
4.         idnt := {x: answerId | member(x, idnt(answers(db))) AND
5.                    NOT selector(x, user_id(answers(db))(x),
6.                                 proposal_id(answers(db))(x),
7.                                 ok(answers(db))(x))},
8.         user_id := user_id(answers(db)),
9.         proposal_id := proposal_id(answers(db)),
10.        ok := ok(answers(db)) #)
11.  IN db WITH [answers := new_answers]
Listing 7. The PVS model for the delete operation on the answers table.
The model captures the behaviour of delete in SQL. After the removal (lines 3-10)
the table contains all the existing rows for which the selector is not true (lines 4-7). In
the new database state only the answers table is affected (line 11). Other tables are left
intact, as none of them depends on answers (see the diagram in Figure 2). If this were not
the case (for example when deleting a record in the meetings table), then all referencing
tables would have to be included in the new database state [17].
As with the select operation, a deletion consists of first constructing a selector
and then executing the operation. For example, to delete all answers that were given to
the proposal P, one has to write the code given in Listing 8.
1. select(ans: answerId, usr: userId, pr: proposalId, ok: bool): bool =
2.   (pr = P)
3. delete_answer(db, select)
Listing 8. Example use of the delete function.
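The state-to-state character of the delete model can also be shown in executable form. The following Python sketch is an analogy of Listings 7 and 8 under simplifying assumptions: the database is a dictionary of tables, the proposals table is reduced to its set of primary keys, and the sample data and the value of P are hypothetical.

```python
from typing import NamedTuple

class AnswerTable(NamedTuple):
    idnt: frozenset
    user_id: dict
    proposal_id: dict
    ok: dict

def delete_answer(db, selector):
    """Return a new database state without the answers matching selector."""
    answers = db["answers"]
    kept = frozenset(
        x for x in answers.idnt
        if not selector(x, answers.user_id[x], answers.proposal_id[x], answers.ok[x])
    )
    # Only the answers entry is replaced; all other tables are carried over.
    return {**db, "answers": answers._replace(idnt=kept)}

db = {
    "proposals": frozenset({20, 21}),  # simplified: primary keys only
    "answers": AnswerTable(
        idnt=frozenset({1, 2, 3}),
        user_id={1: 10, 2: 11, 3: 10},
        proposal_id={1: 20, 2: 20, 3: 21},
        ok={1: True, 2: False, 3: False},
    ),
}

P = 20  # hypothetical proposal id
db_after = delete_answer(db, lambda ans, usr, pr, ok: pr == P)
print(sorted(db_after["answers"].idnt))
```

The original state db is not modified, which corresponds to the functional style of the PVS model: each operation maps an old database state to a new one.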
3.2.3. Update
The update operation differs from the two previously described, as it not only
selects the proper rows but also updates them accordingly. In addition to the selector,
it needs an updater function that specifies how the values of the rows should be
changed. The updater for the answers table is given in Listing 9.
1. updater_answer_type: TYPE =
2.   [[answerId, userId, proposalId, bool] ->
3.    [userId, proposalId, bool]]
Listing 9. The updater function for the answers table.
The updater returns a tuple of new field values for a given record. It is executed
for every row returned by the selector. Notice that the updater function is defined so that
it does not alter the value of the primary key. Our model disallows changing the primary
key to prevent inconsistencies and difficulties that are hard to track. Changing primary
keys is considered “harmful” and is almost never done in real-life projects.
As with the delete function, the database is in a new state after the operation is
performed. The old rows are updated with new values, in accordance with the updater
function. The model of the update function for the answers table is given in Listing 10.
1. update_answer(db: database_type, selector: selector_answer_type,
2.               updater: updater_answer_type): database_type =
3.   LET cond1(x: answerId): bool =
4.         ...reference integrity check... ,
5.       new_user_id(x: answerId): userId =
6.         ...updating user_id... ,
7.       new_proposal_id(x: answerId): proposalId =
8.         ...updating proposal_id... ,
9.       new_ok(x: answerId): bool =
10.        ...updating ok... ,
11.      new_answers = (# idnt := idnt(answers(db)),
12.                       user_id := new_user_id,
13.                       proposal_id := new_proposal_id,
14.                       ok := new_ok #),
15.      db1 = db WITH [answers := new_answers]
16.  IN db1
Listing 10. The PVS model for the update query on the table answers.
The structure of the formal operation resembles the behaviour of the SQL
statement. The update must preserve basic integrity constraints, especially the validity
of the relations between tables (lines 3-4, detailed in Listing 11), before the functions
that correspond to the fields of the table are updated (lines 5-10). Once this is done, the
new state of the database is created (line 15), where the contents of the answers table
are updated (lines 11-14) with new functions.
1. LET cond1(x: answerId): bool =
2.   LET old_usr = user_id(answers(db))(x),
3.       old_prop = proposal_id(answers(db))(x),
4.       old_ok = ok(answers(db))(x),
5.       new_usr = proj_1(updater(x, old_usr, old_prop, old_ok)),
6.       new_prop = proj_2(updater(x, old_usr, old_prop, old_ok))
7.   IN member(new_usr, idnt(users(db))) AND
8.      member(new_prop, idnt(proposals(db)))
Listing 11. Reference checking for table answers.
The condition for reference integrity is satisfied for a given answer if its user
and proposal fields point to existing objects in the database (lines 7-8). We use the
projection functions built into PVS, named proj_x (lines 5-6), which return the x-th
component of the tuple passed to them, to narrow the result of the updater to a
specific column. If the condition is not satisfied, updating has no effect, as shown in
Listing 12. Similar code is used to update the values of the other fields (proposal_id
and ok) of this table.
1. (LET) new_user_id(x: answerId): userId =
2.   LET usr = user_id(answers(db))(x),
3.       prop = proposal_id(answers(db))(x),
4.       okk = ok(answers(db))(x)
5.   IN (COND NOT selector(x, usr, prop, okk) -> usr,
6.            NOT cond1(x) -> usr,
7.            ELSE -> proj_1(updater(x, usr, prop, okk))
8.       ENDCOND)
Listing 12. Updating a field in table answers.
Listing 13 gives an example of how the function can be used. Assume
we want to change the answers made by the user U so that all positive answers become
negative. Apart from a selector (lines 3-4) we create the updater function (lines 5-6)
before executing the update operation (line 8).
1. U: userId
2.
3. selector(ans: answerId, usr: userId, pr: proposalId, ok: bool): bool =
4.   usr = U AND ok
5. updater(ans: answerId, usr: userId, pr: proposalId, ok: bool):
6.   [userId, proposalId, bool] = (usr, pr, false)
7.
8. update_answer(db, selector, updater)
Listing 13. Example of the update function.
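The interplay of selector, updater and reference check can be followed more easily on running code. The sketch below is a Python analogy of Listings 10 to 13, with several assumptions: fields are dictionaries, the users and proposals tables are reduced to their sets of primary keys, only existing rows are visited (the PVS functions are defined for every answerId), and the sample data is hypothetical.

```python
from typing import NamedTuple

class AnswerTable(NamedTuple):
    idnt: frozenset
    user_id: dict
    proposal_id: dict
    ok: dict

def update_answer(db, selector, updater):
    """Return a new state in which selected rows carry the updater's values."""
    answers = db["answers"]
    user_id = dict(answers.user_id)
    proposal_id = dict(answers.proposal_id)
    ok = dict(answers.ok)
    for x in answers.idnt:
        old = (x, answers.user_id[x], answers.proposal_id[x], answers.ok[x])
        if not selector(*old):
            continue  # row not selected: keep the old values
        new_usr, new_prop, new_ok = updater(*old)
        # Reference integrity check (cond1): new references must exist.
        if new_usr not in db["users"] or new_prop not in db["proposals"]:
            continue  # check failed: the update has no effect on this row
        user_id[x], proposal_id[x], ok[x] = new_usr, new_prop, new_ok
    new_answers = answers._replace(user_id=user_id, proposal_id=proposal_id, ok=ok)
    return {**db, "answers": new_answers}

db = {
    "users": frozenset({10, 11}),
    "proposals": frozenset({20, 21}),
    "answers": AnswerTable(
        idnt=frozenset({1, 2, 3}),
        user_id={1: 10, 2: 11, 3: 10},
        proposal_id={1: 20, 2: 20, 3: 21},
        ok={1: True, 2: True, 3: False},
    ),
}

U = 10  # hypothetical user id
# Turn all positive answers of user U into negative ones.
flipped = update_answer(
    db,
    lambda ans, usr, pr, ok: usr == U and ok,
    lambda ans, usr, pr, ok: (usr, pr, False),
)
# Try to point the answers of U at a proposal that does not exist.
rejected = update_answer(
    db,
    lambda ans, usr, pr, ok: usr == U,
    lambda ans, usr, pr, ok: (usr, 99, ok),
)
print(flipped["answers"].ok, rejected["answers"].proposal_id)
```

The primary key is never part of the updater's result, so, as in the model, an update cannot change the set of existing rows.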
3.2.4. Insert
Adding a record to a database does not require selectors or updaters: it only
requires the data of the new row, which it adds to the database. Like the update
operation, insertion must take care of the initial integrity constraints, and must ensure
that the record to be inserted has a new, unused primary key value. We define the insert
operation as in Listing 14.
1. insert_answer(db: database_type, ans: answerId, usr: userId,
2.               prop: proposalId, ok: bool): database_type =
3.   LET meeting = meeting_id(proposals(db))(prop),
4.       new_answers = (#
5.         idnt := idnt(answers(db)) WITH [ans := TRUE],
6.         user_id := user_id(answers(db)) WITH [ans := usr],
7.         proposal_id := proposal_id(answers(db)) WITH [ans := prop],
8.         ok := ok(answers(db)) WITH [ans := ok] #),
9.       db1 = (COND member(ans, idnt(answers(db))) -> db,
10.                  NOT member(usr, idnt(users(db))) -> db,
11.                  NOT member(prop, idnt(proposals(db))) -> db,
12.                  ELSE -> db WITH [answers := new_answers]
13.             ENDCOND)
14.  IN db1
Listing 14. The PVS model for the insert operation on answers table.
The insertion function checks that the new identifier is unused (line 9) and
that the user and proposal of the new answer exist in the database (lines 10-11). If the
initial constraints are not fulfilled, no change to the database is made. Otherwise, the
new state of the database is returned, with the answers table containing the new row
(lines 4-8, 12-14).
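The three guards of the insert model can be exercised with a small executable analogy. The Python sketch below follows the structure of Listing 14 under simplifying assumptions: fields are dictionaries, the users and proposals tables are reduced to their sets of primary keys, and the sample data is hypothetical.

```python
from typing import NamedTuple

class AnswerTable(NamedTuple):
    idnt: frozenset
    user_id: dict
    proposal_id: dict
    ok: dict

def insert_answer(db, ans, usr, prop, ok):
    """Return the state with the new row added, or db itself if a guard fails."""
    answers = db["answers"]
    if ans in answers.idnt:          # primary key already in use
        return db
    if usr not in db["users"]:       # referenced user does not exist
        return db
    if prop not in db["proposals"]:  # referenced proposal does not exist
        return db
    new_answers = AnswerTable(
        idnt=answers.idnt | {ans},
        user_id={**answers.user_id, ans: usr},
        proposal_id={**answers.proposal_id, ans: prop},
        ok={**answers.ok, ans: ok},
    )
    return {**db, "answers": new_answers}

db = {
    "users": frozenset({10, 11}),
    "proposals": frozenset({20, 21}),
    "answers": AnswerTable(
        idnt=frozenset({1}), user_id={1: 10}, proposal_id={1: 20}, ok={1: True}
    ),
}

added = insert_answer(db, 2, 11, 21, False)      # all guards pass
duplicate = insert_answer(db, 1, 11, 21, False)  # key 1 is already used
dangling = insert_answer(db, 3, 99, 21, False)   # user 99 does not exist
print(sorted(added["answers"].idnt), duplicate is db, dangling is db)
```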
3.3. Modelling complex operations
Complex constraints are maintained by performing a fixed sequence of simple
commands. This typically happens when an operation modifies the data. Such a fixed
sequence is triggered before or after the operation itself, and is therefore known as a
trigger. There can be any number of triggers for any given operation. Virtually any data
constraint can be preserved by using a trigger: data that does not meet the condition is
not processed and the execution of the operation is abandoned.
Triggers are often used implicitly, without being stated in code. For example, the
SQL directive on delete cascade, which can be used in a column declaration, is executed
in response to the delete operation. Although the directive is part of the de facto
standard syntax, the database converts its behaviour into a trigger function.
Triggers are an integral part of a database that is transparent to clients, so the
client application is not altered when the triggers change. Triggers are used in our case
study to maintain data integrity and to provide desired functionality. In order to reason
about the correctness of our software, we need to formally model triggers, so that
theorems corresponding to different properties of the system can be stated and proven.
3.3.1. Triggers in delete
In the case study one of the functional requirements states that whenever an
invitation to a meeting is deleted, all answers made by the invited user to the proposals
concerning the meeting should be deleted. The dependency is not direct, as the answers
are not pointing to the invitations, but to the users (this follows the intuitive
understanding – it is the user who answers, not the invitation). Such a condition cannot
be expressed with the standard declarative SQL syntax, so triggers have to be used.
Otherwise, the client application would have to implement this behaviour. This
functionality is provided by the trigger presented in Listing 15.
1. create function delete_answers_uninvited() returns trigger as $dai$
2. begin
3.   DELETE FROM answers
4.   WHERE user_id = OLD.user_id AND proposal_id IN
5.     (SELECT id FROM proposals
6.      WHERE meeting_id = OLD.meeting_id);
7.   RETURN NULL;
8. end;
9. $dai$ language 'plpgsql';
10. create trigger tr_delete_answers_uninvited after delete on invitations
11.   for each row execute procedure delete_answers_uninvited();
Listing 15. The trigger for deleting answers related to the removed invitation.
In the trigger function, OLD denotes the invitation row that has just been deleted.
The trigger deletes the answers given by the user the invitation was issued to
(lines 3-4), and only those that relate to a proposal concerning the meeting of the
invitation; these proposals are found with a nested selection (lines 5-6). Since the
trigger is executed after the deletion takes place, the value it returns is ignored.
Returning NULL (line 7) in this case is considered good programming practice.
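The effect of such a trigger can be observed on a small data set. The sketch below uses Python's sqlite3 module with an in-memory database; it is an approximation, since SQLite writes the trigger body inline instead of in a separate plpgsql function, and the reduced table layouts and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE proposals (id INTEGER PRIMARY KEY, meeting_id INTEGER);
    CREATE TABLE invitations (id INTEGER PRIMARY KEY, user_id INTEGER, meeting_id INTEGER);
    CREATE TABLE answers (id INTEGER PRIMARY KEY, user_id INTEGER, proposal_id INTEGER);
    INSERT INTO proposals VALUES (20, 1), (21, 2);
    INSERT INTO invitations VALUES (1, 10, 1), (2, 11, 1);
    INSERT INTO answers VALUES (100, 10, 20), (101, 11, 20), (102, 10, 21);
""")

# SQLite counterpart of the trigger: OLD is the deleted invitation row.
conn.execute("""
    CREATE TRIGGER tr_delete_answers_uninvited AFTER DELETE ON invitations
    FOR EACH ROW
    BEGIN
        DELETE FROM answers
        WHERE user_id = OLD.user_id AND proposal_id IN
            (SELECT id FROM proposals WHERE meeting_id = OLD.meeting_id);
    END
""")

# Delete the invitation of user 10 to meeting 1.
conn.execute("DELETE FROM invitations WHERE id = 1")
remaining = [row[0] for row in conn.execute("SELECT id FROM answers ORDER BY id")]
print(remaining)
```

Answer 100 is removed because it was given by user 10 to a proposal of meeting 1. Answer 101 belongs to another user and answer 102 concerns another meeting, so both are kept.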
The trigger enhances the functionality of the original delete operation on the
invitations table. In other words, deleting a row in the invitations table automatically
removes all answers that are related to it through the user and meeting fields. Once the
trigger is created, it is not possible to delete only the invitation row without also altering
the other tables. Therefore, we decided to model the trigger in PVS as if it were part of
the delete function. The structure of the formal model is shown in Listing 16.
1. delete_invitation(db: database_type,
2.                   selector: selector_invitation_type):
3.     database_type =
4.   LET new_invitations =
5.         ...delete invitations based on selector (as in Listing 7)...,
6.       db1 = db WITH [invitations := new_invitations],
7.       del_rows = select_invitation(db, selector),
8.       sel1(ans: answerId,usr: userId,prop: proposalId,ok: bool): bool =
9.         ...select answers related to deleted invitations...,
10.      db2 = delete_answer(db1, sel1)
11.  IN db2
Listing 16. The PVS model of modified delete operation on table invitations.
The invitations are deleted based on the selector passed to the operation. This is
achieved similarly to what we showed in Listing 7. Subsequently we store the
deleted rows in a variable (line 7) that is used to construct a selector (lines 8-9, Listing
17) responsible for deleting related answers (line 10).
1. (LET) sel1(ans: answerId, usr: userId,
2.            prop: proposalId, ok: bool): bool =
3.   EXISTS (x: invitationId):
4.     LET sel2(proposal: proposalId, meeting: meetingId,
5.              start: nat, confirmed: bool): bool =
6.           meeting = meeting_id(del_rows)(x),
7.         related_proposals = select_proposal(db1, sel2)
8.     IN member(x, idnt(del_rows)) AND
9.        usr = user_id(del_rows)(x) AND
10.       member(prop, idnt(related_proposals))
Listing 17. Formal model of a trigger that deletes answers related to invitation.
We construct a selector sel1 for the answers table to indicate answers that relate
to the deleted invitation (condition in line 8). In addition, the answer must be made by
the user the removed invitation was issued to (line 9) and must concern a proposal (line
10) that refers to the meeting of the invitation (selector sel2 defined in lines 4-6 and
used in line 7).
Having proven a number of theorems that check the desired behaviour of this
function, we can assume that the database state is consistent after deleting the
invitation.
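To make the cascading behaviour concrete, the following Python sketch mimics the structure of Listings 16 and 17 on plain dictionaries. It is an illustration only and not part of the Deseo code base: the Database class, its field layout and the helper related are hypothetical stand-ins for the PVS record types, which are not reproduced here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Database:
    # Hypothetical, simplified tables (not the paper's PVS record types).
    invitations: dict  # invitation id -> (user_id, meeting_id)
    proposals: dict    # proposal id -> meeting_id
    answers: dict      # answer id -> (user_id, proposal_id, ok)


def delete_invitation(db, selector):
    """Delete the selected invitations together with the answers related to them."""
    # Rows chosen by the selector (the role of del_rows in Listing 16).
    del_rows = {i: row for i, row in db.invitations.items() if selector(i, *row)}
    new_invitations = {i: row for i, row in db.invitations.items() if i not in del_rows}

    def related(user_id, proposal_id):
        # An answer is related if a deleted invitation was issued to the same user
        # for the meeting that the answered proposal belongs to (as in Listing 17).
        meeting_id = db.proposals.get(proposal_id)
        return any(u == user_id and m == meeting_id for u, m in del_rows.values())

    new_answers = {a: row for a, row in db.answers.items() if not related(row[0], row[1])}
    return replace(db, invitations=new_invitations, answers=new_answers)


db = Database(
    invitations={1: ("ann", 10), 2: ("bob", 10)},
    proposals={100: 10, 200: 20},
    answers={7: ("ann", 100, True), 8: ("bob", 100, False), 9: ("ann", 200, True)},
)
# Delete every invitation issued to "ann".
after = delete_invitation(db, lambda i, user_id, meeting_id: user_id == "ann")
```

In this example only answer 7 disappears together with invitation 1: answer 8 belongs to another user, and answer 9 concerns a proposal of a different meeting.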
3.3.2. Triggers in update
Whenever a user changes an answer to some proposal from negative to
positive, a check must be made to ensure that no meeting conflicts with the
proposal the answer refers to. By a conflict we understand another meeting that the
answering user is invited to or hosts, that is already confirmed, and whose time
overlaps with the time of the proposal the answer refers to. This concept of conflict
is essential to the behaviour of the software, as it prevents users from agreeing to
two meetings that are held at the same time. The SQL code for the trigger is
shown in Listing 18.
1. create function disallow_agreeing_collision() returns trigger as $dac$
2. declare
3.   clm boolean;
4.   prop timestamp;
5.   dur integer;
6. begin
7.   IF NEW.ok THEN
8.     SELECT INTO prop,dur p.starts,m.duration FROM proposals p,meetings m
9.       WHERE p.id=NEW.proposal_id AND m.id=p.meeting_id;
10.    SELECT INTO clm collides(prop,dur,NEW.user_id);
11.    IF clm THEN
12.      RAISE notice 'cannot_agree_collision';
13.      RETURN NULL;
14.    END IF;
15.  END IF;
16.  RETURN NEW;
17. end;
18. $dac$ language 'plpgsql';
19.
20. create trigger tr_disallow_agreeing_collision
21.   before insert or update on answers
22.   for each row execute procedure disallow_agreeing_collision();
Listing 18. The trigger that disallows users to agree to a colliding proposal.
In the code we first collect the information about the proposal and the meeting
the answer refers to (lines 8-9) and then call the external function that checks for
overlapping meetings (line 10). In case a collision has been found, no update occurs
(lines 11-14). Note that the body of the function is executed only if the answer is
positive (line 7). As with the previous situation, the trigger extends the original update
function, this time adding a condition that must be fulfilled before the
operation can be performed. We use the same approach as previously by incorporating
the trigger into the operation, as generally shown in Listing 19.
1. update_answer(db: database_type, selector: selector_answer_type,
2.               updater: updater_answer_type): database_type =
3.   LET cond1(x: answerId): bool =
4.         ...reference integrity check...
5.       cond3(x: answerId): bool =
6.         ...collision check...
7.       new_user_id(x: answerId): userId =
8.         ...updating user_id...
9.       new_proposal_id(x: answerId): proposalId =
10.        ...updating proposal_id...
11.      new_ok(x: answerId): bool =
12.        ...updating ok...
13.      new_answers = (# idnt := idnt(answers(db)),
14.                       user_id := new_user_id,
15.                       proposal_id := new_proposal_id,
16.                       ok := new_ok #),
17.      db1 = db WITH [answers := new_answers]
18.  IN db1
Listing 19. The PVS model of modified update operation on table answers.
The code structure closely resembles the one presented in Listing 10. It
contains an additional condition (lines 5-6) that checks whether the answer collides
with an already confirmed meeting. This condition, shown in detail in Listing 20,
corresponds to the SQL code presented in Listing 18.
1. (LET) cond3(x: answerId): bool =
2.   LET old_usr = user_id(answers(db))(x),
3.       old_prop = proposal_id(answers(db))(x),
4.       old_ok = ok(answers(db))(x),
5.       new_usr = proj_1(updater(x, old_usr, old_prop, old_ok)),
6.       new_prop = proj_2(updater(x, old_usr, old_prop, old_ok)),
7.       new_ok = proj_3(updater(x, old_usr, old_prop, old_ok)),
8.       meet = meeting_id(proposals(db))(new_prop)
9.   IN NOT collides(starts(proposals(db))(new_prop),
10.                  duration(meetings(db))(meet),
11.                  new_usr, db)
12.     OR NOT new_ok
Listing 20. Condition checking whether an answer collides with already confirmed meetings.
By analysing the code of the condition we can see that it evaluates to true whenever
the answer is negative (line 12) or there is no collision with the proposal the answer
refers to (lines 9-11). To simplify the code we also use a local variable meet, which
refers directly to the meeting that the answer refers to through the proposals table
(line 8). The condition is evaluated before the values of the fields are changed, as
shown in Listing 21.
1. (LET) new_proposal_id(x: answerId): proposalId =
2.   LET usr = user_id(answers(db))(x),
3.       prop = proposal_id(answers(db))(x),
4.       okk = ok(answers(db))(x)
5.   IN (COND NOT selector(x, usr, prop, okk) -> prop,
6.            NOT cond1(x) -> prop,
7.            NOT cond3(x) -> prop,
8.            ELSE -> proj_2(updater(x, usr, prop, okk))
9.       ENDCOND),
Listing 21. Updating the values of a column with additional constraint.
Compared with the previously described situation (Listing 12), the non-colliding
constraint is added as a new branch in the evaluation part of the code (line 7). Thus,
the formal model of the operation follows the behaviour of its SQL counterpart.
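The guard described in this subsection can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the paper's model: the definition of collides is not shown in this section, so the interval test overlaps below is an assumed stand-in (half-open time intervals), confirmed meetings are passed in as a plain list, and only the ok field is updated. All names are hypothetical.

```python
def overlaps(start_a, dur_a, start_b, dur_b):
    """Assumed overlap test: half-open intervals [start, start + dur) intersect."""
    return start_a < start_b + dur_b and start_b < start_a + dur_a


def collides(start, duration, user, confirmed):
    """True if the user attends or hosts a confirmed meeting overlapping the slot.

    confirmed is an iterable of (participants, start, duration) triples.
    """
    return any(
        user in people and overlaps(start, duration, s, d)
        for people, s, d in confirmed
    )


def update_ok(answer, new_ok, slot, confirmed):
    """Return the answer with ok changed, or the unchanged answer if the guard fails."""
    user, proposal, _old_ok = answer
    start, duration = slot
    # Counterpart of cond3 in Listing 20: a negative answer always passes,
    # a positive answer passes only if there is no collision.
    guard = (not new_ok) or not collides(start, duration, user, confirmed)
    return (user, proposal, new_ok) if guard else answer


# One confirmed meeting with ann and bob, occupying time units 10 and 11.
confirmed = [({"ann", "bob"}, 10, 2)]
blocked = update_ok(("ann", 5, False), True, (11, 2), confirmed)
allowed = update_ok(("ann", 5, False), True, (12, 1), confirmed)
```

Here blocked keeps the old negative answer because the slot starting at 11 overlaps the confirmed meeting, while allowed becomes positive because the slot starting at 12 begins exactly when that meeting ends.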
3.3.3. Triggers in insert
The requirements of the system state that only invited users are allowed to
give answers to proposals. In other words, before an answer is inserted into the
database, it has to be checked whether the user who gives the answer to a proposal is
invited to the meeting the proposal refers to. This complex condition must be ensured
by a trigger. Its SQL code is presented in Listing 22.
1. create function disallow_uninvited_answers() returns trigger as $dua$
2. declare
3.   meeting integer;
4.   zmp1 integer;
5. begin
6.   SELECT INTO meeting meeting_id FROM proposals
7.     WHERE id=NEW.proposal_id;
8.   SELECT INTO zmp1 user_id FROM invitations
9.     WHERE meeting_id=meeting AND user_id=NEW.user_id;
10.  IF NOT FOUND THEN
11.    RAISE notice 'cannot_answer_if_not_invited';
12.    RETURN NULL;
13.  END IF;
14.  RETURN NEW;
15. end;
16. $dua$ language 'plpgsql';
17.
18. create trigger tr_disallow_uninvited_answers before insert on answers
19.   for each row execute procedure disallow_uninvited_answers();
Listing 22. The trigger that disallows inserting new answers by uninvited users.
The information about the meeting the answer's proposal refers to (lines 6-7) is
used to find an invitation to that meeting issued to the answering user (lines 8-9). If no
such invitation is found, the insert operation is abandoned (lines 10-13). The trigger can
be modelled in PVS as shown in Listing 23.
1. insert_answer(db: database_type, ans: answerId, usr: userId,
2.               prop: proposalId, ok: bool): database_type =
3.   LET meeting = meeting_id(proposals(db))(prop),
4.       cond2 = ok AND collides(starts(proposals(db))(prop),
5.                 duration(meetings(db))(meeting), usr, db),
6.       aux_sel2(i: invitationId, u: userId, m: meetingId): bool =
7.         (m = meeting AND u = usr),
8.       zmp2 = select_invitation(db, aux_sel2),
9.       cond3 = empty?(idnt(zmp2)),
10.      new_answers = (#
11.        idnt := idnt(answers(db)) WITH [ans := TRUE],
12.        user_id := user_id(answers(db)) WITH [ans := usr],
13.        proposal_id := proposal_id(answers(db)) WITH [ans := prop],
14.        ok := ok(answers(db)) WITH [ans := ok]
15.      #),
16.      db1 = (COND member(ans, idnt(answers(db))) -> db,
17.                  NOT member(usr, idnt(users(db))) -> db,
18.                  NOT member(prop, idnt(proposals(db))) -> db,
19.                  cond1 -> db,
20.                  cond2 -> db,
21.                  cond3 -> db,
22.                  ELSE -> db WITH [answers := new_answers]
23.            ENDCOND)
24.  IN db1
Listing 23. The PVS model of modified insert operation.
The trigger is, as previously, merged (lines 6-9, 21) into the model of the
underlying operation. The code also contains a check for colliding meetings (lines 3-5,
20), similar to the one described in the previous subsection. It can be seen that the
constraints are introduced into the model in the same way regardless of the operation: a
corresponding condition is stated and the operation takes place only if no constraint is
violated.
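The pattern of stating one rejecting condition per constraint and performing the operation only if none of them holds can be sketched in Python as below. This is an illustration under simplifying assumptions rather than a translation of Listing 23: the tables are plain dictionaries and sets with hypothetical names, and the collision check is left out.

```python
def insert_answer(answers, invitations, proposals, ans_id, user, proposal, ok):
    """Insert an answer unless one of the constraints rejects it.

    answers:     answer id -> (user, proposal, ok)
    invitations: set of (user, meeting) pairs
    proposals:   proposal id -> meeting
    """
    rejections = [
        ans_id in answers,                                   # identifier already used
        proposal not in proposals,                           # dangling proposal reference
        (user, proposals.get(proposal)) not in invitations,  # user not invited to the meeting
    ]
    if any(rejections):
        return answers  # state unchanged, like the "-> db" branches of the COND
    return {**answers, ans_id: (user, proposal, ok)}


invitations = {("ann", 10)}
proposals = {100: 10}
accepted = insert_answer({}, invitations, proposals, 1, "ann", 100, True)
rejected = insert_answer({}, invitations, proposals, 2, "bob", 100, True)
```

Adding a further constraint amounts to appending one more entry to the list of rejections, which mirrors how the PVS model gains one more COND branch per trigger.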
4. Verifying properties
In the previous chapter we presented how to construct a formal model that
corresponds to the database schema. In order to prove that the system behaves correctly
we need to state theorems that correspond to different properties and prove them in
PVS. For example, one property of the system states that users cannot agree to
proposals that collide with other meetings they attend or host. In order to verify this
property several theorems have to be constructed and proven. The one presented in
Listing 24 states that inserting a positive answer that conflicts with some
other meeting is not possible. In other words, if there were no answers that violate this
constraint before the insertion, the operation itself preserves this property.
1. insert_answer_th9: THEOREM
2.   FORALL (db: database_type, ans: answerId, usr: userId,
3.           prop: proposalId, ok: bool):
4.     LET new_db = insert_answer(db, ans, usr, prop, ok)
5.     IN (EXISTS (p: proposalId): member(p, idnt(proposals(db)))
6.         AND
7.         user_id(meetings(db))(meeting_id(proposals(db))(p)) = usr
8.         AND
9.         confirmed(proposals(db))(p) = TRUE
10.        AND
11.        starts(proposals(db))(prop) < starts(proposals(db))(p) +
12.          duration(meetings(db))(meeting_id(proposals(db))(p))
13.        AND
14.        starts(proposals(db))(prop) > starts(proposals(db))(p))
15.       AND
16.       ok = TRUE
17.       => new_db = db
Listing 24. The theorem that disallows inserting answers that conflict with other meetings.
This theorem was proven using the basic (grind) strategy in PVS. In total, four
similar theorems are needed to establish the property; all of them are
proven automatically by PVS.
Our system also disallows issuing an invitation to the host of the meeting. This
constraint can be ensured by two SQL triggers, one executed before inserting a new
invitation and one before updating an existing invitation. In other words, the host not
being invited to a meeting is an invariant, in particular over the insert operation. By
proving this fact we gain confidence that no invitation can be issued to the host of a
meeting. To accomplish this we first define a predicate that corresponds to the property,
as stated in Listing 25.
1. not_inv_host(db: database_type): bool =
2.   FORALL (i:invitationId): member(i, idnt(invitations(db))) =>
3.     user_id(invitations(db))(i) /=
4.     user_id(meetings(db))(meeting_id(invitations(db))(i))
Listing 25. The predicate corresponding to a property of the system.
This predicate is used to form the theorem presented in Listing 26. It
states that the host not being invited to any meeting is an invariant over the operation of
adding new invitations. In other words, adding a new invitation preserves the property.
Within the theory formed by our model, this theorem was proven simply by expanding
the definition of the predicate and then applying the basic (grind) strategy in PVS.
1. insert_invitation_th7: THEOREM
2.   FORALL (db:database_type, inv: invitationId,
3.           usr: userId, meet: meetingId):
4.     LET new_db = insert_invitation(db, inv, usr, meet)
5.     IN
6.       not_inv_host(db) => not_inv_host(new_db)
Listing 26. The theorem stating that the predicate is an invariant of the insert operation.
To ensure that there is no invitation issued to the host we need to prove that this
invariant holds also over the update operation. However, another property of our system
states that an invitation cannot be modified once issued. Therefore, this invariant is also
preserved by the update on the invitations table.
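The shape of such an invariant theorem can be illustrated with a bounded check in Python. Unlike the PVS proof, which covers all database states, the sketch below only enumerates a handful of small states, so it is a sanity check and not a proof. The data layout and the functions insert_invitation and invariant_preserved are hypothetical simplifications, not the paper's definitions.

```python
from itertools import product


def not_inv_host(invitations, hosts):
    """No invitation is issued to the host of its meeting.

    invitations: invitation id -> (user, meeting); hosts: meeting -> host user.
    """
    return all(user != hosts[meeting] for user, meeting in invitations.values())


def insert_invitation(invitations, hosts, inv_id, user, meeting):
    """Insert unless the id is taken, the meeting is unknown, or the user hosts it."""
    if inv_id in invitations or meeting not in hosts or hosts[meeting] == user:
        return invitations
    return {**invitations, inv_id: (user, meeting)}


def invariant_preserved(users, hosts, inv_ids, insert=insert_invitation):
    """Check not_inv_host(db) => not_inv_host(insert(db, ...)) on tiny states only."""
    rows = list(product(users, hosts))  # all (user, meeting) pairs
    states = [{}] + [{i: row} for i in inv_ids for row in rows]
    for state, inv_id, (user, meeting) in product(states, inv_ids, rows):
        if not_inv_host(state, hosts):
            new_state = insert(state, hosts, inv_id, user, meeting)
            if not not_inv_host(new_state, hosts):
                return False
    return True


hosts = {10: "ann", 20: "bob"}
holds = invariant_preserved(["ann", "bob"], hosts, [1, 2])
# An unguarded insert (no host check) is expected to break the invariant.
unguarded = lambda inv, hosts, i, u, m: {**inv, i: (u, m)}
broken = invariant_preserved(["ann", "bob"], hosts, [1, 2], insert=unguarded)
```

With the guarded insert the check succeeds on every enumerated state, whereas the unguarded variant fails as soon as a host is invited to their own meeting.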
Similar theorems have been stated and proven for all other properties of the
system. The majority of the theorems describe the desired behaviour of operations that
can be performed on database tables and are relatively simple to prove. They can be
considered equivalent to unit tests in SQL development. By proving them we
ensure that the operations behave as expected and yield the desired results.
5. Conclusions and future work
By using our approach we have verified the database schema of Deseo Meeting
Scheduler with its structural and behavioural features. We have stated and proven 167
theorems and lemmas representing properties of the system and gained the confidence
that the system works according to the specification.
The main advantage of our formalisation method is that the theorems can be proven
with straightforward PVS proving strategies. Most of the theorems in the case study
were proven with the (grind) command, and only a few required a more dedicated
approach. During the work we noticed that the formation of a model and the
stating of the theorems are, to some degree, repetitive and could possibly be automated.
Our current research efforts are directed towards finding such a generalisation scheme.
Acknowledgements
We would like to thank Prof. Iván Porres, Dr Viorel Preoteasa and M.Sc. Johannes
Eriksson for collaboration on the Deseo Meeting Scheduler application and for fruitful
discussions on the topics of formal verification and modelling.
References
1. Dijkstra Edsger, A constructive approach to the problem of program correctness. BIT
(1968).
2. van Emden M., Programming with verification conditions. In: IEEE Transactions on
Software Engineering, (1979).
3. Reynolds J., Programming with transition diagrams. In: Gries D., Programming
methodology, Springer-Verlag, Berlin (1978).
4. Back Ralph-Johan, Invariant based programs and their correctness. In: Biermann
W., Guiho G., Kodratoff Y., Automatic Program Construction Techniques, MacMillan
Publishing Company (1983).
5. Fowler Martin, Rice David, Patterns of Enterprise Application Architecture.
Addison-Wesley (2003).
6. PVS Verification and Specification System, http://pvs.csl.sri.com/
7. Back Ralph-Johan, Invariant based programming. In: Donatelli Susanna, Thiagarajan
P., Lecture Notes in Computer Science, Springer (2006).
8. Back Ralph-Johan, Eriksson Johannes, Myréen Magnus, Testing and Verifying
Invariant Based Programs in the SOCOS Environment. TUCS Technical Report (2006)
9. Choppella Venkatesh, Sengupta Arijit, Robertson Edward, Johnson Steven,
Preliminary Explorations in Specifying and Validating Entity-Relationship Models in
PVS. AFM, Atlanta, USA (2007)
10. Sheard T., Stemple D., Automatic verification of database transaction safety, ACM
Transactions on Database Systems, (1989), pp.322-368.
11. Pląska Marta, Development of Database Applications (M.Sc. Thesis). University of
Gdańsk, Gdańsk (2006).
12. PostgreSQL, http://www.postgresql.org
13. Wikipedia, http://en.wikipedia.org/wiki/SQL
14. Back Ralph, Software Construction by Stepwise Feature Introduction. In: ZB 02:
Proceedings of the 2nd International Conference of B and Z Users of Formal
Specification and Development in Z and B, Springer-Verlag (2002).
15. Kilov H., From semantic to object-oriented data modelling, Proceedings of the First
International Conference on Volume, (1990).
16. Kilov Haim, Business Specifications: The Key to Successful Software Engineering.
Prentice Hall (1998).
17. Back Ralph-Johan, Olszewski Mikołaj, Soriano Damián, Deseo Meeting Scheduler:
Towards Automated Verification of Database Integrity Constraints, Automated
Program Verification 2009. ETH Zurich, Rio Cuarto, Argentina (2009)
ISBN 978-952-12-2275-7
ISSN 1239-1891