Chapter 3: The Relational Data Model and Relational Databases


Post on 12-Nov-2021




The Relational Model of Data

[Figure: a relation between the sets A = {a, b, c, d, e} and B = {l, m, n, s, t}, i.e. a subset of A × B]


The basis of the model is the concept of relation, as found in mathematics, set theory and mathematical logic, in particular in predicate logic

A model is given in terms of relations between elements of a domain

A relational schema contains the basic elements of a relational data model

The schema is application dependent

A relational schema S contains:

The data domain D, which is a possibly infinite set

A finite collection of relations (relation names) R1, ..., Rn over D, each of finite and fixed arity

That is, for each relation name R ∈ S, its potential extensions will be subsets of D^k = D × ··· × D (k times), for some natural number k that depends on R

R(·, ..., ·) with k arguments

A k-ary relation can be seen as a table with k columns

[Figure: a table R with k columns]


A finite collection of attributes (attribute names) A1, . . . , Am

They are associated with the different relations to denote their arguments, or “columns”

They can be identified with (or by) unary relations (unary predicates, properties) over D. That is, they can be identified with subsets (sub-domains) of the domain D

[Figure: a table R whose columns are labeled by the attributes A, ..., C]

A, . . . , C are attributes of relation R


Example: Schema S with

Domain D = {john, peter, mary, ..., 1, 2, 3, 4, ...}

Binary relation People(·, ·)

Attributes for People, in this order: Name, Age

People
  Name | Age

No contents or extensions so far; the schema describes the structure of the model

The schemas are domain/application dependent

Attributes can be seen as (future) subsets of the domain


Two attributes with different names may later have the same extension and still be different; that is why treating them as functions is more precise

Example: Schema with domain D = {john, peter, mary, ...} and relation

Manager
  Boss | Subordinate

Schemas can be filled with data in many different ways

A database instance D compatible with a given schema S is a collection of finite extensions for the relation names in the schema


Example: For the schema S with

Domain D = {john, peter, mary, ken, carol, steve, ..., 1, 2, 3, 4, ...}

Binary relations People(·, ·), Manager(·, ·)

Attributes for People, in this order: Name, Age

Attributes for Manager, in this order: Boss, Subordinate

This is an instance compatible with the schema:

D1:
People
  Name | Age
  john | 35
  mary | 25
  ken | 40

Manager
  Boss | Subordinate
  ken | john
  john | mary
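To make the schema/instance distinction concrete, here is a minimal Python sketch (an illustration, not part of the original material): the schema maps each relation name to its ordered attribute names, and an instance maps each relation name to a finite set of tuples. The helper `is_compatible` is hypothetical and only checks relation names and arities, not membership of the values in the domain D.

```python
# Schema S: relation name -> ordered attribute names (the arity is the length)
schema = {
    "People": ("Name", "Age"),
    "Manager": ("Boss", "Subordinate"),
}

# Instance D1: relation name -> finite set of tuples (its extension)
d1 = {
    "People": {("john", 35), ("mary", 25), ("ken", 40)},
    "Manager": {("ken", "john"), ("john", "mary")},
}


def is_compatible(instance, schema):
    """Hypothetical check: the instance gives an extension to exactly the relation
    names of the schema, and every tuple has the arity declared for its relation."""
    if set(instance) != set(schema):
        return False
    return all(
        len(t) == len(schema[name])
        for name, extension in instance.items()
        for t in extension
    )


print(is_compatible(d1, schema))  # True
```

The same schema admits many instances: D2 would pass this check too, even though it gives mary two ages, since nothing so far constrains the contents.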


This is another compatible instance:

D2:
People
  Name | Age
  mary | 35
  mary | 25
  peter | 40

Manager
  Boss | Subordinate
  ken | steve
  carol | steve
  john | mary

The sub-domains for the attributes Boss and Subordinate are the same, namely the subset {john, peter, mary, ken, carol, steve, ...} of the database domain D


Example (different notations for the same thing): the table

Accounts
  Account# | Name | Balance
  12345 | Raoul | 400.00
  34567 | Rupert | 354.60
  12338 | Rumilde | 1234.30
  34561 | Sulema | 34445.23

is an instance of a relation between the attributes Account#, Name, and Balance

Each attribute has an associated (sub)domain

Here, the account number 12345, the name Raoul and the numerical value 400.00 are mutually related through the relation

The schema of the relation is: Accounts(Account#, Name, Balance)


Bank Example: Some abbreviations

clientn = client name
cladd = client address
clneigh = client neighborhood
branch = branch name
acc# = account number

Schema:

Deposit(branch,acc#,clientn,balance),

Client(clientn,cladd,clneigh)


An instance

Deposit
  branch | acc# | clientn | balance
  Carleton | 101 | Jim | 500
  Downtown | 215 | Sandy | 700
  Barrhaven | 304 | Alvin | 1300

Client
  clientn | cladd | clneigh
  Jim | 101 Queensbury | Barrhaven
  Sandy | 40 Stone | Nepean
  Hernandez | 15 Laurier | Downtown
  Alvin | 17 Clyde | Altavista
  John | 89 Case | Centrepoint


What is the right schema?

What about this one? A single universal relation Bank(branch, acc#, clientn, balance, cladd, clneigh)

It depends on the application and on other practical, DB-oriented issues

If a client has several accounts, there is redundancy of information

This DB becomes unnecessarily large, and inconsistencies become more likely to occur

If a client has an account but no address, we have to use more null values than desired

Null values are not easy to handle


We will come back to design issues later on ...

For the moment, this one seems to be a better schema:

Deposit(branch,acc#,clientn,balance)

Client(clientn,cladd,clneigh)

Consider the relation Deposit

We have 4 (sub)domains, D1, D2, D3, D4, one for each of its 4 attributes, where they take values (branch names, account numbers, client names, balances)

Any row in the table (extension of the relation) is a 4-tuple (v1, v2, v3, v4) with

v1 ∈ D1, v2 ∈ D2, v3 ∈ D3, v4 ∈ D4


That is, the instance

Deposit
  branch | acc# | clientn | balance
  Carleton | 101 | Jim | 500
  Downtown | 215 | Sandy | 700
  Barrhaven | 304 | Alvin | 1300

is a subset of D1 × D2 × D3 × D4

Any instance of the relation Deposit will be a subset of D1 × D2 × D3 × D4

We use “relation” and “table” as synonyms; likewise “tuple” and “row”

If t is a tuple, and R is a relation (extension), then:


We can say that t ∈ R if the tuple belongs to the relation R (relation extensions are sets)

Let A be the name of the attribute in the nth column of relation R

If t ∈ R, then t[n] and t[A] denote the value of the attribute A in the tuple t

For example, if t denotes the first tuple in the table Deposit, then t[2] = t[acc#] = 101

Useful notation: since the same attribute may appear in different tables, we distinguish the occurrences of the attribute by using the relation name followed by “.” as a prefix, e.g.

Deposit.acc# Deposit.clientn Client.clientn
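The two ways of accessing a value, t[n] and t[A], can be sketched in Python as follows. The helper `value` and the tuple `DEPOSIT_ATTRS` are hypothetical; the sketch assumes a tuple is stored positionally and the schema supplies the column order, with columns numbered from 1 as in the text.

```python
# Attribute names of Deposit, in column order
DEPOSIT_ATTRS = ("branch", "acc#", "clientn", "balance")


def value(t, attrs, a):
    """Return t[a], where a is a 1-based column number n or an attribute name A."""
    if isinstance(a, int):
        return t[a - 1]  # columns are numbered from 1 in the text
    return t[attrs.index(a)]  # position of attribute A in the schema


t = ("Carleton", 101, "Jim", 500)  # first tuple of Deposit
print(value(t, DEPOSIT_ATTRS, 2), value(t, DEPOSIT_ATTRS, "acc#"))  # 101 101
```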


Queries

For the bank instance of Deposit and Client given above, give me the addresses, with balances, of the clients who have a balance higher than 600

Answer:
  40 Stone | 700
  17 Clyde | 1300

The answer is a set of tuples, a new relation (extension)

We can say that a query is a mapping that sends DB instances to new DB instances (possibly with a different schema)
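As an illustration only, the query can be computed over Python sets that hold the bank instance, with tuples in the column order of the schema. The set comprehension below matches Deposit and Client rows on the client name, keeps balances above 600, and returns (cladd, balance) pairs; the result is again a set of tuples, i.e. a new relation.

```python
deposit = {
    ("Carleton", 101, "Jim", 500),
    ("Downtown", 215, "Sandy", 700),
    ("Barrhaven", 304, "Alvin", 1300),
}
client = {
    ("Jim", "101 Queensbury", "Barrhaven"),
    ("Sandy", "40 Stone", "Nepean"),
    ("Hernandez", "15 Laurier", "Downtown"),
    ("Alvin", "17 Clyde", "Altavista"),
    ("John", "89 Case", "Centrepoint"),
}

# Pair rows with the same client name, keep balance > 600, output (address, balance)
answer = {
    (cladd, balance)
    for (_branch, _acc, dname, balance) in deposit
    for (cname, cladd, _neigh) in client
    if dname == cname and balance > 600
}
print(sorted(answer))  # [('17 Clyde', 1300), ('40 Stone', 700)]
```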


Several issues:

How to specify a query?

How to write it?

In what language?

What is the precise meaning of a query?

How to compute the answer?

There are several query languages for RDBs

Some are more used in practice than others

But those of a more theoretical nature are the basis for the ones most used in practice


The distinction between declarative and procedural query languages is always relevant

The former express what the user wants to obtain from the database; the latter express a particular way to compute the answer


Relational Algebra as a Query Language

Idea: Relations are sets (subsets of cartesian products) constructed on top of other sets (the domain or sub-domains)

Query answers are new relations

Thus, in order to obtain new relations (e.g. query answers), do set-theoretic algebra on existing relations

Operate on sets and relations in order to obtain new sets or relations


The Relational Algebra (RA)

Provides algebraic operations over relations that produce new relations

Operations based on set-theoretic operations

Some of those operations come directly from set theory

Others are specific, ad hoc, for the RA

The latter are applicable to relations (as opposed to sets in general)

Provides a procedural query language for RDBs (because it is based on explicit operations)

The RA is one of the strengths of the relational model

RA can be used to give a precise, set-theoretic semantics to other query languages


Queries in RA:

It is possible to answer a query by applying a sequence of algebraic (relational) operations, starting from the original database instance

Even if the RDBMS offers a different query language, e.g. a declarative one, a query will be compiled into a sequence of algebraic operations on the DB


Summary of basic operations of RA:

Union and Intersection: R1 ∪ R2, R1 ∩ R2

Can be applied to similar relations, i.e. with the same arity (and data types), as normal sets

Difference: R1 ∖ R2

Again, for similar relations, as normal sets

Product: R1 × R2

This is essentially the cartesian product of two relations taken as normal sets (each pair of tuples is concatenated into a single, longer tuple)

E.g. for R = {(a, b), (c, d)} and S = {(1, 2), (2, 3)}:
R × S = {(a, b, 1, 2), (a, b, 2, 3), (c, d, 1, 2), (c, d, 2, 3)}
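A minimal sketch of the product in Python, assuming relations are stored as sets of tuples; `cartesian` is a hypothetical helper name. Tuple concatenation (`t + u`) produces the flattened 4-tuples of the example rather than pairs of pairs.

```python
from itertools import product


def cartesian(r, s):
    """R × S: concatenate every tuple of R with every tuple of S."""
    return {t + u for t, u in product(r, s)}


R = {("a", "b"), ("c", "d")}
S = {(1, 2), (2, 3)}
print(sorted(cartesian(R, S)))
# [('a', 'b', 1, 2), ('a', 'b', 2, 3), ('c', 'd', 1, 2), ('c', 'd', 2, 3)]
```

The result always has len(R) × len(S) tuples, which is why the product becomes expensive quickly.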


[Figure: Venn diagrams over the domain D illustrating R1 ∪ R2, R1 ∩ R2 and R1 ∖ R2]


Projection: Π_{A}(R(..., A, ...)), i.e. the projection of relation R on attribute A

[Figure: a relation R between A and B and its projection Π_{A}(R) onto A]

Here, A is one of the attributes of R

The projection could be on several attributes of R

This is a unary operation: it takes one relation as input (the previous ones are binary)

This is an operation specific to relations

It deletes, ignores, projects out entire “columns” from a relation

Projects R over one (or several) “coordinates” (attributes)


It generates a new relation, with a subset of the attributes (columns)

Its logical counterpart is existential quantification

For the relation in the figure:

Π_{A}(R(A, B)) = {a ∈ A | there exists b ∈ B such that (a, b) ∈ R}


Selection: σ_{<condition>}(R)

Unary operation, specific to relations

Selects the tuples of the relation R that satisfy the condition

The condition can be expressed in a (limited) logical language

It generates a new relation, with the same attributes, but possibly fewer tuples (rows)


Join: R1 ⋈ R2

A binary operator, essential in RA

It composes two relations through the common values taken by a distinguished attribute shared by the two relations (or by two different attributes with the same data type or domain)

Similar to the composition of two relations, R ◦ S, as seen in set theory

It is essential for combining tables in a natural way, without resorting to their possibly large and computationally expensive product

There are generalizations of this basic, natural join


Notice: There is no (set-theoretic) complement operation in RA (as found in set theory)

[Figure: a relation R inside the domain D; its complement R^c = ???]

In principle, there could be, given that relations are sets, but

What is the “meaning” of the complement of a relation?

Actually, it could be infinite, because the DB domain is possibly infinite

The difference (∖) is only a relative complement, relative to a given relation


Deposit
  branch | acc# | clientn | balance
  Carleton | 101 | Jim | 500
  Downtown | 215 | Sandy | 700
  Barrhaven | 304 | Alvin | 1300

Which could be the tuples that are not in Deposit?

Which of those make sense?


Examples:

Union: Two relations with the same schema

WINE1
  W# | GRAPE | VINTAGE | PERCENTAGE
  100 | Volnay | 1978 | 12.5
  110 | Chablis | 1979 | 12.0
  120 | Sancerre | 1980 | 12.5
  130 | Tokay | 1980 | 12.5

WINE2
  W# | GRAPE | VINTAGE | PERCENTAGE
  130 | Tokay | 1980 | 12.5
  140 | Chenas | 1981 | 12.7
  150 | Volnay | 1978 | 12.5


WINE3
  W# | GRAPE | VINTAGE | PERCENTAGE
  100 | Volnay | 1978 | 12.5
  110 | Chablis | 1979 | 12.0
  120 | Sancerre | 1980 | 12.5
  130 | Tokay | 1980 | 12.5
  140 | Chenas | 1981 | 12.7
  150 | Volnay | 1978 | 12.5

Similarly, there is the intersection ∩ of the two relations:

WINE4
  W# | GRAPE | VINTAGE | PERCENTAGE
  130 | Tokay | 1980 | 12.5


Difference: Two relations with the same schema

WINE1
  W# | GRAPE | VINTAGE | PERCENTAGE
  100 | Volnay | 1978 | 12.5
  110 | Chablis | 1979 | 12.0
  120 | Sancerre | 1980 | 12.5
  130 | Tokay | 1980 | 12.5

WINE2
  W# | GRAPE | VINTAGE | PERCENTAGE
  130 | Tokay | 1980 | 12.5
  140 | Chenas | 1981 | 12.7
  150 | Volnay | 1978 | 12.5


WINE4
  W# | GRAPE | VINTAGE | PERCENTAGE
  100 | Volnay | 1978 | 12.5
  110 | Chablis | 1979 | 12.0
  120 | Sancerre | 1980 | 12.5

It should be clear from these examples that the complement of a table does not make much sense ...
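Since relations are sets, Python's built-in set operators give a quick way to reproduce the three examples; this is an illustrative sketch that stores WINE1 and WINE2 as sets of (W#, GRAPE, VINTAGE, PERCENTAGE) tuples.

```python
WINE1 = {
    (100, "Volnay", 1978, 12.5),
    (110, "Chablis", 1979, 12.0),
    (120, "Sancerre", 1980, 12.5),
    (130, "Tokay", 1980, 12.5),
}
WINE2 = {
    (130, "Tokay", 1980, 12.5),
    (140, "Chenas", 1981, 12.7),
    (150, "Volnay", 1978, 12.5),
}

union = WINE1 | WINE2         # WINE1 ∪ WINE2: 6 tuples, the shared Tokay row appears once
intersection = WINE1 & WINE2  # WINE1 ∩ WINE2: only the Tokay row
difference = WINE1 - WINE2    # WINE1 ∖ WINE2: rows 100, 110, 120

print(len(union), sorted(t[0] for t in intersection), sorted(t[0] for t in difference))
# 6 [130] [100, 110, 120]
```

Note that no operator here needs anything beyond the two given tables; a complement would need the (possibly infinite) set of all conceivable wine tuples.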


Product: Two relations, not necessarily with same schema

GRAPE
  GRAPE | AREA | COUNTRY
  Chenas | Beaujolais | France
  Volnay | Bourgogne | France
  Chanturgues | Auvergne | France

×

YEAR
  VINTAGE | QUALITY
  1979 | Good
  1980 | Average


G/Y
  GRAPE | AREA | COUNTRY | VINTAGE | QUALITY
  Chenas | Beaujolais | France | 1979 | Good
  Chenas | Beaujolais | France | 1980 | Average
  Volnay | Bourgogne | France | 1979 | Good
  Volnay | Bourgogne | France | 1980 | Average
  Chanturgues | Auvergne | France | 1979 | Good
  Chanturgues | Auvergne | France | 1980 | Average

A huge table in general (its size is the product of the sizes of the inputs, here 3 × 2 = 6 rows); maybe many of the combinations do not make much sense

The product is an expensive operation we may want to avoid, or apply only after we have reached smaller tables using other operations ...

Usually it makes more sense, from the application point of view, to combine tables via a join


Join:

First the natural Join (there are more general ones)

Essential binary operator of RA

Relations are composed via the common values taken by attributes in common (or of a similar data type)


WINE
  W# | GRAPE | VINTAGE | QUALITY
  100 | Chenas | 1977 | Good
  200 | Chenas | 1980 | Excellent
  300 | Chablis | 1977 | Good
  400 | Chablis | 1978 | Bad
  500 | Volnay | 1980 | Average

LOCATION
  GRAPE | AREA | AVG-QUALITY
  Chenas | Beaujolais | Good
  Chablis | Bourgogne | Average
  Chablis | California | Bad


W/L
  W# | GRAPE | VINTAGE | QUALITY | AREA | AVG-QUALITY
  100 | Chenas | 1977 | Good | Beaujolais | Good
  200 | Chenas | 1980 | Excellent | Beaujolais | Good
  300 | Chablis | 1977 | Good | Bourgogne | Average
  300 | Chablis | 1977 | Good | California | Bad
  400 | Chablis | 1978 | Bad | Bourgogne | Average
  400 | Chablis | 1978 | Bad | California | Bad

(Wine 500, a Volnay, has no matching GRAPE in LOCATION, so it does not appear in the result)

This is a common but expensive operation in RDBMSs

In general, one applies it once tables have been reduced using other operations

The intersection, difference, selection and projection all reduce relations

In (syntactic) query optimization, sequences of operations are rearranged to make the whole evaluation less expensive
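The natural join above can be sketched as a nested loop in Python. This is an illustration under assumptions: `natural_join` is a hypothetical helper, relations are sets of tuples with a separate attribute list, and the join is taken on all attribute names the two schemas share (here only GRAPE). Real DBMSs typically use smarter algorithms (e.g. hash or sort-merge joins), but the result is the same set.

```python
def natural_join(r, r_attrs, s, s_attrs):
    """R ⋈ S on all shared attribute names. Returns (tuples, attributes): R's attributes
    followed by the attributes of S that are not shared."""
    common = [a for a in r_attrs if a in s_attrs]
    s_rest = [a for a in s_attrs if a not in common]
    result = set()
    for t in r:
        for u in s:
            # combine t and u only if they agree on every shared attribute
            if all(t[r_attrs.index(a)] == u[s_attrs.index(a)] for a in common):
                result.add(t + tuple(u[s_attrs.index(a)] for a in s_rest))
    return result, list(r_attrs) + s_rest


WINE_ATTRS = ("W#", "GRAPE", "VINTAGE", "QUALITY")
WINE = {
    (100, "Chenas", 1977, "Good"),
    (200, "Chenas", 1980, "Excellent"),
    (300, "Chablis", 1977, "Good"),
    (400, "Chablis", 1978, "Bad"),
    (500, "Volnay", 1980, "Average"),
}
LOCATION_ATTRS = ("GRAPE", "AREA", "AVG-QUALITY")
LOCATION = {
    ("Chenas", "Beaujolais", "Good"),
    ("Chablis", "Bourgogne", "Average"),
    ("Chablis", "California", "Bad"),
}

WL, WL_ATTRS = natural_join(WINE, WINE_ATTRS, LOCATION, LOCATION_ATTRS)
print(len(WL), WL_ATTRS)
# 6 ['W#', 'GRAPE', 'VINTAGE', 'QUALITY', 'AREA', 'AVG-QUALITY']
```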


Join operations can be more general than this basic one; actually, joins can be performed considering more complex “join conditions”

Here, in formal terms, the simple condition can be specified with the join:

WINE ⋈_{WINE.GRAPE=LOCATION.GRAPE} LOCATION

The join is performed under the condition that the values in the GRAPE attribute in the two tables coincide


Projection:

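WINE
  W# | GRAPE | VINTAGE | PERCENTAGE | QUALITY
  100 | Volnay | 1979 | 12.7 | Good
  110 | Chablis | 1980 | 11.8 | Average
  120 | Tokay | 1981 | 12.1 | Excellent
  130 | Chenas | 1979 | 12.0 | Good
  140 | Volnay | 1980 | 11.9 | Average

Π_{VINTAGE,QUALITY}

YEAR
  VINTAGE | QUALITY
  1979 | Good
  1980 | Average
  1981 | Excellent

(As a set, the result contains each of the pairs (1979, Good) and (1980, Average) only once, although each comes from two WINE tuples)

A unary operator

Giving a new name (here YEAR) to the result is not part of the operation (but helps)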


A tuple t is in the result (the projection) iff there is a tuple t′ in the original relation that, restricted to the attributes indicated in Π, gives t:

t′[VINTAGE, QUALITY] = t

In other words, (1979, Good) ∈ Π_{VINTAGE,QUALITY}(WINE)

because there exist values, say x, y, z, for the attributes W#, GRAPE, PERCENTAGE (we do not care which ones) such that the tuple (x, y, 1979, z, Good) belongs to relation WINE
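The projection can be sketched in Python as a set comprehension over the WINE table; `project` is a hypothetical helper that takes the attribute list of the relation and the attributes to keep. Because the result is built as a set, tuples that coincide on VINTAGE and QUALITY collapse into one, exactly as the existential reading above suggests.

```python
WINE_ATTRS = ("W#", "GRAPE", "VINTAGE", "PERCENTAGE", "QUALITY")
WINE = {
    (100, "Volnay", 1979, 12.7, "Good"),
    (110, "Chablis", 1980, 11.8, "Average"),
    (120, "Tokay", 1981, 12.1, "Excellent"),
    (130, "Chenas", 1979, 12.0, "Good"),
    (140, "Volnay", 1980, 11.9, "Average"),
}


def project(r, attrs, keep):
    """Π_keep(R): restrict every tuple to the listed attributes (result is a set)."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(t[i] for i in idx) for t in r}


YEAR = project(WINE, WINE_ATTRS, ["VINTAGE", "QUALITY"])
print(sorted(YEAR))  # [(1979, 'Good'), (1980, 'Average'), (1981, 'Excellent')]
```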


Selection:

WINE
  W# | GRAPE | VINTAGE | PERCENTAGE | QUALITY
  100 | Volnay | 1979 | 12.7 | Good
  110 | Chablis | 1980 | 11.8 | Average
  120 | Tokay | 1981 | 12.1 | Excellent
  130 | Chenas | 1979 | 12.0 | Good
  140 | Volnay | 1980 | 11.9 | Average

σ_{QUALITY=Good}

GOOD-WINE
  W# | GRAPE | VINTAGE | PERCENTAGE | QUALITY
  100 | Volnay | 1979 | 12.7 | Good
  130 | Chenas | 1979 | 12.0 | Good

Here the condition is very simple


It is possible to express more complex selection conditions using a more expressive language that may use

Attribute names

Logical, boolean (propositional) operations (AND, OR, NOT)

Built-in relations (=, <, ≤, >, ≥, ≠) applied to attribute names and domain elements

Built-in relations have a fixed semantics, and fixed and possibly infinite extensions

As opposed to relations in the schema, which have variable extensions depending on the application and the state of the DB

E.g. the < built-in relation on the data type integer has an infinite, fixed extension that the DBMS can simply use


<
  Smaller | Bigger
  0 | 1
  0 | 2
  ... | ...
  1000 | 1500
  ... | ...

≠
  String | String
  john | peter
  peter | mary
  ... | ...
  mary | john
  ... | ...

So a selection could be

σ_{VINTAGE>1980 OR QUALITY=Good}(WINE)

This boolean language for expressing conditions can also be used to extend the join operation with conditions, ⋈_{<condition>}, just as we can express selections with complex conditions, σ_{<condition>}, e.g.

WINE ⋈_{W.GRAPE=L.GRAPE AND QUALITY=AVG-QUALITY} LOCATION

(here W and L abbreviate WINE and LOCATION)
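A sketch of a selection with a compound condition and of a join with a condition (a “theta-join”), with conditions written as Python predicates. The helpers `select` and `theta_join` are hypothetical, and for both expressions the sketch assumes the WINE table of the join example (W#, GRAPE, VINTAGE, QUALITY). Unlike the natural join, this version keeps both GRAPE columns in its output.

```python
# WINE(W#, GRAPE, VINTAGE, QUALITY) and LOCATION(GRAPE, AREA, AVG-QUALITY), as in the join example
WINE = {
    (100, "Chenas", 1977, "Good"),
    (200, "Chenas", 1980, "Excellent"),
    (300, "Chablis", 1977, "Good"),
    (400, "Chablis", 1978, "Bad"),
    (500, "Volnay", 1980, "Average"),
}
LOCATION = {
    ("Chenas", "Beaujolais", "Good"),
    ("Chablis", "Bourgogne", "Average"),
    ("Chablis", "California", "Bad"),
}


def select(r, cond):
    """σ_cond(R): the tuples of R that satisfy the boolean condition cond."""
    return {t for t in r if cond(t)}


def theta_join(r, s, cond):
    """R ⋈_cond S: the concatenated tuples of R × S that satisfy cond."""
    return {t + u for t in r for u in s if cond(t, u)}


# σ_{VINTAGE>1980 OR QUALITY=Good}(WINE): columns 2 and 3 are VINTAGE and QUALITY
good_or_recent = select(WINE, lambda w: w[2] > 1980 or w[3] == "Good")

# WINE ⋈_{W.GRAPE=L.GRAPE AND QUALITY=AVG-QUALITY} LOCATION
matched = theta_join(WINE, LOCATION, lambda w, l: w[1] == l[0] and w[3] == l[2])

print(sorted(t[0] for t in good_or_recent), sorted(t[0] for t in matched))  # [100, 300] [100, 400]
```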


Queries Expressed in RA

A query can be expressed as a sequence of operations of RA applied to the original tables and/or intermediate results

Example: Consider the schemas

DRINKER(DRINKER#, SURNAME, FNAME, TYPE)

DRINKS(DRINKER#, WINE#, DATE, QUANTITY)

WINE(WINE#, GRAPE, VINTAGE, PERCENTAGE)


Query 1: Obtain the percentages of alcohol in the wines of grape Morgon, vintage 1979

Answer 1:

R1 := σ_{GRAPE=Morgon}(WINE)
R2 := σ_{VINTAGE=1979}(WINE)
R3 := R1 ∩ R2
ANS := Π_{PERCENTAGE}(R3)

Answer 2 (same values):

ANS := Π_{PERCENTAGE}(σ_{GRAPE=Morgon AND VINTAGE=1979}(WINE))

Notice the correspondence between the set-theoretic and logical operations (here, ∩ and AND) ...
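The two answers to Query 1 can be compared on a concrete instance. The WINE rows below are invented purely for illustration, and `select`/`project` are hypothetical helpers (projection here takes column positions). On this instance, as on any other, intersecting two selections gives the same tuples as one selection with an AND condition.

```python
# Hypothetical sample instance of WINE(WINE#, GRAPE, VINTAGE, PERCENTAGE)
WINE = {
    (1, "Morgon", 1979, 12.5),
    (2, "Morgon", 1980, 12.0),
    (3, "Chenas", 1979, 12.7),
    (4, "Morgon", 1979, 13.0),
}


def select(r, cond):
    return {t for t in r if cond(t)}


def project(r, idx):
    return {tuple(t[i] for i in idx) for t in r}


# Answer 1: two selections, an intersection, then a projection on PERCENTAGE (column 3)
r1 = select(WINE, lambda t: t[1] == "Morgon")
r2 = select(WINE, lambda t: t[2] == 1979)
ans1 = project(r1 & r2, [3])

# Answer 2: a single selection with a conjunctive condition
ans2 = project(select(WINE, lambda t: t[1] == "Morgon" and t[2] == 1979), [3])

print(ans1 == ans2, sorted(ans1))  # True [(12.5,), (13.0,)]
```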


Query 2: Obtain last and first names of drinkers of Morgon or Chenas

Now we need to combine the three original tables

R1 := σ_{GRAPE=Morgon}(WINE)
R2 := σ_{GRAPE=Chenas}(WINE)
R3 := R1 ∪ R2
R4 := R3 ⋈_{WINE#} DRINKS   (R3 is smaller than WINE)
R5 := R4 ⋈_{DRINKER#} DRINKER
ANS := Π_{SURNAME,FNAME}(R5)

Notice that we applied the selections before the join, so the join operates on a smaller relation

The other way around (joining first and selecting afterwards) would have been semantically the same, i.e. it produces the same answer
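To see that selecting first and joining first give the same answer, here is a sketch on a small instance invented for illustration. The helpers are hypothetical; `join_on` is an equi-join on one column of each side that keeps all columns of both (so, unlike the natural join, the shared column appears twice), which is why the column positions in the comments shift as tables are combined.

```python
# Hypothetical sample instances (invented for illustration)
DRINKER = {(1, "Smith", "Ann", "regular"), (2, "Lee", "Bo", "casual"), (3, "Diaz", "Cy", "regular")}
DRINKS = {(1, 10, "2021-01-05", 2), (2, 11, "2021-01-06", 12), (3, 12, "2021-01-07", 1)}
WINE = {(10, "Morgon", 1979, 12.5), (11, "Chablis", 1976, 12.0), (12, "Chenas", 1980, 12.7)}


def select(r, cond):
    return {t for t in r if cond(t)}


def join_on(r, i, s, j):
    """Equi-join: column i of r = column j of s (keeps all columns of both)."""
    return {t + u for t in r for u in s if t[i] == u[j]}


def project(r, idx):
    return {tuple(t[k] for k in idx) for t in r}


# Select first, then join (the order used in the text)
r3 = select(WINE, lambda w: w[1] == "Morgon") | select(WINE, lambda w: w[1] == "Chenas")
r4 = join_on(r3, 0, DRINKS, 1)   # WINE.WINE# = DRINKS.WINE#
r5 = join_on(r4, 4, DRINKER, 0)  # DRINKS.DRINKER# (column 4 of r4) = DRINKER.DRINKER#
ans = project(r5, [9, 10])       # SURNAME, FNAME

# Join first, select afterwards: same answer, larger intermediate results
big = join_on(join_on(WINE, 0, DRINKS, 1), 4, DRINKER, 0)
ans_late = project(select(big, lambda t: t[1] in ("Morgon", "Chenas")), [9, 10])

print(sorted(ans), ans == ans_late)  # [('Diaz', 'Cy'), ('Smith', 'Ann')] True
```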


Query 3: Obtain last and first names of drinkers who have tried, in one day, more than 10 samples of Chablis, vintage 1976, together with the percentage of alcohol of the wine

R1 := σ_{QUANTITY>10}(DRINKS)
R2 := σ_{GRAPE=Chablis}(WINE)
R3 := σ_{VINTAGE=1976}(WINE)
R4 := R2 ∩ R3
R5 := R1 ⋈_{WINE#} R4
R6 := Π_{DRINKER#,PERCENTAGE}(R5)
R7 := R6 ⋈_{DRINKER#} DRINKER
ANS := Π_{SURNAME,FNAME,PERCENTAGE}(R7)
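The steps of Query 3 can be followed one by one on a small instance, invented here for illustration only. As in the previous sketches, the helpers are hypothetical and `join_on` keeps all columns of both sides; the comments give the column positions of each intermediate relation.

```python
# Hypothetical sample instances (invented for illustration)
DRINKER = {(1, "Smith", "Ann", "regular"), (2, "Lee", "Bo", "casual")}
DRINKS = {(1, 11, "2021-01-06", 12), (2, 11, "2021-01-06", 3), (1, 10, "2021-01-07", 15)}
WINE = {(10, "Morgon", 1979, 12.5), (11, "Chablis", 1976, 12.0), (12, "Chablis", 1978, 11.5)}


def select(r, cond):
    return {t for t in r if cond(t)}


def join_on(r, i, s, j):
    """Equi-join: column i of r = column j of s (keeps all columns of both)."""
    return {t + u for t in r for u in s if t[i] == u[j]}


def project(r, idx):
    return {tuple(t[k] for k in idx) for t in r}


r1 = select(DRINKS, lambda d: d[3] > 10)        # QUANTITY > 10
r2 = select(WINE, lambda w: w[1] == "Chablis")
r3 = select(WINE, lambda w: w[2] == 1976)
r4 = r2 & r3
r5 = join_on(r1, 1, r4, 0)                      # DRINKS.WINE# = WINE.WINE#
r6 = project(r5, [0, 7])                        # DRINKER#, PERCENTAGE
r7 = join_on(r6, 0, DRINKER, 0)                 # on DRINKER#
ans = project(r7, [3, 4, 1])                    # SURNAME, FNAME, PERCENTAGE
print(ans)  # {('Smith', 'Ann', 12.0)}
```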


Warning!:

RA is based on set-theoretic operations, i.e. operations that take and produce sets

In consequence, “duplicates” (multiple occurrences of the same tuple) do not appear anywhere

It is possible to extend these operations to “multisets” that may have duplicates (we do not do this for the moment, though)
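For comparison only, here is the projection of the WINE table on (VINTAGE, QUALITY) computed under both semantics, using a Python set and a `collections.Counter` as a stand-in for a multiset. SQL, for instance, keeps duplicates by default and removes them only with SELECT DISTINCT.

```python
from collections import Counter

# WINE(W#, GRAPE, VINTAGE, PERCENTAGE, QUALITY), as in the projection example
rows = [
    (100, "Volnay", 1979, 12.7, "Good"),
    (110, "Chablis", 1980, 11.8, "Average"),
    (120, "Tokay", 1981, 12.1, "Excellent"),
    (130, "Chenas", 1979, 12.0, "Good"),
    (140, "Volnay", 1980, 11.9, "Average"),
]

as_set = {(t[2], t[4]) for t in rows}         # set semantics (RA): duplicates vanish
as_bag = Counter((t[2], t[4]) for t in rows)  # multiset semantics: multiplicities kept

print(len(as_set), sum(as_bag.values()), as_bag[(1979, "Good")])  # 3 5 2
```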

Exercise: Illustrate the computations of queries 1-3 using concrete initial instances, producing all the intermediate relations that lead to the final answer


Exercise: Assume we have the following schema

Frequents(Drinker, Bar)

Serves(Bar, Beer)

Likes(Drinker, Beer)

Express in RA the following queries:

1. Which bars serve the beer John likes?

2. Which drinkers frequent at least one bar that serves some beer they like?

3. Which drinkers frequent only bars that serve at least one beer they like?

4. Which drinkers do not frequent any bar that serves some beer they like?


Remarks, Extensions, Limitations

RA provides a procedural language for querying RDBs

We presented the most common relational operations, but there are others (cf. the textbook)

There may be many different RA expressions (formulas) that can be used to compute the same query answer

Which one to use depends on efficiency issues

Queries (their computations) can be optimized by rearranging them into semantically equivalent RA formulas (i.e. with the same meaning)

Space is always an issue: DBs can be very large, while computations take place in (limited) main memory


RDBMSs have built-in query optimizers that take care of optimizing the query

The notion of “semantic equivalence” of queries, in particular of relational expressions, is well-defined and precise: two RA queries are equivalent if for every instance they produce the same result (i.e. the same answer relation)

Just like when we say that the (numerical) algebraic expressions x + y + 0 and x(y + 1) − xy + y − 1 + 1 are equivalent: for any values of x, y the result is the same (both simplify to x + y)

A strength of RA: the semantics of the language is clear, precise,formal and well-studied

It is grounded on set theory and predicate logic


There is a purely “logical counterpart” to the RA

The relational calculus is a declarative query language that is based directly on predicate logic (we saw informal examples before)

Relational algebra and relational calculus are equivalent in terms of the queries they can express (more on this later)

It is possible to define new operations for the RA using algebraic expressions that use the already defined operations (and nothing more)


Example: We could define the “symmetric difference” of two similar relations

R1 ∆ R2 := (R1 ∖ R2) ∪ (R2 ∖ R1)

[Figure: Venn diagram over the domain D showing R1 ∆ R2, the parts of R1 and R2 outside their intersection]

Notice that the new operation (∆) on the LHS is being defined by means of a fixed algebraic formula that uses already defined operations (∖, ∪)

A single formula or definition that can be applied to any instance, i.e. the definition is independent of the instance to which it is applied


Two relations with the same attributes

WINE1
  W# | GRAPE | VINTAGE | PERCENTAGE
  100 | Volnay | 1978 | 12.5
  110 | Chablis | 1979 | 12.0
  120 | Sancerre | 1980 | 12.5
  130 | Tokay | 1980 | 12.5

WINE2
  W# | GRAPE | VINTAGE | PERCENTAGE
  130 | Tokay | 1980 | 12.5
  140 | Chenas | 1981 | 12.7

WINE5
  W# | GRAPE | VINTAGE | PERCENTAGE
  100 | Volnay | 1978 | 12.5
  110 | Chablis | 1979 | 12.0
  120 | Sancerre | 1980 | 12.5
  140 | Chenas | 1981 | 12.7
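The symmetric difference can be written exactly as its definition, using only set difference and union; `sym_diff` is a hypothetical helper name. Applied to WINE1 and WINE2 it yields WINE5, and Python's built-in `^` operator, which computes the same set operation, serves as a cross-check.

```python
WINE1 = {
    (100, "Volnay", 1978, 12.5),
    (110, "Chablis", 1979, 12.0),
    (120, "Sancerre", 1980, 12.5),
    (130, "Tokay", 1980, 12.5),
}
WINE2 = {
    (130, "Tokay", 1980, 12.5),
    (140, "Chenas", 1981, 12.7),
}


def sym_diff(r1, r2):
    """R1 ∆ R2 := (R1 ∖ R2) ∪ (R2 ∖ R1), built only from difference and union."""
    return (r1 - r2) | (r2 - r1)


WINE5 = sym_diff(WINE1, WINE2)
print(sorted(t[0] for t in WINE5))  # [100, 110, 120, 140]
print(WINE5 == WINE1 ^ WINE2)       # True
```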


Exercise: Invent and define new operations for the RA

Actually, we haven’t been very economical when we listed the basic RA operations

Some of them could have been defined in terms of the others, so they are theoretically redundant (but not necessarily practically redundant)

Exercise: Define the join in terms of the product, the selection and the projection


The Transitive Closure

Example:

Paternity
  Father | Son
  Eric | Luis
  Eric | Juan
  Juan | Carlos
  Juan | Sergio
  Luis | Tomas
  Tomas | Pedro

We want to define and compute a new relation Ancestry that contains all (and only) the tuples that can be obtained by transitive paternity, i.e.

Ancestry
  Ancestor | Descendant
  Eric | Luis
  Eric | Juan
  Juan | Carlos
  Juan | Sergio
  Luis | Tomas
  Tomas | Pedro
  Eric | Tomas
  Eric | Carlos
  Eric | Sergio
  Luis | Pedro
  Eric | Pedro


Ancestry is the transitive closure of Paternity, i.e. the smallest transitive relation that includes Paternity

The transitive closure of a relation is something we use, compute, and need all the time

Computation? An iterative procedure computes it:

Ancestry
  Ancestor | Descendant
  step 0 (the tuples of Paternity):
  Eric | Luis
  Eric | Juan
  Juan | Carlos
  Juan | Sergio
  Luis | Tomas
  Tomas | Pedro
  step 1 (chains of two generations):
  Eric | Tomas
  Eric | Carlos
  Eric | Sergio
  Luis | Pedro
  step 2 (chains of three generations):
  Eric | Pedro

The length of the iteration depends on the initial instance; it is not bounded a priori
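The iterative procedure can be sketched in Python as a naive fixpoint loop; `transitive_closure` is a hypothetical helper. Each pass is a fixed relational computation (compose the current closure with the original relation, keep only the new pairs, add them), but the number of passes depends on the data, which is exactly what a single fixed RA formula cannot capture.

```python
PATERNITY = {
    ("Eric", "Luis"), ("Eric", "Juan"), ("Juan", "Carlos"),
    ("Juan", "Sergio"), ("Luis", "Tomas"), ("Tomas", "Pedro"),
}


def transitive_closure(r):
    """Iterate until a fixpoint: add (x, z) whenever (x, y) is already in the closure
    and (y, z) is in r. Returns the closure and the number of steps that added tuples."""
    closure = set(r)  # step 0: the tuples of r itself
    steps = 0
    while True:
        new = {(x, z) for (x, y) in closure for (y2, z) in r if y == y2} - closure
        if not new:  # nothing new: the closure is transitive
            return closure, steps
        closure |= new
        steps += 1


ancestry, steps = transitive_closure(PATERNITY)
print(len(ancestry), steps)  # 11 2
```

On a chain of n parent-child pairs the loop needs n − 1 productive steps, so no bound on the number of iterations can be fixed in advance.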


Can we define the transitive closure using a general and fixed formula of RA?

TC(R) := ···, with a formula on the right-hand side written in terms of the RA operations we saw (and nothing else)?

In particular, not depending on the instance at hand ...


It can be mathematically proved that it is not possible to define the TC of a relation by means of a fixed and general formula of RA

The TC is not part of the RA, and cannot be defined by means of a fixed formula that uses (the other) relational operators

The theorem is easier to state and prove in the “logical counterpart” of the RA, i.e. in the relational calculus

This is a result about the (limited) expressive power of a particular query language: there are things that cannot be expressed in it


In order to compute the TC from an RDB, an iterative procedure can be programmed in interaction with the DB

Ideally, a query language provided by an RDBMS should offer the possibility to define and express the TC

The SQL:1999 standard supports this, through recursive queries (WITH RECURSIVE)

(We will come back to this in the context of (extended) logical query languages ...)
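As an illustration of such a recursive query, the sketch below uses SQLite (through Python's standard `sqlite3` module), which supports recursive common table expressions; other RDBMSs may differ in details. The table and column names follow the Paternity example. Using UNION rather than UNION ALL discards rows already produced, so the recursion stops once no new pairs appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Paternity (Father TEXT, Son TEXT)")
conn.executemany(
    "INSERT INTO Paternity VALUES (?, ?)",
    [("Eric", "Luis"), ("Eric", "Juan"), ("Juan", "Carlos"),
     ("Juan", "Sergio"), ("Luis", "Tomas"), ("Tomas", "Pedro")],
)

# Start from Paternity, then repeatedly extend known pairs by one generation
rows = conn.execute(
    """
    WITH RECURSIVE Ancestry(Ancestor, Descendant) AS (
        SELECT Father, Son FROM Paternity
        UNION
        SELECT a.Ancestor, p.Son
        FROM Ancestry AS a JOIN Paternity AS p ON a.Descendant = p.Father
    )
    SELECT Ancestor, Descendant FROM Ancestry
    """
).fetchall()
print(len(rows))  # 11
```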


Integrity Constraints

So far we have no way to capture requirements or conditions that our DB model (and DB) should satisfy in order to:

Be an accurate model of the outside reality being modeled (cf. instance D2 above, with two ages for mary ...)

Impose or contain more meaning, more semantics, with respect to the modeled domain

Stay in correspondence with the modeled domain

Make sure that, when the data changes, the meaning and correspondence are kept


[Figure: the outside reality, the DB that models it, and the ICs that keep the two in correspondence]

ICs are statements (sentences, propositions, ...) that have to be satisfied in every stable, valid, legal state of the database

It is not difficult to express semantic or integrity constraints (ICs) in languages of predicate logic

They can be expressed as formal, symbolic sentences in those languages


People
  Name | Age | Degree
  mary | 35 | law
  mary | 25 | medicine
  peter | 40 | CS
  peter | 40 | math

∀x∀y∀z∀u∀v(People(x, y, z) ∧ People(x, u, v) → y = u)

(“every name has at most one age”)

The instance above is not admissible if the IC is to be satisfied, because it does not satisfy the IC: mary has two different ages

The database instance is inconsistent, in the sense that it does not satisfy (does not make true) the ICs


In principle, ICs expressed in such languages could be processed by a DBMS

And the DBMS could make sure that the actual DB instance does satisfy them


How?

Rejecting changes (updates) that violate them

Compensating the updates issued by users or application programs with additional, internal, automatic updates

Notifying the user or applications about violations of ICs before committing changes

...

However, for the sake of efficiency, those mechanisms are not always implemented or offered by DBMSs

In general, commercial DBMSs provide automated, built-in support only for some restricted, limited classes of ICs


Thus, in some cases the user has to find alternatives

Maintaining the IC satisfied (DB maintenance) through external application programs that interact with the DB

The external program could issue a query to the database in order to detect whether there is a violation of the IC

Depending on the answer returned by the DBMS, the application program has alternatives on how to proceed

Which query could detect whether there is a violation of the IC on People above (every name has at most one age)?


An external program that interacts with the DBMS can create a view that captures the violations of the IC

For example, the following “violation view”

V(x) : ∃y∃z∃v∃w(People(x, y, v) ∧ People(x, z, w) ∧ y ≠ z)

This view will contain all the names that have more than one age

The contents of the view are expected to be always empty

If not, there is a violation, and the application program can do something about it

In the example, mary would be caught by the view
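The violation view can be written in SQL as a self-join of People. The sketch below runs it on SQLite via Python's standard `sqlite3` module; the view name V follows the text, and the column types are assumptions. An application program would check whether the view is empty after each change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (Name TEXT, Age INTEGER, Degree TEXT)")
conn.executemany(
    "INSERT INTO People VALUES (?, ?, ?)",
    [("mary", 35, "law"), ("mary", 25, "medicine"), ("peter", 40, "CS"), ("peter", 40, "math")],
)

# V(x): names x that occur in two People rows with different ages
conn.execute(
    """
    CREATE VIEW V AS
    SELECT DISTINCT p1.Name AS Name
    FROM People AS p1 JOIN People AS p2
      ON p1.Name = p2.Name AND p1.Age <> p2.Age
    """
)

violations = [row[0] for row in conn.execute("SELECT Name FROM V")]
print(violations)  # ['mary']
```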


Defining triggers, to be stored in the DB

Triggers (aka active rules) react automatically when a violation occurs:

• notification messages

• rejection of updates

• additional compensating updates as programmed inthe trigger

• ...

They are of the form: Event & Condition ⇒ Action


For the IC on People above, the Event could be an update of table People, such as an insertion (deletions of tuples cannot violate this FD, so they are not relevant)

The Condition could be a check for the occurrence of some tuple in the violation view

The Action does something to restore consistency, e.g. delete the old conflicting tuple and accept the new one

Crossing fingers ...
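A sketch of such an Event-Condition-Action rule as an SQLite trigger, run through Python's standard `sqlite3` module. The trigger name is hypothetical, and the chosen action (keep the newest age, drop the older conflicting rows) is just the policy suggested above; whether it is the right repair is an application decision.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (Name TEXT, Age INTEGER, Degree TEXT)")
conn.executemany("INSERT INTO People VALUES (?, ?, ?)", [("mary", 35, "law"), ("peter", 40, "CS")])

# Event: insertion into People.
# Condition: the new row gives its name an age different from an existing one.
# Action: delete the old conflicting rows, keeping the new one.
conn.execute(
    """
    CREATE TRIGGER keep_latest_age AFTER INSERT ON People
    FOR EACH ROW
    WHEN EXISTS (SELECT 1 FROM People WHERE Name = NEW.Name AND Age <> NEW.Age)
    BEGIN
        DELETE FROM People WHERE Name = NEW.Name AND Age <> NEW.Age;
    END
    """
)

conn.execute("INSERT INTO People VALUES ('mary', 25, 'medicine')")
rows = sorted(conn.execute("SELECT Name, Age FROM People").fetchall())
print(rows)  # [('mary', 25), ('peter', 40)]
```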


Some Classes of ICs

1. Functional Dependencies (constraints): In some cases, a subset of the attributes functionally determines (or is expected to determine) another subset of the attributes

Example:

Students
Number    Name          Study       Sport
9901254   John Stanley  CS          Soccer
9910803   Sue Jones     Math        Skating
9910803   Sue Jones     Math        Soccer
9901254   John Stanley  Literature  Handball

“Every student number is associated with at most one student name”, or “Every two students that coincide in student number coincide in student name”

Number functionally determines Name; this is denoted Students : Number → Name (the arrow is not logical implication)
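The “every two tuples” reading translates directly into a check on a finite instance. A minimal sketch; the function name satisfies_fd is ours, not a standard API.

```python
from itertools import combinations

# Students instance from the example (attribute order: Number, Name, Study, Sport)
ATTRS = ("Number", "Name", "Study", "Sport")
students = [
    ("9901254", "John Stanley", "CS", "Soccer"),
    ("9910803", "Sue Jones", "Math", "Skating"),
    ("9910803", "Sue Jones", "Math", "Soccer"),
    ("9901254", "John Stanley", "Literature", "Handball"),
]

def satisfies_fd(rows, attrs, lhs, rhs):
    """R : lhs → rhs holds iff every two tuples that agree on lhs also agree on rhs."""
    pos = {a: i for i, a in enumerate(attrs)}

    def proj(t, xs):
        return tuple(t[pos[a]] for a in xs)

    return all(
        proj(t1, rhs) == proj(t2, rhs)
        for t1, t2 in combinations(rows, 2)
        if proj(t1, lhs) == proj(t2, lhs)
    )

print(satisfies_fd(students, ATTRS, ["Number"], ["Name"]))   # True
print(satisfies_fd(students, ATTRS, ["Number"], ["Study"]))  # False
```

Note that an instance can only witness a violation; whether Number → Name is an FD is decided at the schema level.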

2. Key Dependencies (constraints): A particular case of functional dependency, where a subset of the attributes functionally determines all the attributes in the relation

Students
Number    Name          Study  Sport
9901254   John Stanley  CS     Soccer
9910803   Sue Jones     Math   Skating

“Student number determines all the other attributes of the student”

Number is a key of the relation, i.e. Students : Number → {Number, Name, Study, Sport}

This is satisfied by this instance, but not by the previous one

Example:

WINE
GRAPE    VINTAGE  VINEYARD   QUALITY
Chenas   1977     Laphite    Good
Chenas   1980     Mouton     Excellent
Chablis  1977     Rotschild  Good
Chablis  1978     Crepeau    Bad
Volnay   1980     Satie      Average

The set of attributes {GRAPE, VINTAGE, VINEYARD} forms a key

{GRAPE, VINTAGE, VINEYARD} → QUALITY

(If the relation name is clear from the context, we omit it)
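Since a key is an FD whose right-hand side is all the attributes, checking it on an instance reduces to looking for two distinct tuples that agree on the key. A minimal sketch on the WINE instance; satisfies_key is a name we introduce for illustration.

```python
# WINE instance from the example (attribute order: GRAPE, VINTAGE, VINEYARD, QUALITY)
ATTRS = ("GRAPE", "VINTAGE", "VINEYARD", "QUALITY")
wine = [
    ("Chenas", 1977, "Laphite", "Good"),
    ("Chenas", 1980, "Mouton", "Excellent"),
    ("Chablis", 1977, "Rotschild", "Good"),
    ("Chablis", 1978, "Crepeau", "Bad"),
    ("Volnay", 1980, "Satie", "Average"),
]

def satisfies_key(rows, attrs, key):
    """key → all attributes: no two distinct tuples agree on the key attributes."""
    pos = [attrs.index(a) for a in key]
    seen = set()
    for t in set(rows):  # a relation is a set, so exact duplicates collapse
        k = tuple(t[i] for i in pos)
        if k in seen:
            return False
        seen.add(k)
    return True

print(satisfies_key(wine, ATTRS, ["GRAPE", "VINTAGE", "VINEYARD"]))  # True
print(satisfies_key(wine, ATTRS, ["GRAPE"]))                         # False
```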

3. Range Constraints: They restrict the values that can be taken by some attributes

Example: “A CEO cannot make less than 80,000 per year”, “An employee must be over 18”

Both are satisfied by

Employee
Name   Position    Salary  Age
john   clerk       40 K    35
mary   CEO         100 K   45
ken    accountant  60 K    40

But neither is satisfied by

Employee
Name   Position    Salary  Age
john   clerk       40 K    35
mary   CEO         70 K    45
ken    accountant  60 K    40
carol  programmer  90 K    16

In predicate logic: ∀w∀x∀y∀z(Employee(w, x, y, z) → z > 18) (exercise: express the other one)
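Checking range constraints on a finite instance amounts to evaluating a universally quantified condition over the tuples. A minimal sketch on the second instance above, assuming salaries are stored as numbers in thousands; the function names are hypothetical.

```python
# Second Employee instance from the example: (Name, Position, Salary in K, Age)
employees = [
    ("john", "clerk", 40, 35),
    ("mary", "CEO", 70, 45),
    ("ken", "accountant", 60, 40),
    ("carol", "programmer", 90, 16),
]

def over_18(rows):
    """∀w∀x∀y∀z(Employee(w, x, y, z) → z > 18)"""
    return all(age > 18 for (_, _, _, age) in rows)

def ceo_salary_ok(rows):
    """A CEO cannot make less than 80 K (salaries are stored in thousands)."""
    return all(salary >= 80 for (_, pos, salary, _) in rows if pos == "CEO")

violators = [name for (name, _, _, age) in employees if age <= 18]
print(over_18(employees), ceo_salary_ok(employees), violators)  # False False ['carol']
```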

4. NOT NULL Constraints: They restrict the values taken by some attributes to be non-NULL

A NULL value is used in databases to represent missing, unknown, non-applicable, ... information (there are different, mostly informal, semantics for NULL values)

Emp
Name   Posit     Sal    Age
john   clerk     40 K   35
mary   CEO       85 K   NULL
ken    account.  60 K   40
carol  NULL      90 K   19

This instance satisfies “Name cannot be NULL”, but not “Age cannot be NULL”

Normally, when a key constraint declares that a set of attributes is a key, it is also (compulsorily) required that its attributes cannot be NULL

Key constraints and NOT NULL constraints go together

If Name above has been declared a key (i.e. this IC is imposed), it should not take the value NULL

This has to do with the way NULL values are treated by the DBMS: it does not know whether a NULL represents a value that is equal to or different from the other certain (or null) values
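The behavior described above can be observed in SQLite through Python's sqlite3 module: a comparison involving NULL is neither true nor false but NULL (“unknown”), which Python receives as None. The Emp table below is illustrative, not the exact schema of the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The DBMS cannot tell whether NULL equals another value, or even another NULL
cmp = conn.execute("SELECT NULL = NULL, NULL <> NULL, 40 = NULL").fetchone()
print(cmp)  # (None, None, None)

# A NOT NULL constraint on Name (illustrative table)
conn.execute("CREATE TABLE Emp (Name TEXT NOT NULL, Age INTEGER)")
conn.execute("INSERT INTO Emp VALUES ('mary', NULL)")  # allowed: Age may be NULL
try:
    conn.execute("INSERT INTO Emp VALUES (NULL, 19)")  # violates NOT NULL on Name
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```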

5. Referential Constraints: They require that all the values of some attributes in a relation also appear in attributes of another relation

The first relation refers to the second relation (which can be thought of as a relation containing official data)

Example: “Every student in relation UnivTeams must be a registered student”

UnivTeams
Number    Team
9910803   Basketball
...       ...

Students
Number    Name          Study
9901254   John Stanley  CS
9910803   Sue Jones     Math
9901254   John Stanley  Literature

(official students table)

A referential IC: UnivTeams.Number refers to Students.Number

Also called an inclusion dependency: UnivTeams.Number is included in Students.Number:

UnivTeams.Number ⊆ Students.Number

(or UnivTeams[Number] ⊆ Students[Number])

This is not a full inclusion dependency, in the sense that not all the attributes of UnivTeams participate in the inclusion

The referring and referred attributes may have different names, e.g. we could have

UnivTeams[Number] ⊆ Students[ID]

(as long as the data types match)

In the language of predicate logic:

∀x∀y(UnivTeams(x, y) → ∃z∃w Students(x, z, w))
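An inclusion dependency compares two projections as sets. A minimal sketch using only the rows shown in the example (the elided “...” rows are left out); inclusion_holds is a hypothetical helper that selects columns by position.

```python
# Rows shown in the example: UnivTeams(Number, Team), Students(Number, Name, Study)
univ_teams = {("9910803", "Basketball")}
students = {
    ("9901254", "John Stanley", "CS"),
    ("9910803", "Sue Jones", "Math"),
    ("9901254", "John Stanley", "Literature"),
}

def inclusion_holds(r, r_cols, s, s_cols):
    """R[r_cols] ⊆ S[s_cols]: compare the two projections as sets."""
    r_proj = {tuple(t[i] for i in r_cols) for t in r}
    s_proj = {tuple(t[i] for i in s_cols) for t in s}
    return r_proj <= s_proj

# UnivTeams[Number] ⊆ Students[Number]
print(inclusion_holds(univ_teams, [0], students, [0]))  # True
```

Because columns are matched by position, the referring and referred attributes may have different names, as in UnivTeams[Number] ⊆ Students[ID].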

6. Foreign Key Constraints: A combination of a referential constraint and a key constraint

In addition to the referential constraint, it is required that the referred attributes in the second relation form a key for that relation

That is, if the referential IC requires

R[A_{i1}, ..., A_{im}] ⊆ S[B_{j1}, ..., B_{jm}],

then we also require that {B_{j1}, ..., B_{jm}} is a key for S

Example: We want UnivTeams.Number to be a foreign key for relation Students, i.e. that it refers to the attribute Students.Number, which is a key of Students

UnivTeams
Number    Team
9910803   Basketball
...       ...

Students
Number    Name          Study
9901254   John Stanley  CS
9910803   Sue Jones     Math
9901254   John Stanley  Literature

Inconsistent instance! (Number is not a key of Students: 9901254 appears in two tuples)

UnivTeams
Number    Team
9910803   Basketball
...       ...

Students
Number    Name          Study
9901254   John Stanley  CS
9910803   Sue Jones     Math
99052454  Ken Scott     Literature

(official students table)

Consistent instance!
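Commercial RDBMSs check both parts of a foreign key automatically. As an illustration, the sketch below declares the example's foreign key in SQLite through Python's sqlite3 module; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON. The student number '1234567' and the team 'Chess' are hypothetical values used to trigger a violation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE Students (Number TEXT PRIMARY KEY, Name TEXT, Study TEXT)")
conn.execute(
    "CREATE TABLE UnivTeams (Number TEXT REFERENCES Students(Number), Team TEXT)"
)

# The consistent instance from the example
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [("9901254", "John Stanley", "CS"),
     ("9910803", "Sue Jones", "Math"),
     ("99052454", "Ken Scott", "Literature")],
)
conn.execute("INSERT INTO UnivTeams VALUES ('9910803', 'Basketball')")  # accepted

def rejected(sql):
    """Run one statement; report whether the DBMS refused it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Referential part: '1234567' is a hypothetical, unregistered student number
fk_rejected = rejected("INSERT INTO UnivTeams VALUES ('1234567', 'Chess')")
# Key part: a second 9901254 tuple would recreate the inconsistent instance
key_rejected = rejected(
    "INSERT INTO Students VALUES ('9901254', 'John Stanley', 'Literature')"
)
print(fk_rejected, key_rejected)  # True True
```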

Example: A referential IC

Loan(branchn, loan#, clientn, amount)
   ↓
Branch(branchn, actives, branchNeigh)

Example: Foreign key constraints

DRINKER(DRINKER#, SURNAME, FNAME, TYPE)
   ↑
DRINKS(DRINKER#, WINE#, DATE, QUANTITY)
   ↓
WINE(WINE#, GRAPE, VINTAGE, PERCENTAGE)

(Thinking in terms of the E/R model, the relation in the middle seems to come from a Relationship and the other two from Entities)

Final Remarks

There are several other classes of ICs for the relational model

We presented those most common in practice

For most of them, commercial RDBMSs provide automatic support (maintenance of the database with respect to them)

Integrity constraints become part of the relational database schema

Those declared in the schema are expected to be satisfied by all the instances that are compatible with the schema

In that case we say that the database (instance, extension) is consistent with respect to the declared ICs

We will see other kinds of ICs later on, in other contexts

Chapter 5:

Relational Algebra and Relational Calculus

Relational Calculus

The relational algebra (RA) is an algebraic and procedural query language for relational databases

A few examples have shown that it is also possible to use languages from predicate logic to:

• Pose queries to the database
• Specify integrity constraints
• Define views of the database
• In general, express metadata, i.e. data about the data, that is, about the structure and organization of data

The “logical counterpart” of the RA is called the Relational Calculus (RC), and it comes in two flavors:

The Tuple Calculus (TC): Basically, the “atomic values” of data are complete tuples in relations

The language has variables to refer to tuples

The Domain Calculus (DC): The atomic values of data are those taken by the attributes (i.e. columns), as opposed to whole rows

The reason for the name is that the values taken by the attributes are drawn from the underlying database domain

Variables of the language refer to elements of (values in) the database domain

There are transformations between TC and DC, and basically the same can be expressed in both

Since DC is easier to explain and closer to classical predicate logic, we will concentrate on the DC

RC is a declarative language to express queries (a query language), ICs, view definitions, etc.

We review some elements of predicate logic, at least those that are relevant to RC:

We introduce a formal, symbolic object language to talk about a database

So, we need some symbolic ingredients

First, we need symbolic names for the data items, i.e. for elements of the database domain D

Actually, in the formal language of RC we will use the same names that we use in the metalanguage for the elements of D

Second, we need symbolic names for the relations (predicates) in the schema

It may be useful to introduce unary symbolic predicates for (domains of) the attributes A

Alternatively, we could have unary predicates to refer to the subdomains D(A) of D (cf. chapter 3 of these notes)

Example: Consider the relational schema S with

Domain D = {john, peter, mary, ken, carol, steve, ..., 0, 1, 2, 3, 4, ...}

Binary relations People(·, ·), Manager(·, ·)

Attributes for People, in this order: Name, Age

Attributes for Manager, in this order: Boss, Subordinate

This schema S has an associated language L(S) of predicate logic, based on the following symbols:

Names for domain individuals, to denote them: john, peter, mary, ken, carol, steve, ...

Predicate symbols: People(·, ·), Manager(·, ·)

Logical symbols: ¬, ∧, ∨, →, ↔, ∀, ∃

An infinite but countable, official set of variables: x1, x2, x3, ... (sometimes we will use other variables)

Possibly, a set of logical predicates (aka evaluable or built-in predicates): =, ≠, <, ...

They have a fixed interpretation (extension) given by the logic and depending on the underlying domain (cf. chapter 3 of these notes), as opposed to the predicate symbols People and Manager, which can have different interpretations (extensions)

Symbols for subdomain (attribute) predicates: Name(·), Age(·), Boss(·), Subordinate(·)

These predicates for the domains of the attributes (or subdomains of the domain) also have fixed interpretations (extensions), in the sense that they depend on the domain and not on the relations of the database

The extension of Name is {john, peter, mary, ken, carol, steve, ...}, the same holds for Boss and Subordinate, but the extension of Age is {0, 1, 2, 3, ...}

Using these symbols, it is possible to build formulas of the language L(S), e.g.

1. People(john, 35), People(john, mary), People(x3, 20), People(x3, x5), Age(35), Age(x10), john = mary, x2 = ken, 35 < 12, ken ≠ john, ...

These are all atomic formulas: a predicate applied to names and/or variables

2. More complex (non-atomic) formulas:

a) People(john, 32) ∧ ¬People(mary, 23)

b) People(peter, x) → Age(x)

c) ∀x∀y∀z(Manager(x, y) ∧ Manager(z, y) → x = z)

d) ∃x∀y(y ≠ x → Manager(x, y))

e) Manager(peter, x) ∧ ∃y(People(x, y) ∧ y < 30)

f) ∀x∀y(People(x, y) → Name(x) ∧ Age(y))

Some of these formulas do not have variables outside the scope of a quantifier (∃, ∀)

They are called sentences, e.g. People(john, 35), 2.(a), and 2.(c)

Notice that atomic sentences correspond to data in the database, i.e. tuples in tables

Sentence 2.(c) could be an integrity constraint; if it is imposed as an IC with the schema, it is expected to be true in the legal instances of the schema

Sentence 2.(f) could be seen as a condition on the schema: the arguments of People are elements of Name and Age, respectively

Formula 2.(e) could be seen as a query: “Give me the values for x such that the condition on it becomes true” (in the instance of the database at hand)

The notion of “being true” seems to be crucial here!

Formulas and sentences are purely symbolic objects, but they become true (or false) when they are interpreted

In predicate logic, formulas are interpreted in structures, and database instances can be seen as structures

So, formulas can be interpreted in database instances, and they become true or false in them

The semantics of the RC languages is inherited from the semantics of languages of predicate logic:

A database instance D for a relational schema S = {R, S, ...} can be seen as a finite structure D = ⟨D, R^D, S^D, ...⟩, in the following sense:

It has a domain, namely D (possibly infinite)

Example: D = {john, peter, ..., 0, 1, ...}

The names for individuals in L(S) are interpreted by themselves

Example: peter of the object language is interpreted as the element peter of the domain

Every non-logical, domain-dependent predicate has a finite extension (i.e. a finite number of tuples in the table)

Example:

People^D = {(john, 34), (peter, 37), ..., (mary, 25)}

Manager^D = {(john, peter), (peter, mary), ..., (john, ken)}

Built-in predicates have fixed extensions, possibly infinite

Example:

• {(john, john), (peter, peter), ...} are the tuples in the interpretation of =

• {(john, peter), (peter, mary), ...} are tuples in the interpretation of ≠

• < has the extension {(0, 1), (0, 2), ..., (1, 2), (1, 3), ...}

We can see that, given the schema S and a fixed domain D, the different instances will differ only in the extensions of the non-logical predicates, i.e. People, Manager

Now we can apply the classical definition of truth of symbolic formulas in structures (Alfred Tarski, early 1930s)

It is a recursive (inductive) definition that can be made precise and general, but we illustrate it with examples

Given a schema S, a formula ϕ ∈ L(S), and a database instance D compatible with S, we want to define when ϕ is true in D

Notice that ϕ could have free variables; in that case we need to indicate the values in the domain that we assign to the free variables

Example: These are two instances compatible with the schema:

D1:

People
Name  Age
john  35
mary  25
ken   40

Manager
Boss  Subordinate
ken   john
john  mary

D2:

People
Name   Age
mary   35
mary   25
peter  40

Manager
Boss   Subordinate
ken    steve
carol  steve
john   mary

1. People(john, 35) is true in D1, but false in D2:

D1 |= People(john, 35), because (john, 35) ∈ People^D1, i.e. the tuple (john, 35) belongs to the extension of the predicate in the DB

But D2 ⊭ People(john, 35) (the tuple does not belong to the extension)

Actually, it is considered to be false by applying the Closed World Assumption (CWA) on databases: the only true atomic knowledge is the one explicitly contained in the tables

2. D1 |= john = john, because (john, john) is in the extension of =

D1 |= john ≠ mary, because (john, mary) is in the extension of ≠

D1 |= 5 < 40, because (5, 40) is in the extension of <

D1 ⊭ john ≠ john, because (john, john) is not in the extension of ≠

3. People(mary, x) is true in D2 when x takes the value 25

D2 |= People(mary, x)[25]

But D2 ⊭ People(mary, x)[10]

4. Actually, for either of the values 35 or 25 for x, the formula (query?) People(mary, x) becomes true in D2

D2 |= People(mary, x)[25] and D2 |= People(mary, x)[35]

5. D2 |= ¬People(john, 35) by definition, because D2 ⊭ People(john, 35) (cf. 1. and the use of the CWA)

6. D1 |= (People(john, 35) ∧ Manager(ken, john)), by definition, because

D1 |= People(john, 35) and D1 |= Manager(ken, john)

7. D2 |= (People(peter, 10) ∨ Manager(ken, steve)) by definition, because

D2 |= People(peter, 10) or D2 |= Manager(ken, steve)

8. D2 |= ∃x Manager(x, steve) by definition, because there exists a value in the domain for x, namely x = ken, such that D2 |= Manager(x, steve)[ken]

9. D2 |= ∀y(People(mary, y) → y = 35 ∨ y = 25) by definition, because for all the values a ∈ D, i.e. in the domain, it holds that

D2 |= (People(mary, y) → y = 35 ∨ y = 25)[a]   (a is the value assigned to y)

E.g. D2 |= (People(mary, y) → y = 35 ∨ y = 25)[10]

E.g. D2 |= (People(mary, y) → y = 35 ∨ y = 25)[25]

With this recursive definition it is possible to evaluate the truth of any syntactically well-formed sentence or formula (provided we give values to the free variables) in a database instance

Since the formulas are symbolic, with a precise syntax, and the extensions of the relevant relations are finite, all the “reasonable” formulas (cf. later) can be evaluated by a computational system, like an RDBMS

Notice that the evaluation of the truth of a formula is compositional: the truth of a formula is based on the truth (or not) of its subformulas, which makes evaluation easier and clearer

A sentence (written in the logical language associated with the schema) can be algorithmically determined to be true or false in the DB
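The recursive definition can be sketched as a small Python evaluator, one case per connective, which also makes its compositionality visible. Two assumptions, not from the notes: quantifiers range over the active domain of the instance instead of the whole domain D (enough for the examples above), and the built-ins < and > hold only between numbers. The tuple encoding of formulas and the names Var, holds and BUILTINS are our own.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

def term(t, env):
    """Variables take their assigned value; constants (names) denote themselves."""
    return env[t.name] if isinstance(t, Var) else t

def _nums(a, b):
    return isinstance(a, int) and isinstance(b, int)

BUILTINS = {
    "=": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    "<": lambda a, b: _nums(a, b) and a < b,
    ">": lambda a, b: _nums(a, b) and a > b,
}

def active_domain(db):
    return {v for table in db.values() for row in table for v in row}

def holds(db, f, env=None):
    """Truth of formula f in instance db under the assignment env (name -> value)."""
    env = env or {}
    op = f[0]
    if op == "atom":
        _, pred, args = f
        vals = tuple(term(a, env) for a in args)
        if pred in BUILTINS:
            return BUILTINS[pred](*vals)
        return vals in db.get(pred, set())  # CWA: tuples not in the table are false
    if op == "not":
        return not holds(db, f[1], env)
    if op == "and":
        return holds(db, f[1], env) and holds(db, f[2], env)
    if op == "or":
        return holds(db, f[1], env) or holds(db, f[2], env)
    if op == "implies":
        return (not holds(db, f[1], env)) or holds(db, f[2], env)
    if op in ("exists", "forall"):
        _, var, body = f
        # Assumption: quantifiers range over the active domain only
        results = (holds(db, body, {**env, var: a}) for a in active_domain(db))
        return any(results) if op == "exists" else all(results)
    raise ValueError(f"unknown connective: {op!r}")

# Instance D2 from the example
D2 = {
    "People": {("mary", 35), ("mary", 25), ("peter", 40)},
    "Manager": {("ken", "steve"), ("carol", "steve"), ("john", "mary")},
}
x, y = Var("x"), Var("y")

ex8 = ("exists", "x", ("atom", "Manager", (x, "steve")))           # example 8.
ex9 = ("forall", "y", ("implies", ("atom", "People", ("mary", y)),  # example 9.
                       ("or", ("atom", "=", (y, 35)), ("atom", "=", (y, 25)))))
q3 = ("atom", "People", ("mary", x))                                # example 3.

print(holds(D2, ex8), holds(D2, ex9))                      # True True
print(holds(D2, q3, {"x": 25}), holds(D2, q3, {"x": 10}))  # True False
```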

Exercise: Say in English what is expressed by the symbolic sentence

∀x∃y∀z(Manager(x, y) ∧ People(y, z) ∧ z > 30 → ∃w(People(w, 25) ∧ ¬Manager(x, w)))

Determine whether this sentence is true in the instances D1, D2 by using the inductive (recursive) definition of truth in a structure (database instance)

D1 |=? ∀x∃y∀z(Manager(x, y) ∧ People(y, z) ∧ z > 30 → ∃w(People(w, 25) ∧ ¬Manager(x, w)))

D2 |=? ∀x∃y∀z(Manager(x, y) ∧ People(y, z) ∧ z > 30 → ∃w(People(w, 25) ∧ ¬Manager(x, w)))

The sentence has to be either true or false in each of D1 and D2

For an instance D for a schema S and a formula of L(S) with free variables, it is possible to algorithmically determine whether there are values in the database domain for those variables that make the formula true in D

If those values exist, they can also be determined algorithmically as part of the same evaluation process

So, we can use the RC as a query language

This language (or family of relational languages) has precise, clear, and well-studied semantics

Example: Pose and answer the following queries to the database instances above

1. Return the managers who have subordinates that are younger than 27

∃y∃z(Manager(x, y) ∧ People(y, z) ∧ z < 27)

The answers are collected as the values for the only free variable, namely x

D1 |=? ∃y∃z(Manager(x, y) ∧ People(y, z) ∧ z < 27)

Answer: x = john, because

D1 |= ∃y∃z(Manager(x, y) ∧ People(y, z) ∧ z < 27)[john]

Notice how the join of the tables is captured through the conjunction and the variable y that the two atoms have in common

Projection is captured by existential quantifiers, and selections by conjuncts expressed in terms of built-in predicates
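The same query can be sketched as a Python set comprehension over D1, where each part of the formula maps to a visible piece: the shared variable y becomes an equality test (join), z < 27 a filter (selection), and keeping only x plays the role of ∃y∃z (projection).

```python
# Instance D1 from the example
people = {("john", 35), ("mary", 25), ("ken", 40)}
manager = {("ken", "john"), ("john", "mary")}

# ∃y∃z(Manager(x, y) ∧ People(y, z) ∧ z < 27)
answers = {
    x
    for (x, y) in manager       # Manager(x, y)
    for (y2, z) in people       # People(y, z) ...
    if y2 == y                  # ... joined through the shared variable y
    and z < 27                  # selection: a built-in predicate
}                               # only x is kept: y and z are projected out
print(answers)  # {'john'}
```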

2. Return names and ages of the employees who are not bosses (of any people)

People(x, y) ∧ ∀z∀w(People(z, w) → ¬Manager(x, z))

Answers in D1: {(mary, 25)}

Answers in D2: {(mary, 25), (mary, 35), (peter, 40)}

Alternatively (but this is a different query, with a different meaning):

People(x, y) ∧ ¬∃z Manager(x, z)

Exercise: Show that the two queries above have different meanings by providing an instance of S where the answers are different

RA vs. RC

In RC we can express all the operations of RA, i.e. we can define by means of logical formulas the relations that result from applying the RA operations

We introduce in the logical language a new predicate Ans to collect the result of the operation, and then we define it by a sentence

Selection: σ_ϕ(R(A1, ..., An))

Here R is a relation and ϕ is a condition on the values of the attributes Ai

∀x1 ··· ∀xn(Ans(x1, ..., xn) ←→ R(x1, ..., xn) ∧ ϕ)

E.g. σ_{A=a}(R(A, B)) can be defined by

∀x∀y(Ans(x, y) ←→ R(x, y) ∧ x = a)

Intersection: R(A, B) ∩ S(A,B)

∀x∀y(Ans(x, y) ←→ R(x, y) ∧ S(x, y))

Instead of using the answer predicate, we can use the formula R(x, y) ∧ S(x, y), with free variables x, y, wherever needed as a (sub)formula to capture the intersection

Union: R(A,B) ∪ S(A,B)

∀x∀y(Ans(x, y) ←→ R(x, y) ∨ S(x, y))

Projection: Π_A(R(A, B))

∀x(Ans(x) ←→ ∃y R(x, y))

Join: R(A, B) ⋈_{B=C} S(C, D)

∀x∀y∀z(Ans(x, y, z) ←→ R(x, y) ∧ S(y, z))

Cartesian Product: R(A,B)× S(C,D)

∀x∀y∀z∀w(Ans(x, y, z, w) ←→ R(x, y) ∧ S(z, w))

Difference: R(A, B) ∖ S(A, B)

∀x∀y(Ans(x, y) ←→ R(x, y) ∧ ¬S(x, y))

We can see that all the RA operations can be expressed in the RC

Thus complex RA expressions (RA queries) can be translated into declarative RC formulas (queries)
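The correspondence can be sketched with Python set comprehensions, each written to mirror the RC definition of one RA operation. The instances are hypothetical; Q(C, D) stands in for the second operand that the text calls S(C, D) in the join and product, so that one script can hold both S relations.

```python
# Hypothetical instances: R(A, B) and S(A, B), plus Q(C, D) standing in for the
# second join/product operand (called S(C, D) in the text)
R = {(1, 2), (2, 3), (3, 3)}
S = {(2, 3), (4, 5)}
Q = {(3, 7), (5, 8)}

selection = {(x, y) for (x, y) in R if x == 1}                 # R(x, y) ∧ x = a, with a = 1
projection = {x for (x, y) in R}                                # ∃y R(x, y)
union = R | S                                                   # R(x, y) ∨ S(x, y)
intersection = {(x, y) for (x, y) in R if (x, y) in S}          # R(x, y) ∧ S(x, y)
difference = {(x, y) for (x, y) in R if (x, y) not in S}        # R(x, y) ∧ ¬S(x, y)
join = {(x, y, z) for (x, y) in R for (c, z) in Q if c == y}    # R(x, y) ∧ Q(y, z)
product = {(x, y, z, w) for (x, y) in R for (z, w) in Q}        # R(x, y) ∧ Q(z, w)

print(selection, projection, intersection, difference, join, sep="\n")
```

Every comprehension draws its candidate tuples from some finite table; the negation in the difference only filters rows of R and never generates new values.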

Actually, with the syntax and semantics of RC we could go beyond ...

Example: Give me those who are not bosses

¬∃y Manager(x, y)   (*)

Who should the answers be in, say, D1? mary? susan? (with susan ∈ D)

As a formula, the query is O.K., and its semantics as a logical formula is also O.K.

But as a DB query?

Notice that the “corresponding RA query” would be

(Π_Boss(Manager))^c

We do not have complement in RA

We do have it in RC (or logic), but in DB we do not want to use it

We restrict ourselves to the so-called domain independent or safe queries (the “reasonable” queries mentioned before)

Those are the queries that can be evaluated without appealing to the whole, possibly infinite, underlying DB domain D

Domain independent queries can be evaluated by concentrating on the active domain of the DB: the subset of D that contains the data items that appear in some of the finite DB tables

activeDom(D1) = {john, 35, mary, 25, ken, 40}

The query 2. above (as formulated) is safe, because the non-bosses are searched for among the people, and the latter appear in a table

The RA difference is always safe: R(x, y) ∧ ¬S(x, y), because the answers are all among the rows of table R

The query (*) is not domain independent
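A small sketch of why domain independence matters, on D1: the safe query 2. only looks at rows of People, whereas (*) must be evaluated over some explicitly chosen domain, and its answer changes when that domain grows. Here “susan” is the hypothetical extra domain element mentioned in the text.

```python
# Instance D1 from the example
people = {("john", 35), ("mary", 25), ("ken", 40)}
manager = {("ken", "john"), ("john", "mary")}

# Active domain: every value occurring in some (finite) table of D1
active_dom = {v for table in (people, manager) for row in table for v in row}
bosses = {x for (x, _) in manager}

# Query 2: People(x, y) ∧ ∀z∀w(People(z, w) → ¬Manager(x, z))
# Safe: candidate answers come from the People table itself
query2 = {
    (x, y)
    for (x, y) in people
    if not any((x, z) in manager for (z, _) in people)
}

def not_bosses(domain):
    """Query (*) ¬∃y Manager(x, y), evaluated over an explicitly chosen domain for x."""
    return {x for x in domain if x not in bosses}

print(query2)                                         # {('mary', 25)}
print(not_bosses(active_dom))                         # mary, but also 35, 25 and 40
print("susan" in not_bosses(active_dom | {"susan"}))  # True: the answer grows with the domain
```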

Finally, it can be proved that it is not possible to define the transitive closure of a relation using the RC language

This is expected given the correspondence between the RA and the RC

Actually, this impossibility result is usually proved in the context of the RC, not directly for RA

So the RC has limited expressive power for some natural applications

Are there reasonable, well-behaved, and more expressive extensions of the RC?
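For contrast, the sketch below computes a transitive closure by repeating a join until nothing new appears (a least fixpoint). The number of rounds depends on the data, which is the kind of iteration a single fixed RC formula cannot express; recursive languages such as Datalog, or SQL's WITH RECURSIVE, add it. This is an illustration, not a construction from the notes.

```python
def transitive_closure(edges):
    """Least fixpoint of T := E ∪ {(x, z) | (x, y) ∈ T, (y, z) ∈ E}."""
    closure = set(edges)
    while True:
        new = {(x, z) for (x, y) in closure for (y2, z) in edges if y == y2}
        if new <= closure:  # no new pairs: fixpoint reached
            return closure
        closure |= new

# Manager relation of D1: ken manages john, john manages mary
print(sorted(transitive_closure({("ken", "john"), ("john", "mary")})))
# [('john', 'mary'), ('ken', 'john'), ('ken', 'mary')]
```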

Other Uses of the RC Languages

The RC languages can be used for many purposes, not only for query formulation

With the same advantages: a language that is suitable for computational processing, has clear syntax and semantics, and is highly expressive for DB purposes

1. Metadata:

We can express, in an RC language, conditions on the structure of the data

Example: ∀x∀y(People(x, y) → Name(x) ∧ Age(y))

This says that the values in attributes have to be taken from the right subdomain

This opens the possibility of expressing more complex conditions on the data types of the different attributes

∀x∀y(People(x, y) → CharString(x) ∧ Integer(y)),

where CharString(·), Integer(·) are recognized by the system (built-in types)

Or even more complex:

∀x∀y(R(x, y) → Type1(x) ∧ Type2(y)),

where Type1(·), Type2(·) are defined by means of additional logical formulas

Conditions like these can be checked by the system

Metadata is crucial in many applications of databases today, because data is integrated from different databases

The integration is usually virtual: the data stay at their sources

Think of data sources integrated through the WWW

The metadata of each source provides (some) information about what is found in a data source and how

Because it is data about data ...

2. Integrity Constraints:

ICs can be expressed as sentences of an RC language

D:

Employee
Name  Position    Salary  Age
john  clerk       40 K    35
mary  CEO         100 K   45
ken   accountant  60 K    40

∀x∀y∀z∀w(Employee(x, y, z, w) ∧ y = CEO → z > 90K)

This range constraint is satisfied by the DB instance D:

D |= ∀x∀y∀z∀w(Employee(x, y, z, w) ∧ y = CEO → z > 90K)

ICs are also metadata

They embody knowledge (about the data) that can be used

For example, it can be published or provided as a semantic layer for a data source

In this way, it conveys “meaning” (semantics) about/of the data source

For example, if different data sources about salaries of employees are virtually integrated and we want to find those CEOs who make less than 80K, we do not have to search inside a data source that is exposed to the outside world as satisfying the range constraint above

We can also express functional dependencies, e.g. Employee : Name → Age

∀x∀y1∀y2∀z1∀z2∀w1∀w2(Employee(x, y1, z1, w1) ∧ Employee(x, y2, z2, w2) → w1 = w2)

It is satisfied by D

It is easy to express that Name is a key of the relation

Write down an axiom like this for each of the attributes other than those in the key

UnivTeams
Number    Team
9910803   Basketball
...       ...

Students
Number    Name          Study
9901254   John Stanley  CS
9910803   Sue Jones     Math

Referential IC: UnivTeams.Number ⊆ Students.Number

∀x∀y∃z∃w(UnivTeams(x, y) → Students(x, z, w))

The corresponding foreign key constraint can be expressed by the conjunction of this sentence plus the sentence that says that Number is a key of Students (do it!)

We can see that by using RC languages to express ICs:

It is clear what it means for a DB to satisfy an IC

We can express complex ICs; actually, we can go much beyond what commercial RDBMSs support

IC checking becomes machine processable

Since ICs are syntactic objects, they can in principle be stored in the DB and used as extra knowledge about the domain

Knowledge that can be used for other purposes, e.g. query optimization, more precisely semantic query optimization

Example: Return students (numbers) that participate in a team and are registered students

∃y∃z∃v(UnivTeams(x, y) ∧ Students(x, z, v))

If the system knows that the referential IC (RIC) is satisfied, there is no need to go to table Students

ICs should be checkable in the active domain of the database, so ICs are expected to be domain independent sentences

Exercise: Check that all the ICs we have encountered so far are domain independent

There is a syntactic characterization of the safe formulas, so in DB applications one restricts the RC to its safe portion

(To be precise, the class of safe formulas is a proper subset of the class of domain independent formulas, but safeness is good enough for applications)

Example: ∀x∃y Student(x, y) is an RC sentence, but it is not domain independent

[Figure: a virtual table (view) V defined over base tables R and S]

3. View Definitions:

Views are (usually) virtual tables that “contain” data that come from other (usually material) base tables, those in the original schema

Views can be defined using the RC: first introducing a name for the view, i.e. a new predicate, and then an RC formula that defines it (as we did with the Ans predicate before)

D:

People
Name  Age
john  35
mary  25
ken   40

Manager
Boss  Subordinate
ken   john
john  mary

The view that shows (“contains”) the bosses

∀x(Bosses(x) ←→ ∃y Manager(x, y))

The “extension” of the view on D is Bosses(D) = {ken, john}

The view containing “top bosses”

∀x(TopBoss(x) ↔ ∃y Manager(x, y) ∧ ¬∃z Manager(z, x))

TopBoss(D) = {ken}

Bosses with their ages:

∀x∀z(BossAge(x, z) ↔ ∃y Manager(x, y) ∧ People(x, z))

BossAge(D) = {(ken, 40), (john, 35)}

Notice that views are defined by a query, the one on the right-hand side (RHS)

So, a view is any relation (usually virtual) that is defined in terms of already existing relations (usually material tables) by using a suitable query language
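A minimal sketch of the three views as Python functions over the base tables of D. Each call recomputes the view from the current base tables, which is how a virtual view behaves; a materialized view would instead store the result and need refreshing when the base tables change. The function names simply mirror the view predicates.

```python
# Base tables of instance D from the example
people = {("john", 35), ("mary", 25), ("ken", 40)}
manager = {("ken", "john"), ("john", "mary")}

def bosses(manager):
    """Bosses(x) ↔ ∃y Manager(x, y)"""
    return {x for (x, _) in manager}

def top_boss(manager):
    """TopBoss(x) ↔ ∃y Manager(x, y) ∧ ¬∃z Manager(z, x)"""
    subordinates = {y for (_, y) in manager}
    return {x for x in bosses(manager) if x not in subordinates}

def boss_age(manager, people):
    """BossAge(x, z) ↔ ∃y Manager(x, y) ∧ People(x, z)"""
    return {(x, z) for (x, z) in people if x in bosses(manager)}

# Each call recomputes the view from the current base tables
print(bosses(manager), top_boss(manager), boss_age(manager, people))
```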

We can see that the semantics of a view is clear

We have a rich, expressive language for defining views

It is easy to compute their extensions if wanted

Views are useful to represent different perspectives on and/or uses of the data in the DB

They allow data to be combined into new relations

They can also be used for security purposes: certain users may have access only to certain views of the DB

Problem: How can we speed up the updating of a view when its base relations change?

Views are very important today

The emphasis is on integration of data sources

Think again of data sources used/accessed through the WWW

Actually, this is virtual integration (data is not collected into a single, huge physical repository)

View definitions provide a way to define correspondences, mappings, and semantic bridges between separate and autonomous data sources