© Ron Rogerson 1998-2010 Slide 1
Relational Databases
Ron Rogerson
Slide 2
Topics on this course
What is a database, and what is it used for?
What problems arise in managing one?
file-based systems; data dependence
What kind of system might resolve those problems?
the three-level architecture; functions of a DBMS; firm development principles (development cycle and conceptual modelling); and . . . one which uses . . .
The relational approach - a theoretical architecture
the relational model; manipulation with relational algebra
SQL - a practical implementation of relational theory
Bringing it together - developing a relational d/b with SQL
Other topics - introduced in Block 1 and returned to throughout the course
background to data management; other kinds of information system; developments in databases
Slide 3
What is a database?
Database: a collection of data stored in a computer system
Data: a representation of information (or, information as interpreted data)
information can have more than one representation
data has no meaning in isolation (it needs a domain of discourse)
computers process data, not information
semantic properties of data
User: a person whose information requirements are being supported
sharing data - differing requirements
User process
application process: single purpose
database tool: general purpose
Slide 4
What problems arise in managing a database?
File-based approach
file organisation determines access operations
close association of program with file
programs have to do it all (consistency, relationships, access control etc.)
data likely to be duplicated
resistant to change
data dependent
Database approach
the DBMS, not programs, does access control and manages storage/retrieval
explicit single database definition - the schema
NOTE that, although we are still talking in general about a structured database approach, some modern systems such as OO and XML databases do exhibit characteristics of the file-based approach
Slide 5
Data Retrieval in a File-Based System
[Figure: a fixed record template whose fields include No of Coupons, Coupon 1-3, No of Guarantors, Start, Sedol, Issuer, Issue Date, Guarantor 1-2 and End, each with a fixed character, numeric or date format]
Different template for each file, 'hard coded' in programs
All 'links' between files done by programs
Slide 6
Actual systems architectures
Client-Server
two-tier: interface & application on client, d/b on server
three-tier: interface, application and d/b separate
Client-Multiserver
multiple d/bs on separate servers, each server providing specific data
connection management required
Distributed DBMS
multiple d/bs on separate servers
the DDBMS provides location independence, security and integrity
horizontal and/or vertical fragmentation
two-phase commit controls updates
replication
reduces network traffic
improves availability
always reduces consistency somewhat
updates can be pushed or pulled
Mobile systems
d/b fragments copied to intermittently disconnected devices, and updated when possible
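Two-phase commit, mentioned above for distributed DBMSs, can be sketched in a few lines of Python. This is an in-memory toy and not a real implementation. The `Participant` class, the site names and the `can_commit` flag are all invented for illustration. The sketch also leaves out what makes the real protocol dependable: each site writing its vote and the final decision to a durable log, timeouts, and recovery after a crash between the two phases.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """A hypothetical site taking part in a distributed update."""
    name: str
    can_commit: bool = True                  # would this site's local update succeed?
    state: str = "active"
    log: list = field(default_factory=list)

    def prepare(self) -> bool:
        """Phase 1: vote. A 'yes' vote is a promise to commit if told to."""
        self.state = "prepared" if self.can_commit else "aborted"
        self.log.append("vote yes" if self.can_commit else "vote no")
        return self.can_commit

    def finish(self, commit: bool) -> None:
        """Phase 2: carry out the coordinator's global decision."""
        self.state = "committed" if commit else "aborted"
        self.log.append(self.state)


def two_phase_commit(participants: list) -> bool:
    """Coordinator: commit at every site only if every site votes yes."""
    votes = [p.prepare() for p in participants]   # phase 1: collect all votes
    decision = all(votes)
    for p in participants:                        # phase 2: broadcast the decision
        p.finish(decision)
    return decision


all_yes = [Participant("London"), Participant("Leeds")]
one_no = [Participant("London"), Participant("Leeds", can_commit=False)]
print(two_phase_commit(all_yes), [p.state for p in all_yes])  # True ['committed', 'committed']
print(two_phase_commit(one_no), [p.state for p in one_no])    # False ['aborted', 'aborted']
```

The key property is that a single "no" vote in phase 1 aborts the update at every site, so the sites never disagree about the outcome.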
Slide 7
What kind of system might resolve these problems?
- a system which provides the functions of a DBMS:
Data Definition
Constraint Definition and Enforcement
Access Control
Data Manipulation
Restructuring and Reorganisation
restructuring: a change to the design, e.g. adding a column or a table (logical schema)
reorganisation: a change within the design, e.g. to assimilate recently added records and optimise indices (storage schema)
Transaction Support
Concurrency Support
Recovery
Slide 8
What kind of system might resolve these problems? (2)
a system which gives data independence
Logical data independence: a change to the logical schema has no impact on user processes
Physical data independence: a change to the storage schema has no impact on user processes
ANSI/SPARC three-schema architecture: external, logical and storage schemas
a system which provides interaction facilities
Data manipulation language / query language (e.g. SQL)
Host language; embedded statements
Data definition language (e.g. SQL)
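The two kinds of data independence can be illustrated with Python's built-in `sqlite3` module. This sketch rests on some assumptions:

- SQLite stands in for a full DBMS.
- A view plays the part of an external schema.
- Adding an index stands for a storage-schema change.
- Adding a column stands for a logical-schema change.

The names `DriverNames`, `driver_surname` and `LicenceClass` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Driver (DriverNo INTEGER PRIMARY KEY, Surname TEXT, DateOfBirth INTEGER);
    INSERT INTO Driver VALUES (1, 'Brown', 19650312), (2, 'Smith', 19720801);
    -- External schema: the user process below only ever refers to this view.
    CREATE VIEW DriverNames AS SELECT DriverNo, Surname FROM Driver;
""")


def user_process():
    """A hypothetical user process written against the external schema only."""
    return conn.execute("SELECT Surname FROM DriverNames ORDER BY DriverNo").fetchall()


before = user_process()

# Storage-level change: add an index (physical data independence).
conn.execute("CREATE INDEX driver_surname ON Driver (Surname)")
# Logical-schema change: add a column to the base table (logical data independence).
conn.execute("ALTER TABLE Driver ADD COLUMN LicenceClass TEXT")

after = user_process()
print(before == after, after)  # True [('Brown',), ('Smith',)]
```

Not every logical change can be hidden this way. Dropping or renaming a column that the view uses would still break the user process.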
Slide 9
What kind of system might resolve these problems? (3) The 3-schema architecture
[Figure: layers, from top to bottom: User Processes, External Schemas, Logical Schema, Storage Schema, Stored d/b]
Slide 10
What kind of system might resolve these problems? (4)
One developed using the database development lifecycle:
Establishing data requirements
Data analysis: produces the conceptual data model
Database design: produces the logical schema (NB specific to db type)
Implementation: produces the storage schema et al. (NB specific to platform)
Testing: may lead to iteration back to any of the earlier stages
One developed using a conceptual data model:
a formal representation of a data requirement, independent of how it may be realised
Slide 11
What kind of system might resolve these problems?
Conceptual data models
Entity-Relationship model
entity types and entity occurrences
attributes (identifier - a special unique attribute)
relationships
Degrees of relationships
[Figure: Bus (1) to Driver (n)]
A Bus can have many Drivers; a Driver drives not more than one bus
Participation conditions
A bus may have no driver
A driver must be allocated to a bus
[Figure: Bus to Driver, participation optional at the Bus end and mandatory at the Driver end]
Slide 12
What kind of system might resolve these problems?
Conceptual data models (2)
Entity types
Bus(RegNo, NoOfSeats, Date1stReg)
Driver(DriverNo, Surname, DateOfBirth)
Weak and strong entity types
Constraint: a statement of a necessary restriction that cannot be expressed elsewhere in the model (e.g. only drivers over 21 may drive buses with more than 8 seats)
Assumption: a statement of something that had to be assumed in order to complete the model and needs pointing out (e.g. only a driver's current bus allocation is shown)
Slide 13
What kind of system might resolve these problems?
Conceptual data models (3)
m:n relationships
[Figure: Bus m:n Driver]
is this relationship possible?
what additional information needs to be recorded about occurrences of the relationship?
how might we record it?
how does this change the Assumptions?
Recursive relationships
[Figure: Driver related to itself]
a driver supervises another driver
Slide 14
The relational approach - a theoretical architecture
Relation: can be pictured as a form of table, each row representing an occurrence.
Terminology
column = attribute
row = tuple
no. of attributes = degree
no. of tuples = cardinality
identifier = primary key
Properties of relations
all attributes have a value in every tuple
all values are atomic
all values of any attribute are of the same kind
each attribute has a name, unique within the relation
each tuple is unique
the ordering of attributes and tuples is not significant
Domain: a named set of values, with a common meaning, from which one or more attributes draw their actual values
NB values on different domains are not comparable.
Slide 15
The relational approach - a theoretical architecture (2)
Candidate key: an attribute (or combination of attributes) with the properties of uniqueness and minimality
implies a semantic constraint
is either a primary or an alternate key
Primary key: a candidate key chosen as the identifier
subject to the entity integrity rule (no part of a primary key may be null)
declared in the schema definition
Alternate key: any other candidate key
declared in the schema definition
Qualified attribute names - dot notation (e.g. Bus.RegNo); may be needed where the same name occurs in more than one relation
Slide 16
The relational approach - a theoretical architecture (3)
Foreign key: an attribute (or combination of attributes) in a relation, whose values are the same as values of a candidate key (normally the primary key) of some (not necessarily distinct) relation.
the only method of representing relationships in the relational model
by default, represents an n:1 relationship
always at the n "end" of a 1:n relationship (remember the "crow's foot" pointer)
a relationship can be represented by 'posting' the primary key of the '1' end into the other relation - the 'posted attribute' method
must always have a value the same as some value of the key it references (referential integrity rule)
semantic constraint
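The referential integrity rule can be seen in action with Python's `sqlite3` module. In this sketch the Bus/Driver tables follow the posted-key example on the next slide. The bus `XYZ999` is invented to trigger a violation, and SQLite is only a convenient stand-in for the course DBMS. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on
conn.executescript("""
    CREATE TABLE Bus (RegNo TEXT PRIMARY KEY, Make TEXT NOT NULL);
    CREATE TABLE Driver (
        Id    INTEGER PRIMARY KEY,
        Name  TEXT NOT NULL,
        BusNo TEXT NOT NULL REFERENCES Bus (RegNo)  -- posted foreign key at the n end
    );
    INSERT INTO Bus VALUES ('ABC123', 'Scania'), ('DEF456', 'Volvo');
    INSERT INTO Driver VALUES (1, 'Brown', 'ABC123');
""")


def try_insert(driver_id, name, bus_no):
    """Attempt to add a driver; report whether the DBMS accepted the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Driver VALUES (?, ?, ?)", (driver_id, name, bus_no))
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


ok = try_insert(2, "Smith", "ABC123")    # matches an existing RegNo
bad = try_insert(3, "Bloggs", "XYZ999")  # no such bus: breaks referential integrity
print(ok)
print(bad)
```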
Slide 17
The relational approach - a theoretical architecture (4)
[Figure: Bus 1:n Driver]
Why can't the foreign key go at the "1" end?
Bus                       Driver
RegNo   Make    Driver    Id  Name
ABC123  Scania  1,2       1   Brown
DEF456  Volvo   3         2   Smith
GHJ789  DAF               3   Bloggs
(foreign key: Bus.Driver)
Representing a 1:n using posted key
Bus               Driver
RegNo   Make      Id  Name    BusNo
ABC123  Scania    1   Brown   ABC123
DEF456  Volvo     2   Smith   ABC123
GHJ789  DAF       3   Bloggs  DEF456
(foreign key: Driver.BusNo)
Slide 18
The relational approach - a theoretical architecture (5)
[Figure: Bus 1:1 Driver]
Bus               Driver
RegNo   Make      Id  Name    BusNo
ABC123  Scania    1   Brown   ABC123
DEF456  Volvo     2   Smith   ABC123
GHJ789  DAF       3   Bloggs  DEF456
We can't have this duplication (ABC123 appears twice in BusNo)
so . . .
Representing a 1:1 using posted key
Bus               Driver
RegNo   Make      Id  Name    BusNo
ABC123  Scania    1   Brown   ABC123
DEF456  Volvo     3   Bloggs  DEF456
GHJ789  DAF
Driver.BusNo becomes an alternate key as well as a foreign key
Slide 19
The relational approach - a theoretical architecture (6)
Recursive relationships
m:n relationships cannot be represented by 'posted attribute' - why not?
relationships which are optional at the 'n' end cannot be represented that way either - why not?
Deletions from a referenced relation:
restricted effect
cascade delete effect
default effect
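The three deletion effects correspond to SQL's `ON DELETE RESTRICT`, `ON DELETE CASCADE` and `ON DELETE SET DEFAULT`. The sketch below, again using Python's `sqlite3`, deletes bus ABC123 under each action. The placeholder bus `'SPARE'` is an assumption added for the default effect: the default value must itself match a row in Bus, or the referential integrity rule would reject the change.

```python
import sqlite3


def make_db(on_delete: str) -> sqlite3.Connection:
    """Build a small Bus/Driver database whose foreign key uses the given ON DELETE action."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(f"""
        CREATE TABLE Bus (RegNo TEXT PRIMARY KEY);
        CREATE TABLE Driver (
            Id    INTEGER PRIMARY KEY,
            BusNo TEXT DEFAULT 'SPARE' REFERENCES Bus (RegNo) ON DELETE {on_delete}
        );
        INSERT INTO Bus VALUES ('SPARE'), ('ABC123'), ('DEF456');
        INSERT INTO Driver VALUES (1, 'ABC123'), (2, 'ABC123'), (3, 'DEF456');
    """)
    return conn


def delete_bus(on_delete: str, reg_no: str = "ABC123"):
    """Delete one referenced Bus row and show what happened to the Driver rows."""
    conn = make_db(on_delete)
    try:
        with conn:
            conn.execute("DELETE FROM Bus WHERE RegNo = ?", (reg_no,))
    except sqlite3.IntegrityError:
        return "delete refused"
    return conn.execute("SELECT Id, BusNo FROM Driver ORDER BY Id").fetchall()


for effect in ("RESTRICT", "CASCADE", "SET DEFAULT"):
    print(effect, delete_bus(effect))
# RESTRICT delete refused
# CASCADE [(3, 'DEF456')]
# SET DEFAULT [(1, 'SPARE'), (2, 'SPARE'), (3, 'DEF456')]
```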
Slide 20
The relational approach - a theoretical architecture (7)
Representing m:n relationships
[Figure: Bus m:n Driver, relationship Allocated]
Bus(RegNo, NoOfSeats, Date1stReg)
Driver(DriverNo, Surname, DateOfBirth)
[Figure: Bus 1:n Allocation n:1 Driver]
Allocation(RegNo, DriverNo, Date)
m:n decomposed into a new intersection relation + two 1:n relationships
what new semantic constraint is imposed by the above choice of primary key?
how could we relax it?
NB: conceptual - decompose to add information; relational - decompose to represent at all
Rules for new intersection relations (when used to decompose an m:n)
                Participation conditions    Degrees
New entity      Both mandatory              Both "n"
Old entities    Same as before              Both "1"
Slide 21
The relational approach - a theoretical architecture (8)
Representing optional relationships (relation for relationship method)
mandatory relationships can use this method too - much more complex than posted attribute, but treats all relationships the same way
[Figure: A 1:n B]
A(a) B(b, a)
becomes . .
[Figure: A and B linked through a new relation AB]
A(a) B(b) AB(b, a)
[Figure: A 1:1 B]
A(a) B(b, a) (where B.a is an alternate key)
becomes . .
[Figure: A and B linked through a new relation AB]
A(a) B(b) AB(b, a) or AB(a, b)
(note that the non-p.k. attribute of AB must be declared an alternate key)
NOTE that the degrees of the original relations "cross over" as they move to the new relation, and, as with an m:n, the original relations become "1"
Slide 22
The relational approach - a theoretical architecture
Relational Algebra
operators act on whole relations (set operators)
have the closure property (each produces a new relation)
relationally complete theoretical basis for manipulation
operators reflect the structure of relations (and vice versa)
SELECT
select Allocation where RegNo = 'R123ABC'
produces a "horizontal slicing" of the relation; i.e., picks tuples according to some value(s) of attribute(s)
in this case, lists all the allocations ever made to the bus with RegNo R123ABC (and so the numbers of all drivers ever allocated to it)
Slide 23
The relational approach - a theoretical architecture
Relational Algebra (2)
PROJECT
project Bus over NoOfSeats
produces a "vertical slicing"; i.e., selects all the unique (combinations of) value(s) of the chosen attribute(s)
in this case, lists (once only in each case) the seating capacities of buses in the fleet
Combining expressions
using an alias:
DriversUnder21 alias (select Driver where DateOfBirth > 19710425)
project DriversUnder21 over Surname
nested:
project (select Driver where DateOfBirth > 19710425) over Surname
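The select and project operators, and the two ways of combining them, can be mimicked with Python sets. This is a teaching sketch, not a relational engine. A relation is assumed to be a set of tuples, each tuple a sorted tuple of `(attribute, value)` pairs. That representation gives duplicate-free, order-free tuples, as the properties of relations require. The driver data are hypothetical.

```python
def relation(*rows):
    """A relation as a set of tuples; each tuple is a sorted tuple of (attribute, value)
    pairs, so duplicate tuples collapse and neither tuple nor attribute order matters."""
    return {tuple(sorted(row.items())) for row in rows}


def select(r, predicate):
    """Horizontal slice: keep the tuples for which predicate(tuple as a dict) is true."""
    return {t for t in r if predicate(dict(t))}


def project(r, *attrs):
    """Vertical slice: keep only the named attributes; duplicate tuples disappear."""
    return {tuple((a, v) for a, v in t if a in attrs) for t in r}


Driver = relation(
    {"DriverNo": 100, "Surname": "Brown", "DateOfBirth": 19650312},
    {"DriverNo": 101, "Surname": "Smith", "DateOfBirth": 19720801},
    {"DriverNo": 102, "Surname": "Bloggs", "DateOfBirth": 19730115},
    {"DriverNo": 103, "Surname": "Smith", "DateOfBirth": 19741130},
)

# Using an alias: name the intermediate relation, then project it.
DriversUnder21 = select(Driver, lambda t: t["DateOfBirth"] > 19710425)
by_alias = project(DriversUnder21, "Surname")

# Nested: the same expression written in one go.
nested = project(select(Driver, lambda t: t["DateOfBirth"] > 19710425), "Surname")

print(sorted(by_alias))    # [(('Surname', 'Bloggs'),), (('Surname', 'Smith'),)]
print(by_alias == nested)  # True
```

Every operator returns another set of the same shape, so expressions can be nested freely. This is the closure property in miniature.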
Slide 24
The relational approach - a theoretical architecture
Relational Algebra (3)
JOIN (natural)
join Driver and Allocation
"pastes together" the tuples of the given relations which match on some attribute(s) with the same name and domain - in this case, DriverNo
in this case, produces a complete record of every driver allocation, including, for each tuple, all the attributes from each table - but the joining column appears once only
NOTE this means the new table will contain all the details of every driver once for every time s/he has been allocated to a bus
A join is over a shared attribute, and that normally means a foreign key
A relation can be joined to itself (e.g. where there is a recursive relationship), but aliases must be used
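Natural join can be added to the same set-based sketch. The function below joins over every attribute name the two relations share, so here it joins on DriverNo. The sample rows are hypothetical.

```python
def relation(*rows):
    """A relation as a set of tuples of sorted (attribute, value) pairs."""
    return {tuple(sorted(row.items())) for row in rows}


def natural_join(r, s):
    """Pair each tuple of r with each tuple of s that agrees on every attribute name
    the two share; the shared (joining) attributes appear only once in the result.
    With no shared attributes this degenerates to the Cartesian product."""
    result = set()
    for t in r:
        dt = dict(t)
        for u in s:
            du = dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                result.add(tuple(sorted({**dt, **du}.items())))
    return result


Driver = relation(
    {"DriverNo": 100, "Surname": "Brown"},
    {"DriverNo": 101, "Surname": "Smith"},
    {"DriverNo": 102, "Surname": "Bloggs"},
)
Allocation = relation(
    {"RegNo": "N456CDE", "DriverNo": 100, "Date": 19980105},
    {"RegNo": "N456CDE", "DriverNo": 101, "Date": 19980210},
    {"RegNo": "R123ABC", "DriverNo": 101, "Date": 19980311},
)

joined = natural_join(Driver, Allocation)
for t in sorted(joined):
    print(dict(t))
```

With the data above, Smith (101) appears twice because he has two allocations. Bloggs (102), who has never been allocated, does not appear at all.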
Slide 25
The relational approach - a theoretical architecture
Relational Algebra (4)
DIVIDE
Allocations alias (project Allocation over DriverNo, RegNo)
DriverNo  RegNo
100       N456CDE
101       N456CDE
101       R123ABC
Buses alias (project Bus over RegNo)
RegNo
N456CDE
R123ABC
divide Allocations by Buses over RegNo
DriverNo
101
(produces a list of the numbers of drivers who've been allocated to all buses)
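Divide is the least intuitive operator, so here is a sketch of it in the same set-based style, using the Allocations and Buses values from the slide above. The definition coded here is the usual one: keep each candidate value of the remaining attributes (DriverNo) that is paired in the dividend with every tuple of the divisor.

```python
def relation(*rows):
    """A relation as a set of tuples of sorted (attribute, value) pairs."""
    return {tuple(sorted(row.items())) for row in rows}


def project(r, *attrs):
    """Keep only the named attributes (duplicates disappear)."""
    return {tuple((a, v) for a, v in t if a in attrs) for t in r}


def divide(dividend, divisor, *over):
    """Tuples c over the dividend's other attributes such that, for EVERY tuple d
    of the divisor, c combined with d appears in the dividend."""
    others = {a for t in dividend for a, _ in t} - set(over)
    candidates = project(dividend, *others)
    return {
        c for c in candidates
        if all(tuple(sorted({**dict(c), **dict(d)}.items())) in dividend for d in divisor)
    }


Allocations = relation(
    {"DriverNo": 100, "RegNo": "N456CDE"},
    {"DriverNo": 101, "RegNo": "N456CDE"},
    {"DriverNo": 101, "RegNo": "R123ABC"},
)
Buses = relation({"RegNo": "N456CDE"}, {"RegNo": "R123ABC"})

print(divide(Allocations, Buses, "RegNo"))  # {(('DriverNo', 101),)}
```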
Slide 26
The relational approach - a theoretical architecture
Relational Algebra (5)
UNION, INTERSECTION and DIFFERENCE require union-compatibility:
each relation involved is such (or can be made such) that the ith attribute of each is defined on the same domain and has the same name
UNION
"adds" relations together
YoungDrivers alias (select Driver where DateOfBirth > 19770425)
OldDrivers alias (select Driver where DateOfBirth < 19380426)
YoungDrivers union OldDrivers
Slide 27
The relational approach - a theoretical architecture
Relational Algebra (6)
INTERSECTION
picks tuples of two relations which occur in both
NotUnder21 alias (select Driver where DateOfBirth < 19770426)
NotOver60 alias (select Driver where DateOfBirth > 19380425)
NotUnder21 intersection NotOver60
NOTE that, in this case, the whole operation is logically equivalent to an "and"
OTHER OPERATORS: theta-join, Cartesian product, outer join
CONSTRAINTS USING R.A.
constraint (project Bus over DriverNo) difference (project Driver over DriverNo) is empty
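Union and intersection are plain set operations once union-compatibility is checked. In this sketch compatibility is simplified to "same attribute names", since domains are not modelled. The three drivers and their dates of birth are hypothetical; the date thresholds are those on the slides.

```python
def relation(*rows):
    """A relation as a set of tuples of sorted (attribute, value) pairs."""
    return {tuple(sorted(row.items())) for row in rows}


def select(r, predicate):
    return {t for t in r if predicate(dict(t))}


def heading(r):
    return {a for t in r for a, _ in t}


def check_compatible(r, s):
    """Union-compatibility, simplified to 'same attribute names' (domains are not modelled).
    An empty relation carries no heading in this representation, so it is accepted."""
    if r and s and heading(r) != heading(s):
        raise ValueError("relations are not union-compatible")


def union(r, s):
    check_compatible(r, s)
    return r | s


def intersection(r, s):
    check_compatible(r, s)
    return r & s


Driver = relation(
    {"DriverNo": 1, "Surname": "Brown", "DateOfBirth": 19790101},
    {"DriverNo": 2, "Surname": "Smith", "DateOfBirth": 19300610},
    {"DriverNo": 3, "Surname": "Bloggs", "DateOfBirth": 19600220},
)

YoungDrivers = select(Driver, lambda t: t["DateOfBirth"] > 19770425)
OldDrivers = select(Driver, lambda t: t["DateOfBirth"] < 19380426)
NotUnder21 = select(Driver, lambda t: t["DateOfBirth"] < 19770426)
NotOver60 = select(Driver, lambda t: t["DateOfBirth"] > 19380425)

young_or_old = union(YoungDrivers, OldDrivers)
middle = intersection(NotUnder21, NotOver60)
# The intersection is equivalent to a single select using "and".
same = select(Driver, lambda t: 19380425 < t["DateOfBirth"] < 19770426)

print(sorted(dict(t)["DriverNo"] for t in young_or_old))  # [1, 2]
print(middle == same)                                     # True
```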
Slide 28
Relational Algebra (7)
[Figure: Bus 1:n Driver, participation mandatory at the Bus end]
Using relational algebra to represent mandatory participation at the referenced end of a relationship
constraint ((project Bus over RegNo) difference (project Driver over BusNo)) is empty
Bus               Driver
RegNo   Make      Id  Name    BusNo
ABC123  Scania    1   Brown   ABC123
DEF456  Volvo     2   Smith   ABC123
GHJ789  DAF       3   Bloggs  DEF456
(foreign key: Driver.BusNo; bus GHJ789 has no driver, so the constraint is violated)
Bus               Driver
RegNo   Make      Id  Name    BusNo
ABC123  Scania    1   Brown   ABC123
DEF456  Volvo     2   Smith   ABC123
                  3   Bloggs  DEF456
(every bus now has a driver, so the constraint holds)
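The mandatory-participation constraint above can be checked by computing the difference and testing it for emptiness. The slide's expression glosses over one detail: `project Driver over BusNo` and `project Bus over RegNo` have different attribute names. This sketch therefore renames BusNo to RegNo first, to make the two relations union-compatible. The rows are those in the tables above.

```python
def relation(*rows):
    """A relation as a set of tuples of sorted (attribute, value) pairs."""
    return {tuple(sorted(row.items())) for row in rows}


def project(r, *attrs):
    return {tuple((a, v) for a, v in t if a in attrs) for t in r}


def rename(r, old, new):
    """Rename one attribute so that two relations become union-compatible."""
    return {tuple(sorted((new if a == old else a, v) for a, v in t)) for t in r}


def buses_without_driver(bus, driver):
    """(project Bus over RegNo) difference (project Driver over BusNo, renamed RegNo)."""
    return project(bus, "RegNo") - rename(project(driver, "BusNo"), "BusNo", "RegNo")


Driver = relation(
    {"Id": 1, "Name": "Brown", "BusNo": "ABC123"},
    {"Id": 2, "Name": "Smith", "BusNo": "ABC123"},
    {"Id": 3, "Name": "Bloggs", "BusNo": "DEF456"},
)
Bus = relation(
    {"RegNo": "ABC123", "Make": "Scania"},
    {"RegNo": "DEF456", "Make": "Volvo"},
    {"RegNo": "GHJ789", "Make": "DAF"},
)
BusWithoutGHJ789 = {t for t in Bus if dict(t)["RegNo"] != "GHJ789"}

print(buses_without_driver(Bus, Driver))               # {(('RegNo', 'GHJ789'),)}
print(buses_without_driver(BusWithoutGHJ789, Driver))  # set()
```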
Slide 29
The relational approach - a theoretical architecture
Relational Algebra (7)
Updating
Insertion: Driver := Driver union <023, Smith, 19460319>
Deletion: Driver := Driver difference <023, Smith, 19460319>
Amendment: Driver := Driver difference <023, Smith, 19660319>
           Driver := Driver union <023, Smith, 19660419>
Slide 30
The relational approach - a theoretical architecture
Normalisation (1)
Normalisation aims to remove redundancy:
avoids possible inconsistency
avoids deletion/insertion anomalies
reduces storage
In a normalised relation (i.e., one which is in BCNF) every non-p.k. attribute is a fact about the p.k., the whole p.k. and nothing but the p.k.
Single-valued facts: for every Girl there is (exactly) one Boy
Functional dependencies: Girl -> Boy ("Girl determines Boy"); note the E-R equivalent:
[Figure: E-R diagram relating Girl and Boy]
note that Girl -> Boy does not mean Boy -> Girl (we say an FD is "not reversible")
but what would the E-R diagram look like if we did additionally know that Boy -> Girl?
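A sample of data can show that a proposed FD is false, but never that it is true, since an FD is a statement about the meaning of the data. With that caveat, the check below tests whether an FD holds in a given set of rows. The names are hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """True if, in these particular rows, any two rows that agree on every lhs
    attribute also agree on every rhs attribute. Sample data can refute an FD
    but never prove one: an FD states a rule about what the data means."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


couples = [
    {"Girl": "Ann", "Boy": "Bob"},
    {"Girl": "Cath", "Boy": "Bob"},
    {"Girl": "Dee", "Boy": "Ed"},
]
print(fd_holds(couples, ["Girl"], ["Boy"]))  # True: each Girl has one Boy
print(fd_holds(couples, ["Boy"], ["Girl"]))  # False: Bob goes with Ann and Cath
```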
Slide 31
The relational approach - a theoretical architecture
Normalisation (2)
Derived FDs
by transitivity:
Girl -> Boy
Boy -> Boy's_Mother
hence, Girl -> Boy's_Mother
quick check: are there any FDs given whose "left hand" is the same as the "right hand" of any other?
by augmentation and transitivity:
Programme, StartTimeDate -> Announcer
TVChannel, StartTimeDate -> Programme
we can augment the second FD to:
TVChannel, StartTimeDate -> Programme, StartTimeDate
therefore TVChannel, StartTimeDate -> Announcer
quick check: are there two or more FDs with combined attributes on the LH side?
if not, no augmentation can be done; but if so, do any of those have a RH side which is part of the LH side of another?
if not, no augmentation can be done; but if so, can the RH side of that first FD be augmented to make it the same as the LH side of the other?
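The derivations above can be automated with the standard attribute-closure algorithm. Start from the left-hand side, then keep adding the right-hand side of any FD whose left-hand side is already covered. An FD X -> Y follows from the given set exactly when Y is inside the closure of X. This Python sketch applies the algorithm to the two examples on the slide.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`, a list of
    (lhs, rhs) pairs of sets. Any FD whose left-hand side is already covered is applied
    until nothing changes, which combines augmentation and transitivity."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def derives(fds, lhs, rhs):
    """Does lhs -> rhs follow from fds?"""
    return set(rhs) <= closure(lhs, fds)


family = [({"Girl"}, {"Boy"}), ({"Boy"}, {"Boy's_Mother"})]
tv = [
    ({"Programme", "StartTimeDate"}, {"Announcer"}),
    ({"TVChannel", "StartTimeDate"}, {"Programme"}),
]
print(derives(family, {"Girl"}, {"Boy's_Mother"}))                 # True (transitivity)
print(derives(tv, {"TVChannel", "StartTimeDate"}, {"Announcer"}))  # True
print(sorted(closure({"TVChannel", "StartTimeDate"}, tv)))
# ['Announcer', 'Programme', 'StartTimeDate', 'TVChannel']
print(derives(tv, {"Programme"}, {"Announcer"}))                   # False
```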
Slide 32
The relational approach - a theoretical architecture
Normalisation (3)
First Normal Form (1NF)
A relation is in 1NF iff every non-p.k. attribute is functionally dependent on (i.e., is a fact about) the p.k.
Note that anything which is a relation must at least be in 1NF
Second Normal Form (2NF)
A relation is in 2NF iff it is in 1NF and every non-p.k. attribute is fully functionally dependent on the p.k. (i.e., not on any proper subset of the p.k.)
Note that we are only interested in the dependency (or lack of it) between a non-p.k. attribute and the p.k., not in any other dependencies among the non-p.k. attributes.
Moving from 1NF to 2NF
"project out" any "offending" FDs into new relation(s)
the determinant (l.h. side) of these FDs becomes the p.k. of the new relation
the "r.h. side(s)" of these FDs become non-p.k. attribute(s) of the new relation(s) and are removed from the "old" one
but the determinant remains in the "old" relation so that we have non-loss decomposition
"projected-out" FDs which share the same determinant go into a shared new relation
the process must be "non-loss", i.e. the original relation could be recreated by "joining" the new ones.
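The 1NF to 2NF step can be sketched with the same set-based operators. The relation and its FDs are hypothetical. The p.k. is assumed to be (RegNo, DriverNo), with the partial dependencies RegNo -> Make and DriverNo -> Surname. Each offending FD is projected out and the determinant is left behind. A natural join of the pieces is then checked against the original to confirm that the decomposition is non-loss.

```python
def relation(*rows):
    """A relation as a set of tuples of sorted (attribute, value) pairs."""
    return {tuple(sorted(row.items())) for row in rows}


def heading(r):
    return {a for t in r for a, _ in t}


def project(r, *attrs):
    return {tuple((a, v) for a, v in t if a in attrs) for t in r}


def natural_join(r, s):
    result = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                result.add(tuple(sorted({**dt, **du}.items())))
    return result


def project_out(r, determinant, dependents):
    """Move the FD determinant -> dependents into a new relation. The dependents
    leave the old relation, but the determinant stays so the split is non-loss."""
    new = project(r, *determinant, *dependents)
    old = project(r, *(heading(r) - set(dependents)))
    return old, new


# Hypothetical relation, p.k. (RegNo, DriverNo), with partial dependencies
# RegNo -> Make and DriverNo -> Surname.
original = relation(
    {"RegNo": "ABC123", "DriverNo": 1, "Make": "Scania", "Surname": "Brown"},
    {"RegNo": "ABC123", "DriverNo": 2, "Make": "Scania", "Surname": "Smith"},
    {"RegNo": "DEF456", "DriverNo": 2, "Make": "Volvo", "Surname": "Smith"},
)
step1, bus = project_out(original, ["RegNo"], ["Make"])
remaining, driver = project_out(step1, ["DriverNo"], ["Surname"])

print(sorted(heading(remaining)))                                      # ['DriverNo', 'RegNo']
print(natural_join(natural_join(remaining, bus), driver) == original)  # True: non-loss
```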
Slide 33
The relational approach - a theoretical architecture
Normalisation (4)
Third Normal Form (3NF)
A relation is in 3NF iff it is in 2NF and every non-p.k. attribute is non-transitively dependent on the p.k. (take great care with definitions in the course material)
Note that a transitively derived FD does not necessarily make the attribute transitively dependent; i.e., where A -> B and B -> C, then A -> C is a transitively derived dependency, but C is not transitively dependent on A if either B -> A or C -> B
Moving from 2NF to 3NF
the process is just the same as 1NF to 2NF, except that we "project out" the FD which is the "right hand" part of the complete transitive FD, i.e. the "B -> C" part
Boyce-Codd Normal Form (BCNF)
A relation is in BCNF iff it is in 3NF and every determinant is a candidate key
Note that, unlike 2NF and 3NF, we are interested in all FDs in the relation, not just those involving the p.k.
Slide 34
The relational approach - a theoretical architecture
Normalisation (5)
Moving from 3NF to BCNF
the process is just the same as in the previous stages
if the determinant of an FD in the relation is the p.k., then that's fine
if it's not the p.k., then unless it's an alternate key, the FD is an "offending" one
it can only be an alternate key if it has a 1:1 relationship with the p.k.; we will only know this if it has a "reversible" FD with the p.k., i.e. A -> B and B -> A
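The BCNF test can be automated with attribute closure. An FD violates BCNF when its determinant does not determine every attribute of the relation, i.e. when the determinant is not a candidate key (or a superset of one). The Tutorial relation below is a hypothetical example. It is in 3NF under the definition above but not in BCNF.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """The non-trivial FDs whose determinant does not determine every attribute of the
    relation, i.e. whose determinant is not a candidate key (or a superset of one)."""
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not rhs <= lhs and not set(attributes) <= closure(lhs, fds)
    ]


# Hypothetical Tutorial(Student, Subject, Tutor), p.k. (Student, Subject):
# each student has one tutor per subject, and each tutor teaches one subject.
tutorial_fds = [({"Student", "Subject"}, {"Tutor"}), ({"Tutor"}, {"Subject"})]
print(bcnf_violations({"Student", "Subject", "Tutor"}, tutorial_fds))
# [({'Tutor'}, {'Subject'})]

# Projecting out the offending FD gives two relations that are both in BCNF.
print(bcnf_violations({"Tutor", "Subject"}, [({"Tutor"}, {"Subject"})]))  # []
print(bcnf_violations({"Student", "Tutor"}, []))                          # []
```

Note that this decomposition loses the FD Student, Subject -> Tutor. Neither new relation contains all three attributes, so that rule can no longer be enforced by a key alone.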
Slide 35
Relational model
theoretical specification of what is to be done
architecture: 3 level architecture
manipulation languages: the relational algebra and relational calculus
SQL
specifies how the relational model is to be implemented - there could be many such specifications, but in practice there is only SQL
architecture: SQL schema
manipulation language: SQL (does not include a storage DDL)
Relational implementation
many implementations of SQL exist; often they cover only a subset of the standard, and may cover a superset
architecture: D/b schema
manipulation language: an implementation of some version of SQL plus a further command set
Slide 36
Architectures - Theoretical and Actual
[Figure: the theoretical architecture (User Processes, External Schemas, Logical Schema, Storage Schema, Stored d/b) beside the actual SQL architecture (User Processes, Views, Database Schema of base tables, Stored d/b)]
Slide 37
SQL - a practical implementation of relational theory
1970s IBM developments - Ted Codd's relational model; SEQUEL (Structured English Query Language), developed at IBM by Donald Chamberlin and Raymond Boyce
many variants developed
1986-87 first standard - ANSI (1986), adopted by ISO in 1987 as SQL:1987 ("SQL1" - also known as ISO9075, BS6964:1988)
includes a DDL and DML
lacked many features of the model
1989 SQL:1989 - added p.k. and f.k. constraints
1992 SQL:1992 ("SQL2")
defines many features beyond SQL:1989
most implementations support it
many also offer "superset" functions which may add features but reduce portability
1999 SQL:1999 ("SQL3")
includes aspects of OO technology not covered in this course
NOTES:
SQL does not define a storage DDL, nor some other management functions; an implementation may do these any way it chooses
SQL databases consist of tables and columns
Slide 38
SQL - a practical implementation of relational theory (2)
SELECT
the query statement
operates on tables
all query statements produce a table
the logical processing model helps to explain how it works
SELECT * FROM country
the FROM clause produces an intermediate table, which is a full copy of the country table
SELECT * produces a final table giving the result - which in this case is also the full table
SELECT births, population FROM country
the FROM clause produces the intermediate table
SELECT 'slices' this vertically into just the 2 columns required, which form the final table
SELECT DISTINCT gdp FROM EUROBOND
as above, but 'slices' vertically to produce only one occurrence of each value - compare with PROJECT in the Relational Algebra
Slide 39
SQL - a practical implementation of relational theory (3)
VALUE EXPRESSIONS (manipulating stored values)
COLUMN:
SELECT name, (births/population)*1000 AS birth_rate_per_thousand FROM country
(NB can use + - * / for numbers and || for strings; columns must be of a suitable type)
SET:
SELECT AVG(cars) FROM country
(NB can use AVG, DISTINCT, COUNT(*), SUM, MAX, MIN; columns must be of a suitable type)
STRING FUNCTIONS:
SELECT name, SUBSTR(name,1,3) FROM country
(NB also LENGTH, CAST, SUBSTRING, etc.)
Slide 40
SQL - a practical implementation of relational theory (4)
The WHERE clause (or, specifying a row search condition)
SELECT staff_no
FROM staff
WHERE name = 'Jennings'
Logical processing model for this query:
the FROM clause copies the whole of STAFF into an intermediate table
the WHERE clause slices just the rows which meet it into a 2nd intermediate table
SELECT takes STAFF_NO into the final table
SELECT name, ((births-deaths)/population)*100 AS growth_rate
FROM country
WHERE ((births-deaths)/population)*100 > 0.5
(NB can use =, <, >, <=, >=, <>)
can use AND, OR, NOT (care needed with brackets)
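The SELECT forms on the last few slides can be tried with Python's `sqlite3` module. The country rows below are invented, and SQLite is only a stand-in for the course DBMS. One SQLite-specific trap is worth knowing: dividing one INTEGER column by another truncates to an integer. The sketch therefore declares the columns REAL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical rows. REAL columns matter: in SQLite, INTEGER / INTEGER truncates,
    -- so births / population would come out as 0.
    CREATE TABLE country (name TEXT PRIMARY KEY, births REAL, deaths REAL,
                          population REAL, cars REAL, gdp REAL);
    INSERT INTO country VALUES
        ('Atlantis', 120, 60, 10000, 3000, 12.5),
        ('Borduria',  90, 85,  9000, 2000, 12.5),
        ('Lilliput',  30, 10,  1000,  500,  8.0);
""")

# DISTINCT: one row per gdp value (compare PROJECT).
distinct_gdp = conn.execute("SELECT DISTINCT gdp FROM country ORDER BY gdp").fetchall()

# Column value expression, named with AS.
birth_rates = dict(conn.execute(
    "SELECT name, (births / population) * 1000 AS birth_rate_per_thousand FROM country"))

# A set function over the whole table, and a string function.
avg_cars = conn.execute("SELECT AVG(cars) FROM country").fetchone()[0]
prefixes = conn.execute("SELECT name, SUBSTR(name, 1, 3) FROM country ORDER BY name").fetchall()

# WHERE repeats the expression rather than using the growth_rate alias,
# because standard SQL evaluates WHERE before the SELECT list.
growing = conn.execute(
    "SELECT name, ((births - deaths) / population) * 100 AS growth_rate FROM country "
    "WHERE ((births - deaths) / population) * 100 > 0.5 ORDER BY name").fetchall()

print(distinct_gdp)                   # [(8.0,), (12.5,)]
print([name for name, _ in growing])  # ['Atlantis', 'Lilliput']
```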
Slide 41
SQL - a practical implementation of relational theory (5)
Other Operators
SELECT . . .
WHERE quantity BETWEEN 5000 AND 6000 (inclusive)
WHERE name IN ('Berlin', 'Bonn', ...)
WHERE classification LIKE '_h%s'
WHERE cars IS NULL
Joins using FROM
SELECT s_country.name, capital, population
FROM s_country, s_city
WHERE capital = s_city.name
(cf. the relational algebra JOIN)
(NB what kind of table does this FROM produce, in the logical processing model?)
Aliases
SELECT p.staff_no, p.name, q.staff_no
FROM staff p, staff q
WHERE p.name = q.name
AND p.staff_no < q.staff_no
(NB why < and not <> ?)
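The self-join with aliases can be run in SQLite to explore the question on the slide empirically. The staff rows are hypothetical; two members of staff share the name Jennings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (staff_no TEXT PRIMARY KEY, name TEXT);
    INSERT INTO staff VALUES ('1001', 'Jennings'), ('1002', 'Patel'), ('1003', 'Jennings');
""")

QUERY = """
    SELECT p.staff_no, p.name, q.staff_no
    FROM staff p, staff q          -- two aliases for the same table
    WHERE p.name = q.name
    AND p.staff_no {op} q.staff_no
"""
with_lt = conn.execute(QUERY.format(op="<")).fetchall()
with_ne = conn.execute(QUERY.format(op="<>")).fetchall()

print(with_lt)          # [('1001', 'Jennings', '1003')]
print(sorted(with_ne))  # [('1001', 'Jennings', '1003'), ('1003', 'Jennings', '1001')]
```

With `<>`, each matching pair is reported twice, once in each order. `<` keeps exactly one of the two and, like `<>`, excludes a row being paired with itself.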
Slide 42
SQL - a practical implementation of relational theory (6)
Outer joins
SELECT student.student_id, name, phone_no
FROM student LEFT OUTER JOIN telephone
ON student.student_id = telephone.student_id
(NB can use RIGHT OUTER, FULL OUTER)
Natural joins
SELECT student_id, name, phone_no
FROM student NATURAL JOIN telephone
(the common column student_id appears once and, in standard SQL, is not qualified with a table name)
GROUP BY
SELECT product, COUNT(country), SUM(quantity)
FROM production
GROUP BY product
HAVING (is to groups what WHERE is to rows)
SELECT product, COUNT(country), SUM(quantity)
FROM production
GROUP BY product
HAVING SUM(quantity) > 15000
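Outer joins and grouping can be compared side by side in SQLite. The rows are hypothetical. Only LEFT OUTER JOIN is shown, because older SQLite versions do not support RIGHT or FULL OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE telephone (student_id INTEGER, phone_no TEXT);
    INSERT INTO student VALUES (1, 'Ada'), (2, 'Ben');
    INSERT INTO telephone VALUES (1, '01234 567890');

    CREATE TABLE production (product TEXT, country TEXT, quantity INTEGER);
    INSERT INTO production VALUES
        ('Oats', 'Atlantis', 9000), ('Oats', 'Borduria', 8000),
        ('Rye',  'Atlantis', 4000), ('Rye',  'Lilliput', 3000);
""")

JOIN_SQL = """
    SELECT student.student_id, name, phone_no
    FROM student {kind} telephone ON student.student_id = telephone.student_id
    ORDER BY student.student_id
"""
inner = conn.execute(JOIN_SQL.format(kind="JOIN")).fetchall()
outer = conn.execute(JOIN_SQL.format(kind="LEFT OUTER JOIN")).fetchall()

GROUP_SQL = "SELECT product, COUNT(country), SUM(quantity) FROM production GROUP BY product"
groups = conn.execute(GROUP_SQL + " ORDER BY product").fetchall()
big = conn.execute(GROUP_SQL + " HAVING SUM(quantity) > 15000").fetchall()

print(inner)   # [(1, 'Ada', '01234 567890')]
print(outer)   # [(1, 'Ada', '01234 567890'), (2, 'Ben', None)]
print(groups)  # [('Oats', 2, 17000), ('Rye', 2, 7000)]
print(big)     # [('Oats', 2, 17000)]
```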
Slide 43
SQL - a practical implementation of relational theory (7)
ORDER BY
SELECT PRICEDATE, PRICE
FROM LUXPRICE
WHERE LUXCODE = 123456
AND CURRENCY = 'US'
ORDER BY PRICEDATE DESC
(NB ASC is the default)
QUERY SEQUENCE
Statement order       Logical processing model
SELECT . . .          FROM . . .
FROM . . .            (WHERE . . .)
(WHERE . . .)         (GROUP BY . . .)
(GROUP BY . . .)      (HAVING . . .)
(HAVING . . .)        SELECT . . .
(ORDER BY . . .)      (ORDER BY . . .)
Slide 44
SQL - a practical implementation of relational theory (8)
COMPOSITE QUERIES
Note that SQL has no equivalent of combining queries using 'alias' as in the Relational Algebra. Most complex queries can be handled by the following methods.
UNION
SELECT country, yr, population
FROM population
WHERE country IN ('Spain', 'Ireland')
UNION
SELECT name, 1990, population
FROM country
WHERE name IN ('Spain', 'Ireland')
SUBQUERIES ('nested' queries)
Generally, a subquery is a query whose result will be a single column. It is enclosed in brackets so that it can become part of the predicate of another query.
Slide 45
SQL - a practical implementation of relational theory (9)
Subqueries (cont'd)
Where its output may be more than one row, it must be used with the quantifiers ALL or ANY (or the comparison operator IN):
SELECT name
FROM student
WHERE registered <= ALL (SELECT DISTINCT registered
                         FROM student)
Where its output will be exactly one row, it can be used thus:
SELECT country
FROM production
WHERE product = 'Oats'
AND quantity < (SELECT AVG(quantity)
                FROM production
                WHERE product = 'Oats')
Joins v. subqueries
JOIN if the output needs data from both tables
SUBQUERY if the comparison is with an aggregate function on the 2nd table
else can use either
Slide 46
SQL - a practical implementation of relational theory (10)
SUBQUERIES (cont'd)
In the logical processing model, a normal subquery is processed first.
Correlated subqueries
a subquery that refers to the value of a column in the "current row" of the outer query (an "outer reference").
SELECT country, yr
FROM population p
WHERE population >
(SELECT 0.2*SUM(q.population)
FROM population q
WHERE q.yr = p.yr)
a Correlated Subquery is processed once completely for every row of the “outer” query
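Both kinds of subquery can be run in SQLite. The rows are hypothetical. The loop at the end re-creates the correlated query by hand, re-running the inner calculation for each outer row, as the logical processing model describes; a real optimiser may evaluate it differently. The `<= ALL` example from the previous slide is not reproduced, since SQLite does not implement the ALL and ANY quantifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE production (product TEXT, country TEXT, quantity INTEGER);
    INSERT INTO production VALUES
        ('Oats', 'Atlantis', 9000), ('Oats', 'Borduria', 3000), ('Oats', 'Lilliput', 6000);

    CREATE TABLE population (country TEXT, yr INTEGER, population REAL);
    INSERT INTO population VALUES
        ('Atlantis', 1990, 51), ('Borduria', 1990, 30), ('Lilliput', 1990, 19),
        ('Atlantis', 1991, 15), ('Borduria', 1991, 15), ('Lilliput', 1991, 15),
        ('Syldavia', 1991, 55);
""")

# Ordinary subquery: evaluated once, giving a single value (the average, 6000).
below_avg = conn.execute("""
    SELECT country FROM production
    WHERE product = 'Oats'
    AND quantity < (SELECT AVG(quantity) FROM production WHERE product = 'Oats')
""").fetchall()

# Correlated subquery: q.yr = p.yr refers to the current row of the outer query.
big_share = set(conn.execute("""
    SELECT country, yr FROM population p
    WHERE population > (SELECT 0.2 * SUM(q.population)
                        FROM population q WHERE q.yr = p.yr)
""").fetchall())

# The same result by hand: the inner calculation is redone for every outer row.
rows = conn.execute("SELECT country, yr, population FROM population").fetchall()
by_loop = set()
for country, yr, pop in rows:
    total = sum(q_pop for _, q_yr, q_pop in rows if q_yr == yr)
    if pop > 0.2 * total:
        by_loop.add((country, yr))

print(below_avg)             # [('Borduria',)]
print(big_share == by_loop)  # True
```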
Slide 47
SQL - a practical implementation of relational theory (11)
DATA DEFINITION
CREATE TABLE small_country
(name CHAR(16),
gdp DECIMAL(4,1),
cars INTEGER,
population DECIMAL(6,1),
PRIMARY KEY (name))
alternative column definition: cars INTEGER NOT NULL DEFAULT 0
ALTER TABLE small_country ADD area INTEGER
ALTER TABLE small_country DELETE population
ALTER TABLE small_country MODIFY cars DEFAULT 0
ALTER TABLE small_country ALTER cars DROP DEFAULT
DROP TABLE small_country
Slide 48
SQL - a practical implementation of relational theory (12)
Constraints
PRIMARY KEY
NOT NULL
UNIQUE:
population DECIMAL(6,1) UNIQUE, or
ALTER TABLE small_country ADD UNIQUE (population)
REFERENTIAL:
counsellor_no CHAR(4) NOT NULL REFERENCES staff {(staff_no)} {ON DELETE (RESTRICT or SET DEFAULT or CASCADE)}, or
FOREIGN KEY (counsellor_no) REFERENCES staff {(staff_no)} {ON DELETE . . }
CHECK:
registered SMALLINT CHECK (registered BETWEEN 1988 AND 2010)
CHECK (region = (SELECT region FROM staff WHERE counsellor_no = staff_no))
DOMAIN:
CREATE DOMAIN credit_points AS SMALLINT NOT NULL DEFAULT 60 CHECK (VALUE IN (30, 60))
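Most of these constraints can be demonstrated in SQLite, where each violation surfaces as a `sqlite3.IntegrityError`. Two forms are left out because SQLite does not support them: CREATE DOMAIN, and a CHECK containing a subquery. The student values are hypothetical; '8086' and 'Pratchett' are taken from the course examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE staff (staff_no CHAR(4) PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE student (
        student_id    CHAR(4) PRIMARY KEY,
        registered    SMALLINT CHECK (registered BETWEEN 1988 AND 2010),
        counsellor_no CHAR(4) NOT NULL REFERENCES staff (staff_no) ON DELETE RESTRICT
    );
    INSERT INTO staff VALUES ('8086', 'Pratchett');
""")


def attempt(sql):
    """Run one statement; report 'ok' or the constraint violation SQLite raised."""
    try:
        with conn:
            conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"


results = {
    "valid row": attempt("INSERT INTO student VALUES ('S001', 1995, '8086')"),
    "CHECK": attempt("INSERT INTO student VALUES ('S002', 1970, '8086')"),
    "NOT NULL": attempt("INSERT INTO student VALUES ('S003', 1995, NULL)"),
    "REFERENTIAL": attempt("INSERT INTO student VALUES ('S004', 1995, '9999')"),
    "PRIMARY KEY": attempt("INSERT INTO student VALUES ('S001', 1999, '8086')"),
    "ON DELETE RESTRICT": attempt("DELETE FROM staff WHERE staff_no = '8086'"),
}
for name, outcome in results.items():
    print(f"{name}: {outcome}")
```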
Slide 49
SQL - a practical implementation of relational theory (13)
VIEWS
CREATE VIEW counselling2 (s_name, s_no, region, c_name, c_no) AS
SELECT s.name, student_id, s.region, c.name, counsellor_no
FROM student s, staff c
WHERE counsellor_no=staff_no
DROP VIEW counselling2
UPDATING
DELETE
DELETE FROM small_country WHERE name = 'Yugoslavia'
INSERT
INSERT INTO small_country {column_names}
VALUES ('Slovenia', NULL, 157, 4325.1)
(can specify columns to be filled as an alternative to putting NULLs in the VALUES clause)
Slide 50
SQL - a practical implementation of relational theory (14)
UPDATING (cont'd)
UPDATE
UPDATE small_country
SET gdp = 17.3
WHERE name = 'Slovenia'
INSERT INTO dba.staff
VALUES ('8086', 'Pratchett', 1)
UPDATING VIEWS
A view can be updated if, in its definition:
SELECT includes only column names (no value expressions) and no DISTINCT operator;
FROM only references one table;
WHERE does not include a subquery;
no GROUP BY and no HAVING;
Slide 51
SQL - a practical implementation of relational theory (15)
Access Control
GRANT SELECT {DELETE, INSERT, UPDATE, REFERENCES} ON staff TO admin, faculty
GRANT ALL PRIVILEGES ON mod_staff TO admin {WITH GRANT OPTION}
GRANT UPDATE (name) ON mod_staff TO faculty
Restructuring: planning
Main principle: ensure data is not lost
e.g., either:
CREATE a temporary table with the old structure
INSERT . . SELECT the old data into it
DROP the old table
CREATE a table with the new structure and the old name
INSERT . . SELECT the old data into it
DROP the temporary table
or:
ALTER the table to add a replacement column
UPDATE the table to copy data from the old column to the new
DROP the old column
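The first restructuring plan above can be sketched in SQLite. The change chosen here is hypothetical: `cars` becomes `NOT NULL DEFAULT 0`, which SQLite's ALTER TABLE cannot do directly. `CREATE TABLE ... AS SELECT` combines the "create temporary table" and "insert old data" steps. Wrapping everything in one transaction means that a failure part-way can be rolled back with the original table intact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE small_country (name CHAR(16) PRIMARY KEY, gdp DECIMAL(4,1), cars INTEGER);
    INSERT INTO small_country VALUES ('Slovenia', 17.3, 157), ('Malta', NULL, NULL);
""")
before = conn.execute("SELECT name, gdp FROM small_country ORDER BY name").fetchall()

conn.executescript("""
    BEGIN;
    -- 1. temporary table holding the old data (CREATE ... AS SELECT does both steps)
    CREATE TEMPORARY TABLE small_country_old AS SELECT * FROM small_country;
    -- 2. drop the old table
    DROP TABLE small_country;
    -- 3. create the new structure under the old name
    CREATE TABLE small_country (
        name CHAR(16) PRIMARY KEY,
        gdp  DECIMAL(4,1),
        cars INTEGER NOT NULL DEFAULT 0
    );
    -- 4. copy the old data back, supplying 0 where cars was missing
    INSERT INTO small_country (name, gdp, cars)
        SELECT name, gdp, COALESCE(cars, 0) FROM small_country_old;
    -- 5. drop the temporary table
    DROP TABLE small_country_old;
    COMMIT;
""")
after = conn.execute("SELECT name, gdp, cars FROM small_country ORDER BY name").fetchall()

print(before)  # [('Malta', None), ('Slovenia', 17.3)]
print(after)   # [('Malta', None, 0), ('Slovenia', 17.3, 157)]
```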
Slide 52
Bringing it together - developing a relational d/b with SQL
Steps:
establishing requirements
data analysis
database design
implementation
Desirable properties of a model: completeness, integrity, flexibility, efficiency, usability
Modelling constructs
entity types
relationships
attributes
identifiers
complex values (separate entities/multiple attributes)
entities or attributes?
derived data
entity subtypes
Slide 53
Bringing it together - developing a relational d/b with SQL (2)
Constraints: inclusive, exclusive
Developing a conceptual data model
establishing requirements
possible ambiguities
the model: "formal representation of what a d/b should contain, independent of how it should be realised"
it should: represent all users' requirements; have no duplication; include all constraints; be general; be understandable.
Data analysis to produce it
establish the scope of the model
text analysis:
list nouns as potential entity types
discard those which: are outside scope; occur only once; are synonyms; are attributes; relate to implementation details
list verbs as potential relationships
re-scan to find constraints
Slide 54
Bringing it together - developing a relational d/b with SQL (3)
Data analysis (cont'd)
Document analysis
assuming a document represents an entity:
what does each occurrence represent?
what are the properties / facts about the entity type?
for each property, is it single- or multi-valued? optional or mandatory? derivable? temporal?
Produce initial E-R model
add participation, constraints & assumptions
eliminate redundancy, resolve m:n, examine complex data, remove derived data, consider subtypes
check:
'read' the model to try to reconstitute the requirement
check the requirement to see if data & relationships are correctly represented
Slide 55
Bringing it together - developing a relational d/b with SQL (4)
Database design
The final system may not be a direct implementation of the model!
Choices in directly representing the model:
posted key or relation-for-relationship
alternative constraint methods
representing complex data
representing entity sub-types
In implementation:
defining columns
numeric data types - range/precision
character data types - length
operations required - restricted by the number and type of columns chosen
date/time data types
not null constraints
default values - essential with a "not null" constraint; should be meaningful and distinguishable
defining keys
surrogate primary keys
foreign keys - "on delete" action
omit or relax constraints?
de-normalise?
© Ron Rogerson 1998-2010 Slide 56
Distributed Data Client / multi-server
designed to store data where it is mostly used, but permit remote access
multiple independent databases user process must explicitly navigate
them (connection management) data may be divided by function, or users,
etc. transaction management done in “two-
phase” commit
Distributed databases
meant to store data where it is mostly used and permit remote access, or to provide resilience
appear to user processes as a single d/b (location independent)
distribution schema identifies the physical locations of data in the logical schema
data may be fragmented (horizontally or vertically) according to usage, or replicated for resilience
optimisation of queries requires knowledge of where the data items (or the nearest copies of them) are available
transaction support (and consistency of multiple copies) is an added problem, but is handled by the DBMS
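Horizontal and vertical fragmentation correspond to restriction and projection of a relation. The minimal plain-Python sketch below holds hypothetical student rows as tuples. It builds each kind of fragment and then rebuilds the original relation: by union for horizontal fragments and by a join on the key for vertical ones.

```python
# Hypothetical rows: (student_id, name, region, fees_owed)
students = [
    (1, "Ann", "north", 0),
    (2, "Bob", "south", 120),
    (3, "Cat", "north", 45),
]


def horizontal_fragment(rows, region):
    """Restriction: the subset of rows stored at one site."""
    return [r for r in rows if r[2] == region]


def vertical_fragment(rows, columns):
    """Projection: a subset of columns. The key (column 0) is always kept
    so that the fragments can be joined back together."""
    cols = [0] + [c for c in columns if c != 0]
    return [tuple(r[c] for c in cols) for r in rows]


north = horizontal_fragment(students, "north")
south = horizontal_fragment(students, "south")
# Horizontal fragments are recombined by union.
assert sorted(north + south) == sorted(students)

names = vertical_fragment(students, [1, 2])    # (student_id, name, region)
fees = dict(vertical_fragment(students, [3]))  # student_id -> fees_owed
# Vertical fragments are recombined by a join on the key.
rejoined = [n + (fees[n[0]],) for n in names]
assert rejoined == students
```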
© Ron Rogerson 1998-2010 Slide 57
Distributed Data (2)
Replication systems
designed to avoid remote accesses by storing multiple copies locally
improves response and availability, but produces consistency issues
may not aim for real-time consistency
consolidation approach - primary sources for different items are in different places; these fragments are collected to produce a global view
dissemination approach - start with a single primary copy and distribute copies, but may allow real-time update of central and local copies together
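The two replication approaches can be contrasted in a short Python sketch. The site names, item keys and the in-memory dictionaries that stand in for local databases are all hypothetical. Both functions perform batch refreshes, which reflects the point that replication need not aim for real-time consistency.

```python
def consolidate(sites):
    """Consolidation: each site is the primary source for different items;
    the fragments are collected into one global view."""
    global_view = {}
    for site, items in sites.items():
        for key, value in items.items():
            if key in global_view:
                raise ValueError(f"{key} has more than one primary source")
            global_view[key] = (value, site)
    return global_view


def disseminate(primary, replicas):
    """Dissemination: copies of a single primary are pushed to local sites."""
    for replica in replicas:
        replica.clear()
        replica.update(primary)


view = consolidate({
    "edinburgh": {"s01": "Ann", "s02": "Bob"},
    "cardiff": {"s03": "Cat"},
})
print(view["s03"])  # ('Cat', 'cardiff')

price_list = {"rioja": 9.99}
local_copies = [{}, {"rioja": 8.50}]  # the second copy is out of date
disseminate(price_list, local_copies)
print(local_copies)  # [{'rioja': 9.99}, {'rioja': 9.99}]
```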
© Ron Rogerson 1998-2010 Slide 58
Data Warehouses (NB: do data warehouses involve any new kind of technology, in the same way as distributed d/bs or row clustering?)
Decision support systems (cf. “management information” systems)
typically ask non-“right now” questions
may require data from diverse systems, or data discarded in normal operations (or not otherwise captured?)
Characteristics of a data warehouse
subject-oriented, non-volatile, integrated, time-variant
Dimensional analysis
aims to determine the subject area of interest and the important dimensions of analysis
“star schema”
element of guesswork in fixing dimensions (is it a data-centred or application-centred system?)
[Figure: star schema with the sales fact table at the centre and the dimension tables member, time, area and wine]
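The star schema in the figure can be sketched in sqlite3. The dimension tables follow the figure (member, time, area, wine), but their columns and the sample rows are hypothetical. The sales fact table holds only one measure, quantity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (columns are hypothetical).
CREATE TABLE member (member_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE time   (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE area   (area_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE wine   (wine_id INTEGER PRIMARY KEY, colour TEXT);
-- Fact table at the centre of the star: each sale has an n:1
-- relationship (a foreign key) with every dimension table.
CREATE TABLE sales (
    member_id INTEGER REFERENCES member (member_id),
    time_id   INTEGER REFERENCES time (time_id),
    area_id   INTEGER REFERENCES area (area_id),
    wine_id   INTEGER REFERENCES wine (wine_id),
    quantity  INTEGER
);
INSERT INTO member VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO time   VALUES (1, 2009, 12), (2, 2010, 1);
INSERT INTO area   VALUES (1, 'North'), (2, 'South');
INSERT INTO wine   VALUES (1, 'red'), (2, 'white');
INSERT INTO sales  VALUES (1, 1, 1, 1, 6), (1, 2, 1, 2, 12), (2, 2, 2, 1, 3);
""")

# A dimensional query: join the fact table to the dimensions of interest
# and summarise the measure.
totals = conn.execute("""
    SELECT a.region, w.colour, SUM(s.quantity)
    FROM sales s
    JOIN area a ON s.area_id = a.area_id
    JOIN wine w ON s.wine_id = w.wine_id
    GROUP BY a.region, w.colour
    ORDER BY a.region, w.colour
""").fetchall()
print(totals)  # [('North', 'red', 6), ('North', 'white', 12), ('South', 'red', 3)]
```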
© Ron Rogerson 1998-2010 Slide 59
Data warehouses (2)
Building a data warehouse
extraction component
produces warehouse data from existing systems
first define how to identify the “fact” in question
may convert from stored data, or extract data as it is added (e.g. by a trigger)
integration component
format integration, semantic integration
the database
fact table is at the centre of the star; it has n:1 relationships with the dimension tables
Aggregates
fact table may become enormous, so queries need huge processing power
in practice, queries tend to want summary or aggregate data
create aggregate tables at various levels
can query one level and drill down or drill up as necessary to follow up trends discovered
an aggregate navigator may help to take advantage of the various levels
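Extraction by trigger, aggregate tables and drilling between levels can all be shown in one sqlite3 sketch. The orders source table, the sales_fact table, the aggregate table names and the data are hypothetical. The aggregates are computed only once here; in practice they would need refreshing as new facts arrive. No aggregate navigator is modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Source table in an existing operational system.
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT,
                     year INTEGER, month INTEGER, quantity INTEGER);
-- Warehouse fact table.
CREATE TABLE sales_fact (region TEXT, year INTEGER, month INTEGER, quantity INTEGER);
-- Extraction component: copy each fact as it is added to the source.
CREATE TRIGGER extract_sale AFTER INSERT ON orders
BEGIN
    INSERT INTO sales_fact VALUES (NEW.region, NEW.year, NEW.month, NEW.quantity);
END;
INSERT INTO orders (region, year, month, quantity) VALUES
    ('North', 2009, 11, 5), ('North', 2009, 12, 7),
    ('South', 2009, 12, 4), ('North', 2010, 1, 2);
-- Aggregate tables at two levels of summary.
CREATE TABLE sales_by_region_year AS
    SELECT region, year, SUM(quantity) AS total
    FROM sales_fact GROUP BY region, year;
CREATE TABLE sales_by_year AS
    SELECT year, SUM(total) AS total
    FROM sales_by_region_year GROUP BY year;
""")

# Top level: totals per year.
by_year = conn.execute("SELECT year, total FROM sales_by_year ORDER BY year").fetchall()
# Drill down: 2009 by region.
by_region = conn.execute(
    "SELECT region, total FROM sales_by_region_year WHERE year = 2009 ORDER BY region"
).fetchall()
# Drill down again: the detailed facts for North in 2009.
detail = conn.execute(
    "SELECT month, quantity FROM sales_fact "
    "WHERE region = 'North' AND year = 2009 ORDER BY month"
).fetchall()
print(by_year, by_region, detail)
# [(2009, 16), (2010, 2)] [('North', 12), ('South', 4)] [(11, 5), (12, 7)]
```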
© Ron Rogerson 1998-2010 Slide 60
XML and databases
XML – a formal markup language for documents
principally for layout & presentation
specific XML definitions can be made for specific applications
XML and relational data
Relational: atomic values in a table structure, with unique names
XML: nested elements in a tree structure, with a named root element
Relational: columns have unique names, ordering not significant, values all the same type
XML: elements have unique names, can contain data or other elements, a schema can determine type
Relational: rows are distinct, ordering not significant
XML: elements distinguished by location, specified as a path
Relational: access to data by table operations, no concept of location
XML: access to data by location in the tree
Relational: relations are logical structures, with no direct storage implications
XML: XML is a logical structure with a specified storage representation
© Ron Rogerson 1998-2010 Slide 61
XML and databases (2)
Transforming relational data into XML
export data by representing the table/row structure by XML tags, or
use an SQL/XML query to create an XML document for a specific application
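The first export option, representing the table/row structure by XML tags, can be sketched with sqlite3 and Python's standard xml.etree.ElementTree module. The tag scheme is a hypothetical convention: one row element per row, one child element per column, and NULLs represented by omitting the element. The second option would use SQL/XML functions such as XMLELEMENT and XMLAGG, which SQLite does not provide.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table):
    """Generic export: table -> root element, row -> <row>, column -> child element.
    The table name is interpolated into the SQL, so it must come from a trusted source."""
    cur = conn.execute(f"SELECT * FROM {table} ORDER BY 1")
    columns = [d[0] for d in cur.description]
    root = ET.Element(table)
    for row in cur:
        row_el = ET.SubElement(root, "row")
        for col, value in zip(columns, row):
            if value is not None:  # a NULL is represented by omitting the element
                ET.SubElement(row_el, col).text = str(value)
    return root


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id TEXT PRIMARY KEY, name TEXT, region TEXT);
INSERT INTO student VALUES ('s01', 'Ann', 'North'), ('s02', 'Bob', NULL);
""")
doc = ET.tostring(table_to_xml(conn, "student"), encoding="unicode")
print(doc)
```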
Storing XML data in a relational d/b
“shred” the document by reducing elements to simple values for a table structure (XMLTABLE function), or
store the entire XML document as a CHAR data value
Querying XML values in a relational d/b
use the XMLTABLE function in an SQL query to return table-like values, or
use the XMLQUERY function to return XML values
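Shredding can be sketched by doing in application code what XMLTABLE would do inside the DBMS. Here Python's ElementTree reduces each path in a hypothetical order document to a simple value stored in a column. SQLite has no XMLTABLE or XMLQUERY, so this is an analogue of those functions rather than a use of them. The whole document is also kept as a single character value to illustrate the other storage option.

```python
import sqlite3
import xml.etree.ElementTree as ET

xml_doc = """
<order id="1001">
  <customer>Ann</customer>
  <line><wine>Rioja</wine><qty>6</qty></line>
  <line><wine>Chablis</wine><qty>12</qty></line>
</order>
""".strip()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT);
CREATE TABLE order_line (order_id INTEGER REFERENCES orders, wine TEXT, qty INTEGER);
CREATE TABLE order_doc (order_id INTEGER PRIMARY KEY, doc TEXT);  -- whole document
""")

# Shred: reduce elements and attributes to simple column values.
root = ET.fromstring(xml_doc)
order_id = int(root.get("id"))
conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, root.findtext("customer")))
conn.executemany(
    "INSERT INTO order_line VALUES (?, ?, ?)",
    [(order_id, line.findtext("wine"), int(line.findtext("qty")))
     for line in root.findall("line")],
)
# Alternative: store the entire document as one character value.
conn.execute("INSERT INTO order_doc VALUES (?, ?)", (order_id, xml_doc))

# Shredded data is queried with ordinary table operations ...
total = conn.execute(
    "SELECT SUM(qty) FROM order_line WHERE order_id = ?", (order_id,)
).fetchone()[0]
# ... whereas the stored document must be re-parsed and navigated by path.
stored = conn.execute("SELECT doc FROM order_doc").fetchone()[0]
first_wine = ET.fromstring(stored).findtext("line/wine")
print(total, first_wine)  # 18 Rioja
```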
© Ron Rogerson 1998-2010 Slide 62
Application Development
Embedded SQL
Direct (non-cursor) statements: only where at most one row will be transferred
EXEC SQL
    SELECT name, registered, region
    INTO :StudentName, :YearRegistered, :Region
    FROM student
    WHERE student_id = :SelectId;
also used with INSERT, DELETE, etc.
Cursor statements
EXEC SQL
    DECLARE regional_student CURSOR FOR
    {SQL query specification};
EXEC SQL
    OPEN regional_student;
EXEC SQL
    FETCH regional_student
    INTO :StudentId, :StudentName, :YearRegistered, :Region;
(FETCH is normally repeated in a loop until no more rows are found, SQLSTATE '02000')
EXEC SQL
    CLOSE regional_student;
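For comparison with the embedded SQL above, the same two patterns look like this through Python's sqlite3 module, which implements the DB-API. The first is a single-row query whose result is unpacked into variables; the second is a cursor fetched row by row until no rows remain. The student table, its data and the cursor's query are hypothetical and do not fill in the unspecified query specification above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id TEXT PRIMARY KEY, name TEXT,
                      registered INTEGER, region TEXT);
INSERT INTO student VALUES ('s01', 'Ann', 2008, 'North'),
                           ('s02', 'Bob', 2009, 'North'),
                           ('s03', 'Cat', 2009, 'South');
""")

# Singleton SELECT (cf. SELECT ... INTO :host variables): at most one row.
select_id = "s02"
row = conn.execute(
    "SELECT name, registered, region FROM student WHERE student_id = ?",
    (select_id,),
).fetchone()  # None if no student matches
student_name, year_registered, region = row

# Cursor (cf. DECLARE / OPEN / FETCH / CLOSE) over a hypothetical regional query.
cur = conn.cursor()
cur.execute(
    "SELECT student_id, name, registered, region FROM student "
    "WHERE region = ? ORDER BY student_id",
    ("North",),
)
regional_ids = []
while (fetched := cur.fetchone()) is not None:  # each call acts like one FETCH
    student_id, name, registered, student_region = fetched
    regional_ids.append(student_id)
cur.close()
print(student_name, regional_ids)  # Bob ['s01', 's02']
```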
© Ron Rogerson 1998-2010 Slide 63
Application Development (2)
ODBC
provides an interface between applications and an rDBMS
applications can devise their own methods of handling returned data
applications can be DBMS-independent
can handle dynamic SQL, enabling separate front-end tools to access other vendors' DBMSs
connection to the DBMS is provided by a DBMS-specific ODBC driver
JDBC
provides an ODBC-like interface between Java applications and an rDBMS
provides its own automatic cursor-like method of handling multiple rows
connection to the DBMS is provided by a DBMS-specific JDBC driver
SQLJ
embedded SQL for Java programs
an iterator provides cursor-like functionality
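ODBC's support for dynamic SQL, where the statement text is only known at run time, can be illustrated with Python's DB-API, which plays a similar DBMS-independent role. In this sketch the column whitelist and the student table are hypothetical. Values are passed as parameters rather than pasted into the SQL text. Only cursor methods defined by the DB-API are used, although the '?' placeholder style is the one sqlite3 uses; other drivers may differ, and each declares its style in its module-level paramstyle attribute.

```python
import sqlite3

ALLOWED_COLUMNS = {"student_id", "name", "registered", "region"}  # hypothetical


def query_students(conn, columns, region):
    """Build the SELECT list at run time from the caller's choice of columns."""
    unknown = set(columns) - ALLOWED_COLUMNS
    if unknown:  # never paste unchecked names into SQL
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    sql = f"SELECT {', '.join(columns)} FROM student WHERE region = ? ORDER BY name"
    cur = conn.cursor()
    cur.execute(sql, (region,))
    # Like a front-end tool, discover the result's column names at run time.
    header = [d[0] for d in cur.description]
    rows = cur.fetchall()
    cur.close()
    return header, rows


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id TEXT PRIMARY KEY, name TEXT,
                      registered INTEGER, region TEXT);
INSERT INTO student VALUES ('s01', 'Ann', 2008, 'North'),
                           ('s02', 'Bob', 2009, 'North'),
                           ('s03', 'Cat', 2009, 'South');
""")
print(query_students(conn, ["name", "registered"], "North"))
# (['name', 'registered'], [('Ann', 2008), ('Bob', 2009)])
```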
© Ron Rogerson 1998-2010 Slide 64
Application Development (3)
D/b routines using Java
the DBMS implementation includes a Java virtual machine
an internal 'SQL' routine can therefore be written in Java and run directly
Object-relational mapping
mapping tools require a definition that maps each d/b to an application
the d/b can then be accessed from a Java program without knowledge of SQL or the d/b structure
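Java tools such as Hibernate implement object-relational mapping at full scale. The idea can be shown with a deliberately tiny, hypothetical mapper in Python. The MAPPING dictionary plays the part of the mapping definition, linking a class's attributes to a table's columns, so that code using Session never writes SQL or needs to know the column names.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Student:
    student_id: str
    name: str
    region: str


# Mapping definition (hypothetical): class -> (table, attribute -> column).
# The first attribute declared in the class is treated as the primary key.
MAPPING = {
    Student: ("student", {"student_id": "id", "name": "full_name", "region": "region_code"}),
}


class Session:
    def __init__(self, conn):
        self.conn = conn

    def save(self, obj):
        table, cols = MAPPING[type(obj)]
        attrs = [f.name for f in fields(obj)]
        names = ", ".join(cols[a] for a in attrs)
        marks = ", ".join("?" for _ in attrs)
        self.conn.execute(f"INSERT INTO {table} ({names}) VALUES ({marks})",
                          [getattr(obj, a) for a in attrs])

    def get(self, cls, key):
        table, cols = MAPPING[cls]
        attrs = [f.name for f in fields(cls)]
        select_list = ", ".join(cols[a] for a in attrs)
        row = self.conn.execute(
            f"SELECT {select_list} FROM {table} WHERE {cols[attrs[0]]} = ?", (key,)
        ).fetchone()
        return None if row is None else cls(*row)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id TEXT PRIMARY KEY, full_name TEXT, region_code TEXT)")
session = Session(conn)
session.save(Student("s01", "Ann", "North"))
print(session.get(Student, "s01"))  # Student(student_id='s01', name='Ann', region='North')
```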
Scripting languages, e.g. Python, Perl
interpretive, easily changed
often used to facilitate browser access to a d/b
requires a DBMS-specific DB-API
the language has to provide its own cursor-like functionality
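In Python, the DB-API specification (PEP 249) defines the cursor object that each DBMS-specific module must provide, with fetchone, fetchmany and fetchall methods; sqlite3 is the module for SQLite. The short sketch below uses a hypothetical wine table. It wraps fetchmany in a generator, so the program handles one row at a time while rows are actually fetched in batches.

```python
import sqlite3


def iter_rows(cursor, batch_size=100):
    """Yield rows one at a time while fetching them from the DBMS in batches."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield from batch


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wine (wine_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO wine (name) VALUES (?)", [(f"wine {i}",) for i in range(250)])

cur = conn.cursor()
cur.execute("SELECT wine_id, name FROM wine ORDER BY wine_id")
count = sum(1 for _ in iter_rows(cur, batch_size=100))
print(count)  # 250
```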