CO3041 systems analysis, Eaglestone & Ridley style

18-Oct-12

Upload: james-denholm-price

Post on 14-Jul-2015


CO3041

Databases and the Web lecture 4 — © Kingston University, UK 2


Remember the database modelling process?

› Requirements analysis

› Conceptual (E-R) model

Top-down/bottom-up design

‘Look for the nouns’

› Logical data model

Tables & Normalisation

› Physical data model

Column types; integrity constraints.


Faculty seminars

› Fortnightly research seminars.

› Rooms are usually booked in advance but the schedule is flexible.

› Potential speakers are contacted, talks requested and (hopefully!) scheduled. There is usually a significant delay between initial contact and scheduling a seminar.

› Speakers have various details, some public, some private.

› Agreed talks have titles, may eventually have abstracts.

› Seminars database is to be administered from the web.

› Seminars are to be advertised primarily on a web page but may be in other forms.

› Publicity may be gained by making past seminar details available but these must obviously be separate from forthcoming talks!


Organisation and system requirements

Flesh out the details:

› A database containing seminar details is required.

Speakers give seminars in particular rooms at particular times that must all be advertised. Seminars are usually of fixed duration.

Speakers may give more than one seminar and it may be possible for seminars to be given by more than one speaker.

Speakers may give a seminar abstract which may be long.


Organisation and system requirements …

› A public web page for advertising seminars is required.

This must also include a way of advertising past seminars in an obvious fashion.

Since the abstract may or may not exist and may be quite long, it should be on a separate page.

› A method for entering speakers and seminars into the database is required.

Administration is not public.

Not all speakers contacted give seminars.

Not all seminars have abstracts but all have titles, speakers, date/time, location and duration.


Organisation and system requirements …

› At this stage it’s not unreasonable to examine the suitability of a Web-Database solution to the problem!

› Is it a suitable application?

Any obvious drawbacks?

Obvious advantages?

Does everything require a web interface or is another ‘thin client’ solution suitable?


Conceptual data model: Top-down...

› From the description the nouns suggest the following candidate entities:

seminar, room, speaker, schedule, title, abstract.

Some are weak entities that depend for their existence on other ‘stronger’ entities.

E.g. room: a seminar cannot occur without a room and attributes of room are irrelevant to this simple DB.

Some are obvious attributes of entities:

E.g. title & abstract are attributes of a seminar


UML notation RDBMS E-R


Conceptual data model:

[Diagram: E-R model. Speaker (name, address) ‘gives’ Seminar (start, room, duration, title, abstract), with cardinalities [1,n] and [0,n].]

[Diagram: UML notation.
Seminar: PK room, PK start, duration, title, abstract.
Speaker: PK firstName, PK lastName, PK title, address, email, phone, fax.]

› Speakers ‘give’ 0…n seminars

0: until booked!

n: they may come back!

› Seminars must have at least one speaker; may have more than one!

1…n

› Many candidate keys. Identify:

speaker by name? Name+title? Name+title+institution? Email?

seminar by title? Title+speaker? Start time+room?


Conceptual web data model:

› This is a similar analysis to the DB conceptual analysis (e.g. ‘looking for nouns’) but this time in the web-page description.

› There are 3 completeness rules that must be obeyed:

1. Entity attribute completeness:

» Add entities and attributes whenever an attribute from the DB conceptual model is added to the WebDB conceptual model.

2. Entity identity completeness:

» Each entity must be accompanied by its primary key.

3. Referential completeness:

» If a relationship (entity) is used then all entities that participate in the relationship must be added.

› These ensure that each page can extract the data it needs from the database and that no necessary keys and/or entities are missing.


Conceptual web data model:

‘Stage 1’ lists 4 web pages:

1. Seminars advert page.

Lists all forthcoming and a selection of past speakers & seminars.

References to the DB conceptual model are:

(speakers) title, firstName, lastName

(seminars) start, duration, room, title

› At this stage an idea for an additional feature crops up: it would also be nice to allow browsers to visit the speakers’ home and/or departmental web pages.

This requires three more attributes in the DB conceptual model: (speakers) institution, instWWW, WWW

These must be added to the conceptual model before proceeding! (Hopefully this kind of thing usually gets identified in ‘Stage 1’.)

2. Seminar abstract page.

Lists one seminar that has been specified somehow.

References the same entities/attributes plus (seminars) abstract

Requires the referring page to supply the relevant seminar key value.


Conceptual web data model: contd…

3. Seminar speakers admin page.

– Creates a new entry in the speakers table and allows an old entry to be edited. Uses:

(speakers) title, firstName, lastName, address, email, phone, fax, institution, instWWW, WWW

– Links to a separate page for seminars associated with each speaker (just the key):

(seminars) {PK} start, room

4. Seminars admin page.

– Creates a new entry in the seminars table and allows an old entry to be edited:

(seminars) start, room, duration, title, abstract

(speakers) {PK} title, firstName, lastName


Conceptual web data model: contd…

– Fortunately 2 entities & no relationships make the ‘rules’ (E&R page 291) easy to check:

– Each page that refers to an entity does include the entity’s primary key!

– Each attribute comes with its entity/key.

– No relationships!

– Questions arising from the web conceptual model:

Must we worry about referential integrity?

Later! Either the DB will take care of it or once the ‘Logical Web Data Model’ has been created (RDBMS tables designed) we can identify the problem areas!

There may be pages in the conceptual web model that do not refer to database entities – if an ‘E-R type’ diagram is drawn at this stage then they must be incorporated (e.g. following Eaglestone & Ridley).


Conceptual web data model: contd…

– In our example the web page ‘entities’ do not correspond with distinct DB entities.

– We can easily map these relationships using an extended E-R type diagram or a full-on UML diagram where each page becomes a UML entity with methods and properties

• I.e. the UML diagram also illustrates what the page does.

– For simplicity/brevity we’ll follow Eaglestone & Ridley and draw a new E-R diagram where:

• Unidirectional arrows indicate hyperlink relationships between entities (which are usually one way).

• Bidirectional arrows denote ‘uses’

» i.e. a page entity uses a DB entity.


[Diagram: extended E-R model. Speaker (title, firstName, lastName, address, email, phone, fax, institution, instWWW, WWW) ‘gives’ Seminar (start, room, duration, title, abstract). Page entities: Advertising, Abstract, Seminar admin, Speaker admin.]

Logical data model:

› Turn the basic UML E-R model into tables.

NB We added three more fields to speaker in the web data analysis stage … these are reflected in the new UML…

› Only two tables means this stage is relatively straightforward:

The 0…n relationship should be modelled like a ‘many to many’ relationship: many speakers may be involved in one seminar; many seminars may be related to one speaker.

We create an intermediate table that represents the relationship, e.g. ‘giving a seminar’:

Table could be called ‘gives’

Inverse is ‘given by’

NB: Referential integrity is not violated by the ‘0’.


Logical data model:

It’s also inconvenient to copy the whole (composite) primary keys from the ‘speakers’ and ‘seminars’ table to the ‘gives’ table; instead we can create a new key column in each table e.g. ‘speakerId’ and ‘seminarId’

Q: Looking ahead to step 6: is there a convenient column type in MySQL for this?

A: {TINY,SMALL,MEDIUM,BIG}INT or INT, declared UNSIGNED AUTO_INCREMENT

Q: Is this always a good idea?

A: No! If the application requires frequent lookup conversions between the new ‘ID’ key and the ‘real’ key this leads to inefficient queries.

› E.g. a ‘users’ table for a web authentication system would probably use the email address as a unique identifier and need to convert to/from ID number each time … probably quicker to let SQL take care of the email lookup in this case.
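The trade-off in the last answer can be sketched in SQL. The users table below is purely hypothetical (the table, its columns and the sample address are assumptions, not part of the seminars design): when the email address is itself the primary key, the authentication lookup is a single indexed query and no ID-to-email translation is needed.

```sql
-- Hypothetical authentication table: the email address is the natural primary key.
CREATE TABLE users (
    email        VARCHAR(100) NOT NULL PRIMARY KEY,
    passwordHash CHAR(64)     NOT NULL
);

-- One indexed lookup, with no surrogate ID to translate first.
SELECT passwordHash
FROM users
WHERE email = 'someone@example.org';
```

For speakers and seminars the surrogate keys still make sense, because the ‘gives’ table would otherwise have to copy two multi-column keys.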


[Diagram: logical data model in UML notation.
Seminar: PK seminarId, room, start, duration, title, abstract.
Speaker: PK speakerId, firstName, lastName, title, address, email, phone, fax, institution, instWWW, WWW.
gives: PK,FK1 seminarId; PK,FK2 speakerId.]

• Association entity models the relationship between our two tables.

Logical data model:

We should also check normalisation of tables at this point (although tables from E-R are usually normalised enough…)

› 1NF: Repeating groups?

› 2NF: “A relation that is in 1NF and every non-primary-key attribute is fully functionally dependent on the primary key.”

› 3NF: “A relation that is in 1NF & 2NF, and in which no non-primary-key attribute is transitively dependent on the primary key.”

… but bear in mind that while normalisation is desirable it can occasionally be useful to ‘denormalise’ tables for efficiency

› This depends on the application and implementation.

Normalise first to gain the benefits, then denormalise for efficiency.

An exercise!

› Revise/practise yourselves!


Web data analysis: (public pages)

› The seminars advert page utilises fields (entities and attributes) from the database conceptual model as follows:


Seminars advert page:

Seminars: LIST_OF
room: STRING
start: DATE/TIME
duration: NUMBER
seminars.title: STRING
speakers.title: STRING
firstName: STRING
lastName: STRING
institution: STRING
instWWW: STRING (URL)
WWW: STRING (URL)

Abstract page:

Seminars
room: STRING
start: DATE/TIME
duration: NUMBER
seminars.title: STRING
speakers.title: STRING
firstName: STRING
lastName: STRING
institution: STRING
instWWW: STRING (URL)
WWW: STRING (URL)
abstract: STRING

Web data analysis: (private pages)

› The admin pages necessarily utilise most of the DB fields


Speakers Admin:

title: STRING
firstName: STRING
lastName: STRING
institution: STRING
address: STRING
email: STRING
telephone: STRING
fax: STRING
instWWW: STRING
WWW: STRING
speakerID: NUMBER

Seminars Admin:

title: STRING
abstract: STRING
room: STRING
start: STRING
duration: STRING
seminarID: NUMBER
speakerID: NUMBER

Web data analysis:

› We can do much more in this part of the analysis (see Eaglestone & Ridley) including

mock-ups of pages

textual page schema

identifying repeating structures

› An exercise for the group project!


These leave the ‘abstract’ layers of conceptual and logical design behind.

‘Physical’ refers to the actual database and middleware to be used … this might be influenced in the real world by

› personal preference/experience

› availability/cost

› ‘political’ viewpoints (e.g. open source, also c.f. cost)

› running costs

The seminars example must use an open source web server, database and middleware that run on Sun Solaris.

› One obvious solution is Apache + PHP + MySQL (there are others)


Physical database design:

› Specifying the speakers table structure

CREATE TABLE speakers (
    speakerID   SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title       VARCHAR(10) NOT NULL,
    firstName   VARCHAR(20) NOT NULL,
    lastName    VARCHAR(30) NOT NULL,
    institution VARCHAR(50),
    address     TINYTEXT,
    email       VARCHAR(50),
    telephone   VARCHAR(25),
    fax         VARCHAR(25),
    instWWW     VARCHAR(100),
    WWW         VARCHAR(100)
);


Physical database design:

› Specifying the seminars table structure

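CREATE TABLE seminars (
    seminarID SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title     VARCHAR(255) NOT NULL,
    abstract  TEXT,
    room      VARCHAR(50) NOT NULL,  -- assumed type: room is in the logical model, but no column type is given for it
    starts    DATETIME NOT NULL,
    duration  TINYINT UNSIGNED NOT NULL
);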

CREATE TABLE gives (
    seminarID SMALLINT UNSIGNED NOT NULL,
    speakerID SMALLINT UNSIGNED NOT NULL,
    PRIMARY KEY (speakerID, seminarID)
);
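As a sketch of how the three tables work together, the queries below list seminars with their speakers by joining through gives. They assume the speakers, seminars and gives tables as defined above; the aliases, the ordering and the limit of five past seminars are illustrative choices.

```sql
-- Forthcoming seminars: one row per (seminar, speaker) pair, soonest first.
SELECT s.seminarID, s.title AS seminarTitle, s.starts, s.duration,
       p.title AS speakerTitle, p.firstName, p.lastName
FROM seminars AS s
JOIN gives    AS g ON g.seminarID = s.seminarID
JOIN speakers AS p ON p.speakerID = g.speakerID
WHERE s.starts >= NOW()
ORDER BY s.starts, p.lastName;

-- A selection of past seminars, kept separate from the forthcoming list.
SELECT s.seminarID, s.title, s.starts
FROM seminars AS s
WHERE s.starts < NOW()
ORDER BY s.starts DESC
LIMIT 5;
```

Splitting on the start time keeps past and forthcoming talks in the same table while still letting the advert page show them separately.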


Physical database design: Foreign keys

› MySQL supports referential integrity rules via the REFERENCES part of a column definition or FOREIGN KEY directives

CREATE TABLE gives (
    seminarID SMALLINT UNSIGNED NOT NULL REFERENCES seminars…,
    speakerID SMALLINT UNSIGNED NOT NULL REFERENCES speakers…,
    PRIMARY KEY (speakerID, seminarID)
);

› but this only does anything when working with InnoDB tables, not MyISAM tables

In MyISAM tables the REFERENCES rule is syntax-checked but nothing more …

Note that, in the MySQL versions of this period, a REFERENCES clause written inline in a column definition is parsed but not enforced even by InnoDB; the constraint has to be declared in a separate FOREIGN KEY clause.


From the MySQL manual

CREATE TABLE parent (
    id INT NOT NULL,
    PRIMARY KEY (id)
) ENGINE=INNODB;

CREATE TABLE child (
    id INT PRIMARY KEY,
    parent_id INT,
    INDEX par_ind (parent_id),
    FOREIGN KEY (parent_id)
        REFERENCES parent(id)
        ON DELETE CASCADE
) ENGINE=INNODB;
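Applying the same pattern to the seminars example might look like the sketch below. It assumes that speakers and seminars are themselves InnoDB tables and that gives is being created afresh; the index name and the ON DELETE actions are illustrative choices, not part of the original design.

```sql
-- Association table with enforced foreign keys (requires InnoDB parent tables).
CREATE TABLE gives (
    seminarID SMALLINT UNSIGNED NOT NULL,
    speakerID SMALLINT UNSIGNED NOT NULL,
    PRIMARY KEY (speakerID, seminarID),
    -- The primary key already indexes speakerID; seminarID gets its own index.
    INDEX gives_seminar (seminarID),
    -- Deleting a seminar removes its speaker associations.
    FOREIGN KEY (seminarID) REFERENCES seminars (seminarID) ON DELETE CASCADE,
    -- A speaker who still has seminars attached cannot be deleted.
    FOREIGN KEY (speakerID) REFERENCES speakers (speakerID) ON DELETE RESTRICT
) ENGINE=InnoDB;
```

The referencing and referenced columns must have the same type, which is why both sides use SMALLINT UNSIGNED.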


Physical web data design:

› Specifying the DB connectivity and how pages refer to each other.

› For example:

Links in the advert page must pass a seminarID value to the abstract page.

The admin pages are similarly linked by seminarID and speakerID values.

› We’ll talk about how this may be achieved next week but will see it briefly in the exercise...
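At the SQL level the abstract page only needs the seminarID it was handed. A minimal sketch using a MySQL server-side prepared statement, so that the value is bound as a parameter rather than pasted into the query text; the statement name and the value 1 are placeholders, and in practice the middleware (e.g. PHP) would supply the value taken from the link.

```sql
-- Prepare the abstract-page query once, with a placeholder for the key.
PREPARE abstractPage FROM
    'SELECT title, abstract, starts, duration FROM seminars WHERE seminarID = ?';

-- Hypothetical key value, as it might arrive from a link on the advert page.
SET @seminarID = 1;

EXECUTE abstractPage USING @seminarID;
DEALLOCATE PREPARE abstractPage;
```

Binding the key as a parameter also means a malformed value in the link cannot change the structure of the query.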


Physical web data model:

› The ‘finished’ product.

› So why the massively complex procedure?

Stage 1 allows you to specify what you’ll be doing so the client cannot ‘change the goalposts’ + it’s measurable.

Stage 2/3 ensures consistency between the real world data and your model of that data.

Stage 4/5 ensures your model is (kind of) optimal

Stage 6/7 separates the programming from the above stages … but it’s made easier because of the earlier stages!


Without conceptual models entities/connections/relationships can be missed or incorrectly represented.

E.g. The seminars database example

› Can you spot the problem? Hint: What’s the speaker-seminar relationship?

› Originally envisioned for single-speaker seminars …

› Without the association entity (table) ‘gives’ it’s very difficult to associate n speakers sensibly with 1 seminar!

It’s possible to cheat and glue an ‘other speakers’ field into the seminar table … but that’s not “normal” and is bad design.
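With ‘gives’ in place, a second speaker is just one more row. The sketch below assumes the tables from the physical design and uses made-up ID values (seminar 1, speakers 1 and 2) that would have to exist already.

```sql
-- Associate two existing speakers with one existing seminar (IDs are hypothetical).
INSERT INTO gives (seminarID, speakerID) VALUES (1, 1), (1, 2);

-- One row per seminar, with all of its speakers collected into a single string.
SELECT s.seminarID,
       s.title,
       GROUP_CONCAT(CONCAT(p.firstName, ' ', p.lastName)
                    ORDER BY p.lastName SEPARATOR ', ') AS speakerNames
FROM seminars AS s
JOIN gives    AS g ON g.seminarID = s.seminarID
JOIN speakers AS p ON p.speakerID = g.speakerID
GROUP BY s.seminarID, s.title;
```

The combined string is built at query time, so the stored data stays normalised and no ‘other speakers’ column is needed.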


Because the conceptual data model represents the ‘real world’ it can often be used to ‘repurpose’ the data … i.e. provide a different interface (perspective)

› E.g. The seminars database lends itself to:

Print format advertising (e.g. PDF or RTF)

XML distribution of seminar data.

RSS dissemination of ‘forthcoming events’.
