2-databasefundamentals

7/29/2019 2-DatabaseFundamentals

TOPIC 2: DATABASE FUNDAMENTALS

Topic 2

Database Fundamentals

Learning Objectives

The learning objectives for this topic are:

Know the importance of data compared with programs.
Understand file-based systems and their problems.
Know elementary database concepts.
Understand the capabilities of second generation DBMS and the direction of future DBMS developments.
Understand the database development model.
Understand the components of the ER data model.
Be able to construct an ER model from textual data requirements.
Gain experience in developing ER models.



Contents

2.1 Evolving Database Technology
2.1.1 The Importance of Data
2.1.2 File-Based Systems
2.1.2.1 The Organisation of Data
2.1.2.2 An Example File-based Information System Architecture
2.1.2.3 Problems with File-based Systems
2.1.3 Database Concepts
2.1.3.1 Database Management System (DBMS)
2.1.3.2 Data Models
2.1.3.3 Schemata and Instances
2.1.3.4 Database Views
2.1.3.5 Data Independence
2.1.3.6 Database Languages and Interfaces
2.1.4 Second Generation Databases
2.1.5 Future Database Development
2.2 The Entity Relationship Model
2.2.1 Approaches to Modelling
2.2.2 Establishing Data Requirements
2.2.2.1 Description of Data Requirements for University
2.2.3 Components of the Entity Relationship Model
2.2.3.1 Entities, Entity Types and Attributes
2.2.3.2 Example Entity Types
2.2.3.3 Relationships
2.2.3.4 The Degree of a Relationship
2.2.3.5 Participation Conditions
2.2.3.6 Constraints and Assumptions
2.2.3.7 Modelling Multi-Line Data
2.2.4 Constructing an ER Model from Data Requirements Description
2.3 Entity Relationship Modelling Tutorial
2.4 Answers to questions and activities
2.4.1 Preliminaries
2.4.2 CD Shop ER Model
2.4.3 University ER Model
2.4.4 ER Model of Geographical Features
2.4.5 An ER Model of Timber Handling in the Sir Edward Kelly Case Study



This topic provides an introduction to the fundamental concepts and terminology concerning databases. The relevant sections of the recommended course texts and other books are as follows:

Elmasri and Navathe, Chapters 1 and 2. You may also like to consult C J Date, Chapters 1 and 2.

If you are unable to locate the above textbooks there are other sources of information that may prove useful. Remember to check the details of the recommended reading and sources of information given in Topic 1 of this module.

There are two main subtopics that cover the core material in this topic:

Evolving Database Technology: a brief history of database design, which begins with a discussion of why we place so much emphasis on data. This is examined through the history of database development, from the pre-1960s through to the present day and into the future.

Entity Relationship (ER) Modelling: ER modelling is a well-established technique to support the design of database systems. The technique focuses on the data of interest (represented as a number of entities) and the relationships between the data.

A third subtopic includes a substantial tutorial to be completed at the end of this topic.

The tutorial provides an opportunity for you to develop and test the skills and knowledge gained from this topic. The tutorial includes two separate activities:

ER Model of Geographical Features: provides practice in developing a general ER model of geographical features, considering rivers and the cities they flow through.

ER Model of Sir Edward Kelly Case Study: provides an opportunity to examine the Sir Edward Kelly case study introduced in Topic 1 of this module.



2.1 Evolving Database Technology

This subtopic provides an introduction to the design of database systems. It requires us to consider the importance of the data that systems need to store, and reviews the developments that have led to the introduction of database systems.

The following topics are covered: The Importance of Data; File-Based Systems; Database Concepts; Second Generation Databases; Future Database Development.

2.1.1 The Importance of Data

Learning Objective

Know the importance of data compared with programs.

Data has greater importance than the programs that operate on it. Consider the following reasons:

Data persists much longer than the programs that operate on it. Data may have a lifetime of centuries, e.g. records of sewers in a city, such as those in Paris which were built in Napoleonic times. In contrast, the average computer application may be expected to last approximately five years before it is rewritten or completely replaced.

Data, or more importantly the information that the data represents, has a value - such information represents power. Power may take many forms, for example economic (e.g. enabling more effective marketing), political (e.g. enabling more effective campaigning) or strategic.

Vast amounts of data exist about each of us. This data is stored in databases and can be analysed, e.g. supermarket loyalty cards can be used to monitor shopping trends or promote sales in certain areas. More sensitive data includes:

Secure Data: for example financial data and examination papers. This data is subject to authorisation restrictions, as it must only be manipulated by certain people and programs.

Private Data: examples of private data include health, employment, credit and criminal records.

The Internet is a good example of a global data resource - the data that it contains is more significant than the variety of browsers and applications that provide access to it.



2.1.2 File-Based Systems

Learning Objective

Understand file-based systems and their problems.

In its most basic form a database is just a collection of data. As such, databases have existed for many years - since records began and people started to collate data. In this module our interest concerns computerised databases, whose development can be traced from before the 1960s through to the present day and into the future.

Before the 1960s computer systems stored data in files. The key characteristics of systems of this period are: Mainframe-based computing technology; File-based storage; Piecemeal information system development; Persistent storage based on tapes/drums.

Between executions of a program, data can be stored in files. Such files need to store a number of different types of data, for example characters, integers or more complex structures such as employee records. The COBOL programming language was enormously successful in the 1970s and 80s, as its design aimed for easy handling of files.

2.1.2.1 The Organisation of Data

This section provides an introduction to the fundamental file structures. The relevant sections of the recommended course texts are Elmasri and Navathe, Chapters 5 and 6.

The data within a file can be organised in a number of different ways. There are two key file structures that we consider here - sequential and indexed sequential:

Sequential: A sequential file structure is the simplest of file structures, with each item of data stored sequentially, one after another. A sequential file can be unsorted or sorted. An unsorted sequential file is also often referred to as a heap, whilst a sorted sequential file may be referred to as an ordered file. An example of an unsorted sequential file is shown in Figure 2.1. This shows the data in the order that it was added to the file. Storing a new data item in the unsorted sequential file is very simple, since each new data item is added to the end of the file. For example, a new data item '5' would be added to the file in Figure 2.1 after data item '116'.

Figure 2.1: Unsorted Sequential File Format

An example of a sorted sequential file is shown in Figure 2.2. Note that the order in which data is sorted can be either ascending or descending - the example in Figure 2.2 shows data sorted in ascending order, i.e. smallest data items first.



Figure 2.2: Sorted Sequential File Format

The main disadvantage of a sequential file structure is that a program must read through the records in order, from the start of the file, until it reaches the record it wants. This is rather like reading a dictionary from the first page until you find the word you are looking for, and then reading on until you find a second word. Sequential structures may be useful, however, if all records are to be read in each pass, for example in processing payroll data.
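The cost of a sequential read can be illustrated with a short Python sketch. This is only an illustration under assumptions: the file is modelled as an in-memory list, the function name scan_for is hypothetical, and the data values are invented (apart from '116' as the last item and '5' as the new item, which follow the description of Figure 2.1).

```python
def scan_for(records, wanted):
    """Read records in file order until `wanted` is found.

    Returns (position, records_read); position is None if the
    value is not in the file.
    """
    for records_read, value in enumerate(records, start=1):
        if value == wanted:
            return records_read - 1, records_read
    return None, len(records)


# Hypothetical contents of an unsorted sequential (heap) file.
heap_file = [27, 3, 42, 8, 116]

# Finding the last item means reading every record before it.
print(scan_for(heap_file, 116))

# Adding a new item is cheap: it simply goes at the end of the file.
heap_file.append(5)
print(scan_for(heap_file, 5))
```

The second value returned is the number of records read, which grows with the position of the record in the file.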

Q1: Consider the sorted and unsorted sequential file structures described above. Which is most efficient (in this case we assume efficient to mean fastest) in terms of finding data? Why?

Q2: Consider the sorted and unsorted sequential file structures described above. Which is most efficient (in this case we assume efficient to mean fastest) in terms of inserting data? Why?

Indexed Sequential: Indexed sequential files are held in a similar way to sequential files, but in addition to the contents of the file, an index is created that records the position of a number of records in the file to assist in searches. In order to find a record in the file, the index is consulted to locate the nearest indexed record at or before the one required, and the file is read from there. Consider the example shown in Figure 2.3. This shows a simple index containing the values '50' and '100'. The index contains references that point to locations in the actual data file. So, if looking for a data item '25', the index indicates that the search should start from the very beginning of the file since the data item sought is less than index value '50'. Similarly, if looking for data item '52' the index indicates that the value sought is larger than index item '50', but less than index item '100', so the search is set to start after data item '47' in the file. This format is helpful for systems that access records in a more random order, such as a stock system.

Figure 2.3: Indexed Sequential File Format
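The lookup described for Figure 2.3 can be sketched in Python as follows. This is a minimal sketch under assumptions: the data file is modelled as a sorted list whose contents are invented (only the index values '50' and '100' and the data item '47' come from the text), and the function name indexed_lookup is hypothetical.

```python
from bisect import bisect_right

# Hypothetical sorted data file; only 47, 50 and 100 are taken from the text.
data_file = [12, 25, 47, 50, 52, 78, 100, 116]

# Index entries: (key value, position of that key in data_file).
index = [(50, 3), (100, 6)]


def indexed_lookup(data_file, index, wanted):
    """Use the index to choose a starting point, then read sequentially.

    Returns (position, records_read); position is None if not found.
    """
    keys = [key for key, _ in index]
    # Last index entry whose key is <= wanted, or -1 if there is none.
    slot = bisect_right(keys, wanted) - 1
    start = index[slot][1] if slot >= 0 else 0
    records_read = 0
    for position in range(start, len(data_file)):
        records_read += 1
        value = data_file[position]
        if value == wanted:
            return position, records_read
        if value > wanted:
            break  # the file is sorted, so the value cannot appear later
    return None, records_read


print(indexed_lookup(data_file, index, 25))  # starts at the beginning of the file
print(indexed_lookup(data_file, index, 52))  # starts at the entry for 50
```

In this sketch a search for '52' starts at the record for '50', i.e. immediately after '47', and reads two records rather than five.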

A further example of an indexed sequential file is shown in Figure 2.4. This example shows a typical application of an indexed sequential file system, with the index providing alphabetical access to the data file items. In this situation, we assume that the data sought is alphabetical in nature, for example in a telephone directory where data is arranged by company or personal name. The index is used to locate the area of the file to begin searching. This makes it simple to enter the file at the correct point for names beginning with the letter 'R' without having to scan through the previous items as would be the case in a simple sequential file.

Figure 2.4: Indexed Sequential File Format

The sequential and indexed sequential file structures are two of the common file structures that you should be aware of. There are, however, many more variants on these simple structures. It is recommended at this point that you consult the recommended course text references at the start of this subtopic and find out about other file structures. In particular you are directed to find out about hash files. Hashing is a very common technique that is used to support fast access to data. The technique relies on a hash key and a hash function that computes the location (address) of a data item in the file from its key. It is important that you are aware of hashing as this access method applies to databases as well as non-database file structures.
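As a starting point for that reading, the idea of a hash key and hash function can be sketched in a few lines of Python. The sketch is an assumption-laden illustration rather than a description of any particular file system: the "file" is a list of in-memory buckets, the hash function is simple modulo arithmetic, and all names (NUM_BUCKETS, hash_function, insert, find) are hypothetical.

```python
NUM_BUCKETS = 7  # hypothetical number of storage locations in the hash file


def hash_function(key):
    """Compute the bucket (address) for a record from its key."""
    return key % NUM_BUCKETS


def insert(buckets, key, record):
    """Store the record in the bucket chosen by the hash function."""
    buckets[hash_function(key)].append((key, record))


def find(buckets, key):
    """Go straight to the bucket for the key and search only that bucket."""
    for stored_key, record in buckets[hash_function(key)]:
        if stored_key == key:
            return record
    return None


buckets = [[] for _ in range(NUM_BUCKETS)]
for key in (116, 47, 5):
    insert(buckets, key, f"record for {key}")

# 47 and 5 hash to the same bucket (a collision), so both are kept there.
print(hash_function(47), hash_function(5))
print(find(buckets, 5))
```

Keeping colliding records together in one bucket is only one of several ways of handling collisions; the course texts describe others.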



2.1.2.2 An Example File-based Information System Architecture

A typical file-based information system architecture is shown in Figure 2.5. This figure shows a number of application programs and the data that they access via files.

Figure 2.5: File Based Architecture of Library Information System

This system shows a number of functions supported by a library service - Make Reservation, Make Loan and Overdue List. These functions are shown to access a number of different database files in carrying out the required task.

The database files shown hold different sets of data - there are different database files to hold data concerning reservations, loans, books and borrowers.

2.1.2.3 Problems with File-based Systems

File-based systems work well for small or single-user collections of data and are still widely used. However, there are a number of problems that occur in such systems and these are summarised here:

Changes in the format of the data force changes in the programs that act on the data - this is termed data dependence. For example, adding a postcode to borrowers' records means that the programs using it (Overdue List and Make Loan in Figure 2.5) must also be changed.

It is hard to find a single file organisation that will suit all existing (and future) users of the data. For example, should the loans information be arranged by borrower or book?

This often leads to a duplication of data to satisfy different users and different applications - once duplicate data is present this leads to the possibility of data inconsistency if one instance of the data is updated and others are not. For example, it is common that companies have separate Personnel and Payroll databases. A person's entry in the personnel database may indicate a promotion, resulting in a salary increase. This will require a separate, explicit update in the payroll database, otherwise the member of staff will not receive the correct pay.



The data is inaccessible to non-programmers, unless they use a program that is written for them.

There is no global description of the data available outside of the programs that use it.

There is duplication of functionality across the programs that manipulate the files, as they each need to reimplement:

Concurrency Control: otherwise simultaneous reading and writing of files can corrupt data. For example, in the library example shown in Figure 2.5, checks on the availability of a particular book may be incorrect if two users are trying to take the same book out at the same time.

Recovery: otherwise computer crashes during updates are likely to corrupt data, for example if an application responsible for transferring money between bank accounts crashes after removing money from one account, but before it has had a chance to credit the other account.

Integrity: otherwise data may become inconsistent, for example if nothing enforces the rule that a borrower can only reserve or borrow three books at a time.

Security: otherwise unauthorised people may change the data. For example, the librarian's identity should be verified by the system to avoid allowing any user to enter the system and change data.
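The data dependence problem listed first above can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the fixed-width record layouts, the field widths, the borrower data and the function names are all hypothetical, and a real file-based system of the period would more likely have been written in COBOL.

```python
import struct

# Hypothetical fixed-width layouts for a borrower record.
OLD_FORMAT = struct.Struct("<6s20s")    # borrower number, name
NEW_FORMAT = struct.Struct("<6s20s8s")  # borrower number, name, postcode


def pad(text, width):
    """Encode a field as fixed-width, space-padded ASCII."""
    return text.ljust(width).encode("ascii")


def read_names(raw, record_format):
    """Read every complete record in the file and return the name fields."""
    names = []
    last_start = len(raw) - record_format.size
    for offset in range(0, last_start + 1, record_format.size):
        fields = record_format.unpack_from(raw, offset)
        names.append(fields[1].decode("ascii").rstrip())
    return names


borrowers = [("B00001", "Ann Smith", "EH1 1AA"), ("B00002", "Raj Patel", "G12 8QQ")]
old_file = b"".join(OLD_FORMAT.pack(pad(no, 6), pad(name, 20)) for no, name, _ in borrowers)
new_file = b"".join(
    NEW_FORMAT.pack(pad(no, 6), pad(name, 20), pad(postcode, 8))
    for no, name, postcode in borrowers
)

print(read_names(old_file, OLD_FORMAT))  # the program and file format agree
print(read_names(new_file, OLD_FORMAT))  # unchanged program, changed file: second name is garbled
print(read_names(new_file, NEW_FORMAT))  # correct again once the program is changed too
```

Because the record layout is built into the program, every program that reads the borrowers file has to be changed when the postcode is added, even if it never uses the postcode.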

2.1.3 Database Concepts

Learning Objective

Know elementary database concepts.

This introduces some key concepts and terminology relating to databases, and covers: Definitions; Database Information System Architecture; Data Models; Schemata and Instances; Database Views; Data Independence; Database Languages and Interfaces.



2.1.3.1 Database Management System (DBMS)

Database Management System (DBMS): A Database Management System is a general-purpose software system that controls shared access to a database, and provides mechanisms that help to ensure the security and integrity of the stored data.

The basic idea behind a DBMS is to solve the data management problems outlined in section 2.1.2.3 and re-use the solution for many collections of data.

Figure 2.6: Database Information System Architecture

In a database information system a DBMS is interposed between the programs and the data they access.
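As one illustration of a DBMS solving a problem once on behalf of every program, the bank transfer example from section 2.1.2.3 can be sketched with Python's built-in sqlite3 module. This is a sketch under assumptions: the account table, its contents and the transfer function are hypothetical, and the "crash" is simulated by raising an exception rather than by a real machine failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, source, target, amount, fail_midway=False):
    """Move money between accounts as a single transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, source)
            )
            if fail_midway:
                raise RuntimeError("simulated crash between debit and credit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, target)
            )
        return True
    except RuntimeError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


transfer(conn, "A", "B", 30, fail_midway=True)
print(balances(conn))  # the half-finished transfer has been undone

transfer(conn, "A", "B", 30)
print(balances(conn))  # both updates applied together
```

The application program does not contain any recovery logic of its own; it only marks where the transaction begins and ends and leaves the undoing of partial work to the DBMS.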



2.1.3.2 Data Models

Data models are key to the design of database systems. There are a number of different levels of data model that we need to consider:

1. Conceptual Data Model: a high-level model of the data, describing what is stored. The conceptual data model is often presented in the form of a written document, which describes the data using high-level concepts such as objects and relationships. Examples of conceptual models are class models, using standard notation such as the Unified Modelling Language (UML), and Entity-Relationship models.

2. Implementation Model: an intermediate level describing how the data is logically organised. This is often presented as a description of the interface to the DBMS, and describes the data using medium-level concepts such as relations or sets. Examples of implementation (logical) models are relational models and network models.

3. Physical Model: very low-level compared to the conceptual and implementation models, it describes how the data is physically stored. This is often expressed in the form of commands in a control language, using concepts like record formats or index files. In SQL, for example, a CREATE TABLE statement defines part of the implementation model, while a statement such as CREATE INDEX determines a physical access structure.

2.1.3.3 Schemata and Instances

Schemata and instances are key concepts in databases.

A schema is a description of a database in some data model. If we compare the database to a programming language, then the schema can be compared to the datatypes of the programming language. Consider the example schema in Figure 2.7. This example shows the schema for data related to a student.

Figure 2.7: Example Schema for Student

An instance is a collection of data that conforms to a schema. Again, using the comparison with the programming language, the instance corresponds to the values held by variables in the programming language.

The relationship between schemata and instances is analogous to the relationship between a datatype and the value of a variable of that type in a program, e.g. 3 is of type int. An instance of the schema in Figure 2.7 is provided in Figure 2.8.



Figure 2.8: Example Instance Supported by the Student Schema
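The distinction between a schema and an instance can also be seen by asking a DBMS for each separately. The following Python sketch uses the built-in sqlite3 module; the student table, its attribute names and the two rows are assumptions made for illustration and are not the contents of Figures 2.7 and 2.8.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema: a description of what a student record looks like.
conn.execute(
    "CREATE TABLE student (matric_no TEXT PRIMARY KEY, name TEXT, year_registered INTEGER)"
)


def schema_of_student(conn):
    """Return (attribute name, declared type) pairs for the student table."""
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]


def instance_of_student(conn):
    """Return the rows currently stored in the student table."""
    return conn.execute("SELECT * FROM student ORDER BY matric_no").fetchall()


# The instance: the data held at a particular moment.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("S001", "Ann Smith", 2018), ("S002", "Raj Patel", 2019)],
)

print(schema_of_student(conn))
print(instance_of_student(conn))
```

The first call describes the structure of the data, much as a type declaration does; the second returns the values that currently conform to that structure.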

Q3: Think about the database instance and schema described above. How often do you think the instance will change compared to the schema?



2.1.3.4 Database Views

Not all users should have, or need, access to all data held within the database. This can be controlled at the schema level. A database view is a type of schema that sits a level above the actual schema and controls what the user can see. Such views are often referred to as external schemas. There are two main reasons for using views: restricting access and generating data.

Restricting access: may be important, since you do not want all users to be able to access all data. For example, consider a simple database which a bank may be using to hold details of customer accounts. The bank database will have a number of different levels of user. The following may be an appropriate set of restrictions to apply to a teller user of the bank's system:

Operations: it may be appropriate to prevent a teller user allocating or updating a customer's overdraft limit.

Fields: it may be appropriate to prevent the teller user from accessing the customer's overdraft status.

Individual Records: it may be appropriate that only the bank's credit controller is able to see records for customers who have a negative balance on their account.

Groups of Records: it is reasonable to stop the teller user from accessing employee records.

Generating data: is a common technique in database systems. It is not always necessary to store items of data that can be generated from other items of data in the database. For example, the age of a person can be calculated from their date of birth - it is therefore not necessary to store the age as a separate field. A database view can be used to control access to such generated data items, so that as far as the user is concerned the item of data is actually stored in the database.
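Both uses of a view can be sketched with Python's built-in sqlite3 module. The customer table, its columns, its two rows and the view name teller_view are assumptions made for illustration; the view hides the overdraft field, leaves out customers with a negative balance, and generates an age column that is not stored anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        account_no      TEXT PRIMARY KEY,
        name            TEXT,
        date_of_birth   TEXT,      -- ISO format, e.g. 1980-06-15
        balance         INTEGER,
        overdraft_limit INTEGER
    );
    INSERT INTO customer VALUES ('10001', 'Ann Smith', '1980-06-15', 250, 500);
    INSERT INTO customer VALUES ('10002', 'Raj Patel', '1975-02-03', -40, 200);

    -- Hypothetical view for a teller user.
    CREATE VIEW teller_view AS
    SELECT account_no,
           name,
           (strftime('%Y', 'now') - strftime('%Y', date_of_birth))
             - (strftime('%m-%d', 'now') < strftime('%m-%d', date_of_birth)) AS age
    FROM customer
    WHERE balance >= 0;
    """
)

cursor = conn.execute("SELECT * FROM teller_view")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)  # no balance or overdraft_limit column is visible
print(rows)     # only customers with a non-negative balance appear
```

To the teller, age looks like any other stored field, although it is recalculated from date_of_birth each time the view is queried. A view of this kind restricts fields and records; restricting operations is normally handled through the access privileges of the DBMS, which this sketch does not show.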

Three-level Database Architecture: Databases are considered to have three levels: physical, conceptual and external. Figure 2.9 illustrates this three-level architecture.

The internal level (or physical level) describes the physical storage structure of the database. The internal level therefore has an internal schema which defines the storage of data. The internal schema uses a physical data model which shows how data is organised on the machine.

The conceptual level has a conceptual schema, which describes the structure of the entire database for the community of users. This schema hides the details of the physical data model and concentrates on the description of entities, data types, relationships, user operations and constraints.

The external level (or view level) includes a number of external schemas or user views. Each of these views or external schemas describes the part of the database of interest to a particular group of users. This allows the users to see only those parts of the database that are relevant to them.



Figure 2.9: Three-Level Architecture

2.1.3.5 Data Independence

Data independence is an important concept in database design. There are two kinds of data independence that we need to consider:

1. Logical Data Independence: the conceptual schema can be altered without having to modify the views or applications that access the database. For example, an existing application may access customer records in a database. If an additional attribute is added to the customer schema, for example a reference indicating passport number, then only applications or views that need to access the new data item need to be modified.

2. Physical Data Independence: the physical schema can be changed without having to change the conceptual schema. An example of such a change could be the addition of a new index for accessing customer addresses. This does not affect the conceptual schema.
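Both kinds of independence can be demonstrated in a few lines with Python's built-in sqlite3 module. This is a sketch under assumptions: the customer table, the column and index names, and the existing_application function (which stands in for a program written before either change) are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_no TEXT PRIMARY KEY, name TEXT, address TEXT)")
conn.execute("INSERT INTO customer VALUES ('C1', 'Ann Smith', '1 High Street')")


def existing_application(conn):
    """A program written before either change; it names only the columns it needs."""
    return conn.execute(
        "SELECT customer_no, name FROM customer ORDER BY customer_no"
    ).fetchall()


before = existing_application(conn)

# Logical change: a new attribute is added to the customer schema.
conn.execute("ALTER TABLE customer ADD COLUMN passport_no TEXT")

# Physical change: a new index for accessing customer addresses.
conn.execute("CREATE INDEX customer_address_idx ON customer (address)")

after = existing_application(conn)
print(before == after)  # True: the unchanged program still returns the same result
```

Contrast this with the file-based case, where adding a field to a record changes what every reading program sees. Here the existing program is unaffected because it asks for data by name rather than by position in a stored record.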



2.1.3.6 Database Languages and Interfaces

There are three main types of language that we need to be aware of. Each language serves a particular purpose. The languages are:

Query Language: The query language is used to extract data from the database. The following is an example of a query statement which selects the attributes CName and Year from the COURSE table. You will see XML query languages later in the course.

SELECT CName, Year
FROM Course

Data Manipulation Language (DML): The DML is used to modify the data, by inserting, deleting or updating items. The following is an example DML statement which deletes entries from the COURSE table where the CourseNo attribute matches the value 'C3'.

DELETE FROM Course
WHERE CourseNo = 'C3'

Data Definition Language (DDL): The DDL is used to define the structure of the data to be stored. It is responsible for the creation of tables, indexes, views etc. The following is an example DDL statement which creates a table called COURSE with four attributes which represent a course reference number, course name, department name and the year the course is offered. You will see XML document definition language(s) later in the course.

CREATE TABLE Course
(CourseNo CHAR(5),
CName CHAR(20),
DName CHAR(22),
Year NUMBER);
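To see the three kinds of statement working together, they can be run from a program. The sketch below uses Python's built-in sqlite3 module, so it shows SQLite's behaviour rather than that of any other product; the two inserted courses are invented for illustration, and the DELETE statement quotes the value 'C3' because it is a character string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of the data.
conn.execute(
    """
    CREATE TABLE Course
    (CourseNo CHAR(5),
     CName CHAR(20),
     DName CHAR(22),
     Year NUMBER)
    """
)

# DML: insert two hypothetical courses, then delete one of them.
conn.executemany(
    "INSERT INTO Course VALUES (?, ?, ?, ?)",
    [("C1", "Databases", "Computing", 2), ("C3", "Networks", "Computing", 3)],
)
conn.execute("DELETE FROM Course WHERE CourseNo = 'C3'")

# Query language: extract data from the database.
rows = conn.execute("SELECT CName, Year FROM Course").fetchall()
print(rows)  # [('Databases', 2)]
```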

In order to use the above three languages, the user needs some sort of interface to access the database. The interface will depend on the DBMS product. The majority of databases these days provide some sort of GUI to help the user; for example the Oracle database vendor provides a Navigator tool.



2.1.4 Second Generation Databases

Learning Objective

Understand the capabilities of second generation DBMS and the direction of future DBMS developments.

Second generation DBMS were introduced in the 1970s and 1980s. They were a major step on from first generation DBMS, notably because a common approach was adopted by a number of vendors, which began to bring an element of consistency between different providers of database technology.

The second generation DBMS is seen as the introduction of the relational model. This model addressed the majority of the problems identified with preceding generations:

Data Independence: Support for data independence is provided.

Interfaces: High-level interfaces started to appear, making the data more readily accessible to non-programmers. Such interfaces were implemented by interactive SQL, data-entry forms and support for querying by form.

The relational model was proposed by Ted Codd in 1970. Efficient implementations became available from the mid 1980s. Subsequently, the Relational DBMS (RDBMS) has become the dominant type of DBMS. You may be interested to check out the following seminal publication:

Codd, E.F., A Relational Model of Data for Large Shared Data Banks, Communications of the ACM, Vol 13, No 6, 1970.

A range of RDBMS are available, and some important ones are as follows:

Oracle: produced by The Oracle Corporation.
DB2: produced by IBM.
MySQL: open source.
Access: produced by Microsoft.
SQL Server: produced by Microsoft.

Why not try a search on the Internet to find out more details of these companies and their RDBMS products?

All of the above RDBMS make use of a common declarative language called SQL. You will learn more about SQL in Topics 3 and 4 of this module.



2.1.5 Future Database Development

Database technology is still evolving. This topic considers what will direct the future development of DBMS. Important new data models are being considered in addition to the relational one, e.g. object-oriented, object-relational and knowledge-based.

DBMS make use of a number of related areas of technology, which are themselves developing. As a result the DBMS can take advantage of these developments:

Hardware platforms: for example, using multiprocessors.

Communications networks: for example, using Java bindings for access via the Internet.

User Interface: The user interface technology is developing rapidly; for example, allowing the use of multi-media, speech and hands-free access.

The main drive for DBMS development comes from the demands of new database applications, for example data warehousing and data mining applications which have particular requirements. Examples of future database developments are considered in Topic 7 in this module.



2.2 The Entity Relationship Model

The entity relationship model is a well-established approach to the design of database systems. The focus of the technique is on the data represented (as a number of entities in which we have an interest), and the relationships between the data. The relevant sections of the recommended course texts are as follows: Elmasri and Navathe, Chapter 3; C J Date, Chapter 22.

The following subtopics are covered here: Approaches to Modelling; Establishing Data Requirements; Components of the Entity Relationship Model; Constructing an ER Model from Data Requirements Description.

2.2.1 Approaches to Modelling

Learning Objective

To understand the Database development model.

Topic 1 introduced the system life cycle. In this subtopic we consider its similarities with approaches to data modelling. Figure 2.10 provides a graphical representation of a life cycle for software development. This is similar to that introduced in Topic 1 (refer back to it now if you need to remind yourself).

Figure 2.10: Illustration of Software Development Model



Database development follows a similar life cycle, as depicted in Figure 2.11.

Figure 2.11: Illustration of Database Development Model

You are directed to read more about this yourself from the recommended course text, Elmasri and Navathe. If you do not have access to the above text, then consult some other database text on this subject.

2.2.2 Establishing Data Requirements

The starting point is to establish data requirements for the database. This process is similar to establishing the requirements of any other Information System and focuses on the data that the database will be required to store. This involves establishing the requirement for holding that data, i.e. asking not just what data is needed but why, to help establish non-obvious data.

Let us consider an example information system which looks at the data requirements of a university. The university has an overall requirement to store data which is central to its key business functions. This concerns holding data on staff, students and courses. This is insufficient information on which to build a system, so it is important to consider the approach that we use to express the detail of the requirements.

The following introduces a case study based on a University system. This example will be used throughout this and following database topics. The case study is introduced by providing an initial text-based description of the data requirements of the university. This will be used later in this topic as we consider a more rigorous notation than text alone - namely ER modelling.



2.2.2.1 Description of Data Requirements for University

Consider the following text description, which represents an example of the data requirements for a university.

A university has a requirement to maintain details of staff, students, the courses available and the performance of students on courses.

Information about each student is initially recorded at registration, and includes the student's matriculation number, name and year of registration. A student may or may not enrol on courses at registration.

Information recorded for each staff member includes the staff number and name. Each staff member may or may not act as a counsellor to one or more students, and may or may not act as a tutor on one or more courses.

A student has one counsellor, and has a tutor for each course on which the student is enrolled. A student is allocated a counsellor at registration and must always have a counsellor. A student may or may not have a tutor for a course on which they are enrolled.

Each course has an identifying code, a title and a credit value. There may be a limit to the number of students who can be registered on a course - this is referred to as the course quota. A course may have no students.

Students may not enrol for more than 100 credit points worth of courses at a time. Courses have assignments, and the grade of a student for an assignment is recorded as a percentage.

Using plain text to capture requirements has advantages as well as disadvantages. Consider what some of these may be from your reading of the above text:

Q4: What might some of the advantages be?

Q5: What might some of the disadvantages be?



2.2.3 Components of the Entity Relationship Model

Learning Objective

To understand the components of the ER data model.

The entity relationship model is a conceptual data model. This conceptual model is represented through three key components: entity types, attributes and relationships.

These components are introduced and explained as follows:

Entities, Entity Types and Attributes
Example Entity Types
Relationships
The Degree of a Relationship
Participation Conditions
Constraints and Assumptions
Modelling Multi-Line Data

2.2.3.1 Entities, Entity Types and Attributes

An entity represents a thing about which data is recorded. An entity may represent a tangible object, such as a student (John Perkins) or a vehicle (Y354 LMN). It can also represent an intangible object, such as an enrolment or an order.

An entity type defines the properties of a collection of entities. For example, a Student with Name, Matriculation Number and Date of Registration.

An attribute is a component of an entity type that represents a single property of entities of that type. The attributes of the Student entity type are Name, Matriculation Number and Date of Registration.

One or more attributes may be chosen to be the identifier of an entity. The identifier is the attribute, or combination of attributes, that distinguishes one entity from another entity of the same type. For example, the matriculation number would be a suitable identifier for a student entity.
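The distinction between an entity type, its attributes and its identifier can be sketched in code. The following is a minimal illustration only, assuming Python dataclasses as the representation; the field names and the sample values are invented for the example and are not part of the case study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    """Entity type: describes the attributes that every student entity has."""
    matric_no: str   # identifier: distinguishes one student from another
    name: str
    registered: int  # year of registration


# Two entities (instances) of the Student entity type.
# The values are invented for illustration only.
students = [
    Student(matric_no="s01", name="John Perkins", registered=2019),
    Student(matric_no="s02", name="John Perkins", registered=2019),
]

# Index the entities by identifier: names may repeat, identifiers may not.
by_matric_no = {s.matric_no: s for s in students}
print(len(by_matric_no))  # 2
```

The two students share a name and a year of registration, so only the matriculation number tells them apart. This is why it is the natural choice of identifier.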

It is advantageous to introduce a notation that allows us to represent the developing ER model graphically; such a notation allows us to represent an ER model as an ER diagram. This will be discussed later.

2.2.3.2 Example Entity Types

This introduction of example entity types uses the data requirements of the university system described previously.

The ER model allows the entity types to be represented graphically. Examples of both the Student and Staff entity types are illustrated in Figures 2.12 and 2.13.

Figure 2.12: Graphical Illustration of Student Entity Type



Figure 2.13: Graphical Illustration of Staff Entity Type

The entity is represented as a simple rectangle, with the name of the entity placed inside the rectangle. For clarity, it is a helpful convention to keep the name of the entity the same as the name used in the database schema, including spelling and capitalisation.

The entity types can also be written textually with all of the attributes shown. An illustration of the Student and Staff entities is given in Figure 2.14.

Student(MatricNo, Name, Registered)

Staff(StaffNo, Name)

Figure 2.14: Student and Staff Entity Types

The identifier of each entity is shown underlined - the identifier is the attribute that will be used to distinguish one entity instance from another. In our example entities, MatricNo is the attribute that will identify one student from another student. By convention, the identifier is placed at the start of the attribute list in the schema. However, later examples will show that the identifier can comprise more than one attribute. The underlining is therefore used for clarity.

2.2.3.3 Relationships

A relationship is an association between entities that needs to be recorded. The relationship is key to the way the data is to be interpreted and used by the database. An example of a relationship is shown in Figure 2.15, which shows one relationship between the Student and Staff entities. The particular relationship concerns the counselling relationship between these entity types. The relationship is initially illustrated by drawing a connecting line between the two entity types affected by the relationship. The relationship is itself then named for clarity.

Figure 2.15: counsels Relationship between Staff and Student Entity

A relationship may exist between different entities of the same type. For example, consider Figure 2.16, which shows that a relationship exists between instances of the Person entity. The relationship is shown to be married to, which captures the fact that one instance of the Person entity may be married to another instance of the Person entity.



Figure 2.16: married to Relationship between instances of Person Entity

A relationship occurrence is a specific set of associations that exist at a given time. These can be captured in an occurrence diagram.

An example occurrence diagram is shown in Figure 2.17.

[Occurrence diagram: Staff numbers 7774, 6635, 3158 and 5324 linked to Student identifiers s01, s02, s05, s07, s09 and s10]

Figure 2.17: Occurrence Diagram

Figure 2.17 depicts the counsels relationship between Student and Staff. The occurrence diagram shows the associations between the entity identifiers. So, for example, the diagram shows that Staff number 7774 has a counsels relationship with Students s01, s02 and s05. The diagram further illustrates that certain Staff need not be associated with any Students via this relationship; for example, Staff number 6635 is not associated with any Student occurrences.
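A relationship occurrence can also be written down as plain data. The sketch below assumes a Python dictionary keyed by staff number, and includes only the two staff members whose associations are spelled out in the text above; the helper function names are invented for the illustration.

```python
# Occurrences of the counsels relationship, keyed by staff number.
# Only the two staff members described in the text are included.
counsels = {
    "7774": ["s01", "s02", "s05"],
    "6635": [],  # this member of staff counsels no students
}


def counsellor_of(student_id, occurrences):
    """Return the staff number that counsels the given student, or None."""
    for staff_no, student_ids in occurrences.items():
        if student_id in student_ids:
            return staff_no
    return None


def staff_without_students(occurrences):
    """Return the staff numbers that take no part in the relationship."""
    return [staff_no for staff_no, ids in occurrences.items() if not ids]


print(counsellor_of("s02", counsels))    # 7774
print(staff_without_students(counsels))  # ['6635']
```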

2.2.3.4 The Degree of a Relationship

The degree of a relationship governs the maximum number of entities that participate in the relationship.

There are three alternatives to consider:



1:1: This is referred to as a one to one relationship and states that an entity may be associated with at most one other entity in the given relationship. An example in Figure 2.18 shows that a member of Staff may be head of one Department. Likewise, each Department has just one head.

Figure 2.18: 1:1 Degree of Relationship

Figure 2.16 also represents a 1:1 relationship. The married to relationship between instances of the Person entity type represents a monogamous marriage, where we restrict a person to being married to at most one other person.

1:N: This is referred to as a one to many relationship and states that one entity may relate to more than one other entity. The example in Figure 2.19 shows that a Person may own one or more Car entities. The many aspect of the relationship applies to the Car entity. The one aspect of the relationship belongs to the Person entity. So, a Person may own one or more cars. A Car, however, can be owned by at most one Person.

[Diagram: Person entity linked to Car entity by the owns relationship]

Figure 2.19: 1:N Degree of Relationship

M:N: This is referred to as a many to many relationship. An example is shown in Figure 2.20, which shows that if we have a Course and a Location entity, then the Course may be offered at one or more locations. Similarly, a Location can offer one or more courses.

Figure 2.20: M:N Degree of Relationship
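The three degrees can be told apart by counting how many partners each entity has in a set of occurrences. The sketch below is an illustration under stated assumptions: the function name and all sample data are invented, and the degree of a relationship is really a design rule stated in the model, so a set of occurrences can only show the tightest degree it is consistent with.

```python
from collections import defaultdict


def observed_degree(pairs):
    """Classify (left, right) occurrence pairs as '1:1', '1:N', 'N:1' or 'M:N'."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)

    # Does any single left entity relate to several right entities, or vice versa?
    left_has_many = any(len(r) > 1 for r in rights_per_left.values())
    right_has_many = any(len(l) > 1 for l in lefts_per_right.values())

    if left_has_many and right_has_many:
        return "M:N"
    if left_has_many:
        return "1:N"
    if right_has_many:
        return "N:1"
    return "1:1"


# Hypothetical occurrences for the three examples in the text.
head_of = [("7774", "Computing")]
owns = [("Ann", "car1"), ("Ann", "car2"), ("Bob", "car3")]
offered_at = [("DB", "Campus A"), ("DB", "Campus B"), ("AI", "Campus A")]

print(observed_degree(head_of))     # 1:1
print(observed_degree(owns))        # 1:N
print(observed_degree(offered_at))  # M:N
```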



2.2.3.5 Participation Conditions

Participation conditions govern the minimum number of entities that participate in a relationship. In this simple ER model there are just two alternatives:

mandatory: every entity must participate in the relationship. For example, in recording details of car ownership we may want to stipulate that each car must have an owner. This is represented in the notation by adding a filled-in dot on the relationship line. The dot is placed adjacent to the entity which is mandatory. An example is shown in Figure 2.21.

Figure 2.21: Mandatory Participation in Relationship

optional: an entity may or may not participate in the relationship. This is represented in the notation by an unfilled dot on the relationship line. The dot is placed adjacent to the entity which is optional in the relationship. For example, returning to the relationship between Person and Car, we may further stipulate that a Person entity may exist without having an associated Car. This is shown in Figure 2.22.

Figure 2.22: Optional Participation in Relationship
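Participation conditions can be checked against a set of occurrences by looking for entities that take no part in the relationship. The following is a minimal sketch using invented people, cars and ownership pairs; the function name is an assumption made for the example.

```python
def non_participants(entities, participants):
    """Return the entities that do not appear in the relationship, sorted."""
    return sorted(set(entities) - set(participants))


# Hypothetical data: (person, car) pairs for the owns relationship.
people = ["Ann", "Bob", "Cara"]
cars = ["car1", "car2", "car3"]
owns = [("Ann", "car1"), ("Ann", "car2")]

# Car is mandatory in owns, so any car listed here breaks the rule.
cars_without_owner = non_participants(cars, [car for _, car in owns])

# Person is optional in owns, so people listed here are perfectly valid.
people_without_car = non_participants(people, [person for person, _ in owns])

print(cars_without_owner)  # ['car3']
print(people_without_car)  # ['Bob', 'Cara']
```

In this sample data car3 has no owner, which violates the mandatory condition on Car, whereas Bob and Cara owning no car is allowed by the optional condition on Person.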

2.2.3.6 Constraints and Assumptions

Typically, not all the data requirements can be captured using entities, attributes and relationships. Such items are simply recorded as constraints, i.e. assertions about the data. For example, referring to the university example, a student may not enrol for more than 100 points worth of courses.

Typically, in order to construct an ER model, we must make assumptions about the data that are not stated in the data requirements. For example, it is reasonable to assume that we allow a Student to enrol on a Course at most once - this is not something to be repeated at each registration.
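Because constraints and assumptions sit outside the diagram, they usually end up as checks applied to the data. The sketch below shows one possible check for the 100 credit point constraint and the enrol-at-most-once assumption. The course codes, credit values and function name are hypothetical.

```python
MAX_CREDITS = 100  # limit stated in the university data requirements

# Hypothetical course codes and credit values.
course_credits = {"C101": 40, "C102": 40, "C103": 30}


def can_enrol(current_courses, new_course, credits=course_credits):
    """Check the credit constraint and the enrol-at-most-once assumption."""
    if new_course in current_courses:
        return False  # assumption: a student enrols on a course at most once
    total = sum(credits[c] for c in current_courses) + credits[new_course]
    return total <= MAX_CREDITS  # constraint: no more than 100 credit points


print(can_enrol(["C101"], "C102"))          # True  (80 points)
print(can_enrol(["C101", "C102"], "C103"))  # False (110 points)
print(can_enrol(["C101"], "C101"))          # False (already enrolled)
```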



2.2.3.7 Modelling Multi-Line Data

Often we may find that we have identified a header entity. Such an entity has multiple lines of associated data that all relate to the single header entity.

Consider the example of a company that specialises in mail order of music CDs. Customers place orders for one or more CDs. The order date, delivery date and delivery address are recorded for each order. A single order can be for multiple CDs, and for each CD we may want to record the CD number, title, price and quantity ordered.

The following questions present a pen and paper exercise based on the above description of the CD shop.

Q6: List the entity types for the CD shop. To do so you must identify the attributes of each entity, and an identifier for each entity type.

Q7: Draw an initial graphical ER model that represents the CD shop based on the above information. Remember to state clearly any assumptions that you make.

2.2.4 Constructing an ER Model from Data Requirements Description

Learning Objective

To be able to construct an ER model from textual data requirements.

There is no guaranteed approach to producing an ER model from a text-based description of the data requirements. There are, however, a number of general guidelines which can be treated as a general recipe. This recipe is as follows:

1. Identify Potential Entities: This involves scanning the text and picking out all those items that may suggest themselves as potential entities. HINT: often tangible things, such as students, are entities.

2. Identify Attributes: Given the list of possible entities identified in step 1 above, scan the text and pick out all the possible attributes that belong to the candidate entities.

3. Choose Identifiers: Having identified a list of candidate attributes, examine those attributes and choose one (or a combination) that seems a suitable identifier for each entity type.

4. Draw Initial ER Diagram: Given the information from the above steps, draw out an initial ER diagram, representing the entity types as rectangles.

5. Add Relationship Information: Referring to the text description, draw relationships between the entities on the draft ER diagram from step 4 above.

6. Add Degree Information: From the text information, identify any additional information concerning the degree of relationships between entities. Add this information to the draft ER diagram.

7. Add Participation Information: From the text information, identify any additional information concerning the participation of entities in the relationships identified. Add this information to the developing ER diagram.

8. Redraw the ER Diagram: At this stage, take the time to carefully redraw the ER diagram neatly, reviewing each entity, attribute and relationship as you do so.

It is important to remember that developing an ER model is an iterative process. It is unlikely that you will obtain an ideal ER model in a single pass through the process described in steps 1 through 8 above. As the model develops you may find it appropriate to make assumptions, which would subsequently need to be clarified with the user or another expert in the relevant business domain.
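The information gathered by the recipe can be recorded as simple data while the diagram is being developed. The following is only a sketch of one way to do this, assuming Python dataclasses; the class and field names are invented. It records just the Student and Staff entity types and the counsels relationship, using the degree and participation stated in the university data requirements (a student must always have exactly one counsellor; a staff member may or may not counsel students).

```python
from dataclasses import dataclass, field


@dataclass
class EntityType:
    name: str
    attributes: list
    identifier: list  # one or more attribute names (step 3)


@dataclass
class Relationship:
    name: str
    left: str              # entity type on the left of the degree
    right: str             # entity type on the right of the degree
    degree: str            # "1:1", "1:N" or "M:N" (step 6)
    left_mandatory: bool   # participation of the left entity type (step 7)
    right_mandatory: bool  # participation of the right entity type (step 7)


@dataclass
class ERModel:
    entity_types: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_entity_type(self, entity_type):
        # Every identifier attribute must be one of the listed attributes.
        for attr in entity_type.identifier:
            if attr not in entity_type.attributes:
                raise ValueError(f"unknown identifier attribute: {attr}")
        self.entity_types[entity_type.name] = entity_type

    def add_relationship(self, relationship):
        # A relationship may only connect entity types already in the model.
        for end in (relationship.left, relationship.right):
            if end not in self.entity_types:
                raise ValueError(f"unknown entity type: {end}")
        self.relationships.append(relationship)


model = ERModel()
# Steps 1 to 3: entities, attributes and identifiers.
model.add_entity_type(
    EntityType("Student", ["MatricNo", "Name", "Registered"], ["MatricNo"]))
model.add_entity_type(EntityType("Staff", ["StaffNo", "Name"], ["StaffNo"]))
# Steps 5 to 7: relationship, degree and participation.
model.add_relationship(
    Relationship("counsels", "Staff", "Student", "1:N",
                 left_mandatory=False, right_mandatory=True))
```

Keeping the model in this form makes the iteration easier to manage: each pass through the recipe adds or corrects entries, and the two checks catch slips such as a relationship that refers to an entity type not yet identified.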

The following questions guide you through the above recipe steps in order to develop an ER model of the university. The university data requirements were introduced earlier in this topic - refer back to the Description of Data Requirements for University in Section 2.2.2.1 for further details.

Q9: Identify potential entities for the University: You already have the Staff and Student entities to start with - these were introduced earlier. What others can you identify from the university data requirements?

Q10: Identify attributes: Sample attributes have already been provided for the Staff and Student entities. What about attributes for the additional entities you have identified?

Q11: Choose identifiers: Choose identifiers for the entities from the attribute list provided in the previous step.

Q12: Draw an initial ER Diagram: Draw an initial ER diagram to represent the entities as rectangles.

Q13: Add relationship information: Now add relationship information to your ER diagram developed in the previous step.

Q14: Add degree information: Now think about the degree information that is relevant to the model you are developing. Add the degree annotations to your ER diagram.

Q15: Add participation information: Think about which entities have an optional or mandatory role in a relationship. Add the required participation notations to your ER diagram.

Finally, you may like to redraw your ER diagram at this point to provide a neat model.



2.3 Entity Relationship Modelling Tutorial

Learning Objective

To gain experience in developing ER models.

This section provides a substantial tutorial to be completed at the end of this topic. The tutorial provides an opportunity for you to develop and test the skills and knowledge from this topic. The tutorial includes two separate activities.

ER Model of Geographical Features: provides practice in developing a general ER model of geographical features, considering rivers and the cities they flow through.

ER Model of Sir Edward Kelly Case Study: provides you with an opportunity to examine the Sir Edward Kelly case study introduced in Topic 1 of this module.



ER Model of Geographical Features

Read through the details of the data requirements presented in this activity. This activity is a paper exercise during which you will develop an ER model based on the data requirements.

A database is required to record data about geographical features in a number of countries. The data relates to rivers, lakes, mountains and cities. It is assumed that each such feature can be uniquely identified by a name, except for cities, where a name is only unique within the country to which it belongs.

For each country, identified by its name, data is required for its area, population and capital city.

For each mountain, record the height, and the country in which it lies.

For each river, record its total length, and its length in each country through which it flows.

For each lake, record its area, maximum depth and the proportion of the lake owned by each country on its shoreline.

For each city, record its name, population, the country to which it belongs, and the river (just one) which may flow through it.

The following questions guide you through developing an ER model of the geographical features database. The questions allow you to build up the model incrementally if you so choose, by first checking that you have identified all the entities and their attributes, then establishing all relationships between entities. Finally, the third question in the following question set leads you to develop the complete ER model, including the participation information, along with a summary of the constraints and assumptions.

Q16: As an initial step, list the entities to be represented and their attributes (we are not concerned with their data type at this stage, so only a representative name is required). Remember to clearly mark the attributes that form the primary key for each entity.

Q17: Taking the entities identified in the first question, develop an ER model for the data requirements described above. Your model should include entity identifiers and the degree of relationships between the entities.

Q18: Finally, taking the ER model developed in Q17, now add participation constraints on each of the relationships. Also include all constraints and assumptions that you make.



An ER Model of Timber Handling in the Sir Edward Kelly Case Study

Read through the details of the data requirements presented in this activity. This activity is a paper exercise during which you will develop an ER model based on the data requirements found in the Sir Edward Kelly case study described in Topic 1 of this module.

The following questions guide you through developing an initial ER model of the timber handling part of the Sir Edward Kelly Case Study. It is important that you complete this activity before working on the assessed coursework for the database part of this module.

The questions allow you to build up the model incrementally if you so choose, by first checking that you have identified all the entities and their attributes, then establishing all relationships between entities. Finally, the third question in the following question set leads you to develop the complete ER model, including the participation information, along with a summary of the constraints and assumptions.

Q19: As an initial step, write down a list of the entities and their associated attributes - remember to clearly mark the attributes that form the primary key for each entity.

Hint 1: You should focus on the example documents in the case study, and produce entity types to model them.

Hint 2: Much of the data is multi-line, so the techniques outlined in Section 2.2.3.7 should be applied.

Q20: Taking the entities identified above, develop an ER model for the data requirements described above. Your model should include entity identifiers and the degree of relationships between the entities.

Q21: Finally, taking the ER model developed in Q20, now add participation constraints on each of the relationships. Also include all constraints and assumptions that you make.



Glossary

database

A database is a coherent collection of related data.

Database Management System (DBMS)

A Database Management System is a general purpose software management system that controls shared access to a database, and provides mechanisms that help to ensure the security and integrity of the stored data.



2.4 Answers to questions and activities

2.4.1 Preliminaries

Q1: A sorted sequential file will tend to be more efficient (i.e. faster) than an unsorted sequential file for data retrieval because the data items are stored in a known order. The order of the file provides some assurance as to where we will find data. Consider the example of searching for data item '46'. In the sorted sequential file shown in Figure 2.2, we can stop the search when we reach data item '47', as at that point we know that '46' cannot exist in the file. The same cannot be said of an unsorted file, as there is no guarantee of the order in which we will find data. Searching for data item '46' in the file shown in Figure 2.1, we would have to search until the end of the file to establish whether data item '46' exists or not.
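The early stop described in this answer can be sketched as follows. The sorted list uses the data items named in the answer to Q2; the order of the unsorted list is an assumption (Figure 2.1 is not reproduced here, only the fact that it ends with '116'), and the function names are invented for the illustration.

```python
def find_in_sorted(items, target):
    """Sequential search of an ascending file. Returns (found, items examined)."""
    examined = 0
    for item in items:
        examined += 1
        if item == target:
            return True, examined
        if item > target:
            # Every later item is larger still, so the target cannot be present.
            return False, examined
    return False, examined


def find_in_unsorted(items, target):
    """Sequential search of an unordered file. Returns (found, items examined)."""
    examined = 0
    for item in items:
        examined += 1
        if item == target:
            return True, examined
    return False, examined


sorted_file = [16, 25, 47, 52, 116]
unsorted_file = [47, 16, 52, 25, 116]  # assumed order, ending in 116

print(find_in_sorted(sorted_file, 46))      # (False, 3): stops at 47
print(find_in_unsorted(unsorted_file, 46))  # (False, 5): reads the whole file
```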

Q2: The unsorted sequential file structure is the most efficient (i.e. fastest) for inserting data because each new data item is simply added to the end of the file. If we assume we want to add a new data item '55' to a file, then it is simply added to the end of an unsorted sequential file structure - after data item '116' in Figure 2.1. The sorted sequential file structure is a little more complicated, as we assume that the data file will be searched in order to find the correct insertion point to maintain the order of the data. With regard to the sorted sequential file in Figure 2.2, this would mean searching past data items '16', '25', '47', '52' and '116', then backtracking to insert the new data item before data item '116'.
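The difference in insertion cost can be sketched in the same style. As before, the sorted list uses the data items named in this answer, the order of the unsorted list is assumed, and the function names are invented for the illustration.

```python
def insert_unsorted(items, new_item):
    """Append to the end of the file. Returns the number of items examined."""
    items.append(new_item)
    return 0  # no searching is needed


def insert_sorted(items, new_item):
    """Scan for the first larger item and insert before it.

    Returns the number of items examined.
    """
    examined = 0
    for position, item in enumerate(items):
        examined += 1
        if item > new_item:
            items.insert(position, new_item)
            return examined
    items.append(new_item)  # larger than everything already in the file
    return examined


unsorted_file = [47, 16, 52, 25, 116]  # assumed order, ending in 116
sorted_file = [16, 25, 47, 52, 116]

print(insert_unsorted(unsorted_file, 55))  # 0
print(insert_sorted(sorted_file, 55))      # 5
print(sorted_file)                         # [16, 25, 47, 52, 55, 116]
```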

Q3: The instance will change as the user requests that data be inserted, updated or deleted. Compared to the schema this is a frequent change.

The schema changes only when the user requires a change in the type of data recorded in the database. This most commonly happens when an additional data field is required - the data field would be added to the schema, then instances of that data item could be recorded in the database.

Q4: The key advantage of the text-based description is that it is easy for people to read. It requires no specialised knowledge to interpret the data requirements.

Q5: The disadvantage is that the relationships between data items and the constraints on data are not entirely clear in the text description - it can be difficult to pick out key characteristics.



2.4.2 CD Shop ER Model

Q6: The entity types for the CD shop are as follows. Note the underlined identifiers.

Customer(CustNo, Title, Initial, Surname)

CD(CDNo, Title, Price)

Order(OrderNo, CustNo, OrderDate, DeliveryDate, DeliveryAddress)

OrderItem(OrderNo, CDNo, Quantity)
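The header and line structure of Order and OrderItem can be sketched in code. This is an illustration only, assuming Python dataclasses and assuming that an OrderItem is identified by the combination of OrderNo and CDNo; the sample values are invented.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class OrderItem:
    """One line of an order, assumed to be identified by (order_no, cd_no)."""
    order_no: int
    cd_no: str
    quantity: int


@dataclass
class Order:
    """Header entity: data recorded once per order."""
    order_no: int
    cust_no: int
    order_date: date
    delivery_date: date
    delivery_address: str
    items: list = field(default_factory=list)

    def add_item(self, cd_no, quantity):
        # With (order_no, cd_no) as identifier, a CD appears once per order.
        if any(item.cd_no == cd_no for item in self.items):
            raise ValueError(f"CD {cd_no} is already on order {self.order_no}")
        self.items.append(OrderItem(self.order_no, cd_no, quantity))


# Invented sample order with two lines.
order = Order(1, 42, date(2019, 7, 1), date(2019, 7, 8), "1 High Street")
order.add_item("CD7", 2)
order.add_item("CD9", 1)
print(len(order.items))  # 2
```

The order date, delivery date and delivery address are stored once on the header, while each line carries only what varies per CD. The title and price belong to the CD entity type, so they are not repeated on the lines.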

Q7: The following figure represents a possible ER model of the CD shop. The Customer entity is able to place one or more orders. An Order must be placed by one Customer, but a Customer is optional in the places relationship - this assumes we can hold details of customers who have not yet placed an Order. The Order is made up of one or more OrderItems. The Order and OrderItem are both mandatory in the contains relationship. Each OrderItem concerns a CD that the shop sells. Note that the CD is shown as optional in the ordered relationship with the OrderItem - you might like to consider what that means.

Figure 2.23: Example CD Shop ER Model

Q8: There is no Question 8



2.4.3 University ER Model

Q9: Tangible things that you might identify as entities are Course and Assignment.

There are also intangible things that can be important entities. As far as the university ER model is concerned this leads us to consider Enrolment as our final entity. The full list of entities is therefore as follows:

Student

Staff

Course

Assignment

Enrolment

Q10: The attributes for the suggested entities are listed below:

Student(MatricNo, Name, Registered)

Staff(StaffNo, Name)

Course(CourseCode, Title, Credit)

Enrolment(MatricNo, CourseCode)

Assignment(MatricNo, CourseCode, AssignmentNo, Grade)

Q11: Candidate identifiers are shown by underlining the attribute - remember that multiple attributes may be involved in forming the identifier.

Student(MatricNo, Name, Registered)

Staff(StaffNo, Name)

Course(CourseCode, Title, Credit)

Enrolment(MatricNo, CourseCode)

Assignment(MatricNo, CourseCode, AssignmentNo, Grade)



Q12: The following figure represents the entities for the University system. Think carefully about the Enrolment entity. Did you identify it in your own list? You may have identified it as a relationship, but the details of the enrolment need to be stored, and this is best achieved by having a separate entity that represents each enrolment.

Q13: Adding initial relationship information results in drawing lines between the entities and adding some text that describes the nature of the relationship.



Q14:

Adding the degree information to the ER diagram results in the following clarification of relationships.

Q15:

Adding the participation information to the ER diagram gives the final ER diagram.



2.4.4 ER Model of Geographical Features

Q16:

The following is a list of entities and associated attributes.

The names of entities and attributes are chosen from the problem domain, so it is simple in reading the following descriptions to relate the information back to the data requirements description. It is feasible that you will come up with different names from those suggested below - that is fine, as long as the names you have chosen are also readily identifiable with the problem description.

Country(CountryName, Area, Population)

City(CountryName, CityName, Population)

Mountain(MountainName, Height)

Lake(LakeName, Area, Depth)

River(RiverName, Length)

Stretch(CountryName, RiverName, Length)

Portion(CountryName, LakeName, Proportion)

Did you remember to underline the primary key attributes in your own entity list?
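The City entity type is a useful reminder that an identifier may need more than one attribute, because a city name is only unique within its country. The sketch below illustrates this with a Python dictionary keyed by the pair (CountryName, CityName). The country names, city names and population figures are invented placeholders, and the function name is an assumption.

```python
# City(CountryName, CityName, Population) with a composite identifier.
cities = {}


def add_city(country_name, city_name, population):
    """Store a city under its composite identifier (country, city)."""
    key = (country_name, city_name)
    if key in cities:
        raise ValueError(f"duplicate city identifier: {key}")
    cities[key] = population


# Invented placeholder data: the same city name in two different countries.
add_city("Country A", "Springfield", 50_000)
add_city("Country B", "Springfield", 120_000)

print(len(cities))  # 2
```

Had CityName alone been used as the key, the second Springfield would have clashed with the first. With the composite key the two remain distinct.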



Q17:

The ER diagram in Figure 2.25 shows the ER model capturing the entities, the relationships and the degree of each relationship.

Figure 2.25:

Q18:

The figure below shows the ER model capturing additional information concerning participation in relationships. This represents the complete ER model for this tutorial.



The constraints and assumptions concerning the model are summarised below:

Constraints:

The total of all proportions for a lake, as fractions, must add up to 1.

The capital of a country must be contained within it.

A country need not have a mountain, lake or river.

A country may have more than one mountain, lake or river.

Assumptions:

If a river separates two countries, there is a separate stretch of river for each country. Thus the total length of all the stretches is not necessarily the same as the total length of a river (i.e. it is not derived data).

Any lake is completely included within the region being modelled, so there are no parts of the lake that are owned by some country not in the region.
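The first constraint above, that the proportions for a lake add up to 1, cannot be drawn on the ER diagram but is easy to check against data. The following is a minimal sketch with invented Portion occurrences; it uses exact fractions so that rounding does not affect the comparison, and the function name is an assumption.

```python
from fractions import Fraction

# Portion(CountryName, LakeName, Proportion) occurrences; values are invented.
portions = [
    ("Country A", "Lake X", Fraction(2, 3)),
    ("Country B", "Lake X", Fraction(1, 3)),
    ("Country A", "Lake Y", Fraction(1, 2)),
]


def lakes_violating_total(portions):
    """Return the lakes whose proportions do not add up to exactly 1."""
    totals = {}
    for _country, lake, proportion in portions:
        totals[lake] = totals.get(lake, Fraction(0)) + proportion
    return sorted(lake for lake, total in totals.items() if total != 1)


print(lakes_violating_total(portions))  # ['Lake Y']
```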

2.4.5 An ER Model of Timber Handling in the Sir Edward Kelly Case Study

Q19:

The following is a list of entities and associated attributes. The names of entities and attributes are chosen from the problem domain, so it is simple in reading the following descriptions to relate the information back to the data requirements description. It is feasible that you will come up with different names from those suggested below - that is fine, as long as the names you have chosen are also readily identifiable with the problem description. Did you remember to underline the primary key attributes in your own entity list?

PurchaseContract(ContractNo, ContractDate, ShippingDate, Description, Lengths, UnitPrice, CurrencyRestrictions)

ContractItem(ContractNo, BillOfLadingNo, ShipmentDate, Shipment, Comments)

ShippingSheet(ShippingSheetNo, ContractNo, Vessel, ShippingAgent, CustomsAgent, TotalVolume, DateShipped, Dock, ExQuayPeriod, PortOfShipment, FreightCharge, Insurance, ExchangeRate, DateArrived, Berth, ExQuayRate)

ShippingItem(ShippingSheetNo, BillOfLadingNo, Size, Quality, Type, NumPacks, NumPieces, Volume, Destination, Haulier, OrderNo, DateInStock/Invoiced)

BillOfLading(BillOfLadingNo, ContractNo, LoadingDate, Description, Size, Quality, L51, L47, L45, L42, L39, L36, L33, L30, L27, L24, L21, L18, NumPieces, TotalLength, Volume)

OutturnReport(BillOfLadingNo, NumPacks, NumPieces, Condition)

StockSheet(BillOfLadingNo, ContractNo, ShippingSheetNo, Stowage, Cost, Condition, Size, Quality, Type, Vessel)

StockItem(BillOfLadingNo, ReferenceNo, Date, L51, L47, L45, L42, L39, L36, L33, L30, L27, L24, L21, L18, NumPieces, NumPacks, Volume, Balance)

Q20:

The figure below shows the ER model capturing the entities, the relationships and the degree of each relationship.




Q21:

Figure 2.28 shows the ER model capturing additional information concerning participation in relationships. This represents the complete ER model of the Sir Edward Kelly timber handling side of the business for this tutorial.

The key parts of the modelling process are to identify the entities correctly, together with their attributes, and the relationships between them. The degree and participation conditions of the relationships are also important.

The layout of the diagram, and the names that you chose for the entity types, attributes and relationships are not so important. The aim in deciding on names is to be as unambiguous as possible.

Figure 2.28:

The assumptions concerning the model are summarised below:

Assumptions:

Assume that the bill of lading number is unique (i.e. no two contracts or shipping items have the same bill of lading number).
