Database design (Simon Fraser University)
The ER model
▪ Entity sets
▪ Relationship sets
▪ Weak entity sets
▪ Subclasses
▪ Aggregation
ER design issues
Until recently the relational data model was the prevalent model for databases
There have been two main challenges to this
▪ Object oriented programming
▪ Object models do not match well to relational models
▪ The rise of the web and big data
▪ Very large amounts of data are now stored and accessed in a distributed way
▪ Which makes it difficult for conventional relational databases to manage
An SQL database is a relational database where
the data model is based on set theory
NoSQL databases have different data models
▪ That are typically much looser with variable schema
▪ We will discuss NoSQL databases later in the course
The major focus is on relational databases
▪ For now, assume that an enterprise is to be modeled
as a relational database
Produce a set of tables that can store the data required by an enterprise
Relational database tables represent
▪ Entities, and
▪ Relationships between entities
A student database might contain
▪ The entities student and course and
▪ The relationship student takes course
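As a rough sketch of the idea above, the student database could be stored as three tables: one per entity and one for the relationship. The column names (student_id, name, course_id, title) and the sample rows are assumptions made for illustration; the slides only name the entities and the relationship. Python's built-in sqlite3 module is used so that the example is self-contained.

```python
import sqlite3

# In-memory database; all column names and sample values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id TEXT PRIMARY KEY, title TEXT);
    -- The relationship "student takes course" gets its own table
    CREATE TABLE takes (
        student_id INTEGER REFERENCES student(student_id),
        course_id  TEXT REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
""")
conn.execute("INSERT INTO student VALUES (1, 'Buffy Summers')")
conn.execute("INSERT INTO course VALUES ('CMPT 354', 'placeholder title')")
conn.execute("INSERT INTO takes VALUES (1, 'CMPT 354')")

# Join the relationship table back to the two entity tables
rows = conn.execute("""
    SELECT s.name, c.course_id
    FROM student s
    JOIN takes t ON s.student_id = t.student_id
    JOIN course c ON c.course_id = t.course_id
""").fetchall()
print(rows)
```

The takes table holds only the keys of the entities it connects, which is the pattern the later slides on relationship sets describe.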
Experienced designers may be able to directly determine the tables for small applications
Many database applications model large and complex enterprises
▪ In many cases no one person understands the complete requirements
▪ A DB designer should discuss requirements with the users
▪ Translate the requirements into a high level design
▪ Confirm with the users that the design is correct
▪ Translate the high level design into a database schema
Requirements Analysis
Conceptual Database Design
Logical Database Design
▪ Schema Refinement
Physical Database Design
Security Design
A database is intended to model a real world enterprise
▪ What data are to be stored in the database?
▪ Linguistic aside: data is plural, datum is singular
▪ What applications are required to work with the database?
▪ Which are the most frequent, and the most important operations?
Use the Entity Relationship (ER) model to develop a
high level description of the data
Identify the entities and relationships in the enterprise
Identify what information about these entities and
relationships is to be stored in the database
Identify the integrity constraints (or business rules) that
apply to the entities and relationships
Check with the client that the ER model that has been
developed is correct
Repeat as necessary
Determine which data model should be used to implement the database
Determine which DBMS to use
▪ In most cases this means deciding which existing DBMS product to purchase or license
Map, or translate, the conceptual schema to a database schema of the chosen model
▪ We will specify the conceptual schema by drawing an Entity Relationship diagram
The designer has to decide how to represent the stuff to be modeled in the database
▪ Items to be represented in a database are referred to as entities
▪ Connections between items are referred to as relationships
▪ e.g. A customer (entity) buys (relationship) a product (entity)
There are two major problems to be avoided
▪ Redundancy – information should not be repeated
▪ Incompleteness – it should be possible to record all the desired information
The ER model has similarities to other modeling languages like UML
The major components of the ER Model are
▪ Entities such as people, cars, accounts, things, …
▪ Attributes that describe the entities, e.g. name, age, amount, date, …
▪ Relationships that connect the entities, e.g. customer owns account, student takes course
▪ Constraints which restrict relationships, e.g. an account must be owned by a customer
Entity-relationship diagrams show the structure of a database graphically
▪ Simple symbols: rectangles, diamonds, ovals and lines represent the components of the ER model
▪ They are straightforward and easy to explain to users
There are many variations of ER diagrams
▪ So don't expect the symbols in every ER diagram you see to be exactly the same!
▪ Some common variations are discussed at the end of this presentation
And Attributes
An entity is a distinguishable object
▪ That is, it can be distinguished from other objects
▪ e.g. customer Buffy Summers, or course CMPT 354
May be concrete or abstract
An entity set is a collection of entities of the same type
▪ e.g. all of the customers, or all of SFU's courses
Entity sets need not be disjoint
▪ That is, two entity sets can contain the same entity
▪ e.g. Buffy Summers could be an entity in both the Customer and Employee entity sets
An entity is described by a set of attributes
▪ Most of the data stored in a database are attribute values
▪ If there is no value for an attribute it is given the null value
▪ Some attributes may have their values derived from other attributes
Each attribute has a domain
▪ A set of all possible attribute values
▪ Attributes may be considered as functions that map an entity set to a domain
[Figure: the entity set Customer with attributes firstName, lastName, birthday and income]
A database is usually designed starting with a high level description from the users
▪ Who are unlikely to be familiar with relational databases
Most attributes are single valued data
▪ First name, student ID, height, grade, …
Attributes may also be composite, multi-valued or derived
Composite attributes are divided into sub-parts
▪ e.g. address is composed of city, street and number
They group related attributes together
Composite attributes should be replaced with their sub-parts
▪ This makes it easier to derive a set of tables from the ERD
[Figure: composite attribute address with sub-parts city, street and number]
A single entity may have multiple attributes all of the same type – a multi-valued attribute
▪ e.g. phone numbers or cars
Multi-valued attributes should usually be replaced by an entity set
▪ And a relationship created to represent the connection between the two sets
The value of a derived attribute can be derived from other values
▪ Belonging to related attributes or entities
▪ e.g. employeeCount for a department
▪ Calculated by counting the number of employees in the department
Derived attributes do not need to be stored in the database
▪ They can be calculated when required
▪ Indicated in the ERD with a broken line around the oval
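A minimal sketch of a derived attribute, assuming a small in-memory list of employees (the SINs and department names are made up): empCount is not stored anywhere, it is calculated from the related Employee entities when asked for.

```python
# Hypothetical sample data: (employee SIN, department name)
employees = [
    ("111", "Sales"),
    ("222", "Sales"),
    ("333", "Research"),
]

def emp_count(department):
    """Derive empCount on demand by counting the department's employees."""
    return sum(1 for _, dept in employees if dept == department)

print(emp_count("Sales"))
```

Because the value is recomputed each time, it cannot drift out of step with the employee data, at the cost of doing the count on every request.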
[Figure: derived attribute empCount, drawn with a broken line around the oval]
Differences between entities in an entity set are expressed in terms of their attributes
▪ That is, you cannot have two different entities that have the same values for every one of their attributes
▪ As they would represent the same real-world entity (but see weak entity sets later)
▪ In contrast to OOP where every object has a unique object ID independent from the values of its variables
A key is a set of attributes whose values uniquely identify an entity in an entity set
Consider CMPT 354 D200, with attributes:
▪ Department – CMPT
▪ Number – 354
▪ Offering – D200
▪ Class number – 7629
▪ Term – Fall 2017
▪ Instructor – John Edgar
▪ Exam date and time – 13/12/2016, 9:30
▪ Enrollment – 86
▪ Classroom – SUR5140
▪ Meeting times – MWF, 9:30 am
▪ …
Note that this table is in neither 3rd Normal Form nor BCNF, but we have not covered normalization yet …
A superkey is any set of attributes whose values
uniquely identify an entity in an entity set
▪ An entity set can have many different superkeys
▪ Additional superkeys can be created by taking a key attribute
and adding other attributes
For example, SIN is a superkey for the Citizen
entity set
▪ {SIN, last name} is also a superkey
▪ As is any set of attributes that includes SIN
A superkey is any set of attributes that uniquely identifies a course offering
▪ Class number*
▪ Department, number, offering, term, instructor
▪ Instructor, term, meeting times, exam date
Some of the attributes in the above examples are not required to identify course offerings
*Let's assume that this is true – in practice we may re-use these numbers from semester to semester
A candidate key is a minimal superkey, that is a superkey with no extraneous attributes
▪ Extraneous means unnecessary
▪ It is not necessarily the superkey with the fewest attributes
▪ In the Citizen superkey {SIN, last name} the last name attribute is not required to identify citizens
▪ And is therefore extraneous
A relation can have more than one candidate key
A candidate key is a minimal superkey
▪ Class number
▪ Department, number, offering, term
▪ There can be only one offering of CMPT 354 D200 in any one term
▪ Instructor, term, meeting times
▪ I can't be assigned to teach two courses in the same term at the same time
▪ At least, I hope not
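The superkey and candidate key definitions can be sketched as two small functions over sample rows. This is only an illustration: keys are a statement about every legal instance of an entity set, so a check against sample data can refute a proposed key but never prove one. The Citizen rows below are made up.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, all_attrs):
    """Minimal superkeys: superkeys that contain no smaller superkey."""
    keys = []
    # Smaller attribute sets are tried first, so every minimal key is found
    # before any of its supersets is examined.
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            contains_smaller_key = any(set(k) <= set(attrs) for k in keys)
            if is_superkey(rows, attrs) and not contains_smaller_key:
                keys.append(attrs)
    return keys

# Hypothetical Citizen instances
citizens = [
    {"sin": 1, "lastName": "Summers"},
    {"sin": 2, "lastName": "Summers"},
    {"sin": 3, "lastName": "Harris"},
]

print(is_superkey(citizens, ("sin", "lastName")))
print(candidate_keys(citizens, ("sin", "lastName")))
```

On this sample {sin, lastName} passes the superkey check but is not reported as a candidate key, because lastName is extraneous once sin is present.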
A primary key is the candidate key chosen to
refer to rows in a table
▪ Chosen by the database designer
▪ A table may have only one primary key
▪ Primary key attributes are underlined in an ERD
CMPT 354 D200 primary key
▪ Pick one of the candidate keys
▪ Probably class number
A primary key must be a candidate key
▪ i.e. a minimal superkey
The primary key should be chosen so that its attributes never (or very rarely) change
▪ Including an address as part of a primary key is therefore not recommended
▪ SINs make good primary keys
It is sometimes useful to generate a unique primary key for entities
▪ Which is what class number is
A relationship is an association between two or
more entities
A relationship set is a set of relationships of the
same type
Relationship sets may have descriptive attributes
▪ Such attributes cannot be part of a relationship set’s
primary key
▪ Useful for properties that can’t be associated with
either of the participating entity sets
An n-ary relationship set R is an association between n entity sets E1 … En
▪ R is a subset of {(e1, …, en) | e1 ∈ E1, …, en ∈ En}
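A small sketch of this definition for n = 2, using made-up entity identifiers: the set of all possible pairs is the Cartesian product of the two entity sets, and a relationship set is any subset of it.

```python
from itertools import product

# Hypothetical entity sets and relationship set
employees = {"e1", "e2"}
projects = {"p1", "p2"}
works_on = {("e1", "p1"), ("e1", "p2"), ("e2", "p1")}

# Every tuple that could possibly appear in a relationship set between the two
all_pairs = set(product(employees, projects))

# A valid relationship set only contains tuples drawn from that product
is_valid = works_on <= all_pairs
print(len(all_pairs), is_valid)
```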
Binary relationships are the most common
▪ i.e. relationships between two entity sets
▪ Which can be the same entity set!
▪ Ternary relationships are not unusual
The role of an entity is the function that it plays in a relationship
[Figure: Employee (sin, name, salary) worksOn Project (projectName, budget, cost); worksOn has the descriptive attribute start]
start date is when an employee started working on a project; an employee can work on many projects so it must be an attribute of the relationship
Note the “verb” name
[Figure: Employee (sin, firstName, lastName, salary) related to itself by manages, with the roles manager and subordinate]
A relationship is an association between two (or more) entities
Let there be a binary relationship, R, between two entity sets, A and B
▪ Mapping cardinalities specify how many entities of set B may be associated with one entity in set A, and vice versa
▪ Or, the number of relationships of set R that may be participated in by one entity in set A (or B)
▪ These two statements mean the same for binary relationships only
▪ A key constraint exists between A and R if A entities can only be associated with a single B entity
An entity in entity set A can be associated with at most one entity in entity set B and
▪ An entity in entity set B can be associated with at most one entity in entity set A
Key constraints are specified in an ER diagram by directed lines (arrows)
[Figure: one-to-one relationship set R between entity sets A and B]
An entity in entity set A can be associated with any number of entities in entity set B and
▪ An entity in entity set B can be associated with at most one entity in entity set A
The directed line from B indicates the one key constraint
[Figure: one-to-many relationship set R between entity sets A and B]
Read one-to-many as an abbreviated sentence (one a can be related to many bs), not as a "one" side and a "many" side
[Figure: many-to-one relationship set R between entity sets A and B]
An entity in entity set A can be associated with at most one entity in entity set B and
▪ An entity in entity set B can be associated with any number of entities in entity set A
It's the same as one to many but with the roles reversed
▪ An a can be associated with at most one b
[Figure: many-to-many relationship set R between entity sets A and B]
An entity in entity set A can be associated with any number of entities in entity set B and
▪ An entity in entity set B can be associated with any number of entities in entity set A
An unconstrained relationship
Mapping cardinalities are shown in a relationship set by directed and undirected lines
▪ Arrows point from an entity set to the relationship set
In a relationship R between entity sets A and B, an arrow from A to R indicates that
▪ Entities in A can be involved in only one relationship in R
▪ i.e. be related to at most one entity in entity set B
An employee can work in only one branch
A branch can have many employees working in it
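The four mapping cardinalities can be sketched as a check over a set of (a, b) pairs. This is an illustration under stated assumptions: the function name and sample pairs are made up, and it only reports the tightest cardinality that a given instance is consistent with. The real constraint comes from the requirements, not from whatever data happens to be present.

```python
from collections import defaultdict

def mapping_cardinality(pairs):
    """Classify a binary relationship instance given as (a, b) pairs."""
    bs_per_a = defaultdict(set)
    as_per_b = defaultdict(set)
    for a, b in pairs:
        bs_per_a[a].add(b)
        as_per_b[b].add(a)
    # Key constraint on A: each a is related to at most one b
    a_limited = all(len(bs) <= 1 for bs in bs_per_a.values())
    # Key constraint on B: each b is related to at most one a
    b_limited = all(len(a_set) <= 1 for a_set in as_per_b.values())
    if a_limited and b_limited:
        return "one-to-one"
    if b_limited:
        return "one-to-many"   # one a may be related to many bs
    if a_limited:
        return "many-to-one"   # many as may be related to one b
    return "many-to-many"

# Hypothetical worksIn instance: (employee, branch)
works_in = [("e1", "b1"), ("e2", "b1"), ("e3", "b2")]
print(mapping_cardinality(works_in))
```

Two employees share branch b1 while each employee has a single branch, so the sample is reported as many-to-one from Employee to Branch.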
[Figure: many-to-one relationship worksIn between Employee (sin, name, salary) and Branch (branchName, address, budget); a branch can have many employees but an employee can work in only one branch]
Basic approach to creating a relational DB
▪ Each relationship and entity is represented by a separate table
▪ It’s a little more complex than this …
Relationships in an ERD should not be given attributes
▪ Except descriptive attributes
▪ Descriptive attributes do not form part of the primary key
▪ Relationships inherit the primary key attributes of the entities that participate in them
The primary key of a relationship set depends on the key constraints in the relationship set
▪ Many-to-many – all the relationship set's non-descriptive attributes
▪ A compound primary key consisting of the primary keys from both entities
▪ One-to-many or many-to-one – the primary key for the entity set with the key constraint
▪ One-to-one – the primary key of either entity set
Primary key attributes of a relationship set are derived from one or more of its participating entity sets
▪ And cannot include descriptive attributes that belong directly to the relationship set
The attributes of worksIn are sin and branchName
▪ They should not be shown as attributes of the relationship set in the ERD
▪ The relationship set has no descriptive attributes
The primary key of worksIn is sin
▪ Note that {sin, branchName} is a superkey but not a candidate key
▪ Employees can only work in one branch, so can appear only once in
worksIn
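A sketch of what choosing sin as the primary key of worksIn means in practice, using Python's built-in sqlite3 module. The column types and the sample employee and branches are assumptions; only the attribute names come from the ERD. With sin as the key, a second worksIn row for the same employee is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (sin TEXT PRIMARY KEY, name TEXT, salary REAL);
    CREATE TABLE branch (branchName TEXT PRIMARY KEY, address TEXT, budget REAL);
    -- Many-to-one: the key comes from the entity set with the key constraint
    CREATE TABLE worksIn (
        sin TEXT PRIMARY KEY REFERENCES employee(sin),
        branchName TEXT NOT NULL REFERENCES branch(branchName)
    );
    -- Hypothetical sample data
    INSERT INTO employee VALUES ('111', 'Buffy', 50000);
    INSERT INTO branch VALUES ('Burnaby', '1 Main St', 100000);
    INSERT INTO branch VALUES ('Surrey', '2 King St', 80000);
    INSERT INTO worksIn VALUES ('111', 'Burnaby');
""")

try:
    # The same employee in a second branch violates the primary key
    conn.execute("INSERT INTO worksIn VALUES ('111', 'Surrey')")
    second_insert_ok = True
except sqlite3.IntegrityError:
    second_insert_ok = False

print(second_insert_ok)
```

Had the key been {sin, branchName}, the second insert would have been accepted, which is why that superkey does not capture the constraint.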
[Figure: worksIn relationship between Employee (sin, name, salary) and Branch (branchName, address, budget)]
Indicate that each entity in an entity set must be involved in at least one relationship
Participation is said to be either total (there is a constraint) or partial (no constraint)
▪ Don't derive the existence of a participation constraint from a current instance
▪ If there is no participation constraint all the entities may still be involved in the relationship
Total participation is indicated by a double line from the relationship to the entity
▪ Or a thick line
Each account must be owned by at least one customer
[Figure: Customer (sin, firstName, lastName, birthdate, income) owns Account (accountNumber, type, balance), with total participation of Account in owns]
A law firm's lawyers are assigned to represent clients' interests
[Figure: Lawyer represents Client]
Children attend pre-school, assume that
▪ A pre-school must have children
▪ Children can only attend one pre-school
[Figure: Child attends Pre-school]
Employees work for companies
▪ Employees must be employed by someone
▪ Or they wouldn't be employees
▪ People may have more than one job
[Figure: Employee worksFor Company]
Chief Executive Officers (CEOs) of a company
▪ Being a CEO is a full time position and
▪ A company can only have one CEO
▪ One person must be in charge
▪ So no coops or workers' collectives
[Figure: Employee is_CEO Company]
Bank accounts and branches
▪ A branch can have many accounts
▪ An account must be held at a single branch
[Figure: Branch holds Account]
Rooms in a house
▪ This one should be easy …
But, how do we identify a room?
▪ What are its attributes and its primary key?
[Figure: House contains Room]
A relationship does not have to be binary but can include any number of entity sets
▪ Ternary relationships are not uncommon
Specifying the mapping cardinalities of non-binary relationships can be complicated
A key constraint indicates that an entity can participate in only one relationship
▪ Just like with binary relationships
▪ The precise meaning varies between ERD versions
[Figure: ternary relationship purchases between Branch, Supplier and Part, with no key constraints; attributes shown: company, branchName, branchPhone, budget, street, city, supplierPhone, partID, partName]
▪ A branch purchases parts from suppliers
▪ There are no cardinality constraints, implying that a branch can order the same part from multiple suppliers
▪ The ternary relationship records which suppliers parts are bought from
[Figure: the same purchases relationship with a key constraint added]
▪ The key constraint means that a branch can buy a single part from a single supplier … which makes no sense whatsoever …
▪ Is it possible to represent that a branch can buy multiple parts but must buy each part from only one supplier?
Weak Entity Sets
▪ Where the entities cannot be uniquely identified without information from a related entity set
Subclasses
▪ Class hierarchies of entities
Aggregation
▪ Used to model relationships that exist between entities and relationships
A weak entity cannot be identified by its own attributes alone
▪ A member of a weak entity set is identified by combining its partial key with the primary key of another entity set
▪ The other entity set is referred to as the owner entity set
Weak entity sets are permitted only when
▪ The owner and weak entity set participate in a one-to-many identifying relationship set and
▪ The weak entity set has total participation in the identifying relationship
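A sketch of how a weak entity set might end up as a table, using the House and Room example and Python's built-in sqlite3 module. It assumes, for illustration only, that House is identified by (number, street, city); the slides do not say which House attributes form its key. Room's primary key combines the owner's key with the partial key name, and rooms are removed along with their house.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    -- Owner entity set; the choice of key is an assumption
    CREATE TABLE house (
        number INTEGER, street TEXT, city TEXT,
        PRIMARY KEY (number, street, city)
    );
    -- Weak entity set: owner's key plus the partial key (name)
    CREATE TABLE room (
        number INTEGER NOT NULL, street TEXT NOT NULL, city TEXT NOT NULL,
        name TEXT NOT NULL, squareFeet REAL,
        PRIMARY KEY (number, street, city, name),
        FOREIGN KEY (number, street, city)
            REFERENCES house(number, street, city) ON DELETE CASCADE
    );
""")

# Hypothetical data: two houses, each with a room called 'kitchen'
conn.execute("INSERT INTO house VALUES (1, 'Main St', 'Burnaby')")
conn.execute("INSERT INTO house VALUES (2, 'Main St', 'Burnaby')")
conn.execute("INSERT INTO room VALUES (1, 'Main St', 'Burnaby', 'kitchen', 120)")
conn.execute("INSERT INTO room VALUES (2, 'Main St', 'Burnaby', 'kitchen', 95)")

# Deleting the owner removes the rooms that depend on it
conn.execute("DELETE FROM house WHERE number = 1")
remaining = conn.execute("SELECT number, name FROM room").fetchall()
print(remaining)
```

The NOT NULL foreign key columns reflect total participation: a room cannot exist without a house. The two kitchens show that the partial key alone does not identify a room.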
[Figure: House (number, street, city) contains Room (name, squareFeet); the diagram marks Room as the weak entity set, contains as its identifying relationship, and name as the partial key]
It may be useful to classify the entities in an entity set
into subclasses
▪ Similar to subclassing in OOP
▪ Each entity in a subclass is also an entity in the superclass
The attributes of the superclass entity are inherited by
the subclass entities
▪ The subclass entity may also have additional attributes
Class hierarchies can have multiple levels
The subclass relationship is sometimes referred to as an “is a” relationship
▪ A “is a” B (A is the subclass, B the superclass)
▪ A specializes B or
▪ B generalizes A
Specialization is the process of identifying subsets with additional attributes from an existing entity set
Generalization is the process of identifying common characteristics (attributes) of entity sets
▪ And creating a new parent entity set with those attributes
The attributes of parent entity sets are inherited by their subclasses
▪ A subclass entity set is therefore described by its attributes and the attributes of its superclass(es)
Subclass entities also inherit participation in superclass relationship sets
▪ Since an entity in a subclass is the same entity as one of the superclass entities
▪ e.g. Xander Harris is an employee and a foreman
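[Figure: class hierarchy joined by ISA triangles, with entity sets Building, RentalProperty, SRO and HeritageBuilding and attributes number, street, city, units and age; the ISA triangle identifies a class hierarchy]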
The participation of entities in a subclass may be condition-defined or user-defined
If participation is condition-defined an entity is only a member of a subclass if it meets some condition
▪ e.g. only accounts with type savings are included in the savings_account entity set
If participation is user-defined an entity is assigned to a subclass by the database user
▪ e.g. employees are assigned to the manager subclass by the database user (presumably when they are promoted)
It is often useful to specify whether or not an entity can belong to more than one subclass
Subclasses are either disjoint or overlapping
▪ If subclasses are disjoint, entities in one subclass cannot appear in another subclass
▪ e.g. The animal subclasses Bird and Mammal
▪ If subclasses overlap then entities in one subclass may also appear in another subclass
▪ e.g. The Person subclasses Customer and Employee
By default subclasses are disjoint
▪ If overlap is allowed this should be noted on the ERD
A coverage constraint indicates that each superclass entity must belong to a subclass
▪ Can be specified on the ER diagram by drawing a double line from the superclass to the triangle
▪ e.g. a Vehicles class has cars and boats subclasses without a coverage constraint
▪ That is, there may be vehicles that are neither cars nor boats
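The overlap and coverage constraints can be sketched as set checks, treating each class as a set of entity identifiers. The function names and sample entities are assumptions made for this example.

```python
def is_disjoint(*subclasses):
    """True if no entity appears in more than one subclass."""
    seen = set()
    for members in subclasses:
        if seen & members:
            return False
        seen |= members
    return True

def covers(superclass, *subclasses):
    """True if every superclass entity belongs to at least one subclass."""
    return superclass <= set().union(*subclasses)

# Hypothetical instances
vehicles = {"car1", "boat1", "bike1"}
cars = {"car1"}
boats = {"boat1"}
customers = {"Buffy", "Xander"}
employees = {"Buffy"}

print(is_disjoint(cars, boats))           # cars and boats share no entity
print(covers(vehicles, cars, boats))      # bike1 is neither a car nor a boat
print(is_disjoint(customers, employees))  # Buffy is in both subclasses
```

As with participation constraints, a check like this can show that data violates a constraint, but the constraint itself has to come from the requirements.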
So that attributes in common don’t have to be re-defined for each subclass
So that additional descriptive attributes can be added to subclasses
To identify the set of entities that participate in a particular relationship
▪ i.e. subclasses can be created to identify a relationship with another entity set
Consider the ternary relationship shown below
A branch can buy multiple parts from multiple suppliers
But branches can only purchase a part from one supplier
▪ The Mordor branch buys helmets from the Haradrim and
▪ Cleavers from the goblins
▪ But can't buy helmets from another supplier (the goblins for example)
[Figure: ternary relationship purchases between Branch, Supplier and Part]
Adding a key constraint to any of the participating entities will not capture this situation
We can replace the ternary relationship with two or three binary relationships
▪ But we still haven’t captured the constraint
The relationship pairs are not forced to relate to each other
▪ Mordor may buy helmets and buy (something) from the Haradrim
▪ But the Haradrim may not make helmets!
[Figure: binary relationships makes, uses and buys between Branch, Supplier and Part]
Again, adding a key constraint to any of the participating entities will not capture the constraint
Indicates that a relationship set participates in
another relationship set
▪ An abstraction that treats relationship sets as higher-level
entities
▪ For the purpose of participation in other relationships
When should aggregation be used?
▪ When there is a relationship between an entity set and
another relationship
▪ Aggregation is often used with non-binary relationships
Treat the buys relationship as an aggregate entity
▪ Each buys relationship can be associated with just one supplier
▪ Mordor can only buy helmets from one supplier but can buy cleavers from a different supplier
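One way the aggregation might look as tables, sketched with Python's built-in sqlite3 module. Plain text columns stand in for the Branch, Part and Supplier entity sets, which is a simplification made for this example. The uses table refers to a buys row as a unit, and its primary key (branch, part) allows just one supplier per buys relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    -- The aggregated relationship: which branch buys which part
    CREATE TABLE buys (
        branch TEXT, part TEXT,
        PRIMARY KEY (branch, part)
    );
    -- uses relates each buys row (not a branch or part alone) to one supplier
    CREATE TABLE uses (
        branch TEXT, part TEXT, supplier TEXT NOT NULL,
        PRIMARY KEY (branch, part),
        FOREIGN KEY (branch, part) REFERENCES buys(branch, part)
    );
    INSERT INTO buys VALUES ('Mordor', 'helmet');
    INSERT INTO buys VALUES ('Mordor', 'cleaver');
    INSERT INTO uses VALUES ('Mordor', 'helmet', 'Haradrim');
    INSERT INTO uses VALUES ('Mordor', 'cleaver', 'Goblins');
""")

try:
    # A second supplier for Mordor's helmets violates the key of uses
    conn.execute("INSERT INTO uses VALUES ('Mordor', 'helmet', 'Goblins')")
    second_supplier_allowed = True
except sqlite3.IntegrityError:
    second_supplier_allowed = False

print(second_supplier_allowed)
```

Mordor still buys cleavers from a different supplier, which is the situation that a key constraint on the plain ternary relationship could not express.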
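[Figure: Branch buys Part, enclosed in a rectangle; the enclosed aggregate is connected to Supplier by the uses relationship]
A rectangle is drawn around the relationship and its entities to indicate that it is being treated as an aggregate entity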
Faithfulness
▪ The design must be faithful to the specification and the enterprise that is being represented
▪ All the relevant aspects of the enterprise should be represented in the model
Avoid redundancy
▪ Redundant representation makes the ER diagram harder to understand
▪ Redundancy wastes storage in the DB and
▪ May lead to inconsistencies
Simplicity
▪ The simpler it is, the easier it is to understand
▪ Avoid introducing unnecessary entities or relationships
▪ Where possible use attributes rather than entity sets or relationships
Specify as many constraints as possible
▪ Ensure that all key constraints and participation constraints are specified
▪ Some constraints cannot be shown in ER diagrams
Entity set or attribute?
Entity set or relationship set?
What sort of relationship is it?
▪ Multiple binary relationships
▪ Ternary relationship
▪ Aggregation
The distinction between entity set and attributes depends mostly on what is being modeled
▪ In one situation some object may be an attribute, and an entity in another
▪ e.g. consider an office
▪ if only the office number is to be recorded it would be appropriate to make it an attribute
▪ if other data about the office (such as its dimensions) are to be recorded it makes sense to make it an entity
One important consideration is that attributes that belong to relationships must be descriptive
▪ That is, they cannot form part of the primary key
An employee can, over time, work in more than one branch and this historical data is to be maintained
What possible situation can this ERD not capture?
▪ An employee may work in the same branch twice
[Figure: Employee (sin, firstName, lastName, salary) worksIn Branch (branchID, branchName, budget), with descriptive attributes from and to on worksIn]
[Figure: the same ERD with from and to moved to a new Duration entity set that also participates in works_In]
The Duration entity set may seem arbitrary but this ERD captures the required data
Usually relationship sets are verbs, and entity sets nouns but this sometimes does not work!
▪ e.g. is a loan an entity in its own right or is it a relationship between borrower and lender?
A relationship set with descriptive attributes may be more appropriately modeled as an entity set
▪ Consider two entity sets, Employee and Project and
▪ A relationship set called manages which includes
▪ a startDate attribute (the date the manager started), and
▪ a discretionary budget for the manager
▪ startDate and budget are descriptive attributes of manages
The ERD shown below works as long as the manager gets a separate budget for each project he/she manages
But what if the manager gets one budget regardless of the number of projects?
▪ The ERD leads to redundancy and
▪ Is misleading
[Figure: Employee manages Project, with descriptive attributes budget and startDate on manages]
In this version we associate the budget attribute with the Employee entity set
▪ As each manager only has one budget
But, most employees are not managers
▪ So most Employee records will have a null value for their budget attribute, which is undesirable
[Figure: Employee (budget) manages Project, with descriptive attribute startDate on manages]
One solution is to create an entity to record the attributes associated with managing a project
[Figure: a new Manager entity set with attributes managerID and budget]
Another solution is to subclass Employee
[Figure: manages relationship between Employee and Project, with attribute startDate]
Some relationships may appear to be non binary
▪ But can be better represented by several binary relationships
It is always possible to replace a non binary relationship with multiple binary relationships
▪ This may entail creating identifying attributes, and
▪ It may not be possible to translate constraints on a ternary relationship
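[Figure: ternary relationship r between entity sets A, B and C]
A new entity (E) is created, which also entails creating an identifying attribute for E
[Figure: entity set E connected to A, B and C by binary relationships rA, rB and rC]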
Consider this situation – we are to record information about the prices of parts for a company's branches
▪ There are entity sets for Part, Branch and Supplier
A descriptive attribute, price, is required on the ternary relationship contract
▪ Different branches may get different prices for the same part from the same supplier
▪ The same branch always gets the same price for the same part from a particular supplier (but not different suppliers)
Can this be captured by binary relationships?
[Figure: ternary relationship c between entity sets P, S and B, with descriptive attribute price]
[Figure: entity set C (id, price) connected to P, S and B by binary relationships rP, rS and rB]
▪ The new entity, contract, has a unique ID and is given the descriptive attribute, price
▪ The cardinality constraints show that each contract has just one branch, part, and supplier
▪ The participation constraints show that a contract must have a branch, part, and supplier
But, input errors are more likely: contract {b1, p1, s1} now requires three separate relationships
There are a number of alternative ER notations
▪ The most significant differences relate to how relationships and their cardinalities are expressed
▪ It is important to recognize this, and to check which version of an ER diagram is being used
Some versions allow composite and multi-valued attributes
▪ This increases the expressive power of the model
▪ But makes it harder to translate the diagrams into a relational schema
[Figure: alternative notations for many-to-many, one-to-one and many-to-one relationship sets R between A and B, including lines annotated with * and 1, and with M, N and 1]
ER diagrams allow more complex cardinality constraints to be specified
Each line between entity and relationship set can be annotated
▪ With the minimum and maximum cardinality
▪ Shown as l..h (lower to higher)
▪ l is the minimum and h the maximum number of relationships that an entity can be involved in
▪ A minimum value of 1 indicates total participation
▪ A maximum value of * indicates no limit
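The l..h annotation can be sketched as a check that counts how many relationships each entity takes part in. The function name, the use of None to stand for *, and the sample data are assumptions made for this example.

```python
from collections import Counter

def cardinality_violations(entities, participants, low, high=None):
    """Return the entities whose relationship count falls outside low..high.

    participants lists one entry per relationship an entity takes part in;
    high=None stands for *, i.e. no upper limit.
    """
    counts = Counter(participants)
    return sorted(
        e for e in entities
        if counts[e] < low or (high is not None and counts[e] > high)
    )

# Hypothetical works_in instance: (employee, branch)
works_in = [("e1", "b1"), ("e2", "b1")]
employees = {"e1", "e2", "e3"}
branches = {"b1", "b2"}

# Employee side annotated 1..1: e3 works nowhere, so it violates the minimum
print(cardinality_violations(employees, [e for e, _ in works_in], 1, 1))
# Branch side annotated 0..*: nothing can violate it
print(cardinality_violations(branches, [b for _, b in works_in], 0))
```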
[Figure: Employee (sin, name, salary) works_in Branch (branchName, city, budget); the lines are annotated 1..1 and 0..*]
An employee must work in exactly one branch
A branch may employ any number of employees, including zero
Conceptual design follows requirements analysis
ER model often used for conceptual design
▪ Relatively easy to understand and explain
▪ There are many variations of this model
Basic ideas: entities, relationships and attributes (of entities and relationships)
Additional features
▪ Weak entity sets
▪ Subclasses
▪ Aggregations
The models can express several kinds of integrity constraints:
▪ Cardinality constraints, participation constraints, and overlap and coverage constraints for class hierarchies
Some constraints cannot be expressed in the ER model
▪ Constraints are important when designing a DB
ER design is subjective, no one best way
▪ Although there can be many poor ways!
ER diagrams are mapped to the schema of the chosen data model (e.g. relational model)