Relational Database for Forest
Inventory Data structuring
Dr. Fouad MOUNIR
National Forest School of Engineers – Salé
Morocco
ENFI
Objectives
Understand the definition of a modern
relational database
Understand and be able to apply a
practical method for designing databases
Recognize and avoid common pitfalls of
database design
Phases of Database Design
Conceptual design begins with
the collection of requirements and
the results needed from the database
(ER diagram)
Logical schema is a description of
the structure of the database
(relational, network, etc.)
Physical schema is a description
of the implementation (programs,
tables, dictionaries, catalogs)
[Diagram: Data requirements → Specification of requirements and results → Conceptual design → Conceptual schema → Logical design → Logical schema → Physical design → Physical schema]
Models
A data model is a collection of objects
that can be used to represent a set of data
and operations to manipulate the data
Conceptual models are tools for
representing reality at a very high-level of
abstraction
Logical models are data descriptions that
can be processed by computers
What’s a database?
A collection of logically-related information
stored in a consistent fashion
The storage format typically appears to users as
some kind of tabular list (table, spreadsheet)
What Does a Database Do?
Stores information in a highly organized manner
Manipulates information in various ways, some
of which are not available in other applications
or are easier to accomplish with a database
Models some real world process or activity
through electronic means
◦ Often called modeling a business process
◦ Often replicates the process only in appearance or
end result
Databases and the Systems which manage them
Modern electronic databases are created and managed by means of RDBMSs: Relational DataBase Management Systems
An individual data storage structure created with an RDBMS is typically called a “database”
A database and its attendant views, reports, and procedures is called an “application”
Database Applications
Database (the actual DB with its
attendant storage structure)
SQL Engine - interprets between the
database and the interface/application
Interface or application – the part the
user gets to see and use
Relational Database Management Systems
Mid-level
◦ Microsoft Access, Lotus Approach, Borland’s Paradox
◦ More or less total control of design allows custom builds
High-end
◦ Oracle, Microsoft SQL Server, Sybase, IBM DB2
◦ Professional level DBs: Banks, e-commerce, secure
◦ Amazon.com, Ebay.com, Yahoo.com
Conceptual design :
Entity/Relationship model
Problems with Bad Design
Early computers were slow and had
limited storage capacity
Redundant or repeating data slowed
operations and took up too much
precious storage space
Poor design increased chance of data
errors, lost or orphaned information
Benefits of Good Design
Computers today are faster and possess much larger storage devices
Rigid structure of modern relational databases helped codify problems and solutions
Design problems are still possible, because the DBMS software won’t protect you from poor practices
Good design still increases efficiency of data processes, reduces waste of storage, and helps eliminate data entry errors
Modification Anomalies
A search for “General Tool Co.” would miss “General Tool” and “General Toll”. A case-sensitive search for “Totally
Toys” would miss “TOTALLY TOYS”
Table: Customers_Orders_Inventory
Customer          OrderNum  ItemNum  Item
General Tool      07456     2246     Pentium Computer
General Toll      08622     3145     HP Printer
General Tool Co.  08622     3967     17” monitor
Totally Toys      06755     2246     Pentium computer
TOTALLY TOYS      08134     3145     Hewlett-Packard Printer
XYZ Inc.          09010     0446     Dot Matrix Printer
Insertion Anomalies
How would you enter a new item into your
inventory if no one had ordered it yet?
Table: Customers_Orders_Inventory
Customer          OrderNum  ItemNum  Item
General Tool      07456     2246     Pentium Computer
General Toll      08622     3145     HP Printer
General Tool Co.  08622     3967     17” monitor
Totally Toys      06755     2246     Pentium computer
TOTALLY TOYS      08134     3145     Hewlett-Packard Printer
XYZ Inc.          09010     0446     Dot Matrix Printer
Deletion Anomalies
If you wanted to stop selling “dot matrix printer” and remove it from your inventory, you would have to delete the order and customer info for “XYZ Inc.”
Table: Customers_Orders_Inventory
Customer          OrderNum  ItemNum  Item
General Tool      07456     2246     Pentium Computer
General Toll      08622     3145     HP Printer
General Tool Co.  08622     3967     17” monitor
Totally Toys      06755     2246     Pentium computer
TOTALLY TOYS      08134     3145     Hewlett-Packard Printer
XYZ Inc.          09010     0446     Dot Matrix Printer
The Fix
Table: Order_Items
OrderNum  ItemNum
06755     2246
07456     2246
08134     3145
08622     3145
08622     3967
09010     0446
Table: Orders
CustomerNum  OrderNum
7822         09010
8755         06755
8755         08134
9123         07456
9123         08622
Table: Customers
CustomerNum  Customer
7822         XYZ Inc.
8755         Totally Toys
9123         General Tool Co.
Table: Products
ItemNum  Item
0446     Dot Matrix Printer
2246     Pentium Computer
3145     Hewlett-Packard printer
3967     17” monitor
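The four-table fix can be tried out with a small sketch. It assumes Python's built-in sqlite3 module and an in-memory database; the column types and the extra product 5001 are illustrative assumptions, not part of the slides.

```python
import sqlite3

# In-memory database; table and column names follow the four tables of the fix.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers   (CustomerNum TEXT PRIMARY KEY, Customer TEXT NOT NULL);
CREATE TABLE Products    (ItemNum TEXT PRIMARY KEY, Item TEXT NOT NULL);
CREATE TABLE Orders      (OrderNum TEXT PRIMARY KEY,
                          CustomerNum TEXT NOT NULL REFERENCES Customers(CustomerNum));
CREATE TABLE Order_Items (OrderNum TEXT NOT NULL REFERENCES Orders(OrderNum),
                          ItemNum  TEXT NOT NULL REFERENCES Products(ItemNum),
                          PRIMARY KEY (OrderNum, ItemNum));
""")
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [("7822", "XYZ Inc."), ("8755", "Totally Toys"),
                  ("9123", "General Tool Co.")])
conn.executemany("INSERT INTO Products VALUES (?, ?)",
                 [("0446", "Dot Matrix Printer"), ("2246", "Pentium Computer"),
                  ("3145", "Hewlett-Packard printer"), ("3967", '17” monitor')])
conn.executemany("INSERT INTO Orders VALUES (?, ?)",
                 [("09010", "7822"), ("06755", "8755"), ("08134", "8755"),
                  ("07456", "9123"), ("08622", "9123")])
conn.executemany("INSERT INTO Order_Items VALUES (?, ?)",
                 [("06755", "2246"), ("07456", "2246"), ("08134", "3145"),
                  ("08622", "3145"), ("08622", "3967"), ("09010", "0446")])

# Modification: the customer name is stored once, so one search finds every order.
general_tool_orders = conn.execute(
    "SELECT COUNT(*) FROM Orders JOIN Customers USING (CustomerNum) "
    "WHERE Customer = 'General Tool Co.'").fetchone()[0]

# Insertion: a product can exist before anyone orders it (item 5001 is hypothetical).
conn.execute("INSERT INTO Products VALUES ('5001', 'Laser Printer')")

# Deletion: dropping the dot matrix printer leaves the XYZ Inc. customer row intact.
conn.execute("DELETE FROM Order_Items WHERE ItemNum = '0446'")
conn.execute("DELETE FROM Products WHERE ItemNum = '0446'")
customers_left = [row[0] for row in
                  conn.execute("SELECT Customer FROM Customers ORDER BY Customer")]
product_count = conn.execute("SELECT COUNT(*) FROM Products").fetchone()[0]
```

Each of the three anomalies disappears because every fact (a customer name, an item description) now lives in exactly one row.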
Database Modeling
Refers to various, more-or-less formal
methods for designing a database
Some provide precise steps and tools
◦ Ex.: Entity-Relationship (E-R) Modeling
Widely used, especially by high-end database
designers who can’t afford to miss things
Fairly complex process
Extremely precise
Entity/Relationship model
It is mainly based on three fundamental concepts.
Entity type
Attribute
Relationship
Be sure to limit the scope of the database.
Purpose of E/R Model
The E/R model allows us to sketch the
design of a database informally.
Designs are pictures called entity-
relationship diagrams.
Fairly mechanical ways to convert E/R
diagrams to real implementations like
relational databases exist.
Entity Type
Entity = “thing” or object.
Entity type = collection of similar entities.
◦ Similar to a class in object-oriented languages.
Attribute
Attribute = property of an entity type.
◦ Generally, all entities in a set have the same properties.
◦ Attributes are simple values, e.g. integers or character strings.
Types of Attributes
Simple
◦ Each entity has a single atomic value for the attribute. For example, forest_name.
Composite
◦ The attribute may be composed of several components. For example, Address (Apt#, House#, Street, City, State, ZipCode, Country) or Name (FirstName, MiddleName, LastName). Composition may form a hierarchy where some components are themselves composite.
Multi-valued
◦ An entity may have multiple values for that attribute. For example, authors of a book.
Example
Entity type Forests has two attributes, name
and nbr (number).
Each Forest entity has values for these two
attributes, e.g. (Maamora, 15)
[Diagram: entity type Forests with attributes Name: String and Nbr: Numeric]
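Since an entity type is similar to a class in object-oriented languages, the Forests example can be sketched as a Python dataclass. The class name Forest and the choice of str and int for the String and Numeric attributes are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Forest:
    """One entity of the entity type Forests."""
    name: str  # simple attribute (String in the diagram)
    nbr: int   # simple attribute (Numeric in the diagram)


# The entity (Maamora, 15) from the example.
maamora = Forest(name="Maamora", nbr=15)

# The entity type is the collection of all such entities.
forests = [maamora]
```

The class plays the role of the entity type, and each instance is one entity with a value for every attribute.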
E/R Diagrams
In an entity-relationship diagram, each entity
type is represented by a rectangle.
Each attribute of an entity type is a string
representing the name of the attribute. It is
located in the second part of the rectangle
representing the entity type.
Identify the Key Fields
Primary Key(s)
◦ Can never be Null; must hold unique values
◦ Automatically indexed in most RDBMSs
◦ Values rarely (if ever) change
◦ Try to include as few fields as possible
Multi-field Primary Key
◦ Combination of two or more fields that uniquely identify an individual record
Candidate Key
◦ Field or fields that qualify as a primary key
◦ Important in Third and Boyce-Codd Normal Forms
Relationships
A relationship connects two or more
entity sets.
It is represented by a diamond or an oval
form, with lines to each of the entity sets
involved.
Example
Forest is divided into parcels.
Parcel is composed of some cells.
[Diagram: entity types Foret, Parcelle and Cellule linked by the relationships Divided, Subdivisée, Contains and composed; cardinalities shown: 0,n; 1,1; 1,1; 1,n; 1,n; 1,1]
Foret
◦ num_foret <pi> Number (4) <M>
◦ forest_name Text (25)
◦ area Decimal (6,2)
◦ manage_plan Characters (1)
◦ periode_ameneg Number (2)
◦ beging_mana Number (4)
◦ climat Text
◦ Commente Text (500)
◦ Identifier_1 ... <pi>
Parcelle
◦ num_parcelle <pi> Number (4)
◦ area Decimal (6,2)
◦ status Text (10)
◦ appelation Text (40)
◦ Identifier_1 <pi>
Cellule
◦ num_cellule <pi> Number (4)
◦ area Decimal (6,2)
◦ objet_carto Number (2)
◦ Identifier_1 <pi>
Identify Entity Type Relationships
Based on business rules being
modeled
Examples:
◦ “each customer can place many orders”
◦ “all employees belong to a department”
◦ “each TA is assigned to one course”
Relationship Terminology
Relationship Type
◦ One-to-one: expressed as 1:1
◦ One-to-Many: expressed as 1:N or 1:M or 1:∞
◦ Many-to-Many: expressed as N:N or M:M
Primary or Parent Table
◦ Table on the left side of 1:N relationship
Related or Child Table
◦ Table on the right side of 1:N relationship
Relational Schema
◦ Diagram of table relationships in database
Many-Many Relationships
Think of a relationship between two entity
types, such as Composed between Forests and
Parcels.
In a many-many relationship, an entity of
either set can be connected to many entities of
the other set.
◦ E.g., a Parcel can be composed of many Species; a
species can be in the composition of many Parcels.
Example
[Diagram: entity types Foret and Strata linked by the relationship Divided, with cardinality 1,n on each side; Foret has the same attributes as in the earlier Foret diagram]
Strata
◦ str <pi> Number (3)
◦ strat_name Text (19)
◦ date_photointer Date
◦ nbr_unite Number (2)
◦ Identifier_1 <pi>
Many-One Relationships
Some binary relationships are many-one from
one entity type to another.
Each entity of the first set is connected to at
most one entity of the second set.
But an entity of the second set can be
connected to zero, one, or many entities of the
first set.
Example
Divided, from Parcels to Forests, is many-one.
A Parcel belongs to at most one specific Forest.
But a Forest can be divided into any number of
Parcels, including zero.
Example
[Diagram: entity types Foret and Parcelle linked by the relationship composed; cardinalities shown: 1,n and 1,1; both entity types have the same attributes as in the earlier Foret and Parcelle diagram]
One-One Relationships
In a one-one relationship, each entity of either entity set is related to at most one entity of the other set.
Example: Relationship Responsible-of between entity sets Forests and Managers.
◦ A Forest cannot be managed by more than one manager, and no manager can have more than one forest under his responsibility.
Example
[Diagram: entity types Foret and Managers linked by the relationship Responsible-of, with cardinality 1,1 on each side; Foret has the same attributes as in the earlier Foret diagram]
Managers
◦ Code Number (8)
◦ name Text
◦ degree Text
◦ adress Text
◦ phone_nbr Number
In Pictures:
[Figure: three panels illustrating many-many, many-one and one-one relationships]
Representing “Multiplicity”
Show a many-one relationship by an arrow entering
the “one” side, or use a cardinality written as a pair of
numbers, such as (1,n).
Show a one-one relationship by a line entering
both entity sets, or use a cardinality written as the pair
(1,1).
In some situations, we can also assert “exactly one,”
i.e., each entity of one set must be related to exactly
one entity of the other set. To do so, we use a
rounded arrow.
Example
[Diagram: entity types Foret and Parcelle linked by the relationship composed; cardinalities shown: 1,n and 1,1; both entity types have the same attributes as in the earlier Foret and Parcelle diagram]
Naming Conventions
Rules of thumb
◦ Table names must be unique in DB; should be plural
◦ Field names must be unique in the table(s)
◦ Clearly identify table subject or field data
◦ Be as brief as possible
◦ Avoid abbreviations and acronyms
◦ Use fewer than 30 characters
◦ Use letters, numbers, underscores (_)
◦ Do not use spaces or other special characters
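The mechanical rules in this list (allowed characters, length, uniqueness) can be checked automatically. The sketch below is one possible checker; the function names is_valid_name and duplicate_names are hypothetical, and treating names as case-insensitive when looking for duplicates is an assumption.

```python
import re

# Letters, numbers and underscores only; no spaces or other special characters.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")
MAX_LENGTH = 30  # names must be shorter than this


def is_valid_name(name: str) -> bool:
    """Return True if the name uses only allowed characters and is short enough."""
    return len(name) < MAX_LENGTH and NAME_PATTERN.fullmatch(name) is not None


def duplicate_names(names):
    """Return the names that occur more than once, ignoring case (an assumption)."""
    seen, dupes = set(), set()
    for name in names:
        key = name.lower()
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return sorted(dupes)


ok = is_valid_name("forest_name")
has_space = is_valid_name("forest name")
too_long = is_valid_name("x" * 30)
clashes = duplicate_names(["area", "Area", "status"])
```

Rules such as "clearly identify the table subject" or "avoid abbreviations" still need human review; only the syntactic rules lend themselves to this kind of check.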
Weak Entity type
Occasionally, entities of an entity type
need “help” to identify them uniquely.
Entity type E is said to be weak if in
order to identify entities of E uniquely,
we need to follow one or more many-one
relationships from E and include the key
of the related entities from the connected
entity type.
Example
number is almost a key for parcels, but it is
not a key on its own, since parcels in
two different forests could have the same number.
But number, together with the Forest related
to the parcel by Divided, should be unique.
In E/R Diagrams
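[Diagram: weak entity type Parcels connected to entity type Forests by the supporting relationship Divided-to; attributes shown: number, name, number]
Double diamond for supporting many-one relationship.
Double rectangle for the weak entity type.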
Weak Entity-Type Rules
A weak entity type has one or more many-one
relationships to other (supporting) entity types.
◦ Not every many-one relationship from a weak entity
type need be supporting.
The key for a weak entity type is its own
underlined attributes and the keys for the
supporting entity sets.
◦ E.g., parcel-number and forest-name is a key for Parcels
in the previous example.
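The key rule for weak entity types can be sketched as a composite primary key. The example assumes Python's sqlite3 module; the column name forest_name and the second forest 'Forest_B' are placeholders chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Forests (name TEXT NOT NULL PRIMARY KEY);
CREATE TABLE Parcels (
    forest_name TEXT    NOT NULL REFERENCES Forests(name),  -- key of the supporting entity type
    number      INTEGER NOT NULL,                           -- the weak entity type's own key attribute
    PRIMARY KEY (forest_name, number)
);
INSERT INTO Forests VALUES ('Maamora'), ('Forest_B');
INSERT INTO Parcels VALUES ('Maamora', 1), ('Forest_B', 1);
""")

# The same parcel number in two forests is accepted (see the inserts above),
# but repeating a number inside one forest violates the composite key.
try:
    conn.execute("INSERT INTO Parcels VALUES ('Maamora', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

parcel_count = conn.execute("SELECT COUNT(*) FROM Parcels").fetchone()[0]
```

The parcel number alone is not unique, but the pair (forest name, parcel number) is, which is exactly what the composite key declares.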
How to construct a conceptual model
The construction of a conceptual model can be done as follows:
Identify the list of entity types
For each entity type:
◦ Establish the list of its attributes;
◦ From this list, identify the entity identifier.
Determine the relationships between the entity types;
For each relationship:
◦ Write down the list of its attributes;
◦ Determine the dimension of the relationship (binary, multi-way, …);
◦ Establish the cardinalities;
Verify the obtained model:
◦ Eliminate transitivity;
◦ Be sure that all the entities in the schema are connected;
◦ Be sure that it can answer the questions asked of it.
Validate with the users.
Design Techniques
1. Avoid redundancy.
2. Limit the use of weak entity sets.
3. Don’t use an entity set when an
attribute will do.
Avoiding Redundancy
Redundancy occurs when we say the
same thing in two different ways.
Redundancy wastes space and (more
importantly) encourages inconsistency.
◦ The two instances of the same fact may
become inconsistent if we change one and
forget to change the other, related version.
Logical design :
logical model
Logical Database Design
Based upon the conceptual data model
Four key steps
1. Develop a logical data model for each known user interface for the application using normalization principles.
2. Combine normalized data requirements from all user interfaces into one consolidated logical database model (view integration).
3. Translate the conceptual E-R data model for the application into normalized data requirements.
4. Compare the consolidated logical database design with the translated E-R model and produce one final logical database model for the application.
What Is Logical Data Modeling
Translating conceptual data models into a
format consistent with the architecture
used by the data management software to
be used with the application
Normalization
◦ analysis of functional dependencies between
data items to result in a structure of data that is
simple, stable, and fundamental
Functional Dependency
For a relation (table), attribute A depends
on attribute B if for every valid row the
value of B determines the value of A
B → A
E.g.
◦ Student ID → Student name
◦ Order No + Product No → Quantity ordered
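A functional dependency can be tested against sample rows. The sketch below is a minimal checker; the function name holds and the sample order lines are assumptions made for illustration. A check on sample data can only refute a dependency, it cannot prove that the dependency holds for every valid row.

```python
def holds(rows, determinant, dependent):
    """Return True if, in these rows, the determinant attributes fix the dependent ones."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in determinant)
        value = tuple(row[attr] for attr in dependent)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical order lines.
order_lines = [
    {"order_no": 1, "product_no": "A", "quantity": 5},
    {"order_no": 1, "product_no": "B", "quantity": 2},
    {"order_no": 2, "product_no": "A", "quantity": 7},
]

# Order No + Product No determine Quantity ordered.
full_key_ok = holds(order_lines, ["order_no", "product_no"], ["quantity"])
# Order No alone does not: order 1 has two different quantities.
partial_key_ok = holds(order_lines, ["order_no"], ["quantity"])
```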
Normalization
Normal Forms (NF): design standards based on database design theory
Normalization is the process of applying the NFs to table design to eliminate redundancy and create a more efficient organization of DB storage.
Each successive NF applies an increasingly stringent set of rules
Normal Forms
First normal form
◦ No multi-valued attributes.
◦ Every attribute value is atomic.
Second normal form
◦ 1NF and every non-key attribute is fully functionally dependent on the primary key.
◦ Every non-key attribute must be defined by the entire key, not by only part of the key.
◦ No partial functional dependencies.
Third normal form
◦ 2NF and no transitive dependencies (functional dependency between non-key attributes.)
Sample 1NF Violation - 1
Table: Employee_Projects_Time
EmployeeID  Name            Project                          Time
EN1-26      Sean O’Brien    30-452-T3, 30-457-T3, 32-244-T3  0.25, 0.40, 0.30
EN1-33      Amy Guya        30-452-T3, 30-382-TC, 32-244-T3  0.05, 0.35, 0.60
EN1-35      Steven Baranco  30-452-T3, 31-238-TC             0.15, 0.80
Tables in 1NF
Table: Employees
*EmployeeID  LastName  FirstName
EN1-26       O’Brien   Sean
EN1-33       Guya      Amy
EN1-35       Baranco   Steven
Table: Employees_Projects
*ProjNum   EmployeeID  Time
30-328-TC  EN1-33      0.35
30-452-T3  EN1-26      0.25
30-452-T3  EN1-33      0.05
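The move to 1NF can be sketched as a small transformation: each multi-valued Project/Time cell becomes one row per value, and the composite Name is split into its components. The variable names and the rule for splitting names on the first space are assumptions made for illustration.

```python
# Rows of the unnormalized table: multi-valued Project and Time cells.
unnormalized = [
    ("EN1-26", "Sean O’Brien", "30-452-T3, 30-457-T3, 32-244-T3", "0.25, 0.40, 0.30"),
    ("EN1-33", "Amy Guya", "30-452-T3, 30-382-TC, 32-244-T3", "0.05, 0.35, 0.60"),
    ("EN1-35", "Steven Baranco", "30-452-T3, 31-238-TC", "0.15, 0.80"),
]

employees = []           # (EmployeeID, LastName, FirstName)
employees_projects = []  # (ProjNum, EmployeeID, Time)

for emp_id, name, projects, times in unnormalized:
    # Split the composite Name attribute into atomic components.
    first_name, last_name = name.split(" ", 1)
    employees.append((emp_id, last_name, first_name))
    # One row per project/time pair, so every attribute value is atomic.
    for proj_num, time in zip(projects.split(", "), times.split(", ")):
        employees_projects.append((proj_num, emp_id, float(time)))
```

After the split, every cell holds a single value, and the pair (ProjNum, EmployeeID) identifies each row of the second table.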
Sample 2NF Violation
Table: Employees_Projects
*EmpID  Lname    Fname  *ProjNum   ProjTitle
EN1-25  O’Brien  Sean   30-452-T3  STAR Manual
EN1-25  O’Brien  Sean   30-457-T3  ISO Procedures
EN1-25  O’Brien  Sean   31-124-T3  Employee Handbook
EN1-33  Guya     Amy    30-452-T3  STAR Manual
EN1-33  Guya     Amy    30-482-TC  Web site
Tables in 2NF
Table: Employees
*EmployeeID  LastName  FirstName
EN1-26       O’Brien   Sean
EN1-33       Guya      Amy
Table: Employees_Projects
*EmployeeID  *ProjNum
EN1-26       30-452-T3
EN1-33       30-457-T3
Table: Projects
*ProjNum   Title
30-452-T3  STAR manual
30-457-T3  ISO procedure
Sample 3NF Violation
Table: Projects_Managers
*ProjNum   ProjTitle       ProjMgr   Phone
30-452-T3  STAR Manual     Garrison  2756
30-457-T3  ISO Procedures  Jacanda   2954
30-482-TC  Web Site        Friedman  2846
31-124-T3  STAR prototype  Garrison  2756
35-272-TC  Order System    Jacanda   2954
Tables in 3NF
Table: Projects
*ProjNum   ProjTitle       Manager
30-452-T3  STAR manual     Garrison
30-457-T3  ISO procedures  Jacanda
Table: Project Managers
*Manager  Phone
Garrison  2756
Jacanda   2954
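The 3NF decomposition can be checked with a short sketch: the transitive dependency ProjNum → ProjMgr → Phone is removed by moving Phone into its own table, and joining the two tables back together should reproduce the original rows. The variable names are assumptions made for illustration; the data is the Projects_Managers sample.

```python
# Rows of Projects_Managers: (ProjNum, ProjTitle, ProjMgr, Phone).
projects_managers = [
    ("30-452-T3", "STAR Manual", "Garrison", "2756"),
    ("30-457-T3", "ISO Procedures", "Jacanda", "2954"),
    ("30-482-TC", "Web Site", "Friedman", "2846"),
    ("31-124-T3", "STAR prototype", "Garrison", "2756"),
    ("35-272-TC", "Order System", "Jacanda", "2954"),
]

# Projects keeps only attributes that depend directly on the key ProjNum.
projects = [(num, title, mgr) for num, title, mgr, _ in projects_managers]

# Project managers: Phone depends on the manager, so it is stored once per manager.
managers = {}
for _, _, mgr, phone in projects_managers:
    if managers.setdefault(mgr, phone) != phone:
        raise ValueError(f"conflicting phone numbers for {mgr}")

# Joining the two tables on the manager name rebuilds the original rows.
rejoined = [(num, title, mgr, managers[mgr]) for num, title, mgr in projects]
lossless = rejoined == projects_managers
```

Each phone number is now stored once, so changing it cannot leave two project rows disagreeing about the same manager.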
Transforming E-R Diagrams into
Relations
1. Map Regular Entities to Relations.
◦ Composite attributes: Use only their simple,
component attributes.
◦ Multi-valued Attribute - Becomes a separate
relation with a foreign key taken from the
superior entity.
2. Map Weak Entities
◦ Becomes a separate relation with a foreign
key taken from the superior entity.
3. Map Binary Relationships
◦ One-to-Many - Primary key on the one side
becomes a foreign key on the many side
◦ Many-to-Many - Create a new relation with the primary keys of the two entities as its primary key
◦ One-to-One - Primary key on the mandatory side becomes a foreign key on the optional side
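The one-to-many and many-to-many mapping rules can be sketched with Python's sqlite3 module. Foret and Parcelle follow the diagrams; the Species table, the Parcelle_Species relation and all sample rows are hypothetical names and data chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Foret (num_foret INTEGER PRIMARY KEY, forest_name TEXT NOT NULL);

-- One-to-many: the key of the "one" side becomes a foreign key on the "many" side.
CREATE TABLE Parcelle (
    num_parcelle INTEGER PRIMARY KEY,
    num_foret    INTEGER NOT NULL REFERENCES Foret(num_foret)
);

CREATE TABLE Species (species_id INTEGER PRIMARY KEY, species_name TEXT NOT NULL);

-- Many-to-many: a new relation whose primary key combines both primary keys.
CREATE TABLE Parcelle_Species (
    num_parcelle INTEGER NOT NULL REFERENCES Parcelle(num_parcelle),
    species_id   INTEGER NOT NULL REFERENCES Species(species_id),
    PRIMARY KEY (num_parcelle, species_id)
);

INSERT INTO Foret VALUES (1, 'Maamora');
INSERT INTO Parcelle VALUES (10, 1), (11, 1);
INSERT INTO Species VALUES (1, 'species A'), (2, 'species B');
INSERT INTO Parcelle_Species VALUES (10, 1), (10, 2), (11, 1);
""")

parcels_in_forest = conn.execute(
    "SELECT COUNT(*) FROM Parcelle WHERE num_foret = 1").fetchone()[0]
species_in_parcel_10 = conn.execute(
    "SELECT COUNT(*) FROM Parcelle_Species WHERE num_parcelle = 10").fetchone()[0]
parcels_with_species_1 = conn.execute(
    "SELECT COUNT(*) FROM Parcelle_Species WHERE species_id = 1").fetchone()[0]
```

The associative table can be read in both directions: all species of one parcel, or all parcels containing one species.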
4. Map Associative Entities
◦ Identifier Not Assigned
Default primary key for the association relation is
composed of the primary keys of the two entities
◦ Identifier Assigned
It is natural and familiar to end-users.
Default identifier may not be unique.
5. Map Unary Relationships
◦ One-to-Many - Recursive foreign key in the same
relation
◦ Many-to-Many - Bill-of-materials: Two relations:
One for the entity type.
One for an associative relation in which the
primary key has two attributes, both taken from the
primary key of the entity
6. Map Ternary (and n-ary) Relationships
◦ One relation for each entity and one for the
associative entity
That’s it for Table Design
Watch for repeating values and fields
Check against the Normal Forms
Make new tables when necessary
Re-check all tables against the NFs
Remember the business rules
Use common sense, but check anyway!
Ensuring Data Integrity
Placing constraints on how and when and where data can be entered
Done after or along with table design
Part of design process because many constraints are established at the database and table levels
Methods of Controlling Data Integrity
Default Value
◦ A value a field will assume unless an explicit value is entered for that field
Range Control
◦ Limits range of values that can be entered into field
Referential Integrity
◦ An integrity constraint specifying that the value (or existence) of an attribute in one relation depends on the value (or existence) of the same attribute in another relation
Null Value
◦ A special field value, distinct from 0, blank, or any other value, that indicates that the value for the field is missing or otherwise unknown
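Default values, range control and null values can be sketched as column constraints. The example assumes Python's sqlite3 module; the Parcelle columns follow the diagrams, while the default 'unknown' and the area bounds are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Parcelle (
    num_parcelle INTEGER PRIMARY KEY,
    area         REAL NOT NULL CHECK (area > 0 AND area < 10000),  -- range control
    status       TEXT DEFAULT 'unknown',                           -- default value
    appelation   TEXT                                              -- may stay NULL
)""")

# Only the key and the area are supplied.
conn.execute("INSERT INTO Parcelle (num_parcelle, area) VALUES (1, 12.5)")
status, appelation = conn.execute(
    "SELECT status, appelation FROM Parcelle WHERE num_parcelle = 1").fetchone()

# A value outside the allowed range is refused by the database itself.
try:
    conn.execute("INSERT INTO Parcelle (num_parcelle, area) VALUES (2, -3.0)")
    out_of_range_rejected = False
except sqlite3.IntegrityError:
    out_of_range_rejected = True
```

Here status takes its default, while appelation is returned as None, Python's representation of a NULL: the value is missing, not blank or zero.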
Referential Integrity
True relational databases support Referential Integrity: every non-null foreign key value must match an existing primary key value.
In other words, every record in a related table must have a matching record in the primary table.
Preserves the validity of foreign key values.
Enforced at database level.
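Referential integrity can be observed directly in a small sketch using Python's sqlite3 module. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection; the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE Foret (num_foret INTEGER PRIMARY KEY, forest_name TEXT NOT NULL);
CREATE TABLE Parcelle (
    num_parcelle INTEGER PRIMARY KEY,
    num_foret    INTEGER REFERENCES Foret(num_foret)  -- nullable foreign key
);
INSERT INTO Foret VALUES (1, 'Maamora');
""")

conn.execute("INSERT INTO Parcelle VALUES (10, 1)")     # matches an existing forest
conn.execute("INSERT INTO Parcelle VALUES (11, NULL)")  # a null foreign key is allowed

# A non-null foreign key with no matching primary key is rejected.
try:
    conn.execute("INSERT INTO Parcelle VALUES (12, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

stored_parcels = conn.execute("SELECT COUNT(*) FROM Parcelle").fetchone()[0]
```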
Cascading Updates
When a primary key value changes,
Cascade Update changes the
corresponding values in the related
records, so no records get orphaned.
Usually only one level deep
◦ Foreign key is not usually primary key of
related table (except in 1:1 relationships)
hence no other tables are usually related to it
Cascade Deletes
When a primary table record is deleted, all matching records in any related table are also deleted
Can propagate through multiple tables if Cascade Delete is turned on in all relationships between those tables
Another protection against orphan records, only this time by eradicating them instead!
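Cascading updates and deletes can be sketched with ON UPDATE CASCADE and ON DELETE CASCADE clauses. The example assumes Python's sqlite3 module and the Foret, Parcelle and Cellule entity types from the diagrams; the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # cascades only run when enforcement is on
conn.executescript("""
CREATE TABLE Foret (num_foret INTEGER PRIMARY KEY, forest_name TEXT);
CREATE TABLE Parcelle (
    num_parcelle INTEGER PRIMARY KEY,
    num_foret    INTEGER REFERENCES Foret(num_foret)
                 ON UPDATE CASCADE ON DELETE CASCADE
);
CREATE TABLE Cellule (
    num_cellule  INTEGER PRIMARY KEY,
    num_parcelle INTEGER REFERENCES Parcelle(num_parcelle)
                 ON DELETE CASCADE
);
INSERT INTO Foret VALUES (1, 'Maamora');
INSERT INTO Parcelle VALUES (10, 1);
INSERT INTO Cellule VALUES (100, 10);
""")

# Cascade update: changing the primary key rewrites the matching foreign keys.
conn.execute("UPDATE Foret SET num_foret = 2 WHERE num_foret = 1")
fk_after_update = conn.execute(
    "SELECT num_foret FROM Parcelle WHERE num_parcelle = 10").fetchone()[0]

# Cascade delete: removing the forest removes its parcels and, through them, their cells.
conn.execute("DELETE FROM Foret WHERE num_foret = 2")
rows_left = conn.execute(
    "SELECT (SELECT COUNT(*) FROM Parcelle) + (SELECT COUNT(*) FROM Cellule)"
).fetchone()[0]
```

The delete propagates through two tables because the cascade is declared on both relationships, which is why cascade deletes deserve caution in production schemas.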
Levels of Enforcement
Referential Integrity enforced at database level because it affects relationship between two tables.
Many other business rules enforced at field and table level to ensure data integrity.
Business rule implementation should be documented: how and where it is enforced in the design.
Some rules can’t be enforced at table or field level; must be enforced in the application level.
Testing of Business Rules
Always test business rule implementation
◦ What happens when rule is met?
◦ What happens when rule is violated?
Not much good as a data entry constraint if it doesn’t constrain properly
Good application or interface design will provide feedback when user violates a constraint or rule
Field Level Integrity
Constraining by use of field properties
◦ Data type: text, number, Yes/No, Date/Time
◦ Field size
◦ Formats
Entry and editing constraints
◦ Required
◦ Indexed, with or without duplicates
◦ Input masks
◦ Default value
◦ Validation Rule
Table Level Integrity
Field Comparisons
◦ Compare value in one field to value in another
◦ Comparison performed before record is saved
◦ Violations could display an error message or force constraint of available values
Validation or Lookup Tables
◦ Store generally static set of values
◦ Stored values used to populate new records to ensure accuracy of data entry
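Field comparisons and lookup tables can be sketched as a table-level CHECK constraint and a foreign key to a table of allowed values. The example assumes Python's sqlite3 module; the tables Statuses and Management_Plans, their columns and the helper try_insert are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Lookup table: a generally static set of allowed values.
CREATE TABLE Statuses (status TEXT PRIMARY KEY);
INSERT INTO Statuses VALUES ('active'), ('closed');

CREATE TABLE Management_Plans (
    plan_id    INTEGER PRIMARY KEY,
    start_year INTEGER NOT NULL,
    end_year   INTEGER NOT NULL,
    status     TEXT NOT NULL REFERENCES Statuses(status),
    CHECK (end_year >= start_year)  -- field comparison, checked before the row is saved
);
""")


def try_insert(row):
    """Insert one plan; return True if the database accepted it."""
    try:
        conn.execute("INSERT INTO Management_Plans VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


valid_plan = try_insert((1, 2000, 2010, "active"))
bad_years = try_insert((2, 2010, 2000, "active"))   # end before start
bad_status = try_insert((3, 2000, 2010, "actve"))   # misspelled, not in the lookup table
```

An interface would normally catch the same IntegrityError and show the user which rule was violated.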
Documentation
A good design deserves good documentation
Data Dictionary for database/table design
◦ Table and field names
◦ Table and field properties
◦ Relationships, including primary and foreign keys
◦ Indexes
Provide reasons for design features, especially if they intentionally violate normal design principles
Physical design :
physical model according to the
RDBMS chosen