Overview of Database Design By Nazife Dimililer

Post on 19-Dec-2015


Overview of Database Design

By

Nazife Dimililer

Database Management System

A DBMS is a data storage and retrieval system which permits data to be stored non-redundantly while making it appear to the user as if the data is well-integrated.

Database Management System

DBMS manages data resources like an operating system manages hardware resources

[Figure: Application #1, Application #2 and Application #3 each access, through the DBMS, a database containing centralized shared data.]

Advantages of Database Approach

• Program-Data Independence
– Metadata stored in DBMS, so applications don’t need to worry about data formats
– Data queries/updates managed by DBMS, so programs don’t need to process data access routines
– Results in: increased application development and maintenance productivity
• Minimal Data Redundancy
– Leads to increased data integrity/consistency

Advantages of Database Approach

• Improved Data Sharing
– Different users get different views of the data
• Enforcement of Standards
– All data access is done in the same way
• Improved Data Quality
– Constraints, data validation rules
• Better Data Accessibility/Responsiveness
– Use of standard data query language (SQL)
• Security, Backup/Recovery, Concurrency
– Disaster recovery is easier

Costs and Risks of the Database Approach

• Up-front costs:
– Installation and Management Cost and Complexity
– Conversion Costs
• Ongoing Costs
– Requires New, Specialized Personnel
– Need for Explicit Backup and Recovery
• Organizational Conflict
– Old habits die hard

The Range of Database Applications

• Personal Database – standalone desktop database

• Workgroup Database – local area network (<25 users)

• Department Database – local area network (25-100 users)

• Enterprise Database – wide-area network (hundreds or thousands of users)

Evolution of DB Systems

• Flat files – 1960s - 1980s
• Hierarchical – 1970s - 1990s
• Network – 1970s - 1990s
• Relational – 1980s - present
• Object-oriented – 1990s - present
• Object-relational – 1990s - present
• Data warehousing – 1980s - present
• Web-enabled – 1990s - present

Database Design Phases

• Conceptual Design: Model the data without any physical considerations for each user view.
• Logical Design: Choose the data model that will be used and modify the conceptual data model to fit the data model without any other physical considerations. Validate the model using normalization and transaction requirements.
• Physical Design: Choose the actual DBMS and implement the data model efficiently. Performance, security and reliability are key issues.

Physical Database Design

• Purpose - translate the logical description of data into the technical specifications for storing and retrieving data

• Goal - create a design for storing data that will provide adequate performance and ensure database integrity, security and recoverability

Physical Design Process

Inputs:
• Normalized relations
• Volume estimates
• Attribute definitions
• Response time expectations
• Data security needs
• Backup/recovery needs
• Integrity expectations
• DBMS technology used

Leads to

Decisions:
• Attribute data types
• Physical record descriptions (doesn’t always match logical design)
• File organizations
• Indexes and database architectures
• Query optimization

Designing Fields

• Field: smallest unit of data in database

• Field design
– Choosing data type
– Coding, compression, encryption
– Controlling data integrity

Field Data Integrity

• Default value - assumed value if no explicit value

• Range control – allowable value limitations (constraints or validation rules)

• Null value control – allowing or prohibiting empty fields

• Referential integrity – range control (and null value allowances) for foreign-key to primary-key match-ups
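The four controls listed above map fairly directly onto column constraints in SQL. The following is a minimal sketch using Python's built-in sqlite3 module; the student columns follow the Student example used later in these slides, while the department table, its 'CMPE' code and the try_insert helper are hypothetical names introduced only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE department (            -- hypothetical parent table
    code TEXT PRIMARY KEY
);
CREATE TABLE student (
    id     TEXT PRIMARY KEY,
    name   TEXT NOT NULL,                                  -- null value control
    gender TEXT DEFAULT 'F' CHECK (gender IN ('F', 'M')),  -- default value + range control
    cgpa   REAL CHECK (cgpa BETWEEN 0.00 AND 4.00),        -- range control
    dept   TEXT NOT NULL REFERENCES department(code)       -- referential integrity
);
INSERT INTO department VALUES ('CMPE');
""")


def try_insert(student_id, name, cgpa, dept):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO student (id, name, cgpa, dept) VALUES (?, ?, ?, ?)",
                (student_id, name, cgpa, dept),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert("000001", "Student A", 3.33, "CMPE")
bad_range = try_insert("000002", "Student B", 4.50, "CMPE")  # cgpa out of range
bad_null = try_insert("000003", None, 2.00, "CMPE")          # name may not be null
bad_fk = try_insert("000004", "Student D", 2.00, "XXXX")     # no such department
default_gender = conn.execute(
    "SELECT gender FROM student WHERE id = '000001'"
).fetchone()[0]
```

Only the first insert is accepted, and its gender column is filled from the default because no explicit value was supplied.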

Denormalization

• Transforming normalized relations into unnormalized physical record specifications
• Benefits:
– Can improve performance (speed) by reducing the number of table lookups (i.e., reduce the number of necessary join queries)
• Costs (due to data duplication)
– Wasted storage space
– Data integrity/consistency threats
• Common denormalization opportunities
– One-to-one relationship
– Many-to-many relationship with attributes
– Reference data (1:N relationship where 1-side has data not used in any other relationship)
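To make the trade-off concrete, here is a small sketch of the "reference data" case using Python's sqlite3 module. The item and storage tables and their contents are invented for illustration; they are not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized: the storage instructions live once, in a reference table.
CREATE TABLE storage (code TEXT PRIMARY KEY, instructions TEXT);
CREATE TABLE item (
    id           INTEGER PRIMARY KEY,
    name         TEXT,
    storage_code TEXT REFERENCES storage(code)
);
INSERT INTO storage VALUES ('C', 'Keep cold');
INSERT INTO item VALUES (1, 'Milk', 'C'), (2, 'Butter', 'C');

-- Denormalized: the reference data is copied into every item row.
CREATE TABLE item_denorm (
    id                   INTEGER PRIMARY KEY,
    name                 TEXT,
    storage_instructions TEXT
);
INSERT INTO item_denorm VALUES (1, 'Milk', 'Keep cold'), (2, 'Butter', 'Keep cold');
""")

# Normalized read needs a join; denormalized read is a single-table lookup.
joined = conn.execute(
    "SELECT i.name, s.instructions FROM item i "
    "JOIN storage s ON s.code = i.storage_code ORDER BY i.id"
).fetchall()
flat = conn.execute(
    "SELECT name, storage_instructions FROM item_denorm ORDER BY id"
).fetchall()

# Consistency threat: updating one copy leaves the other copy stale.
conn.execute("UPDATE item_denorm SET storage_instructions = 'Keep below 4C' WHERE id = 1")
distinct_copies = conn.execute(
    "SELECT COUNT(DISTINCT storage_instructions) FROM item_denorm"
).fetchone()[0]
```

Both reads return the same rows, but after the partial update the denormalized table holds two different versions of what should be a single fact.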

Systems Development Life Cycle

Project Identification and Selection
Project Initiation and Planning
Analysis
Logical Design
Physical Design
Implementation
Maintenance

Systems Development Life Cycle

Phase: Project Identification and Selection

Purpose – preliminary understanding
Deliverable – request for project
Database activity – enterprise modeling
• First step in database development
• Specifies scope and general content
• Overall picture of organizational data, not specific design
• Entity-relationship diagram
• Descriptions of entity types
• Relationships between entities
• Business rules

Systems Development Life Cycle

Phase: Project Initiation and Planning

Purpose – state business situation and solution
Deliverable – request for analysis
Database activity – conceptual data modeling

Systems Development Life Cycle

Phase: Analysis

Purpose – thorough analysis
Deliverable – functional system specifications
Database activity – conceptual data modeling

Systems Development Life Cycle

Phase: Logical Design

Purpose – information requirements structure
Deliverable – detailed design specifications
Database activity – logical database design

Systems Development Life Cycle

Phase: Physical Design

Purpose – develop technology specs
Deliverable – program/data structures, technology purchases, organization redesigns
Database activity – physical database design

Systems Development Life Cycle

Phase: Implementation

Purpose – programming, testing, training, installation, documenting
Deliverable – operational programs, documentation, training materials
Database activity – database implementation

Systems Development Life Cycle

Phase: Maintenance

Purpose – monitor, repair, enhance
Deliverable – periodic audits
Database activity – database maintenance

Simplified Database Development Procedure

Start

Draw ERD

Convert to Relational Schema

Validate using Normalization

Validate against user transactions

Stop

Documentation: Entity Document

Entity Name | Description | Aliases | Occurrence
---
Name of entity | A short description of entity | Other names the users use to refer to this entity | A common situation where this entity can be found
Instructor | Employees teaching courses | Lecturer, professor | Instructors work in departments

Documentation: Relationship Document

Entity Type | Relationship Type | Entity Type | Cardinality | Participation (Optionality)
---
Name of participating entity: Entity A | Name of relationship | Name of participating entity: Entity B | Cardinality from Entity A to Entity B (1:1, 1:M, M:1) | Participation constraints on the relationship from Entity A to Entity B (optionalities). Full (F): mandatory entity (min>0). Partial (P): optional entity (min=0)
Instructor | workFor | Department | M:1 | P:F

Documentation: Attribute Document

Entity | Names of Attributes | Description | Data type and length | Constraint
---
Name of entity | List of all attributes of the entity | Description of each attribute | Data type of each attribute. It is possible to use domain names you have described in the domain document | Primary, Unique and Secondary Key. (Secondary keys are used to search for the entity)
Student | Student Id | Uniquely identifies a student. | 6 fixed character | Primary Key
 | Name | Full name of student | 50 variable character | Secondary Index
 | Gender | Gender of student | 1 fixed character |

Documentation: Attribute Document Continued

Names of Attributes | Default Value | Alias | Null Value? (Yes or No) | Derived?
---
List of all attributes of the entity | Default value for attributes | Other names the users use for the attribute | Yes: null values are allowed. No: null values are not allowed | Yes: it is derived. No: it is not derived
Student Id | | | No |
Name | | | No |
Gender | ‘F’ | Sex | Yes |
cgpa | | Cumulative grade | Yes | Yes

Documentation: Attribute Domain Document

Domain Name | Domain Characteristics | Examples of allowed values
---
Name of domain for attributes | Description of domain | Illustrative examples
Cgpa domain | 3 digit floating point between 0.00 and 4.00 | 3.33, 4.00
Gender | 1 character string (‘F’ or ‘M’) | M, F

Some helpful pointers

• Use consistent naming rules for all entities, relationships and attributes

• Choose primary keys intelligently.

Primary keys should NOT change over time.

• Choose appropriate data types for attributes

Introduction

• There are endless possibilities for a designer to make a bad or wrong choice.

• You must try to understand how the customer manipulates data and how the ERD will produce the data structures required to sustain the same data manipulation

• The errors may be corrected at conceptual or logical database design phases. In fact you must check for errors at every phase!

• Here we discuss how to fix some common problems at the conceptual database design phase.

Problem: Unnormalized Attributes

• Does an attribute name contain data?
– Multiple Attributes: ex: A1, A2, A3, …, An; ex: First_Inspection, Second_Inspection, …
– Enumerations: X-Approval, Y-Approval, Z-Approval
• Difficult to predict population and changes require attribute changes

[Diagram: entity Restaurant with attributes Id, Name, First_inspection, Second_inspection, Third_inspection]

[Diagram: entity TextBook_Request with attributes FormNo, FormDate, Coordinator_Approval, Director_Approval, Rector_Approval]

Solution: Unnormalized Attributes – Fixing Repeating Attributes

• Split the repeating attributes into their own group
– Split into a repeating group based on index, ex: (A, n), (InspectionResult, OrderNo)

[Diagram, before: Restaurant with attributes Id, Name, First_inspection, Second_inspection, Third_inspection]

[Diagram, after: Restaurant with attributes Id, Name and a repeating group inspection (OrderNo, Result)]

Solution: Unnormalized Attributes – Fixing Repeating Attributes

• Alternatively the following solution may be used

[Diagram: Restaurant (Id, Name) linked by belongsTo to Inspection (OrderNo, Result), which is linked by performedBy to employee (Id, Name)]

If we need to store information on the employees who performed the inspection, it can be easily added here.
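One way the Restaurant fix could look as relational tables is sketched below with Python's sqlite3 module. The table and column names (restaurant_id, order_no, result) and the add_inspection helper are assumptions chosen to mirror the (InspectionResult, OrderNo) pair; the sample restaurant is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE restaurant (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE inspection (
    restaurant_id INTEGER NOT NULL REFERENCES restaurant(id),
    order_no      INTEGER NOT NULL,   -- 1 = first inspection, 2 = second, ...
    result        TEXT NOT NULL,
    PRIMARY KEY (restaurant_id, order_no)
);
INSERT INTO restaurant VALUES (1, 'Sample Restaurant');
""")


def add_inspection(restaurant_id, result):
    """Store the next inspection; a 4th or 5th one needs no schema change."""
    with conn:
        (next_no,) = conn.execute(
            "SELECT COALESCE(MAX(order_no), 0) + 1 FROM inspection "
            "WHERE restaurant_id = ?",
            (restaurant_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO inspection VALUES (?, ?, ?)",
            (restaurant_id, next_no, result),
        )
    return next_no


for outcome in ["pass", "fail", "pass", "pass"]:
    add_inspection(1, outcome)

history = conn.execute(
    "SELECT order_no, result FROM inspection "
    "WHERE restaurant_id = 1 ORDER BY order_no"
).fetchall()
```

The fourth inspection is stored as just another row, whereas the original design would have needed a new Fourth_inspection attribute.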

Solution: Unnormalized Attributes – Fixing Enumerations

• Enumerations
– Split the enumeration into code and domain value (Code, Approval)

[Diagram, before: TextBook_Request with attributes FormNo, FormDate, Coordinator_Approval, Director_Approval, Rector_Approval]

[Diagram, after: TextBook_Request with attributes FormNo, FormDate and a repeating group Approval (Code, Status, EntryDate)]

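Solution: Unnormalized Attributes – Fixing Enumerations

Alternatively:

[Diagram: TextBookRequest (FormNo, FormDate) linked by Has to Approval (Code, Status, EntryDate); the extracted labels also include "Inspection"]

Better Yet:

[Diagram: TextBookRequest (FormNo, FormDate), Approval (Status, EntryDate) and Employee (Id, Name), linked by Has relationships; the extracted labels also include "Inspection"]

Or:

[Diagram: Employee (Id, Name) and TextBookRequest (FormNo, FormDate) linked by an Approval relationship with attributes Status and EntryDate]

We can find out exactly who approved or disapproved of a text book request.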
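A rough relational reading of the last variant, where Approval connects a request to the employee who decided on it, might look like the sketch below. The snake_case names, the status values 'approved' and 'rejected', and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE textbook_request (
    form_no   INTEGER PRIMARY KEY,
    form_date TEXT NOT NULL
);
-- One row per decision, instead of one column per approver role.
CREATE TABLE approval (
    form_no     INTEGER NOT NULL REFERENCES textbook_request(form_no),
    employee_id INTEGER NOT NULL REFERENCES employee(id),
    status      TEXT NOT NULL CHECK (status IN ('approved', 'rejected')),
    entry_date  TEXT NOT NULL,
    PRIMARY KEY (form_no, employee_id)
);
INSERT INTO employee VALUES (1, 'Employee A'), (2, 'Employee B');
INSERT INTO textbook_request VALUES (100, '2015-09-01');
INSERT INTO approval VALUES
    (100, 1, 'approved', '2015-09-02'),
    (100, 2, 'rejected', '2015-09-05');
""")


def decisions(form_no):
    """List who decided what on a request, oldest decision first."""
    return conn.execute(
        "SELECT e.name, a.status FROM approval a "
        "JOIN employee e ON e.id = a.employee_id "
        "WHERE a.form_no = ? ORDER BY a.entry_date",
        (form_no,),
    ).fetchall()


request_100 = decisions(100)
```

Adding another approval level then means adding rows, and the join answers the question of exactly who approved or rejected a request.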

Problem: Enumerations (Lists)

• Does an entity have any attributes that are enumerations but are not foreign keys?

• Create special code entities to hold the list of enumerated values and descriptions– also known as Lookup Tables, Reference tables or Cross-Reference entities

• This is different from the unnormalized attribute enumerations. Here the attribute name does not contain data!

• Ex: If country is a simple attribute, then its value must be chosen from a list.

Solution: Use Validation Entities (lookup tables)

[Diagram, before: Student (id, name, country) and Employee (id, name, country)]

[Diagram, after: Student (id, name) and Employee (id, name), each linked by isFrom to Country (Code, name)]
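In a relational schema, the isFrom relationship to Country typically becomes a foreign key, so a country value can only be chosen from the list. A minimal sqlite3 sketch follows; the country_code column, the two sample country rows and the add_student helper are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
CREATE TABLE country (
    code TEXT PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE student (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    country_code TEXT NOT NULL REFERENCES country(code)   -- isFrom
);
CREATE TABLE employee (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    country_code TEXT NOT NULL REFERENCES country(code)   -- isFrom
);
INSERT INTO country VALUES ('CY', 'Cyprus'), ('TR', 'Turkey');
""")


def add_student(student_id, name, country_code):
    """Return True if stored, False if the country is not in the lookup table."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO student VALUES (?, ?, ?)",
                (student_id, name, country_code),
            )
        return True
    except sqlite3.IntegrityError:
        return False


listed = add_student(1, "Student A", "CY")
unlisted = add_student(2, "Student B", "Cyprus ")  # free text, not a listed code
```

Because Student and Employee share the same Country table, the list of allowed values is maintained in one place.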

Problem: Single valued attributes changing over time

• Even though an attribute may have only one value at any given time, do you need to know its previous values?
– Do you need to keep track of changes of an attribute?

[Diagram, before: Instructor with attributes Id, Name, Title]

At any given time an instructor has only one title: Assist. Prof, Assoc. Prof, Prof. But the title is expected to change!

Solution: Add History

[Diagram, after: Instructor (Id, Name) linked by change to InstructorTitle (ChangeDate, Title)]
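The history idea can be sketched as a separate table keyed by instructor and change date, again using sqlite3. The column names, the sample dates and the title_on helper are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instructor (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE instructor_title (
    instructor_id INTEGER NOT NULL REFERENCES instructor(id),
    change_date   TEXT NOT NULL,   -- ISO dates compare correctly as text
    title         TEXT NOT NULL,
    PRIMARY KEY (instructor_id, change_date)
);
INSERT INTO instructor VALUES (1, 'Instructor A');
INSERT INTO instructor_title VALUES
    (1, '2005-09-01', 'Assist. Prof'),
    (1, '2011-02-15', 'Assoc. Prof'),
    (1, '2017-06-30', 'Prof');
""")


def title_on(instructor_id, date):
    """Return the title in effect on the given ISO date, or None if none yet."""
    row = conn.execute(
        "SELECT title FROM instructor_title "
        "WHERE instructor_id = ? AND change_date <= ? "
        "ORDER BY change_date DESC LIMIT 1",
        (instructor_id, date),
    ).fetchone()
    return row[0] if row else None


title_2012 = title_on(1, "2012-01-01")
title_now = title_on(1, "2020-01-01")
title_before_hire = title_on(1, "2000-01-01")
```

The current title is simply the most recent row, and earlier titles stay available for any question about the past.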

Problem: Use of “complex” attributes

• Does an attribute represent a real life object or concept?

[Diagram, before: ServiceRecord with EquipmentId, ServiceDate and an Employee attribute; other extracted labels: Description, Hiredate, Date]

Solution: Create a separate entity for the “complex” attribute

[Diagram, after: ServiceRecord (EquipmentId, ServiceDate) linked by performs to a separate Employee entity (Id, Name, Salary)]

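Representing compound attributes as simple attributes

• Is a simple attribute composed from more than one field?

[Diagram, before: Customer with simple attributes Id, name, address]

Solution: Use composite attributes

[Diagram, after: Customer with Id, composite attribute name (firstname, lastname) and composite attribute address (street, city, country)]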
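The benefit of composite attributes shows up as soon as the parts are needed individually. Below is a small sketch with Python dataclasses; the class layout mirrors the Customer diagram, and the customer data is invented sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    firstname: str
    lastname: str


@dataclass(frozen=True)
class Address:
    street: str
    city: str
    country: str


@dataclass
class Customer:
    id: int
    name: Name        # composite attribute
    address: Address  # composite attribute


customers = [
    Customer(1, Name("Ada", "Lovelace"), Address("1 Example St", "Famagusta", "Cyprus")),
    Customer(2, Name("Alan", "Turing"), Address("2 Sample Rd", "Nicosia", "Cyprus")),
]

# Sorting by last name or filtering by city needs no string parsing.
by_lastname = sorted(customers, key=lambda c: c.name.lastname)
in_nicosia = [c.id for c in customers if c.address.city == "Nicosia"]
```

With a single free-text name or address field, both operations would depend on fragile string splitting.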

Problem: Fan Traps

• Result of hierarchical relationships that split semantic relationships resulting in the loss of information

• Commonly expressed by traversals from weak entity to related weak entity through parent which results in loss of information

• Fixed by reordering hierarchy

Example of Fan Trap

Issue: Who uses which computer?

[Diagram: Computer (code, speed, ramcapacity) linked by contains to Office (officeno, floor), which is linked by worksin to Employee (id, name)]

Fixing a Fan Trap

[Diagram: Computer (cid, ramcapacity, speed), Office (officeno, floor) and Employee (id, name), connected by the relationships uses and worksin]

This re-arrangement fixes the fan trap problem, but if it is possible to have a computer in an office that is not assigned to any employee, it has another problem.
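The fan trap becomes visible when the question "who uses which computer?" is asked of the original structure. The sketch below uses sqlite3 with invented rows. The uses table is one possible rendering of the uses relationship, and keeping officeno on computer as well is a choice of this sketch, not something the slides prescribe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE office (officeno INTEGER PRIMARY KEY, floor INTEGER);

-- Fan trap structure: computer and employee both hang off office.
CREATE TABLE computer (
    code     TEXT PRIMARY KEY,
    officeno INTEGER REFERENCES office(officeno)   -- contains
);
CREATE TABLE employee (
    id       INTEGER PRIMARY KEY,
    name     TEXT,
    officeno INTEGER REFERENCES office(officeno)   -- worksin
);
INSERT INTO office VALUES (101, 1);
INSERT INTO computer VALUES ('PC1', 101), ('PC2', 101);
INSERT INTO employee VALUES (1, 'Employee A', 101), (2, 'Employee B', 101);

-- Direct link between computer and employee (the uses relationship).
CREATE TABLE uses (
    computer_code TEXT PRIMARY KEY REFERENCES computer(code),
    employee_id   INTEGER NOT NULL REFERENCES employee(id)
);
INSERT INTO uses VALUES ('PC1', 2), ('PC2', 1);
""")

# Going through office pairs every computer with every employee in that office.
via_office = conn.execute(
    "SELECT c.code, e.name FROM computer c "
    "JOIN employee e ON e.officeno = c.officeno ORDER BY c.code, e.name"
).fetchall()

# The direct relationship gives one definite user per computer.
via_uses = conn.execute(
    "SELECT u.computer_code, e.name FROM uses u "
    "JOIN employee e ON e.id = u.employee_id ORDER BY u.computer_code"
).fetchall()
```

The path through office yields four candidate pairs for two computers, so the real assignment cannot be recovered from it. The direct relationship returns exactly one user per computer.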

Problem: Chasms

• Result of hierarchical relationships that split semantic relationships resulting in the loss of business rules

• Commonly expressed by creating artificial intermediate entity values for the sole purpose of providing a link

• Fixed by rebalancing hierarchy and adding appropriate relationships

Example of a Chasm

Issue: What if a customer is not assigned to an employee?

[Diagram: Branch (code, name) linked by worksFor, (0,M) to (1,1), to employee (eid, name), which is linked by represents, (0,M) to (0,1), to customer (cid, name)]

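Fixing a Chasm

[Diagram: Branch (code, name) linked by worksFor, (0,M) to (1,1), to employee (eid, name), which is linked by represents, (0,M) to (0,1), to customer (cid, name); plus an added relationship dealsWith, (0,M) to (1,1)]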
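The effect of the chasm, and of the added relationship, can be checked with a small sqlite3 sketch. It assumes that dealsWith connects customer directly to Branch and renders it as a mandatory branch_code column; that reading, the column names and the sample rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch (code TEXT PRIMARY KEY, name TEXT);
CREATE TABLE employee (
    eid         INTEGER PRIMARY KEY,
    name        TEXT,
    branch_code TEXT NOT NULL REFERENCES branch(code)   -- worksFor (1,1)
);
CREATE TABLE customer (
    cid         INTEGER PRIMARY KEY,
    name        TEXT,
    employee_id INTEGER REFERENCES employee(eid),        -- represents (0,1), optional
    branch_code TEXT NOT NULL REFERENCES branch(code)    -- dealsWith (1,1), assumed link
);
INSERT INTO branch VALUES ('B1', 'Main Branch');
INSERT INTO employee VALUES (1, 'Employee A', 'B1');
INSERT INTO customer VALUES
    (10, 'Customer X', 1,    'B1'),
    (11, 'Customer Y', NULL, 'B1');   -- no employee assigned
""")

# Chasm: reaching the branch through employee silently drops Customer Y.
via_employee = conn.execute(
    "SELECT c.name, e.branch_code FROM customer c "
    "JOIN employee e ON e.eid = c.employee_id ORDER BY c.cid"
).fetchall()

# With the direct relationship every customer has a branch.
direct = conn.execute(
    "SELECT name, branch_code FROM customer ORDER BY cid"
).fetchall()
```

Without the direct link, the only workaround would be an artificial placeholder employee for unassigned customers, which is the pattern the chasm slide warns about.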

More Design problems

• Misplaced relationships
• Incorrect Cardinalities
• Missing Relationships
• Overuse of specialized data modeling tools (ex: Inheritance, multiway relationships)
• Redundant Relationships

Use of Intelligent vs Surrogate Keys

• A surrogate key is an artificial or synthetic key that is used as a substitute for a natural key aka intelligent key.

• "Surrogate key" may also be known as "System-generated key", "Database Sequence number", "Synthetic key", "Technical key" or an "Arbitrary, unique identifier".

• Primary keys are hard to change.
• Intelligent keys suffer from this problem because not only are they used as primary and foreign keys but they also have some business meaning associated with them

• The biggest advantage for intelligent keys is that users understand what they mean whereas surrogate keys don't make any business sense.

Data Models that use surrogate keys usually have more normalization errors. 

Surrogate vs. Intelligent Keys

Natural keys:
• are more logical
• can sometimes mean fewer joins
• help to encourage good modeling
• are traditional/user friendly
• make snooping around in the data easier

Surrogate keys:
• are shorter
• are easier to join
• take less storage
• enable natural key fields to be easily changed
• are what Object Oriented (and object relational) databases use
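The point that surrogate keys "enable natural key fields to be easily changed" can be illustrated by renaming a department code under both designs. This is a sketch with sqlite3; the dept_nat/dept_sur tables and the 'CMPE' to 'CENG' rename are invented for the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Intelligent (natural) key: the business code is the primary key.
CREATE TABLE dept_nat (code TEXT PRIMARY KEY, name TEXT);
CREATE TABLE emp_nat (
    id        INTEGER PRIMARY KEY,
    dept_code TEXT REFERENCES dept_nat(code) ON UPDATE CASCADE
);

-- Surrogate key: the business code is an ordinary unique column.
CREATE TABLE dept_sur (id INTEGER PRIMARY KEY, code TEXT UNIQUE, name TEXT);
CREATE TABLE emp_sur (
    id      INTEGER PRIMARY KEY,
    dept_id INTEGER REFERENCES dept_sur(id)
);

INSERT INTO dept_nat VALUES ('CMPE', 'Computer Engineering');
INSERT INTO emp_nat VALUES (1, 'CMPE'), (2, 'CMPE');
INSERT INTO dept_sur VALUES (1, 'CMPE', 'Computer Engineering');
INSERT INTO emp_sur VALUES (1, 1), (2, 1);
""")

with conn:
    # Natural key: every referencing row must be rewritten. Here the cascade
    # does it; without ON UPDATE CASCADE this statement would be rejected.
    conn.execute("UPDATE dept_nat SET code = 'CENG' WHERE code = 'CMPE'")
    # Surrogate key: only the department row itself changes.
    conn.execute("UPDATE dept_sur SET code = 'CENG' WHERE code = 'CMPE'")

nat_refs = [r[0] for r in conn.execute("SELECT dept_code FROM emp_nat ORDER BY id")]
sur_refs = [r[0] for r in conn.execute("SELECT dept_id FROM emp_sur ORDER BY id")]
```

In the natural-key design the rename touches every employee row, while in the surrogate design the foreign keys are untouched. The trade-off is that the surrogate values carry no business meaning for someone reading the data.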

Goals of Database Development

• Develop a Common Vocabulary
• Define the meaning of Data
• Ensure Data Quality
• Find an Efficient Implementation

Final Word

• Remember that the goal of the DB development is to produce a DB that provides an important resource for an organization.

• The DB should be designed so that it can serve the customers and other team members efficiently.