physical database design(database)

Lecture 6: Physical Database Design ISOM3260, Spring 2014

Upload: welcometofacebook

Post on 22-May-2015

TRANSCRIPT

Page 1: Physical database design(database)

Lecture 6: Physical Database Design

ISOM3260, Spring 2014

Page 2: Physical database design(database)

Where we are now
• Database environment
– Introduction to database
• Database development process
– steps to develop a database
• Conceptual data modeling
– entity-relationship (ER) diagram; enhanced ER
• Logical database design
– transforming ER diagram into relations; normalization
• Physical database design
– technical specifications of the database
• Database implementation
– Structured Query Language (SQL), Advanced SQL
• Advanced topics
– data and database administration

Page 3: Physical database design(database)


Database development activities during SDLC

Page 4: Physical database design(database)

Physical Database Design
• Physical Database Design Process
• Designing Fields
• Designing Physical Records and Denormalization
• Designing Physical Files
• Choosing Database Architectures

Page 5: Physical database design(database)

Physical Database Design
• Purpose
– translate the logical description of data into the technical specifications for storing and retrieving data
• Goal
– create a design for storing data that will provide adequate performance and ensure database integrity, security and recoverability
– balance efficient use of storage space against processing speed
– efficient processing tends to dominate as storage gets cheaper

Page 6: Physical database design(database)

Physical Design Process

Inputs
• Normalized relations
• Volume estimates
• Frequency of use estimates
• Attribute definitions
• Response time expectations
• Data security, backup, recovery, and integrity requirements
• DBMS technology used

Leads to

Key Decisions
• Attribute data types
• Physical record descriptions (doesn’t always match logical design)
• File organizations
• Indexes and database architectures
• Query optimization

Page 7: Physical database design(database)


Composite Usage Map

• To estimate data volume and frequency of use statistics

• First step in physical database design or last step in logical database design

• Add notations to the EER diagram

Page 8: Physical database design(database)


Figure 5-1: Composite Usage Map

Note: To estimate size and usage patterns of the database.

Page 9: Physical database design(database)


Figure 5-1: Composite Usage Map

Data volumes

Page 10: Physical database design(database)


Figure 5-1: Composite Usage Map

Access Frequencies (per hour)

Page 11: Physical database design(database)

Figure 5-1: Composite Usage Map

Usage analysis:
• 200 purchased parts accessed per hour
• 80 quotations accessed from these 200 purchased part accesses
• 70 suppliers accessed from these 80 quotation accesses

Page 12: Physical database design(database)

Figure 5-1: Composite Usage Map

Usage analysis:
• 75 suppliers accessed per hour
• 40 quotations accessed from these 75 supplier accesses
• 40 purchased parts accessed from these 40 quotation accesses

Note: PURCHASED PART and QUOTATION are candidates for denormalization.
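
The usage analysis can be turned into simple ratios, which is one rough way to see how often an access to one entity continues to the next. The sketch below is in Python and uses only the access counts quoted on the slides; the function name `follow_rate` and the data layout are illustrative assumptions, not part of the lecture.

```python
# Access frequencies per hour, copied from the usage analysis on the slides.
# Each step is (entity, accesses per hour) along one access path.
path_from_part = [("PURCHASED PART", 200), ("QUOTATION", 80), ("SUPPLIER", 70)]
path_from_supplier = [("SUPPLIER", 75), ("QUOTATION", 40), ("PURCHASED PART", 40)]


def follow_rate(path):
    """Return, for each hop, the fraction of accesses that continue to the next entity."""
    rates = {}
    for (src, src_count), (dst, dst_count) in zip(path, path[1:]):
        rates[(src, dst)] = dst_count / src_count
    return rates


for path in (path_from_part, path_from_supplier):
    for (src, dst), rate in follow_rate(path).items():
        print(f"{src} -> {dst}: {rate:.0%} of accesses continue")
```

On the supplier path, every quotation access continues to a purchased part (40 of 40), which is consistent with the note that PURCHASED PART and QUOTATION are candidates for denormalization.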

Page 13: Physical database design(database)

Designing Fields
• Field
– smallest unit of data in a database
– corresponds to a simple attribute from the E-R diagram
• Field design
– choosing data types
– coding techniques
– controlling data integrity
– handling missing values

Page 14: Physical database design(database)

Choosing Data Types
• The correct data type for a field should
– minimize storage space
– represent all possible values
– improve data integrity (eliminate illegal values)
– support all data manipulations
• Examples of data types
– CHAR: fixed-length character
– VARCHAR2: variable-length character
– CLOB: capable of storing up to 4GB (e.g. customer’s comment)
– NUMBER: positive/negative number
– DATE: actual date and time
– BLOB: binary large object (e.g. photograph or sound clip)

Page 15: Physical database design(database)

Coding Techniques
• Some attributes may have very large values
• Large values push related data further apart in storage, which results in slower data processing
• Remedy: create a code look-up table

Page 16: Physical database design(database)


Figure 5-2: Code look-up table (Pine Valley Furniture Company)

Code saves space, but costs an additional lookup to obtain actual value and additional space for the look-up table.

Note: Acceptable if Finish field is infrequently used.
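
A code look-up table could be sketched as follows, using Python's built-in `sqlite3` module. The table names, column names, codes and finish values here are illustrative assumptions and are not taken from Figure 5-2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Look-up table: each long value is stored once, keyed by a short code.
    CREATE TABLE finish (code TEXT PRIMARY KEY, value TEXT NOT NULL);
    -- Product rows store only the one-character code.
    CREATE TABLE product (
        product_no  INTEGER PRIMARY KEY,
        description TEXT NOT NULL,
        finish_code TEXT REFERENCES finish(code)
    );
    INSERT INTO finish VALUES ('A', 'Birch'), ('B', 'Maple'), ('C', 'Oak');
    INSERT INTO product VALUES (1, 'Chair', 'C'), (2, 'Desk', 'A'), (3, 'Table', 'C');
""")

# Obtaining the actual finish costs one extra look-up (here, a join).
rows = conn.execute("""
    SELECT p.description, f.value
    FROM product AS p JOIN finish AS f ON f.code = p.finish_code
    ORDER BY p.product_no
""").fetchall()
print(rows)
```

The join is the "additional lookup" cost: queries that never need the finish value can read `product` alone and skip it.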

Page 17: Physical database design(database)

Controlling Data Integrity
• Control on the possible values a field can assume
– Default value: the value a field will assume unless a user enters an explicit value for that field
– Range control: limits the set of permissible values a field can assume
– Null value control: allowing or prohibiting empty fields (e.g. primary keys must not be empty)
– Referential integrity: range control for foreign-key to primary-key match-ups
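
The four controls map fairly directly onto column constraints in SQL. The following is a minimal sketch using Python's built-in `sqlite3` module; the `customer` and `orders` tables, their columns, and the helper `try_insert` are hypothetical, and other DBMSs may spell or enforce these constraints differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        -- null value control + referential integrity
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        -- default value + range control
        quantity    INTEGER NOT NULL DEFAULT 1 CHECK (quantity BETWEEN 1 AND 100),
        status      TEXT NOT NULL DEFAULT 'OPEN'
    );
    INSERT INTO customer VALUES (1, 'Alice');
""")


def try_insert(sql):
    """Run one INSERT and report whether the DBMS accepted it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = {
    "default": try_insert("INSERT INTO orders (order_id, customer_id) VALUES (1, 1)"),
    "range": try_insert(
        "INSERT INTO orders (order_id, customer_id, quantity) VALUES (2, 1, 500)"
    ),
    "null": try_insert("INSERT INTO orders (order_id, customer_id) VALUES (3, NULL)"),
    "referential": try_insert("INSERT INTO orders (order_id, customer_id) VALUES (4, 99)"),
}
for rule, outcome in results.items():
    print(rule, "->", outcome)

first_order = conn.execute(
    "SELECT quantity, status FROM orders WHERE order_id = 1"
).fetchone()
print(first_order)
```

Only the first insert is accepted, and it picks up the default quantity and status; the other three are rejected by the range, null and referential-integrity controls respectively.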

Page 18: Physical database design(database)

Handling Missing Data
• Substitute an estimate of the missing value
– e.g. using some formula
• Trigger a report listing missing values
• Perform sensitivity analysis
– missing data are ignored unless knowing a value might be significant
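
The first two options can be sketched in a few lines of Python. The records and the field name `monthly_sales` are hypothetical, and the mean of the known values is only one possible "formula" for the estimate.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "monthly_sales": 120.0},
    {"id": 2, "monthly_sales": None},
    {"id": 3, "monthly_sales": 80.0},
    {"id": 4, "monthly_sales": None},
]

known = [r["monthly_sales"] for r in records if r["monthly_sales"] is not None]
estimate = mean(known)  # the "formula" here is simply the mean of the known values

# Report listing the records whose value is missing.
missing_report = [r["id"] for r in records if r["monthly_sales"] is None]

# Substitute the estimate, flagging it so it is not mistaken for real data.
filled = [
    {**r, "monthly_sales": estimate, "estimated": True}
    if r["monthly_sales"] is None
    else {**r, "estimated": False}
    for r in records
]

print("missing ids:", missing_report)
print(filled)
```

Flagging substituted values is a design choice of this sketch: it keeps later analysis able to tell estimates from recorded data.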

Page 19: Physical database design(database)

Designing Physical Records
• Physical record
– a group of fields stored in adjacent memory locations and retrieved or written together as a unit by a DBMS
• Sometimes a normalized relation is not converted directly into a physical record
– often not all the attributes in a relation are used together, and data from different relations are needed together to produce a report
– efficient processing of data depends on how close together related data are

Page 20: Physical database design(database)

Denormalization
• Process of transforming normalized relations into unnormalized physical record specifications
– either by joining files, partitioning files or data replication
• Benefit
– improve processing speed
• Costs
– more storage space needed
– threats to data integrity and consistency
• Common denormalization opportunities (combining tables to avoid doing joins)
– one-to-one relationship
– many-to-many relationship with non-key attributes
– reference data (1:N relationship where the 1-side has data not used in any other relationship)
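
As a rough illustration of the one-to-one case, the sketch below combines two relations into a single table so that a later query needs no join. It uses Python's built-in `sqlite3` module; the `student` and `application` tables and all their columns are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized design: two relations in a one-to-one relationship.
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, campus_address TEXT);
    CREATE TABLE application (
        application_id   INTEGER PRIMARY KEY,
        student_id       INTEGER UNIQUE REFERENCES student(student_id),
        application_date TEXT
    );
    INSERT INTO student VALUES (1, 'Hall A'), (2, 'Hall B');
    INSERT INTO application VALUES (10, 1, '2014-03-01');

    -- Denormalized physical record: both relations combined into one table.
    CREATE TABLE student_denorm AS
        SELECT s.student_id, s.campus_address, a.application_date
        FROM student AS s
        LEFT JOIN application AS a ON a.student_id = s.student_id;
""")

# Normalized access path: needs a join at query time.
joined = conn.execute("""
    SELECT s.student_id, s.campus_address, a.application_date
    FROM student AS s
    LEFT JOIN application AS a ON a.student_id = s.student_id
    ORDER BY s.student_id
""").fetchall()

# Denormalized access path: a single-table read.
flat = conn.execute(
    "SELECT student_id, campus_address, application_date "
    "FROM student_denorm ORDER BY student_id"
).fetchall()
print(flat)
```

Both queries return the same rows, but the combined table carries an empty `application_date` for the student with no application, a small instance of the extra storage cost listed above.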

Page 21: Physical database design(database)


Fig. 5-3: Two entities with a one-to-one relationship

Assume Application_ID is not necessary but can be included if required.

Page 22: Physical database design(database)


Fig. 5-4: A many-to-many relationship with non-key attributes

Avoids one join operation but increases data duplication

Page 23: Physical database design(database)


Fig. 5-5: A possible denormalization situation: reference data

Extra table access required

Data duplication

Page 24: Physical database design(database)

Partitioning
• Create more tables
• Horizontal partitioning
– distributing the rows of a table into several separate files
– useful for situations where different users need access to different rows
• Vertical partitioning
– distributing the columns of a table into several separate files
– the primary key must be repeated in each file
– useful for situations where different users need access to different columns
• Combinations of horizontal and vertical partitioning
– useful for database distributed across multiple computers (distributed database)
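
The two kinds of partitioning can be sketched on an in-memory table in Python. The `customers` rows and the two helper functions are hypothetical; a real DBMS would do this with its own partitioning features rather than application code.

```python
# Hypothetical customer table as a list of rows; "id" is the primary key.
customers = [
    {"id": 1, "name": "Ann", "region": "East", "credit_limit": 5000},
    {"id": 2, "name": "Bob", "region": "West", "credit_limit": 3000},
    {"id": 3, "name": "Cy", "region": "East", "credit_limit": 7000},
]


def horizontal_partition(rows, column):
    """Distribute whole rows into separate partitions by the value of one column."""
    parts = {}
    for row in rows:
        parts.setdefault(row[column], []).append(row)
    return parts


def vertical_partition(rows, key, column_groups):
    """Distribute columns into separate partitions, repeating the primary key in each."""
    return [
        [{key: row[key], **{c: row[c] for c in cols}} for row in rows]
        for cols in column_groups
    ]


by_region = horizontal_partition(customers, "region")
contact_part, credit_part = vertical_partition(
    customers, "id", [["name", "region"], ["credit_limit"]]
)
print(by_region["East"])
print(credit_part)
```

Each vertical partition repeats `id`, so a full row can still be reassembled by matching on the primary key.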

Page 25: Physical database design(database)


Data Replication

• purposely storing the same data in multiple locations of the database

• improves performance by allowing multiple users to access the same data at the same time with minimum contention

• sacrifices data integrity due to data duplication

• best for data that is not updated often

Page 26: Physical database design(database)

Figure 5.1 - Composite usage map

Combine into 1 file

Combine into another file

Page 27: Physical database design(database)

Designing Physical Files
• Physical file
– a named portion of secondary memory (e.g. hard disk) allocated for the purpose of storing physical records
• Basic constructs to link two pieces of data
– sequential storage: one field or record is stored right after another field or record
– pointers: a field of data that can be used to locate a related field or record
• File organization
– technique for physically arranging a file on the disk
– three types: sequential file organization, indexed file organization, hashed file organization

Page 28: Physical database design(database)

Fig. 5-7 (a) Sequential file organization

Records of the file (1, 2, ..., n) are stored in sequence by the primary key field values.

Every insert or delete requires the file to be re-sorted.

Note: Inflexible; not used in databases but may be used to back up data from a database.
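
A small Python sketch of why sequential organization is inflexible. The file is modelled as a list kept in primary-key order; shifting later records on insert stands in for the re-sorting cost, and the keys and data are made up for illustration.

```python
import bisect

# The file is modelled as a list of (primary_key, data) tuples kept in key order.
sequential_file = [(10, "Desk"), (20, "Chair"), (40, "Table")]


def insert_record(file, key, data):
    """Insert while preserving key order; later records shift to make room."""
    position = bisect.bisect_left(file, (key,))
    file.insert(position, (key, data))
    return len(file) - 1 - position  # number of records that had to move


def find_record(file, key):
    """Scan from the start, as a purely sequential file would."""
    steps = 0
    for k, data in file:
        steps += 1
        if k == key:
            return data, steps
        if k > key:
            break  # keys are sorted, so the record cannot appear later
    return None, steps


moved = insert_record(sequential_file, 30, "Shelf")
print("records moved:", moved)
print(find_record(sequential_file, 40))
```

Both the insert and the lookup do work proportional to the number of records, which is the cost that indexed and hashed organizations aim to avoid.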

Page 29: Physical database design(database)

Indexed File Organizations
• More popular is indexed sequential file organization
– the storage of records sequentially with an index that allows software to locate individual records
• Primary key index
– each index entry maps a key value to a unique record
– primary keys are automatically indexed
• Secondary key index
– each index entry may point to more than one record
– indexing on a non-primary key field
• Index handled by DBMS
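
The difference between the two kinds of index can be sketched in Python. The records are hypothetical, and a Python dict is used only to show what each index maps from and to; it is not the tree structure a DBMS would typically build.

```python
# Hypothetical records stored in a list; the list position stands in for a disk address.
records = [
    {"product_id": 101, "finish": "Oak"},
    {"product_id": 102, "finish": "Birch"},
    {"product_id": 103, "finish": "Oak"},
]

# Primary key index: each key value maps to exactly one record position.
primary_index = {rec["product_id"]: pos for pos, rec in enumerate(records)}

# Secondary key index: each value of a non-key field maps to all matching positions.
secondary_index = {}
for pos, rec in enumerate(records):
    secondary_index.setdefault(rec["finish"], []).append(pos)

print(records[primary_index[102]])
print([records[pos] for pos in secondary_index["Oak"]])
```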

Page 30: Physical database design(database)

Fig. 5-7 (b) Indexed file organization

Leaf nodes contain data records or pointers to each record

pointer

Root node

Page 31: Physical database design(database)

Fig. 5-7 (c) Hashed file organization

Address of each record is determined using a hashing algorithm

Hashing algorithm
– a routine that converts a primary key value into a record address
– typically uses the technique of dividing the primary key by a suitable prime number and then using the remainder as the relative storage position
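
The division-remainder technique can be sketched in Python as follows. The prime 997, the sample keys, and the use of chaining for colliding keys are all assumptions of this sketch; the slide does not say how collisions are handled.

```python
PRIME = 997  # assumed number of storage positions (a prime)


def hash_address(primary_key: int) -> int:
    """Division-remainder hashing: the remainder is the relative storage position."""
    return primary_key % PRIME


# Each position holds a list so that colliding keys can share it
# (chaining is an assumption here, not something stated in the lecture).
file_positions = [[] for _ in range(PRIME)]


def store(primary_key, data):
    """Place a record at the position computed from its primary key."""
    file_positions[hash_address(primary_key)].append((primary_key, data))


def fetch(primary_key):
    """Go straight to the computed position and look for the key there."""
    for key, data in file_positions[hash_address(primary_key)]:
        if key == primary_key:
            return data
    return None


store(12345, "Desk")
store(13342, "Chair")  # 13342 = 12345 + 997, so it hashes to the same position
print(hash_address(12345), fetch(12345), fetch(13342))
```

A lookup touches only one storage position regardless of file size, which is why hashed organization suits retrieval by exact primary key value.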

Page 32: Physical database design(database)

Database Architectures
• Legacy Systems
• Current Technology
• Data Warehouse

Page 33: Physical database design(database)

Review Questions
• What is a composite usage map?
• What are the 4 issues in designing fields?
• What are denormalization, partitioning, and data replication?
• What are the 3 types of file organization?
• What are the types of database architectures?