systemsdesignanalysis 1 chapter 17 data modeling jerry post copyright © 1997

27
1 S S Y Y S S T T E E M M S S D D E E S S I I G G N N A A N N A A L L Y Y S S I I S S Chapter 17 Chapter 17 Data Modeling Jerry Post Copyright © 1997

Upload: jessie-mckinney

Post on 03-Jan-2016

216 views

Category:

Documents


0 download

TRANSCRIPT

1

SSYYSSTTEEMMSS

DDEESSIIGGNN

AANNAALLYYSSIISS Chapter 17Chapter 17

Data Modeling

Jerry Post

Copyright © 1997

2

SSYYSSTTEEMMSS

DDEESSIIGGNN

Data FilesData Files

Disk drives Platters spin Head moves/rotates Tracks Sectors

File Systems Files Directories/Folders File Allocation Table

Maps logical files to sectors Every file is split into pieces Scattered across the disk Minimum sector size

Old DOS: 32K on 2GB Drive always retrieves

fixed length sectors.

Disk spins Head moves

Tracks

Sectors

Directory EntryFile NameFile ExtensionFile attributeTime of updateDate of updateBeginning disk cluster/sectorFile sizeSecurity attributes

3

SSYYSSTTEEMMSS

DDEESSIIGGNN

Disks: RAIDDisks: RAID

Speed limitations Rotational speed Drive head speed Bus transfer rates

RAID: Redundant Array of Independent (Inexpensive) Drives Store file across many

drives (striping) Some deliberate duplication Speed/parallel Parallel searches

RAID: seen as one drive

FileSector 1Sector 2Sector 3Sector 4Sector 5…

4

SSYYSSTTEEMMSS

DDEESSIIGGNN

Data FilesData Files

Master Sorted Current data totals e.g., Inventory

Transaction Change log Updates to master

Report files (output) Initialization files (software)

Need to separate data storage from physical location. Backup and Restore Changes in physical drive

File Organization Sequential Indexed Linked Lists Hashed

5

SSYYSSTTEEMMSS

DDEESSIIGGNN

Sequential StorageSequential Storage

Common uses When large portions of the

data are always used at one time. e.g., 25%

When table is huge and space is expensive.

When transporting / converting data to a different system.

ID LastName FirstName DateHired1 Reeves Keith 1/29/962 Gibson Bill 3/31/963 Reasoner Katy 2/17/964 Hopkins Alan 2/8/965 James Leisha 1/6/966 Eaton Anissa 8/23/967 Farris Dustin 3/28/968 Carpenter Carlos 12/29/969 O'Connor Jessica 7/23/9610 Shields Howard 7/13/96

6

SSYYSSTTEEMMSS

DDEESSIIGGNN

Indexed SequentialIndexed Sequential

Common uses Large tables. Need many sequential lists. Some random search--with

one or two key columns. Hold index in RAM if

possible/speed.

ID LastName FirstName DateHired1 Reeves Keith 1/29/962 Gibson Bill 3/31/963 Reasoner Katy 2/17/964 Hopkins Alan 2/8/965 James Leisha 1/6/966 Eaton Anissa 8/23/967 Farris Dustin 3/28/968 Carpenter Carlos 12/29/969 O'Connor Jessica 7/23/9610 Shields Howard 7/13/96

A11A22A32A42A47A58A63A67A78A83

Address

LastName PointerCarpenter A67Eaton A58Farris A63Gibson A22Hopkins A42James A47O'Connor A78Reasoner A32Reeves A11Shields A83

7

SSYYSSTTEEMMSS

DDEESSIIGGNN

Linked ListsLinked Lists

Separate each element/key. Pointers to next element. Starting point.

CarpenterB87 B29

GibsonB38 00

EatonB29 B71

FarrisB71 B38

8

SSYYSSTTEEMMSS

DDEESSIIGGNN

Insert into a Linked ListInsert into a Linked List

Get space/location with address. Data: Save row (A97). Key: Save key and pointer

to data (B14).

Find insert location. Eccles would be after Eaton

and before Farris. From prior key (Eaton), put

next address (B71) into new key, next pointer.

Put new address (B14) in prior key, next pointer.

FarrisB71 B38 A63

EatonB29 B71 A58

EcclesB14 B71 A97

NewData = new (. . .)NewKey = new (. . .)NewKey->Key = “Eccles”NewKey->Data = NewData

FindInsertPoint(List, PriorKey, NewKey)

NewKey->Next = PriorKey->NextPriorKey->Next = NewKey

B14

9

SSYYSSTTEEMMSS

DDEESSIIGGNN

Direct Access / HashedDirect Access / Hashed

Convert key value directly to location (relative or absolute). Use prime modulus

Choose prime number greater than expected database size (n).

Divide and use remainder.

Set aside spaces (fixed-length) to hold each row.

Collision/overflow space for duplicates.

Extremely fast retrieval. Very poor sequential access. Reorganize if out of space!

Example Prime = 101 Key = 528 Modulus = 23

Overflow/collisions

10

SSYYSSTTEEMMSS

DDEESSIIGGNN

Why Normalization?Why Normalization?

Need standardized data definition Advantages of DBMS require careful design Define data correctly and the rest is much easier It especially makes it easier to expand database later Method applies to most models and most DBMS

Similar to Entity-Relationship Similar to Objects (without inheritance and methods) Goal: Define tables carefully

Save space Minimize redundancy Protect data

11

SSYYSSTTEEMMSS

DDEESSIIGGNN

NotationNotation

Table name

Primary key is underlined

Table columns

Customer(CustomerID, Phone, Name, Address, City, State, ZipCode)

CustomerID Phone LastName FirstName Address City State Zipcode

1 502-666-7777 Johnson Martha 125 Main Street Alvaton KY 421222 502-888-6464 Smith Jack 873 Elm Street Bowling Green KY 421013 502-777-7575 Washington Elroy 95 Easy Street Smith’s Grove KY 421714 502-333-9494 Adams Samuel 746 Brown Drive Alvaton KY 421225 502-474-4746 Rabitz Victor 645 White Avenue Bowling Green KY 421026 616-373-4746 Steinmetz Susan 15 Speedway Drive Portland TN 371487 615-888-4474 Lasater Les 67 S. Ray Drive Portland TN 371488 615-452-1162 Jones Charlie 867 Lakeside Drive Castalian Springs TN 370319 502-222-4351 Chavez Juan 673 Industry Blvd. Caneyville KY 4272110 502-444-2512 Rojo Maria 88 Main Street Cave City KY 42127

12

SSYYSSTTEEMMSS

DDEESSIIGGNN

Sample: Video DatabaseSample: Video Database

Repeating sectionPossible Keys

13

SSYYSSTTEEMMSS

DDEESSIIGGNN

Initial ObjectsInitial Objects

Customers Key: Assign a CustomerID Sample Properties

Name Address Phone

Videos Key: Assign a MovieID Sample Properties

Title RentalPrice Rating Description

RentalTransaction Event/Relationship Key: Assign TransactionID Sample Properties

CustomerID Date

VideosRented Event/Repeating list Keys: TransactionID +

MovieID Sample Properties

VideoCopy#

14

SSYYSSTTEEMMSS

DDEESSIIGGNN

Initial Form EvaluationInitial Form Evaluation

Collect forms from users Write down properties Find repeating groups ( . . .) Look for potential keys: key Identify computed values Notation makes it easier to

identify and solve problems Results equivalent to

diagrams, but will fit on one or two pages

RentalForm(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode,(VideoID, Copy#, Title, Rent ) )

15

SSYYSSTTEEMMSS

DDEESSIIGGNN

Problems with Repeating SectionsProblems with Repeating Sections

RentalForm(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode,(VideoID, Copy#, Title, Rent ) )

TransID RentDate CustomerID LastName Phone Address VideoID Copy# Title Rent1 4/18/95 3 Washington 502-777-7575 95 Easy Street 1 2 2001: A Space Odyssey $1.501 4/18/95 3 Washington 502-777-7575 95 Easy Street 6 3 Clockwork Orange $1.502 4/30/95 7 Lasater 615-888-4474 67 S. Ray Drive 8 1 Hopscotch $1.502 4/30/95 7 Lasater 615-888-4474 67 S. Ray Drive 2 1 Apocalypse Now $2.002 4/30/95 7 Lasater 615-888-4474 67 S. Ray Drive 6 1 Clockwork Orange $1.503 4/18/95 8 Jones 615-452-1162 867 Lakeside Drive 9 1 Luggage Of The Gods $2.503 4/18/95 8 Jones 615-452-1162 867 Lakeside Drive 15 1 Fabulous Baker Boys $2.003 4/18/95 8 Jones 615-452-1162 867 Lakeside Drive 4 1 Boy And His Dog $2.504 4/18/95 3 Washington 502-777-7575 95 Easy Street 3 1 Blues Brothers $2.004 4/18/95 3 Washington 502-777-7575 95 Easy Street 8 1 Hopscotch $1.504 4/18/95 3 Washington 502-777-7575 95 Easy Street 13 1 Surf Nazis Must Die $2.504 4/18/95 3 Washington 502-777-7575 95 Easy Street 17 1 Witches of Eastwick $2.00

Repeating Section

Causes duplication

Storing data in this raw form would not work very well. For example, repeating sections will cause problems.

Note the duplication of data.

Also, what if a customer has not yet checked out a movie--where do we store that customer’s data?

16

SSYYSSTTEEMMSS

DDEESSIIGGNN

Problems with Repeating SectionsProblems with Repeating Sections

Store repeating data Allocate space How much?

Can’t be short Wasted space

e.g., How many videos will be rented at one time?

A better definition eliminates this problem.

NamePhoneAddressCityStateZipCode

VideoID Copy# Title Rent1. 6 1 Clockwork Orange 1.502. 8 2 Hopscotch 1.503. 4. 5.

{Unused Space}

Not in First Normal Form

Customer Rentals

17

SSYYSSTTEEMMSS

DDEESSIIGGNN

First Normal FormFirst Normal Form

Remove repeating sections Split into two tables Bring key from main and repeating section

RentalLine(TransID, VideoID, Copy#, . . .) Each transaction can have many videos (key VideoID) Each video can be rented on many transactions (key TransID) For each TransID and VideoID, only one Copy# (no key on Copy#)

RentalForm(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode, (VideoID, Copy#, Title, Rent ) )

RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)

RentalLine(TransID, VideoID, Copy#, Title, Rent )

18

SSYYSSTTEEMMSS

DDEESSIIGGNN

First Normal Form Problems (Data)First Normal Form Problems (Data)

1NF splits repeating groups Still have problems

Replication Hidden dependency: If a video has not been

rented yet, then what is its title?

TransID RentDate CustID Phone LastName FirstName Address City State ZipCode1 4/18/95 3 502-777-7575 Washington Elroy 95 Easy Street Smith's Grove KY 421712 4/30/95 7 615-888-4474 Lasater Les 67 S. Ray Drive Portland TN 371483 4/18/95 8 615-452-1162 Jones Charlie 867 Lakeside Drive Castalian Springs TN 370314 4/18/95 3 502-777-7575 Washington Elroy 95 Easy Street Smith's Grove KY 42171

TransID VideoID Copy# Title Rent1 1 2 2001: A Space Odyssey $1.501 6 3 Clockwork Orange $1.502 8 1 Hopscotch $1.502 2 1 Apocalypse Now $2.002 6 1 Clockwork Orange $1.503 9 1 Luggage Of The Gods $2.503 15 1 Fabulous Baker Boys $2.003 4 1 Boy And His Dog $2.504 3 1 Blues Brothers $2.004 8 1 Hopscotch $1.504 13 1 Surf Nazis Must Die $2.504 17 1 Witches of Eastwick $2.00

19

SSYYSSTTEEMMSS

DDEESSIIGGNN

Second Normal Form DefinitionSecond Normal Form Definition

Each non-key column must depend on the entire key. Only applies to

concatenated keys Some columns only depend

on part of the key Split those into a new table.

Dependence (definition) If given a value for the key

you always know the value of the property in question, then that property is said to depend on the key.

If you change part of a key and the questionable property does not change, then the table is not in 2NF.

RentalLine(TransID, VideoID, Copy#, Title, Rent)

Depend only on VideoID

Depends on both TransID and VideoID

20

SSYYSSTTEEMMSS

DDEESSIIGGNN

Second Normal Form ExampleSecond Normal Form Example

Title depends only on VideoID Each VideoID can have only one title

Rent depends on VideoID This statement is actually a business rule. It might be different at different stores. Some stores might charge a different rent for each video depending

on the day (or time).

Each non-key column depends on the whole key.

RentalLine(TransID, VideoID, Copy#, Title, Rent)

VideosRented(TransID, VideoID, Copy#)

Videos(VideoID, Title, Rent)

21

SSYYSSTTEEMMSS

DDEESSIIGGNN

Second Normal Form Example (Data)Second Normal Form Example (Data)

TransID VideoID Copy#1 1 21 6 32 2 12 6 12 8 13 4 13 9 13 15 14 3 14 8 14 13 14 17 1

VideoID Title Rent1 2001: A Space Odyssey $1.502 Apocalypse Now $2.003 Blues Brothers $2.004 Boy And His Dog $2.505 Brother From Another Planet $2.006 Clockwork Orange $1.507 Gods Must Be Crazy $2.008 Hopscotch $1.50

VideosRented(TransID, VideoID, Copy#)

Videos(VideoID, Title, Rent)

RentalForm2(TransID, RentDate, CustomerID, Phone,Name, Address, City, State, ZipCode)

(Unchanged)

22

SSYYSSTTEEMMSS

DDEESSIIGGNN

Second Normal Form Problems (Data)Second Normal Form Problems (Data)

Even in 2NF, problems remain Replication Hidden dependency If a customer has not rented a video yet, where do we store

their personal data?

Solution: split table.

TransID RentDate CustID Phone LastName FirstName Address City State ZipCode1 4/18/95 3 502-777-7575 Washington Elroy 95 Easy Street Smith's Grove KY 421712 4/30/95 7 615-888-4474 Lasater Les 67 S. Ray Drive Portland TN 371483 4/18/95 8 615-452-1162 Jones Charlie 867 Lakeside Drive Castalian Springs TN 370314 4/18/95 3 502-777-7575 Washington Elroy 95 Easy Street Smith's Grove KY 42171

RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)

23

SSYYSSTTEEMMSS

DDEESSIIGGNN

Third Normal Form DefinitionThird Normal Form Definition

Each non-key column must depend on nothing but the key. Some columns depend on

columns that are not part of the key.

Split those into a new table. Example: Customers name

does not change for every transaction.

Dependence (definition) If given a value for the key

you always know the value of the property in question, then that property is said to depend on the key.

If you change the key and the questionable property does not change, then the table is not in 3NF.

RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)

Depend only on CustomerID

Depend on TransID

24

SSYYSSTTEEMMSS

DDEESSIIGGNN

Third Normal Form ExampleThird Normal Form Example

Customer attributes depend only on Customer ID Split them into new table (Customer) Remember to leave CustomerID in Rentals table. We need to be able to reconnect tables.

3NF is sometimes easier to see if you identify primary objects at the start--then you would recognize that Customer was a separate object.

RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)

Rentals(TransID, RentDate, CustomerID )

Customers(CustomerID, Phone, Name, Address, City, State, ZipCode )

25

SSYYSSTTEEMMSS

DDEESSIIGGNN

Third Normal Form Example DataThird Normal Form Example Data

TransID RentDate CustomerID1 4/18/95 32 4/30/95 73 4/18/95 84 4/18/95 3

CustomerID Phone LastName FirstName Address City State ZipCode1 502-666-7777 Johnson Martha 125 Main Street Alvaton KY 421222 502-888-6464 Smith Jack 873 Elm Street Bowling Green KY 421013 502-777-7575 Washington Elroy 95 Easy Street Smith's Grove KY 421714 502-333-9494 Adams Samuel 746 Brown Drive Alvaton KY 421225 502-474-4746 Rabitz Victor 645 White Avenue Bowling Green KY 421026 615-373-4746 Steinmetz Susan 15 Speedway Drive Portland TN 371487 615-888-4474 Lasater Les 67 S. Ray DrivePortland TN 371488 615-452-1162 Jones Charlie 867 Lakeside Drive Castalian Springs TN 370319 502-222-4351 Chavez Juan 673 Industry Blvd. Caneyville KY 4272110 502-444-2512 Rojo Maria 88 Main Street Cave City KY 42127

Rentals(TransID, RentDate, CustomerID )

Customers(CustomerID, Phone, Name, Address, City, State, ZipCode )

(Unchanged)

VideosRented(TransID, VideoID, Copy#)

Videos(VideoID, Title, Rent)

26

SSYYSSTTEEMMSS

DDEESSIIGGNN

Third Normal Form Tables (3NF)Third Normal Form Tables (3NF)

Rentals(TransID, RentDate, CustomerID )

Customers(CustomerID, Phone, Name, Address, City, State, ZipCode )

VideosRented(TransID, VideoID, Copy#)

Videos(VideoID, Title, Rent)

27

SSYYSSTTEEMMSS

DDEESSIIGGNN

Checking Your Work (Quality Control)Checking Your Work (Quality Control)

Look for one-to-many relationships. Many side should be keyed (underlined). e.g., VideosRented(TransID, VideoID, . . .). Check each column and ask if it should be 1 : 1 or 1: M. If add a key, renormalize.

Verify no repeating sections (1NF) Check 3NF

Check each column and ask: Does it depend on the whole key and nothing but the key?

Verify that the tables can be reconnected (joined) to form the original tables (draw lines).

Each table represents one object. Enter sample data--look for replication.