
SAN DIEGO SUPERCOMPUTER CENTER

Introduction to Database Design

July 2006

Ken Nunes

knunes @ sdsc.edu

Database Design Agenda

•Introductions
•General Design Considerations
•Entity-Relationship Model
•Normalization
•Overview of SQL
•Star Schemas
•Additional Information
•Q&A

General Design Considerations

•Users

•Application Requirements

•Legacy Systems/Data

Users

•Who are they?
  •Administrative
  •Scientific
  •Technical

•Impact
  •Access Controls
  •Interfaces
  •Service levels

Application Requirements

•What kind of database?
  •Online Analytical Processing (OLAP)
  •Online Transaction Processing (OLTP)

•Budget
•Platform / Vendor
•Workflow?
  •order of operations
  •error handling
  •reporting

Legacy Systems/Data

•What systems are currently in place?
•Where does the data come from?
•How is it generated?
•What format is it in?
•What is the data used for?
•Which parts of the system must remain static?

Entity - Relationship Model

A logical design method which emphasizes simplicity and readability.

•Basic objects of the model are:
  •Entities
  •Relationships
  •Attributes

Entities

Data objects described by the information in the database.

•Denoted by rectangles in the model.

[Diagram: the entities Employee and Department, each drawn as a rectangle]

Attributes

Characteristics of entities or relationships.

•Denoted by ellipses in the model.

[Diagram: Employee with attributes Name and SSN; Department with attributes Name and Budget]

Relationships

Represent associations between entities.

•Denoted by diamonds in the model.

[Diagram: Employee (Name, SSN) and Department (Name, Budget) joined by the relationship "works in", which carries the attribute Start date]

Relationship Connectivity

Constraints on the mapping of the associated entities in the relationship.

•Denoted by variables between the related entities.

•Generally, values for connectivity are expressed as “one” or “many”.

[Diagram: Employee (N) "works in" Department (1), with the relationship attribute Start date]

Connectivity

one-to-one: Department (1) has Manager (1)

one-to-many: Department (1) has Project (N)

many-to-many: Employee (N) works on Project (M)

ER example

Retailer wants to create an online webstore.

•The retailer requires information on:
  •Customers
  •Items
  •Orders

Webstore Entities & Attributes

•Customers - name, credit card, address

•Items - name, price, inventory

•Orders - item, quantity, cost, date, status

[Diagram: the entities Customers, Items, and Orders, each drawn with the attributes listed above]

Webstore Relationships

Identify the relationships.

•The orders are recorded each time a customer purchases items, so the customer and order entities are related.

•Each customer may make several purchases, so the relationship is one-to-many.

[Diagram: Customer (1) "purchase" Order (N)]

Webstore Relationships

Identify the relationships.

•The order consists of the items a customer purchases, but each item can be found in multiple orders.

•Since an order can contain multiple items and an item can appear in multiple orders, the relationship is many-to-many.

[Diagram: Order (N) "consists" Item (M)]

Webstore ER Diagram

[Diagram: Customers (name, credit card, address) (1) "purchase" Orders (item, quantity, cost, date, status) (N); Orders (N) "consists" Items (name, price, inventory) (M)]

Logical Design to Physical Design

Creating relational SQL schemas from entity-relationship models.

•Transform each entity into a table with the key and its attributes.

•Transform each relationship as either a relationship table (many-to-many) or a “foreign key” (one-to-one and one-to-many).

Entity tables

Transform each entity into a table with a key and its attributes.

[Diagram: Employee entity with attributes Name and SSN]

create table employee (
    emp_no number,
    name   varchar2(256),
    ssn    number,
    primary key (emp_no)
);

Foreign Keys

Transform each one-to-one or one-to-many relationship as a “foreign key”.

•Foreign key is a reference in the child (many) table to the primary key of the parent (one) table.

[Diagram: Department (1) "has" Employee (N)]

-- The parent table is created first so the child can reference it.
create table department (
    dept_no number,
    name    varchar2(50),
    primary key (dept_no)
);

create table employee (
    emp_no  number,
    dept_no number,
    name    varchar2(256),
    ssn     number,
    primary key (emp_no),
    foreign key (dept_no) references department
);

Foreign Key

Department

dept_no  name
1        Accounting
2        Human Resources
3        IT

Employee

emp_no  dept_no  name
1       2        Nora Edwards
2       3        Ajay Patel
3       2        Ben Smith
4       1        Brian Burnett
5       3        John O'Leary
6       3        Julia Lenin

Accounting has 1 employee: Brian Burnett

Human Resources has 2 employees: Nora Edwards, Ben Smith

IT has 3 employees: Ajay Patel, John O'Leary, Julia Lenin
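The headcounts above can be reproduced by following the foreign key from employee back to department. The following is a minimal sketch, not the deck's own code: it assumes Python's built-in sqlite3 module as a stand-in for the Oracle-style database on the slides, so the column types (integer, text) differ from the slide's number and varchar2, and the ssn column is left out. The rejected row (emp_no 7, dept_no 99) is a hypothetical value used only to show the constraint at work.

```python
import sqlite3

# In-memory SQLite stands in for the Oracle-style schema on the slides (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    create table department (
        dept_no integer primary key,
        name    text
    );
    create table employee (
        emp_no  integer primary key,
        dept_no integer references department (dept_no),
        name    text
    );
""")
conn.executemany("insert into department values (?, ?)",
                 [(1, "Accounting"), (2, "Human Resources"), (3, "IT")])
conn.executemany("insert into employee values (?, ?, ?)",
                 [(1, 2, "Nora Edwards"), (2, 3, "Ajay Patel"), (3, 2, "Ben Smith"),
                  (4, 1, "Brian Burnett"), (5, 3, "John O'Leary"), (6, 3, "Julia Lenin")])

# Follow the foreign key from each employee (child) to its department (parent).
headcount = conn.execute("""
    select d.name, count(e.emp_no)
    from department d
    join employee e on e.dept_no = d.dept_no
    group by d.dept_no, d.name
    order by d.dept_no
""").fetchall()

# A child row that points at a missing parent is rejected (dept_no 99 is hypothetical).
try:
    conn.execute("insert into employee values (7, 99, 'No Department')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Here headcount pairs each department name with its number of employees, and orphan_rejected records whether the database refused the employee whose dept_no has no matching department.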

Many-to-Many tables

Transform each many-to-many relationship as a table.

•The relationship table will contain the foreign keys to the related entities as well as any relationship attributes.

create table project_employee_details (
    proj_no    number,
    emp_no     number,
    start_date date,
    primary key (proj_no, emp_no),
    foreign key (proj_no) references project,
    foreign key (emp_no) references employee
);

[Diagram: Employee (N) "has" Project (M), with the relationship attribute Start date]

Many-to-Many tables

Employee

emp_no  dept_no  name
1       2        Nora Edwards
2       3        Ajay Patel
3       2        Ben Smith
4       1        Brian Burnett
5       3        John O'Leary
6       3        Julia Lenin

Project

proj_no  name
1        Employee Audit
2        Budget
3        Intranet

Project_employee_details

proj_no  emp_no  start_date
1        4       4/7/03
3        6       8/12/02
3        5       3/4/01
2        6       11/11/02
3        2       12/2/03
2        1       7/21/04

Employee Audit has 1 employee: Brian Burnett

Budget has 2 employees: Julia Lenin, Nora Edwards

Intranet has 3 employees: Julia Lenin, John O'Leary, Ajay Patel
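Reading a many-to-many relationship means joining through the relationship table. The sketch below loads the slide's data and lists the employees on each project. It assumes Python's built-in sqlite3 module rather than the deck's Oracle dialect, stores start_date as plain text exactly as printed on the slide, and omits the employee dept_no column because it plays no part in this join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the slide's database (assumption)
conn.executescript("""
    create table employee (emp_no integer primary key, name text);
    create table project  (proj_no integer primary key, name text);
    -- One row per (project, employee) pairing, plus the relationship attribute.
    create table project_employee_details (
        proj_no    integer references project (proj_no),
        emp_no     integer references employee (emp_no),
        start_date text,
        primary key (proj_no, emp_no)
    );
""")
conn.executemany("insert into employee values (?, ?)",
                 [(1, "Nora Edwards"), (2, "Ajay Patel"), (3, "Ben Smith"),
                  (4, "Brian Burnett"), (5, "John O'Leary"), (6, "Julia Lenin")])
conn.executemany("insert into project values (?, ?)",
                 [(1, "Employee Audit"), (2, "Budget"), (3, "Intranet")])
conn.executemany("insert into project_employee_details values (?, ?, ?)",
                 [(1, 4, "4/7/03"), (3, 6, "8/12/02"), (3, 5, "3/4/01"),
                  (2, 6, "11/11/02"), (3, 2, "12/2/03"), (2, 1, "7/21/04")])

# Two joins: project -> relationship table -> employee.
rows = conn.execute("""
    select p.name, e.name
    from project p
    join project_employee_details d on d.proj_no = p.proj_no
    join employee e on e.emp_no = d.emp_no
    order by p.proj_no, e.name
""").fetchall()

# Group the (project, employee) pairs by project name.
staff = {}
for project_name, employee_name in rows:
    staff.setdefault(project_name, []).append(employee_name)
```

The composite primary key (proj_no, emp_no) is what prevents the same employee from being assigned to the same project twice.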

Normalization

A logical design method which minimizes data redundancy and reduces design flaws.

•Consists of applying various “normal” forms to the database design.

•The normal forms break down large tables into smaller subsets.

First Normal Form (1NF)

Each attribute must be atomic.
• No repeating columns within a row.
• No multi-valued columns.

1NF simplifies attributes.
• Queries become easier.

1NF

Employee (unnormalized)

emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C, Perl, Java
2       Barbara Jones  224      IT         Linux, Mac
3       Jake Rivera    201      R&D        DB2, Oracle, Java

Employee (1NF)

emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C
1       Kevin Jacobs   201      R&D        Perl
1       Kevin Jacobs   201      R&D        Java
2       Barbara Jones  224      IT         Linux
2       Barbara Jones  224      IT         Mac
3       Jake Rivera    201      R&D        DB2
3       Jake Rivera    201      R&D        Oracle
3       Jake Rivera    201      R&D        Java
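The step from the unnormalized table to 1NF is mechanical: each multi-valued skills entry becomes one row per skill. A minimal Python sketch of that step follows; the function name to_first_normal_form is hypothetical, and plain tuples stand in for table rows.

```python
# Unnormalized rows: the skills column holds a comma-separated list.
unnormalized = [
    (1, "Kevin Jacobs", 201, "R&D", "C, Perl, Java"),
    (2, "Barbara Jones", 224, "IT", "Linux, Mac"),
    (3, "Jake Rivera", 201, "R&D", "DB2, Oracle, Java"),
]

def to_first_normal_form(rows):
    """Emit one row per (employee, skill) so every column holds a single value."""
    return [
        (emp_no, name, dept_no, dept_name, skill.strip())
        for emp_no, name, dept_no, dept_name, skills in rows
        for skill in skills.split(",")
    ]

employee_1nf = to_first_normal_form(unnormalized)
```

The result has eight rows. Every value is now atomic, but the employee and department details repeat once per skill, which is the redundancy the later normal forms remove.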

Second Normal Form (2NF)

Each attribute must be functionally dependent on the primary key.

• Functional dependence - the property of one or more attributes that uniquely determines the value of other attributes.
• Any non-dependent attributes are moved into a smaller (subset) table.

2NF improves data integrity.
• Prevents update, insert, and delete anomalies.

Functional Dependence

Name, dept_no, and dept_name are functionally dependent on emp_no. (emp_no -> name, dept_no, dept_name)

Skills is not functionally dependent on emp_no since it is not unique to each emp_no.

Employee (1NF)

emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C
1       Kevin Jacobs   201      R&D        Perl
1       Kevin Jacobs   201      R&D        Java
2       Barbara Jones  224      IT         Linux
2       Barbara Jones  224      IT         Mac
3       Jake Rivera    201      R&D        DB2
3       Jake Rivera    201      R&D        Oracle
3       Jake Rivera    201      R&D        Java

2NF

Employee (1NF)

emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C
1       Kevin Jacobs   201      R&D        Perl
1       Kevin Jacobs   201      R&D        Java
2       Barbara Jones  224      IT         Linux
2       Barbara Jones  224      IT         Mac
3       Jake Rivera    201      R&D        DB2
3       Jake Rivera    201      R&D        Oracle
3       Jake Rivera    201      R&D        Java

Employee (2NF)

emp_no  name           dept_no  dept_name
1       Kevin Jacobs   201      R&D
2       Barbara Jones  224      IT
3       Jake Rivera    201      R&D

Skills (2NF)

emp_no  skills
1       C
1       Perl
1       Java
2       Linux
2       Mac
3       DB2
3       Oracle
3       Java
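The 2NF split above amounts to two projections of the 1NF table: the columns that depend on emp_no, with duplicates removed, and the (emp_no, skill) pairs. A short Python sketch, again using tuples as stand-ins for rows:

```python
employee_1nf = [
    (1, "Kevin Jacobs", 201, "R&D", "C"),
    (1, "Kevin Jacobs", 201, "R&D", "Perl"),
    (1, "Kevin Jacobs", 201, "R&D", "Java"),
    (2, "Barbara Jones", 224, "IT", "Linux"),
    (2, "Barbara Jones", 224, "IT", "Mac"),
    (3, "Jake Rivera", 201, "R&D", "DB2"),
    (3, "Jake Rivera", 201, "R&D", "Oracle"),
    (3, "Jake Rivera", 201, "R&D", "Java"),
]

# Columns determined by emp_no: keep one copy per employee.
employee_2nf = sorted({row[:4] for row in employee_1nf})

# Skills are not determined by emp_no, so they move to their own table.
skills_2nf = [(row[0], row[4]) for row in employee_1nf]
```

After the split each employee's name and department are stored once, so a name change touches a single row.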

Data Integrity

• Insert Anomaly - adding null values. e.g., inserting a new department that has no employees yet would leave emp_no null, even though emp_no is the primary key.
• Update Anomaly - multiple updates for a single name change, which degrades performance. e.g., changing the IT dept_name to IS means updating every row that carries that department.
• Delete Anomaly - deleting wanted information. e.g., deleting the IT department removes employee Barbara Jones from the database.

Employee (1NF)

emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C
1       Kevin Jacobs   201      R&D        Perl
1       Kevin Jacobs   201      R&D        Java
2       Barbara Jones  224      IT         Linux
2       Barbara Jones  224      IT         Mac
3       Jake Rivera    201      R&D        DB2
3       Jake Rivera    201      R&D        Oracle
3       Jake Rivera    201      R&D        Java

Third Normal Form (3NF)

Remove transitive dependencies.
• Transitive dependence - a non-key attribute depends on another non-key attribute rather than directly on the key; in effect, two separate entities exist within one table.
• Any transitive dependencies are moved into a smaller (subset) table.

3NF further improves data integrity.
• Prevents update, insert, and delete anomalies.

Transitive Dependence

Dept_no and dept_name are functionally dependent on emp_no; however, department can be considered a separate entity, because dept_name is determined by dept_no (emp_no -> dept_no -> dept_name).

Employee (2NF)

emp_no  name           dept_no  dept_name
1       Kevin Jacobs   201      R&D
2       Barbara Jones  224      IT
3       Jake Rivera    201      R&D

3NF

Employee (2NF)

emp_no  name           dept_no  dept_name
1       Kevin Jacobs   201      R&D
2       Barbara Jones  224      IT
3       Jake Rivera    201      R&D

Employee (3NF)

emp_no  name           dept_no
1       Kevin Jacobs   201
2       Barbara Jones  224
3       Jake Rivera    201

Department (3NF)

dept_no  dept_name
201      R&D
224      IT
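The 3NF step can be sketched the same way: department facts move to their own lookup keyed by dept_no, and the employee rows keep only the dept_no link. The sketch below also replays the rename from the update-anomaly example (IT to IS) to show that it becomes a single change. Tuples and a dict are stand-ins for the two tables.

```python
employee_2nf = [
    (1, "Kevin Jacobs", 201, "R&D"),
    (2, "Barbara Jones", 224, "IT"),
    (3, "Jake Rivera", 201, "R&D"),
]

# Employee keeps only facts about the employee, including the dept_no link.
employee_3nf = [(emp_no, name, dept_no) for emp_no, name, dept_no, _ in employee_2nf]

# Department holds each (dept_no, dept_name) pair exactly once.
department_3nf = dict(sorted({(dept_no, dept_name)
                              for _, _, dept_no, dept_name in employee_2nf}))

# Renaming IT to IS is now one change, however many employees the department has.
department_3nf[224] = "IS"

# Joining on dept_no recovers the wide rows, with the new name applied.
rejoined = [(emp_no, name, dept_no, department_3nf[dept_no])
            for emp_no, name, dept_no in employee_3nf]
```

No information is lost by the split: the join on dept_no rebuilds the original columns on demand.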

Other Normal Forms

Boyce-Codd Normal Form (BCNF)
• Strengthens 3NF by requiring the determinant of every non-trivial functional dependency to be a superkey (a column or set of columns that uniquely identifies a row).

Fourth Normal Form (4NF)
• Eliminate non-trivial multivalued dependencies that are not determined by a superkey.

Fifth Normal Form (5NF)
• Eliminate join dependencies that are not implied by candidate keys.

Normalizing our webstore (1NF)

items

item_id  name          price  inventory
34       sweater red   50     21
35       sweater blue  50     10
56       t-shirt       25     76
72       jeans         75     5
81       jacket        175    9

customers

cust_id  name          address         credit_card_num  credit_card_type
45       Mike Speedy   123 A St.       45154            visa
45       Mike Speedy   123 A St.       32499            mastercard
45       Mike Speedy   123 A St.       12834            discover
78       Frank Newmon  2 Main St.      45698            visa
102      Joe Powers    343 Blue Blvd.  94065            mastercard
102      Joe Powers    343 Blue Blvd.  10532            discover

orders

order_id  cust_id  item_id  quantity  cost  date     status
405       45       34       2         100   2/3/06   shipped
405       45       35       1         50    2/3/06   shipped
405       45       56       3         75    2/3/06   shipped
408       78       56       2         50    3/5/06   refunded
410       102      72       2         150   3/10/06  shipped
410       102      81       1         175   3/10/06  shipped

Normalizing our webstore (2NF & 3NF)

customers

cust_id  name          address
45       Mike Speedy   123 A St.
78       Frank Newmon  2 Main St.
102      Joe Powers    343 Blue Blvd.

credit_cards

cust_id  num    type
45       45154  visa
45       32499  mastercard
45       12834  discover
78       45698  visa
102      94065  mastercard
102      10532  discover

Normalizing our webstore (2NF & 3NF)

order details

order_id  item_id  quantity  cost
405       34       2         100
405       35       1         50
405       56       3         75
408       56       2         50
410       72       2         150
410       81       1         175

items

item_id  name          price  inventory
34       sweater red   50     21
35       sweater blue  50     10
56       t-shirt       25     76
72       jeans         75     5
81       jacket        175    9

orders

order_id  cust_id  date     status
405       45       2/3/06   shipped
408       78       3/5/06   refunded
410       102      3/10/06  shipped
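With the webstore split into orders, order details, and items, an order total is computed by joining the three tables instead of being stored repeatedly. The sketch below loads the slide's data into SQLite through Python's sqlite3 module, which is an assumption made so the example runs anywhere; it is not the deck's Oracle setup. The orders date column is named order_date here, an illustrative rename, and the first order's date is entered as 2/3/06 on the assumption that this is what the slide intended. The second query checks that each stored cost equals quantity times the item price.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in (assumption)
conn.executescript("""
    create table items (item_id integer primary key, name text,
                        price integer, inventory integer);
    create table orders (order_id integer primary key, cust_id integer,
                         order_date text, status text);
    create table order_details (
        order_id integer references orders (order_id),
        item_id  integer references items (item_id),
        quantity integer,
        cost     integer,
        primary key (order_id, item_id)
    );
""")
conn.executemany("insert into items values (?, ?, ?, ?)",
                 [(34, "sweater red", 50, 21), (35, "sweater blue", 50, 10),
                  (56, "t-shirt", 25, 76), (72, "jeans", 75, 5), (81, "jacket", 175, 9)])
conn.executemany("insert into orders values (?, ?, ?, ?)",
                 [(405, 45, "2/3/06", "shipped"), (408, 78, "3/5/06", "refunded"),
                  (410, 102, "3/10/06", "shipped")])
conn.executemany("insert into order_details values (?, ?, ?, ?)",
                 [(405, 34, 2, 100), (405, 35, 1, 50), (405, 56, 3, 75),
                  (408, 56, 2, 50), (410, 72, 2, 150), (410, 81, 1, 175)])

# Total per order, derived from quantity and the item's current price.
totals = conn.execute("""
    select o.order_id, sum(d.quantity * i.price)
    from orders o
    join order_details d on d.order_id = o.order_id
    join items i on i.item_id = d.item_id
    group by o.order_id
    order by o.order_id
""").fetchall()

# Count detail rows whose stored cost disagrees with quantity * price.
mismatches = conn.execute("""
    select count(*)
    from order_details d
    join items i on i.item_id = d.item_id
    where d.cost <> d.quantity * i.price
""").fetchone()[0]
```

One design note: keeping cost in order details records the price charged at the time of the order, so it stays correct even if the item price later changes.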

Revisit webstore ER diagram

[Diagram: revised webstore ER diagram. Entities: Customers (name, address), Credit card (card number, card type), Orders (date, status), Order details (quantity, cost), Items (name, price, inventory). Relationships: "purchase", "have", and "consists", labeled with one-to-many and many-to-many connectivity.]

Structured Query Language

SQL is the standard language for data definition and data manipulation for relational database systems.

• Nonprocedural
• Universal

Data Definition Language

The aspect of SQL that defines and manipulates objects in a database.

• create tables
• alter tables
• drop tables
• create views

Create Table

create table customer (
    cust_id number,
    name    varchar(50) not null,
    address varchar(256) not null,
    primary key (cust_id)
);

create table credit_card (
    cust_id          number not null,
    credit_card_type varchar(10) not null,  -- wide enough for 'mastercard'
    credit_card_num  number not null,
    foreign key (cust_id) references customer
);

[Diagram: Customer (name, address) (1) "have" Credit card (card number, card type) (N)]

Modifying Tables

alter table customer modify name varchar(256);

alter table customer add credit_limit number;

drop table customer;

Data Manipulation Language

The aspect of SQL used to manipulate the data in a database.

• queries
• updates
• inserts
• deletes

Select command

Used to query data from database tables.

• Format:

Select <columns>
From <table>
Where <condition>;

Query example

Select name from customers;

result:
Mike Speedy
Frank Newmon
Joe Powers

customers

cust_id  name          address
45       Mike Speedy   123 A St.
78       Frank Newmon  2 Main St.
102      Joe Powers    343 Blue Blvd.

Query example

select name from customers
where address = '123 A St.';

result:
Mike Speedy

customers

cust_id  name          address
45       Mike Speedy   123 A St.
78       Frank Newmon  2 Main St.
102      Joe Powers    343 Blue Blvd.

Query example

select * from customers, credit_cards
where customers.cust_id = credit_cards.cust_id
and type = 'visa';

customers

cust_id  name          address
45       Mike Speedy   123 A St.
78       Frank Newmon  2 Main St.
102      Joe Powers    343 Blue Blvd.

credit_cards

cust_id  num    type
45       45154  visa
45       32499  mastercard
45       12834  discover
78       45698  visa
102      94065  mastercard
102      10532  discover

returns:

cust_id  name          address     cust_id  num    type
45       Mike Speedy   123 A St.   45       45154  visa
78       Frank Newmon  2 Main St.  78       45698  visa
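A join needs both tables named in the query. As a hedged sketch, the same lookup can be written with explicit JOIN ... ON syntax and run against SQLite through Python's sqlite3 module (an assumption; the deck targets an Oracle-style database). The aliases c and cc are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in (assumption)
conn.executescript("""
    create table customers (cust_id integer primary key, name text, address text);
    create table credit_cards (
        cust_id integer references customers (cust_id),
        num     integer,
        type    text
    );
""")
conn.executemany("insert into customers values (?, ?, ?)",
                 [(45, "Mike Speedy", "123 A St."), (78, "Frank Newmon", "2 Main St."),
                  (102, "Joe Powers", "343 Blue Blvd.")])
conn.executemany("insert into credit_cards values (?, ?, ?)",
                 [(45, 45154, "visa"), (45, 32499, "mastercard"), (45, 12834, "discover"),
                  (78, 45698, "visa"), (102, 94065, "mastercard"), (102, 10532, "discover")])

# The join condition pairs each card with its owner; the filter keeps visa cards only.
visa_holders = conn.execute("""
    select c.cust_id, c.name, c.address, cc.num, cc.type
    from customers c
    join credit_cards cc on cc.cust_id = c.cust_id
    where cc.type = 'visa'
    order by c.cust_id
""").fetchall()
```

Joe Powers does not appear in visa_holders because neither of his cards is a visa.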

Changing Data

There are 3 commands that change data in a table.

Insert:

insert into <table> (<columns>) values (<values>);

insert into customer (cust_id, name) values (3, 'Fred Flintstone');

Update:

update <table> set <column> = <value> where <condition>;

update customer set name = 'Mark Speedy' where cust_id = 45;

Delete:

delete from <table> where <condition>;

delete from customer where cust_id = 45;
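The three commands can be tried end to end against a throwaway table. This sketch uses Python's sqlite3 module with an in-memory database (an assumption, not the deck's environment) and passes values as ? parameters instead of quoting them into the SQL text. The address column is left nullable here, unlike the earlier create table slide, so that the slide's two-column insert runs unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in (assumption)
# address is nullable in this sketch so the slide's insert (no address) succeeds.
conn.execute("""
    create table customer (
        cust_id integer primary key,
        name    text not null,
        address text
    )
""")
conn.executemany("insert into customer values (?, ?, ?)",
                 [(45, "Mike Speedy", "123 A St."), (78, "Frank Newmon", "2 Main St."),
                  (102, "Joe Powers", "343 Blue Blvd.")])

# Insert: add a new row, naming only the columns being supplied.
conn.execute("insert into customer (cust_id, name) values (?, ?)", (3, "Fred Flintstone"))

# Update: change one column in the rows matching the condition.
conn.execute("update customer set name = ? where cust_id = ?", ("Mark Speedy", 45))
renamed = conn.execute("select name from customer where cust_id = 45").fetchone()[0]

# Delete: remove the rows matching the condition.
conn.execute("delete from customer where cust_id = ?", (45,))
remaining = [row[0] for row in conn.execute("select cust_id from customer order by cust_id")]
```

Note that an update or delete without a where clause applies to every row in the table.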

Star Schemas

Designed for data retrieval.
• Best for use in decision support tasks such as Data Warehouses and Data Marts.
• Denormalized - allows for faster querying due to fewer joins.
• Slow performance for insert, delete, and update transactions.
• Composed of two types of tables: facts and dimensions.

Fact Table

The main table in a star schema is the Fact table.
• Contains groupings of measures of an event to be analyzed.

•Measure - numeric data

Invoice Facts: units sold, unit amount, total sale price

Dimension Table

Dimension tables are groupings of descriptors and measures of the fact.

•descriptor - non-numeric data

Customer Dimension: cust_dim_key, name, address, phone

Time Dimension: time_dim_key, invoice date, due date, delivered date

Location Dimension: loc_dim_key, store number, store address, store phone

Product Dimension: prod_dim_key, product, price, cost

Star Schema

The fact table forms a one-to-many relationship with each dimension table.

Invoice Facts: cust_dim_key, loc_dim_key, time_dim_key, prod_dim_key, units sold, unit amount, total sale price

Customer Dimension: cust_dim_key, name, address, phone

Time Dimension: time_dim_key, invoice date, due date, delivered date

Location Dimension: loc_dim_key, store number, store address, store phone

Product Dimension: prod_dim_key, product, price, cost

[Diagram: Invoice Facts at the center; each dimension table (1) relates to Invoice Facts (N)]
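A typical star-schema query joins the fact table to one dimension and aggregates a measure. The sketch below builds a cut-down version of the invoice star with only the product and location dimensions and sums sales along each. Everything in it is illustrative: the sample rows are made up for the example, the column names use underscores in place of the slide's spaces, and SQLite (through Python's sqlite3 module) stands in for a warehouse database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in (assumption)
conn.executescript("""
    create table product_dimension (prod_dim_key integer primary key, product text);
    create table location_dimension (loc_dim_key integer primary key, store_number integer);
    -- Each fact row holds one key per dimension plus the numeric measures.
    create table invoice_facts (
        prod_dim_key     integer references product_dimension (prod_dim_key),
        loc_dim_key      integer references location_dimension (loc_dim_key),
        units_sold       integer,
        unit_amount      integer,
        total_sale_price integer
    );
""")
# Made-up sample rows, for illustration only.
conn.executemany("insert into product_dimension values (?, ?)", [(1, "sweater"), (2, "jeans")])
conn.executemany("insert into location_dimension values (?, ?)", [(1, 101), (2, 102)])
conn.executemany("insert into invoice_facts values (?, ?, ?, ?, ?)",
                 [(1, 1, 2, 50, 100), (1, 2, 1, 50, 50),
                  (2, 1, 2, 75, 150), (2, 2, 1, 75, 75)])

# One join per dimension of interest, then aggregate the measures.
sales_by_product = conn.execute("""
    select p.product, sum(f.units_sold), sum(f.total_sale_price)
    from invoice_facts f
    join product_dimension p on p.prod_dim_key = f.prod_dim_key
    group by p.product
    order by p.product
""").fetchall()

sales_by_store = conn.execute("""
    select l.store_number, sum(f.total_sale_price)
    from invoice_facts f
    join location_dimension l on l.loc_dim_key = f.loc_dim_key
    group by l.store_number
    order by l.store_number
""").fetchall()
```

Each report needs only a single join from the fact table to the dimension being analyzed, which is the "fewer joins" benefit the star layout is designed for.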

Analyzing the webstore

The manager needs to analyze the orders obtained from the webstore.

• From this we will use the order table to create our fact table.

Order Facts: date, items, customers

Webstore Dimension

We have 2 dimensions for the schema: customers and items.

Item Dimension: item_dim_key, name, price, inventory

Customer Dimension: cust_dim_key, name, address, credit_card_type

Webstore Star Schema

Order Facts: date, items, customers

Item Dimension: item_dim_key, name, price, inventory

Customer Dimension: cust_dim_key, name, address, credit_card_type

[Diagram: Order Facts at the center; Item Dimension (1) and Customer Dimension (1) each relate to Order Facts (N)]

Books and Reference

•Database Design for Mere Mortals, Michael J. Hernandez

•Information Modeling and Relational Databases, Terry Halpin

•Database Modeling and Design, Toby J. Teorey

Continuing Education

UCSD Extension

Data Management Courses

DBA Certificate Program

Database Application Developer Certificate Program

Data Central

The Data Services Group provides Data Allocations for the research community.

• http://datacentral.sdsc.edu/

•Tools and expertise for making data collections available to the broader scientific community.
•Provide disk, tape, and database storage resources.