San Diego Supercomputer Center: Introduction to Database Design, July 2006, Ken Nunes, knunes@sdsc.edu
TRANSCRIPT
SAN DIEGO SUPERCOMPUTER CENTER
Database Design Agenda
•Introductions
•General Design Considerations
•Entity-Relationship Model
•Normalization
•Overview of SQL
•Star Schemas
•Additional Information
•Q&A
General Design Considerations
•Users
•Application Requirements
•Legacy Systems/Data
Users
•Who are they?
  •Administrative
  •Scientific
  •Technical
•Impact
  •Access Controls
  •Interfaces
  •Service levels
Application Requirements
•What kind of database?
  •Online Analytical Processing (OLAP)
  •Online Transaction Processing (OLTP)
•Budget
•Platform / Vendor
•Workflow?
  •order of operations
  •error handling
  •reporting
Legacy Systems/Data
•What systems are currently in place?
•Where does the data come from?
•How is it generated?
•What format is it in?
•What is the data used for?
•Which parts of the system must remain static?
Entity - Relationship Model
A logical design method which emphasizes simplicity and readability.
•Basic objects of the model are:
  •Entities
  •Relationships
  •Attributes
Entities
Data objects detailed by the information in the database.
•Denoted by rectangles in the model.
[Diagram: two entity rectangles, Employee and Department]
Attributes
Characteristics of entities or relationships.
•Denoted by ellipses in the model.
[Diagram: Employee with attributes Name and SSN; Department with attributes Name and Budget]
Relationships
Represent associations between entities.
•Denoted by diamonds in the model.
[Diagram: Employee (Name, SSN) works in Department (Name, Budget); the relationship carries the attribute Start date]
Relationship Connectivity
Constraints on the mapping of the associated entities in the relationship.
•Denoted by variables between the related entities.
•Generally, values for connectivity are expressed as “one” or “many”
[Diagram: Employee (N) works in Department (1), with relationship attribute Start date; Employee has Name and SSN, Department has Name and Budget]
Connectivity
one-to-one: Department has Manager (1:1)
one-to-many: Department has Project (1:N)
many-to-many: Employee works on Project (N:M)
ER example
Retailer wants to create an online webstore.
•The retailer requires information on:
  •Customers
  •Items
  •Orders
Webstore Entities & Attributes
•Customers - name, credit card, address
•Items - name, price, inventory
•Orders - item, quantity, cost, date, status
[Diagram: the Customers, Items, and Orders entities, each drawn with the attributes listed above]
Webstore Relationships
Identify the relationships.
•The orders are recorded each time a customer purchases items, so the customer and order entities are related.
•Each customer may make several purchases, so the relationship is one-to-many.
[Diagram: Customer (1) purchase Order (N)]
Webstore Relationships
Identify the relationships.
•The order consists of the items a customer purchases but each item can be found in multiple orders.
•Since an order can contain multiple items and each item can appear in multiple orders, the relationship is many-to-many.
[Diagram: Order (N) consists Item (M)]
Webstore ER Diagram
[Diagram: Customers (name, address, credit card) 1 purchase N Orders (item, quantity, cost, date, status); Orders N consists M Items (name, price, inventory)]
Logical Design to Physical Design
Creating relational SQL schemas from entity-relationship models.
•Transform each entity into a table with the key and its attributes.
•Transform each relationship as either a relationship table (many-to-many) or a “foreign key” (one-to-one and one-to-many).
Entity tables
Transform each entity into a table with a key and its attributes.
[Diagram: Employee with attributes Name and SSN]

create table employee (
  emp_no number,
  name varchar2(256),
  ssn number,
  primary key (emp_no)
);
Foreign Keys
Transform each one-to-one or one-to-many relationship as a “foreign key”.
•Foreign key is a reference in the child (many) table to the primary key of the parent (one) table.
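[Diagram: Department (1) has Employee (N)]

-- parent (one) table, created first so the foreign key below can reference it
create table department (
  dept_no number,
  name varchar2(50),
  primary key (dept_no)
);

-- child (many) table
create table employee (
  emp_no number,
  dept_no number,
  name varchar2(256),
  ssn number,
  primary key (emp_no),
  foreign key (dept_no) references department
);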
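What the foreign key buys in practice can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: SQLite stands in for the Oracle-style dialect on the slides (so the column types differ), and the row 'Hypothetical Hire' with department 99 is made up to trigger the constraint.

```python
import sqlite3

# In-memory SQLite database; an assumption standing in for the slides' Oracle dialect.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
    CREATE TABLE department (
        dept_no INTEGER PRIMARY KEY,
        name    TEXT
    );
    CREATE TABLE employee (
        emp_no  INTEGER PRIMARY KEY,
        dept_no INTEGER REFERENCES department (dept_no),
        name    TEXT,
        ssn     INTEGER
    );
""")

conn.execute("INSERT INTO department VALUES (3, 'IT')")
conn.execute("INSERT INTO employee VALUES (2, 3, 'Ajay Patel', NULL)")

# A child row pointing at a department that does not exist is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (7, 99, 'Hypothetical Hire', NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("orphan row rejected:", rejected)
```

The parent table is created first because the child table's foreign key refers to it.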
Foreign Key
Department
dept_no  Name
1        Accounting
2        Human Resources
3        IT

Employee
emp_no  dept_no  Name
1       2        Nora Edwards
2       3        Ajay Patel
3       2        Ben Smith
4       1        Brian Burnett
5       3        John O'Leary
6       3        Julia Lenin

Accounting has 1 employee: Brian Burnett
Human Resources has 2 employees: Nora Edwards, Ben Smith
IT has 3 employees: Ajay Patel, John O'Leary, Julia Lenin
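The per-department headcounts above can be derived by following the foreign key with a join. A minimal sketch using Python's sqlite3 module (SQLite is an assumption here, chosen only so the example runs anywhere; the rows are the ones on the slide):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        emp_no  INTEGER PRIMARY KEY,
        dept_no INTEGER REFERENCES department (dept_no),
        name    TEXT
    );
    INSERT INTO department VALUES (1, 'Accounting'), (2, 'Human Resources'), (3, 'IT');
    INSERT INTO employee VALUES
        (1, 2, 'Nora Edwards'), (2, 3, 'Ajay Patel'), (3, 2, 'Ben Smith'),
        (4, 1, 'Brian Burnett'), (5, 3, 'John O''Leary'), (6, 3, 'Julia Lenin');
""")

# Join each employee to its department through dept_no, then count per department.
headcounts = conn.execute("""
    SELECT d.name, COUNT(*) AS headcount
    FROM department d
    JOIN employee e ON e.dept_no = d.dept_no
    GROUP BY d.dept_no, d.name
    ORDER BY d.dept_no
""").fetchall()

for name, headcount in headcounts:
    print(name, headcount)
```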
Many-to-Many tables
Transform each many-to-many relationship as a table.
•The relationship table will contain the foreign keys to the related entities as well as any relationship attributes.

[Diagram: Employee (N) has Project (M), with relationship attribute Start date]

create table project_employee_details (
  proj_no number,
  emp_no number,
  start_date date,
  primary key (proj_no, emp_no),
  foreign key (proj_no) references project,
  foreign key (emp_no) references employee
);
Many-to-Many tables
Employee
emp_no  dept_no  Name
1       2        Nora Edwards
2       3        Ajay Patel
3       2        Ben Smith
4       1        Brian Burnett
5       3        John O'Leary
6       3        Julia Lenin

Project
proj_no  Name
1        Employee Audit
2        Budget
3        Intranet

Project_employee_details
proj_no  emp_no  start_date
1        4       4/7/03
3        6       8/12/02
3        5       3/4/01
2        6       11/11/02
3        2       12/2/03
2        1       7/21/04

Employee Audit has 1 employee: Brian Burnett
Budget has 2 employees: Julia Lenin, Nora Edwards
Intranet has 3 employees: Julia Lenin, John O'Leary, Ajay Patel
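Reading a many-to-many relationship back out takes two joins, one on each side of the relationship table. The sketch below uses Python's sqlite3 module with the slide's rows; SQLite, the TEXT storage of start_date, and the helper name staff_on are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project  (proj_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project_employee_details (
        proj_no    INTEGER REFERENCES project (proj_no),
        emp_no     INTEGER REFERENCES employee (emp_no),
        start_date TEXT,
        PRIMARY KEY (proj_no, emp_no)
    );
    INSERT INTO employee VALUES
        (1, 'Nora Edwards'), (2, 'Ajay Patel'), (3, 'Ben Smith'),
        (4, 'Brian Burnett'), (5, 'John O''Leary'), (6, 'Julia Lenin');
    INSERT INTO project VALUES (1, 'Employee Audit'), (2, 'Budget'), (3, 'Intranet');
    INSERT INTO project_employee_details VALUES
        (1, 4, '4/7/03'), (3, 6, '8/12/02'), (3, 5, '3/4/01'),
        (2, 6, '11/11/02'), (3, 2, '12/2/03'), (2, 1, '7/21/04');
""")

def staff_on(project_name):
    """Return the names of employees assigned to one project, ordered by emp_no."""
    rows = conn.execute("""
        SELECT e.name
        FROM project p
        JOIN project_employee_details d ON d.proj_no = p.proj_no
        JOIN employee e ON e.emp_no = d.emp_no
        WHERE p.name = ?
        ORDER BY e.emp_no
    """, (project_name,))
    return [name for (name,) in rows]

print(staff_on("Intranet"))
```

The composite primary key (proj_no, emp_no) keeps the same employee from being assigned to the same project twice.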
Normalization
A logical design method which minimizes data redundancy and reduces design flaws.
•Consists of applying various “normal” forms to the database design.
•The normal forms break down large tables into smaller subsets.
First Normal Form (1NF)
Each attribute must be atomic
• No repeating columns within a row.
• No multi-valued columns.

1NF simplifies attributes
• Queries become easier.
1NF
Employee (unnormalized)
emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C, Perl, Java
2       Barbara Jones  224      IT         Linux, Mac
3       Jake Rivera    201      R&D        DB2, Oracle, Java

Employee (1NF)
emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C
1       Kevin Jacobs   201      R&D        Perl
1       Kevin Jacobs   201      R&D        Java
2       Barbara Jones  224      IT         Linux
2       Barbara Jones  224      IT         Mac
3       Jake Rivera    201      R&D        DB2
3       Jake Rivera    201      R&D        Oracle
3       Jake Rivera    201      R&D        Java
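The move from the unnormalized table to 1NF amounts to emitting one row per value of the multi-valued skills column. A small Python sketch of that step (the variable names are illustrative, not from the slides):

```python
# Rows of the unnormalized Employee table; skills holds several values per row.
unnormalized = [
    (1, "Kevin Jacobs", 201, "R&D", "C, Perl, Java"),
    (2, "Barbara Jones", 224, "IT", "Linux, Mac"),
    (3, "Jake Rivera", 201, "R&D", "DB2, Oracle, Java"),
]

# 1NF: repeat the row once per skill so that every attribute is atomic.
first_normal_form = [
    (emp_no, name, dept_no, dept_name, skill.strip())
    for emp_no, name, dept_no, dept_name, skills in unnormalized
    for skill in skills.split(",")
]

for row in first_normal_form:
    print(row)
```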
Second Normal Form (2NF)
Each non-key attribute must be functionally dependent on the whole primary key.
• Functional dependence - the property of one or more attributes that uniquely determines the value of other attributes.
• Any non-dependent attributes are moved into a smaller (subset) table.

2NF improves data integrity.
• Prevents update, insert, and delete anomalies.
Functional Dependence
Name, dept_no, and dept_name are functionally dependent on emp_no. (emp_no -> name, dept_no, dept_name)
Skills is not functionally dependent on emp_no since it is not unique to each emp_no.
Employee (1NF)
emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C
1       Kevin Jacobs   201      R&D        Perl
1       Kevin Jacobs   201      R&D        Java
2       Barbara Jones  224      IT         Linux
2       Barbara Jones  224      IT         Mac
3       Jake Rivera    201      R&D        DB2
3       Jake Rivera    201      R&D        Oracle
3       Jake Rivera    201      R&D        Java
2NF
Employee (1NF)
emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C
1       Kevin Jacobs   201      R&D        Perl
1       Kevin Jacobs   201      R&D        Java
2       Barbara Jones  224      IT         Linux
2       Barbara Jones  224      IT         Mac
3       Jake Rivera    201      R&D        DB2
3       Jake Rivera    201      R&D        Oracle
3       Jake Rivera    201      R&D        Java

Employee (2NF)
emp_no  name           dept_no  dept_name
1       Kevin Jacobs   201      R&D
2       Barbara Jones  224      IT
3       Jake Rivera    201      R&D

Skills (2NF)
emp_no  skills
1       C
1       Perl
1       Java
2       Linux
2       Mac
3       DB2
3       Oracle
3       Java
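The 2NF split can be expressed as two projections of the 1NF table, and joining them back on emp_no should recover the original rows. A sketch with Python's sqlite3 module; the table names employee_1nf, employee_2nf, and skills_2nf are assumptions chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_1nf (emp_no, name, dept_no, dept_name, skills)")
conn.executemany("INSERT INTO employee_1nf VALUES (?, ?, ?, ?, ?)", [
    (1, "Kevin Jacobs", 201, "R&D", "C"),
    (1, "Kevin Jacobs", 201, "R&D", "Perl"),
    (1, "Kevin Jacobs", 201, "R&D", "Java"),
    (2, "Barbara Jones", 224, "IT", "Linux"),
    (2, "Barbara Jones", 224, "IT", "Mac"),
    (3, "Jake Rivera", 201, "R&D", "DB2"),
    (3, "Jake Rivera", 201, "R&D", "Oracle"),
    (3, "Jake Rivera", 201, "R&D", "Java"),
])

# Project the attributes that depend on emp_no into one table, skills into another.
conn.executescript("""
    CREATE TABLE employee_2nf AS
        SELECT DISTINCT emp_no, name, dept_no, dept_name FROM employee_1nf;
    CREATE TABLE skills_2nf AS
        SELECT DISTINCT emp_no, skills FROM employee_1nf;
""")

employee_rows = conn.execute("SELECT COUNT(*) FROM employee_2nf").fetchone()[0]
skill_rows = conn.execute("SELECT COUNT(*) FROM skills_2nf").fetchone()[0]

# Joining the two smaller tables reproduces the 1NF table, so nothing was lost.
rejoined = conn.execute("""
    SELECT e.emp_no, e.name, e.dept_no, e.dept_name, s.skills
    FROM employee_2nf e
    JOIN skills_2nf s ON s.emp_no = e.emp_no
""").fetchall()
original = conn.execute("SELECT * FROM employee_1nf").fetchall()
lossless = sorted(rejoined) == sorted(original)

print(employee_rows, skill_rows, lossless)
```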
Data Integrity
• Insert Anomaly - adding null values. eg, inserting a new department that has no employees yet leaves the primary key emp_no null.
• Update Anomaly - multiple updates for a single name change, which degrades performance. eg, changing the IT dept_name to IS
• Delete Anomaly - deleting wanted information. eg, deleting the IT department removes employee Barbara Jones from the database

Employee (1NF)
emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C
1       Kevin Jacobs   201      R&D        Perl
1       Kevin Jacobs   201      R&D        Java
2       Barbara Jones  224      IT         Linux
2       Barbara Jones  224      IT         Mac
3       Jake Rivera    201      R&D        DB2
3       Jake Rivera    201      R&D        Oracle
3       Jake Rivera    201      R&D        Java
Third Normal Form (3NF)
Remove transitive dependencies.
• Transitive dependence - a non-key attribute depends on the key only through another non-key attribute, so two separate entities exist within one table.
• Any transitive dependencies are moved into a smaller (subset) table.

3NF further improves data integrity.
• Prevents update, insert, and delete anomalies.
Transitive Dependence
Dept_no and dept_name are functionally dependent on emp_no; however, dept_name depends on emp_no only through dept_no, so department can be considered a separate entity.

Employee (2NF)
emp_no  name           dept_no  dept_name
1       Kevin Jacobs   201      R&D
2       Barbara Jones  224      IT
3       Jake Rivera    201      R&D
3NF
Employee (2NF)
emp_no  name           dept_no  dept_name
1       Kevin Jacobs   201      R&D
2       Barbara Jones  224      IT
3       Jake Rivera    201      R&D

Employee (3NF)
emp_no  name           dept_no
1       Kevin Jacobs   201
2       Barbara Jones  224
3       Jake Rivera    201

Department (3NF)
dept_no  dept_name
201      R&D
224      IT
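With the department moved to its own table, the rename from IT to IS used earlier as the update-anomaly example touches exactly one row. A sketch with Python's sqlite3 module (SQLite and its column types are assumptions; the rows are from the 3NF tables above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (
        emp_no  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_no INTEGER REFERENCES department (dept_no)
    );
    INSERT INTO department VALUES (201, 'R&D'), (224, 'IT');
    INSERT INTO employee VALUES
        (1, 'Kevin Jacobs', 201), (2, 'Barbara Jones', 224), (3, 'Jake Rivera', 201);
""")

# The department name is stored once, so the rename is a single-row update.
rows_updated = conn.execute(
    "UPDATE department SET dept_name = 'IS' WHERE dept_no = 224"
).rowcount

# Every employee of department 224 sees the new name through the join.
barbara = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employee e
    JOIN department d ON d.dept_no = e.dept_no
    WHERE e.emp_no = 2
""").fetchone()

print(rows_updated, barbara)
```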
Other Normal Forms
Boyce-Codd Normal Form (BCNF)
• Strengthens 3NF by requiring the determinant of every nontrivial functional dependency to be a superkey (a column or columns that uniquely identify a row).

Fourth Normal Form (4NF)
• Eliminate nontrivial multivalued dependencies that are not determined by a superkey.

Fifth Normal Form (5NF)
• Eliminate join dependencies that are not implied by the candidate keys.
Normalizing our webstore (1NF)
customers
cust_id  name          address         credit_card_num  credit_card_type
45       Mike Speedy   123 A St.       45154            visa
45       Mike Speedy   123 A St.       32499            mastercard
45       Mike Speedy   123 A St.       12834            discover
78       Frank Newmon  2 Main St.      45698            visa
102      Joe Powers    343 Blue Blvd.  94065            mastercard
102      Joe Powers    343 Blue Blvd.  10532            discover

items
item_id  name          price  inventory
34       sweater red   50     21
35       sweater blue  50     10
56       t-shirt       25     76
72       jeans         75     5
81       jacket        175    9

orders
order_id  cust_id  item_id  quantity  cost  date     status
405       45       34       2         100   2/3/06   shipped
405       45       35       1         50    2/3/06   shipped
405       45       56       3         75    2/3/06   shipped
408       78       56       2         50    3/5/06   refunded
410       102      72       2         150   3/10/06  shipped
410       102      81       1         175   3/10/06  shipped
Normalizing our webstore (2NF & 3NF)
customers
cust_id  name          address
45       Mike Speedy   123 A St.
78       Frank Newmon  2 Main St.
102      Joe Powers    343 Blue Blvd.

credit_cards
cust_id  num    type
45       45154  visa
45       32499  mastercard
45       12834  discover
78       45698  visa
102      94065  mastercard
102      10532  discover
Normalizing our webstore (2NF & 3NF)
order details
order_id  item_id  quantity  cost
405       34       2         100
405       35       1         50
405       56       3         75
408       56       2         50
410       72       2         150
410       81       1         175

items
item_id  name          price  inventory
34       sweater red   50     21
35       sweater blue  50     10
56       t-shirt       25     76
72       jeans         75     5
81       jacket        175    9

orders
order_id  cust_id  date     status
405       45       2/3/06   shipped
408       78       3/5/06   refunded
410       102      3/10/06  shipped
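Once the webstore tables are normalized, a per-order summary is assembled by joining them again. The sketch below uses Python's sqlite3 module and the rows from the slides; SQLite is an assumption, and the items table and the orders date column are left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cust_id  INTEGER REFERENCES customers (cust_id),
        status   TEXT
    );
    CREATE TABLE order_details (
        order_id INTEGER REFERENCES orders (order_id),
        item_id  INTEGER,
        quantity INTEGER,
        cost     INTEGER,
        PRIMARY KEY (order_id, item_id)
    );
    INSERT INTO customers VALUES
        (45, 'Mike Speedy', '123 A St.'), (78, 'Frank Newmon', '2 Main St.'),
        (102, 'Joe Powers', '343 Blue Blvd.');
    INSERT INTO orders VALUES
        (405, 45, 'shipped'), (408, 78, 'refunded'), (410, 102, 'shipped');
    INSERT INTO order_details VALUES
        (405, 34, 2, 100), (405, 35, 1, 50), (405, 56, 3, 75),
        (408, 56, 2, 50), (410, 72, 2, 150), (410, 81, 1, 175);
""")

# One row per order: who placed it, its status, and the summed cost of its lines.
order_totals = conn.execute("""
    SELECT o.order_id, c.name, o.status, SUM(d.cost) AS total
    FROM orders o
    JOIN customers c     ON c.cust_id  = o.cust_id
    JOIN order_details d ON d.order_id = o.order_id
    GROUP BY o.order_id, c.name, o.status
    ORDER BY o.order_id
""").fetchall()

for row in order_totals:
    print(row)
```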
Revisit webstore ER diagram
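[Diagram: revised webstore ER model. Customers (name, address) have Credit cards (card number, card type), one-to-many. Customers purchase Orders (date, status), one-to-many. The many-to-many "consists" relationship between Orders and Items (name, price, inventory) is carried by Order details (quantity, cost).]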
Structured Query Language
SQL is the standard language for data definition and data manipulation for relational database systems.
• Nonprocedural
• Universal
Data Definition Language
The aspect of SQL that defines and manipulates objects in a database.
• create tables
• alter tables
• drop tables
• create views
Create Table
create table customer (
  cust_id number,
  name varchar(50) not null,
  address varchar(256) not null,
  primary key (cust_id)
);

-- credit_card_type is varchar(10) so that values such as 'mastercard' fit
create table credit_card (
  cust_id number not null,
  credit_card_type varchar(10) not null,
  credit_card_num number not null,
  foreign key (cust_id) references customer
);

[Diagram: Customer (name, address) has Credit card (card number, card type), one-to-many (1:N)]
Modifying Tables
alter table customer modify name varchar(256);
alter table customer add credit_limit number;
drop table customer;
Data Manipulation Language
The aspect of SQL used to manipulate the data in a database.
• queries
• updates
• inserts
• deletes
Select command
Used to query data from database tables.
• Format:
Select <columns> From <table> Where <condition>;
Query example
select name from customers;

customers
cust_id  name          address
45       Mike Speedy   123 A St.
78       Frank Newmon  2 Main St.
102      Joe Powers    343 Blue Blvd.

result:
Mike Speedy
Frank Newmon
Joe Powers
Query example
select name from customers
where address = '123 A St.';

customers
cust_id  name          address
45       Mike Speedy   123 A St.
78       Frank Newmon  2 Main St.
102      Joe Powers    343 Blue Blvd.

result:
Mike Speedy
Query example
select * from customers, credit_cards
where customers.cust_id = credit_cards.cust_id
and type = 'visa';

customers
cust_id  name          address
45       Mike Speedy   123 A St.
78       Frank Newmon  2 Main St.
102      Joe Powers    343 Blue Blvd.

credit_cards
cust_id  num    type
45       45154  visa
45       32499  mastercard
45       12834  discover
78       45698  visa
102      94065  mastercard
102      10532  discover

returns:
Cust_id  Name          Address     Cust_id  Num    type
45       Mike Speedy   123 A St.   45       45154  visa
78       Frank Newmon  2 Main St.  78       45698  visa
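The two-table query above can also be written with an explicit JOIN ... ON clause, which keeps the join condition separate from the filter. A runnable sketch with Python's sqlite3 module (SQLite is an assumption; the tables and rows are the ones on the slide):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE credit_cards (
        cust_id INTEGER REFERENCES customers (cust_id),
        num     INTEGER,
        type    TEXT
    );
    INSERT INTO customers VALUES
        (45, 'Mike Speedy', '123 A St.'), (78, 'Frank Newmon', '2 Main St.'),
        (102, 'Joe Powers', '343 Blue Blvd.');
    INSERT INTO credit_cards VALUES
        (45, 45154, 'visa'), (45, 32499, 'mastercard'), (45, 12834, 'discover'),
        (78, 45698, 'visa'), (102, 94065, 'mastercard'), (102, 10532, 'discover');
""")

# The ON clause matches each card to its owner; the WHERE clause keeps only visa cards.
visa_rows = conn.execute("""
    SELECT *
    FROM customers
    JOIN credit_cards ON customers.cust_id = credit_cards.cust_id
    WHERE credit_cards.type = 'visa'
    ORDER BY customers.cust_id
""").fetchall()

for row in visa_rows:
    print(row)
```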
Changing Data
There are 3 commands that change data in a table.
Insert:
insert into <table> (<columns>) values (<values>);
insert into customer (cust_id, name) values (3, 'Fred Flintstone');

Update:
update <table> set <column> = <value> where <condition>;
update customer set name = 'Mark Speedy' where cust_id = 45;

Delete:
delete from <table> where <condition>;
delete from customer where cust_id = 45;
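From application code, the three commands are normally issued with bound parameters rather than values pasted into the SQL string. A sketch with Python's sqlite3 module; SQLite and its `?` placeholders are assumptions about the environment, and the customer rows are taken from the earlier webstore tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        address TEXT NOT NULL
    )
""")

with conn:  # commits on success, rolls back if a statement raises
    conn.executemany(
        "INSERT INTO customer (cust_id, name, address) VALUES (?, ?, ?)",
        [(45, "Mike Speedy", "123 A St."), (78, "Frank Newmon", "2 Main St.")],
    )
    conn.execute("UPDATE customer SET name = ? WHERE cust_id = ?", ("Mark Speedy", 45))

renamed = conn.execute("SELECT name FROM customer WHERE cust_id = 45").fetchone()[0]

with conn:
    # rowcount reports how many rows the delete removed.
    deleted = conn.execute("DELETE FROM customer WHERE cust_id = ?", (45,)).rowcount

remaining = [row[0] for row in conn.execute("SELECT cust_id FROM customer ORDER BY cust_id")]
print(renamed, deleted, remaining)
```

Omitting the where clause on an update or delete applies the change to every row in the table.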
Star Schemas
Designed for data retrieval
• Best for use in decision support tasks such as Data Warehouses and Data Marts.
• Denormalized - allows for faster querying due to fewer joins.
• Slow performance for insert, delete, and update transactions.
• Composed of two types of tables: facts and dimensions.
Fact Table
The main table in a star schema is the Fact table.
• Contains groupings of measures of an event to be analyzed.
•Measure - numeric data

Invoice Facts: units sold, unit amount, total sale price
Dimension Table
Dimension tables are groupings of descriptors and measures of the fact.
•descriptor - non-numeric data
Customer Dimension: cust_dim_key, name, address, phone
Time Dimension: time_dim_key, invoice date, due date, delivered date
Location Dimension: loc_dim_key, store number, store address, store phone
Product Dimension: prod_dim_key, product, price, cost
Star Schema
The fact table forms a one to many relationship with each dimension table.
[Diagram: Invoice Facts (cust_dim_key, loc_dim_key, time_dim_key, prod_dim_key, units sold, unit amount, total sale price) at the center, with each of the Customer, Time, Location, and Product dimensions (1) linked to the fact table (N)]
Analyzing the webstore
The manager needs to analyze the orders obtained from the webstore.
• From this we will use the order table to create our fact table.

Order Facts: date, items, customers
Webstore Dimension
We have 2 dimensions for the schema: customers and items.

Item Dimension: item_dim_key, name, price, inventory
Customer Dimension: cust_dim_key, name, address, credit_card_type
Webstore Star Schema
[Diagram: Order Facts (date, items, customers) at the center, with the Item Dimension (item_dim_key, name, price, inventory) and the Customer Dimension (cust_dim_key, name, address, credit_card_type) each linked 1 to N to the fact table]
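A typical star-schema query joins the fact table to one dimension and aggregates a measure. The sketch below uses Python's sqlite3 module with the webstore rows from earlier slides. Several things are assumptions made for illustration: SQLite itself, the table names item_dim, customer_dim, and order_facts, the choice of quantity and cost as the measures, and the omission of the date and credit_card_type columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_dim (
        item_dim_key INTEGER PRIMARY KEY, name TEXT, price INTEGER, inventory INTEGER
    );
    CREATE TABLE customer_dim (
        cust_dim_key INTEGER PRIMARY KEY, name TEXT, address TEXT
    );
    -- Fact table: one row per order line, keyed by its dimensions (assumed layout).
    CREATE TABLE order_facts (
        item_dim_key INTEGER REFERENCES item_dim (item_dim_key),
        cust_dim_key INTEGER REFERENCES customer_dim (cust_dim_key),
        quantity     INTEGER,
        cost         INTEGER
    );
    INSERT INTO item_dim VALUES
        (34, 'sweater red', 50, 21), (35, 'sweater blue', 50, 10),
        (56, 't-shirt', 25, 76), (72, 'jeans', 75, 5), (81, 'jacket', 175, 9);
    INSERT INTO customer_dim VALUES
        (45, 'Mike Speedy', '123 A St.'), (78, 'Frank Newmon', '2 Main St.'),
        (102, 'Joe Powers', '343 Blue Blvd.');
    INSERT INTO order_facts VALUES
        (34, 45, 2, 100), (35, 45, 1, 50), (56, 45, 3, 75),
        (56, 78, 2, 50), (72, 102, 2, 150), (81, 102, 1, 175);
""")

# Sales per item: one join from the fact table to the item dimension.
sales_by_item = conn.execute("""
    SELECT i.name, SUM(f.cost) AS sales
    FROM order_facts f
    JOIN item_dim i ON i.item_dim_key = f.item_dim_key
    GROUP BY i.item_dim_key, i.name
    ORDER BY i.item_dim_key
""").fetchall()

# Sales per customer: the same fact table, a different dimension.
sales_by_customer = conn.execute("""
    SELECT c.name, SUM(f.cost) AS sales
    FROM order_facts f
    JOIN customer_dim c ON c.cust_dim_key = f.cust_dim_key
    GROUP BY c.cust_dim_key, c.name
    ORDER BY c.cust_dim_key
""").fetchall()

print(sales_by_item)
print(sales_by_customer)
```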
Books and Reference
•Database Design for Mere Mortals, Michael J. Hernandez
•Information Modeling and Relational Databases, Terry Halpin
•Database Modeling and Design, Toby J. Teorey
Continuing Education
UCSD Extension
Data Management Courses
DBA Certificate Program
Database Application Developer Certificate Program
Data Central
The Data Services Group provides Data Allocations for the research community.
• http://datacentral.sdsc.edu/
•Tools and expertise for making data collections available to the broader scientific community.
•Provide disk, tape, and database storage resources.