introduction to database design donghui zhang ccis, northeastern university


Post on 14-Dec-2015

TRANSCRIPT

Page 1: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Introduction to Database Design

Donghui Zhang

CCIS, Northeastern University

Page 2: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Outline

• Database and DBMS
• Architecture of Database Applications
• Database Design
• Database Application Programming

Page 3: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Database, DBMS

• A Database is a very large, integrated collection of data.
• A Database Management System (DBMS) is software designed to store and manage databases.
• A Database Application is software that enables users to access the database.

Page 4: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Why DBMS?

• We currently live in a world experiencing an information explosion.
• To manage the huge amount of data: DBMS.
• The total RDBMS market in 2003 was $7 billion in license revenues.
• Much more money was spent on developing database applications.

Page 5: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

[Figure: bar chart "RDBMS New License Revenue", in millions of dollars (axis 0 to 3000), comparing 2002 and 2003 for IBM, Oracle, Microsoft, NCR Teradata, and Others.]

Total revenue: 7.1 billion in 2003.

Page 6: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

• The worldwide database management software market saw double-digit growth in 2004.
• The five-year forecast calls for a compound annual growth rate of nearly 6 percent, bringing the market to $12.7 billion in new license revenue by 2009.

Title: Forecast: Database Management Systems Software, Worldwide, 2003-2009
Author: Colleen Graham, Gartner
Time: April 21, 2005

Page 7: Introduction to Database Design Donghui Zhang CCIS, Northeastern University
Page 8: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

DBMS can Provide …

• Data independence and efficient access.
• Reduced application development time.
• Data integrity and security.
• Uniform data administration.
• Concurrent access, recovery from crashes.

Page 9: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

DBMS Historic Points

• The first DBMS was developed by Turing Award winner Charles Bachman in the early 1960s.
• In 1970, Turing Award winner Edgar Codd proposed the relational data model.
• In the early 1970s, IBM developed SQL (originally called SEQUEL); it became an ANSI standard in 1986.

Page 10: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Outline

• Database and DBMS
• Architecture of Database Applications
• Database Design
• Database Application Programming

Page 11: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Components of Data-Intensive Systems

Three separate types of functionality:
• Data management
• Application logic
• Presentation

Page 12: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Example: Course Enrollment

Build a system that students can use to enroll in courses:

• Data Management
  • Student info, course info, instructor info, course availability, pre-requisites, etc.
• Application Logic
  • Logic to add a course, drop a course, create a new course, etc.
• Presentation
  • Log in different users (students, staff, faculty), display forms and human-readable output
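As a rough illustration of how the three kinds of functionality can be kept apart, here is a minimal Python sketch. Everything in it is an assumption made for the example: the in-memory dictionary stands in for a real database, and the names `ENROLLMENTS`, `add_course`, `drop_course`, `render_schedule` and the course-load limit are hypothetical.

```python
# Hypothetical sketch: each kind of functionality lives in its own functions.

# Data management: an in-memory dict stands in for the database (assumption).
ENROLLMENTS = {}  # student login -> set of course ids


def load_courses(student):
    """Return the (mutable) set of courses stored for a student."""
    return ENROLLMENTS.setdefault(student, set())


# Application logic: rules for adding and dropping a course.
def add_course(student, cid, max_courses=4):
    courses = load_courses(student)
    if len(courses) >= max_courses:
        return False  # refuse: the assumed course-load limit is reached
    courses.add(cid)
    return True


def drop_course(student, cid):
    load_courses(student).discard(cid)


# Presentation: turn stored data into human-readable output.
def render_schedule(student):
    courses = sorted(load_courses(student))
    return f"{student}: " + (", ".join(courses) if courses else "(no courses)")


add_course("alice", "CSU430")
add_course("alice", "CSG339")
drop_course("alice", "CSG339")
print(render_schedule("alice"))
```

Because the presentation function only calls `load_courses`, the storage could later be swapped for a real DBMS without touching the other two parts.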

Page 13: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

The Three-Tier Architecture

• Presentation tier: Client Program (Web Browser)
• Middle tier: Application Server
• Data management tier: Database System

Page 14: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

E.g. What we use

• Presentation tier: Client Program (Web Browser)
• Middle tier: Application Server (Apache, JSP)
• Data management tier: Database System (MySQL)

Page 15: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

HTML: An Example

    <HTML>
      <HEAD></HEAD>
      <BODY>
        <h1>Barns and Nobble Internet Bookstore</h1>
        Our inventory:

        <h3>Science</h3>
        <b>The Character of Physical Law</b>
        <UL>
          <LI>Author: Richard Feynman</LI>
          <LI>Published 1980</LI>
          <LI>Hardcover</LI>
        </UL>

        <h3>Fiction</h3>
        <b>Waiting for the Mahatma</b>
        <UL>
          <LI>Author: R.K. Narayan</LI>
          <LI>Published 1981</LI>
        </UL>
        <b>The English Teacher</b>
        <UL>
          <LI>Author: R.K. Narayan</LI>
          <LI>Published 1980</LI>
          <LI>Paperback</LI>
        </UL>
      </BODY>
    </HTML>

Page 16: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

HTML: static vs dynamic

• Static: you create an HTML file which is sent to the client's web browser upon request. E.g.:
  • your CCIS login is 'donghui',
  • your HTML file is /home/donghui/.www/index.html
  • the URL is http://www.ccs.neu.edu/home/donghui
• Dynamic: the HTML is generated at request time by your server-side code (e.g., JSP or ASP.NET).
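To make the static/dynamic distinction concrete, the sketch below builds a page as a string at request time instead of reading it from a file. It is written in Python rather than a server-page language purely for illustration; the `BOOKS` list and the `render_inventory` function are hypothetical, and in a real application the data would come from the database.

```python
from html import escape

# Hypothetical inventory; a real application would query the database instead.
BOOKS = [
    {"title": "The Character of Physical Law", "author": "Richard Feynman"},
    {"title": "Waiting for the Mahatma", "author": "R.K. Narayan"},
]


def render_inventory(books):
    """Build the HTML page as a string each time a request arrives."""
    items = "\n".join(
        f"<b>{escape(b['title'])}</b>\n"
        f"<UL><LI>Author: {escape(b['author'])}</LI></UL>"
        for b in books
    )
    return f"<HTML><BODY>\n<h1>Our inventory</h1>\n{items}\n</BODY></HTML>"


page = render_inventory(BOOKS)
print(page)
```

Escaping the values with `html.escape` keeps characters such as `<` in a title from being interpreted as markup.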

Page 17: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Another View

[Diagram: client browsers 1, 2 and 3 on the client machines connect to Machine 2, which runs Apache and your JSP code; Machine 2 connects to Machine 1, which runs MySQL and holds your database.]

Page 18: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Client-Server Architecture

• Data Management: DBMS @ Server.
• Presentation: Client program.
• Application Logic: can go either way.
  • If combined with server: thin-client architecture
  • If combined with client: thick-client architecture

[Diagram: Server and Client]

Page 19: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Thin-Client Architecture

• Database server and web server are too closely coupled.
  • E.g., the application logic cannot access multiple databases on different servers.

[Diagram: one Server connected to three Clients]

Page 20: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Thick-Client Architecture

• No central place to update the business logic
• Security issues: Server needs to trust clients
• Does not scale to more than several 100s of clients

[Diagram: one Server connected to three Clients]

Page 21: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Advantages of the Three-Tier Architecture

• Heterogeneous systems
  • Tiers can be independently maintained, modified, and replaced
• Thin clients
  • Only presentation layer at clients (web browsers)
• Integrated data access
  • Several database systems can be handled transparently at the middle tier
  • Central management of connections
• Scalability
  • Replication at middle tier permits scalability of business logic
• Software development
  • Code for business logic is centralized
  • Interaction between tiers through well-defined APIs: can reuse standard components at each tier

Page 22: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Outline

• Database and DBMS
• Architecture of Database Applications
• Database Design
• Database Application Programming

Page 23: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

ER-Model

• Entity: Real-world object distinguishable from other objects. E.g. Students, Courses.
• An entity has multiple attributes. E.g. Students have ssn, name, phone.
• Entities have relationships with each other. E.g. Students enroll in Courses.

Page 24: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Example of ER Diagram

[ER diagram: entity Students (ssn, name, phone) and entity Courses (cid, title, unit), connected by the relationship Enroll, which has the attribute time.]

To implement the above design, store three tables in the database.

Page 25: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Students

ssn    name     phone
1111   John     617-373-5120
2222   Alice    781-322-6084
3333   Victor   617-442-7798

Courses

cid      title                    unit
CSU430   Database Design          4
CSG131   Transaction Processing   4
CSG339   Data Mining              4

Enroll

ssn    cid      time
1111   CSU430   Fall'03
1111   CSG339   Spring'04
2222   CSG131   Winter'03
2222   CSG339   Spring'04
3333   CSU430   Winter'01
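The three tables above could be created and queried roughly as follows. This is a minimal sketch that assumes Python's built-in `sqlite3` module as a stand-in for the DBMS; the column types and the composite primary key on Enroll are assumptions, since the slides give only column names and sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE Students (ssn TEXT PRIMARY KEY, name TEXT, phone TEXT);
CREATE TABLE Courses  (cid TEXT PRIMARY KEY, title TEXT, unit INTEGER);
-- The relationship becomes its own table, keyed by both entities (assumed key).
CREATE TABLE Enroll (
    ssn  TEXT REFERENCES Students(ssn),
    cid  TEXT REFERENCES Courses(cid),
    time TEXT,
    PRIMARY KEY (ssn, cid)
);
""")

conn.executemany("INSERT INTO Students VALUES (?, ?, ?)", [
    ("1111", "John", "617-373-5120"),
    ("2222", "Alice", "781-322-6084"),
    ("3333", "Victor", "617-442-7798"),
])
conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)", [
    ("CSU430", "Database Design", 4),
    ("CSG131", "Transaction Processing", 4),
    ("CSG339", "Data Mining", 4),
])
conn.executemany("INSERT INTO Enroll VALUES (?, ?, ?)", [
    ("1111", "CSU430", "Fall'03"),
    ("1111", "CSG339", "Spring'04"),
    ("2222", "CSG131", "Winter'03"),
    ("2222", "CSG339", "Spring'04"),
    ("3333", "CSU430", "Winter'01"),
])

# Which courses did John take, and when?
john_courses = conn.execute("""
    SELECT C.title, E.time
    FROM Enroll E
    JOIN Students S ON S.ssn = E.ssn
    JOIN Courses  C ON C.cid = E.cid
    WHERE S.name = 'John'
    ORDER BY C.cid
""").fetchall()
print(john_courses)
```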

Page 26: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Key Constraint in ER Diagram

[ER diagram: entity Students (ssn, name, phone) and entity Departments (did, dname, address), connected by the many-to-one relationship BelongsTo.]

Many-to-one relationship: no need to be implemented as a table!

Page 27: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Students

ssn    name     phone          did
1111   John     617-373-5120   1
2222   Alice    781-322-6084   1
3333   Victor   617-442-7798   3

Departments

did   dname                    address
1     Computer Science         #161 Cullinane
2     Electrical Engineering   #300 Egan
3     Physics                  #112 Richard
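Since each student belongs to at most one department, the relationship can be stored as a `did` column in Students rather than as a separate table. The sketch below shows one way to declare that, again assuming Python's `sqlite3` as a stand-in DBMS; the column types are assumptions, and the fourth student ("Eve") is a hypothetical row used only to show the foreign key being enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname TEXT, address TEXT);
-- BelongsTo is folded into Students as the did column: one value per student.
CREATE TABLE Students (
    ssn   TEXT PRIMARY KEY,
    name  TEXT,
    phone TEXT,
    did   INTEGER REFERENCES Departments(did)
);
""")
conn.executemany("INSERT INTO Departments VALUES (?, ?, ?)", [
    (1, "Computer Science", "#161 Cullinane"),
    (2, "Electrical Engineering", "#300 Egan"),
    (3, "Physics", "#112 Richard"),
])
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?)", [
    ("1111", "John", "617-373-5120", 1),
    ("2222", "Alice", "781-322-6084", 1),
    ("3333", "Victor", "617-442-7798", 3),
])

# A student pointing at a department that does not exist is rejected.
try:
    conn.execute("INSERT INTO Students VALUES ('4444', 'Eve', '000-000-0000', 9)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

per_department = conn.execute(
    "SELECT did, COUNT(*) FROM Students GROUP BY did ORDER BY did"
).fetchall()
print(rejected, per_department)
```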

Page 28: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Some Other Design Concepts

• Primary key
• Participation constraint
• Normal forms (BCNF, 3NF, etc.)
• IS-A hierarchy
• Ternary relationships

Page 29: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Outline

• Database and DBMS
• Architecture of Database Applications
• Database Design
• Database Application Programming

Page 30: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

SQL Query

Find the students in the Computer Science Department.

• If we know the did is 1:

    SELECT S.name
    FROM Students S
    WHERE S.did = 1

• Otherwise:

    SELECT S.name
    FROM Students S, Departments D
    WHERE D.did = S.did AND D.dname = 'Computer Science'
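Both queries can be tried against the sample Students and Departments rows. The sketch below assumes Python's `sqlite3` as a stand-in DBMS and adds an `ORDER BY` (not part of the slide's queries) only so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname TEXT, address TEXT);
CREATE TABLE Students (ssn TEXT PRIMARY KEY, name TEXT, phone TEXT, did INTEGER);
INSERT INTO Departments VALUES (1, 'Computer Science', '#161 Cullinane');
INSERT INTO Departments VALUES (2, 'Electrical Engineering', '#300 Egan');
INSERT INTO Departments VALUES (3, 'Physics', '#112 Richard');
INSERT INTO Students VALUES ('1111', 'John', '617-373-5120', 1);
INSERT INTO Students VALUES ('2222', 'Alice', '781-322-6084', 1);
INSERT INTO Students VALUES ('3333', 'Victor', '617-442-7798', 3);
""")

# Case 1: the department id is already known.
by_id = conn.execute(
    "SELECT S.name FROM Students S WHERE S.did = 1 ORDER BY S.name"
).fetchall()

# Case 2: only the department name is known, so join with Departments.
by_name = conn.execute("""
    SELECT S.name
    FROM Students S, Departments D
    WHERE D.did = S.did AND D.dname = 'Computer Science'
    ORDER BY S.name
""").fetchall()

print(by_id, by_name)
```

On this data the two queries return the same students, which is the point of the slide: the join is only needed to translate the department name into its did.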

Page 31: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

SQL in Application Code

• SQL commands can be called from within a host language (e.g., C++, Java) program.
• Two main integration approaches:
  • Embed SQL in the host language (Embedded SQL, SQLJ)
  • Create special API to call SQL commands (JDBC)
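The API approach can be illustrated with Python's `sqlite3` module, used here as an analogue of JDBC rather than JDBC itself: the host program passes SQL text plus parameters to a library call and gets rows back as ordinary values. The function name `students_in_department` and the table contents are assumptions for the example.

```python
import sqlite3


def students_in_department(conn, dname):
    """Call SQL through the database API; '?' is a placeholder bound at run time."""
    cursor = conn.execute(
        "SELECT S.name FROM Students S, Departments D "
        "WHERE D.did = S.did AND D.dname = ? ORDER BY S.name",
        (dname,),
    )
    return [name for (name,) in cursor]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname TEXT);
CREATE TABLE Students (ssn TEXT PRIMARY KEY, name TEXT, did INTEGER);
INSERT INTO Departments VALUES (1, 'Computer Science'), (3, 'Physics');
INSERT INTO Students VALUES ('1111', 'John', 1), ('2222', 'Alice', 1), ('3333', 'Victor', 3);
""")

print(students_in_department(conn, "Computer Science"))
print(students_in_department(conn, "Physics"))
```

Binding the department name as a parameter, instead of pasting it into the SQL string, keeps user input from being interpreted as SQL.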

Page 32: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Implementation of Database System

Introduction

Donghui Zhang

Partially using Prof. Hector Garcia-Molina's slides (Notes01)
http://www-db.stanford.edu/~ullman/dscb.html

Page 33: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Isn't Implementing a Database System Simple?

[Diagram labels: Relations, Statements, Results]

Page 34: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Introducing the Megatron 3000 Database Management System

• The latest from Megatron Labs
• Incorporates latest relational technology
• UNIX compatible

Page 35: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Megatron 3000 Implementation Details

• Relations stored in files (ASCII)

e.g., relation R is in /usr/db/R

    Smith # 123 # CS
    Jones # 522 # EE
    .
    .
    .

Page 36: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Megatron 3000 Implementation Details

• Directory file (ASCII) in /usr/db/directory

    R1 # A # INT # B # STR …
    R2 # C # STR # A # INT …
    .
    .
    .

Page 37: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Megatron 3000 Sample Sessions

    % MEGATRON3000
        Welcome to MEGATRON 3000!
    &
    .
    .
    .
    & quit
    %

Page 38: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Megatron 3000 Sample Sessions

    & select * from R #

    Relation R
        A       B     C
        SMITH   123   CS

    &

Page 39: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Megatron 3000 Sample Sessions

    & select A,B from R,S where R.A = S.A and S.C > 100 #

        A     B
        123   CAR
        522   CAT

    &

Page 40: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Megatron 3000

To execute "select * from R where condition":

(1) Read directory file to get R attributes
(2) Read R file, for each line:
    (a) Check condition
    (b) If OK, display
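The steps above can be sketched in a few lines of Python. This is only an illustration of the file-scan idea, not the actual Megatron 3000 code: the function name `select_where`, the temporary directory that stands in for /usr/db, and the attribute types given to R in the directory file are all assumptions.

```python
import os
import tempfile


def select_where(db_dir, relation, condition):
    """Evaluate 'select * from <relation> where <condition>' by a full file scan."""
    # (1) Read the directory file to get the relation's attribute names.
    attrs = None
    with open(os.path.join(db_dir, "directory")) as f:
        for line in f:
            fields = [x.strip() for x in line.split("#")]
            if fields[0] == relation:
                attrs = fields[1::2]  # name, type, name, type, ... -> keep names
    if attrs is None:
        raise KeyError(relation)
    # (2) Read the relation file; for each line, check the condition.
    result = []
    with open(os.path.join(db_dir, relation)) as f:
        for line in f:
            row = dict(zip(attrs, [x.strip() for x in line.split("#")]))
            if condition(row):      # (a) check condition
                result.append(row)  # (b) if OK, keep the row for display
    return result


# Build a tiny database in a temporary directory (stand-in for /usr/db).
db_dir = tempfile.mkdtemp()
with open(os.path.join(db_dir, "directory"), "w") as f:
    f.write("R # A # STR # B # INT # C # STR\n")
with open(os.path.join(db_dir, "R"), "w") as f:
    f.write("Smith # 123 # CS\nJones # 522 # EE\n")

rows = select_where(db_dir, "R", lambda row: row["C"] == "CS")
print(rows)
```

Every query reads the whole relation file, however few rows match.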

Page 41: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

Megatron 3000

To execute "select A,B from R,S where condition":

(1) Read directory file to get R,S attributes
(2) Read R file, for each line:
    (a) Read S file, for each line:
        (i) Create join tuple
        (ii) Check condition
        (iii) Display if OK
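This procedure is a nested-loop join. A minimal Python sketch follows, using in-memory lists in place of the R and S files; the function name and the sample rows are hypothetical, chosen so that the result matches the shape of the sample session (rows 123 CAR and 522 CAT).

```python
def nested_loop_join(r_rows, s_rows, condition):
    """Pair every R row with every S row and keep the pairs that pass the condition."""
    out = []
    for r in r_rows:            # (2) for each line of R
        for s in s_rows:        # (a) for each line of S (S is re-read per R line)
            joined = {}         # (i) create the join tuple
            joined.update({f"R.{k}": v for k, v in r.items()})
            joined.update({f"S.{k}": v for k, v in s.items()})
            if condition(joined):   # (ii) check condition
                out.append(joined)  # (iii) keep for display if OK
    return out


# Hypothetical sample data.
R = [{"A": 123, "B": "CAR"}, {"A": 522, "B": "CAT"}, {"A": 777, "B": "DOG"}]
S = [{"A": 123, "C": 150}, {"A": 522, "C": 200}, {"A": 777, "C": 50}]

matches = nested_loop_join(
    R, S, lambda t: t["R.A"] == t["S.A"] and t["S.C"] > 100
)
answer = [(t["R.A"], t["R.B"]) for t in matches]  # project onto A, B
print(answer)
```

With |R| and |S| rows, the inner body runs |R| × |S| times, regardless of how selective the condition is.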

Page 42: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

What's wrong with the Megatron 3000 DBMS?

• Expensive update and search, e.g.,
  - To locate an employee with a given SSN: file scan.
  - To change "Cat" to "Cats": complete file write.
• Solution: Indexing!
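A small Python sketch of why an index helps with the SSN lookup. A dictionary plays the role of the index here; a real DBMS would typically use a disk-based structure such as a B+-tree, so this is a simplification. The employee records are generated, hypothetical data.

```python
def scan_lookup(rows, ssn):
    """File-scan style search: examine rows one by one until the SSN is found."""
    examined = 0
    for row in rows:
        examined += 1
        if row["ssn"] == ssn:
            return row, examined
    return None, examined


def build_index(rows):
    """Index on ssn: maps each key directly to its row (built once, reused)."""
    return {row["ssn"]: row for row in rows}


# Hypothetical data: 10,000 employees with SSNs "0000" .. "9999".
employees = [{"ssn": f"{i:04d}", "name": f"emp{i}"} for i in range(10000)]
index = build_index(employees)

row_by_scan, examined = scan_lookup(employees, "9999")
row_by_index = index["9999"]  # one lookup, no scan

print(examined, row_by_scan is row_by_index)
```

The scan touches every row to find the last employee, while the index goes straight to it; the price is the extra work of keeping the index up to date on every insert and update.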

Page 43: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

What's wrong with the Megatron 3000 DBMS?

• Brute force query processing, e.g.,

    select *
    from R,S
    where R.A = S.A and S.B > 1000

  - Do select first?
  - More efficient join?
• Solution: Query optimization!
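The two questions on this slide can be illustrated by comparing the brute-force plan with one that applies the selection on S first and then joins through a hash table. This is a sketch of the idea only, not how any particular optimizer works; the data and the `work` counters (number of rows or row pairs examined) are assumptions made to have something to compare.

```python
def brute_force(R, S):
    """Nested loops over all pairs, checking the whole condition on each pair."""
    out, work = [], 0
    for r in R:
        for s in S:
            work += 1
            if r["A"] == s["A"] and s["B"] > 1000:
                out.append((r, s))
    return out, work


def select_then_hash_join(R, S):
    """Do the select first, then join via a hash table keyed on S.A."""
    work = 0
    by_a = {}
    for s in S:                      # selection pushed below the join
        work += 1
        if s["B"] > 1000:
            by_a.setdefault(s["A"], []).append(s)
    out = []
    for r in R:                      # one hash probe per R row
        work += 1
        for s in by_a.get(r["A"], []):
            out.append((r, s))
    return out, work


# Hypothetical data: 100 rows each; S.B > 1000 holds for A = 51 .. 99.
R = [{"A": i} for i in range(100)]
S = [{"A": i, "B": i * 20} for i in range(100)]

slow, slow_work = brute_force(R, S)
fast, fast_work = select_then_hash_join(R, S)
print(len(slow), slow_work, fast_work)
```

Both plans return the same rows, but the second examines each input row once instead of every pair.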

Page 44: Introduction to Database Design Donghui Zhang CCIS, Northeastern University

What's wrong with the Megatron 3000 DBMS?

• No concurrency control or reliability, e.g.,
  - If two client programs read your bank balance ($5000) and add $1000 to it…
  - Crash.
• Solution: Transaction management!
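The bank-balance example can be replayed in Python. The first part simulates the interleaving by hand to show the lost update; the second part uses transactions in Python's `sqlite3` module (a stand-in for the DBMS), where a raised exception stands in for a crash in the middle of an update. The `Account` table and the `deposit` function are hypothetical, and a simulated exception is of course much milder than a real crash.

```python
import sqlite3

# Part 1: lost update without concurrency control (interleaving simulated by hand).
balance = 5000
read_by_client_1 = balance           # both clients read the same old value
read_by_client_2 = balance
balance = read_by_client_1 + 1000    # client 1 writes 6000
balance = read_by_client_2 + 1000    # client 2 overwrites it with 6000 again
lost_update_balance = balance        # one deposit has been lost

# Part 2: the same deposits as transactions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO Account VALUES (1, 5000)")
conn.commit()


def deposit(conn, amount, crash=False):
    """Add to the balance atomically; roll back if something fails midway."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE Account SET balance = balance + ? WHERE id = 1", (amount,)
            )
            if crash:
                raise RuntimeError("simulated crash before commit")
    except RuntimeError:
        pass  # the failed deposit left no trace in the database


deposit(conn, 1000)
deposit(conn, 1000)
deposit(conn, 1000, crash=True)  # rolled back
final_balance = conn.execute("SELECT balance FROM Account WHERE id = 1").fetchone()[0]
print(lost_update_balance, final_balance)
```

Doing the read and the write in a single `UPDATE` inside a transaction means neither deposit can overwrite the other, and the interrupted third deposit is undone rather than half-applied.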