1 cop 4710 databases fall, 2000 today’s topic review for final exam david a. gaitros november 6,...

64
1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr. Greg Riccardi Copyright by Dr. Greg Riccardi

Upload: roderick-richardson

Post on 13-Dec-2015

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

1

COP 4710 DatabasesFall, 2000

Today’s Topic

Review for Final Exam

David A. GaitrosNovember 6, 2000Department of Computer Science

Copyright by Dr. Greg RiccardiCopyright by Dr. Greg Riccardi

Page 2: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

2

Outline of Course Study of principals and techniques of

databases Grades assigned as in information sheet Examples of use of databases Programming projects in database design and

implementation– Programming in Microsoft Access– Programming in Java with a Unix database– Development of a web site with database support

Course notes in http://www.cs.fsu.edu/cop4710/lectures

Next class, Chapter 2

Page 3: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

3

Representation of Information

Data is collections of bits– physical database

Information is data with meaning– logical database

Representation of meta-data– database system is self-describing

Database Management System (DBMS)– define information content– construct database– manipulate by queries, reports and updates– data plus software

Page 4: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

4

Vocabulary Glossary of terms Define the terms as used in this subject

– Database literature is filled with terms Example of terms

– Data, bits– Information, bits with meaning (type)– Entity– Schema

Page 5: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

5

Data Modeling A data model is a specification of the

information content of a system– conceptual data model describes information in

terms the users will understand– logical data model describes information in a way

that can be used to build a database– physical data model describes information in terms

of its representation in physical storage

Page 6: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

6

Schemas and Instances Schema is the structure of a database

– intention or meaning of the data– data models are schemas– table definitions are schemas– class definitions are schemas

Instances are the contents of a database– extension or values of the data– objects are instances– objects in a database are typically rows in a table

Page 7: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

7

Levels of database schemas Different schemas are presented to different

users

External level

internal to logical mapping

logical to external mappings

disk

Internal Schema Internal level

External View 1 External View 2 External View 3

Logical Schema Logical level

Page 8: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

8

Database Languages

DDL, data definition language, conceptual schema– describe conceptual schemas

SDL, storage definition language, internal schema– describe file structures, indexes

VDL, view definition language, external schema DML, data manipulation language

– High-level or non-procedural (e.g. SQL)• Select Last Name from Roster where Section = 2

– Low-level or procedural• For r in Roster loop

if r.section = 2 thenresult.Add ( r.lastname );

Page 9: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

9

Principals of ER Modeling

Entities and classes– Entity, a thing in the real world– Entity Class, the structure of a collection of

similar entities Attributes

– Attribute, a property of an entity– Each entity has a value for each of its attributes

Types of attributes– simple vs. composite, single-valued vs. multi-

valued, stored vs. derived– domains of attributes

Page 10: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

10

Relationships Between Entities Relationship type defines a set of

associations among given types. Relationsip Instances are particular

relationships among objects. Examples of relationship types in company

database– Manages: 1:1 between employee and department– Works-for: 1:N between department and employee– Controls: 1:N between department and project

Page 11: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

11

Find the Entities, Attributes and Relationships

BigHit VideoRental Receipt

Account Id: 101 Videotape Id: 90987 date: January 9, 1999 cost: $2.99Jane Block1010 Main St.Apopka, FL 30458

Elizabeth date due: January 11, 1999

Page 12: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

12

ER schema diagram for BigHit Video

0/MRents Videotape0/1

Customer

lastName

address

customerIdfirstName

zipcodestatecitystreet

videotapeId title genre

dateRented costdateDue

otherUsers

numberRentals

Page 13: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

13

Chapter 4 The Relational Data Model A Relation is a two-dimensional table

– Fixed list of columns– One object per row

An attribute represents a single column of a table and has a name and a type

A relation schema is the name and the list of attributes of a relation– Grade (studentId, assignmentId, points,

dateSubmitted) A tuple is a row of a table, one value for each

attribute– (123, 14, 27, 5/28/98)

Page 14: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

Characteristics of Relational Model Relation is a set of tuples

– No ordering of tuples– No duplicate tuples

• no two rows have all the same values Each attribute value is atomic

– hence no multiple-valued or composite attributes– called first normal form

Each relation is a set of assertions– Each represents a fact– Some facts are about relationships

That’s it!– no other data structures– no explicit representation of relationships

Page 15: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

15

Representing E-R Model as Relations Entity class Relation schema Entity row of table

– set of all entities of class table Attribute column definition (attribute)

– attribute value table element Relationship type

– relation schema– attribute(s) of relation schema

Page 16: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

16

Rules for Relationship Types One-to-many

– For each one-to-many relationship type R between subject class S and target class T, add the key attributes of class S to class T as foreign keys. Name the attributes using the role that S plays in relationship type R.

– Add the attributes of the relationship type R to class T.

One-to-one– choose one side and use above rule

Examples in class

Page 17: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

17

Many-to-many relationship types Create a relation schema for the relationship

type– foreign key attributes for the key of the related

schema– add attributes of the relationship type

Examples in class!

Page 18: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

18

Representing relationships as attributes One-to-many

– For each one-to-many relationship type R• subject class S (one side) • target class T (many side),

– add the key attributes of S to the schema of T • as foreign keys.

– Name the foreign key attributes• ues the role that S plays in relationship type R.

– Add the attributes of the relationship type R to schema for T.

One-to-one– choose one side and use above rule

Page 19: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

19

Representing Weak Entity Classes Create a relation schema

– Add foreign key for each defining relationship type– Key is partial key plus defining foreign keys

Consider Fig. 2.5, weak class Rental Schema: Rental (videoId,dateDue,

dateRented, cost)– key

• videoId (foreign key)

Videotape1

Customer1

costdateDue dateRented

Has Rental1

MHas

IdentifyingRelationship

Type

Weak EntityClass

ParticipationConstraint

Page 20: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

20

Representing specialization hierarchies Three possibilities

– 1. Create a table for the superclass with its attributes and a table for each subclass with its attributes

– 2. Create a table for the superclass with all of the subclass attributes

– 3. Create a table for each subclass that includes both subclass and superclass attributes

Page 21: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

21

Functional Dependencies and Normalization Begin by discussing good and bad relation

schemas Informal measures of the quality of relation

schema design– Semantics of the attributes– Reducing the redundant values in tuples– Reducing the null values in tuples– Disallowing spurious tuples

Define Normal Forms as formal measures of the quality of schemas– restrictions on the form of relation schemas

Page 22: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

22

Update Anomalies Insertion Anomalies

– When inserting a new owner, we must correctly insert the Manuf field, or will create inconsistencies

– Cannot create a car without an owner– Cannot create a make without a car and an owner

Deletion Anomalies– Deletion of owner of a car also deletes make and

manufacturer of car– Deletion of owner of the last Plymouth deletes relationship

between Plymouth and Chrysler Modification Anomalies

– Changing the make of a car requires consistency check– Cannot change so that a Plymouth is made by Ford

Guideline 2: no insertion, deletion, or modification anomalies allowed!

Page 23: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

23

Some definitions superkey: a set of attributes of a relation

whose values are unique within the relation. key, a superkey in which removal of any

attribute makes it not a superkey. If there is more than one key, they are called candidate keys.

primary key, arbitrarily designated candidate key, all other candidate keys are secondary keys.

prime attribute, one which is a member of any key.

nonprime attribute, one which is not prime.

Page 24: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

24

Definition of Functional Dependency A functional dependency is a constraint between 2

sets of attributes from the database – For each value of the first set there is a unique value of the

second set

X-->Y restricts the tuples that can be instances of R if t1 and t2 are instances of R

– t1(X) = t2(X) then t1(Y) = t2(Y)

For example,– {DLNum} --> {Oname}– {CarId} --> {Make, Manuf}– {Make} --> {Manuf}

Candidate keys are left hand sides of functional dependencies

Page 25: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

25

Second Normal Form (2NF) X-->Y is a full functional dependency if the

removal of any attribute A from X removes the dependency– not X-{A} --> Y

X-->Y is a partial dependency if some attribute A may be removed without removing the dependency– X-{A} --> Y

A relation schema R is in 2NF if every nonprime attribute is fully functionally dependent on the primary key of R

Page 26: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

26

Putting the CarReg Schema into 2NF

Consider the Owner relation schema– {DLNum} is the primary key– Hence Owner is in 2NF

Consider the Car relation schema– {CarId, DLNum} is primary key (multiple owners)– {CarId} --> {Make, Model,...}– Hence Car is not 2NF

Create new relations– CarOwner = {CarId, Owner, PurchDate, TagNum,

RegisDate}– Car = {CarId, Make, Model, Manuf, Year, Color}

Is it 2NF?

Page 27: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

27

Rules for Functional Dependencies Given a particular set of functional

dependencies, we can find others using inference rules – Splitting/combining rules

• A -> B1 B2 <=> A-> B1 and A->B2– Trivial rules

• A B -> B, for all A, B– Transitive rule

• A -> B and B -> C => A B -> C We are interested in the closure of the set of

functional dependencies under these (and other) rules

Page 28: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

28

Inference Rules for Functional Dependency There are semantically obvious functional

dependencies, usually specified by schema designer

Other functional dependencies can be inferred from those

Inference rules– Reflexive, X includes Y, X-->Y– Augmentation, X-->Y then XZ-->YZ– Transitive, X-->Y-->Z then X-->Z– Decomposition, X-->YZ then X-->Y– Union, X-->Y and X-->Z then X-->YZ– Pseudotransitive, X-->Y and WY-->Z then WX-->Z

Page 29: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

29

Definition of Key A set of one or more attributes {A1,...Ak} is a

key for a relation R– Those attributes functionally determine all other

attributes of R• no 2 distinct tuples can agree on the key

– no proper subset of {A1,... Ak} is a key of R• a key must be minimal

There can be more than one key in a relation– Department (DeptName, DeptNo,...)

• since both are unique, both are keys A superkey (superset of a key) is a set of

attributes that functionally determine all other attributes of the relation.

Page 30: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

30

Third Normal Form (3NF)

Based on transitive dependency, or non-key dependency

A functional dependency X-->Y is a transitive dependency if there is a set Z which is not a subset of any key, and for which X-->Z and Z-->Y

A relation schema is in 3NF if there is no nonprime attribute which is functionally dependent on a non-key set of attributes.

Example of {make}-->{manuf} violates 3NF since make is not a key.

Page 31: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

31

Section 6.1 Relational Algebra Look at the formal basis for operations on the

relational data model An “algebra” is a collection of operations on

some domain Relational Algebra is a collection of operators

– operands and results are relations– operators

• projection and selection remove parts of a relation

• set operators, union, intersection and difference• joins and products combine the tuples of two

relations– other operators follow

Page 32: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

32

Join Operations Natural join is based on the cartesian product

– With a restriction on the tuples and attributes• each common attribute appears once in result• tuples are included only where the common

attributes have the same values– R join S on A has those tuples of RS

• where R.A = S.A– Each tuple from R is joined to all tuples of S that

have the same value for attribute A Example

– Every combination of Customer and Rental where the accountId fields match

Page 33: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

33

Combining Operations to Form Queries Can put all operations together

– Names and grades of students who made took quiz 1

We’ll see how this works in in Access In class, time permitting

– Demonstration of Queries in Access

Page 34: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

34

Relational Expressions Select account 113, project videoId and

dateDue videoId, dateDue( accountId=113(Rental))

VideoId, title and date due for account 113 videoId, title, dateDue(( accountId=113(Rental)) videoId Videotape

movieId Movie)

videoId, title, dateDue( accountId=113(Rental videoId Videotape movieId Movie))

What is the order of evaluation?

Page 35: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

35

Chapter 7: SQL Standard Query Language

– ANSI and ISO standard– SQL2 or SQL-92 is current standard

SQL is a data manipulation language (DML) and a data definition language (DDL) and a programming language

We can use SQL for– Logical database specification (database schema

definitions– Physical database specifications (indexes, etc.)– Querying database contents– Modifying database contents

Page 36: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

36

Relational Operations in SQL Select statement

– select <attribute names> from <tables> where <condition>

Projection in SQL using select clause– Select title from Movies

Selection in SQL using where clause– select * from Customer where lastName = 'Doe'– select distinct lastName, firstName from

Customer• no duplicates with distinct

Page 37: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

37

Products and Joins in SQL Cartesian product in SQL using from clause

– Select * from Employee, Timecard Join using from and where clauses

– Select * from Employee, Timecard where Employee.ssn = Timecard.ssn

Join using join and on (non-standard)– Select * from Employee join TimeCard

on Employee.ssn = TimeCard.ssn

Page 38: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

38

Nested Queries Nested select query

– Select videoId, dateAcquired from Videotape where videoId = (

select videoId from Rental where dateRented=‘1/1/99’)

compare with– Select v.videoId, dateAcquired

from Videotape v, Rental r where v.videoId = r.videoId and

dateRented=‘1/1/99’) Same result?

Page 39: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

39

Select Using Group by and Having Group by forms groups of rows with the

same column values What is the average hourly rate by store?

– select storeId, avg(hourlyRate) from HourlyEmployee e, WorksAt wwhere e.ssn = w.ssngroup by stroreId

How many employees work at each store?– select storeId, name, count (*)

from Store s, WorksAt wwhere s.storeId = w.storeIdgroup by storeId, name

Having filters the groups– having count (*)>2

Page 40: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

40

Substrings, arithmetic and order Find a movie with ‘Lion’ in the title

– select title from Movie where title like ‘%Lion%’ List the monthly salaries of salaried

employees who work in in store 3– select salary/12 from Employees e, WorksAt w

where e.ssn=w.ssn and storeId=3 Give the list of employees in store 3, ordered

by salary– select firstName, lastName

from Employees e, WorksAt w where e.ssn=w.ssn and storeId=3

Page 41: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

41

Modifying Content with SQL Insert queries

– insert into Customer values (555, 'Yu', 'Jia','540 Magnolia Hall','Tallahassee', 'FL', '32306')

– insert into Customer (firstName, lastName, accountId) values ('Jia', 'Yu', 555)

Update queries– update TimeCard set paid = true

where paid = false– update HourlyEmployee set hourlyRate =

hourlyRate *1.1 where ssn = '145-09-0967' Samples in Access

Page 42: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

42

Creating Pay Statements with SQL Find the number of hours worked for each

employee entry– select TimeCard.ssn, sum((endTime-

startTime)*24) as hoursWorked from TimeCard where paid=false group by ssn

Create the Pay Statement entries for each Employee– select ssn, hourlyRate, hoursWorked,

hoursWorked * hourlyRate as amountPaid, today from …

Insert into the PayStatement table– Insert into PayStatement select …

Look at the Access example in BigHit.mdb

Page 43: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

43

Create Table Statement create table Customer (

accountId int,lastName varchar(32),firstName varchar(32),street varchar(100),city varchar(32),state char(2),zipcode varchar(9)

) Note that SQL has specific types

Page 44: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

44

Key Constraints in SQL Key declarations are part of create table

– create table Store (storeId int primary key,

– create table Movie (movieId varchar(10) primary key,

– create table Rental (accountId int, videoId varchar(10),primary key (accountId, videoId)

Page 45: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

45

Java Objects and variables Objects are dynamically allocated

– Figures A.1 and A.2 show String variables• Assignment (=) and equality (==)

b. firstName="Jane"; String

Fred

String

Jane

firstName

a. firstName="Fred"; String

Fred

String

Jane

firstName

obj1

obj2

obj3

String

Jane

String

Jane

Page 46: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

46

Java DB Connectivity (JDBC)

Figure 8.4 Strategies for implementing JDBC packages

Intermediarycomputer

Databasecomputer

Local computer JavaProgram

java.sqlpackage

JDBC-ODBCbridge

ODBC databaseclient

Oracle Databaseclient

Middlewareclient

Middlewareserver

Oracleserver

Accessserver

Sybaseserver

Oracle Databaseclient

Sybase Databaseclient

Oracle ODBCclient

Access ODBCclient

JDBC package JDBC package JDBC package

Page 47: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

47

Executing Insert and Update Statements Create new customer, using String +

int rowcount = stmt.executeUpdate(”insert into Customer ” +”(accountId,lastName,firstName)

” +”values

(1239,’Brown’,’Mary’)”);if (rowcount == 0) // insert failed

Update – String updateSQL = “update TimeCard set “ +”TimeCard.paid = 'yes’ where “ +”paid<>'yes’”;int count = stmt.execute(updateSQL);// count is number of rows affected

Page 48: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

48

Chapter 13 Query Processing Strategies for processing queries Query optimization First: How to represent relational DB?

– Each table is a file• Record structure to store tuples• File is a random access collection of records

– Query is executed by reading records from files• Read record, create object in memory• Process object• Write result as a file of records or keep in

memory

Page 49: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

49

Processing a range query Figure 13.3 Illustration of query processing

for query– select * from Customer where accountId >= 101 and accountId < 300

B+ tree index foraccountId

347 903

10 24 35

700 767 876 901

902 903

123 200 246 347

101 110

35 110 347 901 903

Data file

10 ... 24 ... 101 ... 110 ...35 ...

123 ... 200 ... 246 ... 347 ... 700 ... 767 ...

902 ... 903 ...876 ... 901 ...

Page 50: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

50

Using hashing to eliminate duplicates A hash function partitions values so that

– All values that are the same are in the same partition

– Values that are different are often in different partitions

We can find duplicates by hashing – For each tuple in the table

• Mash all attribute values in the tuple into a single value

• Apply hash function – For each partition

• Compare all pairs of tuples• Eliminate duplicates

– Why does this work?

Page 51: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

51

Processing join queries with indexes Indexed nested loop join

while (not customer.eof()) {

Customer c= customer.read();

rental.reset();

while (not rental.eof()) {

Rental r[] = rental.readByAcctId(c.accountId);

for (int i=0; i<r.length; i++) {result.write(c,r[i]);

result.write(c,r[I]);

} } }

Cost is Bc + Rr instead of Bc + Rc × Br

without index Reduce cost by processing a block at a time?

Page 52: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

52

ACID Transactions Atomicity: the property of a transaction that all of the

updates are successful, or there is no update at all. Consistency: each transaction should leave the

database in a consistent state. Properties such as referential integrity must be preserved.

Isolation: each transaction when executed concurrently with other transactions should have the same affect as if it had been executed by itself.

Durability: once a transaction has completed successfully, its changes to the database should be permanent. Even serious failures should not affect the permanence of a transaction.

Page 53: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

53

Example of transactionopen transactionvideoId video1 = select id of a copy of "Star Wars"

if (video1 == null) rollback transactioninsert row into Reservation for video1videoId video2 = select id of a copy of "Return of the Jedi"

if (video2 == null) rollback transactioninsert row into Reservation for video2videoId video3 = select id of a copy of "The Empire Strikes Back"

if (video3 == null) rollback transactioninsert row into Reservation for video3commit transaction

Page 54: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

54

Transaction isolation Consider these transactions

– Actions of T1• A: balance1 = (select balance from Customer where

accountId = 101); balance1 += 5.00;• B: update Customer set balance = ?balance1 where

accountId = 101;– Actions of T2

• A: balance2 = (select balance from Customer where accountId = 101); balance2 += 10.00;

• B: update Customer set balance = ?balance1 where accountId = 101;

Problems– Lost update:

• T1.a, T2.a, T1.b, T2.b– Dirty read:

• T1.a, T1.b, T2.a, T1.rollback, T2.b, and T2 commit – Incorrect Summary: example in class

Page 55: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

55

Locking database objects Allow transaction operations to lock objects

– Read (shared) locks– Write (exclusive) locks

Lock granularity– What size object to lock?– Table, row, field, column

Effect on concurrency– T1:Select sum(balance) from Customers– T2: Update Customers set firstName=‘Joe’ where

accountId=101 Effect on size and cost

– Smaller objects = more locks

Page 56: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

56

Two phase locking (2PL) Locks granted and released in two phases

– Growing phase• Request and upgrade locks

– Request read on X– Request write on X

– Shrinking phase• Release and downgrade locks

– Request read on X (downgrade from write)– Release read on X

2PL guarantees serializability– Any conflicting operation is blocked

Page 57: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

57

Transaction problems Lost update

– Two transactions update, last one persists Dirty read

– One transaction reads a value written by a transaction that subsequently rolls back

Incorrect summary– One transaction calculates an aggregate while another is

updating Unrepeatable read

– One transaction reads the same object twice and receives two different values

Phantom read– A transaction reads a value inserted by another transaction

that subsequently rolls back Deadlock

– Two transactions hold and request

Page 58: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

58

Transactions in SQL Transaction management statements

– set transaction read only;– set transaction read write;– set transaction isolation level serializable;– commit transaction;– rollback transaction;

Executing SQL statement without opening transaction– autocommit mode

Page 59: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

59

Causes of Failure, Possibilities of Recovery Database server

– computer crashes– server program crashes– disk drive corruption

Client failure– computer crashes– client program crashes

Network failure– connection fails, often temporary

Transaction failure– executes rollback (voluntary)– executes illegal operation (server created)– deadlock– introduces errors into the database

Page 60: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

60

Recovery from failure Primary technique, restart from consistent

backup/checkpoint Reprocessing

– ask all committed transactions to execute again Roll Forward

– Back to consistent backup state– Apply redo transaction log

Roll Back– Remove the effect of each transaction with undo

log– Can be used to cancel the effects of rogue

transactions

Page 61: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

61

Security in Relational Database Systems Account security for validation of users

– Database accounts– Operating system accounts

SQL statements for security– create user– alter user– create profile– create role– grant privileges to users, roles

Page 62: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

62

Stored Procedures Define numberRented function

– create function numberRented (accId int) return int as select sum(*) from Rental where Customer.accountId = accId;

Define checkIn procedure– create procedure checkIn (vidId int, cost double)

as begin insert into PreviousRental … Grant privileges to procedures

– grant update on PreviousRental to checkIn– grant checkIn to clerk– revoke update on PreviousRental to public

User in the clerk role can update the table, no one else can

Page 63: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

63

Distributed Database Systems

OsOsnet net = Network Communications = Network Communications

portion of Operating System portion of Operating System

OsOsdm dm = Data management portion of = Data management portion of

Operating SystemOperating System

DDBMS = Distributed Database SystemDDBMS = Distributed Database System

DatabaseDatabase

DatabaseDatabase

DatabaseDatabase

DDBMSDDBMSAP 1AP 1

AP 2AP 2

OSOSnetnet

OSOSdmdm

DDBMSDDBMSAP 1AP 1

AP 2AP 2

OSOSnetnet

OSOSdmdm

DDBMSDDBMSAP 2AP 2

AP 3AP 3

OSOSnetnet

OSOSdmdm

Page 64: 1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr

64

Distributed Databases Single schema with multiple servers

– Not one application connecting to multiple servers– An application connects to a single server

Fragmentation of tables– Horizontal, rows in different servers– Vertical, columns in different servers– Replicated, some rows or columns in multiple

servers Distributed Transactions

– Two phase commit– Discussion in class