CSC 370 – Database Systems: Introduction

Upload: kenneth-cheever

Post on 02-Apr-2015


Page 1: CSC 370 – Database Systems Introduction. In essence a database is nothing more than a collection of information that exists over a long period of time

CSC 370 – Database Systems

Introduction

Page 2

• In essence a database is nothing more than a collection of information that exists over a long period of time.

• Databases are empowered by a body of knowledge and technology embodied in specialized software called a database management system, or DBMS.

• A DBMS is a powerful tool for creating and managing large amounts of data efficiently and allowing it to persist safely over long periods of time.

• A DBMS is among the most complex types of software available.

What’s a database?

Page 3

1. Allows users to create new databases and specify their schema (logical structure of the data), using a data-definition language.

2. Enables users to query and modify the data, using a query language and data-manipulation language.

3. Supports intelligent storage of very large amounts of data.

• Protects data from accident or improper use.
– Example: We can require the DBMS to disallow the insertion of two different employees with the same SIN.

• Allows efficient access to the data for queries and modifications.
– Example: Indexes over specified fields.

4. Controls access to data from many users at once (concurrency), without allowing “bad” interactions that can corrupt the data accidentally.

5. Recovers from software failures and crashes.

The database [management] system
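The capabilities listed above can be tried out in a few lines. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in DBMS; the Employees table, its columns, the index name, and the sample SIN are assumptions made for illustration, not part of the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Data-definition language: declare the schema. The table and its columns
# are hypothetical; UNIQUE asks the DBMS to reject duplicate SINs.
conn.execute(
    "CREATE TABLE Employees (sin TEXT UNIQUE NOT NULL, name TEXT NOT NULL)"
)
# An index over a specified field, to speed up lookups by name.
conn.execute("CREATE INDEX idx_employees_name ON Employees(name)")

# Data-manipulation and query language: insert a row, then read it back.
conn.execute("INSERT INTO Employees VALUES ('111-111-111', 'Smith')")
names = [row[0] for row in conn.execute("SELECT name FROM Employees")]

# A second, different employee with the same SIN is refused by the DBMS.
try:
    conn.execute("INSERT INTO Employees VALUES ('111-111-111', 'Jones')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(names, duplicate_rejected)
```

The constraint is declared once in the schema, so every program that inserts employees is checked by the DBMS rather than by its own code.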

Page 4

• The first commercial database systems evolved from file systems.

• File systems allow the storage of large amounts of data (though not safely).

• But file systems do not provide a query language for the data in files.
– People had to write programs in order to extract even the most elementary information from a set of files.

Example: Suppose we have stored in a file called Employees records having the fields

(code, name, dept_code)

and in another file called Departments records having the fields

(dept_code, dept_name)

Suppose now that, given an employee, for instance one named “Smith”, we want to find which department he works for.

Early database systems and file systems

Page 5

In the absence of a query language we have to write a program that will:

1. Open the file Employees.
2. Declare a variable of the same type as the records stored in the file.
3. Scan the file: while the end of the file has not been reached, assign the current record to the variable.
4. If the value of the name field is “Smith”, get the value of the dept_code field. Suppose it is “10000”.
5. Search in a similar way for a record with “10000” in the dept_code field of the Departments file.
6. Print the dept_name once the dept_code is found.

This is a very painful procedure even for the simplest queries. Compare it to the short and elegant SQL query:

SELECT dept_name
FROM Employees, Departments
WHERE Employees.name = 'Smith' AND
      Employees.dept_code = Departments.dept_code;

Cont.
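To make the contrast concrete, here is a rough sketch of both approaches in Python. The file contents (two employees, two departments) and the function names are invented for illustration, and in-memory strings stand in for the files on disk. The first function follows the six steps by hand; the second hands the same question to a DBMS (sqlite3) as one SQL query.

```python
import csv
import io
import sqlite3

# Hypothetical file contents; the field layout follows the example above.
EMPLOYEES = "1,Smith,10000\n2,Jones,20000\n"
DEPARTMENTS = "10000,Research\n20000,Sales\n"

def find_department_by_scanning(name):
    """Steps 1-6 by hand: scan Employees, then scan Departments."""
    dept_code = None
    for code, emp_name, emp_dept in csv.reader(io.StringIO(EMPLOYEES)):
        if emp_name == name:
            dept_code = emp_dept
            break
    if dept_code is None:
        return None
    for code, dept_name in csv.reader(io.StringIO(DEPARTMENTS)):
        if code == dept_code:
            return dept_name
    return None

def find_department_with_sql(name):
    """Load the same records into tables and ask one declarative query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employees (code, name, dept_code)")
    conn.execute("CREATE TABLE Departments (dept_code, dept_name)")
    conn.executemany("INSERT INTO Employees VALUES (?, ?, ?)",
                     csv.reader(io.StringIO(EMPLOYEES)))
    conn.executemany("INSERT INTO Departments VALUES (?, ?)",
                     csv.reader(io.StringIO(DEPARTMENTS)))
    row = conn.execute(
        "SELECT dept_name FROM Employees, Departments "
        "WHERE Employees.name = ? "
        "AND Employees.dept_code = Departments.dept_code",
        (name,),
    ).fetchone()
    return row[0] if row else None

print(find_department_by_scanning("Smith"), find_department_with_sql("Smith"))
```

Both functions return the same answer; the difference is that the scanning logic has to be rewritten for every new question, while the SQL version only changes the query text.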

Page 6

• These were the applications in which
– the data was composed of many small items, and
– many queries or modifications were made.

• Airline reservation systems
• Banking systems
• Corporate records

The first important applications of DBMS’s

Page 7

• Here the items of data include:
– Reservations by a single customer on a single flight, including such information as assigned seat…
– Flight information – the airports the flights depart from and arrive at, their departure and arrival times…
– Ticket information – prices, requirements, and availability.

• Typical queries ask for:
– Flights leaving around a certain time from one given city to another, what seats are available, and at what prices.

• Typical data modifications include:
– Making a reservation on a flight for a customer, assigning a seat, etc.

• Many agents will be accessing parts of the data at any given time.

• The DBMS must allow concurrent accesses while preventing problems such as two agents assigning the same seat simultaneously.

• Also, the DBMS should protect against loss of records if the system suddenly fails.

Airline Reservation Systems
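One common way to keep two agents from assigning the same seat is to make "check that the seat is free" and "take it" a single statement. The sketch below assumes a hypothetical Seats table and flight number, and it simulates the two agents one after the other on a single sqlite3 connection; a real reservation system would rely on the DBMS's own concurrency control across many connections.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per seat; customer is NULL while the seat is free.
conn.execute(
    "CREATE TABLE Seats (flight TEXT, seat TEXT, customer TEXT, "
    "PRIMARY KEY (flight, seat))"
)
conn.execute("INSERT INTO Seats VALUES ('AC101', '12A', NULL)")
conn.commit()

def assign_seat(flight, seat, customer):
    """Claim the seat only if it is still free; return True on success."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE Seats SET customer = ? "
            "WHERE flight = ? AND seat = ? AND customer IS NULL",
            (customer, flight, seat),
        )
        return cur.rowcount == 1  # 0 rows changed means the seat was taken

first = assign_seat("AC101", "12A", "Alice")   # the seat was free
second = assign_seat("AC101", "12A", "Bob")    # the seat is already taken
holder = conn.execute(
    "SELECT customer FROM Seats WHERE flight = 'AC101' AND seat = '12A'"
).fetchone()[0]
print(first, second, holder)
```

Because the test and the update happen in one statement, there is no window in which the second agent can see the seat as free after the first has taken it.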

Page 8

• Data items include:
– Customers, their names, addresses, etc.
– Accounts, and their balances.
– Loans, and their balances.
– Connections between customers and their accounts and loans.

• Typical queries are those for account and loan balances.

• Typical modifications are those representing a payment from or deposit to an account.

• In banking systems failures cannot be tolerated.

– E.g., once the money has been ejected from an ATM, the bank must record the debit, even if the power immediately fails.

– On the other hand, it is not permissible for the bank to record the debit and then not to deliver the money because the power fails.

– The proper way to handle this operation is far from obvious and is one of the significant achievements in DBMS architecture.

Banking Systems
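The all-or-nothing half of this requirement can be sketched with a transaction. In the example below the dispense callback is a hypothetical stand-in for the ATM hardware, and the account row is sample data. If dispensing fails, the debit is rolled back. This does not address the harder case, where the cash has already been ejected when the power fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (accountNo INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO Accounts VALUES (12345, 1000.00)")
conn.commit()

def withdraw(account_no, amount, dispense):
    """Record the debit and dispense cash as one all-or-nothing unit."""
    try:
        with conn:  # commits if the block finishes, rolls back if it raises
            conn.execute(
                "UPDATE Accounts SET balance = balance - ? WHERE accountNo = ?",
                (amount, account_no),
            )
            dispense(amount)  # hypothetical hardware call; may raise
        return True
    except RuntimeError:
        return False

def balance(account_no):
    return conn.execute(
        "SELECT balance FROM Accounts WHERE accountNo = ?", (account_no,)
    ).fetchone()[0]

def broken_dispenser(amount):
    raise RuntimeError("power failed before the cash was delivered")

ok = withdraw(12345, 100.00, broken_dispenser)
print(ok, balance(12345))
```

After the failed withdrawal the balance is unchanged, so the bank has not recorded a debit for money it never delivered.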

Page 9

• They encouraged the user to view the data much as it was stored.

• The chief models were the hierarchical and network models.

• The main characteristic of these models was that one could easily jump, or navigate, from one object to another through pointers.
– E.g., from one employee to his department.

• However, these models didn’t provide a high-level query language for the data.
– So one still had to write programs to query the data.

• Also, they didn’t allow on-line schema modifications.

Early DBMS’s (1960’s)

Page 10

• Codd (1970): A database system should present the user with a view of data organized as tables (also called relations).

• Behind the scenes there could be a complex data structure that allows rapid response to a variety of queries.
– But the user would not be concerned with the storage structure.

• Queries could be expressed in a very high-level language, which greatly increases the efficiency of database programmers.
– This high-level query language for relational databases is called Structured Query Language (SQL).

Relational databases

Page 11

• Relations = Tables. The columns are “headed” by attribute names.

• A relation Accounts might be:

accountNo   balance   type
12345       1000.00   savings
67890       2846.92   checking
…           …         …

• Below the attributes are the rows, or tuples.

• Suppose we want to know the balance of account “67890”. We could ask this query in SQL as in (1).

(1) SELECT balance
    FROM Accounts
    WHERE accountNo = 67890;

• As another example, we ask for the savings accounts with negative balances (2).

(2) SELECT accountNo
    FROM Accounts
    WHERE type = 'savings' AND balance < 0;

• To answer such a query, we:
– examine all the tuples of the relation Accounts named in the FROM clause,
– pick out those tuples that satisfy the criterion in the WHERE clause, and
– produce as the answer certain attributes of those tuples, as indicated in the SELECT clause.

Example of a Relational DB
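Queries (1) and (2) can be run as written against a small copy of the Accounts relation. This sketch uses sqlite3; the third row, with a negative savings balance, is a made-up addition so that query (2) has something to return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (accountNo INTEGER, balance REAL, type TEXT)")
conn.executemany(
    "INSERT INTO Accounts VALUES (?, ?, ?)",
    [
        (12345, 1000.00, "savings"),
        (67890, 2846.92, "checking"),
        (11111, -50.00, "savings"),  # hypothetical row, not in the example table
    ],
)

# Query (1): the balance of account 67890.
balance = conn.execute(
    "SELECT balance FROM Accounts WHERE accountNo = 67890"
).fetchone()[0]

# Query (2): savings accounts with negative balances.
overdrawn = [row[0] for row in conn.execute(
    "SELECT accountNo FROM Accounts WHERE type = 'savings' AND balance < 0"
)]

print(balance, overdrawn)
```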

Page 12

• The “cylindrical” component contains not only data but also metadata, i.e., information about the structure of the data.

• If the DBMS is relational, the metadata includes:

– names of relations,

– names of attributes of those relations, and

– data types for those attributes (e.g., integer or character string).

• A database maintains indexes for the data.

– Indexes are part of the stored data.

– Description of which attributes have indexes is part of the metadata.

Architecture of a DBMS
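Most systems let you query the metadata itself. As an illustration, the sketch below reads it back from SQLite, whose catalog table (sqlite_master) and PRAGMA statements are specific to that system; other DBMSs expose the same kind of information through different names. The Accounts table and the index name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (accountNo INTEGER, balance REAL, type TEXT)")
conn.execute("CREATE INDEX idx_accounts_no ON Accounts(accountNo)")

# Names of relations, read from SQLite's catalog table.
relations = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]

# Names and declared data types of the attributes of Accounts.
# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
attributes = [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(Accounts)")]

# Which indexes exist on Accounts; the index name is the second column.
indexes = [r[1] for r in conn.execute("PRAGMA index_list(Accounts)")]

print(relations, attributes, indexes)
```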

Page 13

• Database indexes are similar to book indexes.
– A book index associates words with the page numbers where they appear.
– A database index associates values of some object field(s) with the physical address of the corresponding objects on disk.

• An index has two main properties:
a) it is sorted, and
b) its size is much smaller than the record set being indexed.

• Hence, searching in an index is much faster than searching in the corresponding record set.

A few words about indexes
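The two properties can be seen in a toy index. In this sketch the record set, the list positions standing in for physical disk addresses, and the function names are all illustrative assumptions. Because the index is sorted, a binary search (Python's bisect module) finds a key in O(log n) comparisons, where a scan of the record set needs O(n).

```python
from bisect import bisect_left

# Hypothetical record set: (accountNo, balance), stored in arrival order.
records = [(67890, 2846.92), (12345, 1000.00), (55555, 10.00), (24680, 75.50)]

# The index: sorted (key, position) pairs. It holds only the key and an
# address, so it is much smaller than the records when records are wide.
index = sorted((rec[0], pos) for pos, rec in enumerate(records))
keys = [key for key, _ in index]

def lookup_with_index(account_no):
    """Binary search over the sorted keys, then one direct record access."""
    i = bisect_left(keys, account_no)
    if i < len(keys) and keys[i] == account_no:
        return records[index[i][1]]
    return None

def lookup_by_scan(account_no):
    """Linear scan of the whole record set."""
    for rec in records:
        if rec[0] == account_no:
            return rec
    return None

print(lookup_with_index(55555), lookup_by_scan(55555))
```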

Page 14

• The job of the storage manager is to

– obtain data from the data storage, and

– modify the data in the data storage when requested.

• The storage manager has two components:

– File manager: handles files.
  Keeps track of the location of files.
  Obtains block(s) of a file on request from the buffer manager.

– Buffer manager: handles main memory.
  Obtains and returns blocks of data from/to the file manager.
  Stores blocks temporarily in main memory pages.

• 1 block = 1 page = 4,000 to 16,000 bytes.
– The smallest unit of data that is read from or written to disk.

Storage Manager
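The division of labour between the two components might look roughly like the sketch below. The class and method names are invented, an in-memory byte buffer stands in for the disk, and the least-recently-used eviction policy is an assumption; the slides do not say which replacement policy a buffer manager uses.

```python
import io
from collections import OrderedDict

BLOCK_SIZE = 4096  # bytes; within the 4,000 to 16,000 range mentioned above

class FileManager:
    """Reads whole blocks from a file-like object standing in for the disk."""
    def __init__(self, storage):
        self.storage = storage
        self.reads = 0  # counts simulated disk reads

    def read_block(self, block_no):
        self.storage.seek(block_no * BLOCK_SIZE)
        self.reads += 1
        return self.storage.read(BLOCK_SIZE)

class BufferManager:
    """Keeps recently used blocks in main-memory pages (LRU is an assumption)."""
    def __init__(self, file_manager, capacity):
        self.file_manager = file_manager
        self.capacity = capacity
        self.pages = OrderedDict()  # block_no -> bytes, least recent first

    def get_block(self, block_no):
        if block_no in self.pages:
            self.pages.move_to_end(block_no)  # mark as most recently used
            return self.pages[block_no]
        data = self.file_manager.read_block(block_no)
        self.pages[block_no] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used
        return data

# A four-block "disk"; block n is filled with the byte value n.
disk = io.BytesIO(b"".join(bytes([n]) * BLOCK_SIZE for n in range(4)))
fm = FileManager(disk)
bm = BufferManager(fm, capacity=2)
bm.get_block(0)
bm.get_block(0)  # served from memory, no disk read
bm.get_block(1)
print(fm.reads)
```

The second request for block 0 never reaches the file manager, which is the point of keeping pages in main memory.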

Page 15

• The query processor handles queries and modifications to the data.
– It finds the best way to carry out a requested operation, and
– issues commands to the storage manager that will carry them out.

• E.g., a bank has a DB with two relations:
Customers (name, SIN, address)
Accounts (accountNo, balance, SIN)

Query: “Find the balances of all accounts of which Sally is the owner.”

SELECT Accounts.balance
FROM Customers, Accounts
WHERE Customers.SIN = Accounts.SIN
  AND Customers.name = 'Sally';

Query Processor

Page 16

• What this query logically says is:
1. Make the Cartesian product R of the tables specified in the FROM clause.
2. Choose from R the tuples satisfying the condition in the WHERE clause.
3. Produce as the answer only the values of the attributes in the SELECT clause.

• If we answered this query literally, the performance would be terrible.
– Because the Cartesian product is usually enormous.

• Suppose we have
– an index on name of Customers, and
– an index on SIN of Accounts.

• Then the query processor will cleverly create a plan that inexpensively:
– Retrieves the tuple for “Sally” and gets her SIN.
– Retrieves the account tuples for this SIN.

Query Processor (Cont.)
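The difference between the literal reading and the cheaper plan can be imitated in plain Python. In this sketch the sample tuples are invented and dictionaries stand in for the two indexes (a real DBMS would typically use sorted or hashed index structures on disk). Both functions return the same balances, but the second never builds the Cartesian product.

```python
from collections import defaultdict
from itertools import product

# Hypothetical data for the two relations:
# Customers (name, SIN, address) and Accounts (accountNo, balance, SIN).
customers = [("Sally", "111", "1 Elm St"), ("Bob", "222", "2 Oak St")]
accounts = [(12345, 1000.00, "111"), (67890, 2846.92, "222"),
            (13579, 50.00, "111")]

def balances_naive(name):
    """Literal reading: Cartesian product, then filter, then project."""
    return [acct[1]
            for cust, acct in product(customers, accounts)
            if cust[1] == acct[2] and cust[0] == name]

# Stand-ins for the indexes: name -> customer tuples, SIN -> account tuples.
customers_by_name = defaultdict(list)
for cust in customers:
    customers_by_name[cust[0]].append(cust)
accounts_by_sin = defaultdict(list)
for acct in accounts:
    accounts_by_sin[acct[2]].append(acct)

def balances_indexed(name):
    """Cheaper plan: find the customer, then only the accounts with that SIN."""
    return [acct[1]
            for cust in customers_by_name.get(name, [])
            for acct in accounts_by_sin.get(cust[1], [])]

print(balances_naive("Sally"), balances_indexed("Sally"))
```

With m customers and n accounts the naive version inspects m × n pairs, while the indexed version touches only Sally's tuple and her own accounts.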

Page 17

• The transaction manager is responsible for the integrity of the system. It must ensure that:
– several queries running simultaneously do not interfere with each other, and
– the system will not lose data even if there is a power failure.

• The transaction manager interacts with:
– the query processor, because it may need to delay certain query operations to avoid conflicts, and
– the storage manager, because schemes for protecting data involve storing a log of changes to the data.

Transaction Manager
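To give a feel for why a log of changes helps, here is a deliberately tiny sketch of an undo log. The TinyStore class and its methods are invented for illustration. Everything lives in memory, whereas a real DBMS writes its log to durable storage and handles many more cases; this is only one simple logging scheme among several.

```python
class TinyStore:
    """A toy key-value store with an undo log (a sketch, not a real DBMS)."""
    def __init__(self):
        self.data = {}   # stands in for the stored data
        self.log = []    # (key, old_value) records for the open transaction

    def write(self, key, value):
        # Log the old value BEFORE changing the data, so it can be restored.
        self.log.append((key, self.data.get(key)))
        self.data[key] = value

    def commit(self):
        self.log.clear()  # the changes are now considered permanent

    def recover(self):
        """Undo uncommitted changes, newest first, as after a crash."""
        while self.log:
            key, old = self.log.pop()
            if old is None:
                self.data.pop(key, None)  # the key did not exist before
            else:
                self.data[key] = old

store = TinyStore()
store.write("acct-12345", 1000.00)
store.commit()
store.write("acct-12345", 900.00)   # uncommitted debit
store.write("acct-67890", 100.00)   # uncommitted credit to a new key
store.recover()                     # simulate a restart after a failure
print(store.data)
```

After recovery only the committed state remains, so a half-finished transfer leaves no trace.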

Page 18

Database Studies

• Design of databases.
– What kinds of information go into the database?
– How is the information structured?
– How do data items connect?

• Database programming.
– How does one express queries on the database?
– How does one use other capabilities of a DBMS, such as transactions or constraints, in an application?
– How is database programming combined with conventional programming?

• Database system implementation.
– How does one build a DBMS, including such matters as query processing, transaction processing, and organizing storage for efficient access?