CS370 Spring 2007
CS 370 Database Systems
Lecture 1: Overview of Database Systems
Questions in your mind…
• What is in this subject?
• Why are we studying this subject?
• What shall we get from this subject?
Introduction to Databases
• Objectives:
  – What is Data and Information.
  – The characteristics of file-based systems.
  – The problems with the file-based approach.
  – The meaning of the term database.
  – The meaning of the term Database Management System (DBMS).
  – The typical functions of a DBMS.
  – The major components of the DBMS environment.
  – The problems involved in the DBMS environment.
  – The history of the development of DBMS.
  – The advantages and disadvantages of DBMS.
Important Terms to Remember
• Database: organized collection of logically related data
• Data: stored representations of meaningful objects and events
  – Structured: numbers, text, dates
  – Unstructured: images, video, documents
• Information: data processed to increase knowledge in the person using the data
• Metadata: data that describes the properties and context of user data
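The difference between data and metadata can be seen in a small sketch using Python's built-in sqlite3 module. The customer table, its columns, and the sample row are assumptions made for illustration only; they are not part of the lecture.

```python
import sqlite3

# Hypothetical table used only to illustrate data vs. metadata.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, joined TEXT)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Ali Raza', '2007-01-15')")

# Data: the stored facts themselves.
data = conn.execute("SELECT * FROM customer").fetchall()

# Metadata: the description of those facts (column names and declared types).
# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
metadata = [
    (name, col_type)
    for _cid, name, col_type, _notnull, _default, _pk in conn.execute(
        "PRAGMA table_info(customer)"
    )
]

print(data)      # [(1, 'Ali Raza', '2007-01-15')]
print(metadata)  # [('id', 'INTEGER'), ('name', 'TEXT'), ('joined', 'TEXT')]
```

The row is user data; the column names and types are the metadata that gives the row its meaning.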
Data Management Example
• Suppose
  – You are a video store owner.
  – Customers rent video tape copies of movies.
  – There are several copies of each movie.
• Needs
  – Which tapes has a customer rented?
  – Are any tapes overdue?
  – When will a tape become available?
Solution: File-based System
• Edit the rented.txt file
• Advantages
  – Text editors are easy to use
  – Simple to insert a record
  – Simple to delete a record
Complications: Queries
• The file-based solution does not address the needs.
• Query: What movies has Ali Raza rented?
  – Execute (not quite right): Search for ‘Ali Raza’.
• Query: Are any tapes overdue?
  – Execute: ???
• Requirements
  – Robust, sophisticated query language
  – Clear separation between data organization (schema) and data
• DBMS concepts: DML, SQL
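A minimal sketch of why a text search is "not quite right", and how a schema plus SQL addresses it. The layout of rented.txt (customer|movie|due date), the sample records, and the cutoff date are all assumptions for illustration; the lecture does not specify the file format.

```python
import sqlite3

# Hypothetical contents of rented.txt: customer|movie|due date.
RENTED_TXT = """\
Ali Raza|The Matrix|2007-02-01
Sara Khan|The Ali Raza Story|2007-03-15
Ali Razaq|Titanic|2007-03-20
"""

# File-based approach: a plain text search matches the string anywhere on the line,
# so it also finds a movie title and a different customer with a similar name.
text_hits = [line for line in RENTED_TXT.splitlines() if "Ali Raza" in line]

# Database approach: the schema names each field, so a query can target one column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rented (customer TEXT, movie TEXT, due_date TEXT)")
rows = [tuple(line.split("|")) for line in RENTED_TXT.splitlines()]
conn.executemany("INSERT INTO rented VALUES (?, ?, ?)", rows)

movies = [m for (m,) in conn.execute(
    "SELECT movie FROM rented WHERE customer = ? ORDER BY movie", ("Ali Raza",))]

# "Are any tapes overdue?" becomes a comparison on the due_date column
# (ISO dates compare correctly as text; the cutoff date is an assumed "today").
overdue = [m for (m,) in conn.execute(
    "SELECT movie FROM rented WHERE due_date < ? ORDER BY movie", ("2007-03-01",))]

print(len(text_hits))  # 3
print(movies)          # ['The Matrix']
print(overdue)         # ['The Matrix']
```

The text search returns three lines, only one of which is a rental by Ali Raza; the SQL queries return exactly the rows that answer each question.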
Complications: Multiple users
• Two clerks edit the rented.txt file at the same time.
  – Ahmed starts to edit rented.txt, reads it into memory.
  – Sarah starts to edit rented.txt.
  – Ahmed adds a record.
  – Ahmed saves rented.txt to disk.
  – Sarah saves rented.txt to disk.
• Ahmed’s added record disappears!
• Requirements
  – Must support multiple readers and writers.
  – Updates to data must (appear to) occur in serial order.
• DBMS concepts: locks, concurrency control
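The lost update described above can be replayed in a few lines. This is only a sketch: a dictionary stands in for the file on disk, the record contents are invented, and a single threading.Lock stands in for the far more elaborate concurrency control of a real DBMS.

```python
import threading

# In-memory stand-in for rented.txt on disk (an assumption for illustration).
disk = {"rented.txt": ["Ali Raza|The Matrix"]}

# Without concurrency control: replay the interleaving from the slide.
ahmed_copy = list(disk["rented.txt"])    # Ahmed reads the file into memory
sarah_copy = list(disk["rented.txt"])    # Sarah reads the same file
ahmed_copy.append("Sara Khan|Titanic")   # Ahmed adds a record
disk["rented.txt"] = ahmed_copy          # Ahmed saves
disk["rented.txt"] = sarah_copy          # Sarah saves her stale copy
lost_update_result = list(disk["rented.txt"])

# With a lock: each read-modify-write runs as one unit, so updates are serialized.
disk["rented.txt"] = ["Ali Raza|The Matrix"]
file_lock = threading.Lock()

def add_record(record):
    """Read, modify, and write back while holding the lock."""
    with file_lock:
        copy = list(disk["rented.txt"])
        copy.append(record)
        disk["rented.txt"] = copy

clerks = [
    threading.Thread(target=add_record, args=(rec,))
    for rec in ("Sara Khan|Titanic", "Omar Ali|Shrek")
]
for t in clerks:
    t.start()
for t in clerks:
    t.join()

print(lost_update_result)           # ['Ali Raza|The Matrix']
print(len(disk["rented.txt"]))      # 3
```

In the first run Ahmed's record is silently overwritten; in the second, both clerks' records survive regardless of which thread runs first.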
Complications: Crashes
• A crash during an update may leave the data in an inconsistent state.
• Somebody makes 250 of the 500 edits needed to change a set of records.
• Before the changes are saved, Windows crashes!
• Requirements
  – Updates must happen on an all-or-none basis.
  – Implemented by commit, or rollback if necessary.
• DBMS concepts: locks, transactions, commit, rollback, recovery
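The all-or-none requirement can be sketched with a transaction in Python's built-in sqlite3 module. The tape table, the prices, and the use of an exception to stand in for a crash are assumptions for illustration; after a real crash, a DBMS restores consistency through its recovery mechanism rather than through application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tape (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO tape VALUES (?, ?)", [(i, 2.0) for i in range(1, 501)])
conn.commit()

def update_prices(conn, crash_after=None):
    """Change all 500 records; optionally simulate a failure partway through."""
    try:
        for n, tape_id in enumerate(range(1, 501), start=1):
            conn.execute("UPDATE tape SET price = 3.0 WHERE id = ?", (tape_id,))
            if crash_after is not None and n == crash_after:
                raise RuntimeError("simulated crash")
        conn.commit()      # all 500 edits become permanent together
    except RuntimeError:
        conn.rollback()    # none of the partial edits survive

def count_updated(conn):
    return conn.execute("SELECT COUNT(*) FROM tape WHERE price = 3.0").fetchone()[0]

update_prices(conn, crash_after=250)
after_crash = count_updated(conn)

update_prices(conn)
after_commit = count_updated(conn)

print(after_crash)   # 0
print(after_commit)  # 500
```

After the simulated failure at edit 250, no record shows the new price; after the uninterrupted run, all 500 do. The database is never left with half of the edits applied.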
[Figure: File-based system. Application programs: tape rental/check-in, new tape ordering, customer info. Persistent storage: rented tape file, inventory master file, customer file.]
Limitations of File-based Approach
• Separation and isolation of data
  – Each program maintains its own set of data. Users of one program may be unaware of potentially useful data held by other programs.
• Duplication of data
  – The same data is held by different programs. This wastes space and can lead to different values and/or different formats for the same item.
• Data dependence
  – File structure is defined in the program code.
• Incompatible file formats
  – Programs are written in different languages, and so cannot easily access each other's files.
Database Approach
• In file-based systems, the definition of data was embedded in application programs, rather than being stored separately and independently.
• There was no control over access and manipulation of data beyond that imposed by application programs.
• Result: the database and the Database Management System (DBMS).
Database Approach
• INFORMATION
  – Information can be defined as data that has been organized in such a way as to be useful for someone or for some purpose, e.g. “A telephone directory is an information source”.
• PROCESSING
  – The activity in which a computer converts data into information.
• TYPES OF PROCESSING
  – Sorting, Searching, Filtering and Aggregating
Database Approach
• SORTING
  – Arranging data in an order that makes data items easier to find.
• SEARCHING
  – Finding a particular data item from among many (thousands, even millions).
• FILTERING
  – Selecting a smaller set of data items.
• AGGREGATING
  – Grouping, adding, counting, etc. of data items to produce a summary of the data.
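The four types of processing can each be shown in one line of Python. The rental records below are invented for illustration and are not part of the lecture.

```python
from collections import Counter

# Hypothetical rental records: (customer, movie, days_rented).
rentals = [
    ("Sarah", "Titanic", 3),
    ("Ahmed", "The Matrix", 1),
    ("Ali Raza", "Shrek", 5),
    ("Ahmed", "Titanic", 2),
]

# Sorting: arrange the records by customer name.
by_customer = sorted(rentals, key=lambda r: r[0])

# Searching: find one particular record (the first rental of "Titanic").
first_titanic = next((r for r in rentals if r[1] == "Titanic"), None)

# Filtering: select the smaller set of rentals lasting three days or more.
long_rentals = [r for r in rentals if r[2] >= 3]

# Aggregating: count rentals per customer and add up the total days rented.
rentals_per_customer = Counter(r[0] for r in rentals)
total_days = sum(r[2] for r in rentals)

print([r[0] for r in by_customer])    # ['Ahmed', 'Ahmed', 'Ali Raza', 'Sarah']
print(first_titanic)                  # ('Sarah', 'Titanic', 3)
print(len(long_rentals))              # 2
print(rentals_per_customer["Ahmed"])  # 2
print(total_days)                     # 11
```

Each step turns the same raw data into a different piece of information: an ordered list, a single answer, a subset, or a summary.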
Sources of Data
• Orders placed by customers
• Public sources such as libraries or, more recently, the Internet
• Commercial sources that provide specialised data, such as mailing lists
• CHARACTERISTICS OF USEFUL INFORMATION
  – It should be:
    • Up to date
    • On time
    • Relevant
    • Complete
    • Consistent
    • Presented in a usable way
    • Secured against unauthorised access
What is a Database?
• A database is a well-organized collection of data that are related in a meaningful way, which can be accessed in different logical orders but are stored only once. The data in the database is therefore integrated, structured, and shared.
• The main features of data in a database therefore are:
  – It is well organized
  – It is related
  – It is accessible in different orders without great difficulty
  – It is stored only once
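"Stored only once, accessible in different orders" can be sketched with Python's built-in sqlite3 module. The movie table and its rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO movie VALUES (?, ?)",
    [("Titanic", 1997), ("Shrek", 2001), ("The Matrix", 1999)],
)

# The same stored rows, read back in two different logical orders.
by_title = [t for (t,) in conn.execute("SELECT title FROM movie ORDER BY title")]
by_year = [t for (t,) in conn.execute("SELECT title FROM movie ORDER BY year")]

# Each movie is still stored exactly once.
stored = conn.execute("SELECT COUNT(*) FROM movie").fetchone()[0]

print(by_title)  # ['Shrek', 'The Matrix', 'Titanic']
print(by_year)   # ['Titanic', 'The Matrix', 'Shrek']
print(stored)    # 3
```

No second copy of the data is kept for the second ordering; the order is a property of the query, not of the storage.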
Database Users
• Database administrators:
  – Responsible for authorizing access to the database, coordinating and monitoring its use, acquiring software and hardware resources, and monitoring the efficiency of operations.
• Database designers:
  – Responsible for defining the content, the structure, the constraints, and the functions or transactions against the database. They must communicate with the end-users and understand their needs.
• End-users:
  – They use the data for queries and reports, and some of them actually update the database.
Categories of End Users
• Casual:
  – Access the database occasionally when needed.
• Naïve or Parametric:
  – They make up a large section of the end-user population. They use previously well-defined functions in the form of “canned transactions” against the database. Examples are bank tellers or reservation clerks who do this activity for an entire shift of operations.
• Sophisticated:
  – These include business analysts, scientists, engineers, and others thoroughly familiar with the system capabilities. Many use tools in the form of software packages that work closely with the stored database.
• Stand-alone:
  – Mostly maintain personal databases using ready-to-use packaged applications. An example is a tax program user who creates his or her own internal database.