© michael lang, national university of ireland, galway 1 introduction to database systems

48
© Michael Lang, National University of Ireland, Galway Introductio n to Database Systems

Upload: jeremy-weaver

Post on 11-Jan-2016

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway1

Introduction to

Database Systems

Page 2: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway2

Today’s session• Some history

• Origins of DB

• Organisational structure

• Organisational information systems

• Data storage

• DBMS

• Data models

Page 3: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway3

What is “Data”?• ANSI definition:

Data A representation of facts, concepts, or instructions in a

formalized manner suitable for communication, interpretation, or processing by humans or by automatic means.

Any representation such as characters or analog quantities to which meaning is or might be assigned. Generally, we perform operations on data or data items to supply some information about an entity.

• Volatile vs. persistent data Our concern is primarily with persistent data

Page 4: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway4

Early Data Management - Ancient History

• Data are not stored on disk• Programmer defines both logical data structure and physical

structure (storage structure, access methods, I/O modes, etc)• One data set per program. High data redundancy.

PROGRAM 2

DataManagement

PROGRAM 3

DataManagement

PROGRAM 1

DataManagement

DATA SET 1

DATA SET 2

DATA SET 3

Page 5: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway5

Problems• There is no persistence.

All data is transient and disappears when the program terminates.

• Random access memory (RAM) is expensive and limited All data may not fit available memory

• Programmer productivity low The programmer has to do a lot of tedious work.

Page 6: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway6

File Processing - Recent (and Current)

History• Data are stored in files with interface between programs and files.• Various access methods exist (e.g., sequential, indexed, random)• One file corresponds to one or several programs.

PROGRAM 1

DataManagement FILE 1

FILE 2 Red

unda

nt D

ata

PROGRAM 2

DataManagement

PROGRAM 3

DataManagement

FileSystemServices

Page 7: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway7

File System Functions

• Mapping between logical files and physical files Logical files: a file viewed by users and programs.

• Data may be viewed as a collection of bytes or as a collection of records (collection of bytes with a particular structure)

• Programs manipulate logical files Physical files: a file as it actually exists on a storage

device.

• Data usually viewed as a collection of bytes located at a physical address on the device

• Operating systems manipulate physical files.

• A set of services and an interface (usually called application independent interface – API)

Page 8: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway8

Problems With File Systems

• Data are highly redundant sharing limited and at the file level

• Data is unstructured “flat” files

• High maintenance costs data dependence; accessing data is difficult ensuring data consistency and controlling access

to data

• Sharing granularity is very coarse

• Difficulties in developing new applications

Page 9: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway9

PROGRAM 1

PROGRAM 1

PROGRAM 2

IntegratedDatabase

DBMS

Query Processor

Transaction Mgr

Database Approach

Page 10: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway10

Origins of Databases• c1455: Gutenberg invents printing. Explosive interest

in publication of books (analogous with explosive growth of Web in early 1990s) leads to public libraries

• Libraries were first to introduce standards for information storage and retrieval

• These paper-based systems were extended and enhanced, and filing, indexing and classification schemes were developed

• Second World War: accelerated R&D in computing technologies spawned capability to computerise maintenance of records

Page 11: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway11

Computerised Data Storage

• Advantages of computerised data storage over paper-based systems include: ability to store data compactly (e.g Britannica CD) enhanced data retrieval ability to access data remotely, e.g. from a mobile

workstation, off-site location, or distant branch ability to share data amongst multiple users with

concurrent access facility to automate regular, speedy back-ups enhanced data editing

• Most significant disadvantage is vulnerability e.g system crash, corrupted data, viruses, hackers, etc.

Page 12: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway12

Information: A Corporate Asset

• Information is a vital corporate asset. Without accurate, current, relevant information, mistakes and misjudgements may be made

• Data management is an essential capability in the modern business environment / information society

• A knowledge organisation is one in which the primary asset is information; its competitive advantage is derived from effective use of documented knowledge. Examples: accounting firms, marketing companies, software houses

• Organisational memory extends and amplifies information / knowledge by capturing, organising, disseminating, and reusing it

Page 13: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway13

Organisational MemoryOrganisational Memory

Field(part of a record reserved for a particular data item)

File / Table(collection of related records)

Database(logical grouping of related files)

Record(collection of related fields, bound together as single units)

Byte(group of eight binary bits - can represent a single character)

Page 14: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway14

Organisational Memory

Organisational memory

Data

Improved products and

services

Informed decisions

Page 15: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway15

Attributes of Organisational Memory

• Current

• Timely real-time systems are commonplace in modern

business environment

• Relevant data is only useful if relevant to task in hand

• Shareable

• Complete

• Accurate and consistent

Page 16: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway16

Attributes of Organisational Memory

• Transportable authorised personnel should have access to data

anywhere, anytime

• Secure against unauthorised access

Page 17: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway17

Organisational Information Systems

• Generally, there is an alignment between business units and core operational systems

• Typical core systems are: Sales & Marketing Department: Customer

management system, Order processing system

Operations Unit: Purchasing system, Inventory control system

Finance Department: Accounts payable and receivable systems, Credit Management system

• There are interdependencies between these systems; hence the need for an integrated data management approach

Page 18: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway18

Data Storage• Information systems create, read, update and

delete data

• Data can be stored in conventional files or databases Files are collections of similar records

Databases are collections of interrelated files. • Records can be linked through specified relationships to

records in other files

Page 19: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway19

Conventional Files• In the file environment, data

storage is built around the applications that will use the files

• Essentially, the file “belongs” to a specific application. This is termed program-data dependence

• As applications are developed, customised files are created which may be unusable by other applications

InformationSystem

FileFile

File

File

Page 20: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway20

Conventional Files• First attempts at computerised storage of records

followed traditional paper-based metaphors (“Flat file” systems)

• Flat files were inefficient for data retrieval: it might be necessary to search entire file for a record (which, it may transpire, does not exist). Remedy: index files

• Indexing improved data retrieval, but conventional files have other disadvantages: Program-Data dependence

Proprietary file formats (closed systems)

Poor scalability

Page 21: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway21

Conventional Files• Disadvantages (Cont’d):

Duplicated and redundant data• ambiguity: same thing being referred to by different names in different

places• inconsistency: conflicting / unsynchronised data• wasted effort

Separation and isolation of data• data dispersed amongst many files complicates processing

Inflexibility• cumbersome data structures and report layouts• not responsive to ad hoc queries• excessive program maintenance

Development environment• procedural -v- non-procedural (3GL -v- 4GL)

Page 22: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway22

Conventional Files• Advantages:

Historically, conventional files have been faster to process than DBMS applications

• As legacy file-based systems become candidates for reengineering, the trend is to replace them with database systems

Page 23: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway23

Database Management Systems

• A database is a large, integrated collection of data which models a real-world enterprise

• A Database Management System (DBMS) is a software package designed to store and manage databases

• In a DBMS environment, applications are built around an integrated adaptable database

Information System

Database

Information System

Information System

Information System

Page 24: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway24

• Advantages: ability to share the same data across multiple applications and

systems

data independence

control of redundancy

enforced data integrity

improved data security

uniform data administration

concurrent access

improved backup and recovery facility

flexible data structures

Database Management Systems

Page 25: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway25

Database Management Systems

• Advantages (Cont’d): databases allow the use of the data in ways not

originally specified by the end-users (ad hoc queries)

database definition can be extended without impacting existing programs that use it

economies of scale

Page 26: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway26

• Disadvantages: database technology is more complex than file

technology• requires more sophisticated hardware and software (DBMS)

DBMS’s can still be slower than file-based systems

database technology requires a significant investment• database administration• operating costs and ongoing maintenance• end-user training

higher impact of system failure

Database Management Systems

Page 27: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway27

Database Management Systems

• Roles in a DBMS environment Data Administrator

Database Administrator

Database Designer

Application Programmer

End-User

Page 28: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway28

Database Architectures

• Hierarchical Data Model

• Network Data Model

• Relational Data Model

• Object-Relational Model

Page 29: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway29

Data Relationships• One-to-One (1:1)

Example: Bank Manager manages one and only one Bank Branch; Bank Branch is managed by one and only one Bank Manager

• One-to-Many (1:M) Example: An Employee works in a designated

Department; in any Department, there may be many Employees

• Many-to-Many (M:M) Example: A Student registers for one or more

Courses; for any Course, there may be one or more registered Students

Page 30: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway30

Hierarchical Data Model

Page 31: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway31

Hierarchical Data Model

• Arose as a solution to a practical problem Managing millions of parts for the space program

(standard “Bill of Materials” problem)

• The basic structure is a hierarchy or tree Parent-child relationship

• Relationships are represented by pointers

• Restrictions: Each segment has at most one parent

All relationships are 1:M

Page 32: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway32

Hierarchical Data Model

• Problem: how to represent a M:M relationship ? A hierarchical structure (tree) can only support

1:M relationships. Therefore, to represent M:M, must create multiple hierarchies

… but, this means that records are duplicated in different hierarchies

duplication gives rise to data anomalies

duplication can be eliminated using pointers; must decide in which table to store data, and in which table to store pointer. This is a very awkward means of implementing M:M

Page 33: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway33

Network Data Model

Page 34: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway34

Network Data Model• Objective was to overcome shortcomings of the

hierarchical data model, in particular, representation of M:M relationship

• Like the hierarchical model, it can be likened to trees; unlike hierarchical model, several trees can share branches

• In practice, enjoyed little commercial success Too complex: suited to use by programmers as

opposed to end-users

Overtaken by the relational model

No clear theoretical base

Page 35: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway35

Network Model: Structures

• Data item a field or attribute

• Record a collection of data items

• Vectors (repeating structures) are permitted

• Relationships are represented by sets

• Sets have owners (parent) and members (children)

• A member cannot have two parents in the one set cannot directly represent M:M relationships

• Member records of a set can be ordered

Page 36: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway36

Shortcomings of Early Models

• Languages of both hierarchical and network are procedural and record-at-a-time

• To retrieve data … you must navigate (find a path) to the required

record

issue multiple statements directing the system to traverse that path

• It was necessary to issue multiple requests to the DBMS to retrieve a data item

• It was necessary to have a detailed understanding of how data was stored and structured this is contrary to data independence principle

Page 37: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway37

Relational Data Model• Relational model, developed by E. F. Codd, has a strong

theoretical basis and overcomes shortcomings of network / hierarchical models

• Relational model is non-procedural (What?, not How?), and set-at-a-time

• Navigation is automatic• Relations

2-dimensional data set consisting of N columns (fields / attributes) and M rows (records)

• All rows in a relation are unique• A relational database is a set of relations

Page 38: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway38

Relational Model: Structures

Page 39: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway39

Relational Model: Structures

Page 40: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway40

Relational Model: Structures

• Primary key A unique identifier of a row in a relation; can be

composite

• Candidate key An attribute that could be a primary key

• Alternative key A candidate key that is not selected as the primary key

• Foreign key An attribute of a relation that is the primary key of

another relation; can be composite

Page 41: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway41

Relational Algebra and SQL

• A major strength of the relational model is that it supports SQL, a flexible data retrieval language which facilitates ad hoc queries

• Relational algebra is a standard for judging a data retrieval language

Relational Algebra SQL

Restrict

ProjectProduct

Union

Difference

SELECT * FROM AWHERE condition

SELECT X FROM A

SELECT * FROM A, B

SELECT * FROM A UNIONSELECT * FROM B

SELECT * FROM AWHERE NOT EXISTS (SELECT * FROM B WHERE A.X = B.X AND A.Y=B.Y AND ...

A where condition

A times B

A union B

A minus B

A [X]

Page 42: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway42

Relational Databases• A truly relational database supports

structures (domains and relations)

integrity rules

a manipulation language

• The word “relational” is sometimes used too freely; many commercial systems are not fully relational because they do not support domains and integrity rules

• E. F. Codd has set forward 12 rules that a database must satisfy before it can be said to be truly relational

Page 43: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway43

Codd’s 12 Rules• Information representation

All data, including metadata (data definition, constraints, user names, etc.), is represented solely and explicitly as values in a table

• Guaranteed access Every value in a database is accessed by specifying a

combination of table name, column name, and value of the primary key of the row in which it is stored

No artificial paths, such as linked lists

• Systematic treatment of null values There must be a distinct representation for unknown data,

irrespective of data type

Null values are not equivalent to zero or the empty string

Page 44: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway44

Codd’s 12 Rules• Dynamic on-line catalog based on the relational model

database description is represented at the logical level in the same way as ordinary data

thus, only one manipulation language needs to be learned

• Comprehensive data sublanguage rule A relational system may support several languages and

various modes of terminal use

However, there must be at least one language that supports data definition, data manipulation, security and integrity constraints, and transaction processing operations

• View updating If the base tables of a view are updated, then the view itself

should be updated

Page 45: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway45

Codd’s 12 Rules• High-level insert, update, and delete

The system must support set-at-a-time operations; for example, a set of rows can be deleted by a single statement

• Physical data independence Changes to storage representation or access methods will

not affect application programs

• Logical data independence Implementation of changes to base tables will not affect

application programs; for example, if a table is restructured or split, applications should be immune to change (views are beneficial here)

Page 46: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway46

Codd’s 12 Rules• Integrity independence

Integrity constraints should be part of a database's definition rather than embedded within application programs

It must be possible to change integrity constraints without affecting any existing application programs

• Distribution independence Introduction of a distributed DBMS or redistributing existing

distributed data should have no impact on existing applications

• Nonsubversion It must not be possible to use low-level record-at-a-time

interface to by-pass high-level set-at-a-time security or integrity constraints

Page 47: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway47

Codd’s Rule 0• A relational DBMS must be able to manage

databases entirely through its relational capacities

• A DBMS is either totally relational or it is not relational

Page 48: © Michael Lang, National University of Ireland, Galway 1 Introduction to Database Systems

© Michael Lang, National University of Ireland, Galway48

Tasks in the seminars 1 - 3

• In order to do MMDB you need to be able to use SQL – self study sessions with tutor support