introduction to database

INTRODUCTION TO DATABASE 1 A Presentation on Prepared by: Jyoti Giri Assistant Professor GDRCST, Bhilai

Upload: csgdrcst

Post on 21-Mar-2017


TRANSCRIPT

Page 1: INTRODUCTION TO DATABASE

1

INTRODUCTION TO DATABASE

A Presentation on

Prepared by: Jyoti Giri, Assistant Professor, GDRCST, Bhilai

Page 2: INTRODUCTION TO DATABASE

2

What is data?

• A collection of facts from which conclusions may be drawn, such as “statistical data”.
• “Data” is the plural form of “datum”.
• It is a representation of facts or concepts in an organized manner so that it may be stored, communicated, interpreted or processed by automated means.

Example: researchers who conduct a market research survey might ask members of the public to complete questionnaires about a product or a service. These completed questionnaires are data; they are processed and analyzed in order to prepare a report on the survey.

Page 3: INTRODUCTION TO DATABASE

3

Properties of Data (in a database)

• Data should be well organized.
• Data should be related.
• Data should be accessible in any order.
• Each item of data should be stored a minimum number of times.

Page 4: INTRODUCTION TO DATABASE

4

What is a Database?

A database is a collection of related data that contains information relevant to an enterprise.

For example:
1. University database
2. Employee database
3. Student database
4. Airlines database
etc.

Page 5: INTRODUCTION TO DATABASE

5

PROPERTIES OF A DATABASE

A database represents some aspect of the real world, sometimes called the miniworld or the universe of discourse (UoD).

A database is a logically coherent collection of data with some inherent meaning.

A database is designed, built and populated with data for a specific purpose.

Page 6: INTRODUCTION TO DATABASE

6

What is Database Management System (DBMS)?

A database management system (DBMS) is a collection of programs that enables users to create & maintain a database. It facilitates the definition, creation and manipulation of the database.

Definition – this covers only the structure of the database, not the data. It involves specifying the data types, structures & constraints for the data to be stored in the database.

Creation – this is the inputting of actual data into the database. It involves storing the data itself on some storage medium that is controlled by the DBMS.

Manipulation – this includes functions such as updating, inserting, deleting and retrieving specific data, and generating reports from the data.
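The three activities above can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the `student` table, its columns and the sample rows are assumptions, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Definition: only the structure (types and constraints), no data yet.
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " age INTEGER CHECK (age > 0))"
)

# Creation: the actual data is stored under the DBMS's control.
conn.executemany(
    "INSERT INTO student (roll_no, name, age) VALUES (?, ?, ?)",
    [(1, "Rakesh", 20), (2, "Nikhil", 21)],
)

# Manipulation: update, delete and retrieve specific data.
conn.execute("UPDATE student SET age = 22 WHERE roll_no = 2")
conn.execute("DELETE FROM student WHERE roll_no = 1")
rows = conn.execute("SELECT roll_no, name, age FROM student").fetchall()
print(rows)
```

After the update and the delete, only the second student remains, with the new age.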

Page 7: INTRODUCTION TO DATABASE

7

Typical DBMS Functionality

• Define a database: in terms of data types, structures and constraints
• Construct or load the database on a secondary storage medium
• Manipulate the database: querying, generating reports, insertions, deletions and modifications to its content
• Concurrent processing and sharing by a set of users and programs, while keeping all data valid and consistent

Page 8: INTRODUCTION TO DATABASE

8

Typical DBMS Functionality

Other features:

Protection or Security measures to prevent unauthorized access

“Active” processing to take internal actions on data

Presentation and Visualization of data

Page 9: INTRODUCTION TO DATABASE

9

Database System

The database and the DBMS together are called the database system.

Database systems are designed to manage large bodies of information.

This involves both defining structures for the storage of information & providing mechanisms for the manipulation of information.

A database system must ensure the safety of the information stored.

Page 10: INTRODUCTION TO DATABASE

10

A simplified database system environment

Page 11: INTRODUCTION TO DATABASE

11

Database System Applications

• Banking: for customer information, accounts & loans, and banking transactions.
• Airlines: for reservations & schedule information.
• Universities: for student information, course registration and grades.
• Credit card transactions: for purchases on credit cards & generation of monthly statements.
• Telecommunication: for keeping records of calls made, generating monthly bills, maintaining balances, and information about communication networks.
• Finance: for storing information about holdings, sales & purchases of financial instruments such as stocks & bonds.
• Sales: for customer, product and purchase information.
• Manufacturing: for management of the supply chain & for tracking production of items in factories.
• Human resources: for information about employees, salaries, payroll taxes and benefits.

Page 12: INTRODUCTION TO DATABASE

12

Traditional File systems

Before the evolution of DBMS, organizations used to store information in file systems.

A typical file processing system is supported by a conventional operating system.

The system stores permanent records in various files & needs application programs to extract records, or to add or delete records.

In traditional file processing, each user defines and implements the files needed for a specific application.

Page 13: INTRODUCTION TO DATABASE

13

Traditional file system

For example, one user, the grade reporting office, may keep a file on students and their grades. Programs to print a student’s transcript and to enter new grades into the file are implemented.

A second user, the accounting office, may keep track of students’ fees and their payments.

Although both users are interested in data about students, each user maintains separate files—and programs to manipulate these files—because each requires some data not available from the other user’s files.

This redundancy in defining and storing data results in wasted storage space and in redundant efforts to keep common data up to date.

Page 14: INTRODUCTION TO DATABASE

14

Disadvantages of File systems

1. Data Redundancy & Inconsistency
2. Difficulty in Accessing data
3. Data Isolation
4. Integrity Problems
5. Atomicity Problems
6. Concurrent access Anomalies or Problems
7. Security Problems

Page 15: INTRODUCTION TO DATABASE

15

Data Redundancy & Inconsistency

• Different programmers work on a single project, so various files are created by different programmers over a period of time.
• As a result, the files are created in different formats & the programs are written in different programming languages.
• The same information is repeated. For example, a name & address may appear in the savings account file as well as in the salary account file.
• This redundancy results in higher storage & access costs.
• It also leads to data inconsistency: if we change a record in one place, the change will not be reflected in all the other places.
• For example, a changed customer address may be reflected in the savings record but not anywhere else.

Page 16: INTRODUCTION TO DATABASE

16

Difficulty in Accessing data

• Accessing data from a list is also difficult in a file system.
• Suppose we want to see the records of all customers who have a balance of less than Rs 10,000. We can either check the list & find the names manually or write an application program.
• If we write an application program & at some later time need to see the records of customers who have a balance of less than Rs 20,000, then a new program has to be written again.
• This means that file processing systems do not allow data to be accessed in a convenient manner.
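By contrast, a DBMS lets the same query be reused with a different limit. The sketch below uses Python's built-in `sqlite3` module; the `account` table, the helper `accounts_below` and the sample balances are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 5000), ("A-102", 10000), ("A-103", 25000)],
)

def accounts_below(limit):
    """Return account numbers whose balance is below `limit`."""
    cur = conn.execute(
        "SELECT acc_no FROM account WHERE balance < ? ORDER BY acc_no", (limit,)
    )
    return [row[0] for row in cur]

# The limit is a parameter, so no new program is needed for Rs 20,000.
print(accounts_below(10000))
print(accounts_below(20000))
```

Changing the limit from Rs 10,000 to Rs 20,000 is just a different argument, not a new application program.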

Page 17: INTRODUCTION TO DATABASE

17

Data Isolation

As the data is stored in various files, & the files may be stored in different formats, writing application programs to retrieve the data is difficult.

Page 18: INTRODUCTION TO DATABASE

18

Integrity Problems

The stored data must satisfy certain constraints; for example, in a bank the minimum deposit should be Rs 1000.

Developers enforce these constraints by writing appropriate programs, but if a new constraint has to be added later, it is difficult to change the programs to enforce it.
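In a DBMS such a rule can be declared once with the data instead of being coded into every program. A minimal sketch with Python's built-in `sqlite3` module follows; it assumes the rule is modelled as a minimum balance of Rs 1000, and the table and account numbers are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " acc_no TEXT PRIMARY KEY,"
    # Assumed constraint: the balance may never be below Rs 1000.
    " balance INTEGER NOT NULL CHECK (balance >= 1000))"
)
conn.execute("INSERT INTO account VALUES ('A-101', 5000)")

try:
    conn.execute("INSERT INTO account VALUES ('A-102', 500)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the DBMS refused the row that violates the constraint

count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
print(rejected, count)
```

The second insert is rejected by the database itself, so only one row is stored.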

Page 19: INTRODUCTION TO DATABASE

19

Atomicity Problems

• Any mechanical or electrical device is subject to failure, and so is a computer system.
• After a failure, we have to ensure that the data is restored to a consistent state.
• For example, an amount of Rs 50 has to be transferred from Account A to Account B.
• Suppose the amount has been debited from Account A but has not yet been credited to Account B when some failure occurs.
• This leaves the data in an inconsistent state.
• So we have to adopt a mechanism which ensures that either the full transaction is executed or none of it is, i.e. the fund transfer should be atomic.
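The all-or-nothing behaviour can be sketched with a transaction in Python's built-in `sqlite3` module. The starting balances (Rs 500 and Rs 200), the `transfer` helper and the simulated failure are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()

def transfer(amount, fail_midway=False):
    """Move `amount` from A to B; either both updates happen or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'A'",
                (amount,),
            )
            if fail_midway:
                raise RuntimeError("simulated failure after the debit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'B'",
                (amount,),
            )
    except RuntimeError:
        pass  # the debit has already been rolled back

def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))

transfer(50, fail_midway=True)
after_failure = balances()   # unchanged: the partial debit was undone
transfer(50)
after_success = balances()   # both the debit and the credit were applied
print(after_failure, after_success)
```

The failed transfer leaves both balances untouched, while the successful one moves Rs 50 from A to B.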

Page 20: INTRODUCTION TO DATABASE

20

Concurrent access Problems

• Many systems allow multiple users to update the data simultaneously.
• This can also leave the data in an inconsistent state.
• Suppose a bank account contains a balance of Rs 500 & two customers want to withdraw Rs 100 & Rs 50 simultaneously.
• Both transactions read the old balance & withdraw from it, so the final balance is written as Rs 400 or Rs 450, depending on which write happens last. Both are incorrect; the correct balance is Rs 350.
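The interleaving described above can be replayed step by step in plain Python. This is a deterministic simulation of the lost update, not real concurrent code; the variable names are illustrative.

```python
balance = 500  # shared account balance in Rs

# Both transactions read the balance before either one writes.
read_by_t1 = balance
read_by_t2 = balance

# Each writes back its own result, unaware of the other.
balance = read_by_t1 - 100   # T1 writes 400
balance = read_by_t2 - 50    # T2 overwrites with 450; T1's withdrawal is lost
lost_update_balance = balance

# Serial execution: T2 reads only after T1 has written.
balance = 500
balance = balance - 100
balance = balance - 50
serial_balance = balance

print(lost_update_balance, serial_balance)
```

A DBMS avoids the first outcome by controlling how concurrent transactions interleave, so that the result matches some serial order.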

Page 21: INTRODUCTION TO DATABASE

21

Security Problems

• Not every user of the database should be able to access all the data.
• For example, payroll personnel need to access only the part of the data that has information about employees; they do not need access to information about customer accounts.

Page 22: INTRODUCTION TO DATABASE

22

Advantages of DBMS

• Controlling Redundancy
• Restricting Unauthorized Access
• Providing Storage Structures for Efficient Query Processing
• Providing Backup and Recovery
• Providing Multiple User Interfaces
• Representing Complex Relationships among Data
• Enforcing Integrity Constraints
• Permitting Inferencing and Actions using Rules

Page 23: INTRODUCTION TO DATABASE

23

Disadvantages of DBMS

• Cost of Hardware & Software
• Cost of Data Conversion
• Cost of Staff Training
• Appointing Technical Staff
• Database Damage

Page 24: INTRODUCTION TO DATABASE

24

Users may be divided into:

• Those who actually use and control the content (called “Actors on the Scene”)
• Those who enable the database to be developed and the DBMS software to be designed and implemented (called “Workers Behind the Scene”)

Page 25: INTRODUCTION TO DATABASE

25

Actors on the scene

• Database administrators
• Database designers
• End-users

Page 26: INTRODUCTION TO DATABASE

26

Database administrators (DBA)

The database administrator is the controller of the overall operations of the database.

However, the DBA is not responsible for creating the database or the structure of the database.

The database administrator is the most powerful actor on the scene.

 

Page 27: INTRODUCTION TO DATABASE

27

Functions of DBA

• Authorizing access to the database
• Coordinating & monitoring the database
• Acquiring hardware & software resources as needed by the users
• Concurrency control checking
• Security of the database
• Making backups & recovery
• Modification of the database structure & its relation to the physical database

Page 28: INTRODUCTION TO DATABASE

28

Database Designers (DBD)

The database designer is the person who designs the database structure for the first time. Prerequisites, such as which sources the data is to be collected from, are decided by the DBD.

Page 29: INTRODUCTION TO DATABASE

29

Functions of DBD

• Creating the original description of the database structure
• Interacting with different groups of users & integrating their views to make the best structure

Page 30: INTRODUCTION TO DATABASE

30

End-users

They use the data for queries, reports and some of them actually update the database content.

Types of end-users:
• Casual
• Naïve or Parametric
• Sophisticated
• Specialized or Stand-alone

Page 31: INTRODUCTION TO DATABASE

31

Casual: they can only browse through the database; they cannot create, update or make any changes in the database.

Naïve or Parametric: they use the readymade software which deals with the database. They can only update the database. Examples are bank-tellers or reservation clerks who do this activity for an entire shift of operations.

Page 32: INTRODUCTION TO DATABASE

32

Sophisticated: these include business analysts, scientists, engineers, others thoroughly familiar with the system capabilities. Many use tools in the form of software packages that work closely with the stored database.

Stand-alone: mostly maintain personal databases using ready-to-use packaged applications. An example is a tax program user that creates his or her own internal database.

Page 33: INTRODUCTION TO DATABASE

33

Workers behind the scene

DBMS system implementers: they are the creators of the DBMS.

Tool developers: tools are the facilities provided to help the DBMS or the user. They include packages for database design, performance monitoring, graphical interfaces and simulation. Tool developers develop these tools for the DBMS.

Operators & maintenance personnel: these are the persons required for maintaining the hardware or software of the DBMS.

 

Page 34: INTRODUCTION TO DATABASE

34

DATA MODEL 

A data model is a collection of concepts that can be used to describe the structure of a database.

By structure of a database we mean the Data types, Relationships, Constraints that should hold on the data.

Page 35: INTRODUCTION TO DATABASE

35

Categories of data models

• Conceptual (high-level, semantic) data models
• Physical (low-level, internal) data models
• Implementation (representational, record-based) data models

Page 36: INTRODUCTION TO DATABASE

36

Conceptual data models

Page 37: INTRODUCTION TO DATABASE

37

Conceptual data models

• Before implementation, a rough model of the database is created.
• This model is never implemented but is used for design purposes.
• Also called entity-based or object-based data models.

Example: E-R Model

Page 38: INTRODUCTION TO DATABASE

38

E-R model

Stands for entity-relationship model.

Terms used in E-R model:

Field – Attribute
Record – Entity
File – Entity Type

Page 39: INTRODUCTION TO DATABASE

39

E-R Model

Entity – an object with a physical or conceptual existence.

Ex: An object with a physical existence – a person, a car, a house; or an object with conceptual existence – a company, a job or a university.

Attribute – Attributes are the particular properties that describe an entity.

Ex: A STUDENT entity may be described by student’s name, age, address, class, grade.

Page 40: INTRODUCTION TO DATABASE

40

EXAMPLE

[Example figures omitted: the images are not recoverable from the transcript]

Page 41: INTRODUCTION TO DATABASE

41

Physical data models

Page 42: INTRODUCTION TO DATABASE

42

Physical data models

These provide concepts that describe the details of how data is stored in the computer.

Concepts provided by physical data models are generally meant for computer specialists, not for typical end users.

Page 43: INTRODUCTION TO DATABASE

43

Implementation data models

Page 44: INTRODUCTION TO DATABASE

44

Implementation data models

Provide concepts that fall between the above two.

It also provides concepts that may be understood by end users but that are not too far removed from the way data is organized within the computer.

Example: relational model, network model, hierarchical model.

Page 45: INTRODUCTION TO DATABASE

45

Relational Model

The relational model uses a collection of tables to represent both data and the relationships among those data.

Customer Table

Cust_id  Cust_Name  Cust_Add      Cust_City
1000     Ajay       Kohka         Bhilai
1001     Vishal     Shanti Nagar  Nagpur

Account Table

Acc_No  Bal
A-101   5000
A-102   10000

Depositor Table

Cust_id  Acc_No
1000     A-101
1001     A-102
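The three tables can be built and combined with Python's built-in `sqlite3` module. The rows are those shown on the slide; the lower-case table and column names, the key declarations and the join query are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, cust_name TEXT,
                       cust_add TEXT, cust_city TEXT);
CREATE TABLE account (acc_no TEXT PRIMARY KEY, bal INTEGER);
CREATE TABLE depositor (cust_id INTEGER REFERENCES customer(cust_id),
                        acc_no TEXT REFERENCES account(acc_no));
INSERT INTO customer VALUES (1000, 'Ajay', 'Kohka', 'Bhilai'),
                            (1001, 'Vishal', 'Shanti Nagar', 'Nagpur');
INSERT INTO account VALUES ('A-101', 5000), ('A-102', 10000);
INSERT INTO depositor VALUES (1000, 'A-101'), (1001, 'A-102');
""")

# The depositor table holds the relationship between customers and accounts.
rows = conn.execute(
    "SELECT c.cust_name, a.acc_no, a.bal"
    " FROM customer c"
    " JOIN depositor d ON d.cust_id = c.cust_id"
    " JOIN account a ON a.acc_no = d.acc_no"
    " ORDER BY c.cust_id"
).fetchall()
print(rows)
```

Both the data (customers, accounts) and the relationship between them (depositor) are ordinary tables, which is the point of the relational model.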

Page 46: INTRODUCTION TO DATABASE

46

Hierarchical Model

• The hierarchical data model uses tree structures to represent relationships among records.
• Tree structures occur naturally in many data organizations because some entities have an intrinsic hierarchical order.

Institute -> Programs -> Courses -> Students
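The Institute -> Programs -> Courses -> Students hierarchy can be pictured as a nested structure in which every record has exactly one parent. The sketch below is plain Python, not a hierarchical DBMS; the record names (`Program P1`, `Course C1`, and so on) and the `paths` helper are hypothetical.

```python
# Each record has exactly one parent; children hang beneath it.
institute = {
    "Institute": {
        "Program P1": {
            "Course C1": ["Student S1", "Student S2"],
            "Course C2": ["Student S3"],
        },
    },
}

def paths(node, prefix=()):
    """Yield every root-to-leaf path in the tree."""
    if isinstance(node, dict):
        for name, child in node.items():
            yield from paths(child, prefix + (name,))
    else:  # a list of leaf records
        for leaf in node:
            yield prefix + (leaf,)

all_paths = list(paths(institute))
for path in all_paths:
    print(" -> ".join(path))
```

Every student is reached by exactly one path from the root, which is what distinguishes a tree from a general network of records.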

Page 47: INTRODUCTION TO DATABASE

47

Network Model

This model uses two different data structures to represent the database entities and the relationships between them, namely the record type & the set type.

A record type is used to represent an entity type . It is made up of a number of data items that represent the attributes of the entity.

A set type is used to represent a directed relationship between two record types called owner record type & member record type.

Page 48: INTRODUCTION TO DATABASE

48

Example

Record types: Department & Employee

Set type: Dept-Emp, with Department as the owner record type & Employee as the member record type.

Page 49: INTRODUCTION TO DATABASE

49

SCHEMAS AND INSTANCES

The description of a database is called the database schema, which is specified during database design and is not expected to change frequently.

The collection of information stored in the database at a particular moment is called an instance of the database. It changes much more frequently than the schema.

Page 50: INTRODUCTION TO DATABASE

50

Schema and Instance 

SCHEMA

Student(studno, name, address)
Course(courseno, lecturer)

INSTANCE

Student(123, Bloggs, Woolton)
Student(321, Jones, Owens)
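The difference can be observed with Python's built-in `sqlite3` module: the stored table definition stays the same while the set of rows changes. The helpers `schema` and `instance` are illustrative names; `sqlite_master` is SQLite's own catalog table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (studno INTEGER, name TEXT, address TEXT)")

def schema():
    """Return the stored definition of the student table."""
    return conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'student'"
    ).fetchone()[0]

def instance():
    """Return the rows held in the student table right now."""
    return conn.execute("SELECT * FROM student ORDER BY studno").fetchall()

schema_before = schema()
conn.execute("INSERT INTO student VALUES (123, 'Bloggs', 'Woolton')")
instance_1 = instance()
conn.execute("INSERT INTO student VALUES (321, 'Jones', 'Owens')")
instance_2 = instance()
schema_after = schema()

print(schema_before == schema_after, instance_1, instance_2)
```

Each insert produces a new instance, but the schema is identical before and after.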

Page 51: INTRODUCTION TO DATABASE

51

View of Data

A major purpose of a database system is to provide users with an abstract view of the data.

That is, the system hides certain details of how the data are stored and maintained.

This hiding of the actual (complex) details from users is called data abstraction, and it is organized into levels.

 

Page 52: INTRODUCTION TO DATABASE

52

Levels of data abstraction

Page 53: INTRODUCTION TO DATABASE

53

Physical level

It is the lowest level of abstraction & specifies how the data is actually stored.

Example: A banking enterprise may have several record types, including:
• customer, with fields customer-id, customer-name, customer-street and customer-city
• account, with fields account-number and balance
• employee, with fields employee-name and salary

At the physical level, a customer, account, or employee record can be described as a block of consecutive storage locations (for example, words or bytes). The language compiler hides this level of detail from programmers. Similarly, the database system hides many of the lowest-level storage details from database programmers.

Page 54: INTRODUCTION TO DATABASE

54

Logical level

It is the next level of abstraction & describes what data are stored in the database & what relationships exist among those data.

Example: At the logical level, each such record is described by a type definition, and the interrelationship of these record types is defined as well.

Programmers using a programming language work at this level of abstraction.

Page 55: INTRODUCTION TO DATABASE

55

View level

• This level contains the actual data which is shown to the users.
• This is the highest level of abstraction & the users of this level need not know the actual details of data storage.

Example: At the view level, several views of the database are defined, and database users see these views. For example, tellers in a bank see only that part of the database that has information on customer accounts; they cannot access information about salaries of employees.
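The teller example can be sketched as a database view using Python's built-in `sqlite3` module. The view name `teller_view` and the sample rows are assumptions. SQLite has no user accounts or privileges, so the sketch only shows what the view exposes; in a multi-user DBMS, access to the underlying tables would additionally be restricted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (account_number TEXT, balance INTEGER);
CREATE TABLE employee (employee_name TEXT, salary INTEGER);
INSERT INTO account VALUES ('A-101', 5000);
INSERT INTO employee VALUES ('Ajay', 30000);
-- The view exposes account data only; salaries are not part of it.
CREATE VIEW teller_view AS SELECT account_number, balance FROM account;
""")

cur = conn.execute("SELECT * FROM teller_view")
teller_columns = [d[0] for d in cur.description]
teller_rows = cur.fetchall()
print(teller_columns, teller_rows)
```

A teller working through `teller_view` sees account numbers and balances and nothing from the employee table.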

Page 56: INTRODUCTION TO DATABASE

56

ANSI-SPARC 3-level DBMS Architecture 

Page 57: INTRODUCTION TO DATABASE

57

Three-schema architecture

The three-schema architecture is a convenient tool for the user to visualize the schema levels in a database system.

In this architecture, schemas can be defined at the following three levels:
• Internal schema/Physical schema
• Conceptual schema
• External schema

Page 58: INTRODUCTION TO DATABASE

58

The internal level has an internal schema, which describes the physical storage structure of the database.

The conceptual level has a conceptual schema, which describes the structure of the whole database for a community of users. The conceptual schema hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations, and constraints.

The external or view level includes a number of external schemas or user views.

The processes of transforming requests and results between levels are called mappings.

 

Page 59: INTRODUCTION TO DATABASE

59

Example: university database 

Conceptual schema:
Student (sid: string, name: string, age: number, percent: real)
Courses (cid: string, cname: string, credits: number)
Enrolled (sid: string, cid: string, grade: string)

Physical schema:
Relations stored as unordered files.
Index on first column of Student.

External schema:
Course_info (cid: string, enrollment: integer)
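One way to realize the three schemas of this example is sketched below with Python's built-in `sqlite3` module. It assumes that `Course_info` is derived by counting rows of `Enrolled` per course, which the slide does not state; the index name and the sample enrollments are also illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual schema: the three relations.
CREATE TABLE student (sid TEXT, name TEXT, age INTEGER, percent REAL);
CREATE TABLE courses (cid TEXT, cname TEXT, credits INTEGER);
CREATE TABLE enrolled (sid TEXT, cid TEXT, grade TEXT);
-- Physical schema: an index on the first column of student.
CREATE INDEX idx_student_sid ON student (sid);
-- External schema: a view derived from the conceptual schema (assumed rule).
CREATE VIEW course_info AS
    SELECT cid, COUNT(*) AS enrollment FROM enrolled GROUP BY cid;
INSERT INTO enrolled VALUES ('s1', 'c1', 'A'), ('s2', 'c1', 'B'), ('s1', 'c2', 'A');
""")

info = conn.execute(
    "SELECT cid, enrollment FROM course_info ORDER BY cid"
).fetchall()
print(info)
```

With this definition a course that has no enrollments does not appear in the view at all; a different derivation rule would be needed to list it with an enrollment of zero.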

Page 60: INTRODUCTION TO DATABASE

60

DATA INDEPENDENCE 

The ability to make changes at one level without affecting the other levels is called data independence.

Data independence is the capacity to change the schema at one level of a database system without having to change the schema at the next higher level.

 

Page 61: INTRODUCTION TO DATABASE

61

Types of data independence

Logical data independence is the capacity to change the conceptual schema without having to change external schemas or application programs.

Physical data independence is the capacity to change the internal schema without having to change the conceptual (or external) schemas.
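Physical data independence can be illustrated with Python's built-in `sqlite3` module: adding an index changes how the data is stored and searched, while the query and its result stay the same. The table, the rows and the index name are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid TEXT, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("s1", "Rakesh", 20), ("s2", "Nikhil", 21)],
)

QUERY = "SELECT name FROM student WHERE sid = 's2'"
before = conn.execute(QUERY).fetchall()

# Change only the physical level: add an index on sid.
conn.execute("CREATE INDEX idx_student_sid ON student (sid)")
after = conn.execute(QUERY).fetchall()

print(before, after)
```

The application's query text is untouched by the change at the internal level.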

Page 62: INTRODUCTION TO DATABASE

62

DBMS Structure

[Figure: DBMS structure. Components shown: DBA, casual user, naive user and batch user; telecomm system; compiled user interface; compiled application program; DDL compiler; query processor; DBMS & its data manager; OS or own file manager; OS disk manager; data files & data dictionary.]

Page 63: INTRODUCTION TO DATABASE

63

DDL Compiler (Data Definition Language Compiler): the DDL compiler converts the data definition statements into a set of tables. These tables contain the metadata concerning the database & are in a form that can be used by other components of the DBMS.

Data Manager: the data manager is the central software component of the DBMS. It is sometimes referred to as the database control system.

Functions:
• Converts operations in the user’s queries, coming directly via the query processor or indirectly via an application program, from the user’s logical view to a physical file system.
• Is responsible for interfacing with the file system.
• Enforces constraints to maintain the consistency, integrity & security of the data.
• Synchronizes the simultaneous operations performed by concurrent users.
• Is entrusted with backup & recovery operations.

Page 64: INTRODUCTION TO DATABASE

64

File Manager:
• Responsibility for the structure of the files & managing the file space rests with the file manager.
• It is responsible for locating the block containing the required record, requesting this block from the disk manager & transmitting the required record to the data manager.

Disk Manager:
• The disk manager is part of the operating system & all physical input & output operations are performed by it.
• It transfers the block or page requested by the file manager.

Page 65: INTRODUCTION TO DATABASE

65

Query Processor: the query processor is used to interpret the online user’s query & convert it into an efficient series of operations in a form capable of being sent to the data manager for execution.

Functions:
• The data manipulation statements are compiled separately into a sequence of optimized operations on the database.
• Data is transferred to & from a work area indicated in a subroutine call, & control returns to the application program.
• During execution, when a subroutine call inserted in place of the data manipulation statements is reached, control transfers to the run-time system. This system in turn transfers control to the compiled version of the original data manipulation statements. These data manipulation statements are executed by the data manager.

A user action that requires a database operation causes the application program to request the service via its run time system & data manager.

Batch users of the database also interact with the database via their application program, its run-time system & data manager.

Page 66: INTRODUCTION TO DATABASE

66

Telecommunication System: it is a software system used to communicate with a remote or local computer by sending or receiving messages over communication lines.

Messages from the user are routed by the telecommunication system to the appropriate target & responses are sent back to the user.

Page 67: INTRODUCTION TO DATABASE

67

Data Files: data files contain the data portion of the database.

Data dictionary:
• Information pertaining to the structure & usage of data contained in the database, the metadata, is maintained in a data dictionary.
• It stores information concerning the external, conceptual & internal levels of the database.
• It contains the source of each data-field value, the frequency of its use & details concerning updates.
• The data dictionary is itself a database that documents the data.

 

Page 68: INTRODUCTION TO DATABASE

68

Entity- Relationship Model

The E-R model is the most commonly used conceptual model.

In this model, the real world consists of a collection of basic objects called entities and the relationships among these objects.

The end product of the modeling process is an entity-relationship diagram (ERD) or ER diagram.

This is a very important conceptual data model.

It is not implemented directly; it is used for designing the database.

Page 69: INTRODUCTION TO DATABASE

69

The E-R data model employs three basic notions:

• Entity
• Attributes
• Relationship

Page 70: INTRODUCTION TO DATABASE

70

Entity

An entity is an object with a physical or conceptual existence.

For example: each person in an enterprise, a car, a house, a company, or a university course.

Page 71: INTRODUCTION TO DATABASE

71

Entity Type & Entity Sets

Entity Type – a collection of entities that have the same attributes.
Ex: STUDENT, UNIVERSITY

Entity Set – the collection of all entities of a particular entity type.
Ex: the set of all 10 rows of STUDENT

[Figure: STUDENT entity type with attributes Name, Age, Rollno]

Page 72: INTRODUCTION TO DATABASE

72

Graphical representation of entity sets

Page 73: INTRODUCTION TO DATABASE

73

Attributes

Attributes are the particular properties that describe an entity.

Ex: A STUDENT entity may be described by the student’s name and the student’s roll_number.

Page 74: INTRODUCTION TO DATABASE

74

Graphical representation of attributes

Page 75: INTRODUCTION TO DATABASE

75

Types of Attributes

• Simple (Atomic) and Composite Attributes
• Single Valued & Multi-valued Attributes
• Stored and Derived Attributes
• Null Valued Attributes
• Complex Attributes

Page 76: INTRODUCTION TO DATABASE

76

Simple (Atomic) and Composite Attributes

Simple attributes are not divisible into parts. For example, EmployeeNumber and Age.

Composite attributes can be divided into smaller subparts. These subparts represent basic attributes with independent meanings of their own. For example, take Name and address attributes.

Page 77: INTRODUCTION TO DATABASE

77

[Figure: hierarchy of the composite attribute Address]

Address: Street Address, city, state, Pin
Street Address: number, street, apartment no.

Page 78: INTRODUCTION TO DATABASE

78

Single Valued & Multi-valued Attributes

Single-valued attributes have a single value for a particular entity. Example: Roll_no, Age.

Multi-valued attributes may have more than one value for a single entity. Example: Phone_no

Page 79: INTRODUCTION TO DATABASE

79

Stored and Derived Attributes

A derived attribute is not stored in the database but is derived from other attributes.

Example: If DOB is stored in the database then we can calculate age of a student by subtracting DOB from current date.

Hence, in this case DOB is the stored attribute and age is considered as derived.
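The DOB and age example can be written as a small Python function. The function name `age_on` and the sample dates are illustrative; age is counted in completed years, which is one common convention.

```python
from datetime import date

def age_on(dob, today):
    """Derive age in completed years from the stored date of birth."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)

# Only dob is stored; age is computed whenever it is needed.
print(age_on(date(2000, 3, 21), date(2017, 3, 20)))  # day before the birthday
print(age_on(date(2000, 3, 21), date(2017, 3, 21)))  # on the birthday
```

Because age is recomputed from DOB each time, it never goes out of date the way a stored age would.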

Page 80: INTRODUCTION TO DATABASE

80

Null Valued Attributes

A null value indicates that no value has been inserted; it is not the same as zero.

Attributes which can have a null value are called null valued attributes.

Example: the Mobile_no attribute of a person who does not have a mobile phone.

Page 81: INTRODUCTION TO DATABASE

81

Complex Attributes

A complex attribute is a combination of composite and multi-valued attributes. Multi-valued attributes are represented by { } and composite attributes are represented by ( ).

Example: Address_phone attribute will hold both the address and phone_no of any person.

Example: {(2-A, St-5, Sec-4, Bhilai), 2398124}

Page 82: INTRODUCTION TO DATABASE

82

Key attribute in an entity type

• A key attribute has a unique value for each entity of the entity type.
• It identifies every entity in the entity set.
• A key attribute will never be a null valued attribute.
• A composite attribute can also be a key attribute.
• There can be more than one key attribute for an entity type. Example: roll_no, enrollment_no

Page 83: INTRODUCTION TO DATABASE

83

Domain or value set of an attribute

Domain of an attribute is the allowed set of values of that attribute.

Example: if attribute is ‘grade’, then its allowed values are A,B,C,F.

Grade ={A, B,C,F}

Page 84: INTRODUCTION TO DATABASE

84

TYPES OF ENTITY TYPES

Strong entity type – an entity type that has at least one key attribute.

Weak entity type – an entity type that does not have any key attribute. An entity in a weak entity type is identified by a relationship with a strong entity type. That relationship is called the identifying relationship, and the strong entity type is called the owner of the weak entity type.

Page 85: INTRODUCTION TO DATABASE

85

TYPES OF ENTITY TYPES

Student

Roll No.  Name    Age
1         Rakesh  20
2         Nikhil  21
3         Nikhil  21

MarksSecured

Name    M1  M2  M3
Nikhil  50  45  40
Nikhil  80  75  82

Identifying Relationship
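The two Nikhil rows in MarksSecured cannot be told apart by name alone, which is why the weak entity needs its owner. A sketch with Python's built-in `sqlite3` module follows. It assumes the first marks row belongs to roll number 2 and the second to roll number 3, which the slide does not state; the table and column names are also illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE marks_secured (
    roll_no INTEGER NOT NULL REFERENCES student(roll_no),  -- owner's key
    m1 INTEGER, m2 INTEGER, m3 INTEGER,
    PRIMARY KEY (roll_no)
);
INSERT INTO student VALUES (1, 'Rakesh', 20), (2, 'Nikhil', 21), (3, 'Nikhil', 21);
-- Assumed assignment of the two marks rows to roll numbers 2 and 3.
INSERT INTO marks_secured VALUES (2, 50, 45, 40), (3, 80, 75, 82);
""")

rows = conn.execute(
    "SELECT s.roll_no, s.name, m.m1, m.m2, m.m3"
    " FROM marks_secured m JOIN student s ON s.roll_no = m.roll_no"
    " ORDER BY s.roll_no"
).fetchall()
print(rows)
```

Borrowing the owner's key (`roll_no`) is what makes each marks row identifiable.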

Page 86: INTRODUCTION TO DATABASE

86

Relationship

Relates two or more distinct entities with a specific meaning.

For example, EMPLOYEE John works on the ProductX PROJECT, or EMPLOYEE Franklin manages the Research DEPARTMENT.

Terms used: relationship type, relationship set, relationship instances.

Page 87: INTRODUCTION TO DATABASE

87


Page 88: INTRODUCTION TO DATABASE

88

Relationship type: secured
Relationship set: {R1, R2, R3, R4}
Relationship instance: R1

Page 89: INTRODUCTION TO DATABASE

89

Graphical Representation of Relationship Sets 

Page 90: INTRODUCTION TO DATABASE

90

NOTATIONS USED IN E-R DIAGRAM

Entity Type

Attribute

Key Attribute

Weak Entity Type

Page 91: INTRODUCTION TO DATABASE

91

NOTATIONS USED IN E-R DIAGRAM

Composite Attribute

Derived Attribute

Multivalued Attribute

Page 92: INTRODUCTION TO DATABASE

92

NOTATIONS USED IN E-R DIAGRAM

Identifying Relationship

Relationship Type

Page 93: INTRODUCTION TO DATABASE

93

Constraints

Relationship types usually have certain constraints.

Two main types of relationship constraints:
• Mapping cardinalities
• Participation constraints

Page 94: INTRODUCTION TO DATABASE

94

Mapping cardinalities, or cardinality ratios

Specifies the number of relationship instances that an entity can participate in.

For example, in the WORKS_FOR relationship type.

Page 95: INTRODUCTION TO DATABASE

95

Mapping Cardinalities

• One-to-one (1:1)
• One-to-many (1:N)
• Many-to-one (N:1)
• Many-to-many (M:N)

Page 96: INTRODUCTION TO DATABASE

96

(a) One-to-one (b) One-to-many

Page 97: INTRODUCTION TO DATABASE

97

(a) Many-to-one (b) Many-to-many

Page 98: INTRODUCTION TO DATABASE

98

Example of E-R Diagrams

• Rectangles represent entity types.
• Diamonds represent relationship types.
• Lines link attributes to entity types and entity types to relationship types.
• Ellipses represent attributes.
• Underline indicates primary key attributes (will study later).

Page 99: INTRODUCTION TO DATABASE

99

E-R Diagram With Composite, Multivalued, and Derived Attributes

Page 100: INTRODUCTION TO DATABASE

100

Relationship Types with Attributes

Here the access_date attribute is attached to the relationship set depositor to specify the most recent date on which a customer accessed that account.

Page 101: INTRODUCTION TO DATABASE

101

Cardinality ratio

We express the cardinality ratio by drawing either a directed line (→), signifying “one,” or an undirected line (—), signifying “many,” between the relationship set and the entity set.

Page 102: INTRODUCTION TO DATABASE

102

One-To-One Relationship

Page 103: INTRODUCTION TO DATABASE

103

One-To-Many Relationship

In a one-to-many relationship, a customer is associated with several loans via borrower.

Page 104: INTRODUCTION TO DATABASE

104

Many-To-One Relationships

In a many-to-one relationship a loan is associated with several customers via borrower.

Page 105: INTRODUCTION TO DATABASE

105

Many-To-Many Relationship

Page 106: INTRODUCTION TO DATABASE

106

Find out the Cardinality ratio

• Prime minister – country
• Classroom – students
• Students – classroom
• Customer – loan

Page 107: INTRODUCTION TO DATABASE

Participation constraints

107

• Total participation: every entity in the entity type participates in at least one relationship in the relationship type.
  Example: participation of loan in borrower is total; every loan must have a customer associated with it via borrower.

• Partial participation: some entities may not participate in any relationship in the relationship type.
  Example: participation of customer in borrower is partial; some customers may not participate in any loan.
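As a rough illustration, the check below decides whether a set of entities participates totally or partially in a relationship set. The function name `participation` and the sample data are assumptions made for this sketch, not part of the slides.

```python
def participation(entities, relationship_pairs, position):
    """Return "total" if every entity appears at `position` in some pair, else "partial"."""
    participating = {pair[position] for pair in relationship_pairs}
    return "total" if set(entities) <= participating else "partial"

# Hypothetical sample data: borrower relates customers (position 0) to loans (position 1).
customers = ["c1", "c2", "c3"]
loans = ["l1", "l2"]
borrower = [("c1", "l1"), ("c2", "l2")]

loan_side = participation(loans, borrower, 1)          # every loan has a customer
customer_side = participation(customers, borrower, 0)  # c3 has no loan
```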

Page 108: INTRODUCTION TO DATABASE

108

KEYS

A key is used to uniquely identify each entity in an entity set.

Page 109: INTRODUCTION TO DATABASE

109

Types of keys
• Candidate key
• Alternate and primary key
• Superkey

Page 110: INTRODUCTION TO DATABASE

110

Candidate Key

A candidate key is a minimal set of attributes that uniquely identifies each entity in an entity set.

• An entity set can have more than one candidate key.
• More than one attribute can together form a single candidate key.
• Suppose that a combination of customer-name and customer-street is sufficient to distinguish among members of the customer entity set. Then both {customer-id} and {customer-name, customer-street} are candidate keys.
• Although the attributes customer-id and customer-name together can distinguish customer entities, their combination does not form a candidate key, since the attribute customer-id alone is a candidate key.

 

Page 111: INTRODUCTION TO DATABASE

111

Alternate & Primary key

• Alternate and primary keys are both defined in terms of candidate keys.
• In an entity set, the primary key is the one candidate key chosen to identify entities; the remaining candidate keys are called alternate keys.

AK = CK - PK (the candidate keys minus the primary key)

Page 112: INTRODUCTION TO DATABASE

112

Superkey

• A superkey is any superset of a candidate key; every candidate key is itself a superkey.
• For example, the customer-id attribute of the entity set customer is sufficient to distinguish one customer entity from another. Thus, customer-id is a superkey.
• Similarly, the combination of customer-name and customer-id is a superkey for the entity set customer.
• The customer-name attribute of customer is not a superkey, because several people might have the same name.

Example: {customer-id}, {customer-name, customer-id}
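The difference between a superkey and a candidate key can be tried out on sample data. The sketch below is only an illustration: the helper names `is_superkey` and `is_candidate_key` and the customer rows are invented here, and a key is really a constraint on all possible data, so a check on sample rows can refute a key but never prove one.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_candidate_key(rows, attrs):
    """True if attrs is a superkey and no proper non-empty subset of it is one."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

# Hypothetical customer rows, for illustration only.
customers = [
    {"customer_id": 1, "customer_name": "Hayes", "customer_street": "Main"},
    {"customer_id": 2, "customer_name": "Hayes", "customer_street": "North"},
    {"customer_id": 3, "customer_name": "Jones", "customer_street": "Main"},
]
```

On these rows, {customer_name, customer_id} is a superkey but not a candidate key, because customer_id alone already identifies every row.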

Page 113: INTRODUCTION TO DATABASE

Weak Entity Types

113

An entity type that does not have sufficient attributes of its own to form a primary key is referred to as a weak entity type.

Page 114: INTRODUCTION TO DATABASE

Weak Entity types (Cont.)

114

• We depict a weak entity type by a double rectangle.
• We underline the partial key of a weak entity type with a dashed line.
• payment_number is the partial key of the payment entity type.
• Primary key for payment: (loan_number, payment_number)

Page 115: INTRODUCTION TO DATABASE

115

Question for you: Can we convert a weak entity type into a strong entity type?

Page 116: INTRODUCTION TO DATABASE

116

PROBLEMS ON E-R DIAGRAM

Question: An employee works in a department. A department has phones, and an employee also has a phone. Assume that an employee works in at most two departments and at least one department. Each department has at most three phones and may have none. Design an E-R diagram for the above.

Page 117: INTRODUCTION TO DATABASE

117

Page 118: INTRODUCTION TO DATABASE

118

Steps in ER Modeling

1. Identify the entities.
2. Find the relationships.
3. Identify the key attributes for every entity.
4. Identify other relevant attributes.
5. Draw the complete E-R diagram with all attributes, including the primary key.

Page 119: INTRODUCTION TO DATABASE

119

EER (Enhanced Entity-Relationship)

• The EER model is a high-level or conceptual data model incorporating extensions to the original entity-relationship (ER) model.
• EER includes all the concepts of the ER model: EER = all ER concepts + some extensions.
• Additionally, it includes the concepts of superclass and subclass, and of specialization and generalization.

Page 120: INTRODUCTION TO DATABASE

120

Subclasses and Superclasses

• An entity type may have additional meaningful subgroupings.
• Example: EMPLOYEE may be further grouped into SECRETARY, ENGINEER, MANAGER, TECHNICIAN, SALARIED_EMPLOYEE, HOURLY_EMPLOYEE, …
• Each of these is called a subclass of EMPLOYEE.
• EMPLOYEE is the superclass of each of these subclasses.

Page 121: INTRODUCTION TO DATABASE

121

Specialization

• Specialization is the process of defining a set of subclasses of a superclass.
• The set of subclasses is based upon some characteristics of the entities in the superclass.
• Attributes of a subclass are called specific attributes.
• It follows a top-down design process.
• It is represented by a triangle component labeled ISA (e.g., customer “is a” person).

Page 122: INTRODUCTION TO DATABASE

122

Example of Specialization

• Consider an entity set person, with attributes name, street, and city.
• A person may be further classified as one of the following: customer, employee.
• Each of these person types is described by a set of attributes that includes all the attributes of entity set person plus possibly additional attributes.
• For example, customer entities may be described further by the attribute customer-id, whereas employee entities may be described further by the attributes employee-id and salary.
• The specialization of person allows us to distinguish among persons according to whether they are employees or customers.
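One way to picture attribute inheritance in this example is with Python dataclasses. This is only an analogy offered as a sketch: class inheritance in a programming language is not the same thing as an ISA relationship in an E-R diagram, and the class and field names below are simply the slide's attribute names written as Python identifiers.

```python
from dataclasses import dataclass, fields

@dataclass
class Person:                 # higher-level entity set (superclass)
    name: str
    street: str
    city: str

@dataclass
class Customer(Person):       # customer ISA person
    customer_id: str

@dataclass
class Employee(Person):       # employee ISA person
    employee_id: str
    salary: float

def attribute_names(entity_class):
    """List the attributes of an entity class, inherited ones first."""
    return [f.name for f in fields(entity_class)]
```

Here `attribute_names(Employee)` lists name, street and city (inherited from person) followed by the specific attributes employee_id and salary.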

Page 123: INTRODUCTION TO DATABASE

123

Generalization

• It is a bottom-up design process.
• Generalization is a simple inversion of specialization.
• In this process, multiple entity sets are synthesized into a higher-level entity set on the basis of common features.
• For example, consider a customer entity set with the attributes name, street, city, and customer-id, and an employee entity set with the attributes name, street, city, employee-id, and salary.
• There are similarities between the customer entity set and the employee entity set in the sense that they have several attributes in common.
• This commonality can be expressed by generalization.
• person is the higher-level entity set, and customer and employee are lower-level entity sets.

Page 124: INTRODUCTION TO DATABASE

124

Continued…

• The person entity set is the superclass of the customer and employee subclasses.
• The differences between the two approaches (specialization and generalization) may be characterized by their starting point and overall goal.

Page 125: INTRODUCTION TO DATABASE

125

Specialization and generalization

Page 126: INTRODUCTION TO DATABASE

Design Constraints on a Specialization/Generalization

126

• Constraint on which entities can be members of a given lower-level entity set:
  • Condition-defined. Example: all customers over 65 years are members of the senior-citizen entity set; senior-citizen ISA person.
  • User-defined.

• Constraint on whether or not entities may belong to more than one lower-level entity set within a single generalization:
  • Disjoint: an entity can belong to only one lower-level entity set. This is noted in the E-R diagram by writing disjoint next to the ISA triangle.
  • Overlapping: an entity can belong to more than one lower-level entity set.

Page 127: INTRODUCTION TO DATABASE

Design Constraints on a Specialization/Generalization (Cont.)

127

Completeness constraint: specifies whether or not an entity in the higher-level entity set must belong to at least one of the lower-level entity sets within a generalization.

• Total: an entity must belong to one of the lower-level entity sets.
• Partial: an entity need not belong to one of the lower-level entity sets.

Page 128: INTRODUCTION TO DATABASE

128

AGGREGATION

Page 129: INTRODUCTION TO DATABASE

129

A ternary relationship

Page 130: INTRODUCTION TO DATABASE

130

E-R diagram with redundant relationships

Page 131: INTRODUCTION TO DATABASE

131

Aggregation

• Aggregation is an abstraction through which relationships are treated as higher-level entities.
• Thus, for our example, we regard the relationship set works-on (relating the entity sets employee, branch, and job) as a higher-level entity set called works-on.
• Such an entity set is treated in the same manner as any other entity set.
• We can then create a binary relationship manages between works-on and manager to represent who manages what tasks.

Page 132: INTRODUCTION TO DATABASE

132

E-R Diagram With Aggregation

Page 133: INTRODUCTION TO DATABASE

133

Assignment

1. Construct an E-R diagram for a car-insurance company whose customers own one or more cars each. Each car has associated with it zero to any number of recorded accidents.

2. A university registrar’s office maintains data about the following entities:
• Courses, including number, title, credits, syllabus, and prerequisites
• Course offerings, including course number, year, semester, section number, instructor(s), timings, and classroom
• Students, including student-id, name, and program
• Instructors, including identification number, name, department, and title
Further, the enrollment of students in courses and the grades awarded to students in each course they are enrolled for must be appropriately modeled. Construct an E-R diagram for the registrar’s office. Document all assumptions that you make about the mapping constraints.

Page 134: INTRODUCTION TO DATABASE

134

Continued…

3. Design an E-R diagram for keeping track of the exploits of your favorite sports team. You should store the matches played, the scores in each match, the players in each match and individual player statistics for each match. Summary statistics should be modeled as derived attributes.

4. Construct an E-R diagram for a hospital with a set of patients and a set of medical doctors. Associate with each patient a log of the various tests and examinations conducted.

Page 135: INTRODUCTION TO DATABASE

135

Continued…..

5. Consider a university database for the scheduling of classrooms for final exams. This database could be modeled as the single entity set exam, with attributes course-name, section-number, room-number, and time. Alternatively, one or more additional entity sets could be defined, along with relationship sets to replace some of the attributes of the exam entity set, as
• course with attributes name, department, and c-number
• section with attributes s-number and enrollment, and dependent as a weak entity set on course
• room with attributes r-number, capacity, and building
(a) Show an E-R diagram illustrating the use of all three additional entity sets listed.
(b) Explain what application characteristics would influence a decision to include or not to include each of the additional entity sets.

6. Construct an E-R diagram for a Bank.

Page 136: INTRODUCTION TO DATABASE

136

Storage-device hierarchy

Page 137: INTRODUCTION TO DATABASE

137

Storage hierarchy includes two main categories: 

Primary storage (main memory, cache memory)

Secondary storage (Magnetic disks, Magnetic tapes and optical disks)

Page 138: INTRODUCTION TO DATABASE

138

Buffer Manager

• Files reside permanently on disks.
• Each file is partitioned into fixed-length storage units called blocks.
• The buffer is the part of main memory available for storage of copies of disk blocks.
• The subsystem responsible for the allocation of buffer space is called the buffer manager.

Page 139: INTRODUCTION TO DATABASE

139

Buffer Manager techniques

• Buffer replacement strategy: When there is no room left in the buffer, a block must be removed from the buffer. Most operating systems use a least recently used (LRU) scheme.

• Pinned blocks: Most recovery systems require that a block should not be written to disk while an update on the block is in progress. A block that is not allowed to be written back to disk is said to be pinned.

• Forced output of blocks: There are situations in which it is necessary to write back the block to disk, even though the buffer space that it occupies is not needed. This write is called the forced output of a block.
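The three techniques can be sketched together in a toy buffer manager. This is a simplified illustration under stated assumptions, not how any particular DBMS implements its buffer pool: the class `BufferManager`, its method names, and the use of a Python dict as a stand-in for the disk are all invented for this example.

```python
from collections import OrderedDict

class BufferManager:
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk              # dict block_id -> data, stands in for the disk
        self.frames = OrderedDict()   # buffered blocks, least recently used first
        self.pinned = set()           # blocks that must not be written back
        self.dirty = set()            # blocks modified since they were read

    def get(self, block_id):
        """Return a block, reading it from disk (and evicting by LRU) if needed."""
        if block_id in self.frames:
            self.frames.move_to_end(block_id)   # mark as most recently used
            return self.frames[block_id]
        if len(self.frames) >= self.capacity:
            self._evict()
        self.frames[block_id] = self.disk[block_id]
        return self.frames[block_id]

    def _evict(self):
        # Replacement strategy: the least recently used block that is not pinned.
        victim = next((b for b in self.frames if b not in self.pinned), None)
        if victim is None:
            raise RuntimeError("all buffered blocks are pinned")
        self.force_output(victim)
        del self.frames[victim]

    def write(self, block_id, data):
        self.get(block_id)
        self.frames[block_id] = data
        self.dirty.add(block_id)

    def pin(self, block_id):
        self.pinned.add(block_id)

    def unpin(self, block_id):
        self.pinned.discard(block_id)

    def force_output(self, block_id):
        """Write a modified block back to disk even though its space is not needed."""
        if block_id in self.pinned:
            raise RuntimeError("a pinned block cannot be written to disk")
        if block_id in self.dirty:
            self.disk[block_id] = self.frames[block_id]
            self.dirty.discard(block_id)
```

With a capacity of two blocks, pinning the older block makes the eviction skip it and remove the next least recently used block instead.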

Page 140: INTRODUCTION TO DATABASE

140

Record Structure

• The database is stored as a collection of files. Each file is a sequence of records. A record is a sequence of fields.

Types of records
• Fixed-length records: every record in the file has exactly the same size (in bytes).
• Variable-length records: different records in the file have different sizes.

Page 141: INTRODUCTION TO DATABASE

141

Fixed-Length Records

Let us consider a file of account records for a bank database.

Each record of this file is defined as:

Account-number: char (10);
Branch-name: char (22);
Balance: real; // size of real = 8 bytes

Record size = 10 + 22 + 8 = 40 bytes

A simple approach is to use the first 40 bytes for the first record, the next 40 bytes for the second record, and so on.
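The 40-byte layout can be reproduced with Python's standard `struct` module, as a minimal sketch. The format string `"<10s22sd"` (10-byte string, 22-byte string, 8-byte float, no padding) is one possible encoding chosen for this illustration; the helper names and the sample account values are made up.

```python
import struct

# 10-byte account number + 22-byte branch name + 8-byte real = 40 bytes, no padding.
RECORD = struct.Struct("<10s22sd")

def pack_account(account_number, branch_name, balance):
    """Encode one account as a fixed-length record; short strings are null-padded."""
    return RECORD.pack(account_number.encode("ascii"),
                       branch_name.encode("ascii"),
                       balance)

def read_record(file_bytes, i):
    """Read the i-th record: it starts at byte offset i * 40."""
    number, branch, balance = RECORD.unpack_from(file_bytes, i * RECORD.size)
    return (number.rstrip(b"\x00").decode("ascii"),
            branch.rstrip(b"\x00").decode("ascii"),
            balance)

# Two hypothetical records stored back to back.
file_bytes = (pack_account("A-102", "Perryridge", 400.0)
              + pack_account("A-305", "Round Hill", 350.0))
```

Because every record has the same size, the position of record i is found by simple multiplication, with no need to scan the file.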

Page 142: INTRODUCTION TO DATABASE

142

There are two problems with this simple approach:

1. It is difficult to delete a record from this structure. The space occupied by the record to be deleted must be filled with some other record of the file.

2. Unless the block size happens to be a multiple of 40, some records will cross block boundaries. It would thus require two block accesses to read or write such a record.
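The second problem is simple arithmetic, shown here as a small sketch. The function name `spanning_records` and the block sizes used in the comments are chosen only for illustration.

```python
def spanning_records(block_size, record_size, n_records):
    """Return the numbers of the records that cross a block boundary."""
    crossing = []
    for i in range(n_records):
        first_block = (i * record_size) // block_size
        last_block = ((i + 1) * record_size - 1) // block_size
        if first_block != last_block:
            crossing.append(i)
    return crossing

# With 100-byte blocks and 40-byte records, record 2 occupies bytes 80..119
# and therefore lies partly in block 0 and partly in block 1.
# With 120-byte blocks (a multiple of 40), no record crosses a boundary.
```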

 

Page 143: INTRODUCTION TO DATABASE

143

Deletion of record 1st approach

When a record is deleted, we could move the record that came after it into the space occupied by the deleted record, and so on, until every record following the deleted record has been moved ahead. Such an approach requires moving a large number of records.

Page 144: INTRODUCTION TO DATABASE

144

Deletion of record 2nd approach

It might be easier simply to move the final record of the file into the space occupied by the deleted record. It is undesirable to move records to occupy the space freed by a deleted record, since doing so requires additional block accesses.

Page 145: INTRODUCTION TO DATABASE

145

Deletion of record 3rd approach

Since insertions tend to be more frequent than deletions, it is acceptable to leave open the space occupied by the deleted record, and to wait for a subsequent insertion before reusing the space.
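A minimal sketch of this third approach keeps a list of freed slots and lets the next insertion reuse one. The class `RecordFile` and its free list are assumptions made for this illustration; the slides do not say how the open slots are tracked.

```python
class RecordFile:
    def __init__(self):
        self.slots = []   # slot i holds a record, or None if it was deleted
        self.free = []    # numbers of slots left open by deletions

    def insert(self, record):
        """Reuse an open slot if one exists, otherwise append at the end."""
        if self.free:
            i = self.free.pop()
            self.slots[i] = record
        else:
            self.slots.append(record)
            i = len(self.slots) - 1
        return i

    def delete(self, i):
        """Leave the slot open; no other record is moved."""
        if self.slots[i] is None:
            raise KeyError(f"slot {i} is already empty")
        self.slots[i] = None
        self.free.append(i)
```

Deleting a record costs no record movement, and the space is recovered by the next insertion.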

Page 146: INTRODUCTION TO DATABASE

146

Variable-Length Records

Variable-length records arise in database systems in several ways:
• Storage of multiple record types in a file.
• Record types that allow variable lengths for one or more fields.
• Record types that allow repeating fields (used in some older data models).

Page 147: INTRODUCTION TO DATABASE

147

Techniques for implementing variable-length records 

• Byte-String Representation
• Fixed-Length Representation

Page 148: INTRODUCTION TO DATABASE

148

Byte-String Representation

A simple method for implementing variable-length records is to attach a special end-of-record (⊥) symbol to the end of each record.

Page 149: INTRODUCTION TO DATABASE

149

Byte-string representation disadvantages:

It is not easy to reuse space occupied formerly by a deleted record.

There is no space, in general, for records to grow longer.

Page 150: INTRODUCTION TO DATABASE

150

Slotted-page structure

A modified form of the byte-string representation, called the slotted-page structure, is commonly used for organizing records within a single block.

  

Page 151: INTRODUCTION TO DATABASE

151

There is a header at the beginning of each block, containing the following information:

1. The number of record entries in the header
2. The end of free space in the block
3. An array whose entries contain the location and size of each record

• The actual records are allocated contiguously in the block, starting from the end of the block.
• The free space in the block is contiguous, between the final entry in the header array and the first record.
• If a record is inserted, space is allocated for it at the end of free space, and an entry containing its size and location is added to the header.
• If a record is deleted, the space that it occupies is freed, and its entry is set to deleted.
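The header and the backward-growing record area can be sketched as follows. This is a simplified illustration: the class `SlottedPage` is invented here, the header is kept as a Python list rather than as bytes inside the block (so its own size is not charged against the free space), and deleted space is only marked, not compacted.

```python
class SlottedPage:
    def __init__(self, size=64):
        self.data = bytearray(size)
        self.free_end = size   # end of free space; records grow down from the block end
        self.slots = []        # header array: (location, size) per record, None if deleted

    def insert(self, record: bytes) -> int:
        """Place the record at the end of free space and add a header entry."""
        if len(record) > self.free_end:
            raise ValueError("not enough free space in the block")
        start = self.free_end - len(record)
        self.data[start:self.free_end] = record
        self.free_end = start
        self.slots.append((start, len(record)))
        return len(self.slots) - 1

    def read(self, slot: int) -> bytes:
        location, size = self.slots[slot]
        return bytes(self.data[location:location + size])

    def delete(self, slot: int) -> None:
        """Mark the header entry as deleted (this sketch does not compact the block)."""
        self.slots[slot] = None
```

Other structures can refer to a record by its slot number, so the record's bytes may be moved inside the block without changing any outside pointer.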

Page 152: INTRODUCTION TO DATABASE

152

Fixed-Length Representation

Another way to implement variable-length records efficiently in a file system is to use one or more fixed-length records to represent one variable-length record.

There are two ways of doing this:

1. Reserved space: If there is a maximum record length that is never exceeded, we can use fixed-length records of that length. Unused space (for records shorter than the maximum space) is filled with a special null, or end-of-record, symbol.

2. List representation: We can represent variable-length records by lists of fixed-length records, chained together by pointers.

Page 153: INTRODUCTION TO DATABASE

153

File organization

• File organization includes the way records and blocks are placed on the storage medium.
• There are two types of file organization:
  • Primary file organizations
  • Secondary file organizations

Page 154: INTRODUCTION TO DATABASE

154

Primary File Organizations
• Unordered or Heap or Pile Files
• Ordered or Sorted or Sequential Files
• Hash or Direct Files

Page 155: INTRODUCTION TO DATABASE

155

Unordered or Heap or Pile Files

• Records are placed in the file in the order in which they are inserted.
• Inserting a new record is very efficient.
• Searching can be done by linear search (inefficient).
• Deletion is very inefficient.

Page 156: INTRODUCTION TO DATABASE

156

Ordered or Sorted or sequential Files

An ordered file stores records in sequential order, based on the value of the search key of each record.

An attribute or set of attributes used to look up records in a file is called a search key.

Page 157: INTRODUCTION TO DATABASE

157

Advantages of Ordered Files

• Reading the records in order of the ordering field is extremely efficient, because no sorting is required.
• Finding the next record is fast.

Page 158: INTRODUCTION TO DATABASE

158

Disadvantages of Ordered Files

• Searches on non-ordering fields are inefficient.
• Insertion and deletion of records are very expensive.

Page 159: INTRODUCTION TO DATABASE

159

Hash or Direct Files

A hash function is computed on some attribute of each record; the result specifies where the record should be placed.
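A toy hash file makes the idea concrete. The class `HashFile`, its byte-sum hash function, and the in-memory buckets are all assumptions made for this sketch; real systems use better hash functions and store buckets in disk blocks.

```python
class HashFile:
    def __init__(self, n_buckets=4):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key: str) -> int:
        # A deliberately simple hash: sum of the key's bytes modulo the bucket count.
        return sum(key.encode("utf-8")) % len(self.buckets)

    def insert(self, key: str, record) -> None:
        """Place the record in the bucket chosen by hashing its key."""
        self.buckets[self._bucket(key)].append((key, record))

    def lookup(self, key: str) -> list:
        """Search only the one bucket the key hashes to."""
        return [rec for k, rec in self.buckets[self._bucket(key)] if k == key]
```

A lookup hashes the key again and examines a single bucket, instead of scanning the whole file.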

Page 160: INTRODUCTION TO DATABASE

160

Secondary File Organizations

• A secondary file organization uses an index to access the records.
• An index for a file in a database system works in the same way as the index in any textbook.
• If we want to learn about a particular topic (specified by a word or a phrase), we can search for the topic in the index at the back of the book.
• Indexes provide faster access to data.

Page 161: INTRODUCTION TO DATABASE

161

Types of Indexes

• Single-level ordered indexes
  • Primary indexes
  • Secondary indexes
  • Clustering indexes
• Multi-level indexes
• Dynamic multi-level indexes using B-trees and B+-trees

Page 162: INTRODUCTION TO DATABASE

162

Primary indexes

A primary index is an ordered file whose entries have two fields: the first field is of the same data type as the primary key of the data file, and the second field is a pointer to a file block.

Page 163: INTRODUCTION TO DATABASE

163

Indexes can also be characterized as:

• Dense: A dense index has an index entry for every search key value (and hence every record) in the data file.
• Sparse (nondense): A sparse (or nondense) index, on the other hand, has index entries for only some of the search values.

A primary index is hence a nondense (sparse) index, since it includes an entry for each disk block of the data file rather than for every search value (or every record).
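A lookup through a sparse primary index can be sketched with a binary search over one entry per block. The function name `lookup`, the choice of the first key of each block as its index entry, and the sample blocks in the usage note are assumptions made for this illustration.

```python
import bisect

def lookup(blocks, index_keys, key):
    """Find a record via a sparse index.

    blocks     -- list of blocks, each a sorted list of (key, record) pairs
    index_keys -- index_keys[i] is the first key stored in blocks[i]
    """
    # Binary search the index for the last block whose first key is <= key.
    i = bisect.bisect_right(index_keys, key) - 1
    if i < 0:
        return None            # key is smaller than every key in the file
    for k, record in blocks[i]:   # then search inside that single block
        if k == key:
            return record
    return None
```

For example, with blocks holding keys (10, 20), (30, 40) and (50), the index is just [10, 30, 50], and a search for key 40 reads only the second block.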

Page 164: INTRODUCTION TO DATABASE

164

Problem with a primary index

A major problem with a primary index, as with any ordered file, is insertion and deletion of records.

Page 165: INTRODUCTION TO DATABASE

165

Clustering Indexes

• If records of a file are physically ordered on a nonkey field, which does not have a distinct value for each record, that field is called the clustering field.
• A clustering index is also an ordered file with two fields; the first field is of the same type as the clustering field of the data file, and the second field is a block pointer.

Page 166: INTRODUCTION TO DATABASE

166

Page 167: INTRODUCTION TO DATABASE

167

Secondary Indexes

• A secondary index is an ordered file with two fields. The first is of the same data type as some nonordering field, and the second is either a block pointer or a record pointer.
• If the entries in this nonordering field must be unique, the field is sometimes referred to as a secondary key. This results in a dense index.

Page 168: INTRODUCTION TO DATABASE

168

Page 169: INTRODUCTION TO DATABASE

169

Comparison between indexes

Page 170: INTRODUCTION TO DATABASE

170

Multilevel Indexes

A multilevel index is built by constructing a second-level index on the first-level index, and continuing this process until the entire top-level index can be contained in a single file block.
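The number of levels follows from repeatedly dividing by the number of index entries that fit in one block. The sketch below assumes a fixed fan-out (entries per block); the function name `index_levels` and the figures in the usage note are illustrative.

```python
import math

def index_levels(n_entries, fanout):
    """Number of index levels needed until the top level fits in one block.

    n_entries -- number of entries in the first-level index
    fanout    -- number of index entries that fit in one block
    """
    levels = 1
    blocks = math.ceil(n_entries / fanout)
    while blocks > 1:
        # Build the next level: one entry per block of the level below.
        levels += 1
        blocks = math.ceil(blocks / fanout)
    return levels
```

For instance, 30,000 first-level entries with 68 entries per block occupy 442 blocks; a second level over those needs 7 blocks, and a third level fits in a single block, so `index_levels(30000, 68)` gives 3.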

Page 171: INTRODUCTION TO DATABASE

171

Page 172: INTRODUCTION TO DATABASE

172

Dynamic Multilevel Indexes Using B-Trees and B+-Trees

• B-trees and B+-trees are special cases of the well-known tree data structure.
• A tree is formed of nodes. Each node in the tree, except for a special node called the root, has one parent node and zero or more child nodes.
• The root node has no parent. A node that does not have any child nodes is called a leaf node; a nonleaf node is called an internal node.
• The level of a node is always one more than the level of its parent, with the level of the root node being zero.
• A subtree of a node consists of that node and all its descendant nodes: its child nodes, the child nodes of its child nodes, and so on.

Page 173: INTRODUCTION TO DATABASE

173

B tree

A B-tree of order m (the maximum number of children for each node) is a tree which satisfies the following properties:

• Every node has at most m children.
• Every node (except the root and the leaves) has at least ⌈m/2⌉ children.
• The root has at least two children if it is not a leaf node.
• All leaves appear on the same level, and carry information.
• A non-leaf node with k children contains k-1 keys.

Page 174: INTRODUCTION TO DATABASE

Structure of B tree

174

Page 175: INTRODUCTION TO DATABASE

175

B tree with order 3

Page 176: INTRODUCTION TO DATABASE

176

Insertion algorithm

All insertions start at a leaf node. To insert a new element, search the tree to find the leaf node where the new element should be added. Insert the new element into that node with the following steps:

1. If the node contains fewer than the maximum legal number of elements, then there is room for the new element. Insert the new element in the node, keeping the node's elements ordered.

2. Otherwise the node is full, so evenly split it into two nodes:
• A single median is chosen from among the leaf's elements and the new element.
• Values less than the median are put in the new left node and values greater than the median are put in the new right node, with the median acting as a separation value.
• Insert the separation value in the node's parent, which may cause it to be split, and so on. If the node has no parent (i.e., the node was the root), create a new root above this node (increasing the height of the tree).
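The insertion steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a production B-tree: the names `Node`, `insert` and `inorder` are invented here, a node of order m is allowed at most m-1 keys, duplicates are not handled specially, and the split is done right after the new element has been placed in the overfull node (which selects the same median as choosing it from the old elements plus the new one).

```python
import bisect

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty list for a leaf

def _insert(node, key, max_keys):
    """Insert key below node; return (median, right_node) if node was split."""
    if not node.children:                       # leaf: insert in sorted position
        bisect.insort(node.keys, key)
    else:                                       # internal: descend to the right child
        i = bisect.bisect_right(node.keys, key)
        split = _insert(node.children[i], key, max_keys)
        if split:                               # child split: absorb its separation value
            median, right = split
            node.keys.insert(i, median)
            node.children.insert(i + 1, right)
    if len(node.keys) <= max_keys:
        return None                             # step 1: there was room
    mid = len(node.keys) // 2                   # step 2: split around the median
    median = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return median, right

def insert(root, key, order):
    """Insert key into a B-tree of the given order and return the (possibly new) root."""
    split = _insert(root, key, order - 1)
    if split:                                   # the root itself was split
        median, right = split
        return Node([median], [root, right])    # new root: the tree grows one level
    return root

def inorder(node):
    """Return all keys in sorted order (used to check the tree)."""
    if not node.children:
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out.extend(inorder(child))
        if i < len(node.keys):
            out.append(node.keys[i])
    return out
```

Inserting the keys 1 to 7 in order into an empty tree of order 3 splits the root twice and leaves a root holding the single key 4, with two children holding 2 and 6.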

Page 177: INTRODUCTION TO DATABASE

177

A B Tree insertion example with each iteration

Page 178: INTRODUCTION TO DATABASE

178

B+ tree

Properties of a B+ tree of order m:

• All internal nodes (except the root) have at least m keys and at most 2m keys.
• The root has at least 2 children unless it is a leaf.
• All leaves are on the same level.
• An internal node with k keys has k+1 children.

Page 179: INTRODUCTION TO DATABASE

Inserting a Data Entry into a B+ Tree: Summary

179

• Find the correct leaf L.
• Put the data entry onto L.
  • If L has enough space, done!
  • Else, L must be split (into L and a new node L2): redistribute entries evenly and copy up the middle key; insert an index entry pointing to L2 into the parent of L.
• This can happen recursively. To split an index node, redistribute entries evenly, but push up the middle key. (Contrast with leaf splits.)
• Splits “grow” the tree; a root split increases its height.
• Tree growth: the tree gets wider or one level taller at the top.
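The contrast between a leaf split (copy up) and an index-node split (push up) can be shown with two small functions. This is only a sketch: the function names are invented, nodes are plain sorted lists of keys, and child pointers are left out. The test values are the ones used in this deck's 8* insertion example.

```python
def split_leaf(entries):
    """Split an overfull leaf; the middle key is COPIED up and stays in the right leaf."""
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    return left, right, right[0]

def split_index(keys):
    """Split an overfull index node; the middle key is PUSHED up and leaves the node."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid + 1:], keys[mid]
```

Splitting the leaf entries 2, 3, 5, 7, 8 gives leaves (2, 3) and (5, 7, 8) with 5 copied up; splitting the index keys 5, 13, 17, 24, 30 gives nodes (5, 13) and (24, 30) with 17 pushed up, so 17 appears only once in the index.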

Page 180: INTRODUCTION TO DATABASE

Inserting 16*, 8* into Example B+ tree

180

[Figure: B+ tree whose root holds the keys 13, 17, 24, 30. One leaf holds 14* 15* 16*. The leaf 2* 3* 5* 7* overflows when 8* is inserted and is split into 2* 3* and 5* 7* 8*.]

One new child (leaf node) is generated; one more pointer must be added to its parent, and thus one more key value as well.

Page 181: INTRODUCTION TO DATABASE

Inserting 8* (cont.)

181

Copy up the middle value (leaf split)

[Figure: the leaf is split into 2* 3* and 5* 7* 8*. The entry 5 is to be inserted in the parent node. (Note that 5 is copied up and continues to appear in the leaf.) The parent 13 17 24 30 then overflows: 5 13 17 24 30.]

Page 182: INTRODUCTION TO DATABASE

182

Insertion into B+ tree (cont.)

[Figure: the overfull index node 5 13 17 24 30 is split into 5 13 and 24 30. The entry 17 is to be inserted in the parent node. (Note that 17 is pushed up and only appears once in the index. Contrast this with a leaf split.)]

We split this node, redistribute entries evenly, and push up the middle key.

• Understand the difference between copy-up and push-up.
• Observe how minimum occupancy is guaranteed in both leaf and index page splits.

Page 183: INTRODUCTION TO DATABASE

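Example B+ Tree After Inserting 8*

183

Notice that the root was split, leading to an increase in height.

[Figure: the new root holds 17. Its two children are the index nodes 5 13 and 24 30. Leaf entries: 2* 3* | 5* 7* 8* | 14* 15* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*.]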