rdbms

60
RDBMS CONCEPTS

Upload: kvsureshmysore

Post on 13-Nov-2014

20 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Rdbms

RDBMS CONCEPTS

Page 2: Rdbms

What is Database ?

A database is a collection of information—preferably related information and preferably organized.

-a database is a structured object.

-the structured object consists of data and metadata

-the metadata is the structured part (table definition)

-data is the actual stored descriptive information.

NAME EDUCATION % OF Marks

SMITH M Tech 98JONES B Tech 89MILLER Phd 87…. ...... ….…. ...... ….

Page 3: Rdbms

What is DBMS ?(DataBase Management System)

DBMS is collection of programs that enables you to store, modify, and extract information from a database.

The general purpose of a DBMS is to provide for the definition, storage, and management of data in a centralized area that can be shared by many users.

Page 4: Rdbms

Role of DBMS

•data accessibility :It must ensure that each piece of data is available to all those users who need it.

•data security : It must protect the data and ensure that it is not lost or damaged.

•data privacy or data confidentiality : It must protect the data from access and change by unauthorized users.

•data integrity : It must also guarantee that the data is consistent. This means that we must be confident that the data is accurate and reliable and we must be able to insist on certain values for the data

For example : - every employee must have a name (not null) - every row in a table must be unique- anyone who is a manager must be a employee (FK)- a student's date of birth must be acceptable (30th, February 1973 would be invalid).

Page 5: Rdbms

•data concurrency :on a system where several users may be handling the same data at the same time, the DBMS must also ensure that there is no confusion when two or more users try to update the same record at the same time.

data independence : For Programmers who are writing programs for use on a DBMS, DBMS should make sure that

•Individual users and programmers should not have to worry about how the data is held

•Nor should they need to know exactly where the data is held

•Nor should they be concerned with pieces of data which are not used by their particular program

When these last three features are available, the DBMS is said to offer data independence or data transparency

distributed database : it would be useful if DBMS support networking for accessing database which may be stored across the network

Page 6: Rdbms

Data Model

•A data model is a model that describes in an abstract way how data is represented in an information system or a database management system (DBMS).

The evolution of database modeling techniques

Page 7: Rdbms

Relational Database Model

The relational database model improves on the restriction of a hierarchical structure, not completely abandoning the hierarchy of data, as shown in Figure.

Any table can be accessed directly without having to access all parent objects.

The trick is to know what to look for—if you want to find theaddress of a specific employee, you have to know which employee to look for, or you can simply examine all employees.

You don’t have to search the entire hierarchy, from the company downward, to find a single employee.

Page 8: Rdbms

Another benefit of the relational database model is that any tables can be linked together, regardless of their hierarchical position.

Page 9: Rdbms

Relational Database Management System – RDBMS

A RDBMS is a term used to describe an entire suite of programs for both managing a relational database and communicating with that relational database engine.

Sometimes Software Development Kit (SDK) front-end tools and complete management kits are included with relational database packages (eg: MS Access)

In other words, an RDBMS is both the database engine and any other tools that come with it.

Page 10: Rdbms

Attribute/column/fieldRelation/Table

Tuples/Record/Row

Domain

Page 11: Rdbms

Relational Databases

A relational database is organized as a

-set of tables of data and a

-mathematical language (relational algebra) – is available to manipulate the tables.

The presentation of data in the form of tables is known as a relation

A relation is made up of a number of tuples (rows) and attributes (columns).

Domains are range of values which can appear in each column are taken from the domain for that attribute.

Example :The AGE will be in the range 16 to 65, so the domain of the AGE attribute will be an integer in the range 16 to 65

Page 12: Rdbms

Relational algebra

In addition to establishing and maintaining the individual tuples, there are also facilities for manipulating entire relations.

These operations form what is known as relational algebra.

Typically, the operations of relational algebra allow us to:

•SELECT – extract tuples to form a new relation

•PROJECT – extract attributes to form a new relation

•JOIN – add attributes from one relation to another relation

•PRODUCT – combine all the tuples of one relation will all those of another relation

Page 13: Rdbms

•UNION – join two relations to form a new relation

•INTERSECT – form a new relation from two relations where the key values are the same

The formal language of relational algebra is mathematical in form and differs from that of the commercial RDBMSs.

Page 14: Rdbms

Database Model DesignDesign is the process of ensuring that it all works without actually building it. Design is a little like testing something on paper before spending thousands of hours building it in possibly the wrong way.

Data structure diagram

A useful way in which we can depict the entities and the relationships between them is by means of a data structure diagram or DSD.

It is a picture of the data encountered within a computer system.

The data structure diagram is also known as a data model or a logical model or an entity-relationship model.

Page 15: Rdbms

TUTOR

COURSE

STUDENT

teaches on

attended by

A data structure diagram

Entities

This DSD tells us that we need to hold data about:

•The tutors

•The courses, and

•The students

The information which we need to know about each entity are its attributes

Page 16: Rdbms

TUTOR

COURSE

STUDENT

teaches on

attended by

Tutor number Name Address Subjects taught

Title Examinations available

Student enrolment number Name Address Telephone number Subjects studied

A data structure diagram showing attributes

Page 17: Rdbms

Relationships

An entity does not exist in isolation, but is associated with other entities by means of a relationship.

On a DSD the entities are linked, one to another, by lines representing the relationships.

The relationships in our DSD are:

•teaches on - One tutor teaches on many courses.

•is attended by - Each course is attended by many students.

Note that the arrowed lines point from the one end to the many end: from the one tutor to the many courses, and from the one course to the many students.

Page 18: Rdbms

Types of relationship

1. One-to-one relationship

2. One-to-Many relationship

3. Many-to-Many relationship

Page 19: Rdbms

One-to-One Relationship

Page 20: Rdbms

Better transformation of one-to-one relationship

Page 21: Rdbms

One to Many Relationship

Page 22: Rdbms

Correct transformation of one-to-many relationship

Page 23: Rdbms

Many to Many ?

Page 24: Rdbms

Many to Many ….?

Page 25: Rdbms

Transformation of many-to-many relationship TWO One – Many Relationship

Page 26: Rdbms

Navigating through a data structure diagram

In any data structure diagram, it should be possible conceptually to navigate from entity to entity by way of the relationships.

By doing this, we shall be creating paths by which the DBMS – or our programs – can access the data and use these paths to answer enquiries about the data.

Page 27: Rdbms
Page 28: Rdbms
Page 29: Rdbms
Page 30: Rdbms

cno cname street city

C1 IGATE GUINDY CHENNAI

C2 IBM B’GATTA Rd BLORE

C3 ACCEN TIDEL PARK CHENNAI

COMPANY

ord_no cno ord_date ship_date

101 C1 12-FEB-87 15-FEB-87

102 C3 25-JUN-90 02-JUL-90

103 C1 29-AUG-92 05-SEP-92

ORDER

ord_no lineno prodid qty sellprice

101 1 P1 2 350

101 2 P2 3 400

101 3 P5 1 1000

102 1 P2 3 400

103 1 P3 1 4900

103 2 P6 1 5400

LINEITEM

prodid prodname sellprice costprice

P1 Mouse 350 300

P2 Keyboard 400 360

P3 CPU 4900 4200

P4 64MB RAM 550 520

P5 128MB RAM 1000 930

P6 Mother Board 5400 4900

PRODUCT

Page 31: Rdbms

Data Modeling Continued…

When talking about data in a database, it is common to look at the data in three different ways:

•External or user views: the view of the data that each person has for their functional area.

•Logical Data Model or Conceptual Model: the entire collection of user views for each functional area. In other words, it is the enterprise's overall view of the database.

•Physical or Internal view: the structure used to physically implement the logical data model on a physical medium.

Page 32: Rdbms
Page 33: Rdbms

What is a Key field in a Database?

A key is the smallest subset of attributes from the relation such that the key is unique for each tuple.

A key functionally determines a tuple (row).

The selection of keys will depend on the particular application being considered.

Keys are crucial to a table structure for many reasons, some of which are identified below:

•They ensure that each record in a table is precisely identified.

•They help establish and enforce various types of integrity.

•They serve to establish table relationships.

Page 34: Rdbms

Functional dependency (FD)

•A Functional Dependency describes a relationship between attributes in a single relation.

•An attribute is functionally dependant on another if we can use the value of one attribute to determine the value of another.

Ex 1: Employee_Name is functionally dependant on Social_Security_Number because Social_Security_Number can be used to determine the value of Employee_Name.

Ex 2:If we know the value of the destination attribute, then we can find the corresponding value for the fare attribute. The fare attribute is therefore functionally dependent upon the destination attribute.

Page 35: Rdbms

We use the symbol -> to indicate a functional dependency. -> means functionally determines

Examples : destination -> fare

The attributes listed on the left hand side of the -> are called determinants.One can read A -> B as, "A determines B".

Not all determinants are keys.

Page 36: Rdbms

E F CODD's RULES

Codd presented twelve rules that a database must obey if it is to be considered truly relational.

1. The information rule. All information in a relational database is represented explicitly at the logical level and in exactly one way by values in tables.

Rule 1 is basically the informal definition of a relational database.

2. Guaranteed access rule. Each and every datum (atomic value) in a relational database is guaranteed to be logically accessible by resorting to a combination of table name, primary key value, and column name.

Page 37: Rdbms

Rule 2 stresses the importance of primary keys for locating data in the database. The table name locates the correct table, the column name finds the correct column, and the primary key value finds the row containing an individual data item of interest.

3. Systematic treatment of null values. Null values are supported in a fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of the data type.

Rule 3 requires support for missing data through NULL values.

Page 38: Rdbms

4. Dynamic online catalog based on the relational model. The database description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data.

Rule 4 requires that a relational database be self-describing. In other words, the database must contain certain system tables whose columns describe the structure of the database itself.

Page 39: Rdbms

5. Comprehensive data sublanguage rule. A relational system may support several languages and various modes of terminal use (for example, the fill-in-the-blanks mode). However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings, and that is comprehensive in supporting all of the following items:•Data definition •View definition •Data manipulation (interactive and by program) •Integrity constraints •Authorization •Transaction boundaries (begin, commit, and rollback)

Page 40: Rdbms

Rule 5 mandates using a relational database language, such as SQL, although SQL is not specifically required. The language must be able to support all the central functions of a DBMS -- creating a database, retrieving and entering data, implementing database security, and so on.

6. View updating rule. All views that are theoretically updateable are also updateable by the system.

Rule 6 deals with views, which are virtual tables used to give various users of a database different views of its structure. It is one of the most challenging rules to implement in practice, and no commercial product fully satisfies it today.

Page 41: Rdbms

7. High-level insert, update, and delete. The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data but also to the insertion, update, and deletion of data.

Rule 7 stresses the set-oriented nature of a relational database. It requires that rows be treated as sets in insert, delete, and update operations. The rule is designed to prohibit implementations that only support row-at-a-time, navigational modification of the database.

8. Physical data independence. Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods.

Page 42: Rdbms

9. Logical data independence. Application programs and terminal activities remain logically unimpaired when information preserving changes of any kind that theoretically permit unimpairment are made to the base tables.

Rule 8 and Rule 9 insulate the user or application program from the low-level implementation of the database. They specify that specific access or storage techniques used by the DBMS, and even changes to the structure of the tables in the database, should not affect the user's ability to work with the data.

10. Integrity independence. Integrity constraints specific to a particular relational database must be definable in the relational data sublanguage and storable in the catalog, not in the application programs.

Page 43: Rdbms

Rule 10 says that the database language should support integrity constraints that restrict the data that can be entered into the database and the database modifications that can be made. This is another of the rules that is not support in most commercial DBMS products.

11. Distribution independence. A relational DBMS has distribution independence.

Rules 11 says that the database language must be able to manipulate distributed data location on other computer systems.

Page 44: Rdbms

12. Nonsubversion rule. If a relational system has a low-level (single record at a time) language, that low level cannot be used to subvert or bypass the integrity rules and constraints expressed in the higher level relational language (multiple records at a time).

Rule 12 prevents "other paths" into the database that might subvert the relational structure and integrity.

Page 45: Rdbms

Data integrity

Data integrity means that the data values in the database are correct and consistent.

Data integrity means, in part, that you can correctly and consistently navigate and manipulate the tables in the database.

Data integrity is enforced in the relational model by entity and referential integrity rules.

Page 46: Rdbms

Entity Integrity

•The entity integrity rule states that for every instance of an entity, the value of the primary key must exist, be unique, and cannot be null.

•Without entity integrity, the primary key could not fulfill its role of uniquely identifying each instance of an entity.

Eg: EMPNO column is primary key in EMP table DEPTNO column is primary key in DEPT table

Page 47: Rdbms

Referential Integrity

•The referential integrity rule states that if a relational table has a foreign key, then every value of the foreign key must either be null or match the values in the relational table in which that foreign key is a primary key.

•Referential integrity ensures that we can correctly navigate between related entities

Eg : DEPTNO column in EMP is FOREIGN KEY with reference to DEPTNO column in DEPT which is a PRIMARY KEY

Page 48: Rdbms

Foreign key….

A foreign key creates a hierarchical relationship between two associated entities.

The entity containing the foreign key is the child, or dependent, and the table containing the primary key from which the foreign key values are obtained is the parent.

In order to maintain referential integrity between the parent and child as data is inserted or deleted from the database certain insert and delete rules are to be considered.

If FK and PK are in same table it is self referential

Page 49: Rdbms

EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO 7369 SMITH CLERK 7902 17-DEC-80 800 20 7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 7566 JONES MANAGER 7839 02-APR-81 2975 20 7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30 7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 7782 CLARK MANAGER 7839 09-JUN-81 2450 10 7788 SCOTT ANALYST 7566 19-APR-87 3000 20 7839 KING PRESIDENT 17-NOV-81 5000 10 7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30 7876 ADAMS CLERK 7788 23-MAY-87 1100 20 7900 JAMES CLERK 7698 03-DEC-81 950 30 7902 FORD ANALYST 7566 03-DEC-81 3000 20 7934 MILLER CLERK 7782 23-JAN-82 1300 10

 EMP table

DEPTNO DNAME LOC 10 ACCOUNTING NEW YORK 20 RESEARCH DALLAS 30 SALES CHICAGO 40 OPERATIONS BOSTON

DEPT table SALGRADE tableGRADE LOSAL HISAL 1 700 1200 2 1201 1400 3 1401 2000 4 2001 3000 5 3001 9999

Page 50: Rdbms

Modification Anomalies

•Once our E-R model has been converted into relations, we may find that some relations are not properly specified. There can be a number of problems:

•Deletion Anomaly: Deleting a relation results in some related information (from another entity) being lost.

•Insertion Anomaly: Inserting a relation requires we have information from two or more entities - this situation might not be feasible.

Page 51: Rdbms

NORMALIZATION

Page 52: Rdbms

NORMALIZATION

It is a process of analyzing the given relation schemas based on their Functional Dependencies (FDs) and primary key to achieve the properties - Minimizing redundancy - Minimizing insertion, deletion and update anomalies.

Note: Don’t include computed fields

Normalization comprises a set of rules called Normal Forms i.e., First normal form, Second normal form, Third normal form, Forth normal form, Boyce/Codd Normal Form etc.,

Page 53: Rdbms

FIRST NORMAL FORM

Reduce entities to first normal form (1NF) by removing repeating or multivalued attributes to another, child entity

Page 54: Rdbms

Relation in 1NF

Page 55: Rdbms

Second Normal Form

Reduce first normal form entities to second normal form (2NF) by removing attributes that are not dependent on the whole primary key (partial dependency)

Page 56: Rdbms

Relation in 2NF

Page 57: Rdbms

Transitive dependency

An attribute in a table is dependent on another non primary key attribute rather than the primary key.

A condition where A, B, and C are attributes of a relation such that

if A -> B and B -> C

then C is transitively dependent on A via B i.e, A->C is a transitive dependency.

This situation is eliminated when the table is in third normal form.

Page 58: Rdbms

Third Normal Form

•A relation is in third normal form (3NF) if it is in second normal form and it contains no transitive dependencies.

OR

Reduce second normal form entities to third normal form (3NF) by removing attributes that depend on other, nonkey attributes.

Page 59: Rdbms

Decomposed into 3NF

Page 60: Rdbms

All the tables