rdbms course tutorial

Upload: rajivndpt8394

Post on 02-Feb-2016

DESCRIPTION

RDBMS Concept for Beginners

Page 1: RDBMS Course Tutorial

Database Management System History

Overview of Database

What is Data?

Data is distinct information that is formatted in a particular way; more generally, any fact that can be recorded.

E.g. data exists in a variety of forms, such as text on paper, images, videos and speech.

What is Database?

A database is a collection of related data organised so that the data can be easily accessed, managed and updated.

Database Management System (DBMS)

A DBMS is a collection of programs that enables users to create and maintain a database. In other words, it is general-purpose software that provides users with the processes of defining, constructing and manipulating databases for various applications.

Examples: Oracle, SQL Server, DB2, PostgreSQL, MongoDB, etc.

Database System and Components

The database and the DBMS software together are called a database system.

A database system is divided into 4 components:

Users: Users may be of various types, such as database administrators, system developers and end users.

Database application: A database application may be personal, departmental, enterprise or internal.

DBMS: Software that allows users to define, create and manage databases and to control access to them, e.g. MySQL, Oracle, etc.

Database: A collection of logically related data.

Advantages & Disadvantages of DBMS

Advantages

1) Redundancy is controlled.
2) Unauthorised access is restricted.
3) Multiple user interfaces are provided.
4) Integrity constraints are enforced.
5) Backup and recovery are provided.

Disadvantages

1) Complexity
2) Cost
3) Large size

Disadvantages of a File Processing System

1) Data redundancy and inconsistency.
2) Difficulty in accessing data.
3) Data isolation.
4) Data integrity problems.
5) Concurrent access is not possible.
6) Security problems.

Database Architecture

Applications are broadly divided into two types:

1) Logical two-tier Client / Server architecture
2) Logical three-tier Client / Server architecture

At a high level, a 2-tier architecture is typical of a client-server application and a 3-tier architecture is typical of a web-based application.

Logical two-tier Client / Server architecture

The two-tier architecture is based on the client-server model. Communication takes place directly between client and server, with no intermediary between them. Because of this tight coupling, a two-tier application can respond faster as long as the number of users is small.

[Figure: Two-Tier Architecture]

The figure shows the two-tier architecture: the client communicates directly with the server, with no intermediary between them.

Let’s look at a real-life example of a two-tier architecture: railway reservation.

Suppose one person is making a reservation from Mumbai to Delhi on the Mumbai Express at Counter No. 1 and, at the same time, a second person is trying to make a reservation from Mumbai to Delhi at Counter No. 2.

If the staff at Counter No. 1 are checking availability in the system while the staff at Counter No. 2 are checking ticket availability for the same day, there is a good chance of confusion. One way to avoid it is to lock the reservation for whichever request arrives first.

But reservations can be made from anywhere in India, so how is this handled?

If the requests from Counter No. 1 and Counter No. 2 arrive even microseconds apart, the second request is added to a queue. The staff enter the data into the client application, the reservation request is sent to the database, and the database sends the resulting information back to the client.

In this application the staff member is an end user of the railway reservation software. He or she gives input to the application, which sends requests to the server. Because the database and the server are combined, this technology is called “Client-Server Technology”.

The two-tier architecture is divided into two parts:

1) Client Application (Client Tier)
2) Database (Data Tier)

On the client application side, code is written to save data in the SQL Server database. The client sends a request to the server, which processes the request and sends back the data. The main problem of the two-tier architecture is that the server struggles to handle many requests at the same time, which can lead to data integrity issues.

Advantages:

1. Easy to maintain and modify
2. Communication is faster

Disadvantages:

1. Application performance degrades as the number of users increases.
2. Not cost-effective

Logical three-tier Client / Server architecture

A three-tier architecture typically comprises a presentation tier, a business or data access tier, and a data tier. The three layers are as follows:

1) Client layer
2) Business layer
3) Data layer

1) Client layer:

This is also called the presentation layer and contains the UI part of the application. It is where data is presented to the user and input is taken from the user, for example a registration form containing text boxes, labels, buttons, etc.

2) Business layer:

All business logic, such as data validation, calculations and data insertion rules, is written in this layer. It acts as an interface between the client layer and the data layer, and is therefore also called the intermediary layer: all communication between the client and data layers passes through it.

3) Data layer:

This is where the actual database comes into the picture. The data access code in this layer contains methods to connect to the database and to insert, update, delete and fetch data based on the input.
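
The separation of the three layers can be sketched in a few lines. This is only an illustration, not part of the original course material: it uses Python's built-in sqlite3 module with an in-memory database, and the table `users` and the function names are hypothetical.

```python
import sqlite3

# --- Data layer: the only code that talks to the database ---
def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    return conn

def insert_user(conn, name):
    cur = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid

def get_user(conn, user_id):
    row = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None

# --- Business layer: validation and rules, no SQL and no UI ---
def register_user(conn, raw_name):
    name = raw_name.strip()
    if not name:
        raise ValueError("name must not be empty")
    return insert_user(conn, name)

# --- Client (presentation) layer: formats data for the user ---
def show_user(conn, user_id):
    name = get_user(conn, user_id)
    return f"User #{user_id}: {name}" if name else "User not found"
```

The presentation function never issues SQL itself, which is the point of the design: the database can be changed without touching the UI code.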

[Figure: Three-Tier Architecture]

Advantages

1. High performance, lightweight persistent objects
2. Scalability: each tier can scale horizontally
3. Performance: because the presentation tier can cache requests, network utilization is minimized and the load on the application and data tiers is reduced
4. High degree of flexibility in deployment platform and configuration
5. Better re-use
6. Improved data integrity
7. Improved security: the client has no direct access to the database
8. Easy to maintain and modify; changes in one tier do not affect other modules
9. Good application performance

Disadvantages

1. Increased complexity and effort

Database Model

A database model defines the logical design of data. The model describes the relationships between different parts of the data. In the history of database design, three models have been in use:

1) Hierarchical Model
2) Network Model
3) Relational Model

Hierarchical Model

A hierarchical database model is a data model in which the data is organized into a tree-like structure. The data is stored as records which are connected to one another through links (pointers). A record is a collection of fields, with each field containing only one value.

The hierarchical database model mandates that each child record has only one parent, whereas each parent record can have one or more child records. To retrieve data from a hierarchical database, the tree must be traversed starting from the root node. This model is recognized as the first database model, created by IBM in the 1960s.
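
As a rough illustration of root-first retrieval, the sketch below models records as a tree and searches it depth-first. The `Record` class, the `find` function and the sample data are hypothetical and only stand in for the pointer-linked records described above.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    name: str
    children: list = field(default_factory=list)  # each child has exactly one parent

def find(root, target):
    """Depth-first search from the root; returns the path to target, or None."""
    if root.name == target:
        return [root.name]
    for child in root.children:
        path = find(child, target)
        if path:
            return [root.name] + path
    return None

company = Record("Company", [
    Record("Sales", [Record("Alice")]),
    Record("Engineering", [Record("Bob"), Record("Carol")]),
])
```

Calling `find(company, "Carol")` has to start at `Company` and walk down through `Engineering`; there is no way to address `Carol` directly.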

Network Model

The network model is a database model conceived as a flexible way of representing objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or lattice.

Relational Model

a. The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F. Codd. In the relational model of a database, all data is represented in terms of tuples, grouped into relations.

b. The purpose of the relational model is to provide a declarative method for specifying data and queries: users directly state what information the database contains and what information they want from it, and let the database management system software take care of describing data structures for storing the data and retrieval procedures for answering queries.

Codd’s Rules

Codd's twelve rules are a set of thirteen rules (numbered zero to twelve) proposed by Edgar F. Codd, a pioneer of the relational model for databases, designed to define what is required from a database management system in order for it to be considered relational, i.e., a relational database management system (RDBMS).

Rule 0: The Foundation rule:

A relational database management system must manage its stored data using only its relational capabilities.

The system must qualify as relational, as a database, and as a management system. For a system to qualify as a relational database management system (RDBMS), that system must use its relational facilities (exclusively) to manage the database.

Rule 1: The information rule:

All information in a relational database (including table and column names) is represented in only one way, namely as a value in a table.

Rule 2: The guaranteed access rule:

All data must be accessible. This rule is essentially a restatement of the fundamental requirement for primary keys. It says that every individual scalar value in the database must be logically addressable by specifying the name of the containing table, the name of the containing column and the primary key value of the containing row.
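
A minimal sketch of Rule 2, assuming SQLite through Python's sqlite3 module: a single value is addressed by table name, column name and primary key value. The `employee` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Asha", "Mumbai"), (2, "Ravi", "Delhi")])

def city_of(emp_id):
    # table = employee, column = city, primary key value = emp_id
    row = conn.execute("SELECT city FROM employee WHERE emp_id = ?", (emp_id,)).fetchone()
    return row[0] if row else None
```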

Rule 3: Systematic treatment of null values:

The DBMS must allow each field to remain null (or empty). Specifically, it must support a representation of "missing information and inapplicable information" that is systematic, distinct from all regular values (for example, "distinct from zero or any other number", in the case of numeric values), and independent of data type. It is also implied that such representations must be manipulated by the DBMS in a systematic way.
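
The difference between NULL and zero can be observed directly. The sketch below assumes SQLite via Python's sqlite3 module; the `reading` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO reading (value) VALUES (?)", [(0,), (None,), (5,)])

def count(where):
    return conn.execute(f"SELECT COUNT(*) FROM reading WHERE {where}").fetchone()[0]

zeros = count("value = 0")        # zero is an ordinary value
nulls = count("value IS NULL")    # NULL needs its own predicate
eq_null = count("value = NULL")   # "=" never matches NULL, so nothing is counted
average = conn.execute("SELECT AVG(value) FROM reading").fetchone()[0]  # NULL is skipped
```

Here the average is computed over the two non-null readings only, which shows that NULL is treated as "missing" rather than as zero.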

Rule 4: Active online catalog based on the relational model:

The system must support an online, inline, relational catalog that is accessible to authorized users by means of their regular query language. That is, users must be able to access the database's structure (catalog) using the same query language that they use to access the database's data.
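
As one concrete case, SQLite exposes its catalog as an ordinary table named `sqlite_master` that can be queried with SELECT. The sketch below uses Python's sqlite3 module; the `member` and `rental` tables are hypothetical, and other products expose their catalogs under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member (member_id INTEGER PRIMARY KEY, full_name TEXT)")
conn.execute("CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, member_id INTEGER)")

# The catalog is read with the same SELECT syntax used for user data.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
```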

Rule 5: The comprehensive data sublanguage rule:

The system must support at least one relational language that

1. Has a linear syntax,

2. Can be used both interactively and within application programs,

3. Supports data definition operations (including view definitions), data manipulation operations (update as well as retrieval), security and integrity constraints, and transaction management operations (begin, commit, and rollback).

Rule 6: The view updating rule:

All views that are theoretically updatable must be updatable by the system.

Rule 7: High-level insert, update, and delete:

The system must support set-at-a-time insert, update, and delete operators. This means that data can be retrieved from a relational database in sets constructed of data from multiple rows and/or multiple tables. This rule states that insert, update, and delete operations should be supported for any retrievable set rather than just for a single row in a single table.
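
Set-at-a-time operation means one statement can change every row that matches a condition. A small sketch, assuming SQLite via Python's sqlite3 module and a hypothetical `employee` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Sales", 1000), (2, "Sales", 1200), (3, "HR", 900), (4, "HR", 1500)])

# One UPDATE touches the whole set of matching rows; no row-by-row loop is needed.
raised = conn.execute(
    "UPDATE employee SET salary = salary + 100 WHERE dept = 'Sales'").rowcount
# One DELETE removes the whole set of matching rows.
removed = conn.execute("DELETE FROM employee WHERE salary < 1000").rowcount

salaries = dict(conn.execute("SELECT emp_id, salary FROM employee"))
```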

Rule 8: Physical data independence:

Changes to the physical level (how the data is stored, whether in arrays or linked lists etc.) must not require a change to an application based on the structure.

Rule 9: Logical data independence:

Changes to the logical level (tables, columns, rows, and so on) must not require a change to an application based on the structure. Logical data independence is more difficult to achieve than physical data independence.

Rule 10: Integrity independence:

Integrity constraints must be specified separately from application programs and stored in the catalog. It must be possible to change such constraints as and when appropriate without unnecessarily affecting existing applications.

Rule 11: Distribution independence:

The distribution of portions of the database to various locations should be invisible to users of the database. Existing applications should continue to operate successfully:

1. when a distributed version of the DBMS is first introduced; and

2. when existing distributed data are redistributed around the system.

Rule 12: The nonsubversion rule:

If the system provides a low-level (record-at-a-time) interface, then that interface cannot be used to subvert the system, for example, bypassing a relational security or integrity constraint.

Normalization 

Edgar Codd, the inventor of the relational model, proposed the theory of normalization with the introduction of 1NF and went on to extend it with 2NF and 3NF. Later he worked with Raymond F. Boyce to develop Boyce-Codd Normal Form.

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: 

1) Eliminating redundant data (for example, storing the same data in more than one table)

2) Ensuring data dependencies make sense (only storing related data in a table).

Before we begin our discussion of the normal forms, it's important to point out that they are guidelines and guidelines only. Occasionally, it becomes necessary to stray from them to meet practical business requirements.

First Normal Form (1NF)

First Normal Form (1NF) sets the very basic rules for an organized database:

1) Eliminate duplicative columns from the same table.
2) Every column must hold a value in each row, and that value should be single (atomic), not a list or repeating group.
3) Create separate tables for each group of related data and identify each row with a unique column (the primary key).

Before we proceed, let's understand a few things.

What is a Key?

A key is a value used to uniquely identify a record in a table. A key can be a single column or a combination of multiple columns.

Note: Columns in a table that are NOT used to uniquely identify a record are called non-key columns.

a. A key is one or more columns in a database table used to sort and/or identify rows in the table.

b. It is used to fetch or retrieve records from a table according to a condition or requirement.

c. Keys are also used to create relationships among different database tables or views.

E.g. if you were sorting people by the salary field, then the salary field is the sort key.

Primary key

a. A primary key is a column (or combination of columns) whose values uniquely identify a database record.

b. In SQL Server, creating a primary key on a table automatically creates a clustered index on that column by default, provided the table does not already have a clustered index.

It has the following attributes:

o A primary key cannot be NULL.
o A primary key value must be unique.
o Primary key values should not be changed.
o The primary key must be given a value when a new record is inserted.
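
The uniqueness and not-null attributes can be seen by letting the database reject bad rows. This is a sketch using SQLite through Python's sqlite3 module, not SQL Server; the `member` table and the helper `try_insert` are hypothetical. The explicit NOT NULL is an SQLite detail: unlike most products, SQLite does not imply it for a non-integer primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member (member_id TEXT PRIMARY KEY NOT NULL, full_name TEXT)")

def try_insert(member_id, full_name):
    """Return 'ok' if the row was stored, 'rejected' if a key rule was violated."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO member VALUES (?, ?)", (member_id, full_name))
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"
```

A first insert with key `"M1"` succeeds, a second insert with the same key is rejected as a duplicate, and an insert with `None` as the key is rejected because the key may not be NULL.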

Composite / Compound key

A composite key is a primary key composed of multiple columns used to identify a record uniquely.

In our database, we have two people with the same name, Robert Phil, but they live at different places. Hence we require both Full Name and Address to uniquely identify a record. This is a composite key.

a. A composite key consists of two or more fields that together uniquely describe a row in a table.

b. Some authors distinguish the two terms: in a compound key all of the fields are foreign keys, whereas in a composite key one or more of the fields may be foreign keys (but this is not mandatory).

E.g. you could have an EMPLOYEE table with one candidate key on PASSPORT_NUMBER and another on SOCIAL_SECURITY_NUMBER. Each on its own can uniquely identify a row. Either can be used as the primary key (but not both, since a table can have only one primary key).

Let's move into 2NF

2NF Rules

Second Normal Form (2NF)

Meet all the requirements of the first normal form.

1) Remove subsets of data that apply to multiple rows of a table and place them in separate tables.

2) Create relationships between these new tables and their predecessors through the use of foreign keys.

We have divided our 1NF table into two tables, Table 1 and Table 2. Table 1 contains member information. Table 2 contains information on movies rented.

We have introduced a new column called Membership_ID, which is the primary key for Table 1. Records can be uniquely identified in Table 1 using the membership ID.

Introducing the Foreign Key

In Table 2, Membership_ID is the foreign key.

A foreign key references the primary key of another table. It helps connect your tables.

o A foreign key can have a different name from its primary key.
o It ensures rows in one table have corresponding rows in another.
o Unlike primary keys, foreign keys do not have to be unique. Most often they aren't.
o Foreign keys can be null even though primary keys cannot.

Why do you need a foreign key?

Suppose someone inserts a record in Table 2 with a membership ID that does not exist in Table 1.

You will only be able to insert values into your foreign key that exist in the unique key of the parent table. This enforces referential integrity.

The problem above can be overcome by declaring the membership ID in Table 2 as a foreign key that references the membership ID in Table 1.

Now, if somebody tries to insert a value in the membership ID field that does not exist in the parent table, an error will be shown.
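
The behaviour described here can be reproduced with a small sketch. It assumes SQLite via Python's sqlite3 module, where foreign key checks must be switched on per connection; the table names `member` and `rental`, the sample row and the helper `rent` are hypothetical stand-ins for Table 1 and Table 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE member (membership_id INTEGER PRIMARY KEY, full_name TEXT)")
conn.execute("""CREATE TABLE rental (
    rental_id INTEGER PRIMARY KEY,
    membership_id INTEGER REFERENCES member(membership_id),
    movie TEXT)""")
conn.execute("INSERT INTO member VALUES (1, 'Janet Jones')")
conn.commit()

def rent(membership_id, movie):
    """Return True if the rental row was accepted, False if the database refused it."""
    try:
        with conn:
            conn.execute("INSERT INTO rental (membership_id, movie) VALUES (?, ?)",
                         (membership_id, movie))
        return True
    except sqlite3.IntegrityError:
        return False
```

A rental for member 1 is accepted, a rental for the non-existent member 101 is refused, and a rental with a NULL membership ID is accepted, matching the note that foreign keys can be null.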

What is a transitive functional dependency?

A transitive functional dependency exists when changing one non-key column might cause another non-key column to change.

Consider Table 1. Changing the non-key column Full Name may change Salutation.

Let's move on to 3NF

3NF Rules

Rule 1: Be in 2NF (meet all the requirements of the second normal form).

Rule 2: Have no transitive functional dependencies. In other words, remove columns that are not dependent only upon the primary key. Derived values such as totals and averages, which can be calculated in an SQL query, should not be stored either.

To move our 2NF table into 3NF we again need to divide our table.

[Figures: Table 1, Table 2 and Table 3]


We have again divided our tables and created a new table, Table 3, which stores salutations.

There are no transitive functional dependencies, and hence our tables are in 3NF.

In Table 3, Salutation ID is the primary key, and in Table 1, Salutation ID is a foreign key referencing the primary key of Table 3.
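
A sketch of the resulting design, assuming SQLite via Python's sqlite3 module. The table and column names and the sample rows are illustrative; the original tables were figures that are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salutation (salutation_id INTEGER PRIMARY KEY, salutation TEXT NOT NULL)")
conn.execute("""CREATE TABLE member (
    membership_id INTEGER PRIMARY KEY,
    full_name TEXT,
    salutation_id INTEGER REFERENCES salutation(salutation_id))""")
conn.executemany("INSERT INTO salutation VALUES (?, ?)", [(1, "Mr."), (2, "Ms.")])
conn.executemany("INSERT INTO member VALUES (?, ?, ?)",
                 [(1, "Janet Jones", 2), (2, "Robert Phil", 1)])

# The salutation text is stored once and recovered with a join when needed.
rows = conn.execute("""
    SELECT s.salutation, m.full_name
    FROM member AS m
    JOIN salutation AS s ON s.salutation_id = m.salutation_id
    ORDER BY m.membership_id""").fetchall()
```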

Our little example is now at a level that cannot be decomposed further to attain higher normal forms; in fact, it already satisfies them. Separate effort to move to the next levels of normalization is normally needed only in complex databases. However, the next levels of normalization are discussed briefly below.

Foreign Key

a. A foreign key is a relationship between columns in two database tables (one of which is indexed), designed to ensure consistency of data.

E.g. each record in a CUSTOMER table contains the ID of the account manager for that customer. In the ACCOUNT_MANAGER table the ID would typically be the primary key (indexed, unique, not null).

The account manager ID field in the CUSTOMER table is the foreign key; only values that exist in ACCOUNT_MANAGER.ID are allowed in it.

Super Key

An attribute, or a combination of attributes, that can be used to identify records uniquely is known as a super key. A table can have many super keys.

E.g. primary keys, unique keys and alternate keys are all super keys.

Candidate keys

a. A candidate key is a column or group of columns that can uniquely identify a row in the table without referring to any other source.

b. Each table may have one or more candidate keys. One of these candidate keys is selected as the primary key.

E.g. you could have an EMPLOYEE table with one candidate key on PASSPORT_NUMBER and another on SOCIAL_SECURITY_NUMBER.

Alternate key

a. If a table has more than one candidate key, then after one of them is chosen as the primary key, the remaining candidate keys are known as alternate keys of that table.

b. Basically, an alternate key is a candidate key that is not currently the primary key.

E.g. a table named Employee has two columns, EmpID and EmpMail, both of which are NOT NULL and hold unique values, so both columns are candidate keys. If we make EmpID the primary key of the table, then EmpMail is an alternate key.

Unique Key

a. A unique key is a set of one or more fields/columns of a table that uniquely identifies a record in a database table.

b. It is like a primary key in that it cannot have duplicate values, but it can accept a NULL (in SQL Server, only one NULL value per unique column).

Boyce and Codd Normal Form (BCNF)

A relation is in Boyce-Codd Normal Form (BCNF) if every determinant is a candidate key.

OR

A key attribute should not depend on a non-key attribute.

Determinant: A determinant in a database table is any attribute that you can use to determine the values assigned to other attribute(s) in the same row.

Example: Consider a table with the attributes employee_id, first_name, last_name and date_of_birth. In this case, the field employee_id determines the remaining three fields. The name fields do not determine the employee_id, because the firm may have more than one employee with the same first and/or last name. Similarly, the date_of_birth field does not determine the employee_id or the name fields, because more than one employee may share the same birthday.

Candidate Key: A candidate key is a combination of attributes that can be used to uniquely identify a database record. Each table may have one or more candidate keys. One of these candidate keys is selected as the table's primary key.
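
Whether one set of attributes determines another can be checked against sample rows. The sketch below is an illustration only: the function `determines` and the sample employees are hypothetical, and a sample can only disprove a dependency, never prove that it holds for all future data.

```python
def determines(rows, lhs, rhs):
    """True if, in these rows, equal values of lhs always come with equal values of rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards
        if seen.setdefault(key, value) != value:
            return False
    return True

employees = [
    {"employee_id": 1, "first_name": "Anil", "last_name": "Rao", "date_of_birth": "1990-01-01"},
    {"employee_id": 2, "first_name": "Anil", "last_name": "Rao", "date_of_birth": "1985-05-05"},
    {"employee_id": 3, "first_name": "Mira", "last_name": "Shah", "date_of_birth": "1990-01-01"},
]
```

With this data, `employee_id` determines the other three attributes, while neither the name fields nor `date_of_birth` determine `employee_id`, as in the example above.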

Fourth Normal Form (4NF)

Meet all the requirements of the third normal form and BCNF.

A relation is in 4NF if it does not contain more than one independent multi-valued dependency.

Consider these entities: employees, skills, and languages. An employee can have several skills and know several languages. There are two relationships, one between employees and skills, and one between employees and languages. A table is not in fourth normal form if it represents both relationships. Instead, the relationships should be represented in two tables. If, however, the attributes are interdependent (that is, the employee applies certain languages only to certain skills), the table should not be split.

A good strategy when designing a database is to arrange all data in tables that are in fourth normal form, and then to decide whether the results give you an acceptable level of performance. If they do not, you can rearrange the data in tables that are in third normal form, and then reassess performance.
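
The employee/skill/language case can be sketched with plain sets. The data is made up; the point is that when skills and languages are independent, the combined table must list every combination, while the two split tables store each fact once and can be rejoined without loss.

```python
# One table holding both relationships: every skill paired with every language.
combined = {
    ("Asha", "SQL", "English"),
    ("Asha", "SQL", "Hindi"),
    ("Asha", "Python", "English"),
    ("Asha", "Python", "Hindi"),
}

# 4NF decomposition: one table per independent relationship.
emp_skill = {(e, s) for (e, s, _) in combined}
emp_lang = {(e, l) for (e, _, l) in combined}

# Joining the two tables on employee reproduces the combined table exactly.
rejoined = {(e, s, l) for (e, s) in emp_skill for (e2, l) in emp_lang if e == e2}
```

If the employee used certain languages only with certain skills, some combinations would be missing from `combined`, the rejoin would invent rows that were never there, and the table should not be split.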

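Fifth Normal Form (5NF)

Also called project-join normal form (PJNF). The relation must be in 4NF, and it must not be possible to decompose it further into smaller tables without losing information when they are joined back.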

Levels of Abstraction in Data Modelling/Design

1) Conceptual Model
   A) ER (entity/relationship) Diagram
2) Logical Model
3) Physical Model

Introduction to SQL

Data Languages

A DBMS is a software package that carries out many different tasks, including the provision of facilities that enable the user to access and modify information in the database. The DBMS is an intermediate link between the physical database, the computer and operating system, and the users. To provide the various facilities to different types of users, a DBMS normally provides one or more specialized programming languages called database languages.

DDL

Data Definition Language (DDL) statements are used to define the database structure or schema. Some examples:

o CREATE - creates objects in the database

o ALTER - alters the structure of the database

o DROP - deletes objects from the database

o TRUNCATE - removes all records from a table, including all space allocated for the records

o COMMENT - adds comments to the data dictionary

o RENAME - renames an object
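
A few of these statements can be tried with SQLite through Python's sqlite3 module. This is a sketch with a hypothetical `movie` table; SQLite has no TRUNCATE or COMMENT statement, and it spells RENAME as `ALTER TABLE ... RENAME TO`, so only CREATE, ALTER, RENAME and DROP are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.execute("CREATE TABLE movie (movie_id INTEGER PRIMARY KEY, title TEXT)")  # CREATE
conn.execute("ALTER TABLE movie ADD COLUMN release_year INTEGER")              # ALTER
conn.execute("ALTER TABLE movie RENAME TO film")                               # RENAME

# PRAGMA table_info is SQLite-specific; column 1 of each row is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(film)")]

conn.execute("DROP TABLE film")                                                # DROP
remaining = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'").fetchone()[0]
```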

DML

Data Manipulation Language (DML) provides a set of operations to support the basic data manipulation operations on the data held in the database. It allows users to insert, update, delete and retrieve data from the database. The part of DML that involves data retrieval is called a query language.

The following list gives an overview of the usage of DML statements in SQL:

o SELECT - retrieves data from a database

o INSERT - inserts data into a table

o UPDATE - updates existing data within a table

o DELETE - deletes records from a table (all records if no condition is given); the space for the records remains allocated

o MERGE - UPSERT operation (insert or update)

o CALL - calls a PL/SQL or Java subprogram

o EXPLAIN PLAN - explains the access path to data

o LOCK TABLE - controls concurrency

DCL

Data Control Language (DCL) statements control access to data and the database using statements such as GRANT and REVOKE. A privilege can be granted to a user with the GRANT statement. The privileges assigned can be SELECT, ALTER, DELETE, EXECUTE, INSERT, INDEX, etc. In addition to granting privileges, you can also revoke them (take them back) with the REVOKE command.

o GRANT - gives users access privileges to the database

o REVOKE - withdraws access privileges given with the GRANT command

TCL

Transaction Control Language (TCL) statements are used to manage the changes made by DML statements. They allow statements to be grouped together into logical transactions.

o COMMIT - saves work done

o SAVEPOINT - identifies a point in a transaction to which you can later roll back

o ROLLBACK - restores the database to its state as of the last COMMIT

o SET TRANSACTION - changes transaction options such as the isolation level and which rollback segment to use
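
COMMIT, SAVEPOINT and ROLLBACK can be tried with SQLite through Python's sqlite3 module. This is a sketch with a hypothetical `account` table; the connection is opened with `isolation_level=None` so that the transaction statements are issued explicitly rather than by the driver. SET TRANSACTION is not shown because SQLite does not provide it.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transaction control
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO account VALUES ('A', 100)")
conn.execute("SAVEPOINT after_a")                 # mark a point inside the transaction
conn.execute("INSERT INTO account VALUES ('B', 50)")
conn.execute("ROLLBACK TO SAVEPOINT after_a")     # undo only the work done after the mark
conn.execute("COMMIT")                            # make the remaining work permanent

conn.execute("BEGIN")
conn.execute("DELETE FROM account")
conn.execute("ROLLBACK")                          # back to the state of the last COMMIT

names = [row[0] for row in conn.execute("SELECT name FROM account")]
```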

Getting Started with Microsoft SQL Server

a) Overview of Microsoft SQL Server

What is SQL Server?

SQL Server is a Microsoft product used to manage and store information. Technically, SQL Server is a “relational database management system” (RDBMS). Broken apart, this term means two things: first, that data stored inside SQL Server is housed in a “relational database”, and second, that SQL Server is an entire “management system”, not just a database. SQL itself stands for Structured Query Language. This is the language used to manage and administer the database server.

b) Installing and Configuring SQL Server
c) Tools in SQL Server

SQL Database Designing

a) Databases
b) Tables
c) Partitioning Tables

Database Objects

a) Stored Procedures
b) Functions
c) Views
d) Dynamic Management Views
e) Indexes
f) Triggers
g) Constraints

Database Management Security

a) Logins
b) Users
c) Protocols
d) Policy-Based Management
e) Data Recovery (Backup & Restore)
f) SQL Server Agent

ACID (Atomicity, Consistency, Isolation, and Durability) is a set of properties that guarantee that database transactions are processed reliably. 

Atomicity

The phrase "all or nothing" succinctly describes the first ACID property of atomicity. When an update occurs to a database, either all or none of the update becomes available to anyone beyond the user or application performing the update.
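
"All or nothing" can be demonstrated with a money transfer that fails halfway. The sketch assumes SQLite via Python's sqlite3 module; the `account` table, its CHECK constraint and the `transfer` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE account (
    name TEXT PRIMARY KEY,
    balance INTEGER CHECK (balance >= 0))""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

def transfer(src, dst, amount):
    """Move amount from src to dst; both updates are kept or neither is."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))
```

When the debit would make the balance negative, the CHECK constraint fails and the credit that was already applied in the same transaction is rolled back as well.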

Consistency

Consistency is the ACID property that ensures that any changes to values in an instance are consistent with changes to other values in the same instance. A consistency constraint is a predicate on data which serves as a precondition, post-condition, and transformation condition on any transaction.

Isolation

The isolation property is needed when there are concurrent transactions. Concurrent transactions are transactions that occur at the same time, such as multiple users accessing shared objects.

Durability

Maintaining updates of committed transactions is critical. These updates must never be lost. The ACID property of durability addresses this need. Durability refers to the ability of the system to recover committed transaction updates if either the system or the storage media fails. Features to consider for durability:

Degrees of isolation [1]:

degree 0 - a transaction does not overwrite data updated by other transactions ("dirty data")

degree 1 - degree 0 plus a transaction does not commit any writes until it completes all its writes (until the end of transaction)

degree 2 - degree 1 plus a transaction does not read dirty data from other transactions

degree 3 - degree 2 plus other transactions do not dirty data read by a transaction before the transaction commits

[1] These were originally described as degrees of consistency by Jim Gray.