ENGISTAN.COM
DBMS Concepts For IBPS IT-Officer 2014
This document is prepared for IBPS SO (IT-Officer) Examination 2014. The key concepts of DBMS are explained in a very precise & lucid way to assist the aspirants in their preparation. If you have any queries, doubts, or suggestions, please do share with us in our Forum.
We wish you All The Best – TEAM Engistan
Contents
1. Basic Terms
2. Database Models
3. RDBMS
4. Database Keys
5. Database Users
6. Normalization
7. E-R Diagram
8. Generalization & Specialization
9. SQL Basics
10. Data Languages
11. SQL Queries
12. Transactions-ACID Properties
Data: Data is the quantities, characters, or symbols on which operations are performed by a computer.
Data (or) Information Processing: The process of converting raw facts into meaningful information is known as data processing. It is also known as information processing.
Meta Data: The term metadata refers to "data about data". Metadata is data that provides information about one or more aspects of other data, such as:
• Means of creation of the data
• Purpose of the data
• Time and date of creation
• Creator or author of the data
• Location on a computer network where the data were created
• Standards used
Database: A database is a structured collection of data, which is organized into files called tables.
o A logically coherent collection of related data that (i) describes the entities and their inter-relationships, and (ii) is designed, built & populated for a specific reason.
Database Model
A database model defines the logical design of data. The model describes the relationships between different parts of the data. In the history of database design, three models have been in use:
• Hierarchical Model
• Network Model
• Relational Model
Hierarchical Model: In this model, each entity has only one parent but can have several children. At the top of the hierarchy there is a single entity, which is called the root.
Network Model: In the network model, entities are organised in a graph, in which some entities can be accessed through several paths.
Relational Model: In this model, data is organised in two-dimensional tables called relations. The tables (relations) are related to each other.
RDBMS Concepts
A Relational Database Management System (RDBMS) is a database management system based on the relational model introduced by E. F. Codd. In the relational model, data is represented in terms of tuples (rows).
An RDBMS is used to manage a relational database. A relational database is an organized collection of tables from which data can be accessed easily. It is the most commonly used kind of database. It consists of a number of tables, and each table has its own primary key.
What is a Table?
In a relational database, a table is a collection of data elements organised in rows and columns. A table is a convenient representation of a relation, but a table can have duplicate tuples while a true relation cannot. The table is the simplest form of data storage. Below is an example of an Employee table.
ID   Name     Age   Salary
1    Adam     34    13000
2    Alex     28    15000
3    Stuart   20    18000
4    Ross     42    19020
What is a Record?
A single entry in a table is called a record or row. A record represents a set of related data. For example, the above Employee table has 4 records. The following is an example of a single record.
1    Adam     34    13000
What is a Field?
A table consists of several records (rows), and each record can be broken into several smaller entities known as fields. The above Employee table consists of four fields: ID, Name, Age and Salary.
What is a Column?
In a relational table, a column is a set of values of a particular type. The term attribute is also used for a column. For example, in the Employee table, Name is a column that holds the names of the employees.
Name
Adam
Alex
Stuart
Ross
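The terms table, record, field and column can be tried out directly. The following is a minimal sketch, assuming Python's built-in sqlite3 module as the database engine (the source does not prescribe any particular DBMS); the column types are also an assumption.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [(1, "Adam", 34, 13000), (2, "Alex", 28, 15000),
     (3, "Stuart", 20, 18000), (4, "Ross", 42, 19020)],
)

# A record (row, tuple): all fields of one employee.
record = conn.execute("SELECT * FROM Employee WHERE ID = 1").fetchone()

# A column (attribute): one field taken across all records.
names = [row[0] for row in conn.execute("SELECT Name FROM Employee ORDER BY ID")]

# The field names of the table, read from the cursor metadata.
fields = [d[0] for d in conn.execute("SELECT * FROM Employee").description]

print(record)
print(names)
print(fields)
```

Here record corresponds to the single-record example, names to the Name column, and fields to the four fields of the Employee table.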
Database Management System (DBMS):
A collection of programs that enables users to perform the following actions on a particular database:
• define the structure of the database information (descriptive attributes, data types, constraints, etc.), storing this as metadata
• populate the database with appropriate information
• manipulate the database (for retrieval/update/removal/insertion of information)
• protect the database contents against accidental or deliberate corruption (this involves secure access by users and automatic recovery in the case of user/hardware faults)
• share the database among multiple users, possibly concurrently
Examples of DBMS are Oracle, Sybase, MySQL, DB2, SQL Server, Informix, MS Access, FileMaker, etc.
Sample Databases
Shown below is an extract from a (relational) database that might be part of a University’s Academic Information System:
Terminology:
relation = table (file)
attribute = column (field)
tuple = row (record)
Database Keys:
Keys are a very important part of a relational database. They are used to establish and identify relationships between tables. They also ensure that each record within a table can be uniquely identified by a combination of one or more fields.
Super Key: A super key is a set of attributes within a table that uniquely identifies each record in the table. Every candidate key is a super key; a candidate key is a super key with no unnecessary attributes.
Candidate Key: Candidate keys are the sets of fields from which the primary key can be selected. A candidate key is an attribute or set of attributes that can act as a primary key for a table, uniquely identifying each record in that table.
Primary Key: The primary key is the candidate key that is most appropriate to become the main key of the table. It is a key that uniquely identifies each record in a table.
Foreign Key: A foreign key is generally a primary key from one table that appears as a field in another where the first table has a relationship to the second. In other words, if we had a table A with a primary key X that linked to a table B where X was a field in B, then X would be a foreign key in B.
Composite Key: A key that consists of two or more attributes that together uniquely identify an entity occurrence is called a composite key. No single attribute that makes up the composite key is a key on its own.
Secondary or Alternative Key: The candidate keys that are not selected as the primary key are known as secondary keys or alternative keys.
Non-key Attribute: Non-key attributes are the attributes of a table other than the candidate key attributes.
Non-prime Attribute: Non-prime attributes are attributes other than prime attributes, i.e. attributes that do not belong to any candidate key.
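The key types above can be seen at work in a small schema. This is a minimal sketch, assuming Python's built-in sqlite3 module; the Department, Employee and Assignment tables and all of their columns are hypothetical names chosen for illustration and do not come from the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
CREATE TABLE Department (
    dept_id   INTEGER PRIMARY KEY,        -- primary key
    dept_name TEXT NOT NULL UNIQUE        -- candidate key not chosen as primary: alternate key
);
CREATE TABLE Employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES Department(dept_id)  -- foreign key
);
CREATE TABLE Assignment (
    emp_id  INTEGER REFERENCES Employee(emp_id),
    project TEXT,
    PRIMARY KEY (emp_id, project)         -- composite key
);
""")
conn.execute("INSERT INTO Department VALUES (1, 'Sales')")
conn.execute("INSERT INTO Employee VALUES (10, 'Adam', 1)")
conn.execute("INSERT INTO Assignment VALUES (10, 'Audit')")

def rejected(sql):
    """Return True if the DBMS refuses the statement because of a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_primary = rejected("INSERT INTO Department VALUES (1, 'Marketing')")   # same primary key
dup_alternate = rejected("INSERT INTO Department VALUES (2, 'Sales')")     # same alternate key
bad_foreign = rejected("INSERT INTO Employee VALUES (11, 'Alex', 99)")     # no department 99
dup_composite = rejected("INSERT INTO Assignment VALUES (10, 'Audit')")    # same (emp_id, project)
new_project = rejected("INSERT INTO Assignment VALUES (10, 'Payroll')")    # only emp_id repeats

print(dup_primary, dup_alternate, bad_foreign, dup_composite, new_project)
```

The last insert is accepted because emp_id alone is not a key of Assignment; only the combination of emp_id and project has to be unique.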
Database Users:
Database Administrators (DBA):
o individual(s) that determine & implement policy regarding users, their permissions on a database and the design & construction of that database
Database Designers:
o individual(s) – possibly also software engineers – who apply design techniques to produce database structures pertinent to a specific application
End Users:
o People who, from time to time, access the contents of a database:
• casual end users may submit ad-hoc queries as the need arises, using a high-level query language
• naïve, or parametric, end users access the database through pre-written programs that provide an appropriate interface to the database
• database programmers write code, using a relevant programming language and the high-level query language, that can later be used by parametric users
Normalization
Normalization is a systematic approach of decomposing tables to eliminate data redundancy and undesirable characteristics such as insertion, update and deletion anomalies. It is a two-step process that puts data into tabular form by removing duplicated data from the relation tables.
Normalization is used mainly for two purposes:
• Eliminating redundant (duplicated) data.
• Ensuring data dependencies make sense, i.e. data is logically stored.
Problem Without Normalization
Without normalization, it becomes difficult to handle and update the database without facing data loss. Insertion, update and deletion anomalies are very frequent if the database is not normalized. To understand these anomalies, let us take the example of a Student table.
S_id   S_Name   S_Address   Subject_opted
401    Adam     Noida       Bio
402    Alex     Panipat     Maths
403    Stuart   Jammu       Maths
404    Adam     Noida       Physics
Update Anomaly: To update the address of a student who occurs two or more times in the table, we have to update the S_Address column in all of those rows, or else the data becomes inconsistent.
Insertion Anomaly: Suppose that for a new admission we have the student id (S_id), name and address of a student, but the student has not opted for any subject yet. We then have to insert NULL in Subject_opted, leading to an insertion anomaly.
Deletion Anomaly: If student (S_id) 401 has only one subject and temporarily drops it, then deleting that row deletes the entire student record along with it.
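The update and deletion anomalies can be reproduced on the Student table. This is a minimal sketch, assuming Python's built-in sqlite3 module and assuming that the two Adam rows describe the same student; the new address 'Delhi' is a made-up value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (S_id INTEGER, S_Name TEXT, S_Address TEXT, Subject_opted TEXT)"
)
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?)", [
    (401, "Adam", "Noida", "Bio"),
    (402, "Alex", "Panipat", "Maths"),
    (403, "Stuart", "Jammu", "Maths"),
    (404, "Adam", "Noida", "Physics"),
])

# Update anomaly: the address is changed in only one of Adam's two rows,
# so the table now holds two different addresses for him.
conn.execute("UPDATE Student SET S_Address = 'Delhi' WHERE S_id = 401")
adam_addresses = sorted(
    row[0] for row in conn.execute(
        "SELECT DISTINCT S_Address FROM Student WHERE S_Name = 'Adam'")
)

# Deletion anomaly: removing Alex's only subject row removes Alex altogether.
conn.execute("DELETE FROM Student WHERE S_id = 402 AND Subject_opted = 'Maths'")
alex_rows = conn.execute(
    "SELECT COUNT(*) FROM Student WHERE S_Name = 'Alex'").fetchone()[0]

print(adam_addresses, alex_rows)
```

After the partial update the table disagrees with itself about Adam's address, and after the delete nothing about Alex is left, which is exactly what normalization is meant to prevent.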
Normalization Rule
Normalization rules are divided into the following normal forms.
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
1. First Normal Form (1NF): A row of data cannot contain a repeating group of data, i.e. each column must hold a single value. Each row of data must have a unique identifier, i.e. a primary key. For example, consider the following table.
Student Table :
S_id   S_Name   subject
401    Adam     Biology
401    Adam     Physics
402    Alex     Maths
403    Stuart   Maths
Here the student name Adam is stored twice and the subject Maths is also repeated, and no single column identifies a row uniquely. Strictly, 1NF is violated only when one column holds several values (for example, both of Adam's subjects in a single cell); the repetition shown here is the redundancy that appears once such a cell is split into rows. To remove it, break the table into two different tables.
New Student Table :
S_id   S_Name
401    Adam
402    Alex
403    Stuart
Subject Table :
subject_id   student_id   subject
10           401          Biology
11           401          Physics
12           402          Maths
12           403          Maths
In the Subject table, the concatenation of subject_id and student_id is the primary key. Now both the Student table and the Subject table are normalized to First Normal Form.
2. Second Normal Form (2NF): A table in Second Normal Form must meet all the requirements of First Normal Form, and there must not be any partial dependency of any column on the primary key. This means that for a table with a concatenated primary key, each column that is not part of the primary key must depend on the entire concatenated key. If any column depends on only one part of the concatenated key, then the table fails Second Normal Form. For example, consider a table which is not in Second Normal Form.
Customer Table :
customer_id   Customer_Name   Order_id   Order_name   Sale_detail
101           Adam            10         order1       sale1
101           Adam            11         order2       sale2
102           Alex            12         order3       sale3
103           Stuart          13         order4       sale4
In the Customer table, the concatenation of customer_id and Order_id is the primary key. This table is in First Normal Form but not in Second Normal Form, because there are partial dependencies of columns on the primary key: Customer_Name depends only on customer_id, Order_name depends only on Order_id, and there is no link between Sale_detail and Customer_Name.
To reduce the Customer table to Second Normal Form, break it into the following three tables.
Customer_Detail Table :
customer_id   Customer_Name
101           Adam
102           Alex
103           Stuart
Order_Detail Table :
Order_id   Order_name
10         order1
11         order2
12         order3
13         order4
Sale_Detail Table :
customer_id   Order_id   Sale_detail
101           10         sale1
101           11         sale2
102           12         sale3
103           13         sale4
Now all three of these tables comply with Second Normal Form.
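The Second Normal Form decomposition above can be checked by joining the three tables back together. This is a minimal sketch, assuming Python's built-in sqlite3 module; the column types, the REFERENCES clauses and the new name 'Adam S.' are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer_Detail (customer_id INTEGER PRIMARY KEY, Customer_Name TEXT);
CREATE TABLE Order_Detail (Order_id INTEGER PRIMARY KEY, Order_name TEXT);
CREATE TABLE Sale_Detail (
    customer_id INTEGER REFERENCES Customer_Detail(customer_id),
    Order_id    INTEGER REFERENCES Order_Detail(Order_id),
    Sale_detail TEXT,
    PRIMARY KEY (customer_id, Order_id)
);
INSERT INTO Customer_Detail VALUES (101, 'Adam'), (102, 'Alex'), (103, 'Stuart');
INSERT INTO Order_Detail VALUES (10, 'order1'), (11, 'order2'), (12, 'order3'), (13, 'order4');
INSERT INTO Sale_Detail VALUES
    (101, 10, 'sale1'), (101, 11, 'sale2'), (102, 12, 'sale3'), (103, 13, 'sale4');
""")

# Joining the three tables gives back the rows of the original Customer table.
rebuilt = conn.execute("""
    SELECT s.customer_id, c.Customer_Name, s.Order_id, o.Order_name, s.Sale_detail
    FROM Sale_Detail AS s
    JOIN Customer_Detail AS c ON c.customer_id = s.customer_id
    JOIN Order_Detail AS o ON o.Order_id = s.Order_id
    ORDER BY s.Order_id
""").fetchall()

# Each customer name is now stored once, so a rename touches exactly one row.
changed = conn.execute(
    "UPDATE Customer_Detail SET Customer_Name = 'Adam S.' WHERE customer_id = 101"
).rowcount

for row in rebuilt:
    print(row)
print(changed)
```

No information is lost by the decomposition, and the name that appeared in two rows of the original table is now maintained in a single place.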
3. Third Normal Form (3NF): Third Normal Form requires that every non-prime attribute of a table depend directly on the primary key, and not on another non-prime attribute. Transitive functional dependencies must be removed from the table, and the table must already be in Second Normal Form. For example, consider a table with the following fields.
Student_Detail Table:
Student_id Student_name DOB Street city State Zip
In this table Student_id is the primary key, but Street, city and State depend on Zip. Since Zip in turn depends on Student_id, these fields depend on the primary key only through Zip; this is called a transitive dependency. Hence, to apply 3NF, we need to move Street, city and State to a new table, with Zip as its primary key.
New Student_Detail Table :
Student_id Student_name DOB Zip
Address Table :
Zip Street city state
The advantages of removing a transitive dependency are:
• The amount of data duplication is reduced.
• Data integrity is achieved.
4. Boyce and Codd Normal Form (BCNF): Boyce and Codd Normal Form is a stricter version of Third Normal Form. It deals with a certain type of anomaly that is not handled by 3NF. A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key. A 3NF table which does not have multiple overlapping candidate keys is guaranteed to be in BCNF.
E-R Diagram
An E-R Diagram is a visual representation of data that describes how the data items are related to each other.
Symbols and Notations
Components of E-R Diagram
The E-R diagram has three main components.
1) Entity
An entity can be any object, place, person or class. In an E-R Diagram, an entity is represented by a rectangle. Consider the example of an organisation: Employee, Manager, Department, Product and many more can be taken as its entities.
Weak Entity
A weak entity is an entity that depends on another entity. A weak entity doesn't have a key attribute of its own. A double rectangle represents a weak entity.
2) Attribute
An attribute describes a property or characteristic of an entity. For example, Name, Age, Address, etc. can be attributes of a Student. An attribute is represented by an ellipse.
Key Attribute
A key attribute represents the main characteristic of an entity. It is used to represent the primary key. An ellipse with the attribute name underlined represents a key attribute.
Composite Attribute
An attribute can also have its own attributes. Such an attribute is known as a composite attribute.
3) Relationship
A relationship describes an association between entities. A relationship is represented by a diamond.
There are three types of relationship that exist between entities.
• Binary Relationship
• Recursive Relationship
• Ternary Relationship
Binary Relationship
Binary Relationship means relation between two Entities. This is further divided into three types.
1. One to One: This type of relationship is rarely seen in the real world.
The example for this case is one in which a student can enroll for only one course and a course has only one student. This is not what you will usually see in a relationship.
2. One to Many: It reflects the business rule that one instance of an entity is associated with many instances of another entity. For example, a Student enrolls for only one Course, but a Course can have many Students.
The arrows in the diagram show that one student can enroll for only one course.
3. Many to Many:
The diagram for this case shows that many students can enroll for more than one course.
Recursive Relationship
When an Entity is related with itself it is known as Recursive Relationship.
Ternary Relationship
Relationship of degree three is called Ternary relationship.
Generalization and Specialization
Generalization: Generalization is a bottom-up approach in which two lower-level entities combine to form a higher-level entity. The higher-level entity can in turn combine with other lower-level entities to form a still higher-level entity.
Specialization: Specialization is the opposite of generalization. It is a top-down approach in which one higher-level entity is broken down into two lower-level entities. In specialization, some higher-level entities may not have lower-level entity sets at all.
Aggregation: Aggregation is a process in which a relationship between two entities is treated as a single entity. In the example, the relationship between Center and Course acts as an entity in its relationship with Visitor.
SQL Basics
Introduction to SQL
Structured Query Language (SQL) is a programming language used for storing and managing data in an RDBMS. SQL was the first commercial language introduced for E. F. Codd's relational model. Today almost all RDBMS (MySQL, Oracle, Informix, Sybase, MS Access) use SQL as the standard database language.
SQL is used to perform all types of data operations in an RDBMS.
SQL Command
SQL defines following data languages to manipulate data of RDBMS.
DDL : Data Definition Language
In many RDBMS (for example Oracle and MySQL), DDL commands are auto-committed. That means all the changes are saved permanently in the database as soon as the command runs.
Command    Description
create     to create a new table or database
alter      to alter the structure of a table
truncate   to delete all data from a table
drop       to drop a table
rename     to rename a table
DML : Data Manipulation Language
DML commands are not auto-committed. This means the changes are not permanent in the database until they are committed, and they can be rolled back.
Command    Description
insert     to insert a new row
update     to update an existing row
delete     to delete a row
merge      to merge two rows or two tables
TCL : Transaction Control Language
These commands keep a check on other commands and their effect on the database. They can annul the changes made by other commands by rolling back to the original state, and they can also make changes permanent.
Command    Description
commit     to permanently save the changes
rollback   to undo the changes
savepoint  to mark a point within a transaction to which it can later be rolled back
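The three TCL commands can be tried in a few lines. This is a minimal sketch, assuming Python's built-in sqlite3 module, whose SQL dialect spells the commands BEGIN, SAVEPOINT, ROLLBACK TO, ROLLBACK and COMMIT; the savepoint name after_adam is made up, and other DBMS may differ in details.

```python
import sqlite3

# isolation_level=None leaves transaction control entirely to the explicit
# BEGIN / COMMIT / ROLLBACK statements below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO Employee VALUES (1, 'Adam')")
conn.execute("SAVEPOINT after_adam")        # mark a point inside the transaction
conn.execute("INSERT INTO Employee VALUES (2, 'Alex')")
conn.execute("ROLLBACK TO after_adam")      # undo only the work done after the savepoint
conn.execute("COMMIT")                      # make the remaining change permanent

conn.execute("BEGIN")
conn.execute("DELETE FROM Employee")
conn.execute("ROLLBACK")                    # undo the whole uncommitted transaction

names = [row[0] for row in conn.execute("SELECT Name FROM Employee")]
print(names)
```

Only Adam remains: the insert of Alex was undone by rolling back to the savepoint, and the delete was undone by the full rollback.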
DCL : Data Control Language
Data Control Language provides commands to grant and take back authority.
Command    Description
grant      to give a user permission or access rights
revoke     to take back permission
DQL : Data Query Language
Command    Description
select     to retrieve records from one or more tables
Basic Structure of SQL Queries:
The basic structure of an SQL query consists of three clauses: SELECT, FROM, and WHERE.
1. SELECT Statement: Defines WHAT is to be returned (items separated by commas):
• Database columns (from tables or views)
• Constant text values
• Formulas
• Pre-defined functions
• Group functions (COUNT, SUM, MAX, MIN, AVG)
• "*" means all columns from all tables in the FROM statement
Example: SELECT state_code, state_name
2. FROM Statement: Defines the table(s) or view(s) used by the SELECT or WHERE statements.
• You MUST have a FROM statement
• Multiple tables/views are separated by commas
3. WHERE Clause: Defines which records are to be included in the query. It is optional.
• Uses comparison operators (=, >, >=, <, <=, !=, <>)
• Multiple conditions are linked with AND and OR
• Strings are contained within SINGLE QUOTES
- AND & OR Statements:
• Multiple WHERE conditions are linked by AND / OR
• "AND" means all conditions are TRUE for the record
• "OR" means at least 1 of the conditions is TRUE
• You may group conditions with ( )
• BE CAREFUL MIXING "AND" & "OR" conditions: without parentheses, AND is evaluated before OR
Examples:
1. SELECT * FROM annual_summaries WHERE sd_duration_code = '1'
2. SELECT state_name FROM states WHERE state_population > 15000000
3. SELECT state_name, state_population FROM states WHERE state_name LIKE '%NORTH%'
4. SELECT * FROM annual_summaries WHERE sd_duration_code IN ('1', 'W', 'X') AND annual_summary_year = 2000
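Queries like examples 2 and 3 can be run against a small states table. This is a minimal sketch, assuming Python's built-in sqlite3 module; the rows and population figures are illustrative values only, not real data, and the two extra queries are added to show why grouping with ( ) matters when AND and OR are mixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE states (state_code TEXT, state_name TEXT, state_population INTEGER)"
)
conn.executemany("INSERT INTO states VALUES (?, ?, ?)", [
    ("NC", "NORTH CAROLINA", 9000000),   # illustrative figures, not real data
    ("ND", "NORTH DAKOTA", 700000),
    ("TX", "TEXAS", 25000000),
    ("CA", "CALIFORNIA", 37000000),
])

def column(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

big = column("SELECT state_name FROM states "
             "WHERE state_population > 15000000 ORDER BY state_name")
north = column("SELECT state_name FROM states "
               "WHERE state_name LIKE '%NORTH%' ORDER BY state_name")

# With parentheses, the population test applies to every matching state.
grouped = column("SELECT state_code FROM states "
                 "WHERE (state_name LIKE '%NORTH%' OR state_code = 'TX') "
                 "AND state_population > 800000 ORDER BY state_code")
# Without them, AND binds first, so the population test applies to TX only.
ungrouped = column("SELECT state_code FROM states "
                   "WHERE state_name LIKE '%NORTH%' OR state_code = 'TX' "
                   "AND state_population > 800000 ORDER BY state_code")

# Group functions collapse many rows into one result row.
count, total = conn.execute(
    "SELECT COUNT(*), SUM(state_population) FROM states").fetchone()

print(big, north, grouped, ungrouped, count, total)
```

The grouped query leaves out ND because of its population, while the ungrouped one lets it through, which is the pitfall the warning about mixing AND and OR refers to.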
Transaction Management:
Transaction: A transaction is a unit of program execution that accesses and possibly updates various data items. In simple words, a transaction is an event which occurs on the database. Generally, a transaction reads a value from the database or writes a value to the database.
Goal Of Transactions: The ACID properties
Atomicity: Either all actions are carried out, or none are.
Consistency: If each transaction is consistent, and the database is initially consistent, then it is left consistent.
Isolation: Transactions are isolated, or protected, from the effects of other scheduled transactions.
Durability: If a transaction completes successfully, then its effects persist.
1. Atomicity: A transaction can Commit after completing its actions, or Abort because of
- Internal DBMS decision: restart
- System crash: power, disk failure, …
- Unexpected situation: unable to access disk, data value, …
A transaction interrupted in the middle could leave the database inconsistent.
DBMS needs to remove the effects of partial transactions to ensure atomicity: either all a transaction’s actions are performed or none.
2. Consistency: Database consistency is the property that every transaction sees a consistent database instance. It follows from transaction atomicity, isolation and transaction consistency. Users are responsible for ensuring transaction consistency:
- when run to completion against a consistent database instance, the transaction leaves the database consistent
For example, a consistency criterion for an inter-account transfer transaction is that it does not change the total amount of money in the accounts.
3. Isolation: Guarantees that even though transactions may be interleaved, the net effect is identical to executing the transactions serially. For example, if transactions T1 and T2 are executed concurrently, the net effect is equivalent to executing
- T1 followed by T2, or
- T2 followed by T1
NOTE: The DBMS provides no guarantee about which of these is the effective order of execution.
4. Durability: The DBMS uses the log to ensure durability. If the system crashes before the changes made by a completed transaction are written to disk, the log is used to remember and restore these changes when the system is restarted.
Again, this is handled by the recovery manager.
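Atomicity and the inter-account transfer criterion mentioned under Consistency can be illustrated together. This is a minimal sketch, assuming Python's built-in sqlite3 module; the Account table, its CHECK constraint, the balances and the transfer helper are all hypothetical and not taken from the source.

```python
import sqlite3

# isolation_level=None: transactions are controlled by the explicit statements below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE Account (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO Account VALUES (1, 100), (2, 50)")

def transfer(src, dst, amount):
    """Move amount from src to dst as one transaction; return True if it committed."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE Account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE Account SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")   # remove the effect of the partial transaction
        return False

def total():
    """Total money held in all accounts."""
    return conn.execute("SELECT SUM(balance) FROM Account").fetchone()[0]

first = transfer(1, 2, 30)     # both updates succeed and are committed
second = transfer(1, 2, 500)   # the debit violates the CHECK, so the credit is undone too
balances = conn.execute("SELECT balance FROM Account ORDER BY id").fetchall()

print(first, second, balances, total())
```

The failed transfer leaves no trace: either both updates take effect or neither does, and the total amount of money is the same before and after.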