rdbms - day1


Post on 05-Mar-2016


  • Introduction to RDBMS: Day 1

  • What is Data?

    Data (the plural of datum) is factual information used as a basis for reasoning, discussion, or calculation.

    Data may be numerical (integers or floating-point numbers) or non-numerical (characters, dates, and so on). Data by itself normally has no meaning associated with it, e.g.: Krishnan 01-Jan-71 15-Jun-05 50000

  • Information

    Related data is often called information. Information always has meaning and context attached to each data element. Consider the same example used for data; once meaning and context are added, the data becomes information:

    Employee Name: Krishnan
    Date of Birth: 01-Jan-71
    Date of Joining: 15-Jun-05
    Salary: 50000
    Department Number: 10
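    The step from data to information can be sketched in a few lines of Python. This is only an illustration: the values and field names come from the example above, while the variable names are assumptions made for the sketch.

```python
# Raw data: values with no meaning attached.
raw_data = ["Krishnan", "01-Jan-71", "15-Jun-05", 50000, 10]

# Context: what each position means (field names from the example above).
field_names = [
    "Employee Name",
    "Date of Birth",
    "Date of Joining",
    "Salary",
    "Department Number",
]

# Information: every value paired with its meaning.
information = dict(zip(field_names, raw_data))

for name, value in information.items():
    print(f"{name}: {value}")
```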

  • Database

    A logically coherent collection of related data (information) with inherent meaning, built for a certain application, and representing a subset of the "real-world".

    For example, a customer database at a bank, or the details of the insurance policies that a customer holds.

  • Evolution of Databases

  • File Systems (1950s)

    Basic constructs:
    -- Sequential records
    -- A record contains sequential fields

    Relies on indexes for random access:
    -- ISAM (Indexed Sequential Access Method)
    -- VSAM (Virtual Storage Access Method)

    Basic operations:
    -- Open, close, or reset a file
    -- Read, write, or delete a record

  • Disadvantages of File Processing Systems

    -- Data redundancy and inconsistency
    -- Difficulty in accessing data
    -- Data isolation
    -- Integrity problems
    -- Atomicity problems
    -- Concurrent access anomalies
    -- Security problems

  • Hierarchical Model (1960s)

    Data is represented by collections of records, and relationships among data are represented by links, which can be viewed as pointers.

    A record is a collection of fields (attributes), each of which contains only one data value.

    A link is an association between precisely two records.

    Records are organized as collections of rooted trees.

  • Hierarchical Model

    Consider a database representing a customer-account relationship in a banking system. There are two record types, customer and account. The customer record type can be defined as

    type customer = record
        customer_name: string;
        customer_street: string;
        customer_city: string;
    end

    The account record type can be defined as

    type account = record
        account_number: string;
        balance: integer;
    end
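    A minimal Python sketch of how these two record types could be arranged as a rooted tree, with each customer record acting as the parent of its account records. The class layout, the `accounts` link field, and the `total_balance` helper are assumptions made for illustration; the sample values are borrowed from the relational tables later in this transcript.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Account:
    account_number: str
    balance: int


@dataclass
class Customer:
    customer_name: str
    customer_street: str
    customer_city: str
    # Links to child records; in a tree, each account has exactly one parent.
    accounts: List[Account] = field(default_factory=list)


hayes = Customer("Hayes", "Main", "Harrison", [Account("A-102", 400)])
johnson = Customer(
    "Johnson", "Alma", "Palo Alto",
    [Account("A-101", 500), Account("A-201", 900)],
)


def total_balance(customer: Customer) -> int:
    """Follow the links from a parent record down to its children."""
    return sum(account.balance for account in customer.accounts)


print(total_balance(johnson))
```

    Data is reached only by walking down from a parent record, which is why an account shared by two customers would have to be duplicated under each of them in this model.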


  • Network Model (1960s)

    Data is represented by collections of records, and relationships among data are represented by links, which can be viewed as pointers.

    A record is a collection of fields (attributes), each of which contains only one data value.

    A link is an association between precisely two records.

    Records are organized as arbitrary graphs.

  • Network Model

    Consider a database representing a customer-account relationship in a banking system. There are two record types, customer and account. The customer record type can be defined as

    type customer = record
        customer_name: string;
        customer_street: string;
        customer_city: string;
    end

    The account record type can be defined as

    type account = record
        account_number: string;
        balance: integer;
    end
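    A minimal Python sketch of the same records organized as an arbitrary graph. Each link associates exactly two records, and a record may take part in any number of links, so one account can belong to several customers. The `links` list, the helper functions, and the joint account A-201 are hypothetical, chosen only to show what a tree cannot express directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_name: str
    customer_street: str
    customer_city: str


@dataclass(frozen=True)
class Account:
    account_number: str
    balance: int


johnson = Customer("Johnson", "Alma", "Palo Alto")
turner = Customer("Turner", "Putnam", "Stamford")
a101 = Account("A-101", 500)
a201 = Account("A-201", 900)

# Each link is an association between precisely two records.
links = [(johnson, a101), (johnson, a201), (turner, a201)]


def accounts_of(customer):
    """All account records linked to the given customer record."""
    return [a for c, a in links if c == customer]


def owners_of(account):
    """All customer records linked to the given account record."""
    return [c for c, a in links if a == account]


print([c.customer_name for c in owners_of(a201)])
```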


  • Relational Model (1980s)

    -- Data is presented as a collection of relations.
    -- Each relation is depicted as a table.
    -- Columns are attributes.
    -- Rows ("tuples") represent entities.
    -- Every table has one or more sets of attributes that, taken together as a "key", uniquely identify each entity.

    Customer

    emp_no  name     street  city
    E1      Hayes    Main    Harrison
    E2      Johnson  Alma    Palo Alto
    E3      Turner   Putnam  Stamford

    Account

    emp_no  ac_no  balance
    E1      A-102  400
    E2      A-101  500
    E2      A-201  900
    E3      A-304  300

  • Relational Model

    Relationships between tables are established with the help of keys: each emp_no value in the Account table matches the emp_no key of exactly one row in the Customer table.
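    The key-based link between the two tables can be sketched with Python's built-in sqlite3 module. The table and column names follow the slide; the column types, the in-memory database, and the reading of the last Account row as A-304 with balance 300 are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        emp_no TEXT PRIMARY KEY,
        name   TEXT,
        street TEXT,
        city   TEXT
    );
    CREATE TABLE account (
        emp_no  TEXT REFERENCES customer(emp_no),
        ac_no   TEXT,
        balance INTEGER,
        PRIMARY KEY (emp_no, ac_no)
    );
""")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        ("E1", "Hayes", "Main", "Harrison"),
        ("E2", "Johnson", "Alma", "Palo Alto"),
        ("E3", "Turner", "Putnam", "Stamford"),
    ],
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [("E1", "A-102", 400), ("E2", "A-101", 500),
     ("E2", "A-201", 900), ("E3", "A-304", 300)],
)

# The relationship is not stored as a pointer; it is recovered by
# matching key values at query time.
rows = conn.execute("""
    SELECT c.name, a.ac_no, a.balance
    FROM customer AS c
    JOIN account AS a ON a.emp_no = c.emp_no
    ORDER BY c.emp_no, a.ac_no
""").fetchall()

for row in rows:
    print(row)
```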

  • Database Management System (DBMS)

    A database management system (DBMS) is software that allows databases to be defined, constructed, and manipulated.

    It is a collection of programs that enables you to store, modify, and extract information from a database.

    The general purpose of a DBMS is to provide for the definition, storage, and management of data in a centralized area that can be shared by many users.

  • Purpose of a DBMS

    DBMSs were developed to handle the following difficulties:

    -- Data redundancy and inconsistency
    -- Difficulty in accessing data
    -- Data isolation (multiple files and formats)
    -- Integrity problems
    -- Atomicity of updates
    -- Concurrent access by multiple users
    -- Security problems

  • Functionality of a DBMS

    -- Specifying the database structure: data definition language
    -- Manipulation of the database: query processing and query optimization
    -- Integrity enforcement: integrity constraints
    -- Concurrency control: multiple-user environment
    -- Crash recovery
    -- Security and authorization

  • DBMS Approach

  • Data Models

    Object-Based Logical Models:
    -- The Entity-Relationship Model
    -- The Object-Oriented Model
    -- The Semantic Data Model
    -- The Functional Data Model

    Record-Based Logical Models:
    -- Relational Model
    -- Network Model
    -- Hierarchical Model

    Physical Data Models:
    -- Unifying Model
    -- Frame-Memory Model

  • Data Independence

    The capability of changing a database scheme at one level without having to change the scheme at the next higher level. There are two levels of data independence: physical and logical.

  • Data Abstraction

    -- Physical level: describes how data are actually stored
    -- Logical level: describes what data are stored
    -- View level: describes only the part of the database useful to a particular user

    [Figure: View 1, View 2, ..., View n at the top; the logical level beneath them; the physical level at the bottom]

  • Database Languages

    Data definition language (DDL): used to specify the conceptual database scheme.

    Data manipulation language (DML): a query language used to retrieve and update information in a database.

    Host language: a conventional high-level language used to write application programs.
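    The three roles can be seen side by side in a short sketch that uses Python as the host language and SQL, through the built-in sqlite3 module, for DDL and DML. The employee table, its columns, and the salary increase are hypothetical.

```python
import sqlite3

# Host language: Python drives the session and holds the results.
conn = sqlite3.connect(":memory:")

# DDL: define the structure of the database.
conn.execute(
    "CREATE TABLE employee (name TEXT, salary INTEGER, dept_no INTEGER)"
)

# DML: insert and update information.
conn.execute("INSERT INTO employee VALUES (?, ?, ?)", ("Krishnan", 50000, 10))
conn.execute(
    "UPDATE employee SET salary = salary + 5000 WHERE name = ?",
    ("Krishnan",),
)

# DML: retrieve information.
salary = conn.execute(
    "SELECT salary FROM employee WHERE name = ?", ("Krishnan",)
).fetchone()[0]
print(salary)
```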

  • DBMS Structure

  • Relations


  • Codd's Rules for RDBMS

    Rule 1: The Information Rule
    All data should be presented to the user in table form.

    Rule 2: Guaranteed Access Rule
    All data should be accessible without ambiguity. This can be accomplished through a combination of the table name, primary key, and column name.

  • Rule 3: Systematic Treatment of Null Values
    A field should be allowed to remain empty. This involves the support of a null value, which is distinct from an empty string or a number with a value of zero. Of course, this can't apply to primary keys. In addition, most database implementations support the concept of a not null field constraint that prevents null values in a specific table column.

    Rule 4: Dynamic On-Line Catalog Based on the Relational Model
    A relational database must provide access to its structure through the same tools that are used to access the data. This is usually accomplished by storing the structure definition within special system tables.
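    Rules 3 and 4 can be observed in a small sketch using Python's built-in sqlite3 module. SQLite is chosen here only because it ships with Python; the employee table and its columns are hypothetical, and `sqlite_master` is SQLite's own system table (other products name their catalogs differently).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_no INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        phone  TEXT
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'Krishnan', NULL)")
conn.execute("INSERT INTO employee VALUES (2, 'Hayes', '')")

# Rule 3: a null value is distinct from an empty string.
null_count = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE phone IS NULL"
).fetchone()[0]
empty_count = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE phone = ''"
).fetchone()[0]

# A not null constraint prevents null values in a specific column.
try:
    conn.execute("INSERT INTO employee VALUES (3, NULL, NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Rule 4: the structure is queried with the same language as the data.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
]
print(null_count, empty_count, rejected, tables)
```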

  • Rule 5: Comprehensive Data Sub-language Rule
    The database must support at least one clearly defined language that includes functionality for data definition, data manipulation, data integrity, and database transaction control. All commercial relational databases use forms of the standard SQL (Structured Query Language) as their supported comprehensive language.

    Rule 6: View Updating Rule
    Data can be presented to the user in different logical combinations, called views. Each view should support the same full range of data manipulation that direct access to a table has available. In practice, providing update and delete access to logical views is difficult and is not fully supported by any current database.

  • Rule 7: High-level Insert, Update, and Delete
    Data can be retrieved from a relational database in sets constructed of data from multiple rows and/or multiple tables. This rule states that insert, update, and delete operations should be supported for any retrievable set rather than just for a single row in a single table.

    Rule 8: Physical Data Independence
    The user is isolated from the physical method of storing and retrieving information from the database. Changes can be made to the underlying architecture (hardware, disk storage methods) without affecting how the user accesses it.

  • Rule 9: Logical Data Independence
    How a user views data should not change when the logical structure (table structure) of the database changes. This rule is particularly difficult to satisfy. Most databases rely on strong ties between the user view of the data and the actual structure of the underlying tables.

    Rule 10: Integrity Independence
    The database language (like SQL) should support constraints on user input that maintain database integrity. This rule is not fully implemented by most major vendors. At a minimum, all databases do preserve two constraints through SQL:
    -- No component of a primary key can have a null value.
    -- If a foreign key is defined in one table, any value in it must exist as a primary key in another table.
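    The two minimum constraints named in Rule 10 can be exercised in a sketch with Python's built-in sqlite3 module. The department and employee tables and the `is_rejected` helper are hypothetical. Two SQLite-specific details are relied on: foreign key checks must be switched on with a PRAGMA, and the key columns are declared NOT NULL explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (
        dept_no TEXT NOT NULL PRIMARY KEY,
        name    TEXT
    );
    CREATE TABLE employee (
        emp_no  TEXT NOT NULL PRIMARY KEY,
        dept_no TEXT REFERENCES department(dept_no)
    );
    INSERT INTO department VALUES ('D10', 'Sales');
""")


def is_rejected(sql):
    """Run one statement and report whether a constraint blocked it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Entity integrity: no component of a primary key may be null.
null_key_rejected = is_rejected("INSERT INTO employee VALUES (NULL, 'D10')")

# Referential integrity: a foreign key value must exist in the parent table.
dangling_fk_rejected = is_rejected("INSERT INTO employee VALUES ('E1', 'D99')")

# A row that satisfies both constraints is accepted.
valid_row_rejected = is_rejected("INSERT INTO employee VALUES ('E1', 'D10')")

print(null_key_rejected, dangling_fk_rejected, valid_row_rejected)
```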

  • Rule 11: Distribution Independence
    A user should be totally unaware of whether or not the database is distributed (whether parts of the database exist in multiple locations). A variety of reasons make this rule difficult to implement; I will spend time addressing these reasons when we discuss distributed databases.

    Rule 12: Non-subversion Rule
    There should be no way to modify the database structure other than through the multiple row database language (like SQL). Most databases today support administrative tools that allow some direct manipulation of the data structure.

  • Functional Dependency

    Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R (written X → Y), if and only if each X value is associated with at most one Y value. Customarily we call X the determinant set and Y the dependent attribute.
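    The definition translates directly into a check over the rows of a table: X → Y is violated as soon as one X value appears with two different Y values. The sketch below is illustrative only; the `holds` function and the supplier rows are hypothetical. Such a check can refute a dependency for a given set of rows, but it cannot prove that the dependency holds for every possible state of the relation.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent holds in the given rows."""
    seen = {}
    for row in rows:
        x_value = tuple(row[attr] for attr in determinant)
        y_value = row[dependent]
        # setdefault stores the first Y value seen for this X value.
        if seen.setdefault(x_value, y_value) != y_value:
            return False
    return True


# Hypothetical sample rows.
rows = [
    {"s#": "s1", "city": "London", "status": 20, "p#": "p1", "qty": 300},
    {"s#": "s1", "city": "London", "status": 20, "p#": "p2", "qty": 200},
    {"s#": "s2", "city": "Paris", "status": 10, "p#": "p1", "qty": 300},
    {"s#": "s3", "city": "Paris", "status": 10, "p#": "p2", "qty": 400},
]

print(holds(rows, ["s#"], "city"))        # each supplier has one city
print(holds(rows, ["city"], "s#"))        # Paris has two suppliers
print(holds(rows, ["s#", "p#"], "qty"))   # composite determinant
```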

  • Properties of functional dependencies

    Given that X, Y, and Z are sets of attributes in a relation R, one can derive several properties of functional dependencies. Among the most important are Armstrong's axioms:

    -- Reflexivity: if Y is a subset of X, then X → Y
    -- Augmentation: if X → Y, then XZ → YZ
    -- Transitivity: if X → Y and Y → Z, then X → Z

  • Closure of a set of Functional Dependencies

    Given a set F of functional dependencies, there are certain other functional dependencies that are logically implied by F. The set of all functional dependencies logically implied by F is the closure of F, denoted by F+. F+ can be found by repeatedly applying Armstrong's axioms (reflexivity, augmentation, and transitivity).

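  • Closure of a set of Functional Dependencies

    We can further simplify the computation of F+ by using additional rules:

    -- Union: if X → Y and X → Z, then X → YZ
    -- Decomposition: if X → YZ, then X → Y and X → Z
    -- Pseudotransitivity: if X → Y and WY → Z, then WX → Z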

  • Closure of a set of Functional Dependencies (example)

    R = (A, B, C, G, H, I)
    F = { A → B, A → C, CG → H, CG → I, B → H }

    Some members of F+:

    -- A → H (transitivity: A → B, B → H)
    -- AG → I (pseudotransitivity: A → C, CG → I)
    -- CG → HI (union: CG → H, CG → I)
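    Enumerating all of F+ by hand grows quickly, so membership is usually tested through the closure of an attribute set instead: X → Y is in F+ exactly when Y is contained in the closure of X under F. The sketch below uses that standard technique rather than the rule-by-rule derivation on the slide; the `closure` function and the encoding of F as pairs of Python sets are assumptions.

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under the dependencies `fds`.

    `fds` is a list of (left side, right side) pairs, each a set of attributes.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already implied, so is the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# F from the example above.
F = [
    ({"A"}, {"B"}),
    ({"A"}, {"C"}),
    ({"C", "G"}, {"H"}),
    ({"C", "G"}, {"I"}),
    ({"B"}, {"H"}),
]

print(sorted(closure({"A"}, F)))       # contains H, so A -> H is in F+
print(sorted(closure({"A", "G"}, F)))  # contains I, so AG -> I is in F+
print(sorted(closure({"C", "G"}, F)))  # contains H and I, so CG -> HI is in F+
```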

  • NORMALIZATION

  • Normalization

    A company obtains parts from a number of suppliers. Each supplier is located in one city. A city can have more than one supplier located there, and each city has a status code associated with it. Each supplier may provide many parts. The company creates a simple relational table to store this information, which can be expressed in relational notation as:

    FIRST (s#, status, city, p#, qty)

  • First Normal Form (1 NF)

  • 1 NF Anomalies

    The table FIRST contains redundant data. For example, information about the supplier's location and the location's status has to be repeated for every part supplied. Redundancy causes what are called update anomalies.

    INSERT. The fact that a certain supplier (s5) is located in a particular city (Athens) cannot be added until that supplier has supplied a part.

    DELETE. If a row is deleted, then not only is the information about quantity and part lost, but also information about the supplier.

    UPDATE. If supplier s1 moved from London to New York, then six rows would have to be updated with this new information.

  • Second Normal Form (2 NF)

  • 2 NF

    Functional dependencies in the FIRST table:

    s# → city, status
    city → status
    (s#, p#) → qty

  • 2 NF

    The process for transforming a 1NF table to 2NF is:

    1. Identify any determinants other than the composite key, and the columns they determine.
    2. Create and name a new table for each determinant and the unique columns it determines.
    3. Move the determined columns from the original table to the new table. The determinant becomes the primary key of the new table.
    4. Delete the columns you just moved from the original table, except for the determinant, which will serve as a foreign key.
    5. The original table may be renamed to maintain semantic meaning.

  • 2 NF

    To transform FIRST into 2NF, we move the columns s#, status, and city to a new table called SECOND. The column s# becomes the primary key of this new table.

  • 2 NF Anomalies

    Tables in 2NF but not in 3NF still contain modification anomalies. In the example of SECOND, they are:

    INSERT. The fact that a particular city has a certain status (Rome has a status of 50) cannot be inserted until there is a supplier in the city.

    DELETE. Deleting any row in SECOND destroys the status information about the city as well as the association between supplier and city.

  • Third Normal Form (3 NF)

  • 3 NF

    Table PARTS (s#, p#, qty), which is what remains of FIRST, is already in 3NF. The non-key column, qty, is fully dependent upon the primary key (s#, p#).

    The supplier table SECOND is in 2NF but not in 3NF because it contains a transitive dependency: status depends on s# only through city.

    s# → status
    s# → city
    city → status

  • 3 NF

    The process of transforming a table into 3NF is:

    1. Identify any determinants, other than the primary key, and the columns they determine.
    2. Create and name a new table for each determinant and the unique columns it determines.
    3. Move the determined columns from the original table to the new table. The determinant becomes the primary key of the new table.
    4. Delete the columns you just moved from the original table, except for the determinant, which will serve as a foreign key.
    5. The original table may be renamed to maintain semantic meaning.

  • 3 NF

    Create a new table called CITY_STATUS and move the columns city and status into it. Status is deleted from the original table, city is left behind to serve as a foreign key to CITY_STATUS, and the original table is renamed to SUPPLIER_CITY to reflect its semantic meaning.

  • 3 NF

    Putting the original table into 3NF results in three tables. These can be represented in "pseudo-SQL" as:

    PARTS (s#, p#, qty)
        Primary Key (s#, p#)
        Foreign Key (s#) references SUPPLIER_CITY.s#

    SUPPLIER_CITY (s#, city)
        Primary Key (s#)
        Foreign Key (city) references CITY_STATUS.city

    CITY_STATUS (city, status)
        Primary Key (city)
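    A small Python sketch of this decomposition: the original table is projected onto the three column sets, and the projections are joined back on their keys to confirm that no information was lost. The sample rows and the helper functions `project` and `as_set` are hypothetical; only the table and column names come from the slides.

```python
def project(rows, cols):
    """Project rows onto the given columns, dropping duplicate rows."""
    result = []
    for row in rows:
        projected = {c: row[c] for c in cols}
        if projected not in result:
            result.append(projected)
    return result


def as_set(rows):
    """Order-independent representation of a list of row dictionaries."""
    return {tuple(sorted(row.items())) for row in rows}


# Hypothetical contents of FIRST (s#, status, city, p#, qty).
first = [
    {"s#": "s1", "status": 20, "city": "London", "p#": "p1", "qty": 300},
    {"s#": "s1", "status": 20, "city": "London", "p#": "p2", "qty": 200},
    {"s#": "s2", "status": 10, "city": "Paris", "p#": "p1", "qty": 300},
    {"s#": "s3", "status": 10, "city": "Paris", "p#": "p2", "qty": 400},
    {"s#": "s4", "status": 20, "city": "London", "p#": "p4", "qty": 300},
]

parts = project(first, ["s#", "p#", "qty"])
supplier_city = project(first, ["s#", "city"])
city_status = project(first, ["city", "status"])

# Join the three tables back together on their foreign keys.
rejoined = [
    {**p, **s, **c}
    for p in parts
    for s in supplier_city if s["s#"] == p["s#"]
    for c in city_status if c["city"] == s["city"]
]

print(len(parts), len(supplier_city), len(city_status))
print(as_set(rejoined) == as_set(first))
```

    Each city's status is now stored once in CITY_STATUS instead of once per part supplied, which is the redundancy the decomposition removes.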

  • Advantages of Third Normal Form

    The advantage of having relational tables in 3NF is that it eliminates redundant data, which in turn saves space and reduces manipulation anomalies. For example, the improvements to our sample database are:

    INSERT. Facts about the status of a city (Rome has a status of 50) can be added even though there is no supplier in that city. Likewise, facts about new suppliers can be added even though they have not yet supplied parts.

    DELETE. Information about parts supplied can be deleted without destroying information about a supplier or a city.

    UPDATE. Changing the location of a supplier or the status of a city requires modifying only one row.

  • Boyce/Codd Normal Form

    Determinant: an attribute or a group of attributes on which some other attribute is fully functionally dependent.

    Boyce/Codd Normal Form: a relation is in BCNF if and only if every determinant is a candidate key.
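    The definition can be checked mechanically: for every non-trivial dependency, the closure of its left side must cover all attributes of the relation. The sketch below tests the left side for being a superkey, the usual formulation, which agrees with "candidate key" when left sides are minimal. The `closure` and `is_bcnf` functions are assumptions for illustration; the tables and dependencies are those of the supplier example above.

```python
def closure(attrs, fds):
    """Closure of an attribute set under a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_bcnf(attributes, fds):
    """True if every non-trivial dependency has a superkey on its left side."""
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency, always allowed
        if closure(lhs, fds) != set(attributes):
            return False
    return True


# Supplier table before the 3NF step: city is a determinant but not a key.
second_ok = is_bcnf(
    {"s#", "city", "status"},
    [({"s#"}, {"city"}), ({"city"}, {"status"})],
)

# The tables after decomposition.
supplier_city_ok = is_bcnf({"s#", "city"}, [({"s#"}, {"city"})])
city_status_ok = is_bcnf({"city", "status"}, [({"city"}, {"status"})])
parts_ok = is_bcnf({"s#", "p#", "qty"}, [({"s#", "p#"}, {"qty"})])

print(second_ok, supplier_city_ok, city_status_ok, parts_ok)
```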

  • www.igate.com

    Where the hierarchical model structures data as a tree of records, with each record having one parent record and many children, the network model allows each record to have multiple parent and child records, forming a lattice structure.

    The chief argument in favor of the network model, in comparison to the hierarchical model, was that it allowed a more natural modeling of relationships between entities. Although the model was widely implemented and used, it failed to become dominant for two main reasons. Firstly, IBM chose to stick to the hierarchical model in their established products such as IMS and DL/I. Secondly, it was eventually displaced by the relational model, which offered a higher-level, more declarative interface.

    The relational model consists of three components:

    1. A structural component: a set of tables (also called relations).
    2. A manipulative component: a set of high-level operations which act upon and produce whole tables.
    3. A set of rules for maintaining the integrity of the database.

    The fourth and fifth normal forms (4NF and 5NF) deal specifically with the relationship of attributes comprising a multi-attribute primary key. Sixth normal form (6NF) only applies to temporal databases.

    Put most simply, a table is in BCNF if it is in 3NF and its only determinants are candidate keys.