db01-definitions and concepts

Upload: jeet-sarkar

Post on 06-Apr-2018


  • 8/3/2019 DB01-Definitions and Concepts

    1/18

    DATABASE-1: Definitions and Concepts

    Overview

    History of Database

    What is a Database?

    Components of a Database

    Different Database Users

    Data Abstraction

    Database Schema and Instances

    Database Languages (DDL, DML, DQL, DCL)

    Data Models

    Transaction Management

    Database Management System

    Data Administrator (DA) & Database Administrator (DBA)

    Advantages and Disadvantages of a DBMS

    History of Database

    Data may be defined as a collection of related information. For example, during a census, information related to different people, such as name, age, occupation and income, may be collectively stored for each person, thus forming a list of related data. To make use of this huge amount of data, it needs to be stored so that it can be accessed efficiently whenever needed.

    Before the advent of computers, paper index cards were used to store information and maintain a catalogue of different types of related data. For example, a library would use different sets of index cards: one set to keep a record of its books and another set to keep a record of its members. When a new book was bought, a separate index card was added to the existing list with information related to the new book. Similarly, when a new member joined the library, a new card was added to the member set with information related to the member and the books borrowed.

    With the emergence of computers, instead of using paper index cards, data were stored in separate computer files in the form of records. Each file contained a group of related records. Thus the library would now have a file containing records of the different books and a separate file containing records of the different members.

    The figure to the right shows one such application, where a manufacturing company is using three separate files for its business.

    The first file, the Supplier file, stores a list of names, addresses, telephone numbers, and contact persons of the different suppliers who supply raw materials to the company for making its products.

    The second file, the Order file, stores details of the purchase orders placed on the different suppliers for raw materials.

    The third file, the Payment file, stores details of the payments made to the different suppliers against their purchase orders, based on their terms of payment.

    The users of the different files interact with the files by means of specific application programs. Thus the Supplier file user interacts with the Supplier file through the Supplier Processing Application program, the Order file user uses the Order Processing Application program to access the Order file, and the Payment file user runs the Payment Processing Application program to access the Payment file.

    [Figure: Book Index Cards and Member Index Cards. Sample book card: No. CJ2ed/0221, Cobol Made Easy, J. K. Jones, Second Edition, $10.95. Sample member card: A. Banerjee, 2/3 Park Circus, Calcutta 700019; books borrowed: 1. MK1ed/2341, 2. CG1ed/0542.]

    [Figure: File processing system. The Supplier File User, Order File User and Payment File User each work through their own program (Supplier Processing Application, Order Processing Application, Payment Processing Application) on a separate file (Supplier Data File, Order Data File, Payment Data File).]

    DB01 Definitions and Concepts Page 1 of 18 Joyrup Bhattacharya

    Although File Processing Systems were an improvement over the earlier manual system of keeping records, they had some major drawbacks:

    1. Data Duplication: The same data may need to be stored in different files. For example, in the above application, when an order is placed on a supplier, the Order file should also contain the Supplier Name and Address. Thus the same data, i.e. the Supplier Name and Address, occurs in the Supplier file and also in the Order file. Moreover, if several orders are placed on the same supplier at different times, the same data will occur even more often.

    Two problems arise due to this. Firstly, unnecessary space is used up to store the same data at different places. The second problem is that of data integrity. A collection of data is said to have integrity if it is logically consistent everywhere. For example, if a supplier changes its address, then the same change needs to be made in all the files where the address of the supplier is stored. However, due to manual or other mistakes the data may not get updated everywhere, leading to data inconsistency and data integrity problems.

    2. Separated and isolated Data: Usually different files are used to contain different information. In case data needs to be combined from these different files, the programmer must know which data needs to be selected from which file before combining them to form a third file. For example, the Supplier file, Order file and Payment file contain different information. If the user wants to find the payments that are due in a particular month, he has to refer to both the Order file and the Payment file to get the required result. In case data needs to be combined from many files, the process becomes complex for the programmer.

    3. File format dependent Applications: In file processing systems, application programs are written based on the data files on which they work. Thus the actual formats of the data in the files are an integral part of the application program code, i.e. there is a dependency between the data files and the application programs that work on those files. If any modification is made to a data type in a file, then all application programs that use that data file also need to be modified. For example, all application programs that process files containing phone numbers need to be modified if the phone number changes from 7 to 8 digits. Modifying several programs is a time-consuming and error-prone job.

    4. File Incompatibility: In case different programming platforms are used to develop the application programs, the formats of the data files on which these programs act may also be different for each of the programming languages used. Thus a C program data file may be different from a Visual Basic program data file. In such a situation, if there is a requirement to combine data from the different data files, then the data files first need to be converted to a common format and then used. This is both a complex and time-consuming task.

    5. User unfriendly data representation: In a file processing system it is difficult to combine data from different files and display them in a user-friendly manner based on the specific requirements of the end user. This is because it is difficult to process relationships between data held in different files in a file processing system.
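    The data integrity problem described under Data Duplication can be sketched in a few lines of Python. This is only an illustration: the two lists stand in for the Supplier and Order files, and the order numbers, items and the new address "Pune" are made-up values, not taken from the text.

```python
# Hypothetical in-memory stand-ins for the Supplier and Order data files.
# Each "file" keeps its own copy of the supplier's address.
supplier_file = [
    {"SupID": 1, "SupName": "Godrej", "SupAddress": "Mumbai"},
]
order_file = [
    {"OrdNo": 101, "SupName": "Godrej", "SupAddress": "Mumbai", "Item": "Steel"},
    {"OrdNo": 102, "SupName": "Godrej", "SupAddress": "Mumbai", "Item": "Paint"},
]


def update_supplier_address(name, new_address):
    """Update the address in the Supplier file only (the easy-to-make mistake)."""
    for record in supplier_file:
        if record["SupName"] == name:
            record["SupAddress"] = new_address


def find_inconsistent_orders():
    """Return order numbers whose stored address disagrees with the Supplier file."""
    current = {r["SupName"]: r["SupAddress"] for r in supplier_file}
    return [o["OrdNo"] for o in order_file
            if o["SupAddress"] != current[o["SupName"]]]


update_supplier_address("Godrej", "Pune")
stale_orders = find_inconsistent_orders()
print(stale_orders)  # [101, 102]
```

    Every copy of the address that was not touched by the update is now stale, which is exactly the inconsistency that a single, centrally stored copy avoids.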

    Database technologies were developed to overcome these difficulties. Unlike File Processing systems, in Database Processing systems the user application programs do not interact directly with the stored data, but through an intermediate system called the Database Management System or DBMS. In doing so, the application programs become independent of the way in which the data is actually stored.


    What is a Database?

    To overcome the shortfalls of the File Processing systems, Database technologies were developed during the 1960s, with a major effort made by IBM Corporation. Unlike File Processing systems, whose main components are application-program-dependent data files, the main component of a Database Processing System is a Database. A database is usually a collection of information or data related to a particular topic or subject. This data can be purely textual in nature, like the name, address and contact number of a person in a telephone directory. It can contain graphical data, like a photograph of the person along with his name, or it can be a collection of audio/video files, as in a music album database. Moreover, in a database the structure of the stored data should be independent of the application programs that may use this data. In general a database should have the following characteristics:

    1. Data Independence: In file processing systems, the application programs are dependent on the data structures of the data files on which they act. Thus if a data format is changed for better efficiency or accuracy, or new data items are added, the application programs also need to be changed to accommodate the changes. In a database system, data items are stored independently of any application program. Application programs interact with the Database Management System, which in turn interacts with the database, making the application programs independent of changes made to the database.

    This ability to modify the data scheme at one level without affecting the data scheme at a higher level is called data independence. There are two kinds of data independence: Physical Data Independence (also called program data independence) and Logical Data Independence. The differences include:

    Physical Data Independence:

    a) It is the ability to modify the physical schema of data storage without the need to rewrite the application programs that access the database.

    b) It leaves the users' views and methods of accessing the information unaffected by changes made to the physical organisation of data at the physical or internal level.

    c) Such modifications are usually done to improve the overall performance of a database.

    d) Application programs do not depend much on the physical structure of the data. Hence it is relatively easy to achieve physical data independence.

    e) Example: Changing the file organisation from sequential to random access to improve performance.

    Logical Data Independence:

    a) It is the ability to modify the underlying logical or conceptual schema of a database without the need to rewrite application programs.

    b) It leaves the users' views and methods of accessing the information unaffected by changes made to the structure of the database at the conceptual level.

    c) It allows the logical structure of the database to be altered dynamically in case a change is required.

    d) Since application programs are highly dependent on the logical structure of the data, it is difficult to achieve logical data independence.

    e) Example: Adding a new field, like the mobile phone number, to an existing record of a person in a company.

    2. Data Integrity: In a database a particular item of data is kept at a single place, avoiding duplication of data. This ensures that a data update needs to be made at a single point only, eliminating the chance of inconsistent copies.

    3. Data Flexibility: In a database, the same data can be accessed from many places simultaneously and in different ways, based on the requirements of the respective application programs.


    4. User Friendly Interface: The end user is not required to bother about the actual or physical storage of data. Highly technical software called the Database Management System takes care of the low-level data structures and the relationships between the different data. Thus the complexity of the data and the implementation details are hidden by the DBMS, and the end user can access the required data with a minimum of technical knowledge.
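    The two kinds of data independence listed under characteristic 1 can be illustrated with a small sketch. It assumes SQLite (through Python's standard sqlite3 module) as the DBMS; the column name SupMobile and the index name are hypothetical. The "application program" is the function suppliers_in, which is never changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Supplier (SupID INTEGER PRIMARY KEY, SupName TEXT, SupCity TEXT)")
conn.executemany(
    "INSERT INTO Supplier (SupID, SupName, SupCity) VALUES (?, ?, ?)",
    [(1, "Godrej", "Mumbai"), (2, "Bajaj", "Kolkata"), (4, "Philips", "Kolkata")],
)

# The "application program": it names only the columns it needs.
QUERY = "SELECT SupName FROM Supplier WHERE SupCity = ? ORDER BY SupName"


def suppliers_in(city):
    return [row[0] for row in conn.execute(QUERY, (city,))]


before = suppliers_in("Kolkata")

# Physical change: add an index to speed up lookups by city.
conn.execute("CREATE INDEX idx_supplier_city ON Supplier (SupCity)")
after_physical = suppliers_in("Kolkata")

# Logical change: add a new field to every supplier record.
conn.execute("ALTER TABLE Supplier ADD COLUMN SupMobile TEXT")
after_logical = suppliers_in("Kolkata")

print(before, after_physical, after_logical)
```

    All three calls return the same list. Note that the logical change is harmless here only because the query names its columns explicitly; a program that depended on the exact number or order of columns would still need to be revisited, which is why logical data independence is the harder of the two to achieve.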

    We can thus formally define a Database as stated below.

    A database is a self-describing, shared collection of interrelated data from where users can efficiently retrieve information in response to specific queries.

    Components of a Database

    The self-describing nature of a database implies that we do not have to rely on any external information to find out what the data in the database represent or how the different data components are related. In a File Processing system, the data files contain only the data, and the description of the data is a part of the Application Programs that access the data files. On the other hand, in a Database Processing system, the data description is built into the database along with the data. In the Supplier table of this section, the data part consists of the values 0001, Godrej, Mumbai, 21365871 etc., whereas the headings SupID, SupName, SupCity and SupPhone describe the database structure and the meaning of each data item. Both of these are built into the database.

    The data related to the structure or description of a database is called Metadata or the Data Dictionary. For example, Metadata includes the table names, the column names, and the properties of the columns in the tables, like the data-type and the length of each data-type. This Metadata is usually stored in the form of tables called System Tables.

    Thus in our previous example of the manufacturing company, we basically had three separate tables of data, viz. Supplier, Order and Payment. Similarly, each table contained several columns in which data was divided and stored. The System Tables that can be used to describe the database for the above company are the TABLES and COLUMNS tables given in this section.

    We have two System Tables containing the Metadata. These describe the different TABLES (first table) and the different COLUMNS in the various tables (second table).

    The first table contains the names of the different tables that comprise the database, the number of columns in each table and the Primary Keys.

    Similarly, the second table contains details about the different columns that form each of the tables. The column names, the data-types of the data they contain and the lengths of each data-type are included as information needed to describe the database.
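    Most DBMSs let the metadata be queried like any other data. The sketch below assumes SQLite, accessed through Python's standard sqlite3 module; sqlite_master and PRAGMA table_info are SQLite's own catalogue facilities and other products name theirs differently. The column types are chosen for the illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Supplier ("
    "SupID INTEGER PRIMARY KEY, SupName TEXT, SupCity TEXT, SupPhone INTEGER)"
)

# sqlite_master is SQLite's own system table listing the objects in the database.
table_names = [row[0] for row in
               conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2], bool(row[5]))
           for row in conn.execute("PRAGMA table_info(Supplier)")]

print(table_names)
for name, data_type, is_primary_key in columns:
    print(name, data_type, is_primary_key)
```

    The first query plays the role of the TABLES system table and the second that of the COLUMNS system table: neither reads any supplier data, only the description of it.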

    Supplier

    SupID  SupName  SupCity  SupPhone
    0001   Godrej   Mumbai   21365871
    0002   Bajaj    Kolkata  24175552
    0003   Steelco  Chennai  32651478
    0004   Philips  Kolkata  2422369

    System Tables (Metadata / Data Dictionary)

    TABLES

    Table Name  Number of Columns  Primary Key
    Supplier    4                  SupID
    Order       6                  OrdNo
    Payment     4                  ReceiptNo

    COLUMNS

    Column Name  Table Name  Data Type  Length
    SupID        Supplier    Integer    2
    SupName      Supplier    Text       20
    SupCity      Supplier    Text       20
    SupPhone     Supplier    Integer    2
    OrdNo        Order       Integer    2
    Date         Order       Text       10
    SupIDF       Order       Integer    2
    Item         Order       Text       30
    Rate         Order       Float      4
    Qty          Order       Integer    2
    ReceiptNo    Payment     Integer    2
    OrdNoF       Payment     Integer    2
    Date         Payment     Text       10
    Amount       Payment     Float      4

    To improve database performance, a database contains another kind of data called indexes. Suppose it is required to list the names of all suppliers in a particular city. Since the database may be stored as a file sorted by the ID number of the suppliers, it would be a time-consuming process to examine each and every record to find the names located in a particular city. To speed up the process, a special data structure called an index may also be maintained by the database. It is similar to finding a name in a telephone directory. The index stores the different cities in alphabetical order and relates each city to the respective supplier ID, as in the CityIndex table. It is easier to find the city in alphabetical order in the index and then find the supplier name from the SupID given against each city name. The DBMS looks up the supplier name in the Supplier table by matching the SupID given in the index. We will discuss more about indexes in a later section.

    CityIndex

    SupCity  SupID
    Chennai  0003
    Kolkata  0002
    Kolkata  0004
    Mumbai   0001
    Mumbai   0005

    A database may also contain data about the applications that use the database. These may include the structure of the different data entry forms, the different types of reports or queries etc. This last category of data is called Application Metadata.
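    The city index lookup described earlier in this section can be sketched in Python. This is a simplified model, not how any particular DBMS implements its indexes: the index is a sorted list of (SupCity, SupID) pairs, searched with the standard bisect module, and only the four supplier rows visible in the Supplier table are used.

```python
from bisect import bisect_left

# Supplier rows keyed by SupID (values taken from the Supplier table).
suppliers = {
    "0001": ("Godrej", "Mumbai"),
    "0002": ("Bajaj", "Kolkata"),
    "0003": ("Steelco", "Chennai"),
    "0004": ("Philips", "Kolkata"),
}

# The index: (SupCity, SupID) pairs kept in sorted order, like CityIndex.
city_index = sorted((city, sup_id) for sup_id, (_, city) in suppliers.items())


def suppliers_in_city(city):
    """Binary-search the index for the city, then fetch rows by SupID."""
    names = []
    pos = bisect_left(city_index, (city, ""))
    while pos < len(city_index) and city_index[pos][0] == city:
        sup_id = city_index[pos][1]
        names.append(suppliers[sup_id][0])
        pos += 1
    return names


print(suppliers_in_city("Kolkata"))  # ['Bajaj', 'Philips']
```

    Because the index is sorted by city, the search jumps straight to the first matching entry instead of reading every supplier record; the price is that the index must be kept up to date whenever a supplier is added, removed or moved.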

    We can summarise the components of a database by the diagram to the right. The basic unit of information stored in a database is the bit. Bits combine to form characters (both strings and numbers). These strings and numbers are collected to form different fields, which in turn form records. Several records are collected to form data files. Data files, along with other special data structures like Metadata, Application Metadata and Indexes, form the Database.

    [Figure: Components of a database. From the bottom up: BITS, CHARACTERS, FIELDS, RECORDS, and DATABASE = FILES + METADATA + INDEXES.]

    The term "shared collection" in the description of a database implies that all data is stored centrally in the database. This central data is then shared by every individual who has access to the particular data. Data is not stored in different individual files as per the needs of different individuals, with the same data repeating in more than one file, as in a file processing system. Different application programs fetch the data from the central database, where a particular item of data is stored only once.

    The next term, "interrelated data", implies that the data stored as different relations or tables are not independent but are related to each other. In the above example, the Order table is related to the Supplier table through the SupID attribute. If we know the SupID from the Order table, we can find the phone number of the corresponding supplier from the Supplier table by matching the SupID numbers in both tables. Therefore the data stored in different tables are related to each other by means of special attributes or keys (discussed in detail in later sections).

    The final part of the definition indicates that the information stored in a database can be efficiently updated and retrieved by the users by writing specific queries in a data query language like SQL. The queries are submitted to the database management system, which responds by combining data from different tables and presenting the required data to the end user in a convenient form.
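    The matching of SupID values between the Order and Supplier tables is what SQL calls a join. A minimal sketch, assuming SQLite through Python's sqlite3 module: the column SupIDF follows the COLUMNS system table, while the order numbers and items are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Supplier (SupID INTEGER PRIMARY KEY, SupName TEXT, SupPhone INTEGER);
    -- "Order" is quoted because ORDER is a reserved word in SQL.
    CREATE TABLE "Order" (
        OrdNo  INTEGER PRIMARY KEY,
        SupIDF INTEGER REFERENCES Supplier (SupID),
        Item   TEXT
    );
    INSERT INTO Supplier VALUES (1, 'Godrej', 21365871), (2, 'Bajaj', 24175552);
    INSERT INTO "Order" VALUES (101, 2, 'Steel sheet'), (102, 1, 'Paint');
""")

# For each order, look up the supplier whose SupID matches the order's SupIDF.
rows = conn.execute("""
    SELECT o.OrdNo, s.SupName, s.SupPhone
    FROM "Order" AS o
    JOIN Supplier AS s ON s.SupID = o.SupIDF
    ORDER BY o.OrdNo
""").fetchall()

print(rows)  # [(101, 'Bajaj', 24175552), (102, 'Godrej', 21365871)]
```

    The supplier's name and phone number are stored once, in Supplier, and are combined with the order data only when the query is run.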

    Database Users

    The main aim of a database is to provide ways of storing and retrieving information in an efficient manner. To do this, different kinds of people may need to access or handle the database, both during the development and during the implementation stage. These users include the general public accessing a public database like a railway reservation database. They may include company executives handling confidential data in the company database. At the other end we have the computer professionals engaged in developing a database and the data-entry operators engaged in entering the raw data into the database. Depending upon the type of use, we can classify database users into the following categories:

    1. Application Programmers: These are people who are engaged in developing general application programs to access databases. The application program is usually written in a base or host language (like C, Visual Basic etc.). Commands in a special Data Manipulation Language (like SQL) are then embedded within the host language code to access the database and perform data manipulations.

    2. Sophisticated Users: These are the people who interact with a database without writing application programs, by requesting information from the database through queries written in Data Manipulation Languages like SQL. These queries are processed by a query processor and submitted to a database storage manager to provide the necessary outputs. Analysts who may be required to analyse data based on certain criteria and generate special reports fall under this category.

    3. Specialised Users: These people are engaged in writing specialised database application programs involving complex data structures like graphics, audio or video data, or in writing special application programs to implement computer aided design systems.

    4. Inexperienced Users: These are end users who interact with a database through permanent application programs, like the menu-driven interfaces in a railway enquiry system or an automated bank teller machine.

    5. Database Administrators: In an organisation the Database Administrator (DBA) is the person who is responsible for overall control and fine tuning of the database to get the best performance. The DBA is responsible for maintaining the database server and providing users with access to the information they require, as and when required.
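    The difference between the first and the fourth category can be sketched as follows. Python is used here as the host language and SQLite as the DBMS purely for illustration; the Train table and its rows are hypothetical. The application programmer writes trains_from, with the SQL embedded in it; the inexperienced user only ever supplies the station name, typically through a menu.

```python
import sqlite3


def make_demo_db():
    """Build a tiny, made-up railway enquiry database in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Train (TrainNo INTEGER PRIMARY KEY, Origin TEXT, Destination TEXT)"
    )
    conn.executemany(
        "INSERT INTO Train VALUES (?, ?, ?)",
        [(1, "Kolkata", "Mumbai"), (2, "Kolkata", "Chennai"), (3, "Mumbai", "Chennai")],
    )
    return conn


def trains_from(conn, origin):
    """Application-program routine: the SQL is embedded in host-language code."""
    sql = "SELECT TrainNo, Destination FROM Train WHERE Origin = ? ORDER BY TrainNo"
    return conn.execute(sql, (origin,)).fetchall()


conn = make_demo_db()
result = trains_from(conn, "Kolkata")
print(result)  # [(1, 'Mumbai'), (2, 'Chennai')]
```

    A sophisticated user, by contrast, would type the SELECT statement directly into a query tool instead of going through such a routine.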

    Data Abstraction

    In a database, the stored data needs to be retrieved and manipulated efficiently. Complex algorithms and data structures have been developed to do this. However, not all users of the database are computer experts, and hence they may not be expected to understand these complex data structures in order to manipulate the data.

    To overcome this difficulty, the database approach provides some level of data abstraction, i.e. the developers of the database hide from the database users the details of how the data is actually stored. Instead, the system presents to the user a view of the data that is readily understandable by him. This simplifies the user's interaction with the database system, as it allows the user to manipulate the data without being concerned about the underlying mechanism by which the data actually gets stored.

    [Figure: Levels of data abstraction. View Level (View 1, View 2, ..., View n) at the top, the Logical Level in the middle and the Physical Level at the bottom.]

    In a database system, different levels of data abstraction are thus used to simplify the final data representation, i.e. to connect the raw data type to the final user view of the data. These levels include:

    1. Physical Level: This is the lowest level of data abstraction. At this level the complex low-level data structures used to store the data are described. For example, at the byte level the different records that comprise a database may be stored as a linear linked list, as a binary tree structure, as fixed length records or as variable length records. The data representation at the physical level thus describes how blocks of data consisting of bytes of raw data are stored in storage locations. The database system hides many of these lowest level storage details from the database programmers and the end users.

    2. Logical Level: This is the next higher level. At this level the data and the relationships that exist between those data are defined. The entire database is described in terms of relatively simple structures like data tables, though at the physical level this may involve manipulation of complex data structures. The logical level of abstraction is used by database administrators, who decide what information needs to be kept in the database and the relationships between the different data.

    For example, in a student database, the different aspects related to a student, like the student's personal data, accounts related data and academic performance related data, need to be defined, and the relationships that exist between these different aspects are established at the logical level.

    3. View Level: This is the highest level of data abstraction. In the case of a large database, some complexity may still remain at the logical level. Moreover, the majority of users will not be required to access the entire database, but will be concerned with only a part of it. Accordingly, depending upon the nature of use and the type of user, different user-friendly views of the database are defined. Apart from providing appropriate database views, this level also provides security to the database by giving selective access to different users.

    For example, in a student database, different views or forms may be provided at the view level, like the student personal data entry view, the student fees entry view, and the student marks entry and report card generation view. Of these, teachers may be given access only to the marks entry view, while the accounts department may be given access to the fees related data view, thus providing data security at these different view levels.
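    The student example can be sketched with SQL views. The table layout, view names and sample row below are hypothetical, and SQLite (via Python's sqlite3 module) is assumed as the DBMS. SQLite itself has no user accounts, so the sketch shows only how each view narrows what is visible; in a multi-user DBMS the selective access would additionally be enforced by granting each group rights on its own view only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (
        StudentID INTEGER PRIMARY KEY,
        Name      TEXT,
        Address   TEXT,
        FeesDue   REAL,
        Marks     INTEGER
    );
    INSERT INTO Student VALUES (1, 'A. Banerjee', '2/3 Park Circus', 500.0, 82);

    -- View for teachers: marks only, no address or fees.
    CREATE VIEW TeacherMarksView AS
        SELECT StudentID, Name, Marks FROM Student;

    -- View for the accounts department: fees only, no marks.
    CREATE VIEW AccountsFeesView AS
        SELECT StudentID, Name, FeesDue FROM Student;
""")

# cursor.description lists the columns each view exposes.
teacher_columns = [d[0] for d in conn.execute("SELECT * FROM TeacherMarksView").description]
accounts_columns = [d[0] for d in conn.execute("SELECT * FROM AccountsFeesView").description]
print(teacher_columns)   # ['StudentID', 'Name', 'Marks']
print(accounts_columns)  # ['StudentID', 'Name', 'FeesDue']
```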

    Database Schema and Instance

    The overall design and description of a database is in general called the database schema. The schema is used to define the following:

    a) The physical structure of the database, i.e. the data structures used to store the data physically in the database. It also specifies the character sets or symbols used to encode the data. ASCII is the best known character set.

    b) The logical structure of the database, i.e. the different relations or tables that comprise the database, the relationships between those tables and the different attributes of the relations.

    c) The different constraints or business rules that govern different transactions.

    d) Rules to determine who has access to the schema.

    Though the contents of a database may change over time, its schema, as determined at design time, is rarely changed. A database may have different types of schema at the different levels of data abstraction discussed earlier. Based on these, the Three-Schema Architecture has been developed to construct a database system. It consists of the following:

    1. Physical/Internal schema: This corresponds mainly to the physical data abstraction level and deals with the physical organisation of data. It forms the lowest level and describes the different data structures used and how the raw data gets stored at the byte level.

    2. Logical/Conceptual schema: This corresponds mainly to the logical data abstraction level. It is used to describe the logical structure of the database based on the different data types and the relationships that exist between those data types. It describes the different data operations possible and any constraint or business rule to be imposed on those data. The logical schema hides the details of physical storage structures from the developer or database administrator.

    3. Sub/External View schema: This corresponds to the view level of data abstraction and deals with the way a particular user application views the data from the database. It forms the highest level. Each view or external schema is used to describe the part of the database that a particular user group is interested in, and hides the rest of the database from that user group.

    [Figure: Three-Schema Architecture. User-1, User-2, ..., User-n work through View-1, View-2, ..., View-n of the Sub/External Views Schema, which sits on the Logical/Conceptual Schema, which sits on the Physical/Internal Schema above the Stored Database.]

    In general a database system supports one physical schema, one logical schema and several sub-schemas, as shown in the diagram above.

    When a new database is defined, we only specify the database schema to the DBMS. At this stage the state of the database is empty, as it contains no data. We get the initial state of the database only when the database is first filled with the initial data. Whereas a database schema describes the structure of a database, the database state or database instance is the collection of information stored in the database at any particular moment. At any point in time, a database has a current state or instance. It is the responsibility of the database management system to ensure that every instance of a database is a valid instance satisfying the various constraints specified in the schema. For example, in case a bank requires a minimum account balance of Rs. 1000, the DBMS should enforce this constraint to ensure that at no instance can a bank account have a balance of less than Rs. 1000.

    Unlike a database schema, an instance can change frequently, as and when data is added to, updated in or removed from the database. However, changes may need to be applied to a schema once in a while. For example, the mobile phone number or the email address may need to be added to the existing database of customers in a bank. This is known as schema evolution and is allowed by most modern DBMSs while a database is operational.
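    Both ideas, a constraint that every instance must satisfy and schema evolution on a live database, can be sketched with SQLite through Python's sqlite3 module. The Account table and its column names are assumptions made for this illustration; the Rs. 1000 limit comes from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account ("
    "AccNo INTEGER PRIMARY KEY, "
    "Balance REAL NOT NULL CHECK (Balance >= 1000))"
)
conn.execute("INSERT INTO Account VALUES (1, 5000)")

# An update that would create an invalid instance is rejected by the DBMS.
try:
    conn.execute("UPDATE Account SET Balance = Balance - 4500 WHERE AccNo = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

balance = conn.execute("SELECT Balance FROM Account WHERE AccNo = 1").fetchone()[0]

# Schema evolution: add an email column while the data stays in place.
conn.execute("ALTER TABLE Account ADD COLUMN Email TEXT")
row = conn.execute("SELECT AccNo, Balance, Email FROM Account").fetchone()

print(rejected, balance, row)  # True 5000.0 (1, 5000.0, None)
```

    The rule is stated once, in the schema, and the DBMS applies it to every change, so no application program can leave the account below the minimum balance.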

    Database Languages (DDL, DML, DQL, DCL)

    To implement and use a database, three broad classes of languages are used in general. These are the Data Definition Languages or DDL, the Data Manipulation Languages or DML (of which the Data Query Language or DQL is a subset) and the Data Control Languages or DCL. The functions and examples of these are described below:

    1. Data Definition Language (DDL): The design and structure of a database is usually specified in a specific language called a Data Definition Language. The DDL forms a link between the logical and physical structure of a database, i.e. between the way the user views the data and the way the data is physically stored. Once the DDL statements are written and compiled, they produce a set of table definitions, which are stored in a special file called a Data Dictionary or Data Directory. The major functions of the DDL are thus:

    a) To describe or create the logical schema or different relations in a database.

    b) To describe the data fields or attributes of each record, i.e. to describe each field's logical name, data-type, field length etc.

    c) To describe the relationships between the different relations.

    d) To describe the integrity constraints.

    e) To describe the specific keys and indexes for accessing the data.

    f) To provide means of data security and data restrictions.

    g) To provide means of logical and physical data independence.

    Examples of DDL statements in SQL include CREATE to establish a new table, ALTER to alter the structure of the database, DROP to delete tables from the database, and TRUNCATE to remove all records from a table.

    2. Data Manipulation Language (DML): Once the general structure of a database is formed using a DDL, the database can be accessed, filled and manipulated by the user using a Data Manipulation Language. The Data Query Language or DQL is a subset of DML and is used to write queries to retrieve specific data. DQL is very flexible and can be used to express quite complicated queries, sometimes very concisely. The different functions and characteristics of a DML include:

    a) Insert new information into the database.

    b) Retrieve existing information from the database based on certain criteria.

    c) Delete information from the database.

    d) Modify, sort and update information in the database.

    e) Enable users and application programs to process data on a logical basis rather than bother about how the data is physically organised.

    f) Support high-level languages (like COBOL, VB etc.) in which application programs are generally written. In general DML statements are embedded within the high-level host language in which an application program is written.

    In general there are basically two types of DML. These are:

    a) Procedural DMLs: In a procedural DML, to retrieve particular information, theuser has to specify both the specific data requirementalong with how toget that data. Procedural DMLs are more efficient than non-procedural

    languages. Example of a procedural approach include Relational Algebrawhich can be used to manipulate data organised in relations (tables) using thevarious relational operators. However relational algebra is hard to use anddue to their complexity they are generally not used in commercialdatabases.

    b) Non-procedural DMLs: In a non-procedural DML, to retrieve particular information, the user has to specify only the specific data requirement, without specifying the means to get that data. Since the user is not required to specify the means of getting the data, these languages may not generate very efficient code. Examples of non-procedural DMLs include Relational Calculus, Transform-Oriented Languages (e.g. SEQUEL, SQL), Query-by-Example and Query-by-Form (e.g. MS-Access). Of these, Relational Calculus is, due to its complexity, rarely used directly in commercial database processing.

    In Transform-Oriented Languages like SQL, the input data may be expressed as several relations (tables), which are then transformed to express the required result as a single relation (table).

    Query-by-Example and Query-by-Form are graphical languages. In these, the user is presented with a graphical interface in the form of a Data-Entry-Form. The database management system analyses the entries made by the user and generates the required queries.

    Examples of DML statements in SQL include SELECT to retrieve rows of data, INSERT to place new rows of data in the database, UPDATE to replace existing values in the database with new values, DELETE to delete rows of data, etc.

    3. Data Control Language (DCL): The Data Control Language defines activities that are not part of DDL or DML. DCL commands are used to control the distribution of access privileges to users. DCL also defines when proposed changes to a database can be made irreversible. Typically only the database administrator can execute DCL commands.

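    Examples of DCL statements in SQL include GRANT to give a user access privileges and REVOKE to withdraw them. COMMIT, which makes proposed changes permanent, and ROLLBACK, which undoes them, are sometimes grouped with DCL and sometimes treated as a separate transaction control group. Some product manuals additionally list control statements such as CALL to execute an SQL procedure, RETURN to return a value from an SQL function, SET assignment to assign a value to an SQL variable, VALUES to invoke an SQL routine, ALTER PASSWORD to change passwords, etc.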
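
    The DDL and DML statements listed above can be tried out with a minimal sketch such as the following. It assumes Python's built-in sqlite3 module and an in-memory database; the Items table is borrowed from the data model examples in these notes, while the Price column and its values are made up for illustration. SQLite has no TRUNCATE statement and no user privileges, so TRUNCATE and DCL are left out. The last part contrasts a non-procedural query with a procedural loop in the host language, which is only an analogy for a procedural DML.

```python
import sqlite3

# In-memory SQLite database; Price and its values are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define the structure, then alter it.
cur.execute("CREATE TABLE Items (ItemCode TEXT PRIMARY KEY, Item TEXT NOT NULL)")
cur.execute("ALTER TABLE Items ADD COLUMN Price INTEGER")

# DML: insert, update and delete rows.
cur.executemany(
    "INSERT INTO Items (ItemCode, Item, Price) VALUES (?, ?, ?)",
    [("I0001", "Fridge", 20000), ("I0002", "Almirah", 8000), ("I0003", "Table", 3000)],
)
cur.execute("UPDATE Items SET Price = 3500 WHERE ItemCode = 'I0003'")
cur.execute("DELETE FROM Items WHERE ItemCode = 'I0002'")
conn.commit()

# Non-procedural retrieval (DQL): state only WHAT is wanted.
declarative = cur.execute(
    "SELECT Item FROM Items WHERE Price > 3000 ORDER BY Item"
).fetchall()

# Procedural retrieval: fetch every row and spell out HOW to filter and sort.
procedural = []
for code, item, price in cur.execute("SELECT ItemCode, Item, Price FROM Items"):
    if price > 3000:
        procedural.append((item,))
procedural.sort()

# DDL again: remove the table from the database.
cur.execute("DROP TABLE Items")
```

    Both retrieval styles produce the same rows; the difference lies in who decides the access steps, the DBMS or the programmer.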

    Data Models

    Data models are a collection of conceptual tools for describing the data, the relationships between the data, the constraints applicable on the data etc. There are various data models available, which can be broadly classified into the following:

    1. Physical Models: These data models are used to describe data at the lowest level of data abstraction, i.e. the way the data is physically stored in the database. Two popular data models used to describe the physical architecture are:

    a) Unifying Model

    b) Frame Memory Model

    2. Record Based Logical Models: These data models are used to describe data at the logical and view levels. They use concepts that may be understood by the end users and at the same time are not too far from the way data is actually organized within the computer. In these models, the database is formed using fixed format records of several types, with each record type containing a fixed number of fixed length fields. The different record based models include:

    a) Relational Model: In this model, data and the relationships between them are represented as a collection of tables. Each table has multiple columns and rows, with each column having a unique name. All columns in a particular row of the table form a record. The accompanying figure shows a relational database consisting of the tables Items (2 columns) and Supply (3 columns). The Relational model is discussed in detail in a later section.

    b) Hierarchical Model: The Hierarchical model is the oldest of the database models. Here records are logically organised into a hierarchy of relationships forming an inverted tree pattern. All records in a hierarchy are called nodes, with each node related to the next in a Parent-Child relationship. Records that own other records are called parent records. The top parent record (here SupplyData) is called the root record. Each parent record can have one or more child records, but any child record can have only a single parent record.

    c) Network Model: This model stores data using a parent-child relationship similar to that of the hierarchical model. However, unlike the hierarchical model, it allows a record to be a child of more than one parent record. The relationship between different records is then represented by links in the form of pointers, as shown in the accompanying figure. In the example, the I0003|Table record can be seen to be a child of both the Steelco|Kolkata and the Modern|Kolkata records.

    3. Object Based Logical Models: These data models are used in describing data at the logical and view levels. These models are closer to human perception and farther from system perception. Different object based logical models include:

    a) The Entity Relationship (ER) Model: The ER model views the real world as a collection of basic objects called entities with relationships existing between those entities. Each entity in turn is described by a set of attributes. Entities and relationships of the same type are grouped together to form an entity set and a relationship set. Several graphical shapes are used to construct an ER diagram to express the overall logical structure of a database.

    [Figure: relational model example]

    Supply
    SupName    SupCity    ItemCode
    Godrej     Mumbai     I0001
    Steelco    Kolkata    I0002
    Steelco    Kolkata    I0003

    Items
    ItemCode   Item
    I0001      Fridge
    I0002      Almirah
    I0003      Table

    [Figure: hierarchical model example. Root record SupplyData; supplier records Godrej|Mumbai, Steelco|Kolkata and Modern|Kolkata; item records I0001|Fridge, I0002|Almirah and I0003|Table, the last appearing twice.]

    [Figure: network model example. Supplier records Godrej|Mumbai, Steelco|Kolkata and Modern|Kolkata linked by pointers to item records I0001|Fridge, I0002|Almirah and I0003|Table.]

    [Figure: ER diagram example. Entity sets Supplier (Sup-Name, Sup-City) and Items (Item-Code, Name) connected by the relationship set Supplies.]

    b) The Object Oriented Model: By the middle of the 1980s it was observed that relational databases were not practical for storing data in fields like medicine, multimedia and high energy physics, all of which needed more flexibility in how their data was represented and accessed. This led to object oriented databases, where users could define their own methods of access to data and how it was represented and manipulated. The model is based on a collection of objects and code, called methods, that operates on these objects. Objects that contain the same type of values and the same methods are grouped together into classes. Multimedia Databases, used for storing several different types of files, i.e. text, audio, video and images, in a single database, fall under this category.
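
    The three record based models can be compared with a small sketch that stores the same supplier data in each of them. It assumes Python with the built-in sqlite3 module for the relational case and plain in-memory objects for the other two; the class names TreeNode and NetNode and the helper link are hypothetical, and placing I0003|Table under both Steelco and Modern in the hierarchy is an assumption based on the network example in these notes.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Optional

# Relational model: two tables related through the shared ItemCode column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Items  (ItemCode TEXT PRIMARY KEY, Item TEXT);
    CREATE TABLE Supply (SupName TEXT, SupCity TEXT, ItemCode TEXT);
    INSERT INTO Items  VALUES ('I0001', 'Fridge'), ('I0002', 'Almirah'), ('I0003', 'Table');
    INSERT INTO Supply VALUES ('Godrej', 'Mumbai', 'I0001'),
                              ('Steelco', 'Kolkata', 'I0002'),
                              ('Steelco', 'Kolkata', 'I0003');
""")
steelco_items = [row[0] for row in conn.execute(
    "SELECT Item FROM Supply JOIN Items USING (ItemCode) "
    "WHERE SupName = 'Steelco' ORDER BY Item")]


@dataclass(eq=False)
class TreeNode:
    """Hierarchical model: every record has exactly one parent (none for the root)."""
    label: str
    parent: Optional["TreeNode"] = None
    children: list = field(default_factory=list)

    def add_child(self, label):
        child = TreeNode(label, parent=self)
        self.children.append(child)
        return child


root = TreeNode("SupplyData")
steelco = root.add_child("Steelco|Kolkata")
modern = root.add_child("Modern|Kolkata")
# A shared item has to be stored twice, once under each parent.
steelco.add_child("I0003|Table")
modern.add_child("I0003|Table")


@dataclass(eq=False)
class NetNode:
    """Network model: a record may be linked to several parents."""
    label: str
    parents: list = field(default_factory=list)
    children: list = field(default_factory=list)


def link(parent, child):
    """Record a two-way pointer between a parent and a child record."""
    parent.children.append(child)
    child.parents.append(parent)


net_steelco, net_modern = NetNode("Steelco|Kolkata"), NetNode("Modern|Kolkata")
table = NetNode("I0003|Table")
link(net_steelco, table)   # one stored record ...
link(net_modern, table)    # ... reachable from two parents
```

    In the hierarchy the two I0003|Table records are separate copies, whereas in the network both suppliers point to the same record.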

    Transaction Management

    When working with a database, situations may arise in which a particular transaction involves two or more separate operations that form one logical unit of work. For example, consider a stock transfer. Suppose x units of item-t are transferred from the store in a factory to the showroom for sale. For a valid transfer, the stock of item-t in the factory should be reduced by x units and simultaneously the stock of item-t in the showroom should be increased by x units, to keep the total number of units constant before and after the transfer. The transaction will be incomplete and erroneous if either the factory stock or the showroom stock is not updated due to some error during the transfer. Thus either both operations should occur or neither should occur. This all-or-none requirement is called atomicity. A similar situation arises in the case of a money transfer from one bank account to another. There the debit from one account must be accompanied by a credit to the other account.

    Moreover, in the case of a money transfer, the total amount involved in the transaction should be constant. Therefore an increase in account A should correspond to an equal decrease in account B, i.e. the sum of the money in account A and that in account B should be preserved. This requirement to maintain the correctness of the transfer is called consistency. After a particular transfer is over, the database should be able to preserve the new values in spite of any system snag or failure. This property is called durability.

    We call this collection of separate operations that form a single logical unit of work a transaction. Each transaction forms one unit of both atomicity and consistency. In the above example, the change of records in the two accounts was carried out by two separate operations or programs. Here each program by itself does not take the database from one consistent state to another. Hence each program by itself does not carry out a transaction, as the atomicity property is not satisfied in such an operation. Thus, if all the operations in a transaction do not take place due to a system failure or any other mishap, the failed transaction should have no effect on the state of the database, and the database must be restored to the state it was in before the transaction started.

    It is the responsibility of the Transaction Management Module of a DBMS to preserve the state of the database in case of any failures. Moreover, it is the responsibility of the database programmer to design the database in such a manner as to maintain these two properties in a transaction.
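
    The all-or-none behaviour of the stock transfer can be sketched as follows. This is a minimal illustration assuming Python's built-in sqlite3 module; the Stock table, the starting quantities and the fail_midway flag (which simulates an error between the two updates) are hypothetical. The connection's context manager commits if the block completes and rolls back if an exception escapes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Stock (location TEXT PRIMARY KEY, units INTEGER NOT NULL)")
conn.executemany("INSERT INTO Stock VALUES (?, ?)", [("factory", 100), ("showroom", 20)])
conn.commit()


def transfer(conn, units, fail_midway=False):
    """Move units from the factory to the showroom as one transaction."""
    with conn:  # commit on success, roll back if an exception escapes
        conn.execute(
            "UPDATE Stock SET units = units - ? WHERE location = 'factory'", (units,))
        if fail_midway:
            raise RuntimeError("simulated failure between the two updates")
        conn.execute(
            "UPDATE Stock SET units = units + ? WHERE location = 'showroom'", (units,))


def stock(conn):
    """Return the current stock as a {location: units} dictionary."""
    return dict(conn.execute("SELECT location, units FROM Stock"))


transfer(conn, 30)                      # both updates are applied
after_success = stock(conn)

try:
    transfer(conn, 10, fail_midway=True)  # first update is undone by the rollback
except RuntimeError:
    pass
after_failure = stock(conn)
```

    After the failed call the stock figures are the same as before it, so the total number of units is preserved in both cases.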


    Database Management System (DBMS)

    A Database Management System or DBMS is a collection of software programs that enables users to define, create, maintain and manipulate a database for various applications.

    The first step in handling a database is to define the database. This includes specifying the physical and logical structure of the database, defining the data types, the constraints imposed on the data, etc. This is usually done using Data Definition Languages (DDL).

    Once the logical and physical structure of the database is defined, the next step is creating the database. This implies populating the database, i.e. actually entering data into a storage medium to form the database.

    The final step is to manipulate the database to enter, retrieve or update data using special application programs that incorporate statements in special Data Manipulation Languages (DML).

    We can thus summarise the different functions of a DBMS as:

    1. Perform data storage and retrieval functions and handle user queries

    2. Implement data manipulation procedures developed by the administrators

    3. Enforce database security at the physical and logical level

    4. Interface with the OS to allocate computer resources like printers etc. to users

    5. Implement back up and recovery in case of system crashes, power outages etc.

    The accompanying figure shows the essential parts of a Database Management System. These are now described in detail.

    1) Database: The lowest level forms the database, where the raw data is stored. At this level, the metadata and indexes are stored along with the data. As discussed earlier, metadata deals with information related to the structure of the data. Indexes are a further type of data structure, used to find data items quickly in a database, and hence help to improve database performance.

    2) DBMS Software: The next higher level is the DBMS software. It consists of several modules used to manipulate and process the data in the database. The different modules that are used include the following:

    [Figure: parts of a database system. Labels: DBMS; Database System; Users and Programmers; Application Programs / Queries; DBMS Software, containing the Query Processor, the Transaction Manager and the Storage Manager (File Manager + Buffer Manager); Database, containing Metadata (database definition), Data (database data) and Indexes.]

    a) Storage Manager: The function of this module is to modify the information in the database and retrieve information from the database when requested by the higher levels. It thus serves as an interface between the low level data stored in the database and the application programs and queries submitted to the database system. It translates the various DML statements into low level file system commands. Thus in a simple database, the storage manager may be the file system of the underlying operating system itself. However, in larger databases it may consist of the following components:

    i) Authorisation and Integrity Manager: It checks whether a user is authorised to access the database. It is also responsible for maintaining the integrity of the system. To maintain the integrity, it interacts with the Query Processor to find out what data is being operated upon by the current queries. When several queries are running in the system, it takes care that no two queries interfere with each other.

    ii) Transaction Manager: It keeps track of the changes made to the data in order to recover lost data in case of a system failure and to maintain a consistent state of the database. It maintains a data log containing a record of the changes made, so that un-executed changes can be executed after the system has recovered from a failure. It also coordinates the simultaneous execution of different transactions so that they do not conflict.

    iii) The Buffer Manager: The buffer manager is used to handle main memory. It obtains blocks of data from the disk and allocates the blocks to a portion of the main memory. The buffer manager will keep a block in the main memory as long as it is required and will return the block to the disk if the main memory is needed by another block.

    iv) The File Manager: The file manager is used to keep track of file locations on the disk. A file is stored in the storage device as a collection of disk blocks. When requested by the buffer manager, the file manager obtains the required block or blocks that contain a particular file.

    b) Query Manager/Processor: The job of the Query Processor is to convert a query, as submitted by the user and expressed in a high level language (like SQL), into a sequence of commands in a low level language for the Storage Manager to retrieve the appropriate information. It also handles requests for modification of data and metadata. It is usually made up of the following modules:

    i) DML Compiler: This module is used to translate DML statements in a query language (like SQL) to a low level language that the query evaluation engine understands. The DML compiler also optimises the user queries to increase their efficiency.

    ii) Embedded DML pre-compiler: This module interacts with the DML Compiler to generate the appropriate code for DML statements embedded in an application program.

    iii) DDL Interpreter: This module is responsible for interpreting DDL statements and tabulating them in a set of tables called system tables that contain the metadata.

    iv) Query Evaluation Engine: This module receives low level instructions from the DML Compiler and executes them to retrieve the required data from the database.

    3) Application Programs: Users interact with a database through application program interfaces. A typical DBMS allows programmers to write application programs that manipulate data in a database through system calls to the DBMS. The most frequent interaction with a database is to query it. Apart from queries, application programs are also written to modify data or modify the database schema. However, only database administrators are given access to modify an existing schema or create a new database. There may be several application programs that are used by different user types.

    4) Users: At the outermost level are the end users as described earlier, who are responsible for maintaining and accessing the database. These include the database administrator, the sophisticated users, the specialised users and the inexperienced users.
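
    The cooperation between the buffer manager and the file manager described above can be sketched as follows. This is a simplified illustration in Python, not the design of any particular DBMS: the class and method names are hypothetical, the "disk" is a dictionary, and the choice of evicting the least recently used block is an assumption, since the notes do not say which block is returned to the disk.

```python
from collections import OrderedDict


class FileManager:
    """Stand-in for the disk: maps block numbers to block contents."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.reads = 0
        self.writes = 0

    def read_block(self, block_no):
        self.reads += 1
        return self.blocks[block_no]

    def write_block(self, block_no, data):
        self.writes += 1
        self.blocks[block_no] = data


class BufferManager:
    """Keeps at most `capacity` blocks in memory (least recently used goes first)."""

    def __init__(self, file_manager, capacity):
        self.fm = file_manager
        self.capacity = capacity
        self.pool = OrderedDict()  # block_no -> [data, dirty flag]

    def _load(self, block_no):
        if block_no in self.pool:
            self.pool.move_to_end(block_no)  # mark as most recently used
        else:
            if len(self.pool) >= self.capacity:
                victim, (data, dirty) = self.pool.popitem(last=False)
                if dirty:  # only modified blocks need to be written back
                    self.fm.write_block(victim, data)
            self.pool[block_no] = [self.fm.read_block(block_no), False]
        return self.pool[block_no]

    def get(self, block_no):
        """Return the contents of a block, reading it from disk if necessary."""
        return self._load(block_no)[0]

    def put(self, block_no, data):
        """Modify a block in memory; it reaches the disk when it is evicted."""
        entry = self._load(block_no)
        entry[0], entry[1] = data, True


fm = FileManager({1: "A", 2: "B", 3: "C"})
bm = BufferManager(fm, capacity=2)
bm.get(1)         # miss: block 1 is read from disk
bm.put(2, "B2")   # miss: block 2 is read, then changed in memory only
bm.get(1)         # hit: no disk access
bm.get(3)         # miss: pool is full, block 2 is written back and replaced
```

    Four requests cause only three disk reads and one disk write, which is the saving the buffer manager is meant to provide.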

    Data Administrator (DA) & Database Administrator (DBA)

    It is the job of a special category of people in an organisation to determine whether a database technology has been successfully developed and implemented. These people are termed Data Administrators (DA). The job of a DA is to look after the following:

    1. Strategic Planning: The DA is the key person involved in strategic planning of data resources and determines the major business areas or processes the database should serve.

    2. Determine Data Requirement: The DA decides what data will be stored in the database to carry out these processes, and their corresponding data sources.

    3. Determine Access Policies: The DA lays down policies for accessing and maintaining the database and determines the access rights of the different database users.

    4. The DA plays a business oriented role in determining the business strategies and policies involved in using a DBMS. To do so he should have access to the top-level management and should be granted a wide range of authority in connection with the database.

    A Database Administrator (DBA), on the other hand, is a technical person who is responsible for defining the internal model of a database. He is the person who creates and maintains a database. To design a database, the DBA first has to hold discussions with the users to determine their specific requirements. He then determines the physical storage requirement of the data, the accuracy requirement, frequency of data access, search strategies, and security levels of different data. The DBA also identifies the different data sources and the persons responsible for entering and updating the data. Finally, with all the specifications available, the DBA converts these requirements into a physical design which specifies the hardware requirements of the database.

    Depending upon the above functions, we can classify the different jobs of a DBA as:

    1. Schema Definition: The original database schema is created by the DBA by writing a set of definitions. These are then translated by the DDL compiler to form a set of tables consisting of metadata that is stored in the data dictionary.

    2. Storage Structure & Access Method Definition: The storage structures of different data types and their access mechanisms are defined, guided by the need to efficiently store and retrieve the data. These definitions are then translated by the DDL compiler to form the actual data structures.

    3. Schema Modification: In the rare case that the logical or the physical schema needs to be modified, the DBA is responsible for writing a set of definitions that are translated by the DDL compiler to accomplish the required modification to the internal system tables.

    4. Data Access Authorisation: Every user of the database may not be required to access the entire database. Moreover, some users may be allowed to modify data while others may be allowed to view data only. It is the DBA who is responsible for granting rights to different classes of users. This authorisation data is kept in a special system file which is consulted by the DBMS whenever a user wants to access the database.

    5. Integrity Constraint Specification: Based on certain business rules or other criteria, there may be certain constraints on certain data types. For example, a bank may require a minimum account balance, below which a customer will not be able to withdraw money. The DBA is required to specify all such integrity constraints explicitly.
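
    The minimum balance rule can be stated once in the schema so that the DBMS enforces it for every application. The sketch below assumes Python's built-in sqlite3 module and a CHECK constraint; the Account table, the account number and the limit of Rs. 1,000 are hypothetical values chosen for illustration.

```python
import sqlite3

MIN_BALANCE = 1000  # hypothetical business rule

conn = sqlite3.connect(":memory:")
# The integrity constraint is declared once, as part of the schema definition.
conn.execute(
    "CREATE TABLE Account (acc_no TEXT PRIMARY KEY, "
    f"balance INTEGER NOT NULL CHECK (balance >= {MIN_BALANCE}))"
)
conn.execute("INSERT INTO Account VALUES ('A001', 5000)")
conn.commit()


def withdraw(conn, acc_no, amount):
    """Return True if the withdrawal was accepted, False if the DBMS refused it."""
    try:
        with conn:  # rolls back automatically when the constraint is violated
            conn.execute(
                "UPDATE Account SET balance = balance - ? WHERE acc_no = ?",
                (amount, acc_no),
            )
        return True
    except sqlite3.IntegrityError:
        return False


allowed = withdraw(conn, "A001", 3000)   # leaves 2000, above the minimum
refused = withdraw(conn, "A001", 1500)   # would leave 500, below the minimum
balance = conn.execute(
    "SELECT balance FROM Account WHERE acc_no = 'A001'"
).fetchone()[0]
```

    Because the rule lives in the schema rather than in each program, an application that forgets to check the balance still cannot violate it.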


    Advantages and Disadvantages of using a DBMS

    The advantages of using a DBMS over a File Processing System are:

    1. Minimised data duplication: In a DBMS, a particular piece of data is stored in one place only. Whenever an application needs to access the data, the DBMS retrieves the data for the application from that place. Since a particular piece of data is stored at a single place, storage space is saved. Moreover, when an update is required, the data needs to be updated at one place only. This eliminates the data integrity problem that arises when duplicate copies fall out of step.

    2. Data remains together: In a Database system, all data are stored at a single place called a database. Whenever an application program requires some data, the DBMS retrieves the data from the database. In case data from multiple locations need to be combined, the DBMS does so by retrieving the required data from the database.

    3. File format independent application programs: In a Database system, the application programs that access the data interact with the data through the DBMS and not directly with the database. In case any change occurs in the data formats, the DBMS takes care of it. Thus physical and logical data independence makes application programs independent of schema modifications.

    4. Compatibility between different files: In a Database Processing system, the application programs do not interact directly with the data files; instead they interact with the DBMS. The DBMS in turn interacts with the database files to generate the required results. Hence, in case different programming platforms are used to develop the application programs, they need to interact only with the DBMS and not with the different data files. Thus the question of compatibility in formats of different data files does not arise.

    5. User Friendly Interfaces: Database technology makes it easier to represent data in a user friendly manner by combining data from different tables as required.

    In spite of the huge success of the DBMS over the conventional file processing system, the DBMS approach has certain limitations, as described below:

    1. Concurrency Problems: In case a DBMS package is not designed for multiple users, problems can arise when more than one user wants to access the database simultaneously. This problem of concurrently accessing the same record in a database is known as the concurrency problem.

    For example, let two persons A and B have a joint bank account. Suppose both of them simultaneously view their bank balance from two different ATMs. Let the bank balance shown be Rs. 40,000/-. Suppose A withdraws Rs. 20,000/- and closes the transaction, whereby the DBMS program writes back a balance amount of Rs. 20,000/-. However, B still sees the bank balance as Rs. 40,000/-, as no change is made to the screen view of person B after the transaction by A. So B now withdraws Rs. 25,000/- and closes the transaction, which writes back the balance record as (40,000 - 25,000) = Rs. 15,000/- by overwriting the previous record balance of Rs. 20,000/- as entered by A. Thus at the end of the transaction, the account shows a balance of Rs. 15,000/- when actually there is a negative balance of Rs. 5,000/-.

    [Figure: database processing. The Supplier File User, Order File User and Payment File User work through the Supplier Processing Application, Order Processing Application and Payment Processing Application respectively, all of which access a single Database through the DBMS.]

    One can avoid a concurrency problem by locking a file when it is used by one person, so that it is not available to another person at the same time. Another method is to lock the particular record that is accessed by one user, so that the file remains available to other users for accessing other records.

    2. Ownership Problem: In a file based system, generally data in a particular file is handled by a particular individual. When a database is created using those files, the data is no longer the specific property of the application user, but instead is owned by the entire company. Any user with an access right should be able to access or use the data. Giving up ownership of data can be hard for employees and managers to accept.

    3. Resource Problem: When a DBMS is implemented, the amount of data that needs to be accessed and manipulated also increases. To handle the new database and run the DBMS programs, extra resources or upgrading of existing resources may be required. Thus extra terminals, printers, storage devices, servers, communication devices, etc. may need to be purchased. This adds to the cost of setting up a DBMS.

    4. Security Problem: The DBMS should be able to give access to the database to authorised personnel only. Security considerations should include means of controlling physical access to terminals, storage devices, and specific interface forms for updating or deletion of records.
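
    The joint account example under Concurrency Problems, and the record locking remedy, can be replayed with a small single-threaded simulation. This is only a sketch in Python: the Account class and the begin and withdraw_and_release helpers are hypothetical, and a real DBMS would use its own lock manager rather than a simple owner field.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance
        self.locked_by = None  # name of the user holding the record lock, if any


# Without locking: both users read Rs. 40,000 and write back their own result.
acct = Account(40000)
seen_by_a = acct.balance
seen_by_b = acct.balance
acct.balance = seen_by_a - 20000     # A writes back 20,000
acct.balance = seen_by_b - 25000     # B overwrites it with 15,000
lost_update_balance = acct.balance


def begin(acct, user):
    """Lock the record and return the balance, or None if another user holds it."""
    if acct.locked_by not in (None, user):
        return None
    acct.locked_by = user
    return acct.balance


def withdraw_and_release(acct, user, seen, amount):
    """Apply the withdrawal if funds suffice, then release the lock."""
    if acct.locked_by != user:
        raise PermissionError("record is not locked by this user")
    ok = amount <= seen
    if ok:
        acct.balance = seen - amount
    acct.locked_by = None
    return ok


# With record locking: B cannot read the record until A has finished.
acct = Account(40000)
a_seen = begin(acct, "A")
b_blocked = begin(acct, "B")                     # refused while A holds the lock
a_ok = withdraw_and_release(acct, "A", a_seen, 20000)
b_seen = begin(acct, "B")                        # B now sees the updated balance
b_ok = withdraw_and_release(acct, "B", b_seen, 25000)
final_balance = acct.balance
```

    Without the lock the account ends at Rs. 15,000 although Rs. 45,000 was paid out; with the lock, B sees Rs. 20,000 and the second withdrawal is refused.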

    Questions from this Section

    1. State the major differences between a file processing system and a DBMS. 4

    2. What are the disadvantages of a conventional file system? 3

    3. What is the integrity problem? 4

    4. What is a Database? 2

    5. What are the levels of data abstraction? Explain each of them briefly. 2+4

    6. What is a Database Schema? What is a DB instance? 3

    7. Describe the three schema architecture of a database. 3

    8. What is the difference between logical and physical data independence? 4

    9. What are the different types of Database users? 3

    10. State the different database languages. 3

    11. Distinguish between DDL and DML. 4

    12. What are the basic characteristics of DML? What are the types of DML? 3+3

    13. What do you mean by atomicity and consistency? 2+2

    14. What do you mean by a transaction? 3

    15. Name different types of database models. 2

    16. What is a DBMS? State the advantages and disadvantages of a DBMS. 2+4

    17. What are the components of a Query Processor? 4

    18. What are the components of a Storage Manager? 4

    19. What are the major functions of a DBA? 4

    20. What are the responsibilities of DBA and that of a database designer? 4
