141405 - database management systems

Upload: nbpr

Post on 13-Apr-2018

214 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/27/2019 141405 - DataBase Management Systems

    1/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    1

    Unit Number & Name: I & Introduction Period: 1 of 45

    DBMS contains information about a particular enterprise

    Collection of interrelated dataSet of programs to access the dataAn environment that is both convenientand efficientto use

    Database Applications:Banking: all transactionsAirlines: reservations, schedulesUniversities: registration, gradesSales: customers, products, purchasesOnline retailers: order tracking, customized recommendationsManufacturing: production, inventory, orders, supply chainHuman resources: employee records, salaries, tax deductions

    Databases touch all aspects of our lives

    In the early days, database applications were built directly on top of file systemsDrawbacks of using file systems to store data:

    Data redundancy and inconsistency Multiple file formats, duplication of information in different files

    Difficulty in accessing data Need to write a new program to carry out each new task

    Data isolation multiple files and formatsIntegrity problems

    Integrity constraints (e.g. account balance > 0) become buried inprogram code rather than being stated explicitly

    Hard to add new constraints or change existing onesDrawbacks of using file systems (cont.)

    Atomicity of updates Failures may leave database in an inconsistent state with partial

    updates carried out Example: Transfer of funds from one account to another should

    either complete or not happen at allConcurrent access by multiple users

    Concurrent accessed needed for performance Uncontrolled concurrent accesses can lead to inconsistencies

    Example: Two people reading a balance and updating it at the

    same timeSecurity problems

    Hard to provide user access to some, but not all, dataDatabase systems offer solutions to all the above problems

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    2/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    2

    Unit Number & Name: I & Introduction Period: 2 of 45

    Page: 2 of 2

    Levels of Abstraction

    Physical level: describes how a record (e.g., customer) is stored.Logical level: describes data stored in database, and the relationships among the data.

    type customer= recordcustomer_id: string;

    customer_name : string;customer_street: string;customer_city : string;

    end;View level: application programs hide details of data types. Views can also hide

    information (such as an employees salary) for security purposes.View of Data

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    3/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    3

    Unit Number & Name: I & Introduction Period: 3 of 45

    Page: 2 of 2

    Instances and SchemasSimilar to types and variables in programming languagesSchema the logical structure of the database

    Example: The database consists of information about a set of customers andaccounts and the relationship between them)Analogous to type information of a variable in a programPhysical schema: database design at the physical levelLogical schema: database design at the logical level

    Instance the actual content of the database at a particular point in timeAnalogous to the value of a variable

    Physical Data Independence the ability to modify the physical schema without

    changing the logical schemaApplications depend on the logical schemaIn general, the interfaces between the various levels and components shouldbe well defined so that changes in some parts do not seriously influenceothers.

    Data ModelsA collection of tools for describing

    DataData relationshipsData semantics

    Data constraintsRelational modelEntity-Relationship data model (mainly for database design)Object-based data models (Object-oriented and Object-relational)Semistructured data model (XML)Other older models:

    Network modelHierarchical model

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    4/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    4

    Unit Number & Name: I & Introduction Period: 4 of 45

    Page: 2 of 2

    Data Manipulation Language (DML)

    Language for accessing and manipulating the data organized by the appropriate data

    model

    DML also known as query language

    Two classes of languages

    Procedural user specifies what data is required and how to get those data

    Declarative (nonprocedural) user specifies what data is required without

    specifying how to get those data

    SQL is the most widely used query language

    Data Definition Language (DDL)

    Specification notation for defining the database schema

    Example: create table account(

    account_number char(10),

    branch_name char(10),

    balance integer)

    DDL compiler generates a set of tables stored in a data dictionary

    Data dictionary contains metadata (i.e., data about data)

    Database schema

    Datastorage and definition language

    Specifies the storage structure and access methods used

    Integrity constraints

    Domain constraints Referential integrity (e.g. branch_name must correspond to a valid

    branch in the branch table)

    Authorization

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    5/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    5

    Unit Number & Name: I & Introduction Period: 5 of 45

    Page: 2 of 2

    Overall System StructureThe architecture of a database systems is greatly influenced by the underlying computersystem on which the database is running:

    CentralizedClient-serverParallel (multiple processors and disks)Distributed

    Lecture Plan R/TP/02

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    6/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    6

    Unit Number & Name: I & Introduction Period: 6 of 45

    Page: 2 of 2

    Database Users

    Users are differentiated by the way they expect to interact with

    the system

    Application programmers interact with system through DML calls

    Sophisticated users form requests in a database query language

    Specialized users write specialized database applications that do not fit into the

    traditional data processing framework

    Nave users invoke one of the permanent application programs that have been

    written previously

    Examples, people accessing database over the web, bank tellers, clerical staff

    Database Administrator

    Coordinates all the activities of the database system

    has a good understanding of the enterprises information resources and needs.

    Database administrator's duties include:

    Storage structure and access method definition

    Schema and physical organization modification

    Granting users authority to access the database

    Backing up data

    Monitoring performance and responding to changes

    Database tuning

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    7/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    7

    Unit Number & Name: I & Introduction Period: 7 of 45

    Page: 2 of 2

    The Entity-Relationship Model

    Models an enterprise as a collection ofentities and relationships

    Entity: a thing or object in the enterprise that is distinguishable from other

    objects

    Described by a set ofattributes

    Relationship: an association among several entities

    Represented diagrammatically by an entity-relationship diagram:

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    8/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    8

    Unit Number & Name: I & Introduction Period: 8 of 45

    Page: 2 of 2

    Entity Setscustomerandloan

    customer_id customer_ customer_ customer_ loan_ amount

    name street city number

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    9/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    9

    Unit Number & Name: I & Introduction Period: 9 of 45

    Page: 2 of 2

    Relational Model

    Structure of Relational Databases

    Fundamental Relational-Algebra-Operations

    Additional Relational-Algebra-Operations

    Extended Relational-Algebra-Operations

    Null Values

    Modification of the Database

    Example of a Relation

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    10/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    10

    Unit Number & Name: II & RELATIONAL MODEL Period: 10 of 45

    Page: 2 of 2

    The relational Model The catalog- Types KeysEach attribute of a relation has a name

    The set of allowed values for each attribute is called the domain of the attribute

    Attribute values are (normally) required to be atomic; that is, indivisible

    E.g. the value of an attribute can be an account number,

    but cannot be a set of account numbers

    Domain is said to be atomic if all its members are atomic

    The special value null is a member of every domain

    The null value causes complications in the definition of many operations

    We shall ignore the effect of null values in our main presentation and consider

    their effect later

    Keys

    Let K R

    Kis a superkey ofR if values forKare sufficient to identify a unique tuple of each

    possible relation r(R)

    by possible r we mean a relation rthat could exist in the enterprise we are

    modeling.

    Example: {customer_name, customer_street} and

    {customer_name}

    are both superkeys ofCustomer, if no two customers can possibly have the

    same name

    In real life, an attribute such as customer_idwould be used instead

    ofcustomer_name to uniquely identify customers, but we omit it tokeep our examples small, and instead assume customer names are

    unique.

    Kis a candidate key ifKis minimal

    Example: {customer_name} is a candidate key forCustomer, since it is a superkey

    and no subset of it is a superkey.

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    11/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    11

    Primary key: a candidate key chosen as the principal means of identifying tuples

    within a relation

    Should choose an attribute whose value never, or very rarely, changes.

    E.g. email address is unique, but may change

    Relational Algebra

    Procedural language

    Six basic operators

    select:

    project:

    union:

    set difference:

    Cartesian product: x

    rename:

    The operators take one or two relations as inputs and produce a new relation as a

    result.

    Select Operation Example

    Relation r

    A=B ^ D > 5 (r)

    A B C D

    15

    1223

    773

    10

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    12/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    12

    A B C D

    123

    710

    PPrroojjeeccttOOppeerraattiioonnEExxaammppllee

    n Relation r:A B C

    10

    203040

    1

    112

    A C

    11

    12

    =

    A C

    11

    2

    A,C (r)

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    13/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    13

    UUnniioonnOOppeerraattiioonnEExxaammppllee

    A B

    121

    A B

    23

    rs

    A B

    1213

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    14/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    14

    SSeettDDiiffffeerreenncceeOOppeerraattiioonnEExxaammppllee

    n Relations r, s:

    n r s:

    A B

    121

    A B

    23

    r

    s

    A B

    11

    CCaarrtteessiiaann--PPrroodduuccttOO eerraattiioonn EExxaamm llee

    n Relations r, s:

    n rx s:

    A B

    12

    A B

    11112222

    C D

    1010201010102010

    E

    aabbaabb

    C D

    10102010

    E

    aabbr

    s

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    15/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    15

    CCaarrtteessiiaann--PPrroodduuccttOO eerraattiioonn EExxaamm llee

    n Relations r, s:

    n rx s:

    A B

    12

    A B

    11112222

    C D

    1010201010102010

    E

    aabbaabb

    C D

    10102010

    E

    aabbr

    s

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    16/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    16

    Unit Number & Name: II & RELATIONAL MODEL Period: 12 of 45

    Page: 2 of 2

    Tuple Relational CalculusDomain Relational Calculus

    Tuple Relational CalculusA nonprocedural query language, where each query is of the form

    {t|P(t) }It is the set of all tuples tsuch that predicatePis true forttis a tuple variable, t[A ] denotes the value of tuple ton attributeA

    t rdenotes that tuple tis in relation rPis aformula similar to that of the predicate calculus

    Predicate Calculus Formula

    1. Set of attributes and constants

    2. Set of comparison operators: (e.g.,,,,,,)3. Set of connectives: and (), or (v) not ()

    4. Implication (): x y, if x if true, then y is true

    xy x vy5. Set of quantifiers:

    t r(Q (t)) there exists a tuple in tin relation rsuch that predicate Q (t) is true

    t r(Q (t)) Q is true for all tuples tin relation rBanking Example

    branch (branch_name, branch_city, assets )customer(customer_name, customer_street, customer_city )

    account(account_number, branch_name, balance )loan (loan_number, branch_name, amount)depositor(customer_name, account_number)borrower(customer_name, loan_number)

    Example Queries

    Find the loan_number, branch_name, and amount for loans of over $1200

    {t | t loan t [amount ] 1200}n Find the loan number for each loan of an amount greater than $1200

    {t |s loan (t [loan_number ] = s [loan_number ]s [amount ] 1200)}

    Notice that a relation on schema [loan_number ] is implicitly defined bythe query

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    17/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    17

    Domain Relational Calculus

    A nonprocedural query language equivalent in power to the tuple relational calculusEach query is an expression of the form:

    {x1, x2, , xn | P (x1, x2, , xn)}x1, x2, , xn represent domain variablesP represents a formula similar to that of the predicate calculus

    EExxaammpplleeQQuueerriieess

    n Find the loan_number, branch_name, and amountfor loans of over$1200

    n Find the names of all customers who have a loan from the Perryridge branchand the loan amount:

    { c, a | l( c, l borrower b ( l, b, a loanb = Perryridge))}

    { c, a | l( c, l borrower l, Perryridge, a loan)}

    { c | l, b, a ( c, l borrower l, b, a loan a > 1200)}

    n Find the names of all customers who have a loan of over $1200

    { l, b, a | l, b, a loan a > 1200}

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    18/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    18

    Unit Number & Name: II & RELATIONAL MODEL Period: 13 of 45

    Page: 2 of 2

    EExxaammpplleeQQuueerriieess

    n Find the names of all customers having a loan, an account, or both atthe Perryridge branch:

    { c | s,n ( c, s, n customer)x,y,z(x, y, z branch y= Brooklyn)

    a,b (x, y, z account c,a depositor)}

    n Find the names of all customers who have an account at allbranches located in Brooklyn:

    { c | l( c, l borrower

    b,a ( l, b, a loan b = Perryridge))

    a ( c, a depositor b,n ( a, b, n account b = Perryridge))}

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    19/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    19

    Unit Number & Name: II & RELATIONAL MODEL Period: 14 of 45

    Page: 2 of 2

    SQLData DefinitionBasic Query StructureSet OperationsAggregate FunctionsNull ValuesNested SubqueriesComplex QueriesViewsModification of the DatabaseJoined Relations

    Domain Types in SQLchar(n). Fixed length character string, with user-specified length n.varchar(n). Variable length character strings, with user-specified maximum lengthn.int. Integer (a finite subset of the integers that is machine-dependent).smallint. Small integer (a machine-dependent subset of the integer domain type).numeric(p,d). Fixed point number, with user-specified precision ofp digits, with ndigits to the right of decimal point.real, double precision. Floating point and double-precision floating point numbers,with machine-dependent precision.float(n). Floating point number, with user-specified precision of at least n digits.

    Integrity Constraints

    Integrity constraints guard against accidental damage to the database, by ensuring thatauthorized changes to the database do not result in a loss of data consistency.

    A checking account must have a balance greater than $10,000.00A salary of a bank employee must be at least $4.00 an hourA customer must have a (non-null) phone number

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    20/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    20

    Unit Number & Name: II & RELATIONAL MODEL Period: 15 of 45

    Page: 2 of 2

    Advanced SQLSQL Data Types and SchemasIntegrity ConstraintsAuthorizationEmbedded SQLDynamic SQLFunctions and Procedural ConstructsRecursive QueriesAdvanced SQL Features

    Built-in Data Types in SQL

    date: Dates, containing a (4 digit) year, month and date

    Example: date 2005-7-27time: Time of day, in hours, minutes and seconds.Example: time 09:00:30 time 09:00:30.75

    timestamp: date plus time of dayExample: timestamp 2005-7-27 09:00:30.75

    interval: period of timeExample: interval 1 daySubtracting a date/time/timestamp value from another gives an interval valueInterval values can be added to date/time/timestamp values

    Can extract values of individual fields from date/time/timestampExample: extract (year from r.starttime)

    Can cast string types to date/time/timestampExample: cast as dateExample: cast as time

    Referential Integrity in SQL Example

    create tablecustomer

    (customer_name char(20),

    customer_street char(30),

    customer_city char(30),

    primary key (customer_name ))

    create tablebranch

    (branch_name char(15),

    branch_city char(30),assets numeric(12,2),

    primary key (branch_name ))

    create tableaccount

    (account_number char(10),

    branch_name char(15),

    balance integer,

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    21/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    21

    primary key (account_number),

    foreign key (branch_name) referencesbranch )

    create tabledepositor

    (customer_name char(20),

    account_number char(10),primary key (customer_name, account_number),

    foreign key (account_number) referencesaccount,

    foreign key (customer_name ) referencescustomer)

    Privileges in SQL

    select: allows read access to relation,or the ability to query using the view

    Example: grant usersU1,U2, andU3 select authorization on thebranch

    relation:

    grant select onbranch toU1, U2, U3

    insert: the ability to insert tuples

    update: the ability to update using the SQL update statementdelete: the ability to delete tuples.

    all privileges: used as a short form for all the allowable privileges

    Revoking Authorization in SQL

    The revoke statement is used to revoke authorization.

    revoke

    on from

    Example:

    revoke select onbranch fromU1, U2, U3

    All privileges that depend on the privilege being revoked are also revoked.

    may be all to revoke all privileges the revokee may hold.

    If the same privilege was granted twice to the same user by different grantees,the user may retain the privilege after the revocation.

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    22/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    22

    Unit Number & Name: II & RELATIONAL MODEL Period: 16 of 45

    Page: 2 of 2

    Embedded SQLThe SQL standard defines embeddings of SQL in a variety of programminglanguages such as C, Java, and Cobol.A language to which SQL queries are embedded is referred to as a host language,and the SQL structures permitted in the host language comprise embeddedSQL.The basic form of these languages follows that of the System R embedding of SQLinto PL/I.EXEC SQL statement is used to identify embedded SQL request to the preprocessor

    EXEC SQL END_EXECNote: this varies by language (for example, the Java embedding uses

    # SQL { . }; )

    From within a host language, find the names and cities of customers with more thanthe variable amount dollars in some account.Specify the query in SQL and declare a cursor for itEXEC SQL

    declare c cursor forselect depositor.customer_name, customer_cityfrom depositor, customer, accountwhere depositor.customer_name = customer.customer_name

    and depositor account_number = account.account_numberand account.balance > :amount

    END_EXEC

    The open statement causes the query to be evaluatedEXEC SQL open c END_EXEC

    The fetch statement causes the values of one tuple in the query result to be placed onhost language variables.

    EXEC SQL fetch c into :cn, :cc END_EXECRepeated calls to fetch get successive tuples in the query result

    A variable called SQLSTATE in the SQL communication area (SQLCA) gets set to02000 to indicate no more data is availableThe close statement causes the database system to delete the temporary relation that

    holds the result of the query.EXEC SQL close c END_EXECNote: above details vary with language. For example, the Java embedding defines Javaiterators to step through result tuples.

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    23/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    23

    Dynamic SQLAllows programs to construct and submit SQL queries at run time.Example of the use of dynamic SQL from within a C program.

    char * sqlprog = update accountset balance = balance * 1.05

    where account_number = ?EXEC SQL prepare dynprog from :sqlprog;char account[10] = A-101;EXEC SQL execute dynprogusing :account;The dynamic SQL program contains a ?, which is a place holder for a value that isprovided when the SQL program is executed.

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    24/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    24

    Unit Number & Name: III & DATABASE DESIGN Period: 19 of 45

    Page: 2 of 2

    Functional DependenciesConstraints on the set of legal relations.Require that the value for a certain set of attributes determines uniquelythe value for another set of attributes.A functional dependency is a generalization of the notion of a key.LetRbe a relation schema

    R and R

    The functional dependency

    holds onR if and only if for any legal relations r(R), whenever any twotuples t1 and t2 ofragree on the attributes, they also agree on theattributes. That is,

    t1[] = t2 [] t1[ ] = t2 [ ]Example: Considerr(A,B ) with the following instance ofr.On this instance,AB does NOT hold, but BA does hold.

    Kis a superkey for relation schemaR if and only ifKRKis a candidate key forR if and only if

    KR, andfor no K, R

    Functional dependencies allow us to express constraints that cannot beexpressed using superkeys. Consider the schema:

    bor_loan = (customer_id, loan_number, amount).We expect this functional dependency to hold:

    loan_number amount

    but would not expect the following to hold:amount customer_name

    A functional dependency is trivial if it is satisfied by all instances of arelation

    Example: customer_name, loan_number customer_name

    customer_name customer_nameIn general, is trivial if

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    25/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    25

    Unit Number & Name: III & DATABASE DESIGN Period: 21 of 45

    Page: 2 of 2

    Functional Dependencies First, Second, Third Normal Forms

    First Normal FormDomain is atomic if its elements are considered to be indivisible units

    Examples of non-atomic domains: Set of names, composite attributes Identification numbers like CS101 that can be broken up into parts

    A relational schema R is in first normal form if the domains of all attributes of R areatomicNon-atomic values complicate storage and encourage redundant (repeated) storage ofdata

    Example: Set of accounts stored with each customer, and set of owners storedwith each accountWe assume all relations are in first normal form

    Atomicity is actually a property of how the elements of the domain are used.Example: Strings would normally be considered indivisibleSuppose that students are given roll numbers which are strings of the formCS0012 orEE1127If the first two characters are extracted to find the department, the domain ofroll numbers is not atomic.Doing so is a bad idea: leads to encoding of information in applicationprogram rather than in the database.

    Third Normal Form

    A relation schemaR is in third normal form (3NF) if for all:

    inF+at least one of the following holds:

    is trivial (i.e., )

    is a superkey forR

    Each attributeA in is contained in a candidate key forR.(NOTE:

    each attribute may be in a different candidate key)If a relation is in BCNF it is in 3NF (since in BCNF one of the first two conditionsabove must hold).Third condition is a minimal relaxation of BCNF to ensure dependency preservation

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    26/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    26

    Unit Number & Name: III & DATABASE DESIGN Period: 22 of 45

    Page: 2 of 2

    Functional Dependencies First, Second, Third Normal Forms

    Third Normal Form: Motivation

    There are some situations where

    BCNF is not dependency preserving, and

    efficient checking for FD violation on updates is important

    Solution: define a weaker normal form, called Third Normal Form

    (3NF)

    Allows some redundancy

    But functional dependencies can be checked on individual relations

    without computing a join.

    There is always a lossless-join, dependency-preserving decomposition into3NF.

    EXAMPLE

    Relation R:R = (J, K, L )

    F = {JKL, LK}Two candidate keys: JKandJLR is in 3NF

    JKL JKis a superkey

    LK Kis contained in a candidate key

    Testing for 3NF

    Optimization: Need to check only FDs in F, need not check all FDs inF+.

    Use attribute closure to check for each dependency , if is a superkey.

    If is not a superkey, we have to verify if each attribute in is contained in acandidate key ofR

    this test is rather more expensive, since it involve finding candidate keystesting for 3NF has been shown to be NP-hardInterestingly, decomposition into third normal form (described shortly) can bedone in polynomial time

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    27/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    27

    Unit Number & Name: III & DATABASE DESIGN Period: 23 of 45

    Page: 2 of 2

    Dependency PreservationLetFibe the set of dependenciesF + that include only attributes inRi.

    A decomposition is dependency preserving, if(F1F2 Fn )+ =F +

    If it is not, then checking updates for violation offunctional dependencies may require computing joins,which is expensive.

    Testing for Dependency Preservation

    To check if a dependency is preserved in a decomposition ofR intoR1,R2, ,Rn we apply the following test (with attribute closure donewith respect toF)

    result=while (changes to result) do

    for eachRi in the decompositiont= (resultRi)+Riresult = result t

    Ifresultcontains all attributes in, then the functional dependency is preserved.

    We apply the test on all dependencies inF to check if a decomposition isdependency preservingThis procedure takes polynomial time, instead of the exponential timerequired to computeF+ and (F1F2 Fn)+Example

    R = (A, B, C)F = {AB

    B C}Key = {A}

    R is not in BCNFDecompositionR1 = (A, B), R2 = (B, C)R1 andR2 in BCNFLossless-join decompositionDependency preserving

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    28/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    28

    Unit Number & Name: III & DATABASE DESIGN Period: 24 of 45

    Page: 2 of 2

    Boyce/Codd Normal Form

    A relation schemaR is in BCNF with respect to a setFof functionaldependencies if for all functional dependencies inF+ of the form

    where R and R, at least one of the following holds: is trivial (i.e., )

    is a superkey forRExample schema notin BCNF:

    bor_loan = ( customer_id, loan_number, amount)

    because loan_number amountholds on bor_loanbut loan_numberisnot a superkey

    How good is BCNF?

    There are database schemas in BCNF that do not seem to be sufficientlynormalized

    Consider a databaseclasses (course, teacher, book)

    such that (c, t, b) classes means that tis qualified to teach c, and b is arequired textbook forc

    The database is supposed to list for each course the set of teachers anyone of which can be the courses instructor, and the set of books, all ofwhich are required for the course

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    29/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    29

    Unit Number & Name: III & DATABASE DESIGN Period: 25 of 45

    Page: 2 of 2

    Multi-valued Dependencies and Fourth Normal Form

    LetRbe a relation schema and let R and R. The multivalueddependency

    holds onR if in any legal relation r(R), for all pairs for tuples t1 and t2in rsuch that t1[] = t2 [], there exist tuples t3 and t4 in rsuch that:

    t1[] = t2 [] = t3 [] = t4 []t3[] = t1 []

    t3[R ] = t2[R ]t4 [] = t2[]t4[R ] = t1[R ]

    Tabular representation of

    Example

    LetRbe a relation schema with a set of attributes that are partitioned into3 nonempty subsets.

    Y, Z, WWe say that YZ(YmultideterminesZ)if and only if for all possible relations r(R )

    rand rthen

    rand rNote that since the behavior ofZand Ware identical it follows that

    YZifY WIn our example:

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    30/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    30

    course teachercourse book

    The above formal definition is supposed to formalize the notion that

    given a particular value ofY(course) it has associated with it a set ofvalues ofZ (teacher) and a set of values ofW (book), and these two setsare in some sense independent of each other.

    Note:l IfYZ then YZl Indeed we have (in above notation)Z1 = Z2

    The claim follows.Use of Multivalued Dependencies

    We use multivalued dependencies in two ways:1. To test relations to determine whether they are legal under a given set

    of functional and multivalued dependencies2. To specify constraints on the set of legal relations. We shall thusconcern ourselves only with relations that satisfy a given set of functionaland multivalued dependencies.

    If a relation rfails to satisfy a given multivalued dependency, we canconstruct a relations r that does satisfy the multivalued dependency byadding tuples to r.

    Fourth Normal Form

    A relation schemaR is in 4NF with respect to a setD of functional andmultivalued dependencies if for all multivalued dependencies inD+ ofthe form , where R and R, at least one of the followinghold:

    is trivial (i.e., or = R) is a superkey for schemaR

    If a relation is in 4NF it is in BCNF

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    31/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    31

    Unit Number & Name: III & DATABASE DESIGN Period: 26 of 45

    Page: 2 of 2

    Join Dependencies and Fifth Normal Form

    Join dependencies generalize multivalued dependencieslead to project-join normal form (PJNF) (also called fifthnormal form)

    A class of even more general constraints, leads to a normal form calleddomain-key normal form.Problem with these generalized constraints: are hard to reason with, andno set of sound and complete set of inference rules exists.

    Hence rarely used

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    32/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    32

    Unit Number & Name: IV & TRANSACTIONS Period: 28 of 45

    Page: 2 of 2

    Transaction Concepts - Transaction Recovery

    A transaction is a unitof program execution that accesses and possiblyupdates various data items.E.g. transaction to transfer $50 from account A to account B:

    1. read(A)2. A :=A 503. write(A)4. read(B)

    5. B :=B + 506. write(B)Two main issues to deal with:

    Failures of various kinds, such as hardware failures and systemcrashesConcurrent execution of multiple transactions

    Example of Fund Transfer

    n Transaction to transfer $50 from account A to account B:1. read(A)

    2. A :=A 503. write(A)4. read(B)5. B :=B + 506. write(B)

    Atomicity requirement

    if the transaction fails after step 3 and before step 6, money will belost leading to an inconsistent database state

    Failure could be due to software or hardware

    the system should ensure that updates of a partially executedtransaction are not reflected in the databaseDurability requirement once the user has been notified that thetransaction has completed (i.e., the transfer of the $50 has taken place),the updates to the database by the transaction must persist even if thereare software or hardware failures.n Transaction to transfer $50 from account A to account B:

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    33/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    33

    1. read(A)2. A :=A 503. write(A)

    4. read(B)5. B :=B + 506. write(B)Consistency requirement in above example:

    the sum of A and B is unchanged by the execution of thetransaction

    In general, consistency requirements include Explicitly specified integrity constraints such as primary

    keys and foreign keys Implicit integrity constraints

    e.g. sum of balances of all accounts, minus sum ofloan amounts must equal value of cash-in-hand

    A transaction must see a consistent database.During transaction execution the database may be temporarilyinconsistent.When the transaction completes successfully the database must beconsistent

    Erroneous transaction logic can lead to inconsistency

    Isolation requirement if between steps 3 and 6, another transactionT2 is allowed to access the partially updated database, it will see aninconsistent database (the sum A + B will be less than it should be).

    T1 T2

    1. read(A)2. A :=A 503. write(A)

    read(A), read(B), print(A+B)4. read(B)5. B :=B + 50

    6. write(BIsolation can be ensured trivially by running transactions serially

    that is, one after the other.However, executing multiple transactions concurrently has significant

    benefits, as we will see later.

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    34/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    34

    Unit Number & Name: IV & TRANSACTIONS Period: 29 of 45

    Page: 2 of 2

    ACID Properties

    A transaction is a unit of program execution that accesses and possiblyupdates various data items.To preserve the integrity of data the databasesystem must ensure:

    Atomicity. Either all operations of the transaction are properly reflectedin the database or none are.Consistency. Execution of a transaction in isolation preserves the

    consistency of the database.Isolation. Although multiple transactions may execute concurrently,each transaction must be unaware of other concurrently executingtransactions. Intermediate transaction results must be hidden from otherconcurrently executed transactions.

    That is, for every pair of transactions Ti and Tj, it appears to Ti thateitherTj, finished execution before Ti started, orTj startedexecution afterTi finished.

    Durability. After a transaction completes successfully, the changes ithas made to the database persist, even if there are system failures.

    System Recovery Media Recovery

    Failure ClassificationStorage StructureRecovery and AtomicityLog-Based RecoveryShadow PagingRecovery With Concurrent Transactions

    Buffer ManagementFailure with Loss of Nonvolatile StorageAdvanced Recovery TechniquesARIES Recovery AlgorithmRemote Backup Systems

    Failure Classification

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    35/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    35

    Transaction failure :Logical errors: transaction cannot complete due to some internalerror condition

    System errors: the database system must terminate an activetransaction due to an error condition (e.g., deadlock)System crash: a power failure or other hardware or software failurecauses the system to crash.

    Fail-stop assumption: non-volatile storage contents are assumedto not be corrupted by system crash

    Database systems have numerous integrity checks toprevent corruption of disk data

    Disk failure: a head crash or similar disk failure destroys all or part ofdisk storage

    Destruction is assumed to be detectable: disk drives use checksumsto detect failures

    Recovery Algorithms

    Recovery algorithms are techniques to ensure database consistency andtransaction atomicity and durability despite failures

    Focus of this chapterRecovery algorithms have two parts

    Actions taken during normal transaction processing to ensureenough information exists to recover from failures

    Actions taken after a failure to recover the database contents to astate that ensures atomicity, consistency and durabilityRecovery and Atomicity

    Modifying the database without ensuring that the transaction will commitmay leave the database in an inconsistent state.Consider transaction Ti that transfers $50 from accountA to accountB;goal is either to perform all database modifications made by Ti or none atall.Several output operations may be required forTi (to outputA andB). A

    failure may occur after one of these modifications have been made butbefore all of them are made.

    To ensure atomicity despite failures, we first output informationdescribing the modifications to stable storage without modifying thedatabase itself.

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    36/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    36

    We study two approaches:log-based recovery, andshadow-paging

    We assume (initially) that transactions run serially, that is, one after theother.

    Two Phase Commit - Save Points

    Lock-Based Protocols

    A lock is a mechanism to control concurrent access to a data itemData items can be locked in two modes :

    1. exclusive (X) mode. Data item can be both read as well aswritten. X-lock is requested using lock-X instruction.

    2. shared (S) mode. Data item can only be read. S-lock isrequested using lock-S instruction.

    Lock requests are made to concurrency-control manager. Transaction canproceed only after request is granted.Lock-compatibility matrixA transaction may be granted a lock on an item if the requested lock iscompatible with locks already held on the item by other transactionsAny number of transactions can hold shared locks on an item,

    but if any transaction holds an exclusive on the item no other

    transaction may hold any lock on the item.If a lock cannot be granted, the requesting transaction is made to wait tillall incompatible locks held by other transactions have been released. Thelock is then granted.

    Example of a transaction performing locking:T2: lock-S(A);

    read (A);unlock(A);lock-S(B);

    read (B);unlock(B);display(A+B)

    Locking as above is not sufficient to guarantee serializability ifA andB get updated in-between the read ofA andB, the displayed sum wouldbe wrong.

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    37/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    37

    A locking protocol is a set of rules followed by all transactions whilerequesting and releasing locks. Locking protocols restrict the set of

    possible schedules.

    The Two-Phase Locking Protocol

    This is a protocol which ensures conflict-serializable schedules.Phase 1: Growing Phase

    transaction may obtain lockstransaction may not release locks

    Phase 2: Shrinking Phasetransaction may release lockstransaction may not obtain locks

    The protocol assures serializability. It can be proved that the transactionscan be serialized in the order of theirlock points (i.e. the point where atransaction acquired its final lock).

    Two-phase locking does notensure freedom from deadlocksCascading roll-back is possible under two-phase locking. To avoid this,follow a modified protocol called strict two-phase locking. Here atransaction must hold all its exclusive locks till it commits/aborts.Rigorous two-phase locking is even stricter: here alllocks are held till

    commit/abort. In this protocol transactions can be serialized in the orderin which they commit.There can be conflict serializable schedules that cannot be obtained if

    two-phase locking is used.However, in the absence of extra information (e.g., ordering of access todata), two-phase locking is needed for conflict serializability in thefollowing sense:

    Given a transaction Ti that does not follow two-phase locking, we can finda transaction Tj that uses two-phase locking, and a schedule forTi and Tjthat is not conflict serializable.

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    38/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    38

    Unit Number & Name: IV & TRANSACTIONS Period: 32 of 45

    Page: 2 of 2

    SQL Facilities for recovery

    Transaction Definition in SQL

    Data manipulation language must include a construct for specifying theset of actions that comprise a transaction.In SQL, a transaction begins implicitly.A transaction in SQL ends by:

    Commit workcommits current transaction and begins a new one.Rollback workcauses current transaction to abort.

    In almost all database systems, by default, every SQL statement alsocommits implicitly if it executes successfullyImplicit commit can be turned off by a database directive

    E.g. in JDBC, connection.setAutoCommit(false);

    Concurrency Need for Concurrency

    Lock-Based ProtocolsTimestamp-Based ProtocolsValidation-Based ProtocolsMultiple GranularityMultiversion SchemesInsert and Delete OperationsConcurrency in Index Structures

    Lock-Based ProtocolsA lock is a mechanism to control concurrent access to a data itemData items can be locked in two modes :

    1. exclusive (X) mode. Data item can be both read as well aswritten. X-lock is requested using lock-X instruction.

    2. shared (S) mode. Data item can only be read. S-lock isrequested using lock-S instruction.

    Lock requests are made to concurrency-control manager. Transaction can proceedonly after request is granted.

    Graph-Based ProtocolsGraph-based protocols are an alternative to two-phase locking

    Impose a partial ordering on the set D = {d1, d2 ,..., dh} of all data items.

    Ifdi dj then any transaction accessing both di and dj must access di beforeaccessing dj.

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    39/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    39

    Implies that the set D may now be viewed as a directed acyclic graph, called adatabase graph.

    The tree-protocolis a simple kind of graph protocol.

    Tree Protocol

    1. Only exclusive locks are allowed.2. The first lock by Ti may be on any data item. Subsequently, a data Q can be

    locked by Ti only if the parent ofQ is currently locked by Ti.3. Data items may be unlocked at any time.4. A data item that has been locked and unlocked by Ti cannot subsequently be

    relocked by Ti

    Graph-Based Protocols (Cont.)The tree protocol ensures conflict serializability as well as freedom from deadlock.Unlocking may occur earlier in the tree-locking protocol than in the two-phaselocking protocol.

    shorter waiting times, and increase in concurrencyprotocol is deadlock-free, no rollbacks are required

    DrawbacksProtocol does not guarantee recoverability or cascade freedom

    Need to introduce commit dependencies to ensure recoverabilityTransactions may have to lock data items that they do not access.

    increased locking overhead, and additional waiting time potential decrease in concurrency

    Schedules not possible under two-phase locking are possible under tree protocol, andvice versa.

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    40/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    40

    Unit Number & Name: IV & TRANSACTIONS Period: 34 of 45

    Page: 2 of 2

    Locking Protocols Two Phase Locking Intent Locking

    Lock-Based Protocols

    A lock is a mechanism to control concurrent access to a data itemData items can be locked in two modes :

    1. exclusive (X) mode. Data item can be both read as well aswritten. X-lock is requested using lock-X instruction.

    2. shared (S) mode. Data item can only be read. S-lock isrequested using lock-S instruction.

    Lock requests are made to concurrency-control manager. Transaction canproceed only after request is granted.Lock-compatibility matrixA transaction may be granted a lock on an item if the requested lock iscompatible with locks already held on the item by other transactionsAny number of transactions can hold shared locks on an item,

    but if any transaction holds an exclusive on the item no othertransaction may hold any lock on the item.

    If a lock cannot be granted, the requesting transaction is made to wait tillall incompatible locks held by other transactions have been released. The

    lock is then granted.

    Example of a transaction performing locking:T2: lock-S(A);

    read (A);unlock(A);lock-S(B);read (B);unlock(B);

    display(A+B)Locking as above is not sufficient to guarantee serializability ifA and

    B get updated in-between the read ofA andB, the displayed sum wouldbe wrong.A locking protocol is a set of rules followed by all transactions whilerequesting and releasing locks. Locking protocols restrict the set of

    possible schedules.

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    41/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    41

    The Two-Phase Locking Protocol

    This is a protocol which ensures conflict-serializable schedules.Phase 1: Growing Phasetransaction may obtain lockstransaction may not release locks

    Phase 2: Shrinking Phasetransaction may release lockstransaction may not obtain locks

    The protocol assures serializability. It can be proved that the transactionscan be serialized in the order of theirlock points (i.e. the point where atransaction acquired its final lock).

    Two-phase locking does notensure freedom from deadlocksCascading roll-back is possible under two-phase locking. To avoid this,follow a modified protocol called strict two-phase locking. Here atransaction must hold all its exclusive locks till it commits/aborts.Rigorous two-phase locking is even stricter: here alllocks are held tillcommit/abort. In this protocol transactions can be serialized in the orderin which they commit.

    There can be conflict serializable schedules that cannot be obtained if

    two-phase locking is used.However, in the absence of extra information (e.g., ordering of access todata), two-phase locking is needed for conflict serializability in thefollowing sense:

    Given a transaction Ti that does not follow two-phase locking, we can finda transaction Tj that uses two-phase locking, and a schedule forTi and Tjthat is not conflict serializable.

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    42/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    42

    Unit Number & Name: IV & TRANSACTIONS Period: 35 of 45

    Page: 2 of 2

    Deadlock- Serializability Recovery Isolation Levels

    System is deadlocked if there is a set of transactions such that everytransaction in the set is waiting for another transaction in the set.

    Deadlock preventionprotocols ensure that the system will neverenterinto a deadlock state. Some prevention strategies :

    Require that each transaction locks all its data items before itbegins execution (predeclaration).Impose partial ordering of all data items and require that atransaction can lock data items only in the order specified by the

    partial order (graph-based protocol).

    More Deadlock Prevention Strategies

    DDeeaaddlloocckkHHaannddlliinngg

    n Consider the following two transactions:T1: write (X) T2: write(Y)

    write(Y) write(X)

    n Schedule with deadlock

    T1 T2

    lock-X onXwrite (X)

    lock-X on Ywrite (X)wait forlock-X onX

    wait forlock-X on Y

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    43/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    43

    Following schemes use transaction timestamps for the sake of deadlockprevention alone.

    wait-die scheme non-preemptiveolder transaction may wait for younger one to release data item.Younger transactions never wait for older ones; they are rolled

    back instead.a transaction may die several times before acquiring needed dataitem

    wound-wait scheme preemptiveolder transaction wounds (forces rollback) of younger transactioninstead of waiting for it. Younger transactions may wait for olderones.

    may be fewer rollbacks than wait-die scheme.

    Both in wait-die and in wound-waitschemes, a rolled back transactions isrestarted with its original timestamp. Older transactions thus have

    precedence over newer ones, and starvation is hence avoided.Timeout-Based Schemes :

    a transaction waits for a lock only for a specified amount of time.After that, the wait times out and the transaction is rolled back.thus deadlocks are not possible

    simple to implement; but starvation is possible. Also difficult todetermine good value of the timeout interval.

    Deadlock Detection

    Deadlocks can be described as a wait-for graph, which consists of a pairG = (V,E),

    Vis a set of vertices (all the transactions in the system)Eis a set of edges; each element is an ordered pairTiTj.

    IfTi Tj is inE, then there is a directed edge from Ti to Tj, implyingthat Ti is waiting forTj to release a data item.

    When Ti requests a data item currently being held by Tj, then the edge TiTj is inserted in the wait-for graph. This edge is removed only when Tj isno longer holding a data item needed by Ti.The system is in a deadlock state if and only if the wait-for graph has acycle. Must invoke a deadlock-detection algorithm periodically to lookfor cycles.

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    44/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    44

    Deadlock Recovery

    When deadlock is detected :Some transaction will have to rolled back (made a victim) to breakdeadlock. Select that transaction as victim that will incur

    minimum cost.Rollback -- determine how far to roll back transaction

    Total rollback: Abort the transaction and then restart it. More effective to roll back transaction only as far as

    necessary to break deadlock.Starvation happens if same transaction is always chosen as victim.Include the number of rollbacks in the cost factor to avoidstarvation

    Wait-for graph without a cycle Wait-for graph with a cycle

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    45/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    45

    Unit Number & Name: V & IMPLEMENTATION TECHNIQUES Period: 37 of 45

    Page: 2 of 2

    Overview of Physical Storage Media

    Several types of data storage exist in most computer systems. They vary in speedof access, cost per unit of data, and reliability.

    Cache: most costly and fastest form of storage. Usually very small, and managedby the operating system.

    Main Memory (MM): the storage area for data available to be operated on. General-purpose machine instructions operate on main memory. Contents of main memory are usually lost in a power failure or ``crash''. Usually too small (even with megabytes) and too expensive to store the

    entire database. Flash memory: EEPROM (electrically erasable programmable read-only

    memory). Data in flash memory survive from power failure. Reading data from flash memory takes about 10 nano-secs (roughly as fast as

    from main memory), and writing data into flash memory is morecomplicated: write-once takes about 4-10 microsecs.

    To overwrite what has been written, one has to first erase the entire bank ofthe memory. It may support only a limited number of erase cycles( to ).

    It has found its popularity as a replacement for disks for storing smallvolumes of data (5-10 megabytes).

    Magnetic-disk storage:primary medium for long-term storage. Typically the entire database is stored on disk. Data must be moved from disk to main memory in order for the data to be

    operated on. After operations are performed, data must be copied back to disk if any

    changes were made. Disk storage is called direct access storage as it is possible to read data on

    the disk in any order (unlike sequential access). Disk storage usually survives power failures and system crashes.

    Optical storage: CD-ROM (compact-disk read-only memory), WORM (write-once read-many) disk (for archival storage of data), and Juke box (containing a

    few drives and numerous disks loaded on demand). Tape Storage: used primarily for backup and archival data.

    Cheaper, but much slower access, since tape must be read sequentially fromthe beginning.

    Used as protection from disk failures! The storage device hierarchy is presented in Figure 10.1, where the higher levels

    are expensive (cost per bit), fast (access time), but the capacity is smaller.

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    46/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    46

    Figure 10.1: Storage-device hierarchy

    Another classification: Primary, secondary, and tertiary storage.1. Primary storage: the fastest storage media, such as cash and main memory.2. Secondary (or on-line) storage: the next level of the hierarchy, e.g., magnetic

    disks.3. Tertiary (or off-line) storage: magnetic tapes and optical disk juke boxes. Volatility of storage. Volatile storage loses its contents when the power is

    removed. Without power backup, data in the volatile storage (the part of thehierarchy from main memory up) must be written to nonvolatile storage forsafekeeping.

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    47/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    47

    Unit Number & Name: V & IMPLEMENTATION TECHNIQUES Period: 38 of 45

    Page: 2 of 2

    Magnetic Disks RAID

    RAID (an acronym forredundant array of independent disks; originally redundant

    array of inexpensive disks[1][2]) is a storage technology that combines multiple disk

    drivecomponents into a logical unit. Data is distributed across the drives in one of several

    ways called "RAID levels", depending on what level of redundancy and performance

    (via parallel communication) is required.

    RAID is an example of storage virtualization and was first defined by David A.

    Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in

    1987.[3] Marketers representing industry RAID manufacturers later attempted to reinvent

    the term to describe a redundant array of independent disks as a means of dissociating alow-cost expectation from RAID technology.[4]

    RAID is now used as an umbrella term for computer data storage schemes that can divide

    and replicate data among multiple physical drives. The physical drives are said to be in a

    RAID,[5] which is accessed by the operating system as one single drive. The different

    schemes or architectures are named by the word RAID followed by a number (e.g., RAID

    0, RAID 1). Each scheme provides a different balance between two key goals:

    increase data reliability and increase input/output performance.

    A number of standard schemes have evolved which are referred to as levels. There were

    five RAID levels originally conceived, but many more variations have evolved, notably

    severalnested levels and many non-standard levels (mostly proprietary). RAID levels andtheir associated data formats are standardised by SNIA in the Common RAID Disk Drive

    Format (DDF) standard.

    Following is a brief textual summary of the most commonly used RAID levels.[6]

    RAID 0 (block-level striping without parity or mirroring) has no (or zero)redundancy. It provides improved performance and additional storage but no faulttolerance. Hence simple stripe sets are normally referred to as RAID 0. Any drivefailure destroys the array, and the likelihood of failure increases with more drivesin the array (at a minimum, catastrophic data loss is almost twice as likelycompared to single drives without RAID). A single drive failure destroys theentire array because when data is written to a RAID 0 volume, the data is broken

    into fragments called blocks. The number of blocks is dictated by the stripe size,which is a configuration parameter of the array. The blocks are written to theirrespective drives simultaneously on the same sector. This allows smaller sectionsof the entire chunk of data to be read off the drive in parallel, increasingbandwidth. RAID 0 does not implement error checking, so any error isuncorrectable. More drives in the array means higher bandwidth, but greater riskof data loss.

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    48/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    48

    InRAID 1 (mirroring without parity or striping), data is written identically tomultiple drives, thereby producing a "mirrored set"; at least 2 drives are requiredto constitute such an array. While more constituent drives may be employed,many implementations deal with a maximum of only 2; of course, it might be

    possible to use such a limited level 1 RAID itself as a constituent of a level 1RAID, effectively masking the limitation.[citation needed] The array continues tooperate as long as at least one drive is functioning. With appropriate operatingsystem support, there can be increased read performance, and only a minimalwrite performance reduction; implementing RAID 1 with a separate controller foreach drive in order to perform simultaneous reads (and writes) is sometimescalled multiplexing(orduplexingwhen there are only 2 drives).

    InRAID 2 (bit-level striping with dedicated Hamming-code parity), all diskspindle rotation is synchronized, and data is striped such that each sequential bit ison a different drive.Hamming-code parity is calculated across corresponding bitsand stored on at least one parity drive.

    InRAID 3 (byte-level striping with dedicated parity), all disk spindle rotation issynchronized, and data is striped so each sequential byte is on a different drive.Parity is calculated across corresponding bytes and stored on a dedicated paritydrive.

    RAID 4 (block-level striping with dedicated parity) is identical to RAID 5 (seebelow), but confines all parity data to a single drive. In this setup, files may bedistributed between multiple drives. Each drive operates independently, allowingI/O requests to be performed in parallel. However, the use of a dedicated paritydrive could create a performancebottleneck; because the parity data must bewritten to a single, dedicated parity drive for each block of non-parity data, theoverall write performance may depend a great deal on the performance of this

    parity drive. RAID 5 (block-level striping with distributed parity) distributes parity along with

    the data and requires all drives but one to be present to operate; the array is notdestroyed by a single drive failure. Upon drive failure, any subsequent reads canbe calculated from the distributed parity such that the drive failure is masked fromthe end user. However, a single drive failure results in reduced performance of theentire array until the failed drive has been replaced and the associated data rebuilt.Additionally, there is the potentially disastrousRAID 5 write hole.

    RAID 6(block-level striping with double distributed parity) provides faulttolerance of two drive failures; the array continues to operate with up to two faileddrives. This makes larger RAID groups more practical, especially for high-

    availability systems. This becomes increasingly important as large-capacity driveslengthen the time needed to recover from the failure of a single drive. Single-parity RAID levels are as vulnerable to data loss as a RAID 0 array until the faileddrive is replaced and its data rebuilt; the larger the drive, the longer the rebuildtakes. Double parity gives additional time to rebuild the array without the databeing at risk if a single additional drive fails before the rebuild is complete.

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    49/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    49

    Unit Number & Name: V & IMPLEMENTATION TECHNIQUES Period: 39 of 45

    Page: 2 of 2

    Tertiary storage File Organization Organization of Records in Files

    Optical Disks

    1. CD-ROM has become a popular medium for distributing software, multimediadata, and other electronic published information.

    2. Capacity of CD-ROM: 500 MB. Disks are cheap to mass produce and alsodrives.

    3. CD-ROM: much longer seek time (250m-sec), lower rotation speed (400 rpm),leading to high latency and lower data-transfer rate (about 150 KB/sec). Drives

    spins at audio CD spin speed (standard) is available.4. Recently, a new optical format, digit video disk (DVD) has become standard.

    These disks hold between 4.7 and 17 GB data.5. WORM (write-once, read many) disks are popular for archival storage of data

    since they have a high capacity (about 500 MB), longer life time than HD, andcan be removed from drive -- good for audit trail (hard to tamper).

    Magnetic Tapes6. Long history, slow, and limited to sequential access, and thus are used for

    backup, storage for infrequent access, and off-line medium for system transfer.7. Moving to the correct spot may take minutes, but once positioned, tape drives

    can write data at density and speed approaching to those of disk drives.8. 8mm tape drive has the highest density, and we store 5 GB data on a 350-foot

    tape.9. Popularly used for storage of large volumes of data, such as video, image, or

    remote sensing data.

    File organization is the methodology which is applied to structured computer files. Filescontain computer records which can be documents or information which is stored in acertain way for later retrieval. File organization refers primarily to the logicalarrangement of data (which can itself be organized in a system of records with correlation

    between the fields/columns) in a file system. It should not be confused with the physicalstorage of the file in some types of storage media. There are certain basic types ofcomputer file, which can include files stored as blocks of data and streams of data, wherethe information streams out of the file while it is being read until the end of the file isencountered.We will look at two components of file organization here:

    1. The way the internal file structure is arranged and

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    50/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    50

    2. The external file as it is presented to the O/S or program that calls it. Here wewill also examine the concept of file extensions.

    We will examine various ways that files can be stored and organized. Files are presentedto the application as a stream of bytes and then an EOF (end of file) condition.

    A program that uses a file needs to know the structure of the file and needs to interpret itscontents.

    Internal File Structure

    Methods and Design Paradigm

    It is a high-level design decision to specify a system of file organization for a computersoftware program or a computer system designed for a particular purpose. Performance is

    high on the list of priorities for this design process, depending on how the file is beingused. The design of the file organization usually depends mainly on the systemenvironment. For instance, factors such as whether the file is going to be used fortransaction-oriented processes like OLTP or Data Warehousing, or whether the file isshared among various processes like those found in a typical distributed system orstandalone. It must also be asked whether the file is on a network and used by a numberof users and whether it may be accessed internally or remotely and how often it isaccessed.However, all things considered the most important considerations might be:

    1. Rapid access to a record or a number of records which are related to each other.2. The Adding, modification, or deletion of records.

    3. Efficiency of storage and retrieval of records.4. Redundancy, being the method of ensuring data integrity.

    A file should be organized in such a way that the records are always available forprocessing with no delay. This should be done in line with the activity and volatility ofthe information.

    Types of File Organization

    Organizing a file depends on what kind of file it happens to be: a file in the simplest formcan be a text file, (in other words a file which is composed of ascii (American StandardCode for Information Interchange) text.) Files can also be created as binary or executabletypes (containing elements other than plain text.) Also, files are keyed with attributes

    which help determine their use by the host operating system.

    Techniques of File Organization

    The three techniques of file organization are:1. Heap (unordered)2. Sorted

    1. Sequential (SAM)

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    51/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    51

    2. Line Sequential (LSAM)3. Indexed Sequential (ISAM)

    3. Hashed or DirectIn addition to the three techniques, there are four methods of organizing files. They

    aresequential, line-sequential, indexed-sequential, inverted list and direct or hashedaccessorganization.

    Sequential Organization

    A sequential file contains records organized in the order they were entered. The order ofthe records is fixed. The records are stored and sorted in physical, contiguous blockswithin each block the records are in sequence.Records in these files can only be read or written sequentially.Once stored in the file, the record cannot be made shorter, or longer, or deleted. However,the record can be updatedif the length does not change. (This is done by replacing the

    records by creating a new file.) New records will always appear at the end of the file.If the order of the records in a file is not important, sequential organization willsuffice, no matter how many records you may have. Sequential output is also useful forreport printing orsequential reads which some programs prefer to do.

    Line-Sequential Organization

    Line-sequential files are like sequential files, except that the records can contain onlycharacters as data. Line-sequential files are maintained by the native byte stream files ofthe operating system.In the COBOL environment, line-sequential files that are created with WRITE statements

    with the ADVANCING phrase can be directed to a printer as well as to a disk.

    Indexed-Sequential Organization

    Key searches are improved by this system too. The single-level indexing structure is thesimplest one where a file, whose records are pairs, contains a key pointer. This pointeristhe position in the data file of the record with the given key. A subset of the records,which are evenly spaced along the data file, is indexed, in order to mark intervals of datarecords.This is how a key search is performed: the search key is compared with the index keys tofind the highest index key coming in front of the search key, while a linear search isperformed from the record that the index key points to, until the search key is matched oruntil the record pointed to by the next index entry is reached. Regardless of double fileaccess (index + data) required by this sort of search, the access time reduction issignificant compared with sequential file searches.Let's examine, for sake of example, a simple linear search on a 1,000 record sequentiallyorganized file. An average of 500 key comparisons are needed (and this assumes thesearch keys are uniformly distributed among the data keys). However, using an indexevenly spaced with 100 entries, the total number of comparisons is reduced to 50 in theindex file plus 50 in the data file: a five to one reduction in the operations count!

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    52/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    52

    Hierarchical extension of this scheme is possible since an index is a sequential file initself, capable of indexing in turn by another second-level index, and so forth and so on.And the exploit of the hierarchical decomposition of the searches more and more, todecrease the access time will pay increasing dividends in the reduction of processing

    time. There is however a point when this advantage starts to be reduced by the increasedcost of storage and this in turn will increase the index access time.Hardware for Index-Sequential Organization is usually Disk-based, rather than tape.Records are physically ordered by primary key. And the index gives the physical locationof each record. Records can be accessed sequentially or directly, via the index. The indexis stored in a file and read into memory at the point when the file is opened. Also, indexesmust be maintained.Life sequential organization the data is stored in physical contiguous box. How ever thedifference is in the use of indexes. There are three areas in the disc storage:

    Primary Area:-Contains file records stored by key or ID numbers. Overflow Area:-Contains records area that cannot be placed in primary area.

    Index Area:-It contains keys of records and there locations on the disc.

    Inverted List

    In file organization, this is a file that is indexed on many of the attributes of the dataitself. The inverted list method has a single index for each key type. The records are notnecessarily stored in a sequence. They are placed in the are data storage area, but indexesare updated for the record keys and location.Here's an example, in a company file, an index could be maintained for all products,another one might be maintained forproduct types. Thus, it is faster to search the indexesthan every record. These types of file are also known as "inverted

    indexes."Nevertheless, inverted list files use more media space and the storage devicesget full quickly with this type of organization. The benefits are apparent immediatelybecause searching is fast. However, updating is much slower.Content-based queries in text retrieval systems use inverted indexes as their preferredmechanism. Data items in these systems are usually stored compressedwhich wouldnormally slow the retrieval process, but the compression algorithm will be chosen tosupport this technique.When querying a file there are certain circumstances when the query is designed tobe modalwhich means that rules are set which require that different information be heldin the index. Here's an example of this modality: when phrase querying is undertaken, theparticular algorithm requires that offsets to word classifications are held in addition to

    document numbers.

    ]Direct or Hashed Access

    With direct or hashed access a portion of disk space is reserved and a hashingalgorithm computes the record address. So there is additional space required for this kindof file in the store. Records are placed randomly through out the file. Records areaccessed by addresses that specify their disc location. Also, this type of file organization

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    53/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    53

    requires a disk storage rather than tape. It has an excellent search retrieval performance,but care must be taken to maintain the indexes. If the indexes become corrupt, what is leftmay as well go to the bit-bucket, so it is as well to have regular backups of this kind offile just as it is for all stored valuable data!

    External File Structure and File Extensions

    Microsoft Windows and MS-DOS File Systems

    The external structure of a file depends on whether it is being created ona FAT or NTFSpartition. The maximum filename length on a NTFS partition is 256characters, and 11 characters on FAT (8 character name+"."+3 characterextension.) NTFS filenames keep their case, whereas FAT filenames have no concept ofcase (but case is ignored when performing a search under NTFS Operating System).Also, there is the new VFAT which permits 256 character filenames.

    UNIX and Apple Macintosh File Systems

    The concept of directories and files is fundamental to the UNIX operating system.On Microsoft Windows-based operating systems, directories are depicted asfolders andmoving about is accomplished by clicking on the different icons. In UNIX, the directoriesare arranged as a hierarchy with the root directorybeing at the top of the tree.The rootdirectory is always depicted as /. Within the / directory, there are subdirectories(e.g.: etc and sys). Files can be written to any directory depending on the permissions.Files can be readable, writable and/orexecutable.

    Organizing files using Libraries

    With the advent of Microsoft Windows 7 the concept of file organization and

    management has improved drastically by way of use of powerful tool called Libraries. ALibrary is file organization system to bring together related files and folders stored indifferent locations of the local as well as network computer such that these can beaccessed centrally through a single access point. For instance, various images stored indifferent folders in the local computer or/and across a computer network can beaccumulated in an Image Library. Aggregation of similar files can be manipulated, sortedor accessed conveniently as and when required through a single access point on acomputer desktop by use of a Library. This feature is particularly very useful foraccessing similar content of related content, and also, for managing projects using relatedand common data.

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    54/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    54

    Unit Number & Name: V & IMPLEMENTATION TECHNIQUES Period: 40 of 45

    Page: 2 of 2

    Indexing and Hashing Ordered Indices

    Indexing mechanisms used to speed up access to desireddata. E.g. author catalog in library_ Search key attribute or set of attributes used to look uprecords in a file._ An index file consists of records (called index entries) of theformsearch-key pointer_ Index files are typically much smaller than the original file

    _ Two basic kinds of indices: Ordered indices: search keys are stored in sorted order Hash indices: search keys are distributed uniformly acrossbuckets using a hash function

    Index Evaluation MetricsIndexing techniques evaluated on basis of:_ Access types supported efficiently. E.g., records with a specified value in an attribute or records with an attribute value falling in a specified rangeof values.

    _ Access time_ Insertion time_ Deletion time_ Space overhead

    Ordered Indices_ In an ordered index, index entries are stored sorted on thesearch key value. E.g., author catalog in library._ Primary index: in a sequentially ordered file, the index whosesearch key specifies the sequential order of the file. Also called clustering index

    The search key of a primary index is usually but notnecessarily the primary key._ Secondary index: an index whose search key specifies anorder different from the sequential order of the file. Also callednon-clustering index._ Index-sequential file: ordered sequential file with a primaryindex.

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    55/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    55

    Unit Number & Name: V & IMPLEMENTATION TECHNIQUES Period: 41 of 45

    Page: 2 of 2

    B+ tree Index Files B tree Index Files

    B+-tree indices are an alternative to indexed-sequential files._ Disadvantage of indexed-sequential files: performancedegrades as file grows, since many overflow blocks getcreated. Periodic reorganization of entire file is required._ Advantage of B+-tree index files: automatically reorganizesitself with small, local, changes, in the face of insertions anddeletions. Reorganization of entire file is not required to

    maintain performance._ Disadvantage of B+-trees: extra insertion and deletionoverhead, space overhead._ Advantages of B+-trees outweigh disadvantages, and they areused extensively.

    A B+-tree is a rooted tree satisfying the following properties:_ All paths from root to leaf are of the same length

    _ Each node that is not a root or a leaf has between dn/ 2e and nchildren._ A leaf node has between d(n 1)/ 2e and n 1 values_ Special cases: if the root is not a leaf, it has at least 2 children.If the root is a leaf (that is, there are no other nodes in thetree), it can have between 0 and (n 1) values.

    B+-Tree Node Structure_ Typical nodeP1 K1 P2 . . . Pn1 Kn1 Pn

    Ki are the search-key values Pi are pointers to children (for non-leaf nodes) or pointers torecords or buckets of records (for leaf nodes)._ The search-keys in a node are orderedK1 < K2 < K3 < ... < Kn 1

    Example of a B+

    http://csetube.co.nr/

    http://csetube.co.nr/
  • 7/27/2019 141405 - DataBase Management Systems

    56/60

    http://

    csetub

    e.co

    .nr/

    GKMCET

    Lecture Plan

    Code & Subject Name: 141405 & Database Management Systems

    56

    -treePerryridgeMianus RedwoodBrighton Downtown Mianus Perryridge Redwood Round Hill

    B+-tree for account file (n = 3)

    Static Hashing Dynamic Hashing

    Static Hashing_ A bucket is a unit of storage containing one or more records (abucket is typically a disk block). In a hash file organization we

    obtain the bucket of a record directly from its search-key valueusing a hash function.

    _ Hash function h is a function from the set of all search-keyvalues K to the set of all bucket addresses B.

    _ Hash function is used to locate records for access, insertion aswell as deletion.

    _ Records with different search-key values may be mapped tothe same bucket; thus entire bucket has to be searchedsequentially to locate a record.

    Dynamic Hashing_ Good for database that grows and shrinks in size_ Allows the hash function to be modified dynamically_ Extendable hashing one form of dynamic hashing Hash function generates values over a large range typically b-bit integers, with b = 32.

    At any time use only a prefix of the hash function to indexinto a table of bucket addresses. Let the length of the prefix

    be i bits, 0 _ i _ 32 Initially i = 0 Value of i grows and shrinks as the