Open source DATABASE MANAGEMENT SYSTEM


Post on 19-Oct-2020



    UNIT I

    DATABASE MANAGEMENT SYSTEM INTRODUCTION

    What is DBMS?

    A database management system (DBMS) is the technology for creating and managing databases: a software tool to organize (create, retrieve, update, and manage) data in a database. Knowledge refers to the useful application of information. Information can be transported, stored, and shared without much difficulty, but the same cannot be said about knowledge, which necessarily involves personal experience and practice. Database systems are meant to handle large collections of information. Managing data involves both defining structures for storing information and providing mechanisms for manipulating the stored information. Moreover, the database system must keep the stored information safe despite system crashes or attempts at unauthorized access.

    USES OF DBMS

    • To develop software applications in less time.

    • Data independence and efficient use of data.

    • For uniform data administration.

    • For data integrity and security.

    • For concurrent access to data, and data recovery from crashes.

    • To use a user-friendly, declarative query language.

    Where Is a Database Management System (DBMS) Used?

    • Airlines: reservations, schedules, etc.

    • Telecom: calls made, customer details, network usage, etc.

    • Universities: registration, results, grades, etc.

    • Sales: products, purchases, customers, etc.

    • Banking: all transactions, etc.

    Advantages of DBMS

    A DBMS manages data and has many benefits. These are:

    • Data independence: Application programs should be as independent as possible from the details of data representation and storage. A DBMS can supply an abstract view of the data that insulates application code from such details.

    • Efficient data access: A DBMS uses a variety of sophisticated concepts and techniques to store and retrieve data efficiently. This feature becomes important when the data is stored on external storage devices.

    • Data integrity and security: If data is always accessed through the DBMS, the DBMS can enforce integrity constraints on the data, as well as access controls that govern which data is visible to different users.

    • Data administration: When several users share the data, centralizing the administration of data can offer significant improvements. Experienced professionals who understand the nature of the data being managed can be responsible for organizing the data representation to reduce redundancy and make the data efficient to retrieve.

    Components of DBMS

    • Users: Users can be of any kind, such as database administrators, system developers, or database end users.

    • Database application: A database application may be personal, departmental, organizational, and/or internal.

    • DBMS: Software that allows users to create databases, manipulate them, and control access to them.

    • Database: A collection of logically related data stored as a single unit.

    Data Models

    A data model is the modeling of the data description, data semantics, and consistency constraints of the data. It provides the conceptual tools for describing the design of a database at each level of data abstraction. The following four data models are used for understanding the structure of the database:

    1) Relational Data Model: This type of model organizes the data in the form of rows and columns within a table. Thus, a relational model uses tables to represent data and the relationships between them. Tables are also called relations. This model was initially described by Edgar F. Codd in 1969. The relational data model is the most widely used model, primarily in commercial data-processing applications.

    2) Entity-Relationship Data Model: An ER model is the logical representation of data as objects and the relationships among them. These objects are known as entities, and a relationship is an association among these entities. This model was proposed by Peter Chen in a paper published in 1976, and it is widely used in database design. Entities are described by a set of attributes. For example, student_name and student_id describe the 'student' entity. A set of entities of the same type is known as an 'entity set', and a set of relationships of the same type is known as a 'relationship set'.

    3) Object-based Data Model: An extension of the ER model with the notions of functions, encapsulation, and object identity. This model supports a rich type system that includes structured and collection types. In the 1980s, various database systems following the object-oriented approach were developed. Here, the objects are simply data carrying their own properties.

    4) Semistructured Data Model: This type of data model is different from the other three data models explained above. The semistructured data model allows data specifications in which individual data items of the same type may have different sets of attributes. The Extensible Markup Language, also known as XML, is widely used for representing semistructured data. Although XML was initially designed for adding markup information to text documents, it has gained importance because of its use in data exchange.

    INTRODUCTION TO SQL

    SQL

    o SQL stands for Structured Query Language. It is used for storing and managing data in a relational database management system (RDBMS).

    o It is the standard language for relational database systems. It enables a user to create, read, update, and delete relational databases and tables.

    o RDBMSs such as MySQL, Informix, Oracle, MS Access, and SQL Server all use SQL as their standard database language.

    o SQL allows users to query the database in a number of ways, using English-like

    statements.
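    The create, read, update, and delete operations mentioned above can be illustrated with a minimal sketch. It assumes SQLite through Python's built-in sqlite3 module and an in-memory database; the student table and its columns are hypothetical, with rows borrowed from the STUDENT example in this unit. Other RDBMSs accept very similar standard SQL, although details vary between products.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for any RDBMS here.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table.
cur.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Create (insert) rows; ? placeholders pass the values safely.
cur.executemany(
    "INSERT INTO student (roll_no, name, city) VALUES (?, ?, ?)",
    [(14795, "Ram", "Noida"), (12839, "Shyam", "Delhi")],
)

# Read with a declarative SELECT: we state what we want, not how to find it.
rows = cur.execute("SELECT name, city FROM student ORDER BY roll_no").fetchall()
print(rows)  # [('Shyam', 'Delhi'), ('Ram', 'Noida')]

# Update and delete.
cur.execute("UPDATE student SET city = ? WHERE name = ?", ("Bangalore", "Ram"))
cur.execute("DELETE FROM student WHERE name = ?", ("Shyam",))
conn.commit()

remaining = cur.execute("SELECT name, city FROM student").fetchall()
print(remaining)  # [('Ram', 'Bangalore')]
```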

    Rules:

    SQL follows the following rules:

    o SQL keywords are not case sensitive. Generally, SQL keywords are written in uppercase.

    o SQL statements are not dependent on text lines: a single SQL statement can be written on one or multiple text lines.

    o Using SQL statements, you can perform most of the actions in a database.

    o SQL is based on tuple relational calculus and relational algebra.
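    The first two rules can be checked directly. This sketch assumes SQLite via Python's sqlite3 module and a hypothetical two-column student table: the same query is written once with upper-case keywords on one line and once with lower-case keywords split across lines. Note that only keywords are case-insensitive; string literals such as 'Ram' are still compared case-sensitively by SQLite's default collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, age INTEGER)")
conn.execute("INSERT INTO student VALUES ('Ram', 24), ('Laxman', 20)")

# Upper-case keywords on a single line.
upper = conn.execute("SELECT name FROM student WHERE age > 21").fetchall()

# The same statement with lower-case keywords, spread over several lines.
lower = conn.execute(
    """select name
         from student
        where age > 21"""
).fetchall()

print(upper == lower)  # True
print(upper)           # [('Ram',)]
```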

    SQL process:

    o When an SQL command is executed on an RDBMS, the system figures out the best way to carry out the request, and the SQL engine determines how to interpret the task.

    o Various components are involved in the process, such as the optimization engine, query engine, query dispatcher, and classic query engine.

    o The classic query engine handles all non-SQL queries, but the SQL query engine won't handle logical files.

    Characteristics of SQL

    o SQL is easy to learn.

    o SQL is used to access data from relational database management systems.

    o SQL can execute queries against the database.

    o SQL is used to describe the data.

    o SQL is used to define the data in the database and manipulate it when needed.

    o SQL is used to create and drop the database and table.

    o SQL is used to create views, stored procedures, and functions in a database.

    o SQL allows users to set permissions on tables, procedures, and views.

    Integrity Constraints

    o Integrity constraints are a set of rules used to maintain the quality of information.

    o Integrity constraints ensure that data insertion, updating, and other processes are performed in such a way that data integrity is not affected.

    o Thus, integrity constraints are used to guard against accidental damage to the database.

    Types of Integrity Constraint

    1. Domain constraints

    o A domain constraint defines the valid set of values for an attribute.

    o The data type of a domain can be string, character, integer, time, date, currency, etc. The value of the attribute must belong to the corresponding domain.

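    For example, a domain constraint can be declared with a CHECK clause. This is a minimal sketch assuming SQLite through Python's sqlite3 module; the student table and the 0 to 150 bounds on age are hypothetical. Because SQLite columns are loosely typed, the typeof() test is what keeps non-integer values out of the domain in this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK clause defines the domain of age: whole numbers from 0 to 150
# (hypothetical bounds). SQLite columns are loosely typed, so typeof() is
# what rejects non-integer values here.
conn.execute("""
    CREATE TABLE student (
        name TEXT NOT NULL,
        age  INTEGER CHECK (typeof(age) = 'integer' AND age BETWEEN 0 AND 150)
    )
""")

conn.execute("INSERT INTO student VALUES ('Ram', 24)")  # value inside the domain

try:
    conn.execute("INSERT INTO student VALUES ('Shyam', 'A')")  # 'A' is not in the domain
    domain_violation = False
except sqlite3.IntegrityError as err:
    domain_violation = True
    print("Rejected:", err)
```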

    2. Entity integrity constraints

    o The entity integrity constraint states that a primary key value cannot be null.

    o This is because the primary key value is used to identify individual rows in a relation, and if the primary key has a null value, those rows cannot be identified.

    o A table can contain null values in fields other than the primary key.

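    For example, entity integrity can be seen in a small sketch, assuming SQLite via Python's sqlite3 module and a hypothetical employee table. One SQLite-specific caveat: unlike the SQL standard, SQLite accepts NULL in a non-INTEGER PRIMARY KEY column unless NOT NULL is declared, so the sketch spells NOT NULL out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is stated explicitly because SQLite would otherwise allow
# NULL in this (non-INTEGER) primary key column.
conn.execute("""
    CREATE TABLE employee (
        emp_id TEXT PRIMARY KEY NOT NULL,
        name   TEXT,
        salary INTEGER          -- non-key field: NULL is allowed
    )
""")

conn.execute("INSERT INTO employee VALUES ('E101', 'Ram', NULL)")  # NULL outside the key is fine

try:
    conn.execute("INSERT INTO employee VALUES (NULL, 'Shyam', 30000)")  # NULL key
    null_key_rejected = False
except sqlite3.IntegrityError as err:
    null_key_rejected = True
    print("Rejected:", err)
```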

    3. Referential Integrity Constraints

    o A referential integrity constraint is specified between two tables.

    o In referential integrity constraints, if a foreign key in Table 1 refers to the primary key of Table 2, then every value of the foreign key in Table 1 must either be null or be available in Table 2.

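    For example, the rule can be tried out with two hypothetical tables, employee (Table 1, holding the foreign key) and department (Table 2, holding the primary key). The sketch assumes SQLite via Python's sqlite3 module; SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Table 2: the referenced table with the primary key.
conn.execute("CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT)")
# Table 1: the referencing table with the foreign key.
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_no INTEGER REFERENCES department(dept_no)
    )
""")

conn.execute("INSERT INTO department VALUES (11, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Ram', 11)")      # 11 exists in Table 2
conn.execute("INSERT INTO employee VALUES (2, 'Shyam', NULL)")  # a null foreign key is allowed

try:
    conn.execute("INSERT INTO employee VALUES (3, 'Laxman', 99)")  # 99 is not in Table 2
    dangling_rejected = False
except sqlite3.IntegrityError as err:
    dangling_rejected = True
    print("Rejected:", err)
```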

    4. Key constraints

    o A key is an attribute or set of attributes used to uniquely identify an entity within its entity set.

    o An entity set can have multiple keys, but only one of them is chosen as the primary key. A primary key must contain unique values and cannot contain null values in the relational table.

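    For example, a table can have more than one key: below, roll_no is the primary key and phone_no is a second key kept unique. This is a sketch assuming SQLite via Python's sqlite3 module; the table layout and the try_insert helper are hypothetical, and the values come from the STUDENT table in this unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# roll_no is the primary key; phone_no is another key that must stay unique.
conn.execute("""
    CREATE TABLE student (
        roll_no  INTEGER PRIMARY KEY,
        name     TEXT,
        phone_no TEXT UNIQUE
    )
""")
conn.execute("INSERT INTO student VALUES (14795, 'Ram', '7305758992')")

def try_insert(row):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert((14795, "Shyam", "9026288936")))  # False: duplicate primary key
print(try_insert((12839, "Shyam", "7305758992")))  # False: duplicate phone_no
print(try_insert((12839, "Shyam", "9026288936")))  # True: both keys unique
```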

    Relational Model Concepts

    The relational model represents data as a table with columns and rows. Each row is known as a tuple, and each column of the table has a name, called an attribute.

    Domain: The set of atomic values that an attribute can take.

    Attribute: The name of a column in a particular table. Each attribute Ai must have a domain, dom(Ai).

    Relational instance: In a relational database system, a relation instance is represented by a finite set of tuples. Relation instances do not have duplicate tuples.

    Relational schema: A relational schema contains the name of the relation and the names of all its columns or attributes.

    Relational key: A relational key is a set of one or more attributes that can uniquely identify a row in the relation.

    In the given table, NAME, ROLL_NO, PHONE_NO, ADDRESS, and AGE are the

    attributes.

    NAME      ROLL_NO   PHONE_NO      ADDRESS     AGE
    Ram       14795     7305758992    Noida       24
    Shyam     12839     9026288936    Delhi       35
    Laxman    33289     8583287182    Gurugram    20
    Mahesh    27857     7086819134    Ghaziabad   27
    Ganesh    17282     9028 9i3988   Delhi       40

    o The instance of schema STUDENT has 5 tuples.

    o t3 = <Laxman, 33289, 8583287182, Gurugram, 20>

    Properties of Relations

    o The name of a relation is distinct from the names of all other relations.

    o Each relation cell contains exactly one atomic (single) value.

    o Each attribute has a distinct name.

    o The order of attributes has no significance.

    o No two tuples are duplicates of each other.

    o The order of tuples has no significance.
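    Several of these properties follow from treating a relation as a mathematical set of tuples. The sketch below models a trimmed STUDENT relation as a Python set (the phone column is dropped for brevity, an assumption of this sketch): adding a duplicate tuple changes nothing, and listing the tuples in another order yields the same relation. This is a conceptual model only; SQL tables themselves can hold duplicate rows unless a key forbids them.

```python
# A relation modeled as a set of tuples: a set stores no duplicates
# and has no inherent order.
ATTRIBUTES = ("NAME", "ROLL_NO", "ADDRESS", "AGE")  # PHONE_NO omitted for brevity

student = {
    ("Ram", 14795, "Noida", 24),
    ("Shyam", 12839, "Delhi", 35),
    ("Laxman", 33289, "Gurugram", 20),
}

# Adding a duplicate tuple leaves the relation unchanged.
student.add(("Ram", 14795, "Noida", 24))
print(len(student))  # 3

# The same tuples listed in a different order form the same relation.
reordered = {
    ("Laxman", 33289, "Gurugram", 20),
    ("Ram", 14795, "Noida", 24),
    ("Shyam", 12839, "Delhi", 35),
}
print(student == reordered)  # True

# Attribute names must be distinct.
assert len(set(ATTRIBUTES)) == len(ATTRIBUTES)
```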

    UNIT II

    What is a Database Transaction?

    A transaction is a logical unit of processing in a DBMS that entails one or more database access operations. In a nutshell, database transactions represent real-world events of an enterprise.

    All database access operations held between the begin-transaction and end-transaction statements are considered a single logical transaction. During the transaction, the database may be temporarily inconsistent. Only when the transaction commits does the database move from one consistent state to another.

    Topics covered in this section:

    • What is a Database Transaction?

    • Facts about Database Transactions

    • Why do you need concurrency in Transactions?

    • States of Transactions

    • What are ACID Properties?

    • Types of Transactions

    • What is a Schedule?

    Facts about Database Transactions

    • A transaction is a program unit whose execution may or may not change the contents of a

    database.

    • The transaction is executed as a single unit.

    • If the database operations do not update the database but only retrieve data, this type of transaction is called a read-only transaction.

    • A successful transaction can change the database from one consistent state to another.

    • DBMS transactions must be atomic, consistent, isolated, and durable.

    • If the database were in an inconsistent state before a transaction, it would remain in the

    inconsistent state after the transaction.

    Why do you need concurrency in Transactions?

    A database is a shared resource that is used by many users and processes concurrently. Examples include banking systems, railway and airline reservation systems, stock-market monitoring, and supermarket inventory and checkout systems.

    Not managing concurrent access may create issues like:

    • Hardware failure and system crashes

    • Concurrent execution of the same transaction, deadlock, or slow performance

    States of Transactions

    The various states of a database transaction are listed below:

    • Active state: A transaction enters the active state when its execution begins. Read and write operations are performed in this state.

    • Partially committed state: A transaction enters the partially committed state after its final operation has been executed.

    • Committed state: A transaction in the committed state has completed its execution successfully, and all of its changes are recorded in the database permanently.

    • Failed state: A transaction is considered failed when any of the checks fails or when it is aborted while in the active state.

    • Terminated state: A transaction reaches the terminated state when it leaves the system and cannot be restarted.

    State Transition Diagram for a Database Transaction

    The following steps describe how a transaction moves between these states.

    1. Once a transaction starts execution, it becomes active. It can issue READ or WRITE operations.

    2. Once the READ and WRITE operations complete, the transaction enters the partially committed state.

    3. Next, recovery protocols must ensure that a system failure will not prevent the transaction's changes from being recorded permanently. If this check succeeds, the transaction commits and enters the committed state.

    4. If the check fails, the transaction goes to the failed state.

    5. If the transaction is aborted while it is in the active state, it also goes to the failed state. A failed transaction must be rolled back to undo the effect of its write operations on the database.

    6. The terminated state refers to the transaction leaving the system.
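    The numbered steps describe a small state machine. The sketch below encodes the allowed moves as a transition table and rejects anything else; the class and method names are hypothetical, and a real DBMS tracks transaction state internally rather than through an object like this.

```python
from enum import Enum

class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    TERMINATED = "terminated"

# Allowed moves, following steps 1-6.
TRANSITIONS = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},     # steps 2 and 5
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},  # steps 3 and 4
    State.COMMITTED: {State.TERMINATED},                         # step 6
    State.FAILED: {State.TERMINATED},                            # after rollback, step 6
    State.TERMINATED: set(),
}

class Transaction:
    def __init__(self, name):
        self.name = name
        self.state = State.ACTIVE  # step 1: execution starts in the active state
        self.history = [self.state]

    def move_to(self, new_state):
        """Change state, refusing any move the transition table does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.name}: illegal move {self.state.value} -> {new_state.value}")
        self.state = new_state
        self.history.append(new_state)

t = Transaction("T1")
for s in (State.PARTIALLY_COMMITTED, State.COMMITTED, State.TERMINATED):
    t.move_to(s)
print([s.value for s in t.history])
# ['active', 'partially committed', 'committed', 'terminated']
```

    Trying `Transaction("T2").move_to(State.COMMITTED)` raises ValueError, because a transaction cannot commit without first passing through the partially committed state.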

    What are ACID Properties?

    To maintain the integrity of data, a DBMS must ensure the ACID properties. ACID stands for Atomicity, Consistency, Isolation, and Durability.

    • Atomicity: A transaction is a single unit of operation. It is either executed entirely or not executed at all; there cannot be partial execution.

    • Consistency: Once the transaction is executed, the database should move from one consistent state to another.

    • Isolation: A transaction should be executed in isolation from other transactions. During concurrent execution, the intermediate results of a transaction should not be made available to other simultaneously executing transactions. DBMSs typically offer several isolation levels (Level 0, 1, 2, 3).

    • Durability: After a transaction completes successfully, its changes to the database should persist, even in the case of system failures.
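    Atomicity can be observed with a transfer that fails halfway. This sketch assumes SQLite via Python's sqlite3 module, whose connection context manager commits on success and rolls back if an exception escapes. The account table, the balances, and the CHECK rule that a balance may not go negative are hypothetical; the CHECK plays the role of a failing consistency check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO account VALUES ('X', 100), ('Y', 30)")
conn.commit()

def transfer(src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
    except sqlite3.IntegrityError:
        return False
    return True

def balances():
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))

print(transfer("Y", "X", 50))  # False: Y would go negative
print(balances())              # {'X': 100, 'Y': 30}: the credit to X was undone too
```

    The first UPDATE had already credited X when the second one failed; the rollback removed that partial effect, so neither change survives.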

    Example of ACID

    Transaction 1: Begin X=X+50, Y = Y-50 END

    Transaction 2: Begin X=1.1*X, Y=1.1*Y END

    Transaction 1 is transferring $50 from account Y to account X.

    Transaction 2 is crediting each account with a 10% interest payment.

    If both transactions are submitted together, there is no guarantee that the Transaction 1 will

    execute before Transaction 2 or vice versa. Irrespective of the order, the result must be as if the

    transactions take place serially one after the other.
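    The example can be worked through numerically. This sketch assumes hypothetical starting balances of 100 in each account and uses Decimal to avoid binary floating-point rounding. It runs both serial orders and one interleaving in which Transaction 2 executes between the two steps of Transaction 1.

```python
from decimal import Decimal

# Each step updates the shared "database" dict in place.
def t1_credit_x(db):
    db["X"] += 50                 # T1: X = X + 50

def t1_debit_y(db):
    db["Y"] -= 50                 # T1: Y = Y - 50

def t2_interest_x(db):
    db["X"] *= Decimal("1.1")     # T2: X = 1.1 * X

def t2_interest_y(db):
    db["Y"] *= Decimal("1.1")     # T2: Y = 1.1 * Y

def run(schedule, x=Decimal(100), y=Decimal(100)):
    """Execute the steps of a schedule in order; return final (X, Y)."""
    db = {"X": x, "Y": y}
    for step in schedule:
        step(db)
    return db["X"], db["Y"]

t1_then_t2 = run([t1_credit_x, t1_debit_y, t2_interest_x, t2_interest_y])
t2_then_t1 = run([t2_interest_x, t2_interest_y, t1_credit_x, t1_debit_y])
interleaved = run([t1_credit_x, t2_interest_x, t2_interest_y, t1_debit_y])

for label, (x, y) in [("T1;T2", t1_then_t2), ("T2;T1", t2_then_t1), ("interleaved", interleaved)]:
    print(label, x, y, "total:", x + y)
```

    The two serial orders give different balances (165/55 and 160/60), but both leave a combined total of 220, and either result is acceptable. The interleaved run leaves 165/60, a total of 225: the $50 being moved earned interest in both accounts. That is the kind of outcome the serializability requirement rules out.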

    Types of Transactions

    Based on application areas

    • Non-distributed vs. distributed

    • Compensating transactions

    • Transaction timing

    • On-line vs. batch

    Based on Actions

    • Two-step

    • Restricted

    • Action model

    Based on Structure

    • Flat or simple transactions: A sequence of primitive operations executed between begin and end operations.

    • Nested transactions: A transaction that contains other transactions.

    • Workflow

    What is a Schedule?

    A schedule combines the operations of multiple concurrent transactions into a single sequence and executes them one by one. It must preserve the order in which the instructions appear within each transaction. If two transactions are executed at the same time, the result of one transaction may affect the output of the other.

    Example

    Initial Product Quantity is 10

    Transaction 1: Update Product Quantity to 50

    Transaction 2: Read Product Quantity

    If Transaction 2 is executed before Transaction 1, outdated information about the product

    quantity will be read. Hence, schedules are required.

    Parallel execution in a database is inevitable, but it is permitted only when there is an equivalence relation among the simultaneously executing transactions. This equivalence is of three types.

    RESULT EQUIVALENCE:

    Two schedules are called result equivalent if they produce the same result after execution. Such schedules may produce the same result for some values and different results for another set of values. For example, one transaction updates the product quantity, while the other updates customer details.
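    The weakness of result equivalence, that it can hold for some values and fail for others, shows up with two hypothetical transactions on a single item X: T1 doubles X, and T2 squares it. The sketch compares the two serial schedules for two different initial values.

```python
# Two hypothetical transactions on a single item X.
def t1(x):
    return x * 2      # T1: X = 2 * X

def t2(x):
    return x * x      # T2: X = X * X

def schedule_a(x):    # T1 followed by T2
    return t2(t1(x))

def schedule_b(x):    # T2 followed by T1
    return t1(t2(x))

for initial in (0, 3):
    a, b = schedule_a(initial), schedule_b(initial)
    print(f"X={initial}: A -> {a}, B -> {b}, same result: {a == b}")
```

    With X = 0 both schedules end at 0, so they look result equivalent; with X = 3 they end at 36 and 18. Agreement on one input therefore says little about the schedules in general.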

    What is Concurrency Control?

    Concurrency control is the procedure in a DBMS for managing simultaneous operations without them conflicting with each other. Concurrent access is quite easy if all users are just reading data, because there is no way they can interfere with one another. However, any practical database has a mix of read and write operations, so concurrency is a challenge.

    Concurrency control is used to address such conflicts, which mostly occur in multi-user systems. It helps ensure that database transactions are performed concurrently without violating the data integrity of the respective databases.

    Therefore, concurrency control is an essential element for the proper functioning of a system in which two or more database transactions that require access to the same data are executed simultaneously.

    Lock-based Protocols

    A lock is a data variable associated with a data item. The lock indicates which operations can be performed on the data item. Locks help synchronize access to database items by concurrent transactions.

    All lock requests are made to the concurrency-control manager. A transaction proceeds only once its lock request is granted.

    Binary Locks: A binary lock on a data item can be in either a locked or an unlocked state.

    Shared/exclusive: This type of locking mechanism separates locks based on their use. A lock acquired on a data item to perform a write operation is called an exclusive lock; a lock acquired only for reading is a shared lock.

    1. Shared Lock (S):

    A shared lock is also called a read-only lock. With a shared lock, the data item can be shared between transactions, because a transaction holding only a shared lock is not permitted to update the data item.

    For example, consider two transactions reading the account balance of a person. The database lets them read by placing shared locks. However, if another transaction wants to update that account's balance, the shared locks prevent it until the reading is over.

    2. Exclusive Lock (X):

    With an exclusive lock, a data item can be read as well as written. The lock is exclusive and cannot be held concurrently by other transactions on the same data item. An X-lock is requested using the lock-x instruction. Transactions may unlock the data item after finishing the 'write' operation.

    For example, when a transaction needs to update the account balance of a person, the DBMS allows this by placing an X-lock on the item. If a second transaction then wants to read or write that item, the exclusive lock prevents the operation.
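    The shared/exclusive rules amount to a compatibility table that the concurrency-control manager consults on every request. The sketch below is a toy manager under simplifying assumptions: the LockManager class is hypothetical, a refused request simply returns False instead of waiting in a queue, and a new request replaces the transaction's current mode on that item without any formal upgrade rules.

```python
from collections import defaultdict

# A requested mode is granted only if it is compatible with every lock
# currently held on the item by other transactions.
COMPATIBLE = {
    ("S", "S"): True,   # many readers may share an item
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

class LockManager:
    """Toy concurrency-control manager: grants or refuses lock requests."""

    def __init__(self):
        self.held = defaultdict(dict)  # item -> {transaction: mode}

    def request(self, txn, item, mode):
        others = [m for t, m in self.held[item].items() if t != txn]
        if all(COMPATIBLE[(held, mode)] for held in others):
            self.held[item][txn] = mode  # replaces any mode txn already held
            return True
        return False  # caller must wait and retry

    def release(self, txn, item):
        self.held[item].pop(txn, None)

lm = LockManager()
print(lm.request("T1", "balance", "S"))  # True
print(lm.request("T2", "balance", "S"))  # True: shared locks coexist
print(lm.request("T3", "balance", "X"))  # False: writer blocked by readers
lm.release("T1", "balance")
lm.release("T2", "balance")
print(lm.request("T3", "balance", "X"))  # True
print(lm.request("T1", "balance", "S"))  # False: reader blocked by writer
```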

    3. Simplistic Lock Protocol

    In this type of lock-based protocol, a transaction obtains a lock on every object before beginning to operate on it. Transactions may unlock the data item after finishing the 'write' operation.

    4. Pre-claiming Locking

    The pre-claiming lock protocol evaluates a transaction's operations and creates a list of the data items needed to begin execution. The transaction executes only when all of those locks have been granted, and it releases all of its locks when all of its operations are over.
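    Pre-claiming can be sketched as "all locks or none": the transaction declares its data items, the scheduler grants them together or not at all, and everything is released when the work is done. The PreclaimingScheduler class and run helper are hypothetical, and every lock is treated as exclusive to keep the sketch short.

```python
import threading

class PreclaimingScheduler:
    """Toy pre-claiming lock table: a transaction gets all of its locks or none.
    Every lock is treated as exclusive to keep the sketch short."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the lock table itself
        self._owner = {}                 # data item -> transaction holding it

    def try_claim_all(self, txn, items):
        with self._guard:
            if any(self._owner.get(i) not in (None, txn) for i in items):
                return False             # some item is busy: grant nothing
            for i in items:
                self._owner[i] = txn     # every item free: grant all together
            return True

    def release_all(self, txn):
        with self._guard:
            for item in [i for i, t in self._owner.items() if t == txn]:
                del self._owner[item]

def run(sched, txn, items, body):
    """Execute body() only after pre-claiming every item in `items`."""
    if not sched.try_claim_all(txn, items):
        return False                     # caller would wait and retry later
    try:
        body()
    finally:
        sched.release_all(txn)           # all locks released when operations are over
    return True

sched = PreclaimingScheduler()
sched.try_claim_all("T1", ["A", "B"])
print(sched.try_claim_all("T2", ["B", "C"]))  # False: B is held, so T2 gets nothing
print(run(sched, "T3", ["C"], lambda: None))  # True: C was never granted to T2
```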

    Log-Based Recovery

    o The log is a sequence of records. The log of each transaction is maintained in stable storage so that, if any failure occurs, the database can be recovered from it.

    o If any operation is performed on the database, it is recorded in the log.

    o The log records must be stored before the actual changes are applied to the database.

    Let's assume there is a transaction that modifies the City of a student. The following log records are written for this transaction.

    o When the transaction is initiated, it writes a 'start' log record:

        <Tn, Start>

    o When the transaction modifies the City from 'Noida' to 'Bangalore', another log record is written to the file:

        <Tn, City, 'Noida', 'Bangalore'>

    o When the transaction is finished, it writes another log record to indicate the end of the transaction:

        <Tn, Commit>
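    The City example can be traced in a few lines. In this sketch a Python list stands in for the stable-storage log and a dict stands in for the database; both, and the write_log and update helpers, are hypothetical. The key point is the ordering inside update(): the log record is appended before the data item is changed.

```python
# Hypothetical stand-ins: a list for the stable-storage log, a dict for the database.
log = []
database = {"City": "Noida"}

def write_log(record):
    log.append(record)  # a real DBMS would force this record to stable storage

def update(txn, item, new_value):
    old_value = database[item]
    write_log((txn, item, old_value, new_value))  # 1. log the change first ...
    database[item] = new_value                    # 2. ... then modify the database

write_log(("Tn", "Start"))
update("Tn", "City", "Bangalore")
write_log(("Tn", "Commit"))

for record in log:
    print(record)
# ('Tn', 'Start')
# ('Tn', 'City', 'Noida', 'Bangalore')
# ('Tn', 'Commit')
```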

    There are two approaches to modify the database:

    1. Deferred database modification:

    o In the deferred modification technique, the transaction does not modify the database until it has committed.

    o In this method, all the log records are created and stored in stable storage, and the database is updated when the transaction commits.

    2. Immediate database modification:

    o In the immediate modification technique, the database is modified while the transaction is still active.

    o In this technique, the database is modified immediately after every operation: each log record is followed by the actual database modification.
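    The practical difference between the two approaches appears when a crash happens before commit. The sketch below simulates that case with a hypothetical crash_before_commit flag. It assumes one common variant in which the deferred log keeps only new values (nothing is ever written early, so no undo information is needed), while the immediate log keeps old and new values.

```python
def deferred(ops, db, crash_before_commit=False):
    """Deferred modification: log the writes, apply them only after commit."""
    log = [("T", "Start")]
    for item, new in ops:
        log.append(("T", item, new))          # only the new value is logged
    if crash_before_commit:
        return log                            # database never touched: nothing to undo
    log.append(("T", "Commit"))
    for _, item, new in log[1:-1]:            # commit reached: apply the logged writes
        db[item] = new
    return log

def immediate(ops, db, crash_before_commit=False):
    """Immediate modification: each log record is followed by the actual write."""
    log = [("T", "Start")]
    for item, new in ops:
        log.append(("T", item, db[item], new))  # old and new values are logged
        db[item] = new                          # written while T is still active
    if crash_before_commit:
        return log                              # database already changed: needs undo
    log.append(("T", "Commit"))
    return log

db_d, db_i = {"City": "Noida"}, {"City": "Noida"}
deferred([("City", "Bangalore")], db_d, crash_before_commit=True)
immediate([("City", "Bangalore")], db_i, crash_before_commit=True)
print(db_d["City"], db_i["City"])  # Noida Bangalore
```

    After the simulated crash, the deferred database still holds 'Noida', while the immediate one already holds the uncommitted 'Bangalore', which recovery must undo using the logged old value.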

    Recovery using Log records

    When the system crashes, it consults the log to find which transactions need to be undone and which need to be redone.

    1. If the log contains both the record <Ti, Start> and the record <Ti, Commit>, then transaction Ti needs to be redone.

    2. If the log contains the record <Ti, Start> but contains neither <Ti, Commit> nor <Ti, Abort>, then transaction Ti needs to be undone.
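    The two rules translate into a scan of the log followed by an undo pass and a redo pass. This is a sketch under stated assumptions: records are Python tuples shaped like (Ti, "Start"), (Ti, item, old, new), and (Ti, "Commit"), the database is a dict, and the log and on-disk values in the usage lines are hypothetical. Undo walks the log backwards so that the oldest old value is restored last; redo walks forwards.

```python
def recover(log, db):
    """Redo committed transactions; undo those with a Start but no Commit/Abort."""
    started, finished = set(), set()
    for rec in log:
        if rec[1] == "Start":
            started.add(rec[0])
        elif rec[1] in ("Commit", "Abort"):
            finished.add(rec[0])
    committed = {rec[0] for rec in log if rec[1] == "Commit"}
    undo = started - finished

    # Undo pass: backwards, restoring old values of incomplete transactions.
    for txn, *rest in reversed(log):
        if len(rest) == 3 and txn in undo:
            item, old, _new = rest
            db[item] = old
    # Redo pass: forwards, reapplying new values of committed transactions.
    for txn, *rest in log:
        if len(rest) == 3 and txn in committed:
            item, _old, new = rest
            db[item] = new
    return committed, undo

db = {"A": 150, "B": 60}           # values found on disk after the crash
log = [
    ("T1", "Start"), ("T1", "A", 100, 150), ("T1", "Commit"),
    ("T2", "Start"), ("T2", "B", 50, 60),   # crash before T2 committed
]
print(recover(log, db))  # ({'T1'}, {'T2'})
print(db)                # {'A': 150, 'B': 50}
```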

    Crash Recovery

    A DBMS is a highly complex system with hundreds of transactions being executed every second. The durability and robustness of a DBMS depend on its complex architecture and its underlying hardware and system software. If it fails or crashes amid transactions, the system is expected to follow some algorithm or technique to recover the lost data.

    Failure Classification

    To see where a problem has occurred, failures are generalized into the following categories −

    Transaction failure

    A transaction has to abort when it fails to execute or when it reaches a point from which it cannot go any further. This is called transaction failure, in which only a few transactions or processes are affected.

    Reasons for a transaction failure could be −

    • Logical errors − Where a transaction cannot complete because it has some code error or

    any internal error condition.

    • System errors − Where the database system itself terminates an active transaction

    because the DBMS is not able to execute it, or it has to stop because of some system

    condition. For example, in case of deadlock or resource unavailability, the system aborts

    an active transaction.

    System Crash

    Some problems external to the system may cause it to stop abruptly and crash. For example, an interruption in the power supply may cause the underlying hardware or software to fail.

    Other examples include operating system errors.

    Disk Failure

    In the early days of technology, hard-disk drives and other storage drives failed frequently.

    Disk failures include the formation of bad sectors, the disk becoming unreachable, a disk head crash, or any other failure that destroys all or part of the disk storage.

    Storage Structure

    In brief, the storage structure can be divided into two categories −

    • Volatile storage − As the name suggests, volatile storage cannot survive system crashes. Volatile storage devices are placed very close to the CPU; cache memory, for example, is normally built into the processor chip itself. Main memory and cache memory are examples of volatile storage. They are fast but can store only a small amount of information.

    • Non-volatile storage − These memories are made to survive system crashes. They have a huge data storage capacity but are slower to access. Examples include hard disks, magnetic tapes, flash memory, and non-volatile (battery-backed) RAM.

    Recovery and Atomicity

    When a system crashes, it may have several transactions being executed and various files opened for them to modify the data items. Transactions are made of various operations, which are atomic in nature. But according to the ACID properties of a DBMS, the atomicity of each transaction as a whole must be maintained; that is, either all of its operations are executed or none are.

    When a DBMS recovers from a crash, it should maintain the following −

    • It should check the states of all the transactions, which were being executed.

    • A transaction may be in the middle of some operation; the DBMS must ensure the

    atomicity of the transaction in this case.

    • It should check whether the transaction can be completed now or it needs to be rolled

    back.

    • No transactions would be allowed to leave the DBMS in an inconsistent state.

    There are two types of techniques, which can help a DBMS in recovering as well as maintaining

    the atomicity of a transaction −

    • Maintaining the logs of each transaction, and writing them onto some stable storage

    before actually modifying the database.

    • Maintaining shadow paging, where the changes are done on a volatile memory, and later,

    the actual database is updated.

    Log-based Recovery

    A log is a sequence of records of the actions performed by a transaction. It is important that the log records are written prior to the actual modification and stored on a stable storage medium, which is failsafe.

    Log-based recovery works as follows −

    • The log file is kept on a stable storage medium.

    • When a transaction enters the system and starts execution, it writes a log record about it:

        <Tn, Start>

    • When the transaction modifies an item X, it writes a log record as follows −

        <Tn, X, V1, V2>

    It reads: Tn has changed the value of X from V1 to V2.

    • When the transaction finishes, it logs −

        <Tn, Commit>

    The database can be modified using two approaches −

    • Deferred database modification − All logs are written on to the stable storage and the

    database is updated when a transaction commits.

    • Immediate database modification − Each log follows an actual database modification.

    That is, the database is modified immediately after every operation.

    Recovery with Concurrent Transactions

    When more than one transaction is being executed in parallel, the logs are interleaved. At the time of recovery, it would become hard for the recovery system to backtrack through all the logs and then start recovering. To ease this situation, most modern DBMSs use the concept of 'checkpoints'.

    Checkpoint

    Keeping and maintaining logs in real time in a real environment may fill all the memory space available in the system. As time passes, the log file may grow too big to be handled at all. A checkpoint is a mechanism where all the previous logs are removed from the system and stored permanently on a storage disk. A checkpoint declares a point before which the DBMS was in a consistent state and all the transactions were committed.

    Structures Used for Database Recovery

    Several structures of an Oracle database safeguard data against possible failures. The following sections briefly introduce each of these structures and its role in database recovery.

    Database Backups

    A database backup consists of operating system backups of the physical files that constitute an Oracle database. To begin database recovery from a media failure, Oracle uses file backups to restore damaged datafiles or control files.

    Oracle offers several options in performing database backups; see Chapter 23, "Database

    Backup", for more information.

    The Redo Log

    The redo log, present for every Oracle database, records all changes made in an Oracle database. The redo log of a database consists of at least two redo log files that are separate from the datafiles (which actually store a database's data). As part of database recovery from an instance or media failure, Oracle applies the appropriate changes in the database's redo log to the datafiles, which updates database data to the instant that the failure occurred.


    A database's redo log can consist of two parts: the online redo log and the archived redo log, discussed in the following sections.

    The Online Redo Log

    Every Oracle database has an associated online redo log. The online redo log works with the Oracle background process LGWR to immediately record all changes made through the associated instance. The online redo log consists of two or more pre-allocated files that are reused in a circular fashion to record ongoing database changes; see "The Online Redo Log" for more information.

    The Archived (Offline) Redo Log

    Optionally, you can configure an Oracle database to archive files of the online redo log once they fill. The online redo log files that are archived are uniquely identified and make up the archived redo log. By archiving filled online redo log files, older redo log information is preserved for more extensive database recovery operations, while the pre-allocated online redo log files continue to be reused to store the most current database changes; see "The Archived Redo Log" page 22-16 for more information.

    Rollback Segments

    Rollback segments are used for a number of functions in the operation of an Oracle database. In

    general, the rollback segments of a database store the old values of data changed by ongoing

    transactions (that is, uncommitted transactions). Among other things, the information in a

    rollback segment is used during database recovery to "undo" any "uncommitted" changes applied

    from the redo log to the datafiles. Therefore, if database recovery is necessary, the data is in a

    consistent state after the rollback segments are used to remove all uncommitted data from the

    datafiles; see "Rollback Segments" for more information.

    Control Files

    In general, the control file(s) of a database store the status of the physical structure of the

    database. Certain status information in the control file (for example, the current online redo log

    file, the names of the datafiles, and so on) guides Oracle during instance or media recovery;

    see "Control Files" for more information.

    Checkpoint Explanation

    o A checkpoint is a mechanism by which all earlier log records are removed from the system once the changes they describe have been permanently stored on disk.

    o A checkpoint works like a bookmark. Checkpoints are marked during the execution of transactions, and log records are created as each transaction executes its steps.

    o When a checkpoint is reached, the transaction's changes are written to the database, and the log up to that point is removed. The log is then updated with the new transaction steps until the next checkpoint, and so on.


    o A checkpoint declares a point before which the DBMS was in a consistent state and all transactions were committed.

    Recovery using Checkpoint

    In the following manner, a recovery system recovers the database from this failure:

    o The recovery system reads the log from the end back to the start. In this example, it reads the log from T4 back to T1.

    o The recovery system maintains two lists: a redo-list and an undo-list.

    o A transaction is put into the redo-list if the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>. All transactions in the redo-list are redone from their log records.

    o For example: in the log, transactions T2 and T3 have both <Tn, Start> and <Tn, Commit>. Transaction T1 has only <Tn, Commit> in the log, because it started before the checkpoint and committed after it. Hence the recovery system puts T1, T2 and T3 into the redo-list.

    o A transaction is put into the undo-list if the recovery system sees a log with <Tn, Start> but finds no commit or abort log. All transactions in the undo-list are undone, and their logs are removed.

    o For example: transaction T4 has only <Tn, Start>. So T4 is put into the undo-list, since it had not completed and failed midway.
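    The redo-list/undo-list classification above can be sketched in C as a single backward pass over the log. This is an illustrative model under stated assumptions, not a real DBMS's recovery manager: log records are reduced to <Tn, Start>, <Tn, Commit>, <Tn, Abort> and checkpoint entries; `MAX_TXN`, `LogRecord`, `Action` and `classify` are hypothetical names; and, following the description of a checkpoint as a consistent point, commits that appear before the most recent checkpoint are assumed to be already on disk and need no redo.

```c
#include <stdbool.h>

#define MAX_TXN 16   /* hypothetical: transaction ids are 0 .. MAX_TXN-1 */

typedef enum { LOG_START, LOG_COMMIT, LOG_ABORT, LOG_CHECKPOINT } LogType;

typedef struct {
    LogType type;
    int txn;          /* n in <Tn, ...>; ignored for checkpoint records */
} LogRecord;

typedef enum { ACT_NONE, ACT_REDO, ACT_UNDO } Action;

/* Read the log from the end back to the start and put each transaction
   into the redo-list (ACT_REDO) or the undo-list (ACT_UNDO). */
void classify(const LogRecord *records, int n, Action actions[MAX_TXN]) {
    bool finished[MAX_TXN] = { false };  /* commit or abort already seen */
    bool after_checkpoint = true;        /* not yet past the latest checkpoint */

    for (int t = 0; t < MAX_TXN; t++)
        actions[t] = ACT_NONE;

    for (int i = n - 1; i >= 0; i--) {
        const LogRecord *r = &records[i];
        switch (r->type) {
        case LOG_CHECKPOINT:
            after_checkpoint = false;    /* earlier commits assumed already on disk */
            break;
        case LOG_COMMIT:
            finished[r->txn] = true;
            if (after_checkpoint)
                actions[r->txn] = ACT_REDO;   /* <Tn, Commit> seen: redo */
            break;
        case LOG_ABORT:
            finished[r->txn] = true;          /* aborted: neither list */
            break;
        case LOG_START:
            if (!finished[r->txn])
                actions[r->txn] = ACT_UNDO;   /* <Tn, Start> with no commit/abort: undo */
            break;
        }
    }
}
```

    Run on the scenario described above (T1 starting before the checkpoint and committing after it, T2 and T3 starting and committing after it, and T4 only starting), this places T1, T2 and T3 in the redo-list and T4 in the undo-list.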

    Media Recovery

    If you restore the archived redo log files and data files, then you must perform media recovery

    before you can open the database. Any database transactions in the archived redo log files not

    reflected in the data files are applied to the data files, bringing them to a transaction-consistent

    state before the database is opened.

    Media recovery requires a control file, data files (typically restored from backup), and online and

    archived redo log files containing changes since the time the data files were backed up. Media

    recovery is most often used to recover from media failure, such as the loss of a file or disk, or a

    user error, such as the deletion of the contents of a table.

    Media recovery can be a complete recovery or a point-in-time recovery. Complete recovery can apply to individual datafiles, tablespaces, or the entire database. Point-in-time recovery applies to the whole database (and sometimes also to individual tablespaces, with automated help from Oracle Recovery Manager (RMAN)).

    In a complete recovery, you restore backup data files and apply all changes from the archived

    and online redo log files to the data files. The database is returned to its state at the time of

    failure and can be opened with no loss of data.

    In a point-in-time recovery, you return a database to its contents at a user-selected time in the

    past. You restore a backup of data files created before the target time and a complete set of

    archived redo log files from backup creation through the target time. Recovery applies changes

    between the backup time and the target time to the data files. All changes after the target time are

    discarded.
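    The difference between complete and point-in-time recovery comes down to where the roll-forward stops. The following C sketch models a restored datafile as an array of blocks and redo as time-stamped block changes; it is not RMAN's interface, and `NUM_BLOCKS`, `RedoRecord`, `recover` and the assumption that redo records are sorted by time are all hypothetical simplifications.

```c
#include <stddef.h>

#define NUM_BLOCKS 8   /* hypothetical number of blocks in the restored datafile */

typedef struct {
    long change_time;  /* when the change was made */
    int block;         /* which block it changed (0 .. NUM_BLOCKS-1) */
    int new_value;     /* value the change wrote */
} RedoRecord;

/* Roll a restored datafile forward. Redo made at or before backup_time is
   already in the backup and is skipped; redo made after target_time is
   discarded. Records are assumed to be sorted by change_time.
   Returns the number of redo records applied. */
int recover(int datafile[NUM_BLOCKS], long backup_time,
            const RedoRecord *redo, size_t n, long target_time) {
    int applied = 0;
    for (size_t i = 0; i < n; i++) {
        if (redo[i].change_time <= backup_time)
            continue;                   /* already contained in the backup */
        if (redo[i].change_time > target_time)
            break;                      /* past the target: discard the rest */
        datafile[redo[i].block] = redo[i].new_value;
        applied++;
    }
    return applied;
}
```

    Passing a `target_time` later than every redo record gives complete recovery; an earlier `target_time` gives point-in-time recovery, and every change after it is discarded.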

    RMAN enables you to perform both a complete and a point-in-time recovery of your database.

    However, this documentation focuses on complete recovery.

    UNIT 3

    OBJECT BASED DATABASE AND XML

    What is a structured data type?

    Structure type

    A structured data type is a compound data type that falls under the user-defined category and is used for grouping simple data types or other compound data types. It contains a sequence of member variable names along with their types/attributes, enclosed within curly braces.

    struct <struct_name> {
        <type> <member>;
    };

    Need for Struct data types

    There are situations where we need to group variables of different types together. For example, suppose we want to store the name, roll number and age of a student:

    unsigned int student_roll;
    char student_name[MAX_STRING];
    unsigned int student_age;

    These three variables are logically related, but they are still scattered: we access three separate variables to store the attribute values of a single student. We would like to group them into one logical entity, so that the three variables for a student are held inside one variable. This kind of grouping is called a structure. Note that an array can group only elements of the same type, whereas a structure can group members of different types.

    Example of Struct data types

    Let's define these again with a structure type:

    struct student_t {
        unsigned int roll;
        char name[MAX_STRING];
        unsigned int age;
    };

    struct student_t student1;
    student1.roll = <roll>;
    strcpy(student1.name, <name>);
    student1.age = <age>;

    Structure size and memory layout

    A structure is a user-defined type that groups variables of different types: built-in types, other user-defined types, or a mix of both. Each individual element of a structure is called a member. Members are placed in memory sequentially, in the order in which they are declared. The size of a structure is therefore at least the sum of the sizes of its members; the compiler may insert padding between members (and after the last member) for alignment, so the actual size can be larger.
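    The effect of padding can be observed with `sizeof` and `offsetof` from <stddef.h>. In this sketch the type `struct record` and the helper functions are hypothetical; the exact amount of padding depends on the compiler and platform, so only the relationships the C standard guarantees are relied on.

```c
#include <stddef.h>

/* Hypothetical record mixing member sizes so that padding may appear. */
struct record {
    char flag;   /* 1 byte, always at offset 0 */
    int id;      /* usually needs stricter alignment than char */
    char code;   /* 1 byte */
};

/* Sum of the member sizes, ignoring any padding. */
size_t members_total(void) {
    return sizeof(char) + sizeof(int) + sizeof(char);
}

/* Bytes of padding the compiler added between members and/or at the end. */
size_t padding_bytes(void) {
    return sizeof(struct record) - members_total();
}
```

    On many common platforms, where `int` is 4 bytes with 4-byte alignment, `sizeof(struct record)` is typically 12 rather than 6, so `padding_bytes()` returns 6; placing the two `char` members next to each other usually reduces the padding.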

    Struct data types syntax

    struct <struct_name> {
        <type> <member1>;
        <type> <member2>;
        ...
        <type> <memberN>;
    };

    Structure demo example

    /* Structure type demo example program */
    #include <stdio.h>

    /* Structure type student */
    struct student {
        char name[100];
        char dept[100];
        int rollno;
        float marks;
    };

    /* Structure type main routine */
    int main(int argc, char *argv[])
    {
        /* declare struct variable */
        struct student s1;

        printf("\nEnter the name, dept, roll number and marks of student:\n");
        /* %99s keeps each string within its 100-byte array (99 chars + '\0') */
        scanf("%99s %99s %d %f", s1.name, s1.dept, &s1.rollno, &s1.marks);

        printf("\nThe name, dept, roll number and marks of the student are:");
        printf("\n%s %s %d %.2f\n", s1.name, s1.dept, s1.rollno, s1.marks);
        return 0;
    }

    Program output

    Enter the name, dept, roll number and marks of student:

    Student1 ECE 1 96.5

    The name, dept, roll number and marks of the student are:

    Student1 ECE 1 96.50

    OPERATIONS ON STRUCTURED DATA

    Structured data is data that conforms to a data model, has a well-defined structure, follows a consistent order, and can be easily accessed and used by a person or a computer program. Structured data is usually stored in well-defined schemas such as databases. It is generally tabular, with columns and rows that clearly define its attributes.

    SQL (Structured Query language) is often used to manage structured data stored in databases.

    Characteristics of Structured Data:

    • Data conforms to a data model and has an easily identifiable structure

    • Data is stored in the form of rows and columns

    Example: Database

    • Data is well organised, so the definition, format and meaning of the data are explicitly known

    • Data resides in fixed fields within a record or file

    • Similar entities are grouped together to form relations or classes

    • Entities in the same group have the same attributes

    • Easy to access and query, so data can be easily used by other programs

    • Data elements are addressable, so they are efficient to analyse and process

    Sources of Structured Data:

    • SQL Databases

    • Spreadsheets such as Excel

    • OLTP Systems

    • Online forms

    • Sensors such as GPS or RFID tags

    • Network and Web server logs

    • Medical devices.

    Advantages of Structured Data:

    • Structured data has a well-defined structure that makes the data easy to store and access

    • Data can be indexed based on text strings as well as attributes, which makes search operations hassle-free

    • Data mining is easy, i.e., knowledge can be easily extracted from the data

    • Operations such as updating and deleting are easy because of the well-structured form of the data

    • Business intelligence operations such as data warehousing can be easily undertaken

    • Easily scalable as the volume of data grows

    • Ensuring the security of the data is easy
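    Because every record has the same fixed fields, the usual operations on structured data (insert, query, update and delete) reduce to simple loops over the records. The C sketch below keeps a hypothetical student table in an array of structs; `MAX_ROWS`, `Row`, `Table` and the function names are illustrative, and the SQL in the comments only indicates the analogous statement.

```c
#include <string.h>

#define MAX_ROWS 10   /* hypothetical table capacity */

/* One row of a hypothetical student table: every row has the same fixed fields. */
typedef struct {
    int roll;
    char dept[8];
    float marks;
} Row;

typedef struct {
    Row rows[MAX_ROWS];
    int count;
} Table;

/* INSERT: append a row; returns 0 on success, -1 if the table is full. */
int insert_row(Table *t, int roll, const char *dept, float marks) {
    if (t->count == MAX_ROWS) return -1;
    Row *r = &t->rows[t->count++];
    r->roll = roll;
    strncpy(r->dept, dept, sizeof r->dept - 1);
    r->dept[sizeof r->dept - 1] = '\0';   /* always NUL-terminate */
    r->marks = marks;
    return 0;
}

/* Like SELECT COUNT(*) ... WHERE dept = ? */
int count_in_dept(const Table *t, const char *dept) {
    int n = 0;
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->rows[i].dept, dept) == 0) n++;
    return n;
}

/* Like UPDATE ... SET marks = ? WHERE roll = ?; returns rows changed. */
int update_marks(Table *t, int roll, float marks) {
    int changed = 0;
    for (int i = 0; i < t->count; i++)
        if (t->rows[i].roll == roll) { t->rows[i].marks = marks; changed++; }
    return changed;
}

/* Like DELETE ... WHERE roll = ?; compacts the array and returns rows removed. */
int delete_roll(Table *t, int roll) {
    int kept = 0;
    for (int i = 0; i < t->count; i++)
        if (t->rows[i].roll != roll) t->rows[kept++] = t->rows[i];
    int removed = t->count - kept;
    t->count = kept;
    return removed;
}
```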

    ENCAPSULATION AND ADTs

    An object-oriented database must provide support for all data types, not just built-in data types such as character, integer and float. To understand abstract data types, let's take two steps back by removing first the word "abstract" and then the word "data" from "abstract data type". We are left with a type, which can be defined as a collection of values. A simple example is the Integer type, which consists of the values 0, 1, 2, 3, etc. If we add the word "data" back in, a data type is a type together with the set of operations that manipulate it. Extending our integer example, the integer data type consists of the integer values together with operations such as addition, subtraction and multiplication, and an integer variable is a member of this data type.

    If we now add the word "abstract" back in, we can define an abstract data type (ADT) as a data type (a type and the set of operations that manipulate it) whose operations are defined only by their inputs and outputs. The ADT does not specify how the data type will be implemented; all of the ADT's implementation details are hidden from its user. This process of hiding the details is called encapsulation. If we extend the integer example to an abstract data type, the operations might be: delete an integer, add an integer, print an integer, and check whether a certain integer exists. Notice that we do not care how each operation is carried out, only how to invoke it.
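    The integer ADT described above (add, delete, print, and check whether an integer exists) can be sketched in C, the language of this document's earlier examples. Users see only the declarations in the interface part; the struct layout and the algorithms are hidden behind an opaque type. All names (`IntSet`, `intset_add`, and so on) are hypothetical, and in a real program the two parts would live in a header and a separate .c file so that the compiler enforces the hiding.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* ---- Interface: all the user of the ADT sees ---- */
typedef struct IntSet IntSet;                      /* opaque: no fields visible */
IntSet *intset_create(void);
void intset_destroy(IntSet *s);
bool intset_add(IntSet *s, int value);             /* add an integer */
bool intset_delete(IntSet *s, int value);          /* delete an integer */
bool intset_contains(const IntSet *s, int value);  /* check if it exists */
void intset_print(const IntSet *s);                /* print the integers */

/* ---- Implementation: hidden details (normally a separate .c file) ---- */
struct IntSet {
    int *items;
    size_t size, capacity;
};

IntSet *intset_create(void) {
    IntSet *s = malloc(sizeof *s);
    if (s) { s->items = NULL; s->size = 0; s->capacity = 0; }
    return s;
}

void intset_destroy(IntSet *s) {
    if (s) { free(s->items); free(s); }
}

bool intset_contains(const IntSet *s, int value) {
    for (size_t i = 0; i < s->size; i++)
        if (s->items[i] == value) return true;
    return false;
}

/* Returns false if the value is already present or memory runs out. */
bool intset_add(IntSet *s, int value) {
    if (intset_contains(s, value)) return false;
    if (s->size == s->capacity) {
        size_t cap = s->capacity ? s->capacity * 2 : 4;
        int *grown = realloc(s->items, cap * sizeof *grown);
        if (!grown) return false;
        s->items = grown;
        s->capacity = cap;
    }
    s->items[s->size++] = value;
    return true;
}

/* Returns false if the value was not present. */
bool intset_delete(IntSet *s, int value) {
    for (size_t i = 0; i < s->size; i++)
        if (s->items[i] == value) {
            s->items[i] = s->items[--s->size];  /* move last element into the gap */
            return true;
        }
    return false;
}

void intset_print(const IntSet *s) {
    for (size_t i = 0; i < s->size; i++)
        printf("%d ", s->items[i]);
    printf("\n");
}
```

    Users of the interface depend only on what each operation does; the implementation could switch from an unsorted array to a sorted array or a hash table without any change to calling code.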

    Let's start by looking at traditional programming languages and the data types that they use. Traditional languages are based on text and numerical data types, and you are limited to the kinds of data types that the programming language supports. Variables used by the program have to be defined using one of the supported data types. Object technology (OT) does away with the restriction of using only these built-in data types and allows you to create different data types. Once these new data types are defined, they are treated the same way as built-in data types. The ability to create new data types when needed and then use them is called data abstraction, and the new data types are called abstract data types (ADTs).

    An abstract data type is more than a set of values. When it is used to create an object, it can also have methods attached to it, and the details of these methods are hidden from the user. Data abstraction and ADTs are a cornerstone of OT because they can be created as needed, and this helps you to think about and design computer systems that more accurately reflect the way data types are represented in the real world.

    One of the main reasons given for replacing hierarchical, network and relational databases is their failure to support ADTs. These traditional databases have very strict rules for the layout of data and are simply not flexible enough to handle ADTs.

    Encapsulation

    Encapsulation gathers the data and methods of an object into a package, creating a well-defined boundary around the object. Encapsulation is often referred to as information hiding, and it can be used to restrict which users can access the data inside the object and which operations they can perform on it.

    Classes provide encapsulation, or information hiding, through access control. A class grants or denies access to its objects using the public and private access specifiers. Public members define an interface between a class and the users of that class, and they can be accessed by any function in a program. Objects can contain both public and private variables; the public variables are used through the object's methods or interfaces.

    Private members are known only to the object and cannot be accessed through its interface. For example, a private method might be used to compute an internal value.

    In non-database object-oriented applications, encapsulation can be used to guarantee that all operations are done via the methods the programmer has defined in the class definition, ensuring that data cannot be changed outside its own predefined methods. However, declarative database languages such as SQL allow what might be called "declarative" retrieval and update of data, which does not follow the rules of encapsulation. This is called an impedance mismatch, and it is inconsistent with object-oriented database management.

    As an example, in an object-oriented database we could define a behavior called ADD_ORDER that checks whether there is enough product in inventory for the order. The order object is not created if there is not enough product in inventory, so this behavior ensures that no order is placed for a product that is unavailable. However, in a relational database you could use SQL to bypass this validity check and thereby add an invalid order to the database.
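    The ADD_ORDER behavior can be sketched as a function that is the only intended way to change the inventory. In this C sketch, `Inventory` and `add_order` are hypothetical names, and rejecting non-positive quantities is an added assumption about what counts as a valid order.

```c
#include <stdbool.h>

/* Hypothetical inventory "object": callers are meant to change it only via add_order(). */
typedef struct {
    int product_id;
    int in_stock;       /* units available */
    int orders_placed;  /* number of orders accepted */
} Inventory;

/* Creates the order only if enough product is in inventory,
   so no order can be placed for unavailable stock. */
bool add_order(Inventory *inv, int quantity) {
    if (quantity <= 0 || quantity > inv->in_stock)
        return false;   /* reject: invalid quantity or not enough stock */
    inv->in_stock -= quantity;
    inv->orders_placed++;
    return true;
}
```

    C cannot stop a caller from writing `inv.in_stock -= 4;` directly, which mirrors the point above: any path that reaches the data without going through the method, such as a plain SQL statement, can bypass the validity check.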

    INHERITANCE

    Inheritance enables you to share attributes between objects such that a subclass inherits attributes

    from its parent class. OracleAS TopLink provides several methods to preserve inheritance

    relationships, and enables you to override mappings that are specified in a superclass, or to map

    attributes that are not mapped in the superclass. Subclasses must include the same database field

    (or fields) as the parent class for their primary key (although the primary key can have different

    names in these two tables). As a result, when you are mapping relationships to a subclass stored

    in a separate table, the subclass table must include the parent table primary key, even if the

    subclass primary key differs from the parent primary key.

    This section describes OracleAS TopLink inheritance, and introduces several topics and

    techniques to leverage inheritance in your own applications, including:

    • Understanding Object Inheritance

    • Representing Inheritance in the Database

    • Class Types

    • Class Indicators

    • Class Extraction Methods

    • Entity Bean Inheritance Restrictions

    For more information about implementing inheritance in code, see "Implementing Inheritance in

    Java".

    Understanding Object Inheritance

    Consider a simple database used by a courier company. It contains registration information for

    three types of vehicles: trucks, cars, and bicycles. For each vehicle type, your application

    requires the following information:

    • VID (Vehicle Identification)

    • LastMaint (mileage since last maintenance)

    • LoadCap (load capacity)

    If these are all the attributes shared by all vehicles in the application, then they must all appear in the superclass, Vehicle. You can then build subclasses for each vehicle type that reflect their differences. For example, the Truck class may have an attribute (NumAxles) indicating whether the local department of transportation considers it to be a commercial vehicle, the Car class may require a NumPass (number of passengers) attribute, and the Bicycle class, by virtue of its more limited range, may require a Location attribute. Through inheritance, each vehicle automatically inherits the basic vehicle information, but as a separate subclass each one also has its own unique characteristics.
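    C has no classes, but the Vehicle hierarchy can be approximated by embedding the superclass struct as the first member of each subclass. This is only an analogy to help picture the inheritance relationship; it is not how OracleAS TopLink maps Java classes, and the field names (derived from VID, LastMaint, LoadCap, NumAxles, NumPass and Location) and the helper `needs_service` are hypothetical.

```c
#include <string.h>

/* Superclass: attributes shared by all vehicles. */
typedef struct {
    char vid[16];    /* VID: vehicle identification */
    int last_maint;  /* LastMaint: mileage since last maintenance */
    int load_cap;    /* LoadCap: load capacity */
} Vehicle;

/* Subclasses embed Vehicle as their first member, so they "inherit" its attributes. */
typedef struct { Vehicle base; int num_axles; } Truck;        /* adds NumAxles */
typedef struct { Vehicle base; int num_pass; } Car;           /* adds NumPass */
typedef struct { Vehicle base; char location[32]; } Bicycle;  /* adds Location */

/* Code written against the superclass works for every subclass via the embedded base. */
int needs_service(const Vehicle *v, int mileage_limit) {
    return v->last_maint > mileage_limit;
}
```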

    Figure 3-6 Inheritance in a Courier Application


    Representing Inheritance in the Database

    You can represent inheritance in the database in one of two ways:

    • Multiple tables that represent the parent class and each child class

    • A single table that comprises the parent and all child classes

    Figure 3-7 Inheritance in the Database in Individual Tables


    If your database already represents the objects in the inheritance hierarchy this way, you can map

    the objects and relationships without modifying the tables. However, it is most efficient to

    represent all classes from a given inheritance hierarchy in a single table, because it substantially

    reduces the number of table reads and eliminates joins when querying on objects in the

    hierarchy.

    Figure 3-8 Inheritance in the Database in a Single Table


    To consolidate tables in the database this way, determine the class type of the objects represented

    by the rows in the table. There are two ways to determine class type:

    • If you can add columns to the database table, add a class indicator column that represents

    the vehicle class type (Truck, Car, or Bicycle).

    For more information about class indicators, see "Class Indicators".

    • If you cannot modify the table, build a class extraction method that executes the appropriate logic to determine the class type.

    For more information about class extraction methods, see "Class Extraction Methods".
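    A single-table mapping with a class indicator column can be sketched as follows. The `VehicleRow` layout, the indicator values "Truck", "Car" and "Bicycle", and the functions `class_of` and `count_class` are hypothetical; a class extraction method would replace `class_of` with logic over the existing columns instead of reading a dedicated indicator.

```c
#include <string.h>

/* One row of a hypothetical single VEHICLE table holding every class in the hierarchy.
   Columns that do not apply to a row's class are simply left unused. */
typedef struct {
    char vid[16];
    char type[8];   /* class indicator column: "Truck", "Car" or "Bicycle" */
    int num_axles;  /* used only by Truck rows */
    int num_pass;   /* used only by Car rows */
} VehicleRow;

typedef enum { CLASS_UNKNOWN, CLASS_TRUCK, CLASS_CAR, CLASS_BICYCLE } VehicleClass;

/* Read the class indicator to decide which subclass a row represents. */
VehicleClass class_of(const VehicleRow *row) {
    if (strcmp(row->type, "Truck") == 0) return CLASS_TRUCK;
    if (strcmp(row->type, "Car") == 0) return CLASS_CAR;
    if (strcmp(row->type, "Bicycle") == 0) return CLASS_BICYCLE;
    return CLASS_UNKNOWN;
}

/* Querying one subclass is a filter on the indicator: one table read, no join. */
int count_class(const VehicleRow *rows, int n, VehicleClass c) {
    int k = 0;
    for (int i = 0; i < n; i++)
        if (class_of(&rows[i]) == c) k++;
    return k;
}
```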

    Class Types

    The OracleAS TopLink inheritance hierarchy includes three types of classes:


    • Root Class

    • Branch Class

    • Leaf Class

    Figure 3-9 Inheritance Hierarchy Class Types


    Root Class

    The root class stores information for all instantiable classes in its subclass hierarchy. By default,

    queries performed on the root class return instances of the root class and its instantiable

    subclasses. However, you can also configure the root class to return only instances of itself,

    without instances of its subclasses when queried. All class types beneath the root class inherit

    from the root class.

    Branch Class

    Branch classes have a persistent superclass and subclasses. By default, queries performed on the

    branch class return instances of the branch class and any of its subclasses. As with the root class,

    you can configure the branch class to return only instances of itself, without instances of its

    subclasses when queried. All classes below the branch class inherit attributes from the branch

    class, including any attributes the branch class inherits from classes above it in the hierarchy.

    Leaf Class

    Leaf classes have a persistent superclass in the hierarchy, but do not have subclasses. Queries

    performed on the leaf class return only instances of the leaf class.
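    The default query behavior of root, branch and leaf classes (return instances of the queried class and its subclasses, or optionally only instances of the class itself) can be modelled with a table of parent classes. The hierarchy here is hypothetical: a `MOTOR_VEHICLE` branch class is placed between `VEHICLE` and the `TRUCK`/`CAR` leaves purely for illustration, and `query_count` counts matching objects rather than returning them.

```c
#include <stdbool.h>

/* Hypothetical hierarchy: VEHICLE (root) -> MOTOR_VEHICLE (branch) -> TRUCK, CAR (leaves);
   BICYCLE is a leaf directly under VEHICLE. parent[] gives each class's superclass. */
enum { VEHICLE, MOTOR_VEHICLE, TRUCK, CAR, BICYCLE, NUM_CLASSES };
static const int parent[NUM_CLASSES] = { -1, VEHICLE, MOTOR_VEHICLE, MOTOR_VEHICLE, VEHICLE };

/* True if cls is queried_cls or one of its direct or indirect subclasses. */
bool is_subclass_of(int cls, int queried_cls) {
    for (int c = cls; c != -1; c = parent[c])
        if (c == queried_cls) return true;
    return false;
}

/* Count the stored objects a query on queried_cls would return. By default,
   instances of subclasses are included; only_self restricts the result to
   instances of queried_cls itself. objects[] holds each stored object's class. */
int query_count(const int *objects, int n, int queried_cls, bool only_self) {
    int k = 0;
    for (int i = 0; i < n; i++) {
        bool match = only_self ? objects[i] == queried_cls
                               : is_subclass_of(objects[i], queried_cls);
        if (match) k++;
    }
    return k;
}
```

    A leaf class has no subclasses, so for a leaf both settings of `only_self` give the same result, matching the statement that queries on a leaf return only instances of the leaf.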

    What is an Object?

    An object consists of an entity and attributes that describe the state of a real-world object, together with the actions associated with that object.

    Characteristics of Object


    Some important characteristics of an object are:

    1. Object name

    • The name is used to refer to different objects in the program.

    2. Object identifier

    • This is a system-generated identifier that is assigned when a new object is created.

    3. Structure of object

    • The structure defines how the object is constructed using a constructor.

    • In an object-oriented database, the state of a complex object can be constructed from other objects by using certain type constructors.

    • Formally, an object is represented as a triple (i, c, v), where 'i' is the object identifier, 'c' is the type constructor and 'v' is the current value of the object.

    4. Transient object

    • In an OOPL, objects that exist only during program execution are called transient objects.

    For example: variables in an OOPL

    5. Persistent objects

    • An object that continues to exist even after the program has completely executed (or terminated) is called a persistent object. Object-oriented databases can store objects in secondary memory.
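    The (i, c, v) representation can be made concrete with a small C struct. This is a hypothetical encoding: only atom, tuple and set constructors are modelled, an atom's value is an `int`, a complex object's value is the list of OIDs of the objects it is constructed from, and the names `Constructor`, `Object` and `built_from` are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { CONS_ATOM, CONS_TUPLE, CONS_SET } Constructor;

/* An object as the triple (i, c, v). */
typedef struct {
    long oid;           /* i: object identifier */
    Constructor cons;   /* c: type constructor */
    int atom;           /* v when cons == CONS_ATOM: a basic value */
    const long *parts;  /* v when cons is TUPLE or SET: OIDs of component objects */
    int nparts;
} Object;

/* A complex object is constructed from other objects;
   check whether it uses the object with the given OID. */
bool built_from(const Object *o, long oid) {
    if (o->cons == CONS_ATOM) return false;
    for (int i = 0; i < o->nparts; i++)
        if (o->parts[i] == oid) return true;
    return false;
}
```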

    Object identity

    • Every object has a unique identity. In an object-oriented system, an OID is assigned to an object when it is created.

    • In an RDBMS, identity is value-based: a primary key is used to provide uniqueness of each tuple in a relation. A primary key is unique only within that relation, not across the entire system. Because the primary key is chosen from the attributes of the relation, identity depends on the object's state.

    • In an OODBMS, OIDs are similar to variable names or pointers.

    Properties of OID

    1. Uniqueness: no two objects in the system have the same OID, and OIDs are generated automatically by the system.

    2. Invariance: an OID cannot be changed during the object's entire lifetime.

    3. Invisibility: the OID is not visible to the user.
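    The three OID properties can be approximated in C: a counter that only the generator touches produces unique identifiers, and a `const` member prevents the identifier from being reassigned. The names `PObject`, `next_oid` and `make_object` are hypothetical, and C cannot truly make the field invisible, so invisibility is only mimicked by keeping the generator out of the interface.

```c
/* Hypothetical persistent object with a system-generated identifier. */
typedef struct {
    const unsigned long oid;  /* invariant: cannot be reassigned after creation */
    int value;                /* the object's state, which may change freely */
} PObject;

/* File-scope counter: users never choose OIDs themselves. */
static unsigned long next_oid = 1;

/* Every call yields a new, unique OID, even for objects with equal state. */
PObject make_object(int value) {
    PObject o = { next_oid++, value };
    return o;
}
```

    Two objects created with the same value still receive different OIDs, and changing an object's state leaves its OID unchanged, which is exactly the contrast with value-based primary keys described above.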

    Attributes

    Attributes are nothing but the properties of objects in the system.

    Example: an Employee can have the attributes 'name', 'address' and 'ID' with assigned values as follows:

    Attribute Value

    Name Radha

    Address Pune

    ID 07

    Type of Attributes

    The three types of attributes are as follows:

    1. Simple attributes

    Attributes can be of a primitive data type, such as integer, string or real, and take a literal value.

    Example: 'ID' is a simple attribute and its value is 07.

    2. Complex attributes

    Attributes that consist of collections of, or references to, multiple other objects are called complex attributes.

    Example: a collection of Employees consists of many employee names.

    3. Reference attributes

    Attributes that represent a relationship between objects and consist of a value or a collection of values are called reference attributes.

    Example: Manager is a reference to a staff object.
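    The three kinds of attributes can be illustrated with a hypothetical Employee type in C. The values Radha, Pune and 07 come from the example above (written as 7, since a leading zero makes a C integer literal octal); the manager's details, the `Department` type and the `manager_name` helper are invented purely for illustration.

```c
#include <stddef.h>
#include <string.h>

typedef struct Employee Employee;
struct Employee {
    int id;               /* simple attribute: literal value, e.g. 7 */
    const char *name;     /* simple attribute: e.g. "Radha" */
    const char *address;  /* simple attribute: e.g. "Pune" */
    Employee *manager;    /* reference attribute: relationship to another staff object */
};

typedef struct {
    Employee **members;   /* complex attribute: a collection of employee objects */
    int count;
} Department;

/* Follow the reference attribute; NULL if the employee has no manager. */
const char *manager_name(const Employee *e) {
    return e->manager ? e->manager->name : NULL;
}
```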

    The rich variety of data types in an ORDBMS offers a database designer many opportunities for a more efficient design. As discussed in previous sections, an ORDBMS supports a number of data types beyond those of an RDBMS, and for applications such as storing video it can offer a much better solution than an RDBMS or other databases.

    • An ORDBMS allows a video to be stored as an object of a user-defined abstract data type (ADT), with methods that capture any special manipulation the user wishes to perform. Allowing users to define arbitrary new data types is a key feature of ORDBMSs. For example, an ORDBMS allows users to store and retrieve objects of type jpeg-image, which stores a compressed image representing a single frame of film, just like an object of any other type, such as integer.

    Common Implementation Challenges

    There are a range of different issues and challenges that need to be addressed for successful

    program implementation. Some of these challenges are particularly unique to rural communities.

    Common challenges are described below, along with suggestions on how to address these

    challenges:

    • Resources and sustainability: Funding, technological, and human resources are

    typically limited in rural communities. It can be particularly difficult to generate enough

    start-up funds to sustain the program as it begins. Having a network of stakeholders and

    partners in the community may be beneficial for providing resources and support for a

    program.

    • Geographic limitations: Geography influences a number of factors that can challenge

    program implementation and operations (e.g., isolation and weather). Depending on the

    type of program, setting, frequency of participation, and type of activities involved, these

    challenges can become significant. This becomes a particularly important issue when

    there is limited transportation access for the target population. This requires changes in

    approaches and program design that take into account lengthy travel times, availability of

    transportation, and opportunity to offer the program remotely or through other

    technologies.

    • Recruiting staff: Rural communities implementing health programs that require, for example, physicians, dietitians or physical therapists have faced barriers to recruiting appropriately trained staff. Some programs work with volunteer or retired practitioners, or with students.

    • Hard-to-reach populations: The priority population may be highly mobile. For

    example, one rural health program was striving to provide care to two hard-to-reach

    populations: Hispanic poultry workers and migrant farm workers. These populations

    travel from camp to camp during different times each year, making it challenging to

    reach them. Several rural health programs use mobile vans to provide traveling health

    services.

    • Cultural and social issues: A number of challenges to program success arise out of

    unique cultural and social norms that influence expectations about the program and its

    likelihood of success. Examples of these types of issues include:

    o Deeply rooted traditions and cultures around food

    o Lack of trust for medical professionals and outsiders

    o Social beliefs around certain behaviors

    It is critical for program implementers to make a conscious effort to recognize and

    understand the population their program will serve, so they can develop appropriate

    strategies. Involving members from the target population throughout the whole process

    can help achieve cultural competency, encourage participation, and reduce social stigmas.

    Implementers also may need to adapt materials, such as information packets, to ensure all

    program materials are culturally appropriate.

    • Language: Rural health programs may target communities with a large Hispanic or immigrant population. Such programs need to ensure that their staff understand the importance of providing services or public health education in a culturally appropriate manner. In addition, programs may need to employ staff proficient in Spanish or other languages.

    • Keeping the community motivated: Regardless of the community and populations

    targeted in the program efforts, an awareness of health concerns needs to exist and

    individual and organizational commitments are necessary toward making the changes

    needed to address those concerns. It’s important for program planners to understand that

    success will depend on conducting education and outreach efforts to determine

    community members’ expectations about program impact and to motivate them to

    achieve better health outcomes.

  • Difference between RDBMS , ORDBMS and OODBMS

    RDBMS ,ORDBMS AND OODBMS

    Compare RDBMS with ORDBMS.

    S.No  RDBMS / ORDBMS

    1  RDBMS: Relational Database Management Systems.
       ORDBMS: Object-Relational Database Systems.

    2  RDBMS: Based on the relational data model.
       ORDBMS: Based on the Object Data Model (ODM).

    3  RDBMS: Dominant model.
       ORDBMS: Gaining popularity.

    4  RDBMS: (no entry)
       ORDBMS: An attempt to extend relational database systems to provide a bridge between the relational and object-oriented paradigms.

    5  RDBMS: Supports a small, fixed collection of data types (e.g. integers, dates, strings), which has proven adequate for traditional application domains such as administrative data processing.
       ORDBMS: Combines features of object-oriented and relational database systems and is aimed at application domains where complex objects play a central role.

    6  RDBMS: Supports Structured Query Language (SQL).
       ORDBMS: Supports an extended form of SQL: the SQL:1999 standard extends SQL to incorporate support for the object-relational model of data. (Object Query Language, OQL, is the query language of OODBMSs.)

    7  RDBMS products: IBM’s DB2, Informix, Microsoft’s Access, Fox Base, Oracle, Paradox, Sybase, Tandem, Teradata.
       ORDBMS: Object-oriented model products include ObjectStore and Versant. Object-relational features are used in DBMS products from IBM, Informix, ObjectStore, Oracle, Versant and others.

    8  RDBMS: Supports standard data types and additional data types.
       ORDBMS: Supports standard data types and new, richer data types:
       • User-defined data types, for example to support image, voice and video footage that must be stored in the database.
       • Inheritance, to inherit the commonality between different types (e.g. compressed-image and low-resolution-image objects can inherit some features of image objects).
       • Object identity, through references or pointers to objects (e.g. videos) that give each object a unique identity by which it can be referred to from elsewhere in the data.


    9. Compare the similarities and differences between OODBMS and ORDBMS. In particular, compare OQL and SQL:1999 and discuss the underlying data model.

    OODBMS : Object-Oriented Database Management Systems

    ORDBMS : Object-Relational Database Management Systems

    Similarities

    • Both support user-defined ADTs, structured types, object identity and reference types, and inheritance.

    • Both support a query language: OODBMSs support ODL/OQL, while ORDBMSs support an extended form of SQL.

    • ORDBMSs consciously try to add OODBMS features to an RDBMS, and OODBMSs in turn have developed query languages based on relational query languages.

    • Both provide DBMS functionality such as concurrency control and recovery.

    Differences

    S.No  OODBMS / ORDBMS

    1  OODBMS: Aims to achieve seamless integration with a programming language such as C++ or Java.
       ORDBMS: Such integration is not an important goal.

    2  OODBMS: Aimed at applications where an object-centric viewpoint is appropriate.
       ORDBMS: Optimized for applications in which large data collections are the focus, even though objects may have a rich structure and be fairly large.

    3  OODBMS: The query facilities of OQL are not supported efficiently in most OODBMSs.
       ORDBMS: The query facilities are the centerpiece of an ORDBMS.

    XML

    XML stands for Extensible Markup Language. It is a set of rules that define tags that break a

    document into parts and identify the parts of the document. These tags define a syntax that can

    then be used in combination with an XSL stylesheet to reconstruct the document.

    The tags that are defined must follow the XML rules, but their content and arrangement can be

    anything the developer wants. A file of XML text, arranged to represent a certain document, is

    called an XML application. Oracle Access Manager OutputXML is an XML application,

    designed to create HTML which will in turn present Oracle Access Manager pages to a browser.

    Oracle Access Manager also uses XML as a structured way to provide some parameters that control its operation. This is a different use from OutputXML, but since these applications are much shorter and follow the same XML syntax rules, one of these files will serve as an example. For example, frontpageadminparams.xml has the following content:

    This indented presentation, showing the tag levels, is an automatic feature of Microsoft's Internet

    Explorer. XML editors will also show the file in this way.

    Some important parts of this file are the following:

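    The XML declaration, which has the form <?xml version="1.0"?>, should be the first line of an XML application: it is optional in XML 1.0, but when present it must come before anything else in the file. Internet Explorer and some editors will not show the file as formatted XML unless this line is present.

    The starting <? and ending ?> give the declaration the same delimiters as an XML processing instruction (strictly, the XML specification defines the declaration separately from processing instructions). version="1.0" is an attribute. Attributes are name-value pairs joined by an equals sign, which provide additional information for the instruction. Two versions of XML have been published, 1.0 and 1.1; version 1.0 is by far the most widely used.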

    ParamsCtlg is a tag, which starts the definition of the first element in the XML application. The definition ends with the matching closing tag, </ParamsCtlg>, which has the same form except that it has a / before the tag name.

    Everything between the starting and ending tags defines the element ParamsCtlg. Nested within it is the element CompoundList, which has elements nested within it, and so on. An important attribute is xmlns, which stands for XML namespace. This specifies an owner and possible reference source for this XML application. We identify ourselves as creators of this application.

    The technically precise way to write this element would have been with a separate closing tag: a start tag carrying the attributes ParamName="top_frame" Value="_top", followed by an end tag with the same element name.

    However, when an element has no content, as here, the XML rules allow an abbreviated form called an empty-element tag: the /> at the end of the start tag closes the element immediately.

    The attributes ParamName="top_frame" and Value="_top" provide the useful content of the file,

    which is the name of a variable used by Oracle Access Manager and its value.
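To see the abbreviated closing tag in action, here is a minimal Python sketch using the standard-library xml.etree.ElementTree module. The file content is not reproduced above, so the document below is a hypothetical stand-in: only ParamsCtlg, CompoundList and the ParamName/Value attributes come from the text, while the element name Param and the namespace URI http://example.com/params are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the real file's xmlns value is not shown above.
NS = "http://example.com/params"

# The same element written with an explicit closing tag and with the "/>" form.
# "Param" is a hypothetical element name used only for this illustration.
LONG_FORM = f"""<?xml version="1.0"?>
<ParamsCtlg xmlns="{NS}">
  <CompoundList>
    <Param ParamName="top_frame" Value="_top"></Param>
  </CompoundList>
</ParamsCtlg>"""
SHORT_FORM = LONG_FORM.replace('"_top"></Param>', '"_top"/>')

long_root = ET.fromstring(LONG_FORM)
short_root = ET.fromstring(SHORT_FORM)

# Both spellings produce the same element tree, so they serialize identically.
same_tree = ET.tostring(long_root) == ET.tostring(short_root)

# Elements in a default namespace are found through a prefix mapped to its URI.
param = short_root.find(".//p:Param", {"p": NS})
name, value = param.get("ParamName"), param.get("Value")

print(same_tree)    # True
print(name, value)  # top_frame _top
```

Because the parser builds identical trees from both spellings, the choice between them is purely a matter of brevity.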

    XML Schema is commonly known as XML Schema Definition (XSD). It is used to describe

    and validate the structure and the content of XML data. XML schema defines the elements,

    attributes and data types. The schema element supports namespaces. It is similar to a database schema, which describes the data in a database.

    Syntax

    You need to declare a schema in your XML document as follows −

    Example

    The following example shows how to use schema −

    The basic idea behind XML Schemas is that they describe the legitimate format that an XML

    document can take.

    Elements

    As we saw in the XML - Elements chapter, elements are the building blocks of an XML document.

    An element can be defined within an XSD as follows −

    https://www.tutorialspoint.com/xml/xml_elements.htm

    Definition Types

    You can define XML schema elements in the following ways −

    Simple Type

    A simple-type element can contain only text; it cannot contain child elements or attributes. Some of the predefined simple types are: xs:integer, xs:boolean, xs:string, xs:date.
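A simple type constrains only the text of an element. The sketch below is a hypothetical toy checker in Python, not an XSD validator: it approximates the lexical rules of the four built-in types named above and, as an explicit simplification, ignores details such as time zones and negative years on xs:date.

```python
import re
from datetime import date

# Simplified lexical patterns (an assumption of this sketch, not the full XSD rules).
_INTEGER = re.compile(r"[+-]?[0-9]+")
_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_valid_simple(value: str, xsd_type: str) -> bool:
    """Return True if the text `value` is acceptable for the given simple type."""
    if xsd_type == "xs:string":
        return True  # any character data is a valid xs:string
    text = value.strip()  # integer, boolean and date ignore surrounding whitespace
    if xsd_type == "xs:integer":
        return _INTEGER.fullmatch(text) is not None
    if xsd_type == "xs:boolean":
        return text in {"true", "false", "1", "0"}
    if xsd_type == "xs:date":
        if _DATE.fullmatch(text) is None:
            return False
        try:
            date.fromisoformat(text)  # rejects impossible dates such as 2021-02-30
        except ValueError:
            return False
        return True
    raise ValueError(f"type not covered by this sketch: {xsd_type}")


print(is_valid_simple("42", "xs:integer"))       # True
print(is_valid_simple("4.2", "xs:integer"))      # False
print(is_valid_simple("yes", "xs:boolean"))      # False
print(is_valid_simple("2021-02-30", "xs:date"))  # False
```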

    Complex Type

    A complex type is a container for other element definitions. This allows you to specify which

    child elements an element can contain and to provide some structure within your XML

    documents. For example −

    In the above example, the Address element consists of child elements. A complex type is thus a container for other definitions, which allows you to build a simple hierarchy of elements in the XML document.
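The original example is not reproduced above, so the following Python sketch assumes a hypothetical Address complex type whose children are name, company and phone, in that order (roughly what an xs:sequence expresses). It only compares child element names; a real schema validator would also check each child's own type, occurrence limits and attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: Address must contain exactly these children, in order.
ADDRESS_SEQUENCE = ["name", "company", "phone"]


def has_sequence(element: ET.Element, expected: list) -> bool:
    """Check that the element's children appear exactly in the expected order."""
    return [child.tag for child in element] == expected


good = ET.fromstring(
    "<Address><name>Jane Doe</name><company>Example Ltd</company>"
    "<phone>555-0100</phone></Address>"
)
bad = ET.fromstring(
    "<Address><company>Example Ltd</company><name>Jane Doe</name></Address>"
)

print(has_sequence(good, ADDRESS_SEQUENCE))  # True
print(has_sequence(bad, ADDRESS_SEQUENCE))   # False
```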

    Global Types

    With the global type, you can define a single type in your document, which can be used by all

    other references. For example, suppose you want to generalize the person and company for

    different addresses of the company. In such case, you can define a general type as follows −

    Now let us use this type in our example as follows −

    Instead of having to define the name and the company twice (once for Address1 and once

    for Address2), we now have a single definition. This makes maintenance simpler, i.e., if you

    decide to add "Postcode" elements to the address, you need to add them at just one place.
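A small Python sketch of the reuse idea: a hypothetical schema declares one named (global) complex type, AddressType, and two elements, Address1 and Address2, that refer to it by name. The code uses the standard ElementTree module only to read the schema and resolve those references; it does not validate instance documents.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # XML Schema namespace in ElementTree notation

# Hypothetical schema: one global complex type reused by two element declarations.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="company" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="Address1" type="AddressType"/>
  <xs:element name="Address2" type="AddressType"/>
</xs:schema>"""

root = ET.fromstring(SCHEMA)

# Index the global (top-level, named) complex types by name.
global_types = {ct.get("name"): ct for ct in root.findall(f"{XS}complexType")}


def child_names(element_name: str) -> list:
    """Resolve a top-level element's type reference and list its child elements."""
    decl = root.find(f"{XS}element[@name='{element_name}']")
    ctype = global_types[decl.get("type")]
    return [e.get("name") for e in ctype.iter(f"{XS}element")]


print(child_names("Address1"))  # ['name', 'company']
print(child_names("Address2"))  # ['name', 'company']
```

Adding a "Postcode" element inside AddressType would change the output for both Address1 and Address2, which is the maintenance benefit described above.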

    Querying and Transformation

    Given the increasing number of applications that use XML to exchange, mediate, and store data,

    tools for effective management of XML data are becoming increasingly important. In particular,

    tools for querying and transformation of XML data are essential to extract information from

    large bodies of XML data, and to convert data between different representations (schemas) in

    XML. Just as the output of a relational query is a relation, the output of an XML query can be an

    XML document. As a result, querying and transformation can be combined into a single tool.

    Several languages provide increasing degrees of querying and transformation capabilities:

    • XPath is a language for path expressions, and is actually a building block for the remaining two

    query languages.

    • XSLT was designed to be a transformation language, as part of the XSL style sheet system,

    which is used to control the formatting of XML data into HTML or other print or display

    languages. Although designed for formatting, XSLT can generate XML as output, and can

    express many interesting queries. Furthermore, it is currently the most widely available language

    for manipulating XML data.

    • XQuery is the W3C standard language for querying XML data (a W3C Recommendation since 2007). XQuery combines features from many of the earlier proposals for querying XML, in particular the language Quilt.

    A tree model of XML data is used in all these languages. An XML document is modeled as

    a tree, with nodes corresponding to elements and attributes. Element nodes can have child

    nodes, which can be subelements or attributes of the element. Correspondingly, each node

    (whether attribute or element), other than the root element, has a parent node, which is an

    element. The order of elements and attributes in the XML document is modeled by the ordering

    of children of nodes of the tree. The terms parent, child, ancestor, descendant, and siblings are

    interpreted in the tree model of XML data.
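The tree model can be explored with Python's standard xml.etree.ElementTree module. In this minimal sketch the bank document is hypothetical sample data. Note one difference from the model described above: ElementTree stores attributes in a dictionary on each element rather than as separate child nodes, and it keeps only child links, so the code builds its own child-to-parent map to walk upward.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
doc = ET.fromstring(
    '<bank><account account-number="A-101">'
    "<branch-name>Downtown</branch-name><balance>500</balance>"
    "</account></bank>"
)

# Map every child element to its parent element.
parent = {child: node for node in doc.iter() for child in node}


def ancestors(node):
    """Return the tags of a node's ancestors, nearest first."""
    tags = []
    while node in parent:
        node = parent[node]
        tags.append(node.tag)
    return tags


account = doc.find("account")
balance = account.find("balance")

print([child.tag for child in account])  # ['branch-name', 'balance'] (document order)
print(ancestors(balance))                # ['account', 'bank']
print(account.attrib)                    # {'account-number': 'A-101'}
```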

    The text content of an element can be modeled as a text node child of the element. Elements

    containing text broken up by intervening subelements can have multiple text node children. For

    instance, an element containing “this is a <bold>wonderful</bold> book” would have a

    subelement child corresponding to the element bold and two text node children corresponding to

    “this is a” and “book”. Since such structures are not commonly used in database data, we shall

    assume that elements do not contain both text and subelements.
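Python's ElementTree has no separate text-node objects, but it records the same information. In this minimal sketch, text before an element's first child is stored in the parent's .text, and text that follows a child is stored in that child's .tail, so the two text nodes of the bold example become title.text and bold.tail.

```python
import xml.etree.ElementTree as ET

title = ET.fromstring("<title>this is a <bold>wonderful</bold> book</title>")
bold = title[0]

print(repr(title.text))  # 'this is a '
print(repr(bold.text))   # 'wonderful'
print(repr(bold.tail))   # ' book'

# itertext() walks all text of the element in document order.
print("".join(title.itertext()))  # this is a wonderful book
```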

    XPath addresses parts of an XML document by means of path expressions. The language can

    be viewed as an extension of the simple path expressions in object-oriented and object-relational

    databases (See Section 9.5.1).

    A path expression in XPath is a sequence of location steps separated by “/” (instead of the “.”

    operator that separates steps in SQL:1999). The result of a path expression is a set of values.

    For instance, on the document in Figure 10.8, the XPath expression

    would return the same names, but without the enclosing tags.

    Like a directory hierarchy, the initial ’/’ indicates the root of the document. (Note that this is an

    abstract root “above” the document’s top-level element.) Path expressions are evaluated from

    left to right. As a path expression is evaluated, the result of the path at any point consists of a set

    of nodes from the document.

    When an element name, such as customer, appears before the next ’/’, it refers to all elements of

    the specified name that are children of elements in the current element set. Since multiple

    children can have the same name, the number of nodes in the node set can increase or decrease

    with each step. Attribute values may also be accessed, using the “@” symbol. For instance,

    /bank-2/account/@account-number returns a set of all values of account-number attributes of

    account elements. By default, IDREF links are not followed; we shall see how to deal with

    IDREFs later.
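Python's standard ElementTree implements a small subset of XPath that is enough to try these steps. In this minimal sketch the bank-2 document is a hypothetical stand-in (Figure 10.8 is not reproduced here), and paths are written relative to the root element, which corresponds to the leading /bank-2 step. ElementTree cannot return attribute nodes, so the @account-number step becomes an attribute lookup on each selected element.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a bank-2 document.
bank = ET.fromstring("""<bank-2>
  <account account-number="A-401"><branch-name>Downtown</branch-name><balance>500</balance></account>
  <account account-number="A-402"><branch-name>Perryridge</branch-name><balance>900</balance></account>
  <customer><customer-name>Joe</customer-name></customer>
  <customer><customer-name>Mary</customer-name></customer>
</bank-2>""")

# /bank-2/customer/customer-name (relative to the root element <bank-2>)
names = [e.text for e in bank.findall("customer/customer-name")]

# /bank-2/account/@account-number: select the elements, then read the attribute.
numbers = [a.get("account-number") for a in bank.findall("account")]

print(names)    # ['Joe', 'Mary']
print(numbers)  # ['A-401', 'A-402']
```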

    XPath supports a number of other features:

    • Selection predicates may follow any step in a path, and are contained in square brackets.

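Selection predicates can be tried with Python's ElementTree, which documents a limited predicate set ([@attr], [@attr='value'], [tag], [tag='text'] and positional [n]). In this minimal sketch the account data is hypothetical. Because ElementTree predicates test only equality, a comparison such as balance > 400 is done with an ordinary Python filter instead of inside the path.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
bank = ET.fromstring("""<bank-2>
  <account account-number="A-401"><branch-name>Downtown</branch-name><balance>500</balance></account>
  <account account-number="A-402"><branch-name>Perryridge</branch-name><balance>900</balance></account>
  <account account-number="A-403"><branch-name>Downtown</branch-name><balance>300</balance></account>
</bank-2>""")

# Predicate on an attribute value.
by_number = bank.findall("account[@account-number='A-402']")

# Predicate on the text of a child element.
downtown = [a.get("account-number")
            for a in bank.findall("account[branch-name='Downtown']")]

# Numeric comparison, written as a Python filter over the selected elements.
rich = [a.get("account-number") for a in bank.findall("account")
        if int(a.findtext("balance")) > 400]

print(len(by_number))  # 1
print(downtown)        # ['A-401', 'A-403']
print(rich)            # ['A-401', 'A-402']
```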

  • The Application program interface

    An Application Programming Interface (API) contains software building tools, subroutine

    definitions as well as communication protocols that facilitate interaction between systems. An

    API may be for a database system, operating system, computer hardware or a web-based system.

    An Application Programming Interface makes it simpler to use certain technologies to build

    applications for the programmers. API can include specifications for data structures, variables,

    routines, object classes, remote calls etc.
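A database API is one concrete case. Python's sqlite3 module implements the language's standard database API (PEP 249), so the same connect, execute and fetch calls appear in other Python database drivers. The table and rows in this minimal sketch are hypothetical sample data.

```python
import sqlite3

# A throwaway in-memory database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")

# "?" placeholders let the API pass values safely instead of splicing them into SQL.
cur.executemany("INSERT INTO account VALUES (?, ?)", [("A-101", 500), ("A-215", 700)])
conn.commit()

cur.execute("SELECT account_number FROM account WHERE balance > ?", (600,))
rich = cur.fetchall()
print(rich)  # [('A-215',)]
conn.close()
```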


    Uses of Application Programming Interfaces

    APIs are useful in many scenarios. Some of these are given in detail as follows −

    Operating Systems

    The interface between an operating system and an application is specified with an API. For example, POSIX specifies a standard API, so an application written for one POSIX-compliant operating system can be ported to another POSIX-compliant operating system with little or no change.

    Libraries and Frameworks

    Often APIs are related to software libraries. The API describes the behaviour of the system, while the libraries actually implement that behaviour. A single API can have multiple libraries, as it can have many different implementations. Sometimes an API is linked to a software framework as well. A framework is based on many libraries that implement different APIs, whose behaviour is built into the framework.
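The split between an API and its implementing libraries can be sketched in Python with an abstract base class. The KeyValueStore API and both implementations below are hypothetical illustrations: one keeps data in a dictionary, the other in an in-memory SQLite table, and client code written only against the API works with either.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Optional


class KeyValueStore(ABC):
    """Hypothetical API: it describes behaviour but does not implement it."""

    @abstractmethod
    def put(self, key: str, value: str) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[str]: ...


class DictStore(KeyValueStore):
    """Implementation 1: keeps data in a Python dictionary."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class SqliteStore(KeyValueStore):
    """Implementation 2: keeps data in an in-memory SQLite table."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")

    def put(self, key: str, value: str) -> None:
        self._conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))

    def get(self, key: str) -> Optional[str]:
        row = self._conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None


def round_trip(store: KeyValueStore) -> Optional[str]:
    """Client code that depends only on the API, not on an implementation."""
    store.put("greeting", "hello")
    return store.get("greeting")


print(round_trip(DictStore()), round_trip(SqliteStore()))  # hello hello
```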

    Web APIs

    The application programming interfaces for web servers or web browsers are known as web APIs. These web APIs can be server side or client side.

    Server-side web APIs expose endpoints that implement a request-response message system, with messages typically written in JSON or XML. This is usually achieved using an HTTP web server. Client-side web APIs are used to extend the functionality of a web browser. Earlier they were in the form of plug-in browser extensions, but now JavaScript bindings are used.
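A server-side endpoint can be sketched with Python's standard WSGI interface. The /accounts/<number> endpoint and its data are hypothetical. To keep the example self-contained, the helper call() invokes the application in-process using wsgiref.util.setup_testing_defaults instead of running a real HTTP server; any WSGI-capable server could host the same app function.

```python
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical data served by the endpoint.
ACCOUNTS = {"A-101": 500, "A-215": 700}
PREFIX = "/accounts/"


def app(environ, start_response):
    """WSGI application exposing GET /accounts/<number> as a JSON endpoint."""
    path = environ.get("PATH_INFO", "")
    number = path[len(PREFIX):] if path.startswith(PREFIX) else None
    if environ.get("REQUEST_METHOD") == "GET" and number in ACCOUNTS:
        status = "200 OK"
        payload = {"account_number": number, "balance": ACCOUNTS[number]}
    else:
        status = "404 Not Found"
        payload = {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def call(path):
    """Invoke the app in-process (no network); return (status, decoded JSON body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in REQUEST_METHOD="GET" and other keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)


print(call("/accounts/A-215"))  # ('200 OK', {'account_number': 'A-215', 'balance': 700})
print(call("/accounts/Z-999"))  # ('404 Not Found', {'error': 'not found'})
```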

    Remote APIs

    The remote application programming interfaces allow programmers to manipulate remote resources. Most remote APIs are designed to maintain object abstraction in object-oriented programming. This can be done by executing a method call locally, which then invokes the corresponding method call on a remote object and gets the result locally as a return value.
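Python's standard library includes XML-RPC, which shows the local-call, remote-execution pattern at the level of procedures (not full remote objects). In this minimal sketch the account_balance function and its data are hypothetical; the server runs in a background thread on a free loopback port, and the client's proxy makes the remote call look like an ordinary method call.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def account_balance(number):
    """Server-side function; the balances are hypothetical sample data."""
    return {"A-101": 500, "A-215": 700}.get(number, 0)


# Server: port 0 asks the OS for any free port on the loopback interface.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(account_balance, "account_balance")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client: the proxy sends the call as an HTTP request with an XML body,
# the server runs the function, and the result comes back as a return value.
with ServerProxy(f"http://127.0.0.1:{port}/") as proxy:
    result = proxy.account_balance("A-215")

print(result)  # 700
server.shutdown()
server.server_close()
```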

    Release policies for API

    The policies for releasing APIs are private, partner and public. Details about these are given as follows −

    Private release policies

    The application programming interfaces released under this policy are for private internal use by

    the company.

    Partner release policies

    The application programming interfaces released under this policy can be used by the company and its specific business partners. This means that the company can control the quality of the API by monitoring the apps that have access to it.

    Public release policies

    The application programming interfaces released under public release policies are freely available to the public. Some examples of this are the Microsoft Windows API and Apple’s Cocoa and Carbon APIs.

    Storage of XML data

    Character

    Relational (shredded)

    Native XML

    Character

    Storage options

    ◼ Large character fields in DBMS

    ◼ Flat files

    ◼ .xml files

    Fast insert & retrieval

    Poor search

    Relational

    Data still stored as character

    Portions of the data extracted into additional relational tables

    Increased parse time

    Increased search capabilities
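The difference between character and shredded storage can be sketched with Python's sqlite3 and ElementTree modules. The document, table and column names are hypothetical. The whole document is kept as text in one column, and portions of it are extracted into an additional relational table so they can be searched with ordinary SQL, at the cost of parsing the XML first.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML document to be stored.
DOC = """<bank>
  <account number="A-101"><branch>Downtown</branch><balance>500</balance></account>
  <account number="A-215"><branch>Mianus</branch><balance>700</balance></account>
</bank>"""

conn = sqlite3.connect(":memory:")

# Character storage: the whole document kept as text in one large column.
conn.execute("CREATE TABLE xml_doc (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO xml_doc (body) VALUES (?)", (DOC,))

# Shredding: portions of the data are extracted into an additional relational table.
conn.execute("CREATE TABLE account (number TEXT PRIMARY KEY, branch TEXT, balance INTEGER)")
for acc in ET.fromstring(DOC).findall("account"):
    conn.execute(
        "INSERT INTO account VALUES (?, ?, ?)",
        (acc.get("number"), acc.findtext("branch"), int(acc.findtext("balance"))),
    )

# The shredded copy can be searched with SQL; the character copy comes back unchanged.
rich = conn.execute("SELECT number FROM account WHERE balance > 600").fetchall()
stored = conn.execute("SELECT body FROM xml_doc").fetchone()[0]

print(rich)           # [('A-215',)]
print(stored == DOC)  # True
conn.close()
```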

    Native XML

    Exclusive XML DBMS

    ◼ Sedna

    ◼ Timber

    Integrated XML DBMS

    ◼ DB2

    ◼ Oracle

    Native XML Benefits

    XML messages stored in their original format

    Documents can be transformed straight from the database via XPath or XSLT.

    Increased search capabilities for documents that must be stored as XML.

    XML Applications

    We've seen a lot of theory in this chapter, so I'm going to spend the rest of this chapter taking a

    look at how XML is used today in the real world. The world of XML is huge these days; in fact,

    XML is now used internally even in Netscape and Microsoft products, as well as installations of

    programming languages such as Perl. You can find a good list of organizations that produce their

    own XML-based languages.

    It's useful and encouraging to see how XML is being used today in these XML-based languages.

    Here's a new piece of terminology: As you know, XML is a metamarkup language, so it's

    actually used to create languages. The languages so created are applications of XML; as a result,

    they're called XML applications.

    Note that the term XML application means an application of XML to a specific domain, such as

    MathML, the mathematics markup language; it does not refer to a program that uses XML (a fact

    that causes a lot of confusion among people who know nothing about XML).

    Thousands of XML applications are around today, and we'll see some of them here. You can see

    the advantage to various groups when defining their own markup languages. For example,

    physicists or chemists can use the symbols and graphics of their discipline in customized

    browsers. In fact, I'll start with Chemical Markup Language (CML).

    • Root element is .

    • contains a and an element.

    • contains one or more element.

    • Each quiz item contains one <question>, at least 2 <answer>s, and no more than 6 <answer>s.

    • One of the answers must have the attribute correct="y", which indicates the correct answer.

    • The <question> might appear before or after all the <answer>s.

    The Example Quiz

    1. In which continent is the country Japan located?
       Asia / Europe / Africa / America

    2. Tuna / Cow / Whale / Lobster
       Which one cannot swim?

    3. How many points are on a hexagon?
       5 / 6 / 7 / 8

    • A DTD declaration for that XML spec:


    XML with DTD

    Problems:

    • It is hard to limit the number of <answer>s to a maximum of 6. One way would be to declare it like this:

    (answer, answer, answer)|

    (answer, answer, answer, answer)|

    (answer, answer, answer, answer, answer)|

    (answer, answer, answer, answer,answer,answer)))>

    but do you want to?

    Even with that, you still have to handle the requirement that <question> might appear after the <answer>s, so it becomes:

    (answer, answer, answer)|

    (answer, answer, answer, answer)|

    (answer, answer, answer, answer, answer)|

    (answer, answer, answer, answer,answer,answer)))|

    (((answer, answer)|

    (answer, answer, answer)|

    (answer, answer, answer, answer)|

    https://www.permadi.com/tutorial/xmlExamples/quizDTD.xml

    (answer, answer, answer, answer, answer)|

    (answer, answer, answer, answer, answer, answer)), question))>

    • The DTD does not limit the number of <answer>s that have the "correct" attribute, so a question might end up with 2 or more correct answers. Can this be solved without changing the structure of the XML? Probably not.

    • Create a Schema declaration for that XML spec:

    Attempt 1:

    Problems:

    • <question> must appear before the <answer>s in this schema because of how it is declared.

    • The attribute "correct" can be assigned any value, while we only want to accept "y".

    • This schema allows more than one correct <answer>. Again, it might not be possible to create a schema that prevents this.

    Attempt 2:

    This schema is much cleaner than before. We used many unnamed (anonymous) types because we don't need to reuse the types. "correctType" is a type that contains only one valid value, which is "y". The <question> element is declared so that it can appear before or after the <answer>s.

    Problems:

    • This schema allows more than one correct <answer>.
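Since neither the DTD nor the schema attempts above enforce "exactly one correct answer", one option is to check that rule in application code after parsing. This is a different technique from schema validation, shown as a minimal Python sketch; the wrapping element name item is a hypothetical assumption, because the original element names are not shown above, and the sample items reuse questions from the example quiz.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_item(item: ET.Element) -> List[str]:
    """Return the quiz rules that one item breaks (an empty list means it is valid)."""
    problems = []
    questions = item.findall("question")
    answers = item.findall("answer")
    if len(questions) != 1:
        problems.append("needs exactly one <question>")
    if not 2 <= len(answers) <= 6:
        problems.append("needs between 2 and 6 <answer>s")
    marked = [a for a in answers if "correct" in a.attrib]
    if any(a.get("correct") != "y" for a in marked):
        problems.append('the only accepted value of "correct" is "y"')
    if len(marked) != 1:
        problems.append("needs exactly one correct <answer>")
    # The position of <question> is deliberately not checked: it may come first or last.
    return problems


# Hypothetical <item> wrapper; here the question comes after the answers.
swim = ET.fromstring(
    "<item><answer>Tuna</answer><answer correct='y'>Cow</answer>"
    "<answer>Whale</answer><answer>Lobster</answer>"
    "<question>Which one cannot swim?</question></item>"
)
# A broken variant with two answers marked correct.
broken = ET.fromstring(
    "<item><question>In which continent is the country Japan located?</question>"
    "<answer correct='y'>Asia</answer><answer correct='y'>Europe</answer></item>"
)

print(check_item(swim))    # []
print(check_item(broken))  # ['needs exactly one correct <answer>']
```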

  • UNIT-5

    NOSQL

    INTRODUCTION TO NOSQL:

    A NoSQL database (the term originally referred to "non-SQL" or "non-relational") provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Such databases have existed since the late 1960s, but did not obtain the NoSQL moniker until a surge of popularity in the early twenty-first century. NoSQL databases are used in real-time web applications and big data, and their use is increasing over time. NoSQL systems are also sometimes called "Not only SQL" to emphasize the fact that they may support SQL-like query languages.

    The motivations for NoSQL databases include simplicity of design, simpler horizontal scaling to clusters of machines and finer control over availability. The data structures used by NoSQL databases (for example key-value, wide-column, graph or document) differ from those used by default in relational databases, which makes some operations faster in NoSQL. The suitability of a given NoSQL database depends on the problem it must solve. Data structures used by NoSQL databases are sometimes also viewed as more flexible than relational database tables.
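The flexibility point can be illustrated with a toy document store in Python. This is a hypothetical sketch, not the API of any NoSQL product: documents are schemaless dictionaries keyed by an id, so two documents in the same collection may carry different fields, something a fixed relational table would not allow without schema changes.

```python
from typing import Any, Dict, List, Optional


class DocumentStore:
    """Toy in-memory document store: each document is a schemaless dict keyed by id."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def put(self, doc_id: str, doc: Dict[str, Any]) -> None:
        self._docs[doc_id] = dict(doc)  # store a copy; no table schema to conform to

    def get(self, doc_id: str) -> Optional[Dict[str, Any]]:
        return self._docs.get(doc_id)

    def find(self, **criteria: Any) -> List[str]:
        """Return ids of documents whose fields equal all the given values."""
        return [doc_id for doc_id, doc in self._docs.items()
                if all(doc.get(k) == v for k, v in criteria.items())]


store = DocumentStore()
# Two documents in the same collection with different sets of fields.
store.put("c1", {"type": "customer", "name": "Joe", "city": "Harrison"})
store.put("c2", {"type": "customer", "name": "Mary", "phones": ["555-0100", "555-0101"]})

print(store.find(type="customer"))   # ['c1', 'c2']
print(store.get("c2")["phones"][0])  # 555-0100
```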

    Many NoSQL stores compromise consistency in favor of availability, speed and partition tolerance. Barriers to the greater adoption of NoSQL stores include the use of low-level query languages, lack of standardized interfaces, and huge previous investments in existing relational databases. Most NoSQL stores lack true ACID (Atomicity, Consistency, Isolation, Durability) transactions, but a few databases, such as MarkLogic, Aerospike, FairCom c-treeACE, Google Spanner (though technically a NewSQL database), Symas LMDB, and OrientDB, have made them central to their designs.

    Most NoSQL databases offer a concept of eventual consistency, in which database changes are propagated to all nodes eventually, so queries for data might not return updated data immediately or might read data that is not accurate, a problem known as stale reads. Also, some NoSQL systems may exhibit lost writes and other forms of data loss. Some NoSQL systems provide concepts such as write-ahead logging to avoid data loss. For distributed transaction processing across multiple databases, data consistency is an even bigger challenge. This is difficult for both NoSQL and relational databases. Even current relational databases do not allow referential integrity

    constraints to span databases. There are few systems that maintain bo