OPEN SOURCE DATABASE MANAGEMENT SYSTEM

UNIT- I

DATABASE MANAGEMENT SYSTEM INTRODUCTION

What is DBMS?

A database management system (DBMS) is the technology for creating and managing databases.
It is a software tool used to organize (create, retrieve, update, and manage) data in a database.

Information and knowledge are not the same thing. Information can be transported, stored, and
shared without difficulty, whereas knowledge is the useful application of information and
necessarily involves personal experience and practice.

Database systems are meant to handle large collections of information. Managing data involves
both defining structures for storing information and providing mechanisms for manipulating it.
Moreover, the database system must ensure the safety of the stored information despite system
crashes or attempts at unauthorized access.

USES OF DBMS

• To develop software applications in less time.

• Data independence and efficient use of data.

• For uniform data administration.

• For data integrity and security.

• For concurrent access to data, and data recovery from crashes.

• To use user-friendly declarative query language.

Where is a Database Management System (DBMS) being Used?

• Airlines: reservations, schedules, etc.

• Telecom: calls made, customer details, network usage, etc.

• Universities: registration, results, grades, etc.

• Sales: products, purchases, customers, etc.

• Banking: all transactions, etc.

Advantages of DBMS

A DBMS manages data and has many benefits. These are:

• Data independence: Application programs should be as independent as possible from the
details of data representation and storage. The DBMS supplies an abstract view of the data that
insulates application code from such details.

• Efficient data access: A DBMS uses a variety of sophisticated techniques to store and retrieve
data efficiently. This is especially important when the data is stored on external storage devices.

• Data integrity and security: If data is always accessed through the DBMS, the DBMS can
enforce integrity constraints on the data.

• Data administration: When several users share the data, centralizing its administration can
offer significant improvements. Experienced professionals who understand the nature of the data
being managed can organize its representation to reduce redundancy and make retrieval efficient.

Components of DBMS

• Users: Users may be of any kind, such as database administrators, system developers, or end
users.

• Database application: A database application may be personal, departmental, organization-wide,
or internal.

• DBMS: Software that allows users to create, access, and manipulate the database.

• Database: A collection of logically related data treated as a single unit.

Data Models

A data model describes the data, its semantics, and the consistency constraints that apply to it.
It provides the conceptual tools for describing the design of a database at each level of data
abstraction. The following four data models are used for understanding the structure of a
database:

1) Relational Data Model: This model organizes data into rows and columns within tables. A
relational model uses tables to represent both the data and the relationships among the data.
Tables are also called relations. The model was first described by Edgar F. Codd in 1969. It is
the most widely used data model and is the primary model for commercial data processing
applications.

2) Entity-Relationship Data Model: An ER model is a logical representation of data as objects
and the relationships among them. The objects are known as entities, and a relationship is an
association among entities. The model was designed by Peter Chen and published in a 1976
paper, and it is widely used in database design. A set of attributes describes each entity. For
example, student_name and student_id describe the 'student' entity. A set of entities of the same
type is known as an 'entity set', and a set of relationships of the same type is known as a
'relationship set'.

3) Object-based Data Model: An extension of the ER model with notions of functions,
encapsulation, and object identity. This model supports a rich type system that includes
structured and collection types. In the 1980s, various database systems following the
object-oriented approach were developed. Here, an object is data bundled together with its
properties.

4) Semistructured Data Model: This model differs from the other three. It allows data
specifications in which individual data items of the same type may have different sets of
attributes. The Extensible Markup Language (XML) is widely used for representing
semistructured data. Although XML was originally designed for adding markup information to
text documents, it has gained importance because of its use in data exchange.

INTRODUCTION TO SQL

SQL

o SQL stands for Structured Query Language. It is used for storing and managing data in a
relational database management system (RDBMS).

o It is the standard language for relational database systems. It enables a user to create, read,
update, and delete relational databases and tables.

o RDBMSs such as MySQL, Informix, Oracle, MS Access, and SQL Server use SQL as
their standard database language.

o SQL allows users to query the database in a number of ways, using English-like

statements.

Rules:

SQL follows these rules:

o Structured Query Language keywords are not case sensitive. By convention, SQL keywords are
written in uppercase.

o SQL statements are not tied to text lines. A single SQL statement can be written on one line
or spread over several lines.

o Using SQL statements, you can perform most of the actions in a database.

o SQL is based on tuple relational calculus and relational algebra.

SQL process:

o When an SQL command is executed on an RDBMS, the system determines the best way to
carry out the request, and the SQL engine determines how to interpret the task.

o Various components are involved in this process, such as the optimization engine, the query
engine, the query dispatcher, and the classic query engine.

o The classic query engine handles all non-SQL queries, whereas the SQL query engine does
not handle logical files.

Characteristics of SQL

o SQL is easy to learn.

o SQL is used to access data from relational database management systems.

o SQL can execute queries against the database.

o SQL is used to describe the data.

o SQL is used to define the data in the database and manipulate it when needed.

o SQL is used to create and drop the database and table.

o SQL is used to create a view, stored procedure, function in a database.

o SQL allows users to set permissions on tables, procedures, and views.
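
Several of the characteristics listed above can be tried out with a minimal sketch that uses Python's built-in sqlite3 module and an in-memory database. The student table and its columns are assumptions made purely for illustration. SQLite does not provide stored procedures or permission statements, so those two characteristics are not shown.

```python
import sqlite3

# In-memory SQLite database; the "student" table and its columns are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Define the data: create a table.
cur.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Manipulate the data: insert, update, delete.
cur.execute("INSERT INTO student VALUES (1, 'Ram', 'Noida')")
cur.execute("INSERT INTO student VALUES (2, 'Shyam', 'Delhi')")
cur.execute("UPDATE student SET city = 'Bangalore' WHERE roll_no = 1")
cur.execute("DELETE FROM student WHERE roll_no = 2")
conn.commit()

# Query the data. Keywords are not case sensitive, so lowercase works too.
rows = cur.execute("select roll_no, name, city from student").fetchall()
print(rows)

# Create a view, query it, then drop the view and the table.
cur.execute("CREATE VIEW student_names AS SELECT name FROM student")
names = cur.execute("SELECT name FROM student_names").fetchall()
cur.execute("DROP VIEW student_names")
cur.execute("DROP TABLE student")
conn.close()
```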

Integrity Constraints

o Integrity constraints are a set of rules used to maintain the quality of information.

o Integrity constraints ensure that data insertion, updating, and other processes are
performed in such a way that data integrity is not affected.

o Thus, integrity constraints guard against accidental damage to the database.

Types of Integrity Constraint

1. Domain constraints

o A domain constraint defines the valid set of values for an attribute.

o The data type of a domain may be string, character, integer, time, date, currency, etc. The
value of an attribute must belong to the corresponding domain.

Example:
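
A minimal sketch of a domain constraint, assuming an illustrative student table in an in-memory SQLite database. The CHECK clause and the age range 0 to 150 are assumptions chosen for the demonstration. SQLite is loosely typed, so the domain is enforced explicitly with typeof().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK clause restricts age to a domain of plausible integer values (assumed range).
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " age INTEGER CHECK (typeof(age) = 'integer' AND age BETWEEN 0 AND 150))"
)
conn.execute("INSERT INTO student VALUES (14795, 'Ram', 24)")  # inside the domain

try:
    conn.execute("INSERT INTO student VALUES (12839, 'Shyam', 'A')")  # 'A' is not an integer
    domain_violation_rejected = False
except sqlite3.IntegrityError:
    domain_violation_rejected = True

print(domain_violation_rejected)
```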

2. Entity integrity constraints

o The entity integrity constraint states that a primary key value cannot be null.

o This is because the primary key value is used to identify individual rows in a relation. If
the primary key had a null value, those rows could not be identified.

o Fields other than the primary key may contain null values.

Example:
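
A minimal sketch of entity integrity, assuming an illustrative employee table in an in-memory SQLite database. NOT NULL is written out explicitly because SQLite does not imply it for every PRIMARY KEY declaration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# emp_id is the primary key and must never be null; the other fields may be.
conn.execute(
    "CREATE TABLE employee (emp_id TEXT NOT NULL PRIMARY KEY, name TEXT, salary INTEGER)"
)
conn.execute("INSERT INTO employee VALUES ('E1', 'Jack', NULL)")  # null in a non-key field

try:
    conn.execute("INSERT INTO employee VALUES (NULL, 'Harry', 30000)")  # null primary key
    null_key_rejected = False
except sqlite3.IntegrityError:
    null_key_rejected = True

print(null_key_rejected)
```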

3. Referential Integrity Constraints

o A referential integrity constraint is specified between two tables.

o If a foreign key in Table 1 refers to the primary key of Table 2, then every value of the
foreign key in Table 1 must either be null or exist in Table 2.

Example:
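
A minimal sketch of referential integrity, assuming illustrative department and employee tables in an in-memory SQLite database. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign keys are not enforced by default
conn.execute("CREATE TABLE department (dept_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " dept_no INTEGER REFERENCES department(dept_no))"
)
conn.execute("INSERT INTO department VALUES (11, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Jack', 11)")     # 11 exists in department
conn.execute("INSERT INTO employee VALUES (2, 'Harry', NULL)")  # a null foreign key is allowed

try:
    conn.execute("INSERT INTO employee VALUES (3, 'John', 18)")  # 18 is not in department
    dangling_reference_rejected = False
except sqlite3.IntegrityError:
    dangling_reference_rejected = True

print(dangling_reference_rejected)
```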

4. Key constraints

o A key is an attribute, or set of attributes, that uniquely identifies an entity within its entity set.

o An entity set can have multiple keys, one of which is chosen as the primary key. A
primary key value must be unique and cannot be null in the relational table.

Example:
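
A minimal sketch of key constraints, assuming an illustrative student table in which roll_no is the primary key and phone_no is a second key declared UNIQUE. The helper try_insert is hypothetical and exists only for this demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# roll_no is the primary key; phone_no is a second (candidate) key declared UNIQUE.
conn.execute(
    "CREATE TABLE student (roll_no INTEGER NOT NULL PRIMARY KEY, name TEXT, phone_no TEXT UNIQUE)"
)
conn.execute("INSERT INTO student VALUES (14795, 'Ram', '7305758992')")

def try_insert(row):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

same_roll_no_accepted = try_insert((14795, 'Shyam', '9026288936'))   # duplicate primary key
same_phone_no_accepted = try_insert((12839, 'Shyam', '7305758992'))  # duplicate candidate key
distinct_keys_accepted = try_insert((12839, 'Shyam', '9026288936'))  # both keys are new
```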

Relational Model concept

The relational model represents data as a table with columns and rows. Each row is known as a
tuple. Each column of the table has a name, called an attribute.

Domain: It contains a set of atomic values that an attribute can take.

Attribute: It contains the name of a column in a particular table. Each attribute Ai must have a

domain, dom(Ai)

Relational instance: In the relational database system, the relational instance is represented by a

finite set of tuples. Relation instances do not have duplicate tuples.

Relational schema: A relational schema contains the name of the relation and name of all

columns or attributes.

Relational key: One or more attributes whose values identify each row in the relation uniquely.

In the given table, NAME, ROLL_NO, PHONE_NO, ADDRESS, and AGE are the

attributes.

NAME      ROLL_NO   PHONE_NO       ADDRESS      AGE
Ram       14795     7305758992     Noida        24
Shyam     12839     9026288936     Delhi        35
Laxman    33289     8583287182     Gurugram     20
Mahesh    27857     7086819134     Ghaziabad    27
Ganesh    17282     9028 9i3988    Delhi        40

o The instance of schema STUDENT has 5 tuples.

o t3 = <Laxman, 33289, 8583287182, Gurugram, 20>

Properties of Relations

o Name of the relation is distinct from all other relations.

o Each relation cell contains exactly one atomic (single) value

o Each attribute contains a distinct name

o The order of attributes has no significance

o No two tuples in a relation are duplicates

o The order of tuples has no significance

UNIT -II

What is a Database Transaction?

A transaction is a logical unit of processing in a DBMS that entails one or more database
access operations. In a nutshell, database transactions represent real-world events of an
enterprise.

All database access operations held between the begin-transaction and end-transaction
statements are considered a single logical transaction. During the transaction, the database may
be temporarily inconsistent. Only once the transaction is committed does the database move from
one consistent state to another.

In this tutorial, you will learn:

• What is a Database Transaction?

• Facts about Database Transactions

• Why do you need concurrency in Transactions?

• States of Transactions

• What are ACID Properties?

• Types of Transactions

• What is a Schedule?

Facts about Database Transactions

• A transaction is a program unit whose execution may or may not change the contents of a

database.

• The transaction is executed as a single unit

• If the database operations do not update the database but only retrieve data, this type of

transaction is called a read-only transaction.

• A successful transaction can change the database from one CONSISTENT STATE to

another

• DBMS transactions must be atomic, consistent, isolated and durable

• If the database were in an inconsistent state before a transaction, it would remain in the

inconsistent state after the transaction.

Why do you need concurrency in Transactions?

A database is a shared resource that is accessed by many users and processes concurrently.
Examples include banking systems, railway and air reservation systems, stock market
monitoring, and supermarket inventory and checkout systems.

Not managing concurrent access may create issues like:

• Hardware failure and system crashes

• Concurrent execution of the same transaction, deadlock, or slow performance

States of Transactions

The various states of a Database Transaction are listed below

Active State: A transaction enters the active state when its execution begins. During this
state, read or write operations can be performed.

Partially Committed State: A transaction enters the partially committed state after its final
operation has been executed.

Committed State: A transaction reaches the committed state when it has completed its execution
successfully and all of its changes have been recorded in the database permanently.

Failed State: A transaction is considered failed when any one of the checks fails or when it is
aborted while in the active state.

Terminated State: A transaction reaches the terminated state when it leaves the system and
cannot be restarted.

State Transition Diagram for a Database Transaction

Let's study a state transition diagram that highlights how a transaction moves between these

various states.

1. Once a transaction starts execution, it becomes active. It can issue READ or WRITE operations.

2. Once the READ and WRITE operations complete, the transaction enters the partially committed state.

3. Next, recovery protocols need to ensure that a system failure will not result in an inability
to record the transaction's changes permanently. If this check succeeds, the transaction commits
and enters the committed state.

4. If the check fails, the transaction goes to the failed state.

5. If the transaction is aborted while it is in the active state, it goes to the failed state. The
transaction should be rolled back to undo the effect of its write operations on the database.

6. The terminated state refers to the transaction leaving the system.
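
The state transitions listed above can be sketched as a small state machine. This is only an illustration: the names State, TRANSITIONS, Transaction, and move_to are assumptions introduced here, not part of any DBMS.

```python
from enum import Enum

class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    TERMINATED = "terminated"

# Allowed moves, following the numbered steps in the text.
TRANSITIONS = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.COMMITTED: {State.TERMINATED},
    State.FAILED: {State.TERMINATED},
    State.TERMINATED: set(),
}

class Transaction:
    def __init__(self):
        self.state = State.ACTIVE  # a transaction becomes active when it starts
        self.history = [self.state]

    def move_to(self, new_state):
        """Move to new_state, or raise ValueError if the move is not allowed."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition: {self.state.value} -> {new_state.value}")
        self.state = new_state
        self.history.append(new_state)

# A transaction that completes successfully.
ok = Transaction()
for s in (State.PARTIALLY_COMMITTED, State.COMMITTED, State.TERMINATED):
    ok.move_to(s)

# A transaction that is aborted while active.
aborted = Transaction()
aborted.move_to(State.FAILED)
aborted.move_to(State.TERMINATED)
```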

What are ACID Properties?

To maintain the integrity of data, the DBMS has to ensure the ACID properties.
ACID stands for Atomicity, Consistency, Isolation, and Durability.

• Atomicity: A transaction is a single unit of operation. It is either executed entirely or
not executed at all. There cannot be partial execution.

• Consistency: Once the transaction is executed, the database should move from one consistent
state to another.

• Isolation: A transaction should be executed in isolation from other transactions. During
concurrent execution, the intermediate results of simultaneously executing transactions should
not be made available to each other. (Isolation levels 0, 1, 2, 3)

• Durability: After successful completion of a transaction, the changes in the database
should persist, even in the case of system failures.

Example of ACID

Transaction 1: Begin X=X+50, Y=Y-50 END

Transaction 2: Begin X=1.1*X, Y=1.1*Y END

Transaction 1 transfers $50 from account Y to account X.

Transaction 2 credits each account with a 10% interest payment.

If both transactions are submitted together, there is no guarantee that Transaction 1 will
execute before Transaction 2 or vice versa. Irrespective of the order, the result must be as if
the transactions took place serially, one after the other.
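
A minimal sketch of this example in Python. The starting balances (X = 100, Y = 200) are assumptions, and interest is computed with integer arithmetic to avoid floating-point rounding. Either serial order is an acceptable outcome, while the interleaving at the end matches neither of them.

```python
def t1(accounts):
    """Transaction 1: X = X + 50, Y = Y - 50."""
    accounts["X"] += 50
    accounts["Y"] -= 50

def t2(accounts):
    """Transaction 2: 10% interest on both accounts (integer arithmetic)."""
    accounts["X"] = accounts["X"] * 11 // 10
    accounts["Y"] = accounts["Y"] * 11 // 10

def run(*transactions):
    accounts = {"X": 100, "Y": 200}  # assumed starting balances
    for txn in transactions:
        txn(accounts)
    return accounts

t1_then_t2 = run(t1, t2)
t2_then_t1 = run(t2, t1)

# A non-serializable interleaving: T2 runs between the two writes of T1.
bad = {"X": 100, "Y": 200}
bad["X"] += 50   # T1, first write
t2(bad)          # T2 in full
bad["Y"] -= 50   # T1, second write

print(t1_then_t2, t2_then_t1, bad)
```

With these balances both serial orders leave a combined total of 330, whereas the interleaved run ends with 335, because interest was paid on money that was counted in both accounts.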

Types of Transactions

Based on Application areas

• Non-distributed vs. distributed

• Compensating transactions

• Transactions Timing

• On-line vs. batch

Based on Actions

• Two-step

• Restricted

• Action model

Based on Structure

• Flat or simple transactions: It consists of a sequence of primitive operations executed

between a begin and end operations.

• Nested transactions: A transaction that contains other transactions.

• Workflow

What is a Schedule?

A schedule is the process of grouping multiple parallel transactions and executing them in a
single sequence. It should preserve the order in which the instructions appear in each
transaction. If two transactions are executed at the same time, the result of one transaction may
affect the output of the other.

Example

Initial Product Quantity is 10

Transaction 1: Update Product Quantity to 50

Transaction 2: Read Product Quantity

If Transaction 2 is executed before Transaction 1, outdated information about the product

quantity will be read. Hence, schedules are required.

Parallel execution in a database is inevitable. However, parallel execution is permitted only
when there is an equivalence relation among the simultaneously executing transactions. This
equivalence is of 3 types.

RESULT EQUIVALENCE:

If two schedules display the same result after execution, they are called result-equivalent
schedules. They may offer the same result for some values and different results for another set
of values. For example, one transaction updates the product quantity, while the other updates
customer details.

What is Concurrency Control?

Concurrency control is the procedure in a DBMS for managing simultaneous operations without
them conflicting with one another. Concurrent access is easy if all users are only reading data,
since there is no way they can interfere with one another. However, any practical database has a
mix of READ and WRITE operations, and hence concurrency is a challenge.

Concurrency control is used to address such conflicts, which mostly occur in multi-user
systems. It helps ensure that database transactions are performed concurrently without
violating the data integrity of the respective databases.

Therefore, concurrency control is an essential element for the proper functioning of a system
in which two or more database transactions that require access to the same data are executed
simultaneously.

Lock-based Protocols

A lock is a data variable associated with a data item. The lock signifies which operations can
be performed on the data item. Locks help synchronize access to database items by concurrent
transactions.

All lock requests are made to the concurrency-control manager. A transaction proceeds only once
its lock request is granted.

Binary Locks: A binary lock on a data item can be in one of two states, locked or unlocked.

Shared/exclusive: This type of locking mechanism separates locks based on their use. A lock
acquired on a data item to perform a write operation is called an exclusive lock, and a lock
acquired only to read is called a shared lock.

1. Shared Lock (S):

A shared lock is also called a read-only lock. With a shared lock, the data item can be shared
between transactions, because a transaction holding only a shared lock never has permission to
update the data item.

For example, consider a case where two transactions are reading the account balance of a person.
The database lets both of them read by placing a shared lock. However, if another transaction
wants to update that account's balance, the shared lock prevents it until the reading is over.

2. Exclusive Lock (X):

With an exclusive lock, a data item can be read as well as written. The lock is exclusive and
cannot be held concurrently with any other lock on the same data item. An X-lock is requested
using the lock-x instruction. A transaction may unlock the data item after finishing the 'write'
operation.

For example, consider a transaction that needs to update the account balance of a person. The
transaction is allowed to proceed by placing an X lock on the balance. When a second transaction
then wants to read or write the balance, the exclusive lock prevents the operation.
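
The compatibility rules for shared and exclusive locks can be sketched with a toy lock manager. The class LockManager and its methods are hypothetical names introduced for illustration. A real concurrency-control manager would also queue waiting transactions and detect deadlocks, which this sketch does not attempt.

```python
from collections import defaultdict

class LockManager:
    """Toy concurrency-control manager for shared (S) and exclusive (X) locks."""

    def __init__(self):
        # item -> {transaction id: "S" or "X"}
        self.held = defaultdict(dict)

    def request(self, txn, item, mode):
        """Grant the lock and return True, or return False if it conflicts."""
        others = {t: m for t, m in self.held[item].items() if t != txn}
        if mode == "S":
            compatible = all(m == "S" for m in others.values())
        else:  # "X" is compatible with nothing held by another transaction
            compatible = not others
        if compatible:
            # Keep an X lock the transaction already holds instead of downgrading it.
            if self.held[item].get(txn) != "X":
                self.held[item][txn] = mode
        return compatible

    def unlock(self, txn, item):
        self.held[item].pop(txn, None)

lm = LockManager()
r1 = lm.request("T1", "balance", "S")        # granted
r2 = lm.request("T2", "balance", "S")        # granted: S is compatible with S
w3 = lm.request("T3", "balance", "X")        # refused while readers hold S locks
lm.unlock("T1", "balance")
lm.unlock("T2", "balance")
w3_retry = lm.request("T3", "balance", "X")  # granted once the readers are done
r1_again = lm.request("T1", "balance", "S")  # refused: T3 holds the X lock
```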

3. Simplistic Lock Protocol

This type of lock-based protocol requires a transaction to obtain a lock on every object before
beginning an operation on it. A transaction may unlock the data item after finishing the 'write'
operation.

4. Pre-claiming Locking

The pre-claiming lock protocol evaluates a transaction's operations and creates a list of the
data items on which it needs locks before execution begins. If all the locks are granted, the
transaction executes. Once all of its operations are over, it releases all the locks.

Log-Based Recovery

o The log is a sequence of records. The log of each transaction is maintained in stable
storage so that, if a failure occurs, the database can be recovered from it.

o If any operation is performed on the database, it is recorded in the log.

o The log record must be stored before the actual change is applied to the database.

Let's assume there is a transaction to modify the City of a student. The following logs are
written for this transaction.

o When the transaction is initiated, it writes a 'start' log.

<Tn, Start>

o When the transaction modifies the City from 'Noida' to 'Bangalore', another log is
written to the file.

<Tn, City, 'Noida', 'Bangalore'>

o When the transaction is finished, it writes another log to indicate the end of the
transaction.

<Tn, Commit>
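
A minimal sketch of the three log records for this City update, using a Python list in place of the log file and a dictionary in place of the database. The tuple layout of the records and the helper write_item are assumptions made for illustration. The point shown is the ordering: the log record is appended before the change is applied.

```python
log = []                      # stands in for the log file on stable storage
database = {"City": "Noida"}  # stands in for the stored student record

def write_item(txn, item, new_value):
    """Append the log record first, then apply the change (write-ahead rule)."""
    old_value = database[item]
    log.append((txn, item, old_value, new_value))
    database[item] = new_value

log.append(("Tn", "Start"))            # the transaction is initiated
write_item("Tn", "City", "Bangalore")  # the City is modified
log.append(("Tn", "Commit"))           # the transaction is finished

print(log)
```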

There are two approaches to modify the database:

1. Deferred database modification:

o In the deferred modification technique, the transaction does not modify the database
until it has committed.

o In this method, all the logs are created and stored in stable storage, and the database is
updated when the transaction commits.

2. Immediate database modification:

o In the immediate modification technique, the database is modified while the transaction
is still active.

o In this technique, the database is modified immediately after every operation. Each log
record is followed by the actual database modification.

Recovery using Log records

When the system crashes, it consults the log to find which transactions need to be undone and
which need to be redone.

1. If the log contains both the <Ti, Start> record and the <Ti, Commit> record, then
transaction Ti needs to be redone.

2. If the log contains the <Ti, Start> record but neither the <Ti, Commit> nor the <Ti, Abort>
record, then transaction Ti needs to be undone.
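
The two rules above can be sketched as a function that sorts transactions into a redo list and an undo list. The tuple layout of the log records and the name classify are assumptions made for illustration.

```python
def classify(log):
    """Split transactions into a redo list and an undo list from their log records."""
    started, committed, aborted = [], set(), set()
    for record in log:
        txn, kind = record[0], record[1]
        if kind == "Start":
            started.append(txn)
        elif kind == "Commit":
            committed.add(txn)
        elif kind == "Abort":
            aborted.add(txn)
        # Any other record describes a data modification and is ignored here.
    redo = [t for t in started if t in committed]
    undo = [t for t in started if t not in committed and t not in aborted]
    return redo, undo

log = [
    ("T1", "Start"),
    ("T1", "City", "Noida", "Bangalore"),
    ("T1", "Commit"),
    ("T2", "Start"),
    ("T2", "City", "Bangalore", "Delhi"),
    # The crash happens here: T2 has neither a Commit nor an Abort record.
]
redo, undo = classify(log)
print(redo, undo)
```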

Crash Recovery

A DBMS is a highly complex system with hundreds of transactions being executed every second.
The durability and robustness of a DBMS depend on its complex architecture and its
underlying hardware and system software. If it fails or crashes amid transactions, the system
is expected to follow some algorithm or technique to recover lost data.

Failure Classification

To see where the problem has occurred, we generalize a failure into various categories, as

follows −

Transaction failure

A transaction has to abort when it fails to execute or when it reaches a point from where it can’t

go any further. This is called transaction failure where only a few transactions or processes are

hurt.

Reasons for a transaction failure could be −

• Logical errors − Where a transaction cannot complete because it has some code error or

any internal error condition.

• System errors − Where the database system itself terminates an active transaction

because the DBMS is not able to execute it, or it has to stop because of some system

condition. For example, in case of deadlock or resource unavailability, the system aborts

an active transaction.

System Crash

There are problems external to the system that may cause it to stop abruptly and crash. For
example, an interruption in the power supply may cause the underlying hardware or software to
fail.

Examples may also include operating system errors.

Disk Failure

In the early days of technology evolution, it was a common problem that hard-disk drives or
storage drives failed frequently.

Disk failures include the formation of bad sectors, the disk becoming unreachable, a disk head
crash, or any other failure that destroys all or part of the disk storage.

Storage Structure

We have already described the storage system. In brief, the storage structure can be divided into

two categories −

• Volatile storage − As the name suggests, volatile storage cannot survive system
crashes. Volatile storage devices are placed very close to the CPU; normally they are
embedded onto the chipset itself. Main memory and cache memory are examples of volatile
storage. They are fast but can store only a small amount of information.

• Non-volatile storage − These memories are made to survive system crashes. They are
huge in data storage capacity but slower to access. Examples include hard disks, magnetic
tapes, flash memory, and non-volatile (battery backed up) RAM.

Recovery and Atomicity

When a system crashes, it may have several transactions being executed and various files

opened for them to modify the data items. Transactions are made of various operations, which

are atomic in nature. But according to ACID properties of DBMS, atomicity of transactions as a

whole must be maintained, that is, either all the operations are executed or none.

When a DBMS recovers from a crash, it should do the following −

• It should check the states of all the transactions that were being executed.

• A transaction may be in the middle of some operation; the DBMS must ensure the
atomicity of the transaction in this case.

• It should check whether the transaction can be completed now or needs to be rolled
back.

• No transaction should be allowed to leave the DBMS in an inconsistent state.

There are two types of techniques, which can help a DBMS in recovering as well as maintaining

the atomicity of a transaction −

• Maintaining the logs of each transaction, and writing them onto some stable storage

before actually modifying the database.

• Maintaining shadow paging, where the changes are done on a volatile memory, and later,

the actual database is updated.

Log-based Recovery

Log is a sequence of records, which maintains the records of actions performed by a transaction.

It is important that the logs are written prior to the actual modification and stored on a stable

storage media, which is failsafe.

Log-based recovery works as follows −

• The log file is kept on a stable storage media.

• When a transaction enters the system and starts execution, it writes a log about it −

<Tn, Start>

• When the transaction modifies an item X, it writes a log as follows −

<Tn, X, V1, V2>

It reads: Tn has changed the value of X from V1 to V2.

• When the transaction finishes, it logs −

<Tn, commit>

The database can be modified using two approaches −

• Deferred database modification − All logs are written on to the stable storage and the

database is updated when a transaction commits.

• Immediate database modification − Each log record is followed by the actual database
modification. That is, the database is modified immediately after every operation.

Recovery with Concurrent Transactions

When more than one transaction is being executed in parallel, the logs are interleaved. At the
time of recovery, it would be hard for the recovery system to backtrack through all the logs and
then start recovering. To ease this situation, most modern DBMSs use the concept of 'checkpoints'.

Checkpoint

Keeping and maintaining logs in real time and in a real environment may fill up all the storage
space available in the system. As time passes, the log file may grow too big to be handled at all.
A checkpoint is a mechanism where all the previous logs are removed from the system and stored
permanently on a storage disk. A checkpoint declares a point before which the DBMS was in a
consistent state and all the transactions were committed.

Structures Used for Database Recovery

Several structures of an Oracle database safeguard data against possible failures. The following
sections briefly introduce each of these structures and its role in database recovery.

Database Backups

A database backup consists of operating system backups of the physical files that constitute an
Oracle database. To begin database recovery from a media failure, Oracle uses file backups to
restore damaged datafiles or control files.

Oracle offers several options for performing database backups; see Chapter 23, "Database
Backup", for more information.

The Redo Log

The redo log, present for every Oracle database, records all changes made in an Oracle database.
The redo log of a database consists of at least two redo log files that are separate from the
datafiles (which actually store a database's data). As part of database recovery from an instance
or media failure, Oracle applies the appropriate changes in the database's redo log to the
datafiles, which updates database data to the instant that the failure occurred.

A database's redo log can comprise two parts: the online redo log and the archived redo
log, discussed in the following sections.

The Online Redo Log

Every Oracle database has an associated online redo log. The online redo log works with the
Oracle background process LGWR to immediately record all changes made through the associated
instance. The online redo log consists of two or more pre-allocated files that are reused in a
circular fashion to record ongoing database changes; see "The Online Redo Log" for more
information.

The Archived (Offline) Redo Log

Optionally, you can configure an Oracle database to archive files of the online redo log once
they fill. The online redo log files that are archived are uniquely identified and make up the
archived redo log. By archiving filled online redo log files, older redo log information is
preserved for more extensive database recovery operations, while the pre-allocated online redo
log files continue to be reused to store the most current database changes;
see "The Archived Redo Log" page 22-16 for more information.

Rollback Segments

Rollback segments are used for a number of functions in the operation of an Oracle database. In

general, the rollback segments of a database store the old values of data changed by ongoing

transactions (that is, uncommitted transactions). Among other things, the information in a

rollback segment is used during database recovery to "undo" any "uncommitted" changes applied

from the redo log to the datafiles. Therefore, if database recovery is necessary, the data is in a

consistent state after the rollback segments are used to remove all uncommitted data from the

datafiles; see "Rollback Segments" for more information.

Control Files

In general, the control file(s) of a database store the status of the physical structure of the

database. Certain status information in the control file (for example, the current online redo log

file, the names of the datafiles, and so on) guides Oracle during instance or media recovery;

see "Control Files" for more information.

Checkpoint Explanation

o The checkpoint is a mechanism where all the previous logs are removed from the
system and permanently stored on the storage disk.

o The checkpoint is like a bookmark. During the execution of transactions, such
checkpoints are marked, and as each transaction executes its steps, log records are
created.

o When execution reaches a checkpoint, the transactions' changes are written to the database,
and the log file up to that point is removed. The log file is then filled with the steps of
subsequent transactions until the next checkpoint, and so on.

o A checkpoint declares a point before which the DBMS was in a consistent state and all transactions were committed.

Recovery using Checkpoint

A recovery system recovers the database from a failure in the following manner:

o The recovery system reads the log file from the end back to the start, that is, from T4 to T1.

o The recovery system maintains two lists, a redo-list and an undo-list.

o A transaction is put into the redo-list if the recovery system sees the log records <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>. All transactions in the redo-list are redone before their log records are saved.

o For example: in the log file, transactions T2 and T3 have both <Tn, Start> and <Tn, Commit>. Transaction T1 has only <Tn, Commit>, because it started before the checkpoint and committed after the checkpoint was crossed. Hence T1, T2, and T3 are put into the redo-list.

o A transaction is put into the undo-list if the recovery system sees <Tn, Start> but finds no commit or abort record for it. All transactions in the undo-list are undone, and their log records are removed.

o For example: transaction T4 has only <Tn, Start>. T4 is therefore put into the undo-list, because it had not completed when the failure occurred.
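The redo/undo decision described above can be sketched in C. This is only a minimal illustration under stated assumptions, not the recovery code of any particular DBMS: the `LogRecord` layout, the `classify` function, and the `MAX_TXN` limit are hypothetical names introduced for the example, and the log passed in is assumed to contain only the records written after the last checkpoint.

```c
#include <stddef.h>

enum { MAX_TXN = 16 };  /* assumed upper bound on transaction numbers */

typedef enum { LOG_START, LOG_COMMIT, LOG_ABORT } LogKind;
typedef enum { ACTION_NONE, ACTION_REDO, ACTION_UNDO } Action;

/* One log record <Tn, kind>; txn is the number n. */
typedef struct {
    int txn;
    LogKind kind;
} LogRecord;

/* Read the log from the end back to the start and decide, for every
   transaction number, whether it belongs in the redo-list or the undo-list. */
void classify(const LogRecord *log, size_t n, Action out[MAX_TXN])
{
    int started[MAX_TXN] = {0};
    int committed[MAX_TXN] = {0};
    int aborted[MAX_TXN] = {0};

    for (size_t i = n; i-- > 0; ) {          /* backward scan */
        int t = log[i].txn;
        if (t < 0 || t >= MAX_TXN)
            continue;                        /* ignore out-of-range numbers */
        if (log[i].kind == LOG_START)
            started[t] = 1;
        else if (log[i].kind == LOG_COMMIT)
            committed[t] = 1;
        else
            aborted[t] = 1;
    }

    for (int t = 0; t < MAX_TXN; t++) {
        if (committed[t])
            out[t] = ACTION_REDO;            /* Start+Commit, or just Commit */
        else if (started[t] && !aborted[t])
            out[t] = ACTION_UNDO;            /* Start with no Commit or Abort */
        else
            out[t] = ACTION_NONE;
    }
}
```

For the example in the text, a log holding a commit record for T1, start and commit records for T2 and T3, and only a start record for T4 would place T1, T2, and T3 in the redo-list and T4 in the undo-list.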

Media Recovery

If you restore the archived redo log files and data files, then you must perform media recovery

before you can open the database. Any database transactions in the archived redo log files not

reflected in the data files are applied to the data files, bringing them to a transaction-consistent

state before the database is opened.

Media recovery requires a control file, data files (typically restored from backup), and online and

archived redo log files containing changes since the time the data files were backed up. Media

recovery is most often used to recover from media failure, such as the loss of a file or disk, or a

user error, such as the deletion of the contents of a table.

Media recovery can be a complete recovery or a point-in-time recovery. Complete recovery can

apply to individual datafiles, tablespaces, or the entire database. Point-in-time recovery applies to

the whole database (and also sometimes to individual tablespaces, with automation help from

Oracle Recovery Manager (RMAN)).

In a complete recovery, you restore backup data files and apply all changes from the archived

and online redo log files to the data files. The database is returned to its state at the time of

failure and can be opened with no loss of data.

In a point-in-time recovery, you return a database to its contents at a user-selected time in the

past. You restore a backup of data files created before the target time and a complete set of

archived redo log files from backup creation through the target time. Recovery applies changes

between the backup time and the target time to the data files. All changes after the target time are

discarded.

RMAN enables you to perform both a complete and a point-in-time recovery of your database.

However, this documentation focuses on complete recovery.

UNIT 3

OBJECT BASED DATABASE AND XML

What is structure data type?

Structure type

A structured data type is a compound, user-defined data type used for grouping simple data types or other compound data types. It contains a sequence of member variable names along with their types/attributes, enclosed within curly braces.

struct < struct name > {

< type > < member >;

};

Need for Struct data types

There are situations in which we need to group variables of different types together. Consider one such situation: we want to store the name, roll number, and age of a student.

unsigned int student_roll;
char student_name[MAX_STRING];
unsigned int student_age;

These three variables are logically related, but they are still scattered: we access three separate variables to store the attribute values of a single student. Grouping them inside one logical entity, so that the three values for a single student live in one variable, is what a structure does. Note that an array can group only elements of the same type, whereas a structure can group members of different types.

Example of Struct data types

Let's define these again with a structure type:

struct student_t {
    unsigned int roll;
    char name[MAX_STRING];
    unsigned int age;
};

/* The values assigned below are illustrative placeholders. */
struct student_t student1;
student1.roll = 1;
strcpy(student1.name, "Student1");
student1.age = 20;

Structure size and memory layout

A structure is a user-defined type that groups variables of different types, which may be built-in types, other user-defined types, or a mix of both. An individual element of a structure is called a member. Members are placed sequentially in memory, in declaration order. The size of a structure is therefore at least the sum of the sizes of its members, and can be larger because of padding inserted by the compiler.
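The effect of padding can be observed with `sizeof` and `offsetof`. The sketch below is illustrative only: `struct sample` and the two helper functions are hypothetical names, and the exact sizes and offsets depend on the compiler and platform, so none are stated here as fixed values.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record whose members have different sizes. */
struct sample {
    char grade;     /* 1 byte */
    int rollno;     /* usually aligned to a multiple of its own size */
    char section;   /* 1 byte */
};

/* Sum of the member sizes, ignoring any padding. */
size_t sample_member_sum(void)
{
    return sizeof(char) + sizeof(int) + sizeof(char);
}

/* Print the real size of the structure and the offset of each member. */
void sample_print_layout(void)
{
    printf("sum of member sizes:   %zu\n", sample_member_sum());
    printf("sizeof(struct sample): %zu\n", sizeof(struct sample));
    printf("offsets: grade=%zu rollno=%zu section=%zu\n",
           offsetof(struct sample, grade),
           offsetof(struct sample, rollno),
           offsetof(struct sample, section));
}
```

Calling `sample_print_layout()` on a typical platform shows a structure size larger than the sum of the member sizes, because the compiler inserts padding before `rollno` and after `section`. Ordering members from largest to smallest is a common way to reduce such padding.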

Struct data types syntax

struct <struct name> {
    <type> <member 1>;
    <type> <member 2>;
    ...
    <type> <member n>;
};

Structure demo example

/* Structure type demo example program */
#include <stdio.h>

/* Structure type student */
struct student {
    char name[100];
    char dept[100];
    int rollno;
    float marks;
};

/* Structure type main routine */
int main(int argc, char *argv[])
{
    /* declare struct variable */
    struct student s1;

    printf("\nEnter the name, dept, roll number and marks of student:\n");
    /* %99s limits each string to the 100-byte buffers declared above */
    scanf("%99s %99s %d %f", s1.name, s1.dept, &s1.rollno, &s1.marks);
    printf("\nThe name, dept, roll number and marks of the student are:");
    printf("\n%s %s %d %.2f", s1.name, s1.dept, s1.rollno, s1.marks);
    return 0;
}

Program output

Enter the name, dept, roll number and marks of student:

Student1 ECE 1 96.5

The name, dept, roll number and marks of the student are:

Student1 ECE 1 96.50

OPERATIONS ON STRUCTURED DATA

Structured data is data that conforms to a data model, has a well-defined structure, follows a consistent order, and can be easily accessed and used by a person or a computer program.

Structured data is usually stored in well-defined schemas such as databases. It is generally tabular, with columns and rows that clearly define its attributes.

SQL (Structured Query language) is often used to manage structured data stored in databases.

Characteristics of Structured Data:

• Data conforms to a data model and has an easily identifiable structure

• Data is stored in the form of rows and columns (example: a database)

• Data is well organised, so the definition, format, and meaning of the data are explicitly known

• Data resides in fixed fields within a record or file

• Similar entities are grouped together to form relations or classes

• Entities in the same group have the same attributes

• Data is easy to access and query, so it can be easily used by other programs

• Data elements are addressable, so they are efficient to analyse and process

Sources of Structured Data:

• SQL Databases

• Spreadsheets such as Excel

• OLTP Systems

• Online forms

• Sensors such as GPS or RFID tags

• Network and Web server logs

• Medical devices.

Advantages of Structured Data:

• Structured data has a well-defined structure that makes the data easy to store and access

• Data can be indexed based on text strings as well as attributes, which makes search operations straightforward

• Data mining is easy, i.e., knowledge can be easily extracted from the data

• Operations such as updating and deleting are easy because of the well-structured form of the data

• Business intelligence operations such as data warehousing can be easily undertaken

• Storage is easily scalable when the volume of data grows

• Securing the data is easy

ENCAPSULATION AND ADTs

An object-oriented database must support all data types, not just built-in data types such as character, integer, and float. To understand abstract data types, let's take two steps back by removing first "abstract" and then "data" from the term. What remains is a type, which can be defined as a collection of values. A simple example is the integer type, which consists of the values 0, 1, 2, 3, and so on. Adding the word "data" back, a data type is a type together with the set of operations that manipulate it. Continuing the integer example, an integer variable is a member of the integer data type, and addition, subtraction, and multiplication are examples of operations that can be performed on it.

Adding the word "abstract" back, an abstract data type (ADT) is a data type, that is, a type and the set of operations that manipulate it, in which the operations are defined only by their inputs and outputs. The ADT does not specify how the data type is implemented; all of its implementation details are hidden from the user of the ADT. This hiding of details is called encapsulation. If we extend the integer example to an abstract data type that holds integers, the operations might be add an integer, delete an integer, print an integer, and check whether a certain integer exists. Notice that we do not care how an operation is carried out, only how to invoke it.
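The integer ADT described above, with add, delete, print, and existence-check operations, can be sketched in C. This is a minimal illustration under stated assumptions: the `IntSet` type, the `intset_*` function names, and the fixed capacity are hypothetical choices for the example. In a real program the structure definition would sit in a separate source file so that callers see only the function prototypes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

enum { INTSET_CAPACITY = 32 };  /* assumed fixed capacity for the sketch */

/* Representation of the ADT; callers are expected to use only the
   functions below and never touch these fields directly. */
typedef struct {
    int items[INTSET_CAPACITY];
    size_t count;
} IntSet;

void intset_init(IntSet *s)
{
    s->count = 0;
}

/* Check whether a certain integer exists in the set. */
bool intset_contains(const IntSet *s, int value)
{
    for (size_t i = 0; i < s->count; i++)
        if (s->items[i] == value)
            return true;
    return false;
}

/* Add an integer; returns false only when the set is full. */
bool intset_add(IntSet *s, int value)
{
    if (intset_contains(s, value))
        return true;                     /* already present */
    if (s->count == INTSET_CAPACITY)
        return false;
    s->items[s->count++] = value;
    return true;
}

/* Delete an integer; returns false when it was not in the set. */
bool intset_delete(IntSet *s, int value)
{
    for (size_t i = 0; i < s->count; i++) {
        if (s->items[i] == value) {
            s->items[i] = s->items[--s->count];  /* move last item into the gap */
            return true;
        }
    }
    return false;
}

/* Print every integer in the set on one line. */
void intset_print(const IntSet *s)
{
    for (size_t i = 0; i < s->count; i++)
        printf("%d ", s->items[i]);
    printf("\n");
}
```

A caller invokes `intset_add`, `intset_delete`, and `intset_contains` without knowing that the values are kept in an array. The array could be replaced by a linked list or a hash table without changing any calling code, which is the point of encapsulation.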

Let's start by looking at traditional programming languages and the data types they use. Traditional languages are based on text and numerical data types, and you are limited to the data types that the programming language supports. Variables have to be defined using one of the supported data types. Object technology (OT) does away with the restriction to built-in data types and allows you to create new ones. Once these new data types are defined, they are treated the same way as built-in data types. The ability to create new data types when needed and then use them is called data abstraction, and the new data types are called abstract data types (ADTs).

An abstract data type is more than a set of values. When used to create an object, it can also have methods attached to it, and the details of these methods are hidden from the user. Data abstraction and ADTs are a cornerstone of OT because they can be created as needed, which helps you design computer systems that more accurately reflect the way data is represented in the real world.

One of the main reasons given for replacing hierarchical, network, and relational databases is their failure to support ADTs. These traditional databases have very strict rules for the layout of data and are not flexible enough to handle ADTs.

Encapsulation

Encapsulation gathers the data and methods of an object into a package, creating a well-defined boundary around the object. Encapsulation is often referred to as information hiding, and it can be used to restrict which users can operate on the data inside the object and which operations they can perform.

Classes provide encapsulation, or information hiding, through access control. A class grants or denies access to its members using the public and private access specifiers. Public members define the interface between a class and its users, and can be accessed by any function in a program. Objects can contain both public and private variables; the public variables are used with the object's methods or interfaces.

Private variables are known only to the object and cannot be accessed through its interface. For example, a private method might be used to compute an internal value.

Encapsulation can be used in non-database object-oriented applications to guarantee that all operations are done via the methods that the programmer has defined in the class definition, ensuring that data cannot be changed outside of its own predefined methods. However, declarative database languages such as SQL allow "declarative" retrieval and update of data and do not follow the rules of encapsulation. This is called an impedance mismatch, and it is inconsistent with object-oriented database management.

As an example, in a relational database we could define a behavior called ADD_ORDER which

will check to see if there is enough product in inventory for the order. The order object will not

be created if there was not enough product in inventory. This behavior will make sure that no

order is placed for product that is unavailable. However in a relational database, you could use

SQL and bypass this validity check and thereby add an invalid order into the database.
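The ADD_ORDER example can be illustrated with a small C sketch. All names here (`Product`, `Order`, `OrderTable`, `add_order`, `raw_insert`) are hypothetical and chosen only for illustration. `raw_insert` stands in for a direct SQL INSERT that writes a row without going through the behavior.

```c
#include <stdbool.h>

enum { MAX_ORDERS = 8 };  /* assumed table capacity for the sketch */

typedef struct {
    int product_id;
    int in_stock;       /* units currently in inventory */
} Product;

typedef struct {
    int product_id;
    int quantity;
} Order;

typedef struct {
    Order orders[MAX_ORDERS];
    int order_count;
} OrderTable;

/* Unchecked write, analogous to inserting a row directly with SQL. */
bool raw_insert(OrderTable *t, int product_id, int quantity)
{
    if (t->order_count == MAX_ORDERS)
        return false;
    t->orders[t->order_count].product_id = product_id;
    t->orders[t->order_count].quantity = quantity;
    t->order_count++;
    return true;
}

/* The ADD_ORDER behavior: the order is created only when the inventory
   can cover it, and the stock level is reduced accordingly. */
bool add_order(OrderTable *t, Product *p, int quantity)
{
    if (quantity <= 0 || quantity > p->in_stock)
        return false;                   /* not enough product: no order */
    if (!raw_insert(t, p->product_id, quantity))
        return false;
    p->in_stock -= quantity;
    return true;
}
```

With five units in stock, `add_order` accepts an order for three units and rejects a later order for four, while `raw_insert` would accept an order for one hundred units. That second path is the bypass the text warns about.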

INHERITANCE

Inheritance enables you to share attributes between objects such that a subclass inherits attributes

from its parent class. OracleAS TopLink provides several methods to preserve inheritance

relationships, and enables you to override mappings that are specified in a superclass, or to map

attributes that are not mapped in the superclass. Subclasses must include the same database field

(or fields) as the parent class for their primary key (although the primary key can have different

names in these two tables). As a result, when you are mapping relationships to a subclass stored

in a separate table, the subclass table must include the parent table primary key, even if the

subclass primary key differs from the parent primary key.

This section describes OracleAS TopLink inheritance, and introduces several topics and

techniques to leverage inheritance in your own applications, including:

• Understanding Object Inheritance

• Representing Inheritance in the Database

• Class Types

• Class Indicators

• Class Extraction Methods

• Entity Bean Inheritance Restrictions

For more information about implementing inheritance in code, see "Implementing Inheritance in

Java".

Understanding Object Inheritance

Consider a simple database used by a courier company. It contains registration information for

three types of vehicles: trucks, cars, and bicycles. For each vehicle type, your application

requires the following information:

• VID (Vehicle Identification)

• LastMaint (mileage since last maintenance)

• LoadCap (load capacity)

If these are all the attributes shared by all vehicles in the application, then these attributes must

all appear in the super class, Vehicle. You can then build subclasses for each of the vehicle types

that reflects their differences. For example, the Truck class may have an attribute indicating

whether the local department of transportation considers it to be a commercial vehicle

(NumAxles), the Car class may require a NumPass (number of passengers) attribute, and the

Bicycle class, by virtue of its more limited range, may require a Location attribute. Through inheritance, each vehicle automatically inherits the basic vehicle information but, being a separate subclass, also has its own unique characteristics.

Figure 3-6 Inheritance in a Courier Application


Representing Inheritance in the Database

You can represent inheritance in the database in one of two ways:

• Multiple tables that represent the parent class and each child class

• A single table that comprises the parent and all child classes

Figure 3-7 Inheritance in the Database in Individual Tables


If your database already represents the objects in the inheritance hierarchy this way, you can map

the objects and relationships without modifying the tables. However, it is most efficient to

represent all classes from a given inheritance hierarchy in a single table, because it substantially

reduces the number of table reads and eliminates joins when querying on objects in the

hierarchy.

Figure 3-8 Inheritance in the Database in a Single Table


To consolidate tables in the database this way, determine the class type of the objects represented

by the rows in the table. There are two ways to determine class type:

• If you can add columns to the database table, add a class indicator column that represents

the vehicle class type (Truck, Car, or Bicycle).

For more information about class indicators, see "Class Indicators".

• If you cannot modify the table, build a class extraction method that executes appropriate logic to determine the class type.

For more information about class extraction methods, see "Class Extraction Methods".
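The two ways of determining the class type from a single-table row can be sketched in C. This is not TopLink code. The `VehicleRow` layout, the indicator codes 'T', 'C', and 'B', and the convention that an unused column holds -1 or NULL are all assumptions made for this illustration.

```c
#include <stddef.h>

typedef enum { CLASS_UNKNOWN, CLASS_TRUCK, CLASS_CAR, CLASS_BICYCLE } VehicleClass;

/* One row of a single table that holds the parent class and all subclasses. */
typedef struct {
    int vid;                /* shared Vehicle attributes */
    int last_maint;
    int load_cap;
    char class_indicator;   /* assumed codes: 'T', 'C', 'B'; 0 if the column is absent */
    int num_axles;          /* Truck only; -1 stands for NULL */
    int num_pass;           /* Car only; -1 stands for NULL */
    const char *location;   /* Bicycle only; NULL when not set */
} VehicleRow;

/* Approach 1: read the class indicator column. */
VehicleClass class_from_indicator(const VehicleRow *row)
{
    switch (row->class_indicator) {
    case 'T': return CLASS_TRUCK;
    case 'C': return CLASS_CAR;
    case 'B': return CLASS_BICYCLE;
    default:  return CLASS_UNKNOWN;
    }
}

/* Approach 2: a class extraction method that infers the class from
   which subclass-specific columns are populated. */
VehicleClass class_from_columns(const VehicleRow *row)
{
    if (row->num_axles >= 0)
        return CLASS_TRUCK;
    if (row->num_pass >= 0)
        return CLASS_CAR;
    if (row->location != NULL)
        return CLASS_BICYCLE;
    return CLASS_UNKNOWN;
}
```

The indicator column makes the decision a single lookup, while the extraction method works on an unmodified table at the cost of logic that must be kept in step with the subclass columns.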

Class Types

The OracleAS TopLink inheritance hierarchy includes three types of classes:


• Root Class

• Branch Class

• Leaf Class

Figure 3-9 Inheritance Hierarchy Class Types


Root Class

The root class stores information for all instantiable classes in its subclass hierarchy. By default,

queries performed on the root class return instances of the root class and its instantiable

subclasses. However, you can also configure the root class to return only instances of itself,

without instances of its subclasses when queried. All class types beneath the root class inherit

from the root class.

Branch Class

Branch classes have a persistent superclass and subclasses. By default, queries performed on the

branch class return instances of the branch class and any of its subclasses. As with the root class,

you can configure the branch class to return only instances of itself, without instances of its

subclasses when queried. All classes below the branch class inherit attributes from the branch

class, including any attributes the branch class inherits from classes above it in the hierarchy.

Leaf Class

Leaf classes have a persistent superclass in the hierarchy, but do not have subclasses. Queries

performed on the leaf class return only instances of the leaf class.

What is Object?

An object consists of an entity and its attributes, which describe the state of a real-world object, together with the actions associated with that object.

Characteristics of Object


Some important characteristics of an object are:

1. Object name

• The name is used to refer to different objects in the program.

2. Object identifier

• This is the system-generated identifier that is assigned when a new object is created.

3. Structure of object

• The structure defines how the object is constructed using a constructor.

• In an object-oriented database, the state of a complex object can be constructed from other objects by using certain type constructors.

• Formally, an object is represented as a triple (i, c, v), where 'i' is the object identifier, 'c' is the type constructor, and 'v' is the current value of the object.

4. Transient object

• In an object-oriented programming language (OOPL), objects that exist only during program execution are called transient objects.

For example: variables in an OOPL

5. Persistent objects

• An object that continues to exist after the program has finished executing (or has terminated) is called a persistent object. Object-oriented databases can store objects in secondary memory.

Object identity

• Every object has a unique identity. In an object-oriented system, an object identifier (OID) is assigned to an object when it is created.

• In an RDBMS, identity is value based: a primary key provides the uniqueness of each tuple in a relation. The primary key is unique only within that relation, not across the entire system. Because the primary key is chosen from the attributes of the relation, the identity of an object depends on its state.

• In an OODBMS, OIDs are variable names or pointers.

Properties of OID

1. Uniqueness: No two objects in the system can have the same OID; the OID is generated automatically by the system.

2. Invariant: The OID cannot be changed during the object's entire lifetime.

3. Invisible: The OID is not visible to the user.

Attributes

Attributes are nothing but the properties of objects in the system.

Example: Employee can have attribute 'name' and 'address' with assigned values as:

Attribute Value

Name Radha

Address Pune

ID 07

Type of Attributes

The three types of attributes are as follows:

1. Simple attributes

Attributes can be of primitive data type such as, integer, string, real etc. which can take literal

value.

Example: 'ID' is simple attribute and value is 07.

2. Complex attributes

Attributes which consist of collections or reference of other multiple objects are called as

complex attributes.

Example: Collection of Employees consists of many employee names.

3. Reference attributes

Attributes that represent a relationship between objects and consist of value or collection of

values are called as reference attributes.

Example: Manager is reference of staff object.

The rich variety of data types in an ORDBMS offers a database designer many opportunities for a more efficient design. As discussed in previous sections, an ORDBMS supports a number of much better solutions than an RDBMS and other databases.

• An ORDBMS allows a video to be stored as a user-defined abstract data type (ADT) object, with methods that capture any special manipulation a user wishes to perform. Allowing users to define arbitrary new data types is a key feature of ORDBMSs. An ORDBMS also allows users to store and retrieve objects of type jpeg-image, which stores a compressed image representing a single frame of film, just like an object of any other type, such as integer.

Common Implementation Challenges

There are a range of different issues and challenges that need to be addressed for successful

program implementation. Some of these challenges are particularly unique to rural communities.

Common challenges are described below, along with suggestions on how to address these

challenges:

• Resources and sustainability: Funding, technological, and human resources are

typically limited in rural communities. It can be particularly difficult to generate enough

start-up funds to sustain the program as it begins. Having a network of stakeholders and

partners in the community may be beneficial for providing resources and support for a

program.

• Geographic limitations: Geography influences a number of factors that can challenge

program implementation and operations (e.g., isolation and weather). Depending on the

type of program, setting, frequency of participation, and type of activities involved, these

challenges can become significant. This becomes a particularly important issue when

there is limited transportation access for the target population. This requires changes in

approaches and program design that take into account lengthy travel times, availability of

transportation, and opportunity to offer the program remotely or through other

technologies.

• Recruiting staff: Rural communities that are implementing rural health programs that

require physicians, dietitians, or physical therapists for example have faced barriers to

recruiting appropriately trained staff. Some programs work with volunteer or retired

practitioners, or students.

• Hard-to-reach populations: The priority population may be highly mobile. For

example, one rural health program was striving to provide care to two hard-to-reach

populations: Hispanic poultry workers and migrant farm workers. These populations

travel from camp to camp during different times each year, making it challenging to

reach them. Several rural health programs use mobile vans to provide traveling health

services.

• Cultural and social issues: A number of challenges to program success arise out of

unique cultural and social norms that influence expectations about the program and its

likelihood of success. Examples of these types of issues include:

o Deeply rooted traditions and cultures around food

o Lack of trust for medical professionals and outsiders

o Social beliefs around certain behaviors

It is critical for program implementers to make a conscious effort to recognize and

understand the population their program will serve, so they can develop appropriate

strategies. Involving members from the target population throughout the whole process

can help achieve cultural competency, encourage participation, and reduce social stigmas.

Implementers also may need to adapt materials, such as information packets, to ensure all

program materials are culturally appropriate.

• Language: Rural health programs may target communities with a large Hispanic or

immigrant population. Such programs need to ensure that their staff understands the

importance of providing services or public health education in a culturally appropriate

manner. In addition, programs may need to either employ staff proficient in Spanish or

other languages.

• Keeping the community motivated: Regardless of the community and populations

targeted in the program efforts, an awareness of health concerns needs to exist and

individual and organizational commitments are necessary toward making the changes

needed to address those concerns. It’s important for program planners to understand that

success will depend on conducting education and outreach efforts to determine

community members’ expectations about program impact and to motivate them to

achieve better health outcomes.

Difference between RDBMS, ORDBMS and OODBMS

Compare RDBMS with ORDBMS.

1. RDBMS: Relational Database Management Systems
   ORDBMS: Object-Relational Database Systems

2. RDBMS: Based on Relational Data Model
   ORDBMS: Based on Object Data Model (ODM)

3. RDBMS: Dominant model
   ORDBMS: Gaining popularity

4. ORDBMS: An attempt to extend relational database systems to provide a bridge between the relational and object-oriented paradigms.

5. RDBMS: Supports a small, fixed collection of data types (e.g., integers, dates, strings), which has proven adequate for traditional application domains such as administrative data processing.
   ORDBMS: Based on Object-Oriented Database systems and Relational Database systems, and aimed at application domains where complex objects play a central role.

6. RDBMS: Supports Structured Query Language (SQL)
   ORDBMS: Supports Object Query Language (OQL). The SQL:1999 standard extends SQL to incorporate support for the object-relational model of data.

7. RDBMS products:
   • IBM's DB2
   • Informix
   • Oracle
   • Sybase
   • Microsoft's Access
   • Fox Base
   • Paradox
   • Tandem
   • Teradata
   Object-oriented model products:
   • Objectstore
   • Versant
   Object-relational model products: used in DBMS products from
   • IBM
   • Informix
   • Objectstore
   • Oracle
   • Versant
   • Others

8. RDBMS: Supports standard data types and additional data types
   ORDBMS: Supports standard data types and new richer data types. The new richer data types supported are:
   • User-defined data types that support image, voice, and video footage, which must be stored in the database
   • Inheritance of data types, to inherit the commonality between different types (e.g., to inherit some features of image objects while defining compressed image objects and low-resolution image objects)
   • Object identity data types such as references or pointers to objects (e.g., video), giving objects a unique object identity that can be used to refer or point to them from elsewhere in the data

9 Case Scenario : Case Scenario :

9. Compare the similarities and differences between OODBMS and ORDBMS. In particular

compare OQL and SQL : 1999 and discuss the underlying data model.

OODBMS : Object-Oriented Database Management Systems

ORDBMS : Object-Relational Database Management Systems

Similarities

Both support user-defined ADTs, structured types, object identity and reference types, and inheritance.

Both support a query language: OODBMSs support ODL/OQL, and ORDBMSs support an extended form of SQL.

ORDBMSs consciously try to add OODBMS features to an RDBMS, and OODBMSs in turn have developed query languages based on relational query languages.

Both provide DBMS functionality such as concurrency control and recovery.

Differences

1. OODBMS: OODBMSs aim to achieve seamless integration with a programming language such as C++ or Java.
   ORDBMS: Such integration is not an important goal for an ORDBMS.

2. OODBMS: An OODBMS is aimed at applications where an object-centric viewpoint is appropriate.
   ORDBMS: An ORDBMS is optimized for applications in which large data collections are the focus, even though objects may have a rich structure and be fairly large.

3. OODBMS: The query facilities of OQL are not supported efficiently in most OODBMSs.
   ORDBMS: The query facilities are the centerpiece of an ORDBMS.

XML

XML stands for Extensible Markup Language. It is a set of rules that define tags that break a

document into parts and identify the parts of the document. These tags define a syntax that can

then be used in combination with an XSL stylesheet to reconstruct the document.

The tags that are defined must follow the XML rules, but their content and arrangement can be

anything the developer wants. A file of XML text, arranged to represent a certain document, is

called an XML application. Oracle Access Manager OutputXML is an XML application,

designed to create HTML which will in turn present Oracle Access Manager pages to a browser.

Oracle Access Manager also uses XML as a structured way to provide some parameters that

control its operation. This is a different use than for OutputXML, but since the applications are

much shorter and the XML syntax rules are followed here as well, one of these files will serve as

an example. For example, frontpageadminparams.xml has the following content:

This indented presentation, showing the tag levels, is an automatic feature of Microsoft's Internet

Explorer. XML editors will also show the file in this way.

Some important parts of this file are the following:

<?xml version="1.0" ?>

This, the XML declaration, is the first line of any well-formed XML application. Internet Explorer and some editors will not show the file as formatted XML unless this line is present. The starting and ending ? make this an XML processing instruction. version="1.0" is an attribute. Attributes are name-value pairs separated by an equals sign, which provide additional information for the instruction. XML 1.0 is the version in general use.

ParamsCtlg is a tag, which starts the definition of the first element in the XML application. The definition ends with the matching closing tag, which has the same form except that it uses a / before the tag name:

</ParamsCtlg>

Everything between the starting and ending tags defines the element ParamsCtlg. Nested within it is the element CompoundList, which has elements nested within it, and so on. An important attribute is xmlns, which stands for XML namespace. This specifies an owner and possible reference source for this XML application. We identify ourselves as creators of this application.

The technically precise way to write this element would have been

ParamName="top_frame" Value="_top"

However, when the definition is a short one like this, the XML rules allow use of an abbreviated

closing tag. /> indicates the closing tag for the immediately preceding start tag.

The attributes ParamName="top_frame" and Value="_top" provide the useful content of the file,

which is the name of a variable used by Oracle Access Manager and its value.
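The points above (namespace, attributes, and the abbreviated closing tag) can be checked with a short Python sketch. The file content here is an assumption made for illustration: only ParamsCtlg, CompoundList, ParamName="top_frame" and Value="_top" come from the text, while the element name Param and the namespace URI http://example.com/params are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical parameter file: the namespace URI and the element name
# "Param" are placeholders, not the real Oracle Access Manager content.
LONG_FORM = """<?xml version="1.0"?>
<ParamsCtlg xmlns="http://example.com/params">
  <CompoundList>
    <Param ParamName="top_frame" Value="_top"></Param>
  </CompoundList>
</ParamsCtlg>"""

# Same document, written with the abbreviated closing tag "/>".
SHORT_FORM = LONG_FORM.replace("></Param>", "/>")


def read_params(xml_text):
    """Return the root tag and a {ParamName: Value} dict for the document."""
    root = ET.fromstring(xml_text)
    params = {}
    for elem in root.iter():
        if "ParamName" in elem.attrib and "Value" in elem.attrib:
            params[elem.get("ParamName")] = elem.get("Value")
    return root.tag, params


root_tag, params = read_params(LONG_FORM)
print(root_tag)   # the parser reports the tag qualified by its xmlns value
print(params)
# Both spellings of the closing tag describe the same element.
print(read_params(SHORT_FORM) == (root_tag, params))
```

The parser treats the long and the abbreviated closing tag identically, and it attaches the xmlns value to every element name nested inside ParamsCtlg.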

XML Schema is commonly known as XML Schema Definition (XSD). It is used to describe

and validate the structure and the content of XML data. XML schema defines the elements,

attributes and data types. Schema element supports Namespaces. It is similar to a database

schema that describes the data in a database.

Syntax

You need to declare a schema in your XML document as follows −

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

Example

The following example shows how to use schema −

The basic idea behind XML Schemas is that they describe the legitimate format that an XML

document can take.

Elements

As we saw in the XML - Elements chapter, elements are the building blocks of an XML document.

An element can be defined within an XSD as follows −

<xs:element name="x" type="y"/>

Definition Types

You can define XML schema elements in the following ways −

Simple Type

A simple type element can contain only text; it cannot have child elements or attributes. Some of the predefined simple types

are: xs:integer, xs:boolean, xs:string, xs:date. For example −

Complex Type

A complex type is a container for other element definitions. This allows you to specify which

child elements an element can contain and to provide some structure within your XML

documents. For example −

In the above example, the Address element consists of child elements. It is a container for other definitions, which allows you to build a simple hierarchy of elements in the XML document.
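As a rough illustration of a complex type, the sketch below embeds a small schema in a Python string and reads the child declarations out of it. The child names name, company and phone are assumptions chosen to match the Address discussion. Python's standard library cannot validate against XSD, so matches_sequence is only a hand-written order check, not real schema validation.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Assumed schema: an Address element whose complex type holds three children.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="company" type="xs:string"/>
        <xs:element name="phone" type="xs:int"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def child_declarations(schema_text, element_name):
    """List (name, type) pairs declared in the element's xs:sequence."""
    schema = ET.fromstring(schema_text)
    for decl in schema.findall(XS + "element"):
        if decl.get("name") == element_name:
            seq = decl.find(XS + "complexType/" + XS + "sequence")
            return [(c.get("name"), c.get("type")) for c in seq.findall(XS + "element")]
    return []


def matches_sequence(instance_text, declarations):
    """Rough check only: child tags appear exactly in the declared order."""
    instance = ET.fromstring(instance_text)
    return [child.tag for child in instance] == [name for name, _ in declarations]


decls = child_declarations(SCHEMA, "Address")
good = "<Address><name>A</name><company>B</company><phone>1</phone></Address>"
bad = "<Address><phone>1</phone><name>A</name></Address>"
print(decls)
print(matches_sequence(good, decls), matches_sequence(bad, decls))
```

A full validator (for example one from a third-party XSD library) would also check the simple types, such as phone being an integer.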

Global Types

With the global type, you can define a single type in your document, which can be used by all

other references. For example, suppose you want to generalize the person and company for

different addresses of the company. In such case, you can define a general type as follows −

Now let us use this type in our example as follows −

Instead of having to define the name and the company twice (once for Address1 and once

for Address2), we now have a single definition. This makes maintenance simpler, i.e., if you

decide to add "Postcode" elements to the address, you need to add them at just one place.
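The benefit of a global type can be sketched in Python as follows. The schema is a simplified assumption: AddressType is a hypothetical named type, and Address1 and Address2 refer to it directly. The helper resolves each element to the children of its type, so a change made once in AddressType shows up in both elements.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Assumed schema: one named (global) type shared by two elements.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="company" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="Address1" type="AddressType"/>
  <xs:element name="Address2" type="AddressType"/>
</xs:schema>"""


def resolve_children(schema_text):
    """Map each top-level element to the child names of its named type."""
    schema = ET.fromstring(schema_text)
    named_types = {
        t.get("name"): [c.get("name") for c in t.iter(XS + "element")]
        for t in schema.findall(XS + "complexType")
    }
    return {
        e.get("name"): named_types.get(e.get("type"), [])
        for e in schema.findall(XS + "element")
    }


before = resolve_children(SCHEMA)

# Add Postcode in exactly one place: inside the shared type.
with_postcode = SCHEMA.replace(
    '<xs:element name="company" type="xs:string"/>',
    '<xs:element name="company" type="xs:string"/>'
    '<xs:element name="Postcode" type="xs:string"/>',
)
after = resolve_children(with_postcode)
print(before)
print(after)
```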

Querying and Transformation

Given the increasing number of applications that use XML to exchange, mediate, and store data,

tools for effective management of XML data are becoming increasingly important. In particular,

tools for querying and transformation of XML data are essential to extract information from

large bodies of XML data, and to convert data between different representations (schemas) in

XML. Just as the output of a relational query is a relation, the output of an XML query can be an

XML document. As a result, querying and transformation can be combined into a single tool.

Several languages provide increasing degrees of querying and transformation capabilities:

• XPath is a language for path expressions, and is actually a building block for the remaining two

query languages.

• XSLT was designed to be a transformation language, as part of the XSL style sheet system,

which is used to control the formatting of XML data into HTML or other print or display

languages. Although designed for formatting, XSLT can generate XML as output, and can

express many interesting queries. Furthermore, it is currently the most widely available language

for manipulating XML data.

• XQuery has been proposed as a standard for querying of XML data. XQuery combines features

from many of the earlier proposals for querying XML, in particular the language Quilt.

A tree model of XML data is used in all these languages. An XML document is modeled as

a tree, with nodes corresponding to elements and attributes. Element nodes can have children

nodes, which can be subelements or attributes of the element. Correspondingly, each node

(whether attribute or element), other than the root element, has a parent node, which is an

element. The order of elements and attributes in the XML document is modeled by the ordering

of children of nodes of the tree. The terms parent, child, ancestor, descendant, and siblings are

interpreted in the tree model of XML data.

The text content of an element can be modeled as a text node child of the element. Elements

containing text broken up by intervening subelements can have multiple text node children. For

instance, an element containing “this is a <bold>wonderful</bold> book” would have a

subelement child corresponding to the element bold and two text node children corresponding to

“this is a” and “book”. Since such structures are not commonly used in database data, we shall

assume that elements do not contain both text and subelements.
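The tree model, including the mixed-content case just described, can be inspected with Python's xml.dom.minidom. This is a minimal sketch; the outer element name title is an assumption, since the text does not name the enclosing element.

```python
from xml.dom import minidom

# "title" is a hypothetical outer element; "bold" comes from the text.
doc = minidom.parseString("<title>this is a <bold>wonderful</bold> book</title>")
root = doc.documentElement


def describe(node):
    """Label a DOM node as a text node or an element node."""
    if node.nodeType == node.TEXT_NODE:
        return ("text", node.data)
    return ("element", node.tagName)


# Children appear in document order: text, subelement, text.
children = [describe(child) for child in root.childNodes]
print(children)

# Every node other than the root has a parent element.
bold = root.getElementsByTagName("bold")[0]
print(bold.parentNode.tagName)
```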

XPath addresses parts of an XML document by means of path expressions. The language can

be viewed as an extension of the simple path expressions in object-oriented and object-relational

databases (See Section 9.5.1).

A path expression in XPath is a sequence of location steps separated by “/” (instead of the “.”

operator that separates steps in SQL:1999). The result of a path expression is a set of values.

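For instance, on the document in Figure 10.8, the XPath expression /bank-2/customer/name would return the name elements of all customers, while /bank-2/customer/name/text() would return the same names, but without the enclosing tags.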

Like a directory hierarchy, the initial ’/’ indicates the root of the document. (Note that this is an

abstract root “above” <bank-2>, which is the document tag.) Path expressions are evaluated from

left to right. As a path expression is evaluated, the result of the path at any point consists of a set

of nodes from the document.

When an element name, such as customer, appears before the next ’/’, it refers to all elements of

the specified name that are children of elements in the current element set. Since multiple

children can have the same name, the number of nodes in the node set can increase or decrease

with each step. Attribute values may also be accessed, using the “@” symbol. For instance,

/bank-2/account/@account-number returns a set of all values of account-number attributes of

account elements. By default, IDREF links are not followed; we shall see how to deal with

IDREFs later.
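A small Python sketch of these path steps is given below. Two assumptions apply: the bank-2 document is invented sample data (Figure 10.8 is not reproduced here), and Python's xml.etree.ElementTree implements only a subset of XPath, so text() and @attribute steps are emulated with ordinary Python after the element steps have been evaluated.

```python
import xml.etree.ElementTree as ET

# Invented sample data standing in for the bank-2 document.
BANK = """<bank-2>
  <account account-number="A-401">
    <branch-name>Downtown</branch-name>
    <balance>500</balance>
  </account>
  <account account-number="A-402">
    <branch-name>Perryridge</branch-name>
    <balance>900</balance>
  </account>
  <customer>
    <name>Joe</name>
  </customer>
  <customer>
    <name>Lisa</name>
  </customer>
</bank-2>"""

root = ET.fromstring(BANK)   # root is the <bank-2> element itself

# /bank-2/customer/name : child steps, evaluated left to right.
name_elements = root.findall("./customer/name")

# /bank-2/customer/name/text() : same nodes, without the enclosing tags.
names = [elem.text for elem in name_elements]

# /bank-2/account/@account-number : attribute values of account elements.
account_numbers = [acc.get("account-number") for acc in root.findall("./account")]

print([elem.tag for elem in name_elements])
print(names)
print(account_numbers)
```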

XPath supports a number of other features:

• Selection predicates may follow any step in a path, and are contained in square brackets. For example, /bank-2/account[balance > 400] returns account elements with a balance value greater than 400.
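Selection predicates can be tried out in Python as sketched here. The sample data is invented. ElementTree accepts the predicate account[balance] (accounts that have a balance subelement) directly, but it does not support numeric comparisons such as [balance > 400], so that part is emulated with a Python filter.

```python
import xml.etree.ElementTree as ET

# Invented sample data; the third account has no balance subelement.
BANK = """<bank-2>
  <account account-number="A-401"><balance>500</balance></account>
  <account account-number="A-402"><balance>300</balance></account>
  <account account-number="A-403"/>
</bank-2>"""

root = ET.fromstring(BANK)

# account[balance] : keep only accounts containing a balance subelement.
with_balance = root.findall("./account[balance]")

# account[balance > 400] : emulated, because ElementTree's XPath subset
# has no numeric comparison operators.
over_400 = [acc for acc in with_balance if float(acc.findtext("balance")) > 400]

print([acc.get("account-number") for acc in with_balance])
print([acc.get("account-number") for acc in over_400])
```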

The Application Programming Interface

An Application Programming Interface (API) contains software building tools, subroutine

definitions as well as communication protocols that facilitate interaction between systems. An

API may be for a database system, operating system, computer hardware or a web-based system.

An Application Programming Interface makes it simpler for programmers to use certain technologies to build applications. An API can include specifications for data structures, variables, routines, object classes, remote calls etc.

A diagram that shows the API in the system is as follows −

Uses of Application Programming Interfaces

APIs are useful in many scenarios. Some of these are given in detail as follows −

Operating Systems

The interface between an operating system and an application is specified with an API. For example, POSIX specifies common APIs so that an application written for one POSIX-conformant operating system can be compiled and run on another POSIX-conformant operating system.

Libraries and Frameworks

Often APIs are related to software libraries. The API describes the behaviour of the system while the libraries actually implement that behaviour. A single API can have multiple libraries, as it can have many different implementations. Sometimes, an API can be linked to a software framework as well. A framework is based on many libraries that implement different APIs whose behaviour is built into the framework.
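The distinction between an API and the libraries that implement it can be sketched in Python. All names here (KeyValueStore, DictStore, ListStore, remember_user) are hypothetical and exist only for this illustration: the Protocol plays the role of the API, and the two classes play the role of two independent libraries.

```python
from typing import Dict, List, Optional, Protocol, Tuple


class KeyValueStore(Protocol):
    """The API: it states the behaviour that callers may rely on."""

    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> Optional[str]: ...


class DictStore:
    """One implementation of the API, backed by a dictionary."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class ListStore:
    """A second implementation of the same API, backed by a list of pairs."""

    def __init__(self) -> None:
        self._pairs: List[Tuple[str, str]] = []

    def put(self, key: str, value: str) -> None:
        self._pairs = [(k, v) for k, v in self._pairs if k != key]
        self._pairs.append((key, value))

    def get(self, key: str) -> Optional[str]:
        for k, v in self._pairs:
            if k == key:
                return v
        return None


def remember_user(store: KeyValueStore) -> Optional[str]:
    """Application code depends on the API only, not on a specific library."""
    store.put("user", "alice")
    return store.get("user")


results = [remember_user(DictStore()), remember_user(ListStore())]
print(results)
```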

Web APIs

The application programming interfaces for web servers or web browsers are known as web APIs. These web APIs can be server side or client side.

Server side web APIs have an interface that contains endpoints which lead to request-response message systems that are written in JSON or XML. Most of this is achieved using an HTTP web server. Client side web APIs are used to extend the functionality of a web browser. Earlier they were in the form of plug-in browser extensions but now JavaScript bindings are used.

Remote APIs

The remote application programming interfaces allow the programmers to manipulate remote

resources. Most remote APIs are required to maintain object abstraction in object oriented

programming. This can be done by executing a method call locally which then invokes the

corresponding method call on a remote object and gets the result locally as a return value.
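The idea of a local method call that invokes a method on a remote object can be sketched as below. This is a simplified, in-process illustration under stated assumptions: the classes RemoteCounter and Proxy and the JSON message format are hypothetical, and a plain function call stands in for the network so that the example runs without sockets.

```python
import json


class RemoteCounter:
    """Stands in for an object that lives on another machine."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount
        return self.value


def handle_request(target, message):
    """'Server' side: decode the request, call the real method, encode the result."""
    request = json.loads(message)
    result = getattr(target, request["method"])(*request["args"])
    return json.dumps({"result": result})


class Proxy:
    """'Client' side: looks like the remote object but only forwards calls."""

    def __init__(self, transport):
        self._transport = transport

    def __getattr__(self, method_name):
        # Called only for attributes the proxy itself does not define.
        def call(*args):
            message = json.dumps({"method": method_name, "args": list(args)})
            reply = self._transport(message)
            return json.loads(reply)["result"]
        return call


server_object = RemoteCounter()
# The lambda plays the role of the network link between client and server.
counter = Proxy(lambda message: handle_request(server_object, message))

first = counter.add(5)    # a local call; the work happens on server_object
second = counter.add(2)
print(first, second, server_object.value)
```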

Release policies for API

The policies for releasing APIs are private, partner and public. Details about these are given as

follows −

Private release policies

The application programming interfaces released under this policy are for private internal use by

the company.

Partner release policies

The application programming interfaces released under this policy can be used by the company

and its specific business partners. This means that the companies can control the quality of the

API, by monitoring the apps which have access to it.

Public release policies

The application programming interfaces released under public release policies are freely

available to the public. Some examples of this are the Microsoft Windows API and Apple’s Cocoa and Carbon APIs.

Storage of XML data

• Character
• Relational (shredded)
• Native XML

Character

Storage options:
◼ Large character fields in DBMS
◼ Flat files
◼ .xml files

• Fast insert & retrieval
• Poor search

Relational

• Data still stored as character
• Portions of the data extracted into additional relational tables
• Increased parse time
• Increased search capabilities

Native XML

Exclusive XML DBMS
◼ Sedna
◼ Timber

Integrated XML DBMS
◼ DB2
◼ Oracle

Native XML Benefits

• XML messages stored in their original format
• Documents can be transformed straight from the database via XPath or XSLT.
• Increased search capabilities for documents that must be stored as XML.
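The trade-off between character storage and relational (shredded) storage can be sketched with Python's built-in sqlite3 module. The table names, column names and order documents below are assumptions made up for this illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented sample documents.
ORDERS = [
    '<order id="1"><customer>Asha</customer><total>250</total></order>',
    '<order id="2"><customer>Ravi</customer><total>900</total></order>',
]

db = sqlite3.connect(":memory:")
# Character storage: the whole document goes into one text column.
db.execute("CREATE TABLE order_doc (doc_id INTEGER PRIMARY KEY, xml TEXT)")
# Shredded storage: selected values are copied into ordinary columns.
db.execute("CREATE TABLE order_index (doc_id INTEGER, customer TEXT, total REAL)")

for text in ORDERS:
    doc_id = db.execute("INSERT INTO order_doc (xml) VALUES (?)", (text,)).lastrowid
    order = ET.fromstring(text)   # the extra parse step paid at insert time
    db.execute(
        "INSERT INTO order_index VALUES (?, ?, ?)",
        (doc_id, order.findtext("customer"), float(order.findtext("total"))),
    )

# Character-only search: limited to substring matching on the raw text.
by_text = db.execute(
    "SELECT doc_id FROM order_doc WHERE xml LIKE ?", ("%Ravi%",)
).fetchall()

# Shredded search: a normal relational predicate, then fetch the document.
big_orders = db.execute(
    "SELECT d.xml FROM order_index i JOIN order_doc d ON d.doc_id = i.doc_id "
    "WHERE i.total > ?",
    (500,),
).fetchall()

print(by_text)
print(big_orders)
```

The document itself is still stored as character data; only the extracted columns make the numeric search possible, at the cost of parsing each document on insert.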

XML Applications

We've seen a lot of theory in this chapter, so I'm going to spend the rest of this chapter taking a

look at how XML is used today in the real world. The world of XML is huge these days; in fact,

XML is now used internally even in Netscape and Microsoft products, as well as installations of

programming languages such as Perl. You can find a good list of organizations that produce their

own XML-based languages.

It's useful and encouraging to see how XML is being used today in these XML-based languages.

Here's a new piece of terminology: As you know, XML is a metamarkup language, so it's

actually used to create languages. The languages so created are applications of XML; as a result,

they're called XML applications.

Note that the term XML application means an application of XML to a specific domain, such as

MathML, the mathematics markup language; it does not refer to a program that uses XML (a fact

that causes a lot of confusion among people who know nothing about XML).

Thousands of XML applications are around today, and we'll see some of them here. You can see

the advantage to various groups when defining their own markup languages. For example,

physicists or chemists can use the symbols and graphics of their discipline in customized

browsers. In fact, I'll start with Chemical Markup Language (CML).

• Root element is <quiz>.

• <quiz> contains a <title> and an <items> element.

• <items> contains one or more <item> elements.

• Each item contains one <question>, at least 2 <answer>s, and no more than 6 <answer>s.

• One of the <answer>s must have the attribute correct="y", which indicates the correct answer.

• <question> might appear before or after all the <answer>s.

The Example Quiz

In which continent is the country Japan located?

Asia

Europe

Africa

America

Tuna

Cow

Whale

Lobster

Which one cannot swim?

How many points are on a hexagon?

5

6

7

8

• A DTD declaration for that XML spec:

<!ELEMENT item ((question, answer, answer+) | (answer, answer+, question))>

XML with DTD

Problems:

• Hard to limit the number of <answer>s to 6 maximum. One way will be to declare it like:

<!ELEMENT item (question, ((answer, answer)|
(answer, answer, answer)|
(answer, answer, answer, answer)|
(answer, answer, answer, answer, answer)|
(answer, answer, answer, answer, answer, answer)))>

but do you want to?

Even with that, you still have to handle the requirement where <question> might appear after the <answer>s, so it will be:

<!ELEMENT item ((question, ((answer, answer)|
(answer, answer, answer)|
(answer, answer, answer, answer)|
(answer, answer, answer, answer, answer)|
(answer, answer, answer, answer, answer, answer)))|
(((answer, answer)|
(answer, answer, answer)|
(answer, answer, answer, answer)|
(answer, answer, answer, answer, answer)|
(answer, answer, answer, answer, answer, answer)), question))>

• The DTD does not limit the number of <answer>s that have the "correct" attribute. So there might be an <item> with 2 or more correct answers. Can this be solved without changing the structure of the XML? Probably not.

• Create a Schema declaration for that XML spec:

Attempt 1:

Problems:

• <question> must appear before <answer>s in this schema because it's declared in a <sequence>.

• The attribute "correct" can be assigned any values, while we only want to

accept "y".

• This schema allows more than 1 correct <answer>. Again, it might not be possible to

create a schema which prevents this.

Attempt 2:

This schema is much cleaner than before. We used a lot of unnamed types because we don't need to reuse the types. "correctType" is a type that contains only 1 valid value, which is "y". The <question> element is declared in a way that allows it to appear before or after the <answer>s.

Problems:

• This schema allows more than 1 correct <answer>.
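Rules that neither the DTD nor the schema can express, such as "exactly one correct answer", can still be checked by a program. The sketch below is one possible check in Python, not part of the original tutorial. The element names quiz, title, items and item, and the choice of which answers carry correct="y", are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Assumed instance document; element names other than question and
# answer, and the placement of correct="y", are illustrative guesses.
QUIZ = """<quiz>
  <title>The Example Quiz</title>
  <items>
    <item>
      <question>In which continent is the country Japan located?</question>
      <answer correct="y">Asia</answer>
      <answer>Europe</answer>
      <answer>Africa</answer>
      <answer>America</answer>
    </item>
    <item>
      <answer>Tuna</answer>
      <answer correct="y">Cow</answer>
      <answer>Whale</answer>
      <answer>Lobster</answer>
      <question>Which one cannot swim?</question>
    </item>
  </items>
</quiz>"""


def item_problems(item):
    """Return a list of rule violations for one quiz item."""
    tags = [child.tag for child in item]
    answers = item.findall("answer")
    problems = []
    if tags.count("question") != 1:
        problems.append("needs exactly one question")
    elif tags[0] != "question" and tags[-1] != "question":
        problems.append("question must come before or after all answers")
    if not 2 <= len(answers) <= 6:
        problems.append("needs 2 to 6 answers")
    if sum(1 for a in answers if a.get("correct") == "y") != 1:
        problems.append("needs exactly one correct answer")
    return problems


def check_quiz(xml_text):
    """Check every item in the quiz; an empty list means the item is fine."""
    root = ET.fromstring(xml_text)
    return [item_problems(item) for item in root.iter("item")]


valid_report = check_quiz(QUIZ)

# Mark a second answer as correct in the first item to trigger the rule.
broken = QUIZ.replace("<answer>Europe</answer>", '<answer correct="y">Europe</answer>')
broken_report = check_quiz(broken)
print(valid_report)
print(broken_report)
```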

UNIT-5

NOSQL

INTRODUCTION TO NOSQL:

A NoSQL (originally referring to "non SQL" or "non relational") database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Such databases have existed since the late 1960s, but did not obtain the NoSQL moniker until a surge of popularity in the early twenty-first century. NoSQL databases are used in real-time web applications and big data, and their use is increasing over time. NoSQL systems are also sometimes called "Not only SQL" to emphasize the fact that they may support SQL-like query languages.

Motivations for using a NoSQL database include simplicity of design, simpler horizontal scaling to clusters of machines, and finer control over availability. The data structures used by NoSQL databases are different from those used by default in relational databases, which makes some operations faster in NoSQL. The suitability of a given NoSQL database depends on the problem it should solve. Data structures used by NoSQL databases are sometimes also viewed as more flexible than relational database tables.

Many NoSQL stores compromise consistency in favor of availability, speed and partition tolerance. Barriers to the greater adoption of NoSQL stores include the use of low-level query languages, lack of standardized interfaces, and huge previous investments in existing relational databases. Most NoSQL stores lack true ACID (Atomicity, Consistency, Isolation, Durability) transactions, but a few databases, such as MarkLogic, Aerospike, FairCom c-treeACE, Google Spanner (though technically a NewSQL database), Symas LMDB, and OrientDB, have made them central to their designs.

Most NoSQL databases offer a concept of eventual consistency, in which database changes are propagated to all nodes eventually, so queries for data might not return updated data immediately or might return data that is no longer accurate, a problem known as stale reads. Also, some NoSQL systems may exhibit lost writes and other forms of data loss. Some NoSQL systems provide concepts such as write-ahead logging to avoid data loss. For distributed transaction processing across multiple databases, data consistency is an even bigger challenge. This is difficult for both NoSQL and relational databases. Even current relational databases do not allow referential integrity

constraints to span databases. There are few systems that maintain bo