Open source DATABASE
MANAGEMENT SYSTEM
-
UNIT- I
DATABASE MANAGEMENT SYSTEM INTRODUCTION
What is DBMS?
A database management system (DBMS) is the technology for creating and managing
databases. A DBMS is a software tool to organize (create, retrieve, update, and manage) data in a
database. Knowledge refers to the useful application of information. Information can
be transported, stored, and shared without difficulty, but the same cannot be
said about knowledge, which necessarily involves personal experience and
practice. Database systems are meant to handle large collections of information.
Management of data involves both defining structures for storage of information and providing
mechanisms for manipulating the stored information. Moreover, the database
system must ensure the safety of the information stored, despite system crashes or attempts at
unauthorized access.
USES OF DBMS
• To develop software applications in less time.
• Data independence and efficient use of data.
• For uniform data administration.
• For data integrity and security.
• For concurrent access to data, and data recovery from crashes.
• To use a user-friendly declarative query language.
Where is a Database Management System (DBMS) being Used?
• Airlines: reservations, schedules, etc
• Telecom: calls made, customer details, network usage, etc
• Universities: registration, results, grades, etc
• Sales: products, purchases, customers, etc
• Banking: all transactions etc
-
Advantages of DBMS
A DBMS manages data and has many benefits. These are:
• Data independence: Application programs should be as independent as possible from the
details of data representation and storage. A DBMS can supply an abstract view of the data that
insulates application code from such details.
• Efficient data access: A DBMS uses a mixture of sophisticated techniques for
storing and retrieving data efficiently. This feature becomes important in cases where the data
is stored on external storage devices.
• Data integrity and security: If data is accessed through the DBMS, the DBMS can enforce
integrity constraints on the data.
• Data administration: When several users share the data, centralizing the administration of data
can offer significant improvements. Experienced professionals who understand the nature of the data
being managed can be responsible for organizing the data representation to reduce
redundancy and make retrieval efficient.
Components of DBMS
• Users: Users may be of any kind, such as database administrators, system developers, or
end users.
• Database application: A database application may be personal, departmental, enterprise-wide,
or internal.
• DBMS: Software that allows users to create, access, and manipulate databases.
• Database: A collection of logically related data treated as a single unit.
-
Data Models
A data model describes the data, the data semantics, and the consistency
constraints of the data. It provides the conceptual tools for describing the design of a database at
each level of data abstraction. The following four data models are used for
understanding the structure of a database:
1) Relational Data Model: This type of model organizes the data in the form of rows and columns
within a table. Thus, a relational model uses tables for representing data and the relationships
between them. Tables are also called relations. This model was initially described by Edgar F.
Codd in 1969. The relational data model is the most widely used model and is used primarily by
commercial data processing applications.
2) Entity-Relationship Data Model: An ER model is the logical representation of data as
objects and relationships among them. These objects are known as entities, and a relationship is an
association among these entities. This model was designed by Peter Chen and published in a 1976
paper. It is widely used in database design. A set of attributes describes each entity. For
example, student_name and student_id describe the 'student' entity. A set of entities of the same
type is known as an 'entity set', and a set of relationships of the same type is known as a
'relationship set'.
3) Object-based Data Model: An extension of the ER model with notions of functions,
encapsulation, and object identity. This model supports a rich type system that includes
structured and collection types. In the 1980s, various database systems following the
object-oriented approach were developed. Here, an object is the data together with its
properties.
-
4) Semistructured Data Model: This type of data model is different from the other three data
models explained above. The semistructured data model allows data specifications in which
individual data items of the same type may have different sets of attributes. The
Extensible Markup Language, also known as XML, is widely used for representing
semistructured data. Although XML was initially designed for adding markup information
to text documents, it gained importance because of its application in the exchange of data.
INTRODUCTION TO SQL
SQL
o SQL stands for Structured Query Language. It is used for storing and managing data in a
relational database management system (RDBMS).
o It is a standard language for Relational Database System. It enables a user to create, read,
update and delete relational databases and tables.
o All the RDBMS like MySQL, Informix, Oracle, MS Access and SQL Server use SQL as
their standard database language.
o SQL allows users to query the database in a number of ways, using English-like
statements.
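The create, read, update, and delete operations mentioned above can be sketched in a few lines. This is a minimal illustration, not a recommended setup: it uses Python's standard-library sqlite3 module only because it needs no server, and the table name, column names, and rows are assumptions made up for the example.

```python
import sqlite3

# In-memory database, so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create: define a table, then insert two rows (illustrative data).
cur.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.executemany(
    "INSERT INTO student (roll_no, name, city) VALUES (?, ?, ?)",
    [(14795, "Ram", "Noida"), (12839, "Shyam", "Delhi")],
)

# Update one row and delete another.
cur.execute("UPDATE student SET city = ? WHERE roll_no = ?", ("Bangalore", 14795))
cur.execute("DELETE FROM student WHERE roll_no = ?", (12839,))
conn.commit()

# Read: query what is left in the table.
rows = cur.execute(
    "SELECT roll_no, name, city FROM student ORDER BY roll_no"
).fetchall()
print(rows)
```

Each statement is a plain English-like sentence handed to the database; the `?` placeholders let the driver pass values separately from the SQL text.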
Rules:
SQL follows the following rules:
o Structured Query Language is not case sensitive. Generally, SQL keywords are written in
uppercase.
o SQL statements do not depend on text lines. A single SQL statement can be written on
one or multiple text lines.
o Using SQL statements, you can perform most of the actions in a database.
o SQL depends on tuple relational calculus and relational algebra.
SQL process:
o When an SQL command is executed on any RDBMS, the system figures out the
best way to carry out the request, and the SQL engine determines how to interpret the
task.
o Various components are involved in this process. These components can be the optimization
engine, query engine, query dispatcher, classic query engine, etc.
o All non-SQL queries are handled by the classic query engine, but the SQL query engine
won't handle logical files.
-
Characteristics of SQL
o SQL is easy to learn.
o SQL is used to access data from relational database management systems.
o SQL can execute queries against the database.
o SQL is used to describe the data.
o SQL is used to define the data in the database and manipulate it when needed.
o SQL is used to create and drop the database and table.
o SQL is used to create a view, stored procedure, function in a database.
o SQL allows users to set permissions on tables, procedures, and views.
-
Integrity Constraints
o Integrity constraints are a set of rules. They are used to maintain the quality of information.
o Integrity constraints ensure that data insertion, updating, and other processes are
performed in such a way that data integrity is not affected.
o Thus, integrity constraints guard against accidental damage to the database.
Types of Integrity Constraint
1. Domain constraints
o A domain constraint defines the valid set of values for an
attribute.
o The data type of a domain may be string, character, integer, time, date, currency, etc. The
value of the attribute must belong to the corresponding domain.
2. Entity integrity constraints
o The entity integrity constraint states that a primary key value can't be null.
o This is because the primary key value is used to identify individual rows in a relation, and if
the primary key has a null value, then we can't identify those rows.
o Fields other than the primary key may contain null values.
3. Referential Integrity Constraints
o A referential integrity constraint is specified between two tables.
o In the Referential integrity constraints, if a foreign key in Table 1 refers to the Primary
Key of Table 2, then every value of the Foreign Key in Table 1 must be null or be
available in Table 2.
4. Key constraints
o A key is a set of attributes used to identify an entity uniquely within its entity set.
o An entity set can have multiple keys, out of which one key will be the primary key. A
primary key value must be unique and cannot be null in the relational table.
Example:
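The four constraint types above can be tried out with a small sketch. It assumes Python's standard-library sqlite3 module and made-up table names, columns, and rows; other database systems declare the same constraints with similar but not identical syntax. The `rejected` helper is hypothetical and exists only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only after this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE student (
    -- Key + entity integrity: unique and never NULL. A TEXT key is used
    -- because SQLite auto-fills a NULL INTEGER PRIMARY KEY.
    roll_no TEXT NOT NULL PRIMARY KEY,
    name    TEXT NOT NULL,
    -- Domain constraint: only ages in an assumed valid range.
    age     INTEGER CHECK (age BETWEEN 16 AND 60),
    -- Referential integrity: NULL, or an existing department.
    dept_id INTEGER REFERENCES department(dept_id)
);
INSERT INTO department VALUES (1, 'Physics');
INSERT INTO student VALUES ('14795', 'Ram', 24, 1);
""")


def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


domain_violation = rejected("INSERT INTO student VALUES ('12839', 'Shyam', 135, 1)")
entity_violation = rejected("INSERT INTO student VALUES (NULL, 'Laxman', 20, 1)")
reference_violation = rejected("INSERT INTO student VALUES ('27857', 'Mahesh', 27, 99)")
key_violation = rejected("INSERT INTO student VALUES ('14795', 'Ganesh', 40, 1)")
null_reference_ok = not rejected("INSERT INTO student VALUES ('17282', 'Ganesh', 40, NULL)")
```

Each of the first four inserts breaks exactly one rule and is refused; the last one shows that a foreign key may be null.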
-
Relational Model concept
The relational model can be represented as a table with columns and rows. Each row is known as a tuple.
Each column of the table has a name, or attribute.
Domain: It contains a set of atomic values that an attribute can take.
Attribute: It contains the name of a column in a particular table. Each attribute Ai must have a
domain, dom(Ai)
Relational instance: In the relational database system, the relational instance is represented by a
finite set of tuples. Relation instances do not have duplicate tuples.
Relational schema: A relational schema contains the name of the relation and name of all
columns or attributes.
Relational key: A relational key consists of one or more attributes whose values identify each row
in the relation uniquely.
In the given table, NAME, ROLL_NO, PHONE_NO, ADDRESS, and AGE are the
attributes.
NAME      ROLL_NO   PHONE_NO     ADDRESS     AGE
Ram       14795     7305758992   Noida       24
Shyam     12839     9026288936   Delhi       35
Laxman    33289     8583287182   Gurugram    20
Mahesh    27857     7086819134   Ghaziabad   27
Ganesh    17282     9028913988   Delhi       40
-
o The instance of schema STUDENT has 5 tuples.
o t3 = <Laxman, 33289, 8583287182, Gurugram, 20>
Properties of Relations
o The name of the relation is distinct from all other relations.
o Each relation cell contains exactly one atomic (single) value.
o Each attribute has a distinct name.
o The order of attributes has no significance.
o A relation has no duplicate tuples.
o Tuples can appear in any order.
-
UNIT -II
What is a Database Transaction?
A transaction is a logical unit of processing in a DBMS which entails one or more database
access operations. In a nutshell, database transactions represent real-world events of any
enterprise.
All database access operations between the begin and end transaction
statements are considered a single logical transaction. During the transaction, the database may be
temporarily inconsistent. Only once the transaction is committed does the database move from one
consistent state to another.
In this tutorial, you will learn:
• What is a Database Transaction?
• Facts about Database Transactions
• Why do you need concurrency in Transactions?
• States of Transactions
• What are ACID Properties?
• Types of Transactions
• What is a Schedule?
Facts about Database Transactions
• A transaction is a program unit whose execution may or may not change the contents of a
database.
• The transaction is executed as a single unit
• If the database operations do not update the database but only retrieve data, this type of
transaction is called a read-only transaction.
• A successful transaction can change the database from one CONSISTENT STATE to
another
• DBMS transactions must be atomic, consistent, isolated and durable
• If the database were in an inconsistent state before a transaction, it would remain in the
inconsistent state after the transaction.
Why do you need concurrency in Transactions?
A database is a shared resource that is used by many users and processes concurrently.
Examples include banking systems, railway and air reservation systems, stock market
monitoring, and supermarket inventory and checkouts.
Not managing concurrent access may create issues like:
• Hardware failure and system crashes
• Concurrent execution of the same transaction, deadlock, or slow performance
States of Transactions
The various states of a database transaction are listed below.
Active State: A transaction enters the active state when its execution
begins. During this state, read or write operations can be performed.
Partially Committed State: A transaction enters the partially committed state after its
final operation has been executed.
Committed State: When a transaction is in the committed state, it has already completed its
execution successfully, and all of its changes are recorded in the
database permanently.
Failed State: A transaction is considered failed when any one of the checks fails or when the
transaction is aborted while it is in the active state.
Terminated State: A transaction reaches the terminated state when it leaves the system and
can't be restarted.
State Transition Diagram for a Database Transaction
Let's study a state transition diagram that highlights how a transaction moves between these
various states.
1. Once a transaction starts execution, it becomes active. It can issue READ or WRITE operations.
2. Once the READ and WRITE operations complete, the transaction enters the partially committed state.
3. Next, recovery protocols need to ensure that a system failure will not result in an inability to
record the changes of the transaction permanently. If this check succeeds, the
transaction commits and enters the committed state.
4. If the check fails, the transaction goes to the failed state.
5. If the transaction is aborted while it is in the active state, it goes to the failed state. The
transaction should be rolled back to undo the effect of its write operations on the
database.
6. The terminated state refers to the transaction leaving the system.
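The transitions listed above can be written down as a small state machine. This is only a sketch of the state diagram, not of how any particular DBMS tracks transactions; the class and names (`State`, `TRANSITIONS`, `Transaction`, `move_to`) are invented for the illustration.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    TERMINATED = "terminated"


# Allowed moves, read off the six steps in the text.
TRANSITIONS = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.COMMITTED: {State.TERMINATED},
    State.FAILED: {State.TERMINATED},
    State.TERMINATED: set(),
}


class Transaction:
    def __init__(self):
        # A transaction becomes active as soon as it starts executing.
        self.state = State.ACTIVE
        self.history = [State.ACTIVE]

    def move_to(self, new_state):
        """Change state, refusing any move the diagram does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state
        self.history.append(new_state)


# A successful run: active, partially committed, committed, terminated.
ok = Transaction()
for step in (State.PARTIALLY_COMMITTED, State.COMMITTED, State.TERMINATED):
    ok.move_to(step)

# An aborted run: active, failed (rolled back), terminated.
aborted = Transaction()
aborted.move_to(State.FAILED)
aborted.move_to(State.TERMINATED)
```

Trying to jump straight from active to committed raises an error, which mirrors the rule that a transaction must pass through the partially committed state first.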
What are ACID Properties?
To maintain the integrity of data, the DBMS has to ensure the ACID properties.
ACID stands for Atomicity, Consistency, Isolation, and Durability.
• Atomicity: A transaction is a single unit of operation. You either execute it entirely or do
not execute it at all. There cannot be partial execution.
• Consistency: Once the transaction is executed, the database should move from one consistent state
to another.
• Isolation: A transaction should be executed in isolation from other transactions (no
Locks). During concurrent transaction execution, intermediate results from
simultaneously executed transactions should not be made available to each other (isolation
levels 0, 1, 2, 3).
• Durability: After successful completion of a transaction, the changes in the database
should persist, even in the case of system failures.
-
Example of ACID
Transaction 1: Begin X = X + 50, Y = Y - 50 END
Transaction 2: Begin X = 1.1 * X, Y = 1.1 * Y END
Transaction 1 is transferring $50 from account Y to account X.
Transaction 2 is crediting each account with a 10% interest payment.
If both transactions are submitted together, there is no guarantee that Transaction 1 will
execute before Transaction 2 or vice versa. Irrespective of the order, the result must be as if the
transactions took place serially, one after the other.
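A short calculation makes the point concrete. As written, Transaction 1 adds 50 to X and subtracts 50 from Y, and Transaction 2 multiplies both by 1.1. The starting balances (100 and 200) and the function names are assumptions chosen for the illustration; exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

RATE = Fraction(11, 10)  # 10% interest, kept exact


def t1(x, y):
    """Transaction 1: X = X + 50, Y = Y - 50."""
    return x + 50, y - 50


def t2(x, y):
    """Transaction 2: X = 1.1 * X, Y = 1.1 * Y."""
    return x * RATE, y * RATE


start = (Fraction(100), Fraction(200))  # assumed opening balances

# The two serial orders. Both are acceptable outcomes.
t1_then_t2 = t2(*t1(*start))
t2_then_t1 = t1(*t2(*start))

# A bad interleaving: T1 updates X, then T2 runs completely,
# then T1 finishes by updating Y.
x, y = start
x = x + 50                 # first half of T1
x, y = t2(x, y)            # all of T2
y = y - 50                 # second half of T1
interleaved = (x, y)

print(t1_then_t2, t2_then_t1, interleaved)
```

Both serial orders leave a combined balance of 330, although they split it differently (165 and 165 versus 160 and 170). The interleaved run ends with 165 and 170, a total of 335, which matches neither serial order: the extra 5 is interest paid on money that was counted in both accounts at once. Isolation exists to rule out exactly this kind of result.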
Types of Transactions
Based on Application areas
• Non-distributed vs. distributed
• Compensating transactions
• Transactions Timing
• On-line vs. batch
Based on Actions
• Two-step
• Restricted
• Action model
Based on Structure
• Flat or simple transactions: It consists of a sequence of primitive operations executed
between a begin and end operations.
• Nested transactions: A transaction that contains other transactions.
• Workflow
What is a Schedule?
A schedule is the process of arranging the operations of multiple parallel transactions into a single
sequence and executing them one by one. It should preserve the order in which the instructions appear in each
transaction. If two transactions are executed at the same time, the result of one transaction may
affect the output of the other.
Example
Initial Product Quantity is 10
Transaction 1: Update Product Quantity to 50
-
Transaction 2: Read Product Quantity
If Transaction 2 is executed before Transaction 1, outdated information about the product
quantity will be read. Hence, schedules are required.
Parallel execution in a database is inevitable. However, parallel execution is permitted only when there is
an equivalence relation among the simultaneously executing transactions. This equivalence is
of 3 types.
RESULT EQUIVALENCE:
If two schedules display the same result after execution, it is called result equivalent schedule.
They may offer the same result for some value and different results for another set of values. For
example, one transaction updates the product quantity, while other updates customer details.
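The remark that result-equivalent schedules may agree for some values and differ for others can be checked with a tiny sketch. The two schedules below are assumptions made up for the illustration: one adds 10 to a value, the other raises it by 10%.

```python
from fractions import Fraction


def schedule_a(x):
    """Assumed schedule A: add a flat 10 to the value."""
    return x + 10


def schedule_b(x):
    """Assumed schedule B: increase the value by 10% (exact arithmetic)."""
    return x * Fraction(11, 10)


# With an initial value of 100 the two schedules give the same result...
same_at_100 = schedule_a(100) == schedule_b(100)

# ...but with an initial value of 200 they do not.
same_at_200 = schedule_a(200) == schedule_b(200)

print(same_at_100, same_at_200)
```

Because the agreement depends on the starting value, result equivalence on its own is a weak guarantee.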
What is Concurrency Control?
Concurrency control is the procedure in a DBMS for managing simultaneous operations without
them conflicting with one another. Concurrent access is quite easy if all users are just reading data.
There is no way they can interfere with one another. Any practical database, however, will
have a mix of READ and WRITE operations, and hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-user
system. It helps you make sure that database transactions are performed concurrently without
violating the data integrity of the respective databases.
Therefore, concurrency control is an essential element for the proper functioning of a
system where two or more database transactions that require access to the same data are
executed simultaneously.
Lock-based Protocols
A lock is a data variable which is associated with a data item. The lock signifies which operations
can be performed on the data item. Locks help synchronize access to database items by
concurrent transactions.
All lock requests are made to the concurrency-control manager. Transactions proceed only once
the lock request is granted.
Binary Locks: A binary lock on a data item can be in either the locked or the unlocked state.
Shared/exclusive: This type of locking mechanism separates the locks based on their uses. If a
lock is acquired on a data item to perform a write operation, it is called an exclusive lock.
-
1. Shared Lock (S):
A shared lock is also called a read-only lock. With a shared lock, the data item can be shared
between transactions. This is because a transaction holding it never has permission to update the
data item.
For example, consider a case where two transactions are reading the account balance of a person.
The database will let them read by placing a shared lock. However, if another transaction wants
to update that account's balance, the shared lock prevents it until the reading process is over.
2. Exclusive Lock (X):
With an exclusive lock, a data item can be read as well as written. This lock is exclusive and can't
be held concurrently on the same data item. An X-lock is requested using the lock-x instruction.
Transactions may unlock the data item after finishing the 'write' operation.
For example, consider a transaction that needs to update the account balance of a person. You can
allow this transaction by placing an X lock on the balance. Then, when a second transaction wants to
read or write it, the exclusive lock prevents this operation.
3. Simplistic Lock Protocol
This type of lock-based protocol requires transactions to obtain a lock on every object before
beginning an operation. Transactions may unlock the data item after finishing the 'write' operation.
4. Pre-claiming Locking
The pre-claiming lock protocol evaluates the operations and creates a list of the data items
that are needed to initiate the execution process. When all locks are granted, the
transaction executes. After that, all locks are released when all of its operations are over.
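The shared and exclusive rules above amount to a small compatibility check: many readers may share an item, while a writer needs it alone. The following is a minimal sketch under simplifying assumptions: the `LockManager` class is hypothetical, a refused request simply returns False instead of making the transaction wait, and lock upgrades are not handled.

```python
class LockManager:
    """Toy concurrency-control manager for shared (S) and exclusive (X) locks."""

    def __init__(self):
        # item -> (mode, set of transactions holding the lock)
        self.locks = {}

    def request(self, txn, item, mode):
        """Return True if the lock is granted, False if it conflicts."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        # Only S with S is compatible; anything involving X conflicts.
        if mode == "S" and held_mode == "S":
            holders.add(txn)
            return True
        return False

    def release(self, txn, item):
        """Give up a lock; the item becomes free once nobody holds it."""
        held = self.locks.get(item)
        if held is None:
            return
        _, holders = held
        holders.discard(txn)
        if not holders:
            del self.locks[item]


manager = LockManager()
r1 = manager.request("T1", "balance", "S")   # first reader
r2 = manager.request("T2", "balance", "S")   # second reader shares the lock
w1 = manager.request("T3", "balance", "X")   # writer is refused while readers hold it
manager.release("T1", "balance")
manager.release("T2", "balance")
w2 = manager.request("T3", "balance", "X")   # writer gets the item once it is free
r3 = manager.request("T1", "balance", "S")   # readers are refused while the writer holds it
```

In a real system the refused transactions would be queued and resumed when the lock is released.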
Log-Based Recovery
o The log is a sequence of records. Log of each transaction is maintained in some stable
storage so that if any failure occurs, then it can be recovered from there.
o If any operation is performed on the database, then it will be recorded in the log.
o But the process of storing the logs should be done before the actual transaction is applied
in the database.
Let's assume there is a transaction to modify the City of a student. The following logs are written
for this transaction.
o When the transaction is initiated, it writes a 'start' log.
1. <Tn, Start>
o When the transaction modifies the City from 'Noida' to 'Bangalore', another log is
written to the file.
1. <Tn, City, 'Noida', 'Bangalore'>
o When the transaction is finished, it writes another log to indicate the end of the
transaction.
1. <Tn, Commit>
There are two approaches to modify the database:
1. Deferred database modification:
o The deferred modification technique is used when the transaction does not modify the
database until it has committed.
o In this method, all the logs are created and stored in stable storage, and the database is
updated when the transaction commits.
2. Immediate database modification:
o The immediate modification technique is used when database modification occurs while the
transaction is still active.
o In this technique, the database is modified immediately after every operation. Each log record is
followed by the actual database modification.
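The difference between the two approaches shows up most clearly when a crash happens before commit. The sketch below models the database as a dictionary and the log as a list of tuples; the function names, the record layout, and the single transaction T1 are assumptions made for the illustration, and the crash is simulated by a flag.

```python
def run_deferred(db, updates, crash_before_commit=False):
    """Deferred modification: log first, touch the database only at commit."""
    log = [("start", "T1")]
    for item, new_value in updates:
        log.append(("update", "T1", item, new_value))  # new value is enough
    if crash_before_commit:
        return log  # database untouched, so nothing has to be undone
    log.append(("commit", "T1"))
    for record in log:
        if record[0] == "update":
            _, _, item, new_value = record
            db[item] = new_value
    return log


def run_immediate(db, updates, crash_before_commit=False):
    """Immediate modification: log each change, then apply it right away."""
    log = [("start", "T1")]
    for item, new_value in updates:
        # The old value is logged too, because it is needed for undo.
        log.append(("update", "T1", item, db[item], new_value))
        db[item] = new_value
    if crash_before_commit:
        # Recovery: undo the uncommitted changes, newest first.
        for record in reversed(log):
            if record[0] == "update":
                _, _, item, old_value, _ = record
                db[item] = old_value
        return log
    log.append(("commit", "T1"))
    return log


change = [("City", "Bangalore")]

deferred_db = {"City": "Noida"}
run_deferred(deferred_db, change, crash_before_commit=True)
after_deferred_crash = dict(deferred_db)   # still Noida, no undo was needed
run_deferred(deferred_db, change)          # this time the commit is reached

immediate_db = {"City": "Noida"}
immediate_log = run_immediate(immediate_db, change, crash_before_commit=True)
```

With deferred modification a crash before commit needs no undo, whereas immediate modification relies on the old values in the log to restore the database.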
Recovery using Log records
When the system crashes, it consults the log to find which transactions need to
be undone and which need to be redone.
1. If the log contains the records <Ti, Start> and <Ti, Commit>, or just <Ti, Commit>, then
Transaction Ti needs to be redone.
2. If the log contains the record <Ti, Start> but contains neither <Ti, Commit> nor
<Ti, Abort>, then Transaction Ti needs to be undone.
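The two rules can be expressed as a single pass over the log. In this sketch each log record is a tuple whose first field is the record kind ("start", "update", "commit", or "abort") and whose second field is the transaction name; that layout, the function name `classify`, and the sample log are assumptions for the illustration.

```python
def classify(log):
    """Split transactions into those to redo and those to undo after a crash."""
    started, committed, aborted = set(), set(), set()
    for record in log:
        kind, txn = record[0], record[1]
        if kind == "start":
            started.add(txn)
        elif kind == "commit":
            committed.add(txn)
        elif kind == "abort":
            aborted.add(txn)
    redo = committed                         # rule 1: a commit record exists
    undo = started - committed - aborted     # rule 2: started, never finished
    return redo, undo


# Sample log at the moment of the crash: T1 finished, T2 did not.
crash_log = [
    ("start", "T1"),
    ("update", "T1", "City", "Noida", "Bangalore"),
    ("commit", "T1"),
    ("start", "T2"),
    ("update", "T2", "Age", 24, 25),
]

redo_set, undo_set = classify(crash_log)
print(sorted(redo_set), sorted(undo_set))
```

T1 has both a start and a commit record, so its changes are reapplied; T2 has only a start record, so its changes are rolled back.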
Crash Recovery
DBMS is a highly complex system with hundreds of transactions being executed every second.
The durability and robustness of a DBMS depends on its complex architecture and its
underlying hardware and system software. If it fails or crashes amid transactions, it is expected
that the system would follow some sort of algorithm or techniques to recover lost data.
-
Failure Classification
To see where the problem has occurred, we generalize a failure into various categories, as
follows −
Transaction failure
A transaction has to abort when it fails to execute or when it reaches a point from where it can’t
go any further. This is called transaction failure where only a few transactions or processes are
hurt.
Reasons for a transaction failure could be −
• Logical errors − Where a transaction cannot complete because it has some code error or
any internal error condition.
• System errors − Where the database system itself terminates an active transaction
because the DBMS is not able to execute it, or it has to stop because of some system
condition. For example, in case of deadlock or resource unavailability, the system aborts
an active transaction.
System Crash
There are problems − external to the system − that may cause the system to stop abruptly and
cause the system to crash. For example, interruptions in power supply may cause the failure of
underlying hardware or software failure.
Examples may include operating system errors.
Disk Failure
In early days of technology evolution, it was a common problem where hard-disk drives or
storage drives used to fail frequently.
Disk failures include formation of bad sectors, unreachability to the disk, disk head crash or any
other failure, which destroys all or a part of disk storage.
Storage Structure
We have already described the storage system. In brief, the storage structure can be divided into
two categories −
• Volatile storage − As the name suggests, a volatile storage cannot survive system
crashes. Volatile storage devices are placed very close to the CPU; normally they are
embedded onto the chipset itself. For example, main memory and cache memory are
examples of volatile storage. They are fast but can store only a small amount of
information.
-
• Non-volatile storage − These memories are made to survive system crashes. They are
huge in data storage capacity, but slower in accessibility. Examples may include hard-
disks, magnetic tapes, flash memory, and non-volatile (battery backed up) RAM.
Recovery and Atomicity
When a system crashes, it may have several transactions being executed and various files
opened for them to modify the data items. Transactions are made of various operations, which
are atomic in nature. But according to ACID properties of DBMS, atomicity of transactions as a
whole must be maintained, that is, either all the operations are executed or none.
When a DBMS recovers from a crash, it should maintain the following −
• It should check the states of all the transactions, which were being executed.
• A transaction may be in the middle of some operation; the DBMS must ensure the
atomicity of the transaction in this case.
• It should check whether the transaction can be completed now or it needs to be rolled
back.
• No transactions would be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques, which can help a DBMS in recovering as well as maintaining
the atomicity of a transaction −
• Maintaining the logs of each transaction, and writing them onto some stable storage
before actually modifying the database.
• Maintaining shadow paging, where the changes are done on a volatile memory, and later,
the actual database is updated.
Log-based Recovery
Log is a sequence of records, which maintains the records of actions performed by a transaction.
It is important that the logs are written prior to the actual modification and stored on a stable
storage media, which is failsafe.
Log-based recovery works as follows −
• The log file is kept on a stable storage media.
• When a transaction enters the system and starts execution, it writes a log about it.
<Tn, Start>
• When the transaction modifies an item X, it writes logs as follows −
<Tn, X, V1, V2>
It reads Tn has changed the value of X, from V1 to V2.
• When the transaction finishes, it logs −
<Tn, commit>
-
The database can be modified using two approaches −
• Deferred database modification − All logs are written on to the stable storage and the
database is updated when a transaction commits.
• Immediate database modification − Each log follows an actual database modification.
That is, the database is modified immediately after every operation.
Recovery with Concurrent Transactions
When more than one transaction is being executed in parallel, the logs are interleaved. At the
time of recovery, it would be hard for the recovery system to backtrack through all the logs and then
start recovering. To ease this situation, most modern DBMSs use the concept of 'checkpoints'.
Checkpoint
Keeping and maintaining logs in real time and in real environment may fill out all the memory
space available in the system. As time passes, the log file may grow too big to be handled at all.
Checkpoint is a mechanism where all the previous logs are removed from the system and stored
permanently in a storage disk. Checkpoint declares a point before which the DBMS was in
consistent state, and all the transactions were committed.
Structures Used for Database Recovery
Several structures of an Oracle database safeguard data against possible failures. The following
sections briefly introduce each of these structures and its role in database recovery.
Database Backups
A database backup consists of operating system backups of the physical files that constitute an
Oracle database. To begin database recovery from a media failure, Oracle uses file backups to
restore damaged datafiles or control files.
Oracle offers several options in performing database backups; see Chapter 23, "Database
Backup", for more information.
The Redo Log
The redo log, present for every Oracle database, records all changes made in an Oracle database.
The redo log of a database consists of at least two redo log files that are separate from the
datafiles (which actually store a database's data). As part of database recovery from an instance
or media failure, Oracle applies the appropriate changes in the database's redo log to the
datafiles, which updates database data to the instant that the failure occurred.
A database's redo log can be comprised of two parts: the online redo log and the archived redo
log, discussed in the following sections.
The Online Redo Log
Every Oracle database has an associated online redo log. The online redo
log works with the Oracle background process LGWR to immediately record all changes made
through the associated instance. The online redo log consists of two or more pre-allocated files
that are reused in a circular fashion to record ongoing database changes; see "The Online
Redo Log" for more information.
The Archived (Offline) Redo Log
Optionally, you can configure an Oracle database to archive
files of the online redo log once they fill. The online redo log files that are archived are uniquely
identified and make up the archived redo log. By archiving filled online redo log files, older redo
log information is preserved for more extensive database recovery operations, while the
pre-allocated online redo log files continue to be reused to store the most current database changes;
see "The Archived Redo Log" page 22-16 for more information.
Rollback Segments
Rollback segments are used for a number of functions in the operation of an Oracle database. In
general, the rollback segments of a database store the old values of data changed by ongoing
transactions (that is, uncommitted transactions). Among other things, the information in a
rollback segment is used during database recovery to "undo" any "uncommitted" changes applied
from the redo log to the datafiles. Therefore, if database recovery is necessary, the data is in a
consistent state after the rollback segments are used to remove all uncommitted data from the
datafiles; see "Rollback Segments" for more information.
Control Files
In general, the control file(s) of a database store the status of the physical structure of the
database. Certain status information in the control file (for example, the current online redo log
file, the names of the datafiles, and so on) guides Oracle during instance or media recovery;
see "Control Files" for more information.
Checkpoint Explanation
o The checkpoint is a mechanism where all the previous logs are removed from the
system and permanently stored on the storage disk.
o The checkpoint is like a bookmark. Checkpoints are marked during the execution of a
transaction, and log files are created as the steps of the transaction are
executed.
o When execution reaches a checkpoint, the transaction's changes are written to the database,
and the log entries up to that point are removed from the log file. The log file is then
updated with the new steps of the transaction until the next checkpoint, and so on.
o A checkpoint declares a point before which the DBMS was in a consistent
state and all transactions were committed.
Recovery using Checkpoint
After a failure, the recovery system recovers the database in the following manner:
o The recovery system reads the log file from the end to the start, that is, from T4 back to T1.
o The recovery system maintains two lists: a redo-list and an undo-list.
o A transaction is put into the redo state if the recovery system sees a log with
<Tn, Start> and <Tn, Commit>, or just <Tn, Commit>. All the transactions in the redo-list
are removed and then redone before their logs are saved.
o For example: in the log file, transactions T2 and T3 will have both <Tn, Start> and
<Tn, Commit>. Transaction T1 will have only <Tn, Commit> in the log file, because it
started before the checkpoint and committed after the checkpoint was crossed. Hence T1,
T2 and T3 are put into the redo-list.
o A transaction is put into the undo state if the recovery system sees a log with <Tn, Start>
but finds no commit or abort log. All the transactions in the undo-list are undone, and
their logs are removed.
o For example: transaction T4 will have only <Tn, Start>. So T4 is put into the undo-list,
since this transaction was not yet complete when the failure occurred.
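The classification described above can be sketched in C. This is a minimal illustration, not the algorithm of any particular DBMS: the record layout, the names `log_rec`, `classify` and `MAX_TXN`, and the choice to scan the whole log rather than stop at a list of active transactions are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Kinds of log record (illustrative names). */
enum rec_kind { REC_START, REC_COMMIT, REC_CHECKPOINT };

struct log_rec {
    enum rec_kind kind;
    int txn;                 /* transaction number; ignored for checkpoints */
};

#define MAX_TXN 16           /* hypothetical upper bound on transaction numbers */

/*
 * Scan the log from the end to the start and fill two lists:
 *   redo[t] = 1 if the commit of Tt appears after the last checkpoint
 *   undo[t] = 1 if Tt has a start record but no commit record
 * Transactions that committed before the checkpoint land in neither list.
 */
void classify(const struct log_rec *log, size_t n,
              int redo[MAX_TXN], int undo[MAX_TXN])
{
    int committed[MAX_TXN] = {0};
    int after_checkpoint = 1;        /* true until the last checkpoint is met */

    assert(log != NULL || n == 0);
    memset(redo, 0, MAX_TXN * sizeof(int));
    memset(undo, 0, MAX_TXN * sizeof(int));

    for (size_t i = n; i-- > 0; ) {  /* backward scan */
        const struct log_rec *r = &log[i];

        if (r->kind == REC_CHECKPOINT) {
            after_checkpoint = 0;
            continue;
        }
        if (r->txn < 0 || r->txn >= MAX_TXN)
            continue;                /* skip records outside the sketch's range */

        if (r->kind == REC_COMMIT) {
            committed[r->txn] = 1;
            if (after_checkpoint)
                redo[r->txn] = 1;
        } else if (r->kind == REC_START && !committed[r->txn]) {
            undo[r->txn] = 1;        /* started but never committed */
        }
    }
}
```

With a log in which T1 starts before the checkpoint and commits after it, T2 and T3 start and commit after it, and T4 only starts, the sketch places T1, T2 and T3 in the redo-list and T4 in the undo-list, matching the example in the text.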
-
Media Recovery
If you restore the archived redo log files and data files, then you must perform media recovery
before you can open the database. Any database transactions in the archived redo log files not
reflected in the data files are applied to the data files, bringing them to a transaction-consistent
state before the database is opened.
Media recovery requires a control file, data files (typically restored from backup), and online and
archived redo log files containing changes since the time the data files were backed up. Media
recovery is most often used to recover from media failure, such as the loss of a file or disk, or a
user error, such as the deletion of the contents of a table.
Media recovery can be a complete recovery or a point-in-time recovery. Complete recovery can
apply to individual datafiles, tablespaces, or the entire database. Point-in-time recovery applies to
the whole database (and also sometimes to individual tablespaces, with automation help from
Oracle Recovery Manager (RMAN)).
In a complete recovery, you restore backup data files and apply all changes from the archived
and online redo log files to the data files. The database is returned to its state at the time of
failure and can be opened with no loss of data.
In a point-in-time recovery, you return a database to its contents at a user-selected time in the
past. You restore a backup of data files created before the target time and a complete set of
archived redo log files from backup creation through the target time. Recovery applies changes
between the backup time and the target time to the data files. All changes after the target time are
discarded.
RMAN enables you to perform both a complete and a point-in-time recovery of your database.
However, this documentation focuses on complete recovery.
-
UNIT 3
OBJECT BASED DATABASE AND XML
What is structure data type?
Structure type
A structured data type is a compound, user-defined data type used for grouping simple data
types or other compound data types. It contains a sequence of member variable names along
with their types/attributes, enclosed within curly braces.
struct < struct name > {
< type > < member >;
};
Need for Struct data types
There are some situations when we need to group different types of variables in one group. Let's
see one situation here- we want to store the name, roll and age of a student.
unsigned int student_roll;
char student_name[MAX_STRING];
unsigned int student_age;
Here there is a logical grouping between these three variables, but the variables themselves
are still scattered: we access three different variables to store the attribute values of a single
student. How can we group them inside one logical entity, so that the three variables for a single
student sit inside one variable? This type of grouping is called a structure. One point to note is
that an array can group only elements of the same type, whereas a structure can group different types.
Example of Struct data types
Let's define these again with a structure type
struct student_t {
    unsigned int roll;
    char name[MAX_STRING];
    unsigned int age;
};
struct student_t student1;
student1.roll = <roll number>;
strcpy(student1.name, <name string>);
student1.age = <age>;
Structure size and memory layout
A structure is a user-defined type that groups variables of different types, which may be
built-in types, other user-defined types, or a mixture of both. Each individual element of a
structure is called a member. Members are placed in memory sequentially, in the order in
which they are declared. The size of a structure is therefore at least the sum of the sizes of
its members; the compiler may add padding, which makes the structure larger than that sum.
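The effect of padding can be checked with `sizeof` and `offsetof`. The sketch below uses a hypothetical `struct sample` (not part of the original notes) whose members are ordered so that padding is likely to appear; the helper names are also illustrative.

```c
#include <assert.h>
#include <stddef.h>   /* size_t, offsetof */

/* Hypothetical structure: a char on either side of an int. */
struct sample {
    char c;
    int  i;
    char d;
};

/* Sum of the member sizes, with no padding counted. */
size_t sample_member_total(void)
{
    return sizeof(char) + sizeof(int) + sizeof(char);
}

/* Number of padding bytes the compiler added (may be zero). */
size_t sample_padding(void)
{
    /* A structure is never smaller than the sum of its members. */
    assert(sizeof(struct sample) >= sample_member_total());
    return sizeof(struct sample) - sample_member_total();
}
```

On many platforms with a 4-byte int, `sizeof(struct sample)` is 12 while the members add up to 6, but the exact figures depend on the compiler and target. `offsetof(struct sample, i)` shows where each member actually starts.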
Struct data types syntax
struct < struct name > {
    < type > < member 1 >;
    < type > < member 2 >;
    ...
    < type > < member N >;
};
Structure demo example
/* Structure type demo example program */
#include <stdio.h>

/* Structure type student */
struct student {
    char name[100];
    char dept[100];
    int rollno;
    float marks;
};

/* Structure type main routine */
int main(int argc, char *argv[])
{
    /* declare struct variable */
    struct student s1;

    printf("\nEnter the name, dept, roll number and marks of student:\n");
    /* %99s limits each string to the buffer size, leaving room for '\0' */
    scanf("%99s %99s %d %f", s1.name, s1.dept, &s1.rollno, &s1.marks);
    printf("\nThe name, dept, roll number and marks of the student are:");
    printf("\n%s %s %d %.2f", s1.name, s1.dept, s1.rollno, s1.marks);
    return 0;
}
Program output
Enter the name, dept, roll number and marks of student:
Student1 ECE 1 96.5
The name, dept, roll number and marks of the student are:
Student1 ECE 1 96.50
OPERATIONS ON STRUCTURED DATA
Structured data is data that conforms to a data model, has a well-defined structure, follows
a consistent order, and can be easily accessed and used by a person or a computer program.
Structured data is usually stored in well-defined schemas such as databases. It is generally
tabular, with columns and rows that clearly define its attributes.
SQL (Structured Query language) is often used to manage structured data stored in databases.
Characteristics of Structured Data:
• Data conforms to a data model and has an easily identifiable structure
• Data is stored in the form of rows and columns (example: a database)
• Data is well organised, so the definition, format and meaning of the data are explicitly known
• Data resides in fixed fields within a record or file
• Similar entities are grouped together to form relations or classes
• Entities in the same group have the same attributes
• Data is easy to access and query, so it can be easily used by other programs
• Data elements are addressable, so they are efficient to analyse and process
-
Sources of Structured Data:
• SQL Databases
• Spreadsheets such as Excel
• OLTP Systems
• Online forms
• Sensors such as GPS or RFID tags
• Network and Web server logs
• Medical devices.
Advantages of Structured Data:
• Structured data has a well-defined structure that makes the data easy to store and access
• Data can be indexed on text strings as well as on attributes, which makes search
operations straightforward
• Data mining is easy, i.e. knowledge can be easily extracted from the data
• Operations such as updating and deleting are easy because of the well-structured form of the data
• Business intelligence operations such as data warehousing can be easily undertaken
• Easily scalable when the volume of data grows
• Ensuring the security of the data is easy
ENCAPSULATION AND ADTs
An object-oriented database must provide support for all data types, not just the built-in data
types such as character, integer, and float. To understand abstract data types, let's take two steps
back by removing first "abstract" and then "data" from "abstract data type". We are left with a
type, which can be defined as a collection of values. A simple example is the integer
type, which consists of the values 0, 1, 2, 3, etc. If we add the word "data" back in, we can define
a data type as a type together with the set of operations that manipulate the type. Extending the
integer example, an integer variable is a member of the integer data type, and addition,
subtraction, and multiplication are examples of operations that can be performed on the
integer data type.
If we now add the word "abstract" back in, we can define an abstract data type (ADT) as a data
type, that is, a type and the set of operations that manipulate the type, in which the operations
are defined only by their inputs and outputs. The ADT does not specify how the data type will
be implemented; all of the implementation details are hidden from the user of the ADT. This
process of hiding the details is called encapsulation. If we extend the integer example to
an abstract data type, the operations might be: delete an integer, add an integer, print an integer,
and check whether a certain integer exists. Notice that we do not care how an operation is
carried out, only how to invoke it.
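The integer example can be sketched in C as an opaque type with the four operations named above. This is one possible illustration under stated assumptions: the names `int_set`, `int_set_add` and so on are hypothetical, and the growable-array implementation is just one of many that would satisfy the same interface.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* ---- Interface: everything a user of the ADT needs to know ---- */
typedef struct int_set int_set;              /* opaque type */

int_set *int_set_create(void);               /* NULL if out of memory */
void     int_set_destroy(int_set *s);
int      int_set_add(int_set *s, int v);     /* 1 if added, 0 otherwise */
int      int_set_delete(int_set *s, int v);  /* 1 if removed, 0 if absent */
int      int_set_exists(const int_set *s, int v);
void     int_set_print(const int_set *s, FILE *out);

/* ---- One possible implementation: an unsorted, growable array ---- */
struct int_set {
    int   *items;
    size_t count;
    size_t capacity;
};

int_set *int_set_create(void)
{
    int_set *s = malloc(sizeof *s);
    if (s != NULL) {
        s->items = NULL;
        s->count = 0;
        s->capacity = 0;
    }
    return s;
}

void int_set_destroy(int_set *s)
{
    if (s != NULL) {
        free(s->items);
        free(s);
    }
}

int int_set_exists(const int_set *s, int v)
{
    assert(s != NULL);
    for (size_t i = 0; i < s->count; i++)
        if (s->items[i] == v)
            return 1;
    return 0;
}

int int_set_add(int_set *s, int v)
{
    assert(s != NULL);
    if (int_set_exists(s, v))
        return 0;                            /* already present */
    if (s->count == s->capacity) {
        size_t new_cap = s->capacity ? s->capacity * 2 : 4;
        int *grown = realloc(s->items, new_cap * sizeof(int));
        if (grown == NULL)
            return 0;                        /* out of memory; set unchanged */
        s->items = grown;
        s->capacity = new_cap;
    }
    s->items[s->count++] = v;
    return 1;
}

int int_set_delete(int_set *s, int v)
{
    assert(s != NULL);
    for (size_t i = 0; i < s->count; i++) {
        if (s->items[i] == v) {
            s->count--;
            s->items[i] = s->items[s->count]; /* fill the hole with the last item */
            return 1;
        }
    }
    return 0;
}

void int_set_print(const int_set *s, FILE *out)
{
    assert(s != NULL && out != NULL);
    for (size_t i = 0; i < s->count; i++)
        fprintf(out, "%d ", s->items[i]);
    fputc('\n', out);
}
```

In a real project the interface would live in a header file and the `struct int_set` definition in a separate source file, so that callers could not reach the array at all. Replacing the array with, say, a linked list would not change any calling code, which is the point of the ADT.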
Let's start by looking at traditional programming languages and the data types that they use.
Traditional languages are based on text and numerical data types, and you are limited to the
kinds of data types that the programming language supports. Variables used by the
programming language have to be defined using one of the supported data types. Object
technology (OT) has done away with the restriction of using only these built-in data types and
allows you to create different data types. Once these new data types are defined, they are treated
the same way as built-in data types. The ability to create new data types when needed and then
use them is called data abstraction, and the new data types are called abstract data types (ADTs).
An abstract data type is more than a set of values. When used to create an object, it can also
have methods attached to it, and the details of these methods are hidden from the user. Data
abstraction and ADTs are a cornerstone of OT because they can be created as needed, and this
helps you to think of and design computer systems that more accurately reflect the way data types
are represented in the real world.
One of the main reasons why hierarchical, network and relational databases are being replaced is
their failure to support ADTs. These traditional databases have very strict rules for the layout of
data and simply are not flexible enough to handle ADTs.
Encapsulation
Encapsulation gathers the data and methods of an object and puts them into a package, creating a
well defined boundary around the object. Encapsulation is often referred to as information
hiding, and encapsulation can be used to restrict which users and what operations can be
performed against the data inside the object.
Classes provide encapsulation, or information hiding, through access control. A class grants or
denies access to its objects using the public and private access specifiers. Public members define
an interface between a class and the users of that class, and can be accessed by any
function in a program. Objects can contain both public and private variables;
the public variables are used with the object's methods or interfaces.
Private variables are known only to the object and cannot be accessed through an interface. For
example, a private method might be used to compute an internal value.
Encapsulation can be used in non-database object-oriented applications to guarantee that all
operations are done via the methods that the programmer has defined in the class definition,
ensuring that data cannot be changed outside of its own predefined methods. However,
declarative database languages such as SQL allow what might be called "declarative" retrieval
and updates of data, and do not follow the rules of encapsulation. This is called an impedance
mismatch, and is inconsistent with object-oriented database management.
As an example, in a relational database we could define a behavior called ADD_ORDER which
will check to see if there is enough product in inventory for the order. The order object will not
be created if there was not enough product in inventory. This behavior will make sure that no
order is placed for product that is unavailable. However in a relational database, you could use
SQL and bypass this validity check and thereby add an invalid order into the database.
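The ADD_ORDER behaviour can be sketched in C, where file-scope `static` variables play the role of private data. Everything named here (`MAX_PRODUCTS`, `set_stock`, `add_order`, and so on) is hypothetical and only illustrates the idea that the inventory can be changed solely through functions that enforce the validity check.

```c
#include <assert.h>

#define MAX_PRODUCTS 8              /* hypothetical catalogue size */

/* Private state: 'static' limits these names to this source file, so other
 * files can change them only through the functions below. */
static int stock[MAX_PRODUCTS];     /* units on hand per product id */
static int orders_placed;           /* number of accepted orders */

/* Public interface */
void set_stock(int product, int qty)
{
    assert(product >= 0 && product < MAX_PRODUCTS && qty >= 0);
    stock[product] = qty;
}

int stock_of(int product)
{
    assert(product >= 0 && product < MAX_PRODUCTS);
    return stock[product];
}

int order_count(void)
{
    return orders_placed;
}

/* ADD_ORDER behaviour: create the order only if inventory can cover it.
 * Returns 1 if the order was created, 0 if it was rejected. */
int add_order(int product, int qty)
{
    if (product < 0 || product >= MAX_PRODUCTS || qty <= 0)
        return 0;
    if (stock[product] < qty)
        return 0;                   /* not enough product: no order is created */
    stock[product] -= qty;
    orders_placed++;
    return 1;
}
```

Code outside this file has no way to decrement `stock` without passing through `add_order`. A direct SQL `INSERT` into an orders table is the relational counterpart of reaching past such an interface, which is the bypass the text describes.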
-
INHERITANCE
Inheritance enables you to share attributes between objects such that a subclass inherits attributes
from its parent class. OracleAS TopLink provides several methods to preserve inheritance
relationships, and enables you to override mappings that are specified in a superclass, or to map
attributes that are not mapped in the superclass. Subclasses must include the same database field
(or fields) as the parent class for their primary key (although the primary key can have different
names in these two tables). As a result, when you are mapping relationships to a subclass stored
in a separate table, the subclass table must include the parent table primary key, even if the
subclass primary key differs from the parent primary key.
This section describes OracleAS TopLink inheritance, and introduces several topics and
techniques to leverage inheritance in your own applications, including:
• Understanding Object Inheritance
• Representing Inheritance in the Database
• Class Types
• Class Indicators
• Class Extraction Methods
• Entity Bean Inheritance Restrictions
For more information about implementing inheritance in code, see "Implementing Inheritance in
Java".
Understanding Object Inheritance
Consider a simple database used by a courier company. It contains registration information for
three types of vehicles: trucks, cars, and bicycles. For each vehicle type, your application
requires the following information:
• VID (Vehicle Identification)
• LastMaint (mileage since last maintenance)
• LoadCap (load capacity)
If these are all the attributes shared by all vehicles in the application, then these attributes must
all appear in the super class, Vehicle. You can then build subclasses for each of the vehicle types
that reflects their differences. For example, the Truck class may have an attribute indicating
whether the local department of transportation considers it to be a commercial vehicle
(NumAxles), the Car class may require a NumPass (number of passengers) attribute, and the
Bicycle class, by virtue of its more limited range, may require a Location attribute. Through
inheritance, each vehicle type automatically inherits the basic vehicle information but, as a
separate subclass, also has its own unique characteristics.
Figure 3-6 Inheritance in a Courier Application
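The courier hierarchy can be approximated in C by embedding the superclass as the first member of each subclass. This is a language-level sketch only and is not how OracleAS TopLink maps classes; the field names follow the attributes listed above, while `needs_maintenance` and its `limit` parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Superclass: attributes shared by every vehicle. */
struct vehicle {
    int vid;          /* VID: vehicle identification */
    int last_maint;   /* LastMaint: mileage since last maintenance */
    int load_cap;     /* LoadCap: load capacity */
};

/* Subclasses embed the superclass as their first member and add their own attribute. */
struct truck   { struct vehicle base; int  num_axles; };
struct car     { struct vehicle base; int  num_pass; };
struct bicycle { struct vehicle base; char location[32]; };

/* Works on any vehicle, whatever its subclass.
 * 'limit' is a hypothetical service interval in the same unit as last_maint. */
int needs_maintenance(const struct vehicle *v, int limit)
{
    assert(v != NULL);
    return v->last_maint >= limit;
}
```

Because the base structure is the first member, a pointer to a `struct truck` and a pointer to its `base` member refer to the same address, so code written for `struct vehicle` can be applied to every subclass by passing `&t.base`.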
Representing Inheritance in the Database
You can represent inheritance in the database in one of two ways:
• Multiple tables that represent the parent class and each child class
• A single table that comprises the parent and all child classes
Figure 3-7 Inheritance in the Database in Individual Tables
If your database already represents the objects in the inheritance hierarchy this way, you can map
the objects and relationships without modifying the tables. However, it is most efficient to
represent all classes from a given inheritance hierarchy in a single table, because it substantially
reduces the number of table reads and eliminates joins when querying on objects in the
hierarchy.
Figure 3-8 Inheritance in the Database in a Single Table
To consolidate tables in the database this way, determine the class type of the objects represented
by the rows in the table. There are two ways to determine class type:
• If you can add columns to the database table, add a class indicator column that represents
the vehicle class type (Truck, Car, or Bicycle).
For more information about class indicators, see "Class Indicators".
• If you cannot modify the table, build a class extraction method that executes the
appropriate logic to determine the class type.
For more information about class extraction methods, see "Class Extraction Methods".
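The single-table layout with a class indicator column can be pictured as one row type that carries the indicator alongside the columns of every class. The sketch below is an assumption-laden illustration in C, not TopLink code: the row layout, the enum values and the helpers `class_name` and `count_of_class` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Class indicator: tells which class a row of the single table represents. */
enum vehicle_class { CLASS_TRUCK, CLASS_CAR, CLASS_BICYCLE };

/* One row of the consolidated table: shared columns plus every subclass column. */
struct vehicle_row {
    enum vehicle_class class_indicator;
    int  vid;
    int  last_maint;
    int  load_cap;
    int  num_axles;      /* meaningful for Truck rows only */
    int  num_pass;       /* meaningful for Car rows only */
    char location[32];   /* meaningful for Bicycle rows only */
};

/* Map the indicator value to the class it stands for. */
const char *class_name(const struct vehicle_row *row)
{
    assert(row != NULL);
    switch (row->class_indicator) {
    case CLASS_TRUCK:   return "Truck";
    case CLASS_CAR:     return "Car";
    case CLASS_BICYCLE: return "Bicycle";
    }
    return "Unknown";
}

/* Query one class by filtering on the indicator; no join is needed. */
size_t count_of_class(const struct vehicle_row *rows, size_t n,
                      enum vehicle_class wanted)
{
    size_t matches = 0;

    assert(rows != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (rows[i].class_indicator == wanted)
            matches++;
    return matches;
}
```

The cost of this layout is visible in the row type: every row carries columns that apply to other classes and are simply unused.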
Class Types
The OracleAS TopLink inheritance hierarchy includes three types of classes:
• Root Class
• Branch Class
• Leaf Class
Figure 3-9 Inheritance Hierarchy Class Types
Root Class
The root class stores information for all instantiable classes in its subclass hierarchy. By default,
queries performed on the root class return instances of the root class and its instantiable
subclasses. However, you can also configure the root class to return only instances of itself,
without instances of its subclasses when queried. All class types beneath the root class inherit
from the root class.
Branch Class
Branch classes have a persistent superclass and subclasses. By default, queries performed on the
branch class return instances of the branch class and any of its subclasses. As with the root class,
you can configure the branch class to return only instances of itself, without instances of its
subclasses when queried. All classes below the branch class inherit attributes from the branch
class, including any attributes the branch class inherits from classes above it in the hierarchy.
Leaf Class
Leaf classes have a persistent superclass in the hierarchy, but do not have subclasses. Queries
performed on the leaf class return only instances of the leaf class.
What is Object?
An object consists of an entity and attributes that describe the state of a real-world object,
together with the actions associated with that object.
Characteristics of Object
Some important characteristics of an object are:
1. Object name
• The name is used to refer to different objects in the program.
2. Object identifier
• This is the system-generated identifier that is assigned when a new object is created.
3. Structure of object
• The structure defines how the object is constructed using a constructor.
• In an object-oriented database, the state of a complex object can be constructed from other
objects by using certain type constructors.
• Formally, an object is represented as a triple (i, c, v), where 'i' is the object identifier, 'c' is
the type constructor and 'v' is the current value of the object.
4. Transient object
• In an OOPL, objects that exist only during program execution are called transient objects.
For example: variables in an OOPL
5. Persistent objects
• An object that continues to exist after the program has finished executing (or has terminated)
is called a persistent object. Object-oriented databases can store objects in secondary memory.
Object identity
• Every object has a unique identity. In an object-oriented system, an OID is assigned to an
object when it is created.
• In an RDBMS, identity is value based: the primary key provides the uniqueness of each tuple in
a relation. A primary key is unique only within that relation, not across the entire system. Because
the primary key is chosen from the attributes of the relation, identity depends on the object's state.
• In an OODBMS, OIDs are variable names or pointers.
Properties of OID
1. Uniqueness: No two objects in the system have the same OID; the OID is generated
automatically by the system.
2. Invariant: An OID cannot be changed throughout the object's entire lifetime.
3. Invisible: The OID is not visible to the user.
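The (i, c, v) representation and the OID properties can be sketched in C as follows. This is a toy illustration under stated assumptions: the names are hypothetical, the value is reduced to a single int, the counter-based generator is just one simple way to obtain unique identifiers, and the `object_oid` accessor exists only so the behaviour can be observed (a real system would keep the OID hidden from users).

```c
#include <assert.h>
#include <stdlib.h>

enum type_constructor { CTOR_ATOM, CTOR_TUPLE, CTOR_SET };

struct object {
    unsigned long         oid;    /* i: system-generated identifier */
    enum type_constructor ctor;   /* c: type constructor */
    int                   value;  /* v: current value (a single int for brevity) */
};

static unsigned long next_oid = 1;   /* private counter: the only source of OIDs */

/* Create an object; the system, not the caller, chooses the OID. */
struct object *object_create(enum type_constructor ctor, int value)
{
    struct object *o = malloc(sizeof *o);
    if (o != NULL) {
        o->oid = next_oid++;         /* uniqueness: never handed out twice */
        o->ctor = ctor;
        o->value = value;
    }
    return o;
}

/* Demonstration-only accessor. */
unsigned long object_oid(const struct object *o)
{
    assert(o != NULL);
    return o->oid;
}

/* Changing the state leaves the identity untouched (invariance). */
void object_set_value(struct object *o, int value)
{
    assert(o != NULL);
    o->value = value;
}

/* Identity comparison: by OID, never by value. */
int same_object(const struct object *a, const struct object *b)
{
    assert(a != NULL && b != NULL);
    return a->oid == b->oid;
}
```

Two objects created with the same value are still different objects, and updating a value does not alter the OID. This is the contrast with value-based identity through a primary key.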
Attributes
Attributes are nothing but the properties of objects in the system.
Example: Employee can have attribute 'name' and 'address' with assigned values as:
Attribute Value
Name Radha
Address Pune
ID 07
Type of Attributes
The three types of attributes are as follows:
-
1. Simple attributes
Attributes can be of primitive data type such as, integer, string, real etc. which can take literal
value.
Example: 'ID' is simple attribute and value is 07.
2. Complex attributes
Attributes which consist of collections or reference of other multiple objects are called as
complex attributes.
Example: Collection of Employees consists of many employee names.
3. Reference attributes
Attributes that represent a relationship between objects and consist of value or collection of
values are called as reference attributes.
Example: Manager is reference of staff object.
The rich variety of data types in an ORDBMS offers a database designer many opportunities for
a more efficient design. As discussed in previous sections, an ORDBMS supports a number of
better solutions than an RDBMS and other databases.
• An ORDBMS allows a video to be stored as a user-defined abstract data type (ADT) object,
with methods written to capture any special manipulation that a user wishes to perform. Allowing
users to define arbitrary new data types is a key feature of ORDBMSs. The ORDBMS allows
users to store and retrieve objects of type jpeg-image, which stores a compressed image
representing a single frame of film, just like an object of any other type, such as integer.
-
Common Implementation Challenges
There are a range of different issues and challenges that need to be addressed for successful
program implementation. Some of these challenges are particularly unique to rural communities.
Common challenges are described below, along with suggestions on how to address these
challenges:
• Resources and sustainability: Funding, technological, and human resources are
typically limited in rural communities. It can be particularly difficult to generate enough
start-up funds to sustain the program as it begins. Having a network of stakeholders and
partners in the community may be beneficial for providing resources and support for a
program.
• Geographic limitations: Geography influences a number of factors that can challenge
program implementation and operations (e.g., isolation and weather). Depending on the
type of program, setting, frequency of participation, and type of activities involved, these
challenges can become significant. This becomes a particularly important issue when
there is limited transportation access for the target population. This requires changes in
-
approaches and program design that take into account lengthy travel times, availability of
transportation, and opportunity to offer the program remotely or through other
technologies.
• Recruiting staff: Rural communities that are implementing rural health programs that
require physicians, dietitians, or physical therapists for example have faced barriers to
recruiting appropriately trained staff. Some programs work with volunteer or retired
practitioners, or students.
• Hard-to-reach populations: The priority population may be highly mobile. For
example, one rural health program was striving to provide care to two hard-to-reach
populations: Hispanic poultry workers and migrant farm workers. These populations
travel from camp to camp during different times each year, making it challenging to
reach them. Several rural health programs use mobile vans to provide traveling health
services.
• Cultural and social issues: A number of challenges to program success arise out of
unique cultural and social norms that influence expectations about the program and its
likelihood of success. Examples of these types of issues include:
o Deeply rooted traditions and cultures around food
o Lack of trust for medical professionals and outsiders
o Social beliefs around certain behaviors
It is critical for program implementers to make a conscious effort to recognize and
understand the population their program will serve, so they can develop appropriate
strategies. Involving members from the target population throughout the whole process
can help achieve cultural competency, encourage participation, and reduce social stigmas.
Implementers also may need to adapt materials, such as information packets, to ensure all
program materials are culturally appropriate.
• Language: Rural health programs may target communities with a large Hispanic or
immigrant population. Such programs need to ensure that their staff understands the
importance of providing services or public health education in a culturally appropriate
manner. In addition, programs may need to either employ staff proficient in Spanish or
other languages.
• Keeping the community motivated: Regardless of the community and populations
targeted in the program efforts, an awareness of health concerns needs to exist and
individual and organizational commitments are necessary toward making the changes
needed to address those concerns. It’s important for program planners to understand that
success will depend on conducting education and outreach efforts to determine
community members’ expectations about program impact and to motivate them to
achieve better health outcomes.
-
Difference between RDBMS, ORDBMS and OODBMS
Compare RDBMS with ORDBMS.
1. RDBMS: Relational Database Management Systems
   ORDBMS: Object-Relational Database Systems
2. RDBMS: Based on Relational Data Model
   ORDBMS: Based on Object Data Model (ODM)
3. RDBMS: Dominant model
   ORDBMS: Gaining popularity
4. RDBMS: (no entry)
   ORDBMS: ORDBMS is an attempt to extend relational database systems to provide a bridge
   between the relational and object-oriented paradigms.
5. RDBMS: RDBMS support a small, fixed collection of data types (e.g. integers, dates, strings),
   which has proven adequate for traditional application domains such as administrative
   data processing.
   ORDBMS: ORDBMS is based on Object-Oriented Database systems and Relational Database
   systems and is aimed at application domains where complex objects play a central role.
6. RDBMS: Supports Structured Query Language (SQL)
   ORDBMS: Supports Object Query Language (OQL). The SQL:1999 standard extends SQL to
   incorporate support for the object-relational model of data.
7. RDBMS products:
   • IBM's DB2
   • Informix
   • Oracle
   • Sybase
   • Microsoft's Access
   • Fox Base
   • Paradox
   • Tandem
   • Teradata
   Object-oriented model products:
   • Objectstore
   • Versant
   Object-relational model products: used in DBMS products from
   • IBM
   • Informix
   • Objectstore
   • Oracle
   • Versant
   • Others
8. RDBMS: Supports standard data types and additional data types
   ORDBMS: Supports standard data types and new, richer data types. The new richer data
   types supported are:
   • User-defined data types that support image, voice and video footage, which must be
   stored in the database
   • Inheritance data types, to inherit the commonality between different types (e.g. to
   inherit some features of image objects while defining compressed image objects and
   low-resolution image objects)
   • Object identity data types, such as references or pointers to objects (e.g. video), which
   give objects a unique object identity that can be used to refer or point to them from
   elsewhere in the data
9 Case Scenario : Case Scenario :
9. Compare the similarities and differences between OODBMS and ORDBMS. In particular
compare OQL and SQL : 1999 and discuss the underlying data model.
OODBMS : Object-Oriented Database Management Systems
ORDBMS : Object-Relational Database Management Systems
Similarities
-
Both support user-defined ADTs, structured types, object identity and reference types, and
inheritance.
Both support a query language: OODBMSs support ODL/OQL, while ORDBMSs support an
extended form of SQL.
ORDBMSs consciously try to add OODBMS features to an RDBMS, and OODBMSs in turn have
developed query languages based on relational query languages.
Both provide DBMS functionality such as concurrency control and recovery.
Differences
1. OODBMS: OODBMSs aim to achieve seamless integration with a programming language
   such as C++ or Java.
   ORDBMS: Such integration is not an important goal for an ORDBMS.
2. OODBMS: An OODBMS is aimed at applications where an object-centric viewpoint is
   appropriate.
   ORDBMS: An ORDBMS is optimized for applications in which large data collections are the
   focus, even though objects may have a rich structure and be fairly large.
3. OODBMS: The query facilities of OQL are not supported efficiently in most OODBMSs.
   ORDBMS: The query facilities are the centerpiece of an ORDBMS.
XML
XML stands for Extensible Markup Language. It is a set of rules that define tags that break a
document into parts and identify the parts of the document. These tags define a syntax that can
then be used in combination with an XSL stylesheet to reconstruct the document.
The tags that are defined must follow the XML rules, but their content and arrangement can be
anything the developer wants. A file of XML text, arranged to represent a certain document, is
called an XML application. Oracle Access Manager OutputXML is an XML application,
designed to create HTML which will in turn present Oracle Access Manager pages to a browser.
-
Oracle Access Manager also uses XML as a structured way to provide some parameters that
control its operation. This is a different use than for OutputXML, but since the applications are
much shorter and the XML syntax rules are followed here as well, one of these files will serve as
an example. For example, frontpageadminparams.xml has the following content:
This indented presentation, showing the tag levels, is an automatic feature of Microsoft's Internet
Explorer. XML editors will also show the file in this way.
Some important parts of this file are the following:
<?xml version="1.0"?>
This, the XML declaration, is the first line of any well-formed XML application. Internet
Explorer and some editors will not show the file as formatted XML unless this line is present.
The starting and ending ? make this an XML processing instruction. version="1.0" is an
attribute. Attributes are name-value pairs separated by an equals sign, which provide additional
information for the instruction. XML 1.0 is the version in general use; XML 1.1 also exists but is rarely needed.
ParamsCtlg is a tag, which starts the definition of the first element in the XML application. The
definition ends with the matching closing tag, which has the same form except it uses a / before
the tag name: </ParamsCtlg>
Everything between the starting and ending tags defines the element ParamsCtlg. Nested within
it is the element CompoundList, which has elements nested within it, and so on. An important
attribute is xmlns, which stands for XML namespace. This specifies an owner and possible
reference source for this XML application. We identify ourselves as creators of this application.
-
The technically precise way to write this element would have been
ParamName="top_frame" Value="_top"
However, when the definition is a short one like this, the XML rules allow use of an abbreviated
closing tag. /> indicates the closing tag for the immediately preceding start tag.
The attributes ParamName="top_frame" and Value="_top" provide the useful content of the file,
which is the name of a variable used by Oracle Access Manager and its value.
XML Schema is commonly known as XML Schema Definition (XSD). It is used to describe
and validate the structure and the content of XML data. XML schema defines the elements,
attributes and data types. Schema element supports Namespaces. It is similar to a database
schema that describes the data in a database.
Syntax
You need to declare a schema in your XML document as follows −
Example
The following example shows how to use schema −
The basic idea behind XML Schemas is that they describe the legitimate format that an XML
document can take.
Elements
As we saw in the XML - Elements chapter (https://www.tutorialspoint.com/xml/xml_elements.htm),
elements are the building blocks of an XML document.
An element can be defined within an XSD as follows −
Definition Types
You can define XML schema elements in the following ways −
Simple Type
A simple type element can contain only text, not child elements or attributes. Some of the predefined simple types
are: xs:integer, xs:boolean, xs:string, xs:date. For example −
Complex Type
A complex type is a container for other element definitions. This allows you to specify which
child elements an element can contain and to provide some structure within your XML
documents. For example −
In the above example, the Address element consists of child elements. It is a container for
other definitions, which allows you to build a simple hierarchy of elements in the XML
document.
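The difference between the two definition types can be sketched as follows. The schema text is an illustrative assumption (the element phone in particular is invented), and Python's standard library only parses the schema here; it does not validate documents against it.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Illustrative schema: "phone" is an assumed simple type element,
# "Address" a complex type element with two children.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="phone" type="xs:integer"/>
  <xs:element name="Address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="company" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def describe(schema_text):
    """Map each top-level element to its type name or to its list of children."""
    result = {}
    for element in ET.fromstring(schema_text).findall(XS + "element"):
        children = element.findall(f"{XS}complexType/{XS}sequence/{XS}element")
        if children:
            # complex type: a container for other element definitions
            result[element.get("name")] = [c.get("name") for c in children]
        else:
            # simple type: text content of the named built-in type
            result[element.get("name")] = element.get("type")
    return result


summary = describe(SCHEMA)
```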
Global Types
With the global type, you can define a single type in your document, which can be used by all
other references. For example, suppose you want to generalize the person and company for
different addresses of the company. In such a case, you can define a general type as follows −
Now let us use this type in our example as follows −
Instead of having to define the name and the company twice (once for Address1 and once
for Address2), we now have a single definition. This makes maintenance simpler, i.e., if you
decide to add "Postcode" elements to the address, you need to add them at just one place.
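A minimal sketch of this reuse, assuming a named type called AddressType (a hypothetical name) that both Address1 and Address2 refer to. The helper resolves each element's type reference to show that both share the single definition.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# "AddressType" is a hypothetical type name; Address1, Address2, name and
# company are the names used in the text.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="company" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="Address1" type="AddressType"/>
  <xs:element name="Address2" type="AddressType"/>
</xs:schema>"""


def resolve_children(schema_text):
    """Return {element name: child names taken from its named global type}."""
    root = ET.fromstring(schema_text)
    named_types = {
        t.get("name"): [e.get("name") for e in t.iter(XS + "element")]
        for t in root.findall(XS + "complexType")
    }
    return {
        e.get("name"): named_types[e.get("type")]
        for e in root.findall(XS + "element")
    }


resolved = resolve_children(SCHEMA)
```

Adding a Postcode element inside AddressType would change the result for both addresses at once, which is the maintenance benefit described above.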
Querying and Transformation
Given the increasing number of applications that use XML to exchange, mediate, and store data,
tools for effective management of XML data are becoming increasingly important. In particular,
tools for querying and transformation of XML data are essential to extract information from
large bodies of XML data, and to convert data between different representations (schemas) in
XML. Just as the output of a relational query is a relation, the output of an XML query can be an
XML document. As a result, querying and transformation can be combined into a single tool.
Several languages provide increasing degrees of querying and transformation capabilities:
• XPath is a language for path expressions, and is actually a building block for the remaining two
query languages.
• XSLT was designed to be a transformation language, as part of the XSL style sheet system,
which is used to control the formatting of XML data into HTML or other print or display
languages. Although designed for formatting, XSLT can generate XML as output, and can
express many interesting queries. Furthermore, it is currently the most widely available language
for manipulating XML data.
• XQuery has been proposed as a standard for querying of XML data. XQuery combines features
from many of the earlier proposals for querying XML, in particular the language Quilt.
A tree model of XML data is used in all these languages. An XML document is modeled as
a tree, with nodes corresponding to elements and attributes. Element nodes can have children
nodes, which can be subelements or attributes of the element. Correspondingly, each node
(whether attribute or element), other than the root element, has a parent node, which is an
element. The order of elements and attributes in the XML document is modeled by the ordering
of children of nodes of the tree. The terms parent, child, ancestor, descendant, and siblings are
interpreted in the tree model of XML data.
The text content of an element can be modeled as a text node child of the element. Elements
containing text broken up by intervening subelements can have multiple text node children. For
instance, an element containing “this is a <bold>wonderful</bold> book” would have a
subelement child corresponding to the element bold and two text node children corresponding to
“this is a” and “book”. Since such structures are not commonly used in database data, we shall
assume that elements do not contain both text and subelements.
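The mixed-content case can be inspected with a short sketch. It assumes Python's xml.etree.ElementTree, which records the two text nodes as the text attribute of the parent and the tail attribute of the subelement rather than as separate child nodes; the wrapper element name title is hypothetical.

```python
import xml.etree.ElementTree as ET

# <title> is a hypothetical wrapper; the mixed content is the example in the text.
element = ET.fromstring("<title>this is a <bold>wonderful</bold> book</title>")

bold = element.find("bold")
leading_text = element.text   # text node before the subelement
bold_text = bold.text         # text inside the subelement
trailing_text = bold.tail     # text node after the subelement
```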
XPath addresses parts of an XML document by means of path expressions. The language can
be viewed as an extension of the simple path expressions in object-oriented and object-relational
databases (See Section 9.5.1).
A path expression in XPath is a sequence of location steps separated by “/” (instead of the “.”
operator that separates steps in SQL:1999). The result of a path expression is a set of values.
For instance, on the document in Figure 10.8, the XPath expression
would return the same names, but without the enclosing tags.
Like a directory hierarchy, the initial ’/’ indicates the root of the document. (Note that this is an
abstract root “above” <bank-2>, which is the document tag.) Path expressions are evaluated from
left to right. As a path expression is evaluated, the result of the path at any point consists of a set
of nodes from the document.
When an element name, such as customer, appears before the next ’/’, it refers to all elements of
the specified name that are children of elements in the current element set. Since multiple
children can have the same name, the number of nodes in the node set can increase or decrease
with each step. Attribute values may also be accessed, using the “@” symbol. For instance,
/bank-2/account/@account-number returns a set of all values of account-number attributes of
account elements. By default, IDREF links are not followed; we shall see how to deal with
IDREFs later.
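The step-by-step evaluation can be tried out with a small sketch. It assumes Python's xml.etree.ElementTree, which implements only a subset of XPath: paths are written relative to the root element (so /bank-2/customer/name becomes customer/name), and attribute values are read from the matched elements instead of being selected with “@”. The sample data is invented; only the element and attribute names come from the text.

```python
import xml.etree.ElementTree as ET

# Invented sample data; bank-2, account, customer and account-number
# are the names used in the text.
DOC = """<bank-2>
  <account account-number="A-401"><balance>500</balance></account>
  <account account-number="A-402"><balance>900</balance></account>
  <customer><name>Alice</name></customer>
  <customer><name>Bob</name></customer>
</bank-2>"""

root = ET.fromstring(DOC)  # root is the bank-2 element itself

# Step "customer", then step "name": each step selects children of the
# nodes produced by the previous step, left to right.
name_elements = root.findall("customer/name")
names = [n.text for n in name_elements]

# Equivalent of /bank-2/account/@account-number with this library:
# match the account elements, then read the attribute from each.
account_numbers = [a.get("account-number") for a in root.findall("account")]
```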
XPath supports a number of other features:
• Selection predicates may follow any step in a path, and are contained in square brackets.
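A selection predicate can be sketched as follows, again assuming Python's xml.etree.ElementTree. This library supports equality predicates on child text and on attributes, but not numeric comparisons, so a full XPath engine would be needed for those. The branch-name element and the data values are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Invented sample data; branch-name is an assumed child element.
DOC = """<bank-2>
  <account account-number="A-401"><branch-name>Downtown</branch-name></account>
  <account account-number="A-402"><branch-name>Perryridge</branch-name></account>
  <account account-number="A-403"><branch-name>Downtown</branch-name></account>
</bank-2>"""

root = ET.fromstring(DOC)

# Predicate on the text of a child element.
downtown = root.findall("account[branch-name='Downtown']")
downtown_numbers = [a.get("account-number") for a in downtown]

# Predicate on an attribute value.
one_account = root.findall("account[@account-number='A-402']")
```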
The Application Programming Interface
An Application Programming Interface (API) comprises software building tools, subroutine
definitions, and communication protocols that facilitate interaction between systems. An
API may be for a database system, operating system, computer hardware or a web-based system.
An API makes it simpler for programmers to use certain technologies when building
applications. An API can include specifications for data structures, variables,
routines, object classes, remote calls, etc.
A diagram that shows the API in the system is as follows −
Uses of Application Programming Interfaces
APIs are useful in many scenarios. Some of these are described as follows −
Operating Systems
The interface between an operating system and an application is specified with an API. For
example, POSIX specifies common APIs so that an application written for one POSIX-conformant
operating system can be compiled to run on another POSIX-conformant operating system.
Libraries and Frameworks
Often APIs are related to software libraries. The API describes the behaviour of the system
while the libraries actually implement that behaviour. A single API can have multiple libraries, as
it can have many different implementations. Sometimes, an API can be linked to a software
framework as well. A framework is based on many libraries that implement different APIs
whose behaviour is built into the framework.
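The split between an API and the libraries that implement it can be sketched in a few lines. All names here (KeyValueStore, DictStore, ListStore, roundtrip) are hypothetical and chosen only for illustration.

```python
from typing import Protocol


class KeyValueStore(Protocol):
    """The API: the behaviour callers may rely on, with no implementation."""

    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> str: ...


class DictStore:
    """One implementation, backed by a dictionary."""

    def __init__(self) -> None:
        self._data = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> str:
        return self._data[key]


class ListStore:
    """A second implementation of the same API, backed by a list of pairs."""

    def __init__(self) -> None:
        self._pairs = []

    def put(self, key: str, value: str) -> None:
        self._pairs = [(k, v) for k, v in self._pairs if k != key]
        self._pairs.append((key, value))

    def get(self, key: str) -> str:
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)


def roundtrip(store: KeyValueStore) -> str:
    """Application code written against the API only."""
    store.put("k", "v")
    return store.get("k")
```

The function roundtrip works unchanged with either class, because it depends on the described behaviour and not on a particular library.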
Web APIs
The application programming interfaces for web servers or web browsers are known as web
APIs. These web APIs can be server side or client side.
Server side web APIs have an interface that contains endpoints which lead to request-response
message systems whose messages are written in JSON or XML. Most of this is achieved using an HTTP web
server. Client side web APIs are used to extend the functionality of a web browser. Earlier they
were in the form of plug-in browser extensions but now JavaScript bindings are used.
Remote APIs
Remote application programming interfaces allow programmers to manipulate remote
resources. Most remote APIs are required to maintain object abstraction in object-oriented
programming. This can be done by executing a method call locally, which then invokes the
corresponding method call on a remote object and returns the result locally as a return value.
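The local-call-to-remote-call idea can be sketched without a real network. In this toy version the function transport stands in for the network by passing JSON text, and the class and method names (AccountService, RemoteProxy, balance) are hypothetical.

```python
import json


class AccountService:
    """The 'remote' object. In a real system it would live in another process."""

    def __init__(self):
        self._balances = {"A-401": 500}

    def balance(self, account_number):
        return self._balances[account_number]


def transport(server, request_text):
    """Stand-in for the network: decode the request, call the method, encode the reply."""
    request = json.loads(request_text)
    result = getattr(server, request["method"])(*request["args"])
    return json.dumps({"result": result})


class RemoteProxy:
    """Local stub: each method call becomes a message, and the reply is returned."""

    def __init__(self, server):
        self._server = server

    def __getattr__(self, method_name):
        # Called only for names not found on the proxy itself.
        def call(*args):
            message = json.dumps({"method": method_name, "args": args})
            reply = transport(self._server, message)
            return json.loads(reply)["result"]

        return call


proxy = RemoteProxy(AccountService())
remote_balance = proxy.balance("A-401")  # looks like a local call
```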
Release policies for API
The policies for releasing APIs are private, partner and public. Details about these are given as
follows −
Private release policies
The application programming interfaces released under this policy are for private internal use by
the company.
Partner release policies
The application programming interfaces released under this policy can be used by the company
and its specific business partners. This means that the companies can control the quality of the
API, by monitoring the apps which have access to it.
Public release policies
The application programming interfaces released under public release policies are freely
available to the public. Some examples of this are the Microsoft Windows API and Apple’s Cocoa and
Carbon APIs.
Storage of XML data
• Character
• Relational (shredded)
• Native XML
Character
Storage options
◼ Large character fields in DBMS
◼ Flat files
◼ .xml files
Fast insert & retrieval
Poor search
Relational
Data still stored as character
Portions of the data extracted into additional relational tables
Increased parse time
Increased search capabilities
Native XML
Exclusive XML DBMS
◼ Sedna
◼ Timber
Integrated XML DBMS
◼ DB2
◼ Oracle
Native XML Benefits
XML messages stored in their original format
Documents can be transformed straight from the database via XPath or XSLT.
Increased search capabilities for documents that must be stored as XML.
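The trade-off between character and relational (shredded) storage can be sketched with Python's built-in sqlite3 module. The table names, the order message format and the data are all invented for illustration; a production system would choose its own schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented sample messages.
MESSAGES = [
    '<order id="1"><customer>Alice</customer><total>30</total></order>',
    '<order id="2"><customer>Bob</customer><total>75</total></order>',
]

conn = sqlite3.connect(":memory:")
# Character storage: the whole document kept in one text column.
conn.execute("CREATE TABLE order_doc (id INTEGER PRIMARY KEY, body TEXT)")
# Shredded storage: selected portions copied into relational columns.
conn.execute(
    "CREATE TABLE order_index (id INTEGER PRIMARY KEY, customer TEXT, total INTEGER)"
)

for text in MESSAGES:
    doc = ET.fromstring(text)  # the extra parse time paid on insert
    order_id = int(doc.get("id"))
    conn.execute("INSERT INTO order_doc VALUES (?, ?)", (order_id, text))
    conn.execute(
        "INSERT INTO order_index VALUES (?, ?, ?)",
        (order_id, doc.findtext("customer"), int(doc.findtext("total"))),
    )

# Search runs on the shredded columns, then returns the original document.
rows = conn.execute(
    "SELECT d.body FROM order_index i JOIN order_doc d ON d.id = i.id "
    "WHERE i.total > 50"
).fetchall()
```

Without the order_index table, the same search would have to parse or pattern-match every stored document, which is the "poor search" cost of pure character storage.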
XML Applications
We've seen a lot of theory in this chapter, so I'm going to spend the rest of this chapter taking a
look at how XML is used today in the real world. The world of XML is huge these days; in fact,
XML is now used internally even in Netscape and Microsoft products, as well as installations of
programming languages such as Perl. You can find a good list of organizations that produce their
own XML-based languages.
It's useful and encouraging to see how XML is being used today in these XML-based languages.
Here's a new piece of terminology: As you know, XML is a metamarkup language, so it's
actually used to create languages. The languages so created are applications of XML; as a result,
they're called XML applications.
Note that the term XML application means an application of XML to a specific domain, such as
MathML, the mathematics markup language; it does not refer to a program that uses XML (a fact
that causes a lot of confusion among people who know nothing about XML).
Thousands of XML applications are around today, and we'll see some of them here. You can see
the advantage to various groups when defining their own markup languages. For example,
physicists or chemists can use the symbols and graphics of their discipline in customized
browsers. In fact, I'll start with Chemical Markup Language (CML).
• Root element is .
• contains a and an element.
• contains one or more element.
• contains one <question>, at least 2 <answer>s, and no more than 6 <answer>s.
• One of the answers must have an attribute "correct=y" which indicates the correct
answer.
• <question> might appear before or after all the <answer>s.
The Example Quiz
In which continent is the country Japan located?
Asia
Europe
Africa
America
Tuna
Cow
Whale
Lobster
Which one cannot swim?
How many points are on a hexagon?
5
6
7
8
• A DTD declaration for that XML spec:
question))>
XML with DTD
Problems:
• Hard to limit the number of <answer>s to 6 maximum. One way would be to declare it
like:
(answer, answer, answer)|
(answer, answer, answer, answer)|
(answer, answer, answer, answer, answer)|
(answer, answer, answer, answer,answer,answer)))>
but do you want to?
Even with that, you still have to handle the requirement where <question> might appear
after the <answer>s, so it will be:
(answer, answer, answer)|
(answer, answer, answer, answer)|
(answer, answer, answer, answer, answer)|
(answer, answer, answer, answer,answer,answer)))|
(((answer, answer)|
(answer, answer, answer)|
(answer, answer, answer, answer)|
(answer, answer, answer, answer, answer)|
(answer, answer, answer, answer, answer, answer)), question))>
• The DTD does not limit the number of <answer>s that have the "correct" attribute. So
a single question might have 2 or more correct answers. Can this be solved without
changing the structure of the XML? Probably not.
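To see how quickly the enumerated content model grows, the following sketch generates it for any bounds. It mirrors the enumeration style shown above and is not a general DTD tool; the element name item in the final declaration is a hypothetical placeholder.

```python
def answer_group(count):
    """Return a DTD sequence of `count` answer elements, e.g. (answer, answer)."""
    return "(" + ", ".join(["answer"] * count) + ")"


def content_model(min_answers=2, max_answers=6):
    """Build the model: question before or after min..max answers."""
    groups = "|".join(
        answer_group(n) for n in range(min_answers, max_answers + 1)
    )
    return f"((question, ({groups}))|(({groups}), question))"


# "item" is a hypothetical element name used only for this illustration.
declaration = f"<!ELEMENT item {content_model()}>"
```

Each extra allowed answer adds another alternative to both halves of the model, which is why the hand-written version becomes unwieldy.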
• Create a Schema declaration for that XML spec:
Attempt 1:
Problems:
• <question> must appear before <answer>s in this schema because the schema declares
them in a fixed order.
• The attribute "correct" can be assigned any value, while we only want to
accept "y".
• This schema allows more than 1 correct <answer>. Again, it might not be possible to
create a schema which prevents this.
Attempt 2:
This schema is much cleaner than before. We used a lot of unnamed types because we don't
need to reuse the types. "correctType" is a type that contains only 1 valid value, which is "y".
The declaration allows <question> to appear before or
after the <answer>s.
Problems:
• This schema allows more than 1 correct <answer>.
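Rules that neither the DTD nor the schema can express, such as "exactly one correct answer", can be checked in application code after parsing. The sketch below assumes a container element named item (a hypothetical name) and marks one answer as correct purely for illustration.

```python
import xml.etree.ElementTree as ET


def check_item(item):
    """Return a list of rule violations for one question-and-answers element."""
    tags = [child.tag for child in item]
    answers = item.findall("answer")
    problems = []
    if tags.count("question") != 1:
        problems.append("need exactly one question")
    elif tags[0] != "question" and tags[-1] != "question":
        problems.append("question must come before or after all answers")
    if not 2 <= len(answers) <= 6:
        problems.append("need 2 to 6 answers")
    if sum(a.get("correct") == "y" for a in answers) != 1:
        problems.append("need exactly one correct answer")
    return problems


# "item" is a hypothetical element name; the correct="y" marks are assumptions.
good = ET.fromstring(
    "<item><question>Which one cannot swim?</question>"
    "<answer>Tuna</answer><answer correct='y'>Cow</answer>"
    "<answer>Whale</answer><answer>Lobster</answer></item>"
)
bad = ET.fromstring(
    "<item><answer correct='y'>5</answer><answer correct='y'>6</answer>"
    "<question>How many points are on a hexagon?</question></item>"
)
```

Here check_item(good) reports no problems, while check_item(bad) reports the duplicate correct answer that both schema attempts let through.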
UNIT-5
NOSQL
INTRODUCTION TO NOSQL:
NoSQL, originally referring to "non-SQL" or "non-relational", is a kind of database that provides a mechanism
for storage and retrieval of data. This data is modeled by means other than the tabular relations used
in relational databases. Such databases came into existence in the late 1960s, but did not obtain the
NoSQL moniker until a surge of popularity in the early twenty-first century. NoSQL databases are
used in real-time web applications and big data, and their use is increasing over time. NoSQL
systems are also sometimes called "Not only SQL" to emphasize the fact that they may support SQL-like
query languages.
The advantages of a NoSQL database include simplicity of design, simpler horizontal scaling to clusters of machines,
and finer control over availability. The data structures used by NoSQL databases are different from
those used by default in relational databases, which makes some operations faster in NoSQL. The
suitability of a given NoSQL database depends on the problem it should solve. Data structures used
by NoSQL databases are sometimes also viewed as more flexible than relational database tables.
Many NoSQL stores compromise consistency in favor of availability, speed and partition tolerance.
Barriers to the greater adoption of NoSQL stores include the use of low-level query languages, lack
of standardized interfaces, and huge previous investments in existing relational databases. Most
NoSQL stores lack true ACID (Atomicity, Consistency, Isolation, Durability) transactions but a few
databases, such as MarkLogic, Aerospike, FairCom c-treeACE, Google Spanner (though technically
a NewSQL database), Symas LMDB, and OrientDB have made them central to their designs.
Most NoSQL databases offer a concept of eventual consistency, in which database changes are
propagated to all nodes eventually, so queries for data might not return updated data immediately or might result
in reading data that is not accurate, a problem known as stale reads. Also, some NoSQL
systems may exhibit lost writes and other forms of data loss. Some NoSQL systems provide
concepts such as write-ahead logging to avoid data loss. For distributed transaction processing across
multiple databases, data consistency is an even bigger challenge. This is difficult for both NoSQL
and relational databases. Even current relational databases do not allow referential integrity
constraints to span databases. There are few systems that maintain bo