©silberschatz, korth and sudarshan23.1database system concepts, 5 th ed. 1 chapter 23: advanced...

58
1 ©Silberschatz, Korth and Sudarsha 23.1 atabase System Concepts, 5 th Ed. Chapter 23: Chapter 23: Advanced Application Development Advanced Application Development

Upload: bathsheba-thompson

Post on 25-Dec-2015

224 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

1

©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5th Ed.

Chapter 23: Chapter 23: Advanced Application DevelopmentAdvanced Application Development

Page 2: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

2

©Silberschatz, Korth and Sudarshan23.2Database System Concepts, 5th Ed.

Chapter 1: Introduction Part 1: Relational databases

Chapter 2: Relational Model Chapter 3: SQL Chapter 4: Advanced SQL Chapter 5: Other Relational Languages

Part 2: Database Design Chapter 6: Database Design and the E-R Model Chapter 7: Relational Database Design Chapter 8: Application Design and Development

Part 3: Object-based databases and XML Chapter 9: Object-Based Databases Chapter 10: XML

Part 4: Data storage and querying Chapter 11: Storage and File Structure Chapter 12: Indexing and Hashing Chapter 13: Query Processing Chapter 14: Query Optimization

Part 5: Transaction management Chapter 15: Transactions Chapter 16: Concurrency control Chapter 17: Recovery System

Database System ConceptsDatabase System Concepts

Part 6: Data Mining and Information Retrieval Chapter 18: Data Analysis and Mining Chapter 19: Information Retreival

Part 7: Database system architecture Chapter 20: Database-System Architecture Chapter 21: Parallel Databases Chapter 22: Distributed Databases

Part 8: Other topics Chapter 23: Advanced Application Development Chapter 24: Advanced Data Types and New Applications Chapter 25: Advanced Transaction Processing

Part 9: Case studies Chapter 26: PostgreSQL Chapter 27: Oracle Chapter 28: IBM DB2 Chapter 29: Microsoft SQL Server

Online Appendices Appendix A: Network Model Appendix B: Hierarchical Model Appendix C: Advanced Relational Database Model

Page 3: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

3

©Silberschatz, Korth and Sudarshan23.3Database System Concepts, 5th Ed.

Part 8: Other topics Part 8: Other topics (Chapters 23 through 25).(Chapters 23 through 25).

Chapter 23: Advanced Application Development

covers performance benchmarks, performance tuning and standardization.

Chapter 24: Advanced Data Types and New Applications

covers advanced data types and new applications, including temporal data, spatial and geographic data, multimedia data, and issues in the management of mobile and personal databases.

Chapter 25: Advanced Transaction Processing

deals with advanced transaction processing. We discuss transaction-processing monitors, high-performance transaction systems, real-time transaction systems, and transactional workflows.

Page 4: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

4

©Silberschatz, Korth and Sudarshan23.4Database System Concepts, 5th Ed.

Table of ContentsTable of Contents

23.1 Performance Tuning

23.2 Performance Benchmarks

23.3 Standardization

23.4 Application Migration

23.5 Summary

Page 5: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

5

©Silberschatz, Korth and Sudarshan23.5Database System Concepts, 5th Ed.

DBMS

Application

Monitoring DBMS using Trigger

Limited Access by Authorization

WEB Interface: Servlet

JSP

Protecting DBMS by Security

User User

Web Browser

• Ch 8: Application Design & Development

Ch 23: Advanced Application Development

Performance

Standardization

Application Migration

Page 6: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

6

©Silberschatz, Korth and Sudarshan23.6Database System Concepts, 5th Ed.

Performance TuningPerformance Tuning Adjusting various parameters and design choices to improve system performance

for a specific application.

Tuning is best done by identifying bottlenecks & eliminating them. Can tune a database system at 3 levels:

Hardware

add disks to speed up I/O

add memory to increase buffer hits

move to a faster processor

Database system parameters

set buffer size to avoid paging of buffer

set checkpointing intervals to limit log size

system may have automatic tuning.

Higher level database design

such as the schema, indices and transactions (more later)

Page 7: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

7

©Silberschatz, Korth and Sudarshan23.7Database System Concepts, 5th Ed.

BottlenecksBottlenecks

Performance of most systems (at least before they are tuned) usually limited by performance of one or a few components These are called bottlenecks E.g. 80% of the code may take up 20% of time and 20% of code takes up 80%

of time Worth spending most time on 20% of code that take 80% of time Finding bottlenecks is very important

Bottlenecks may be in hardware or in software e.g. disks are very busy, CPU is idle

Removing one bottleneck often exposes another bottleneck De-bottlenecking consists of repeatedly finding bottlenecks, and removing them This is a heuristic

Page 8: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

8

©Silberschatz, Korth and Sudarshan23.8Database System Concepts, 5th Ed.

Identifying BottlenecksIdentifying Bottlenecks

Transactions request a sequence of services e.g. CPU, Disk I/O, locks

With concurrent transactions, transactions may have to wait for a requested service while other transactions are being served

Can model database as a queueing system with a queue for each service transactions repeatedly do the following

request a service wait in queue for the service get serviced

Bottlenecks in a database system typically show up as very high utilizations of a particular service Bottlenecks shows very long queues E.g. Disk utilization vs CPU utilization 100% utilization leads to very long waiting time:

Rule of thumb:

– design a system for about 70% utilization at peak load

– utilization over 90% should be avoided

Page 9: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

9

©Silberschatz, Korth and Sudarshan23.9Database System Concepts, 5th Ed.

Finding Bottlenecks: Finding Bottlenecks: Queues in aQueues in a Database SystemDatabase System

Page 10: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

10

©Silberschatz, Korth and Sudarshan23.10Database System Concepts, 5th Ed.

Tunable ParametersTunable Parameters

The lowest level is at the hardware level

Adding disks or using a RAID system

Adding more memory

Moving to a faster processor

The second level consists of the DBMS parameters

Buffer size and Checkpointing intervals

The third level is the highest level

Schema and Transactions

Indices and Materialized views

Page 11: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

11

©Silberschatz, Korth and Sudarshan23.11Database System Concepts, 5th Ed.

Tuning of HardwareTuning of Hardware

Even well-tuned transactions typically require a few I/O operations

Typical disk supports about 100 random I/O operations per second

Suppose each transaction requires just 2 random I/O operations.

Then to support n transactions per second, we need to stripe data across n/50 disks (ignoring skew)

If 1000 transactions / 1 sec is required, 20 disks are needed

Number of I/O operations per transaction can be reduced by keeping more data in memory

If all data is in memory, I/O would be needed only for writes

Keeping frequently used data in memory reduces disk accesses, reducing number of disks required, but has a memory cost

Page 12: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

12

©Silberschatz, Korth and Sudarshan23.12Database System Concepts, 5th Ed.

Hardware Tuning: MemoryHardware Tuning: Memory Question: which data to keep in memory:

The disk cost for 1 I/0 per second is price-per-disk-drive

accesses-per-second-per-disk Reducing 1 I/O saves this cost

If a page is accessed n times per second, the value of keeping the page in memory can be thought of as much as the disk cost for n I/Os per second n * price-per-disk-drive

accesses-per-second-per-disk Cost of keeping a page in memory (buying more memory)

price-per-megabyte-of-memory

pages-per-megabyte-of-memory

Break-even point: value of n for which above two costs are equal If accesses to the papge are more than n times better to keep the page in

the memory (buying more memory)

– saving by buying memory is greater than buying cost

Page 13: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

13

©Silberschatz, Korth and Sudarshan23.13Database System Concepts, 5th Ed.

Hardware Tuning: 5-Min & 1-Min RuleHardware Tuning: 5-Min & 1-Min Rule

Solving above equation with current disk and memory prices leads to:

5-minute rule: if a page that is randomly accessed is used more frequently than once in 5 minutes it should be kept in memory (by buying sufficient memory!)

For sequentially accessed data, more pages can be read per second

Assuming sequential reads of 1MB of data at a time:

1-minute rule: sequentially accessed data that is accessed once or more in a minute should be kept in memory (by buying sufficient memory!)

Prices of disk and memory have changed greatly over the years, but the ratios have not changed much

rules remain as 5 minute and 1 minute rules, not 1 hour or 1 second rules!

Page 14: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

14

©Silberschatz, Korth and Sudarshan23.14Database System Concepts, 5th Ed.

Hardware Tuning: Choice of RAID LevelHardware Tuning: Choice of RAID Level

To use RAID 1 or RAID 5?

Depends on ratio of reads and writes

RAID 5 requires 2 block reads and 2 block writes to write out one data block

If an application requires r reads and w writes per second

RAID 1 requires r + 2w I/O operations per second

RAID 5 requires: r + 4w I/O operations per second

For reasonably large r and w, this requires lots of disks to handle workload

RAID 5 may require more disks than RAID 1 to handle load!

Apparent saving of number of disks by RAID 5 (by using parity, as opposed to the mirroring done by RAID 1) may be illusory!

Thumb rule: RAID 5 is fine when writes are rare and data is very large, but RAID 1 is preferable otherwise

If you need more disks to handle I/O load, just mirror them since disk capacities these days are enormous!

Page 15: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

15

©Silberschatz, Korth and Sudarshan23.15Database System Concepts, 5th Ed.

Tuning of the SchemaTuning of the Schema

Vertically partition relations to isolate the data that is accessed most often -- only fetch needed information.

• E.g., split account (account-number, branch-name, balance) into two relations, (account-number, branch-name) and (account-number, balance).

• Branch-name need not be fetched unless required

Improve performance by storing a denormalized relation

• E.g., store join of account relation and depositor relation; branch-name and balance information is repeated for each holder of an account, but join need not be computed repeatedly.

• Price paid: more space and more work for programmer to keep relation consistent on updates

• better to use materialized views (more on this later..)

Cluster together on the same disk page records that would match in a frequently required join,

compute join very efficiently when required.

Page 16: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

16

©Silberschatz, Korth and Sudarshan23.16Database System Concepts, 5th Ed.

Tuning of IndicesTuning of Indices

Index tuning

Tradeoff between queries and updates

Create appropriate indices to speed up slow queries/updates

Speed up slow updates by removing excess indices

Choose type of index (B-tree / Hash) appropriate for most frequent queries

Choose which index to make clustered

Index tuning wizards (SW Tool)

Look at past history of queries and updates (the workload)

Recommend which indices would be best for the workload

Page 17: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

17

©Silberschatz, Korth and Sudarshan23.17Database System Concepts, 5th Ed.

Tuning using Materialized ViewsTuning using Materialized Views

Materialized views can help speed up certain queries

Particularly aggregate queries

Overheads

Space

Time for view maintenance

Immediate view maintenance: done as part of update transaction

– time overhead paid by update transaction

Deferred view maintenance: done only when required

– update transaction is not affected, but system time is spent on view maintenance

» until updated, the view may be out-of-date

Preferable to denormalized schema since view maintenance is systems responsibility, not programmers

Avoids inconsistencies caused by errors in update programs

Page 18: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

18

©Silberschatz, Korth and Sudarshan23.18Database System Concepts, 5th Ed.

Tuning using Materialized Views (Cont.)Tuning using Materialized Views (Cont.)

How to choose a set of materialized views

Helping one transaction type by introducing a materialized view may hurt others

Choice of materialized views depends on costs

Users often have no idea of actual cost of operations

Overall, manual selection of materialized views is tedious

Some database systems provide tools to help DBA choose views to materialize

“Materialized view selection wizards”

Page 19: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

19

©Silberschatz, Korth and Sudarshan23.19Database System Concepts, 5th Ed.

Automated Tuning of Physical DesignAutomated Tuning of Physical Design

Automated Tools

Index selection

Materialized View selection

Partition data in a parallel database system

Users specify information about the size of database and related statistics

MicroSoft’s Database Tuning Assistant

Allow users to pose “what-if” questions for selecting materialized view

Workload compression

Generate workload using a small number of queries and updates

Greedy heuristics and other various techniques

Page 20: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

20

©Silberschatz, Korth and Sudarshan23.20Database System Concepts, 5th Ed.

Tuning of TransactionsTuning of Transactions

Basic approaches to tuning of transactions Improve set orientation Reduce lock contention Split a large transaction

Rewriting of queries to improve performance was important in the past, but smart optimizers have made this less important

Improving set orientation Communication and query handling overhead of each call are significant

Set orientation fewer calls to database Combine multiple embedded SQL/ODBC/JDBC queries into a single set-

oriented query E.g. tune program that computes total salary for each department using a

separate SQL query by instead using a single query that computes total salaries for all department at once (using group by)

Use stored-procedures: avoids re-parsing and re-optimization of query

Page 21: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

21

©Silberschatz, Korth and Sudarshan23.21Database System Concepts, 5th Ed.

Tuning of Transactions (Cont.)Tuning of Transactions (Cont.)

Reducing lock contention

Long transactions (typically read-only) that examine large parts of a relation result in lock contention with update transactions

E.g. large query to compute bank statistics and regular bank transactions

Use multi-version concurrency control

E.g. Oracle “snapshots” which support multi-version 2PL

Use degree-two consistency (cursor-stability) for long transactions

Drawback: result may be approximate

Page 22: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

22

©Silberschatz, Korth and Sudarshan23.22Database System Concepts, 5th Ed.

Tuning of Transactions (Cont.)Tuning of Transactions (Cont.) Long update transactions cause several problems

Exhaust lock space Exhaust log space Also greatly increase recovery time after a crash

may exhaust log space even more during recovery if recovery algorithm is badly designed!

If a single large transaction updates every record of a very large relation, log may grow too big

• Split large transaction into a batch of ``mini-transactions'‘• Each mini transaction is suppose do perform a limited number of the updates

• Hold locks across transactions in a mini-batch to ensure serializability• If lock table size is a problem can release locks, but at the cost of

serializability

• In case of failure during a mini-batch• must complete its remaining portion on recovery, to ensure atomicity• support the concept of nested transaction

Page 23: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

23

©Silberschatz, Korth and Sudarshan23.23Database System Concepts, 5th Ed.

Performance SimulationPerformance Simulation

Performance simulation using queuing model useful to predict bottlenecks as well as the effects of tuning changes, even without access to real system

Queuing model as we saw earlier Models activities that go on in parallel

Simulation model is quite detailed, but usually omits some low level details Model service time, but disregard details of service E.g. approximate disk read time by using an average disk read time

Experiments can be run on model, and provide an estimate of measures such as average throughput/response time

After simulation, parameters can be tuned in model and then replicated in real system E.g. number of disks, memory, algorithms, etc

Page 24: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

24

©Silberschatz, Korth and Sudarshan23.24Database System Concepts, 5th Ed.

Table of ContentsTable of Contents

23.1 Performance Tuning

23.2 Performance Benchmarks

23.3 Standardization

23.4 Application Migration

23.5 Summary

Page 25: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

25

©Silberschatz, Korth and Sudarshan23.25Database System Concepts, 5th Ed.

Performance BenchmarksPerformance Benchmarks Suites of tasks used to quantify the performance of software systems

single task not enough for complex systems Important in comparing database systems, especially as systems become more

standards compliant. Commonly used performance measures:

Throughput (transactions per second, or TPS) Response time (delay from submission of transaction to return of result) Availability or mean time to failure (MTTF)

Need to beware when computing average throughput of different transaction types Suppose that a system A runs transaction type T1 at 99 tps and transaction

type T2 at 1 tps and another system B runs both T1 and T2 at 50 tps. The idea of simple averaging TPS is wrong!

– equal performance of two systems A and B (50 tps)! If we ran 50 transactions of each type

– System A: T1 (0.01 sec), T2 (1 sec) need 50.5 secs

– System B: T1 (0.02 sec), T2 (0.02) need 2 secs

Page 26: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

26

©Silberschatz, Korth and Sudarshan23.26Database System Concepts, 5th Ed.

Performance Benchmarks (Cont.)Performance Benchmarks (Cont.)

Instead, harmonic mean of n throughputs t1, t2, … tn:

n

system A (T1: 99 tps. T2: 1 tps), system B (T1: 50 tps. T2: 50 tps)

The harmonic mean for system A is 1.98 ( 2 / ((1/99) + (1/1)) ) while for system B, it is 50 (2 / ((1/ 50) + (1 / 50)) )

System B is 25 times faster than system A

Interference (e.g. lock contention) makes even this incorrect if different transaction types run concurrently

1/t1 + 1/t2 + … + 1/tn

Page 27: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

27

©Silberschatz, Korth and Sudarshan23.27Database System Concepts, 5th Ed.

Database Application ClassesDatabase Application Classes

Online transaction processing (OLTP)

requires high concurrency and clever techniques to speed up commit processing, to support a high rate of update transactions.

Decision support applications

including online analytical processing, or OLAP applications

require good query evaluation algorithms and query optimization.

Architecture of some database systems tuned to one of the two classes

E.g. Teradata DBMS is tuned to decision support

Others try to balance the two requirements

E.g. Oracle, with snapshot support for long read-only transaction

Page 28: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

28

©Silberschatz, Korth and Sudarshan23.28Database System Concepts, 5th Ed.

Transaction Benchmarks SuitesTransaction Benchmarks Suites The Transaction Processing Council (TPC) benchmark suites are widely used.

TPC-A and TPC-B: simple OLTP application modeling a bank teller application with and without communication Not used anymore

TPC-C: complex OLTP application modeling an inventory system Current standard for OLTP benchmarking

TPC-D: complex decision support application Superceded by TPC-H and TPC-R

TPC-H: (H for ad hoc) based on TPC-D with some extra queries Models ad hoc queries which are not known beforehand

– Total of 22 queries with emphasis on aggregation prohibits materialized views permits indices only on primary and foreign keys

TPC-R: (R for reporting) same as TPC-H, but without any restrictions on materialized views and indices

TPC-W: (W for Web) End-to-end Web service benchmark modeling a Web bookstore, with combination of static and dynamically generated pages

Page 29: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

29

©Silberschatz, Korth and Sudarshan23.29Database System Concepts, 5th Ed.

TPC Performance MeasuresTPC Performance Measures

TPC performance measures

transactions-per-second with specified constraints on response time

transactions-per-second-per-dollar accounts for cost of owning system

TPC benchmark requires database sizes to be scaled up with increasing transactions-per-second

reflects real world applications where more customers means more database size and more transactions-per-second

External audit of TPC performance numbers is mandatory

TPC performance claims can be trusted

Page 30: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

30

©Silberschatz, Korth and Sudarshan23.30Database System Concepts, 5th Ed.

Sample TPC Performance MeasuresSample TPC Performance Measures

Two types of tests for TPC-H (ad hoc) and TPC-R (report)

Power metric test

runs queries and updates sequentially

then takes mean to find queries per hour

Throughput metric test

runs queries and updates concurrently

multiple streams running in parallel: each stream generates queries, with one parallel update stream

the total time of the entire run

Composite “query per hour” metric

overall metric

square root of ( power metric X throughput metric )

Composite “price vs. performance” metric

system price / composite metric

Page 31: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

31

©Silberschatz, Korth and Sudarshan23.31Database System Concepts, 5th Ed.

Other BenchmarksOther Benchmarks

OODB transactions require a different set of benchmarks.

Reason: hard to define what is a typical OODB application

The object operational benchmark, version 1: 001 benchmark

OO7 benchmark has several different operations

provides a separate benchmark number for each kind of operation

– Traversing a connected objects

– Retrieving all objects in a class

Benchmarks for XML databases are now being discussed

Page 32: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

32

©Silberschatz, Korth and Sudarshan23.32Database System Concepts, 5th Ed.

Table of ContentsTable of Contents

23.1 Performance Tuning

23.2 Performance Benchmarks

23.3 Standardization

23.4 Application Migration

23.5 Summary

Page 33: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

33

©Silberschatz, Korth and Sudarshan23.33Database System Concepts, 5th Ed.

StandardizationStandardization

The complexity of contemporary software systems and the need for their interoperation require a variety of standards.

syntax and semantics of programming languages

functions in application program interfaces

data models (e.g. object-oriented / object-relational databases)

Formal standards

De facto standards

Anticipatory standards

Reactionary standards

Page 34: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

34

©Silberschatz, Korth and Sudarshan23.34Database System Concepts, 5th Ed.

Standardization (Cont.)Standardization (Cont.) Formal standards are standards developed by a standards organization (ANSI,

ISO), or by industry groups, through a public process.

De facto standards are generally accepted as standards without any formal process of recognition

Standards defined by dominant vendors (IBM, Microsoft) often become de facto standards

De facto standards often go through a formal process of recognition and become formal standards

Anticipatory standards lead the market place, defining features that vendors then implement

Ensure compatibility of future products

But at times become very large and unwieldy since standards bodies may not pay enough attention to ease of implementation (e.g.,SQL-92 or SQL:1999)

Reactionary standards attempt to standardize features that vendors have already implemented, possibly in different ways.

Can be hard to convince vendors to change already implemented features. E.g. OODB systems

Page 35: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

35

©Silberschatz, Korth and Sudarshan23.35Database System Concepts, 5th Ed.

SQL Standards HistorySQL Standards History

SQL developed by IBM in late 70s / early 80s

SQL-86: The first formal standard

IBM SAA standard for SQL in 1987

SQL-89 added features to SQL-86 that were already implemented in many systems

SQL-89 was a reactionary standard

SQL-92 added many new features to SQL-89 (anticipatory standard)

Defines levels of compliance (entry, intermediate and full)

Even now few database vendors have full SQL-92 implementation

SQL 1999

Adds variety of new features --- extended data types, object orientation, procedures, triggers, etc.

SQL 2003

Page 36: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

36

©Silberschatz, Korth and Sudarshan23.36Database System Concepts, 5th Ed.

SQL Standards History (Cont.)SQL Standards History (Cont.) The SQL: 2003 Standard

Part 1: SQL/Framework; overview Part 2: SQL/Foundation; types, schemas, DDL, DML, security, etc Part 3: SQL/CLI (Call Level Interface); API interface Part 4: SQL/PSM (Persistent Stored Modules); procedural extensions Part 9: SQL/MED (Management of External Data)

Interfacing of database to external data sources

– other databases and even files can be viewed as part of the database Part 10: SQL/OLB (Object Language Bindings); embedding SQL in Java Part 11: SQL Schemata(Information and Definition Schema) defines a standard

catalog interface Part 13: SQL/JRT (Java Routines and Types) defines standards for accessing

routines and types in Java Part 14: SQL/XML defines XML-related specifications Missing parts 5,6,7, 8, 12 cover features that are not near standardization yet

SQL/Bindings, Temporal Data, Distributed Transaction, Multimedia Data

Page 37: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

37

©Silberschatz, Korth and Sudarshan23.37Database System Concepts, 5th Ed.

Database Connectivity StandardsDatabase Connectivity Standards Open DataBase Connectivity (ODBC) standard for database interconnectivity

based on Call Level Interface (CLI) developed by X/Open consortium and the SQL Access Group

defines application programming interface (API), and SQL features that must be supported at different levels of compliance

JDBC standard used for interconnectivity between Java and Database X/Open XA standards by X/Open consortium

define transaction management standards for supporting distributed 2PC OLE-DB by Microsoft

API like ODBC, but to support non-db sources such as flat files or email stores OLE-DB program can negotiate with data source to find what features are

supported Interface language may be a subset of SQL

ADO (Active Data Objects) by Microsoft Easy-to-use interface to OLE-DB functionality Can be called from scripting languages such as VBScript and Jscript ADO.NET API is for application in the .NET languages such as C# and Visual

Basic.NET

Page 38: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

38

©Silberschatz, Korth and Sudarshan23.38Database System Concepts, 5th Ed.

Xopen / XA Xopen / XA 작동 그림예제작동 그림예제

Page 39: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

39

©Silberschatz, Korth and Sudarshan23.39Database System Concepts, 5th Ed.

OLE-DB & ADO OLE-DB & ADO 작동 그림예제작동 그림예제

Page 40: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

40

©Silberschatz, Korth and Sudarshan23.40Database System Concepts, 5th Ed.

Object Oriented Databases StandardsObject Oriented Databases Standards

Object Database Management Group (ODMG) standard for OODB

version 1 in 1993, version 2 in 1997, version 3 in 2000

provides language independent Object Definition Language (ODL) as well as several language-specific bindings

Object Management Group (OMG) standard for distributed SW based on objects

Object Request Broker (ORB) provides transparent message dispatch to distributed objects

Interface Definition Language (IDL) for defining language-independent data types

Common Object Request Broker Architecture (CORBA) defines specifications of ORB and IDL

Page 41: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

41

©Silberschatz, Korth and Sudarshan23.41Database System Concepts, 5th Ed.

CORBA CORBA 작동 그림예제작동 그림예제

Page 42: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

42

©Silberschatz, Korth and Sudarshan23.42Database System Concepts, 5th Ed.

XML-Based Application StandardsXML-Based Application Standards

Several XML based Standards for E-commerce E.g. RosettaNet (supply chain), BizTalk Define catalogs, service descriptions, invoices, purchase orders, etc. XML wrappers are used to export information from relational databases to XML

Simple Object Access Protocol (SOAP): XML based remote procedure call (RPC) standard

Uses XML to encode data (both parameters and results) HTTP as transport protocol (procedure call becomes HTTP request)

Endorsed by W3C Popular in B2B E-commerce Standards based on SOAP for specific applications

E.g. OLAP and Data Mining standards from Microsoft

DOM & SAX API are for general programming language to connect to XML Data

Page 43: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

43

©Silberschatz, Korth and Sudarshan23.43Database System Concepts, 5th Ed.

SOAP SOAP 작동 그림예제작동 그림예제

Page 44: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

44

©Silberschatz, Korth and Sudarshan23.44Database System Concepts, 5th Ed.

Table of ContentsTable of Contents

23.1 Performance Tuning

23.2 Performance Benchmarks

23.3 Standardization

23.4 Application Migration

23.5 Summary

Page 45: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

45

©Silberschatz, Korth and Sudarshan23.45Database System Concepts, 5th Ed.

Legacy SystemsLegacy Systems

Legacy systems are older-generation systems that are incompatible with current generation standards and systems but still in production use E.g. applications written in Cobol that run on mainframes

Today’s hot new system is tomorrows legacy system! Porting legacy system applications to a more modern environment is problematic

Rewriting code Very expensive, since legacy system may involve millions of lines of code,

written over decades Original programmers usually no longer available

Switching over from old system to new system is a problem Big-Bang Approach Chicken-little Approach

One approach: build a wrapper layer on top of legacy application to allow interoperation between newer systems and legacy application E.g. use ODBC or OLE-DB as wrapper

Page 46: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

46

©Silberschatz, Korth and Sudarshan23.46Database System Concepts, 5th Ed.

Legacy Systems (Cont.)Legacy Systems (Cont.)

Rewriting legacy application requires a first phase of understanding what it does

Often legacy code has no documentation or outdated documentation

reverse engineering: process of going over legacy code to

Come up with schema designs in ER or OO model

Find out what procedures and processes are implemented, to get a high level view of system

Re-engineering: reverse engineering followed by design of new system

Improvements are made on existing system design in this process

Page 47: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

47

©Silberschatz, Korth and Sudarshan23.47Database System Concepts, 5th Ed.

Migrating to a New SystemMigrating to a New System

Switching over from old to new system is a major problem

Production systems are in every day, generating new data

Stopping the system may bring all of a company’s activities to a halt, causing enormous losses

Big-bang approach:

1. Implement complete new system

2. Populate it with data from old system

1. No transactions while this step is executed

2. scripts are created to do this quickly

3. Shut down old system and start using new system

Danger with this approach: what if new code has bugs or performance problems, or missing features

Company may be brought to a halt

Page 48: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

48

©Silberschatz, Korth and Sudarshan23.48Database System Concepts, 5th Ed.

Migrating to a New System (cont.)Migrating to a New System (cont.)

Chicken-little approach:

Replace legacy system one piece at a time

Use wrappers to interoperate between legacy and new code

E.g. replace front end first, with wrappers on legacy backend

– Old front end can continue working in this phase in case of problems with new front end

Replace back end, one functional unit at a time

– All parts that share a database may have to be replaced together, or wrapper is needed on database also

Drawback: significant extra development effort to build wrappers and ensure smooth interoperation

Still worth it if company’s life depends on system

Page 49: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

49

©Silberschatz, Korth and Sudarshan23.49Database System Concepts, 5th Ed.

Table of ContentsTable of Contents

23.1 Performance Tuning

23.2 Performance Benchmarks

23.3 Standardization

23.4 Application Migration

23.5 Summary

Page 50: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

50

©Silberschatz, Korth and Sudarshan23.50Database System Concepts, 5th Ed.

Ch 21: Summary (1)Ch 21: Summary (1)

The Web browser has emerged as the most widely used user interface to databases.

HTML provides the ability to define interfaces that combine hyperlinks with forms facilities.

Web browsers communicate with Web servers by the HTTP protocol.

Web servers can pass on request to application programs, and return the results to the browser.

There are several client-side scripting languages - Java script is the most widely used-that provide richer user interaction at the browser end.

Web servers execute applications programs to implement desired functionality.

Servlets are a widely used mechanism to write application programs that run as part of the Web server process, in order to reduce overheads.

There are also many server-side scripting languages that are interpreted by the Web server and provide application program functionality as part of Web server.

Page 51: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

51

©Silberschatz, Korth and Sudarshan23.51Database System Concepts, 5th Ed.

Ch 21: Summary (2)Ch 21: Summary (2)

Tuning of the database (system parameters, as well as the higher-level database design-such as the schema, indices, and transactions) is important for good performance.

Tuning is best done by identifying bottlenecks and eliminating them.

Performance benchmarks play an important role in comparisons of database systems, especially as systems become more standards compliant.

The TPC benchmark suites are widely used, and the different TPC benchmarks are useful for our comparing the performance of databases under different workloads.

Standards are important because of the complexity of database systems and their need for interoperation.

Formal standards exist for SQL. Defacto standards, such as ODBC and JDBC, and standard adopted by industry groups, such as CORBA, have played an important role in the growth of client-server database systems.

Standards for object-oriented databases, such as ODMG, are being developed by industry groups.

Page 52: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

52

©Silberschatz, Korth and Sudarshan23.52Database System Concepts, 5th Ed.

Ch 21: Summary (3)Ch 21: Summary (3) E-commerce systems are fast becoming a core part of how commerce is performed.

There are several database issues in e-commerce systems. Catalog management, especially personalization of the catalog, is done with the databases.

Electronic marketplaces help in pricing of products through auctions, reverse auctions, or exchanges.

High-performance database systems are needed to handle such trading.

Orders are settled by electronic payment systems, which also need high-performance database systems to handle very high transaction rates.

Legacy systems are systems based on older-generation technologies such as nonrelational databases or even directly on file systems.

Interfacing legacy systems with new-generation systems is often important when the run mission-critical systems.

Migrating from legacy systems to new-generation systems must be done carefully to avoid disruptions, which can be very expensive.

Page 53: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

53

©Silberschatz, Korth and Sudarshan23.53Database System Concepts, 5th Ed.

Ch 21: Bibliographical Notes (1)Ch 21: Bibliographical Notes (1)

Information about servlets, including tutorials, standard specifications, and software, is available on java.sun.com/products/servlet.

Information about JSP is available at java.sun.com/products/jsp.

An early proposal for a database-system benchmark (the Wisconsin benchmark) was made by Bitton et al.[1983].

The TPC-A,-B, and –C benchmarks are described in Gray[1991].

An online version of all the TPC benchmarks descriptions, as well as benchmark results, is available on the World Wide Web at the URL www.tpc.org

the site also contains up-to-date information about new benchmark proposals.

O’Neil and O’Neil[2000] provides a very good textbook coverage of performance measurement and tuning.

The five minute and one minute rules are described in Gray and Putzolu[1987] and Gray and Graefe[1997].

Brown et al.[1994] describes an approach to automated tuning.

Page 54: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

54

©Silberschatz, Korth and Sudarshan23.54Database System Concepts, 5th Ed.

Ch 21: Bibliographical Notes (2)Ch 21: Bibliographical Notes (2) Index selection and materialized view selection are addressed by Ross et al.[1996],

Labio et al. [1997], Gupta [1997], Chaudhuri and Narasayya [1997], Agrawa et al. [2000] and Mistry et al. [2001].

Poess and Floyd[2000] gives an overview of the TPC-H, TPC-R, and TPC-W benchmarks.

The OO1 benchmark for OODBs is described in Cattell and Skeen [1992]; the OO7 benchmark is described in Carey et al. [1993].

Kleinrock [1975] and Kleinrock [1976] is a popular two-volume textbook on queuing theory.

Shasha [1992] provides a good overview of database tuning.

The American National Standard SQL-86 is described in ANSI[1986].

The IBM Systems Application Architecture definition of SQL is specified by IBM [1987].

The standards for SQL-89 and SQL-92 are available as ANSI [1989] and ANSI [1992] respectively.

For reference in the SQL:1999 standard, see bibliographical notes of Chapter 9.

Page 55: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

55

©Silberschatz, Korth and Sudarshan23.55Database System Concepts, 5th Ed.

Ch 21: Bibliographical Notes (3)Ch 21: Bibliographical Notes (3)

The X/Open SQL call-level interface is defined in X/Open [1993]; the ODBC API is described in Microsoft [1997] and Sanders [1998].

The X/Open XA interface is defined in X/Open[1991]. More information about ODBC, OLE-DB, and ADO can be found on the Web site www.microsoft.com.data, and in a number of books on the subject can be found through www.amazon.com.

The ODMG 3.0 standard is defined in Cattell [2000]. ACM Sigmod Record, which is published quarterly, has a regular section on standards in databases, including benchmark standards.

A wealth of information on XML based standards is available online.

You can use a Web search engine such as Google to search for more up-to-date information about the XML and other standards.

Loeb[1998] provides a detailed description of secure electronic transactions.

Business process reengineering is covered by Cook [1996]. Kirchmer [1999] describes application implementation using standard software such as Enterprise Resource Planning (ERP) packages.

Umar [1997] covers reengineering and issues in dealing with legacy systems.

Page 56: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

56

©Silberschatz, Korth and Sudarshan23.56Database System Concepts, 5th Ed.

Ch 21: ToolsCh 21: Tools

There are many Web development tools that support database connectivity through servlets, JSP, Javascript, or other mechanisms.

We list a few of the better-known ones here:

JAVA SDK from SUN (java.sun.com)

Apache’s Tomcat (jakarta.apache.org) and Webserver (apache.org)

IBM WebSphere (www.software.ibm.com)

Microsoft’s ASP tools (www.microsoft.com)

Allaire’s Coldfusion and JRun products (www.allaire.com)

Caucho’s Resin (www.caucho.com)

Zope (www.zope.org).

A few of these, such as Apache, are free for any use, some are free for noncommercial use or for personal use, while others need to be paid for

See the respective Web sites for more information.

Page 57: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

57

©Silberschatz, Korth and Sudarshan23.57Database System Concepts, 5th Ed.

Table of ContentsTable of Contents

23.1 Performance Tuning

23.2 Performance Benchmarks

23.3 Standardization

23.4 Application Migration

23.5 Summary

Page 58: ©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5 th Ed. 1 Chapter 23: Advanced Application Development

58

Database System Concepts

©Silberschatz, Korth and SudarshanSee www.db-book.com for conditions on re-use

©Silberschatz, Korth and SudarshanDatabase System Concepts

End of ChapterEnd of Chapter