©silberschatz, korth and sudarshan23.1database system concepts, 5 th ed. 1 chapter 23: advanced...
TRANSCRIPT
1
©Silberschatz, Korth and Sudarshan23.1Database System Concepts, 5th Ed.
Chapter 23: Chapter 23: Advanced Application DevelopmentAdvanced Application Development
2
©Silberschatz, Korth and Sudarshan23.2Database System Concepts, 5th Ed.
Chapter 1: Introduction Part 1: Relational databases
Chapter 2: Relational Model Chapter 3: SQL Chapter 4: Advanced SQL Chapter 5: Other Relational Languages
Part 2: Database Design Chapter 6: Database Design and the E-R Model Chapter 7: Relational Database Design Chapter 8: Application Design and Development
Part 3: Object-based databases and XML Chapter 9: Object-Based Databases Chapter 10: XML
Part 4: Data storage and querying Chapter 11: Storage and File Structure Chapter 12: Indexing and Hashing Chapter 13: Query Processing Chapter 14: Query Optimization
Part 5: Transaction management Chapter 15: Transactions Chapter 16: Concurrency control Chapter 17: Recovery System
Database System ConceptsDatabase System Concepts
Part 6: Data Mining and Information Retrieval Chapter 18: Data Analysis and Mining Chapter 19: Information Retreival
Part 7: Database system architecture Chapter 20: Database-System Architecture Chapter 21: Parallel Databases Chapter 22: Distributed Databases
Part 8: Other topics Chapter 23: Advanced Application Development Chapter 24: Advanced Data Types and New Applications Chapter 25: Advanced Transaction Processing
Part 9: Case studies Chapter 26: PostgreSQL Chapter 27: Oracle Chapter 28: IBM DB2 Chapter 29: Microsoft SQL Server
Online Appendices Appendix A: Network Model Appendix B: Hierarchical Model Appendix C: Advanced Relational Database Model
3
©Silberschatz, Korth and Sudarshan23.3Database System Concepts, 5th Ed.
Part 8: Other topics Part 8: Other topics (Chapters 23 through 25).(Chapters 23 through 25).
Chapter 23: Advanced Application Development
covers performance benchmarks, performance tuning and standardization.
Chapter 24: Advanced Data Types and New Applications
covers advanced data types and new applications, including temporal data, spatial and geographic data, multimedia data, and issues in the management of mobile and personal databases.
Chapter 25: Advanced Transaction Processing
deals with advanced transaction processing. We discuss transaction-processing monitors, high-performance transaction systems, real-time transaction systems, and transactional workflows.
4
©Silberschatz, Korth and Sudarshan23.4Database System Concepts, 5th Ed.
Table of ContentsTable of Contents
23.1 Performance Tuning
23.2 Performance Benchmarks
23.3 Standardization
23.4 Application Migration
23.5 Summary
5
©Silberschatz, Korth and Sudarshan23.5Database System Concepts, 5th Ed.
DBMS
Application
Monitoring DBMS using Trigger
Limited Access by Authorization
WEB Interface: Servlet
JSP
Protecting DBMS by Security
User User
Web Browser
• Ch 8: Application Design & Development
Ch 23: Advanced Application Development
Performance
Standardization
Application Migration
6
©Silberschatz, Korth and Sudarshan23.6Database System Concepts, 5th Ed.
Performance TuningPerformance Tuning Adjusting various parameters and design choices to improve system performance
for a specific application.
Tuning is best done by identifying bottlenecks & eliminating them. Can tune a database system at 3 levels:
Hardware
add disks to speed up I/O
add memory to increase buffer hits
move to a faster processor
Database system parameters
set buffer size to avoid paging of buffer
set checkpointing intervals to limit log size
system may have automatic tuning.
Higher level database design
such as the schema, indices and transactions (more later)
7
©Silberschatz, Korth and Sudarshan23.7Database System Concepts, 5th Ed.
BottlenecksBottlenecks
Performance of most systems (at least before they are tuned) usually limited by performance of one or a few components These are called bottlenecks E.g. 80% of the code may take up 20% of time and 20% of code takes up 80%
of time Worth spending most time on 20% of code that take 80% of time Finding bottlenecks is very important
Bottlenecks may be in hardware or in software e.g. disks are very busy, CPU is idle
Removing one bottleneck often exposes another bottleneck De-bottlenecking consists of repeatedly finding bottlenecks, and removing them This is a heuristic
8
©Silberschatz, Korth and Sudarshan23.8Database System Concepts, 5th Ed.
Identifying BottlenecksIdentifying Bottlenecks
Transactions request a sequence of services e.g. CPU, Disk I/O, locks
With concurrent transactions, transactions may have to wait for a requested service while other transactions are being served
Can model database as a queueing system with a queue for each service transactions repeatedly do the following
request a service wait in queue for the service get serviced
Bottlenecks in a database system typically show up as very high utilizations of a particular service Bottlenecks shows very long queues E.g. Disk utilization vs CPU utilization 100% utilization leads to very long waiting time:
Rule of thumb:
– design a system for about 70% utilization at peak load
– utilization over 90% should be avoided
9
©Silberschatz, Korth and Sudarshan23.9Database System Concepts, 5th Ed.
Finding Bottlenecks: Finding Bottlenecks: Queues in aQueues in a Database SystemDatabase System
10
©Silberschatz, Korth and Sudarshan23.10Database System Concepts, 5th Ed.
Tunable ParametersTunable Parameters
The lowest level is at the hardware level
Adding disks or using a RAID system
Adding more memory
Moving to a faster processor
The second level consists of the DBMS parameters
Buffer size and Checkpointing intervals
The third level is the highest level
Schema and Transactions
Indices and Materialized views
11
©Silberschatz, Korth and Sudarshan23.11Database System Concepts, 5th Ed.
Tuning of HardwareTuning of Hardware
Even well-tuned transactions typically require a few I/O operations
Typical disk supports about 100 random I/O operations per second
Suppose each transaction requires just 2 random I/O operations.
Then to support n transactions per second, we need to stripe data across n/50 disks (ignoring skew)
If 1000 transactions / 1 sec is required, 20 disks are needed
Number of I/O operations per transaction can be reduced by keeping more data in memory
If all data is in memory, I/O would be needed only for writes
Keeping frequently used data in memory reduces disk accesses, reducing number of disks required, but has a memory cost
12
©Silberschatz, Korth and Sudarshan23.12Database System Concepts, 5th Ed.
Hardware Tuning: MemoryHardware Tuning: Memory Question: which data to keep in memory:
The disk cost for 1 I/0 per second is price-per-disk-drive
accesses-per-second-per-disk Reducing 1 I/O saves this cost
If a page is accessed n times per second, the value of keeping the page in memory can be thought of as much as the disk cost for n I/Os per second n * price-per-disk-drive
accesses-per-second-per-disk Cost of keeping a page in memory (buying more memory)
price-per-megabyte-of-memory
pages-per-megabyte-of-memory
Break-even point: value of n for which above two costs are equal If accesses to the papge are more than n times better to keep the page in
the memory (buying more memory)
– saving by buying memory is greater than buying cost
13
©Silberschatz, Korth and Sudarshan23.13Database System Concepts, 5th Ed.
Hardware Tuning: 5-Min & 1-Min RuleHardware Tuning: 5-Min & 1-Min Rule
Solving above equation with current disk and memory prices leads to:
5-minute rule: if a page that is randomly accessed is used more frequently than once in 5 minutes it should be kept in memory (by buying sufficient memory!)
For sequentially accessed data, more pages can be read per second
Assuming sequential reads of 1MB of data at a time:
1-minute rule: sequentially accessed data that is accessed once or more in a minute should be kept in memory (by buying sufficient memory!)
Prices of disk and memory have changed greatly over the years, but the ratios have not changed much
rules remain as 5 minute and 1 minute rules, not 1 hour or 1 second rules!
14
©Silberschatz, Korth and Sudarshan23.14Database System Concepts, 5th Ed.
Hardware Tuning: Choice of RAID LevelHardware Tuning: Choice of RAID Level
To use RAID 1 or RAID 5?
Depends on ratio of reads and writes
RAID 5 requires 2 block reads and 2 block writes to write out one data block
If an application requires r reads and w writes per second
RAID 1 requires r + 2w I/O operations per second
RAID 5 requires: r + 4w I/O operations per second
For reasonably large r and w, this requires lots of disks to handle workload
RAID 5 may require more disks than RAID 1 to handle load!
Apparent saving of number of disks by RAID 5 (by using parity, as opposed to the mirroring done by RAID 1) may be illusory!
Thumb rule: RAID 5 is fine when writes are rare and data is very large, but RAID 1 is preferable otherwise
If you need more disks to handle I/O load, just mirror them since disk capacities these days are enormous!
15
©Silberschatz, Korth and Sudarshan23.15Database System Concepts, 5th Ed.
Tuning of the SchemaTuning of the Schema
Vertically partition relations to isolate the data that is accessed most often -- only fetch needed information.
• E.g., split account (account-number, branch-name, balance) into two relations, (account-number, branch-name) and (account-number, balance).
• Branch-name need not be fetched unless required
Improve performance by storing a denormalized relation
• E.g., store join of account relation and depositor relation; branch-name and balance information is repeated for each holder of an account, but join need not be computed repeatedly.
• Price paid: more space and more work for programmer to keep relation consistent on updates
• better to use materialized views (more on this later..)
Cluster together on the same disk page records that would match in a frequently required join,
compute join very efficiently when required.
16
©Silberschatz, Korth and Sudarshan23.16Database System Concepts, 5th Ed.
Tuning of IndicesTuning of Indices
Index tuning
Tradeoff between queries and updates
Create appropriate indices to speed up slow queries/updates
Speed up slow updates by removing excess indices
Choose type of index (B-tree / Hash) appropriate for most frequent queries
Choose which index to make clustered
Index tuning wizards (SW Tool)
Look at past history of queries and updates (the workload)
Recommend which indices would be best for the workload
17
©Silberschatz, Korth and Sudarshan23.17Database System Concepts, 5th Ed.
Tuning using Materialized ViewsTuning using Materialized Views
Materialized views can help speed up certain queries
Particularly aggregate queries
Overheads
Space
Time for view maintenance
Immediate view maintenance: done as part of update transaction
– time overhead paid by update transaction
Deferred view maintenance: done only when required
– update transaction is not affected, but system time is spent on view maintenance
» until updated, the view may be out-of-date
Preferable to denormalized schema since view maintenance is systems responsibility, not programmers
Avoids inconsistencies caused by errors in update programs
18
©Silberschatz, Korth and Sudarshan23.18Database System Concepts, 5th Ed.
Tuning using Materialized Views (Cont.)Tuning using Materialized Views (Cont.)
How to choose a set of materialized views
Helping one transaction type by introducing a materialized view may hurt others
Choice of materialized views depends on costs
Users often have no idea of actual cost of operations
Overall, manual selection of materialized views is tedious
Some database systems provide tools to help DBA choose views to materialize
“Materialized view selection wizards”
19
©Silberschatz, Korth and Sudarshan23.19Database System Concepts, 5th Ed.
Automated Tuning of Physical DesignAutomated Tuning of Physical Design
Automated Tools
Index selection
Materialized View selection
Partition data in a parallel database system
Users specify information about the size of database and related statistics
MicroSoft’s Database Tuning Assistant
Allow users to pose “what-if” questions for selecting materialized view
Workload compression
Generate workload using a small number of queries and updates
Greedy heuristics and other various techniques
20
©Silberschatz, Korth and Sudarshan23.20Database System Concepts, 5th Ed.
Tuning of TransactionsTuning of Transactions
Basic approaches to tuning of transactions Improve set orientation Reduce lock contention Split a large transaction
Rewriting of queries to improve performance was important in the past, but smart optimizers have made this less important
Improving set orientation Communication and query handling overhead of each call are significant
Set orientation fewer calls to database Combine multiple embedded SQL/ODBC/JDBC queries into a single set-
oriented query E.g. tune program that computes total salary for each department using a
separate SQL query by instead using a single query that computes total salaries for all department at once (using group by)
Use stored-procedures: avoids re-parsing and re-optimization of query
21
©Silberschatz, Korth and Sudarshan23.21Database System Concepts, 5th Ed.
Tuning of Transactions (Cont.)Tuning of Transactions (Cont.)
Reducing lock contention
Long transactions (typically read-only) that examine large parts of a relation result in lock contention with update transactions
E.g. large query to compute bank statistics and regular bank transactions
Use multi-version concurrency control
E.g. Oracle “snapshots” which support multi-version 2PL
Use degree-two consistency (cursor-stability) for long transactions
Drawback: result may be approximate
22
©Silberschatz, Korth and Sudarshan23.22Database System Concepts, 5th Ed.
Tuning of Transactions (Cont.)Tuning of Transactions (Cont.) Long update transactions cause several problems
Exhaust lock space Exhaust log space Also greatly increase recovery time after a crash
may exhaust log space even more during recovery if recovery algorithm is badly designed!
If a single large transaction updates every record of a very large relation, log may grow too big
• Split large transaction into a batch of ``mini-transactions'‘• Each mini transaction is suppose do perform a limited number of the updates
• Hold locks across transactions in a mini-batch to ensure serializability• If lock table size is a problem can release locks, but at the cost of
serializability
• In case of failure during a mini-batch• must complete its remaining portion on recovery, to ensure atomicity• support the concept of nested transaction
23
©Silberschatz, Korth and Sudarshan23.23Database System Concepts, 5th Ed.
Performance SimulationPerformance Simulation
Performance simulation using queuing model useful to predict bottlenecks as well as the effects of tuning changes, even without access to real system
Queuing model as we saw earlier Models activities that go on in parallel
Simulation model is quite detailed, but usually omits some low level details Model service time, but disregard details of service E.g. approximate disk read time by using an average disk read time
Experiments can be run on model, and provide an estimate of measures such as average throughput/response time
After simulation, parameters can be tuned in model and then replicated in real system E.g. number of disks, memory, algorithms, etc
24
©Silberschatz, Korth and Sudarshan23.24Database System Concepts, 5th Ed.
Table of ContentsTable of Contents
23.1 Performance Tuning
23.2 Performance Benchmarks
23.3 Standardization
23.4 Application Migration
23.5 Summary
25
©Silberschatz, Korth and Sudarshan23.25Database System Concepts, 5th Ed.
Performance BenchmarksPerformance Benchmarks Suites of tasks used to quantify the performance of software systems
single task not enough for complex systems Important in comparing database systems, especially as systems become more
standards compliant. Commonly used performance measures:
Throughput (transactions per second, or TPS) Response time (delay from submission of transaction to return of result) Availability or mean time to failure (MTTF)
Need to beware when computing average throughput of different transaction types Suppose that a system A runs transaction type T1 at 99 tps and transaction
type T2 at 1 tps and another system B runs both T1 and T2 at 50 tps. The idea of simple averaging TPS is wrong!
– equal performance of two systems A and B (50 tps)! If we ran 50 transactions of each type
– System A: T1 (0.01 sec), T2 (1 sec) need 50.5 secs
– System B: T1 (0.02 sec), T2 (0.02) need 2 secs
26
©Silberschatz, Korth and Sudarshan23.26Database System Concepts, 5th Ed.
Performance Benchmarks (Cont.)Performance Benchmarks (Cont.)
Instead, harmonic mean of n throughputs t1, t2, … tn:
n
system A (T1: 99 tps. T2: 1 tps), system B (T1: 50 tps. T2: 50 tps)
The harmonic mean for system A is 1.98 ( 2 / ((1/99) + (1/1)) ) while for system B, it is 50 (2 / ((1/ 50) + (1 / 50)) )
System B is 25 times faster than system A
Interference (e.g. lock contention) makes even this incorrect if different transaction types run concurrently
1/t1 + 1/t2 + … + 1/tn
27
©Silberschatz, Korth and Sudarshan23.27Database System Concepts, 5th Ed.
Database Application ClassesDatabase Application Classes
Online transaction processing (OLTP)
requires high concurrency and clever techniques to speed up commit processing, to support a high rate of update transactions.
Decision support applications
including online analytical processing, or OLAP applications
require good query evaluation algorithms and query optimization.
Architecture of some database systems tuned to one of the two classes
E.g. Teradata DBMS is tuned to decision support
Others try to balance the two requirements
E.g. Oracle, with snapshot support for long read-only transaction
28
©Silberschatz, Korth and Sudarshan23.28Database System Concepts, 5th Ed.
Transaction Benchmarks SuitesTransaction Benchmarks Suites The Transaction Processing Council (TPC) benchmark suites are widely used.
TPC-A and TPC-B: simple OLTP application modeling a bank teller application with and without communication Not used anymore
TPC-C: complex OLTP application modeling an inventory system Current standard for OLTP benchmarking
TPC-D: complex decision support application Superceded by TPC-H and TPC-R
TPC-H: (H for ad hoc) based on TPC-D with some extra queries Models ad hoc queries which are not known beforehand
– Total of 22 queries with emphasis on aggregation prohibits materialized views permits indices only on primary and foreign keys
TPC-R: (R for reporting) same as TPC-H, but without any restrictions on materialized views and indices
TPC-W: (W for Web) End-to-end Web service benchmark modeling a Web bookstore, with combination of static and dynamically generated pages
29
©Silberschatz, Korth and Sudarshan23.29Database System Concepts, 5th Ed.
TPC Performance MeasuresTPC Performance Measures
TPC performance measures
transactions-per-second with specified constraints on response time
transactions-per-second-per-dollar accounts for cost of owning system
TPC benchmark requires database sizes to be scaled up with increasing transactions-per-second
reflects real world applications where more customers means more database size and more transactions-per-second
External audit of TPC performance numbers is mandatory
TPC performance claims can be trusted
30
©Silberschatz, Korth and Sudarshan23.30Database System Concepts, 5th Ed.
Sample TPC Performance MeasuresSample TPC Performance Measures
Two types of tests for TPC-H (ad hoc) and TPC-R (report)
Power metric test
runs queries and updates sequentially
then takes mean to find queries per hour
Throughput metric test
runs queries and updates concurrently
multiple streams running in parallel: each stream generates queries, with one parallel update stream
the total time of the entire run
Composite “query per hour” metric
overall metric
square root of ( power metric X throughput metric )
Composite “price vs. performance” metric
system price / composite metric
31
©Silberschatz, Korth and Sudarshan23.31Database System Concepts, 5th Ed.
Other BenchmarksOther Benchmarks
OODB transactions require a different set of benchmarks.
Reason: hard to define what is a typical OODB application
The object operational benchmark, version 1: 001 benchmark
OO7 benchmark has several different operations
provides a separate benchmark number for each kind of operation
– Traversing a connected objects
– Retrieving all objects in a class
Benchmarks for XML databases are now being discussed
32
©Silberschatz, Korth and Sudarshan23.32Database System Concepts, 5th Ed.
Table of ContentsTable of Contents
23.1 Performance Tuning
23.2 Performance Benchmarks
23.3 Standardization
23.4 Application Migration
23.5 Summary
33
©Silberschatz, Korth and Sudarshan23.33Database System Concepts, 5th Ed.
StandardizationStandardization
The complexity of contemporary software systems and the need for their interoperation require a variety of standards.
syntax and semantics of programming languages
functions in application program interfaces
data models (e.g. object-oriented / object-relational databases)
Formal standards
De facto standards
Anticipatory standards
Reactionary standards
34
©Silberschatz, Korth and Sudarshan23.34Database System Concepts, 5th Ed.
Standardization (Cont.)Standardization (Cont.) Formal standards are standards developed by a standards organization (ANSI,
ISO), or by industry groups, through a public process.
De facto standards are generally accepted as standards without any formal process of recognition
Standards defined by dominant vendors (IBM, Microsoft) often become de facto standards
De facto standards often go through a formal process of recognition and become formal standards
Anticipatory standards lead the market place, defining features that vendors then implement
Ensure compatibility of future products
But at times become very large and unwieldy since standards bodies may not pay enough attention to ease of implementation (e.g.,SQL-92 or SQL:1999)
Reactionary standards attempt to standardize features that vendors have already implemented, possibly in different ways.
Can be hard to convince vendors to change already implemented features. E.g. OODB systems
35
©Silberschatz, Korth and Sudarshan23.35Database System Concepts, 5th Ed.
SQL Standards HistorySQL Standards History
SQL developed by IBM in late 70s / early 80s
SQL-86: The first formal standard
IBM SAA standard for SQL in 1987
SQL-89 added features to SQL-86 that were already implemented in many systems
SQL-89 was a reactionary standard
SQL-92 added many new features to SQL-89 (anticipatory standard)
Defines levels of compliance (entry, intermediate and full)
Even now few database vendors have full SQL-92 implementation
SQL 1999
Adds variety of new features --- extended data types, object orientation, procedures, triggers, etc.
SQL 2003
36
©Silberschatz, Korth and Sudarshan23.36Database System Concepts, 5th Ed.
SQL Standards History (Cont.)SQL Standards History (Cont.) The SQL: 2003 Standard
Part 1: SQL/Framework; overview Part 2: SQL/Foundation; types, schemas, DDL, DML, security, etc Part 3: SQL/CLI (Call Level Interface); API interface Part 4: SQL/PSM (Persistent Stored Modules); procedural extensions Part 9: SQL/MED (Management of External Data)
Interfacing of database to external data sources
– other databases and even files can be viewed as part of the database Part 10: SQL/OLB (Object Language Bindings); embedding SQL in Java Part 11: SQL Schemata(Information and Definition Schema) defines a standard
catalog interface Part 13: SQL/JRT (Java Routines and Types) defines standards for accessing
routines and types in Java Part 14: SQL/XML defines XML-related specifications Missing parts 5,6,7, 8, 12 cover features that are not near standardization yet
SQL/Bindings, Temporal Data, Distributed Transaction, Multimedia Data
37
©Silberschatz, Korth and Sudarshan23.37Database System Concepts, 5th Ed.
Database Connectivity StandardsDatabase Connectivity Standards Open DataBase Connectivity (ODBC) standard for database interconnectivity
based on Call Level Interface (CLI) developed by X/Open consortium and the SQL Access Group
defines application programming interface (API), and SQL features that must be supported at different levels of compliance
JDBC standard used for interconnectivity between Java and Database X/Open XA standards by X/Open consortium
define transaction management standards for supporting distributed 2PC OLE-DB by Microsoft
API like ODBC, but to support non-db sources such as flat files or email stores OLE-DB program can negotiate with data source to find what features are
supported Interface language may be a subset of SQL
ADO (Active Data Objects) by Microsoft Easy-to-use interface to OLE-DB functionality Can be called from scripting languages such as VBScript and Jscript ADO.NET API is for application in the .NET languages such as C# and Visual
Basic.NET
38
©Silberschatz, Korth and Sudarshan23.38Database System Concepts, 5th Ed.
Xopen / XA Xopen / XA 작동 그림예제작동 그림예제
39
©Silberschatz, Korth and Sudarshan23.39Database System Concepts, 5th Ed.
OLE-DB & ADO OLE-DB & ADO 작동 그림예제작동 그림예제
40
©Silberschatz, Korth and Sudarshan23.40Database System Concepts, 5th Ed.
Object Oriented Databases StandardsObject Oriented Databases Standards
Object Database Management Group (ODMG) standard for OODB
version 1 in 1993, version 2 in 1997, version 3 in 2000
provides language independent Object Definition Language (ODL) as well as several language-specific bindings
Object Management Group (OMG) standard for distributed SW based on objects
Object Request Broker (ORB) provides transparent message dispatch to distributed objects
Interface Definition Language (IDL) for defining language-independent data types
Common Object Request Broker Architecture (CORBA) defines specifications of ORB and IDL
41
©Silberschatz, Korth and Sudarshan23.41Database System Concepts, 5th Ed.
CORBA CORBA 작동 그림예제작동 그림예제
42
©Silberschatz, Korth and Sudarshan23.42Database System Concepts, 5th Ed.
XML-Based Application StandardsXML-Based Application Standards
Several XML based Standards for E-commerce E.g. RosettaNet (supply chain), BizTalk Define catalogs, service descriptions, invoices, purchase orders, etc. XML wrappers are used to export information from relational databases to XML
Simple Object Access Protocol (SOAP): XML based remote procedure call (RPC) standard
Uses XML to encode data (both parameters and results) HTTP as transport protocol (procedure call becomes HTTP request)
Endorsed by W3C Popular in B2B E-commerce Standards based on SOAP for specific applications
E.g. OLAP and Data Mining standards from Microsoft
DOM & SAX API are for general programming language to connect to XML Data
43
©Silberschatz, Korth and Sudarshan23.43Database System Concepts, 5th Ed.
SOAP SOAP 작동 그림예제작동 그림예제
44
©Silberschatz, Korth and Sudarshan23.44Database System Concepts, 5th Ed.
Table of ContentsTable of Contents
23.1 Performance Tuning
23.2 Performance Benchmarks
23.3 Standardization
23.4 Application Migration
23.5 Summary
45
©Silberschatz, Korth and Sudarshan23.45Database System Concepts, 5th Ed.
Legacy SystemsLegacy Systems
Legacy systems are older-generation systems that are incompatible with current generation standards and systems but still in production use E.g. applications written in Cobol that run on mainframes
Today’s hot new system is tomorrows legacy system! Porting legacy system applications to a more modern environment is problematic
Rewriting code Very expensive, since legacy system may involve millions of lines of code,
written over decades Original programmers usually no longer available
Switching over from old system to new system is a problem Big-Bang Approach Chicken-little Approach
One approach: build a wrapper layer on top of legacy application to allow interoperation between newer systems and legacy application E.g. use ODBC or OLE-DB as wrapper
46
©Silberschatz, Korth and Sudarshan23.46Database System Concepts, 5th Ed.
Legacy Systems (Cont.)Legacy Systems (Cont.)
Rewriting legacy application requires a first phase of understanding what it does
Often legacy code has no documentation or outdated documentation
reverse engineering: process of going over legacy code to
Come up with schema designs in ER or OO model
Find out what procedures and processes are implemented, to get a high level view of system
Re-engineering: reverse engineering followed by design of new system
Improvements are made on existing system design in this process
47
©Silberschatz, Korth and Sudarshan23.47Database System Concepts, 5th Ed.
Migrating to a New SystemMigrating to a New System
Switching over from old to new system is a major problem
Production systems are in every day, generating new data
Stopping the system may bring all of a company’s activities to a halt, causing enormous losses
Big-bang approach:
1. Implement complete new system
2. Populate it with data from old system
1. No transactions while this step is executed
2. scripts are created to do this quickly
3. Shut down old system and start using new system
Danger with this approach: what if new code has bugs or performance problems, or missing features
Company may be brought to a halt
48
©Silberschatz, Korth and Sudarshan23.48Database System Concepts, 5th Ed.
Migrating to a New System (cont.)Migrating to a New System (cont.)
Chicken-little approach:
Replace legacy system one piece at a time
Use wrappers to interoperate between legacy and new code
E.g. replace front end first, with wrappers on legacy backend
– Old front end can continue working in this phase in case of problems with new front end
Replace back end, one functional unit at a time
– All parts that share a database may have to be replaced together, or wrapper is needed on database also
Drawback: significant extra development effort to build wrappers and ensure smooth interoperation
Still worth it if company’s life depends on system
49
©Silberschatz, Korth and Sudarshan23.49Database System Concepts, 5th Ed.
Table of ContentsTable of Contents
23.1 Performance Tuning
23.2 Performance Benchmarks
23.3 Standardization
23.4 Application Migration
23.5 Summary
50
©Silberschatz, Korth and Sudarshan23.50Database System Concepts, 5th Ed.
Ch 21: Summary (1)Ch 21: Summary (1)
The Web browser has emerged as the most widely used user interface to databases.
HTML provides the ability to define interfaces that combine hyperlinks with forms facilities.
Web browsers communicate with Web servers by the HTTP protocol.
Web servers can pass on request to application programs, and return the results to the browser.
There are several client-side scripting languages - Java script is the most widely used-that provide richer user interaction at the browser end.
Web servers execute applications programs to implement desired functionality.
Servlets are a widely used mechanism to write application programs that run as part of the Web server process, in order to reduce overheads.
There are also many server-side scripting languages that are interpreted by the Web server and provide application program functionality as part of Web server.
51
©Silberschatz, Korth and Sudarshan23.51Database System Concepts, 5th Ed.
Ch 21: Summary (2)Ch 21: Summary (2)
Tuning of the database (system parameters, as well as the higher-level database design-such as the schema, indices, and transactions) is important for good performance.
Tuning is best done by identifying bottlenecks and eliminating them.
Performance benchmarks play an important role in comparisons of database systems, especially as systems become more standards compliant.
The TPC benchmark suites are widely used, and the different TPC benchmarks are useful for our comparing the performance of databases under different workloads.
Standards are important because of the complexity of database systems and their need for interoperation.
Formal standards exist for SQL. Defacto standards, such as ODBC and JDBC, and standard adopted by industry groups, such as CORBA, have played an important role in the growth of client-server database systems.
Standards for object-oriented databases, such as ODMG, are being developed by industry groups.
52
©Silberschatz, Korth and Sudarshan23.52Database System Concepts, 5th Ed.
Ch 21: Summary (3)Ch 21: Summary (3) E-commerce systems are fast becoming a core part of how commerce is performed.
There are several database issues in e-commerce systems. Catalog management, especially personalization of the catalog, is done with the databases.
Electronic marketplaces help in pricing of products through auctions, reverse auctions, or exchanges.
High-performance database systems are needed to handle such trading.
Orders are settled by electronic payment systems, which also need high-performance database systems to handle very high transaction rates.
Legacy systems are systems based on older-generation technologies such as nonrelational databases or even directly on file systems.
Interfacing legacy systems with new-generation systems is often important when the run mission-critical systems.
Migrating from legacy systems to new-generation systems must be done carefully to avoid disruptions, which can be very expensive.
53
©Silberschatz, Korth and Sudarshan23.53Database System Concepts, 5th Ed.
Ch 21: Bibliographical Notes (1)Ch 21: Bibliographical Notes (1)
Information about servlets, including tutorials, standard specifications, and software, is available on java.sun.com/products/servlet.
Information about JSP is available at java.sun.com/products/jsp.
An early proposal for a database-system benchmark (the Wisconsin benchmark) was made by Bitton et al.[1983].
The TPC-A,-B, and –C benchmarks are described in Gray[1991].
An online version of all the TPC benchmarks descriptions, as well as benchmark results, is available on the World Wide Web at the URL www.tpc.org
the site also contains up-to-date information about new benchmark proposals.
O’Neil and O’Neil[2000] provides a very good textbook coverage of performance measurement and tuning.
The five minute and one minute rules are described in Gray and Putzolu[1987] and Gray and Graefe[1997].
Brown et al.[1994] describes an approach to automated tuning.
54
©Silberschatz, Korth and Sudarshan23.54Database System Concepts, 5th Ed.
Ch 21: Bibliographical Notes (2)Ch 21: Bibliographical Notes (2) Index selection and materialized view selection are addressed by Ross et al.[1996],
Labio et al. [1997], Gupta [1997], Chaudhuri and Narasayya [1997], Agrawa et al. [2000] and Mistry et al. [2001].
Poess and Floyd[2000] gives an overview of the TPC-H, TPC-R, and TPC-W benchmarks.
The OO1 benchmark for OODBs is described in Cattell and Skeen [1992]; the OO7 benchmark is described in Carey et al. [1993].
Kleinrock [1975] and Kleinrock [1976] is a popular two-volume textbook on queuing theory.
Shasha [1992] provides a good overview of database tuning.
The American National Standard SQL-86 is described in ANSI[1986].
The IBM Systems Application Architecture definition of SQL is specified by IBM [1987].
The standards for SQL-89 and SQL-92 are available as ANSI [1989] and ANSI [1992] respectively.
For reference in the SQL:1999 standard, see bibliographical notes of Chapter 9.
55
©Silberschatz, Korth and Sudarshan23.55Database System Concepts, 5th Ed.
Ch 21: Bibliographical Notes (3)Ch 21: Bibliographical Notes (3)
The X/Open SQL call-level interface is defined in X/Open [1993]; the ODBC API is described in Microsoft [1997] and Sanders [1998].
The X/Open XA interface is defined in X/Open[1991]. More information about ODBC, OLE-DB, and ADO can be found on the Web site www.microsoft.com.data, and in a number of books on the subject can be found through www.amazon.com.
The ODMG 3.0 standard is defined in Cattell [2000]. ACM Sigmod Record, which is published quarterly, has a regular section on standards in databases, including benchmark standards.
A wealth of information on XML based standards is available online.
You can use a Web search engine such as Google to search for more up-to-date information about the XML and other standards.
Loeb[1998] provides a detailed description of secure electronic transactions.
Business process reengineering is covered by Cook [1996]. Kirchmer [1999] describes application implementation using standard software such as Enterprise Resource Planning (ERP) packages.
Umar [1997] covers reengineering and issues in dealing with legacy systems.
56
©Silberschatz, Korth and Sudarshan23.56Database System Concepts, 5th Ed.
Ch 21: ToolsCh 21: Tools
There are many Web development tools that support database connectivity through servlets, JSP, Javascript, or other mechanisms.
We list a few of the better-known ones here:
JAVA SDK from SUN (java.sun.com)
Apache’s Tomcat (jakarta.apache.org) and Webserver (apache.org)
IBM WebSphere (www.software.ibm.com)
Microsoft’s ASP tools (www.microsoft.com)
Allaire’s Coldfusion and JRun products (www.allaire.com)
Caucho’s Resin (www.caucho.com)
Zope (www.zope.org).
A few of these, such as Apache, are free for any use, some are free for noncommercial use or for personal use, while others need to be paid for
See the respective Web sites for more information.
57
©Silberschatz, Korth and Sudarshan23.57Database System Concepts, 5th Ed.
Table of ContentsTable of Contents
23.1 Performance Tuning
23.2 Performance Benchmarks
23.3 Standardization
23.4 Application Migration
23.5 Summary
58
Database System Concepts
©Silberschatz, Korth and SudarshanSee www.db-book.com for conditions on re-use
©Silberschatz, Korth and SudarshanDatabase System Concepts
End of ChapterEnd of Chapter