
Large Systems Support

VERITAS Quick I/O: Equivalent to Raw Volumes, Yet Easier

Authors: Mohan Bhyravabhotla, Oracle Corporation; Bob Rader, VERITAS Software; Vern Wagman, Oracle Corporation

Project Sponsor: Darryl Presley, Oracle Corporation

Created: 07/28/98
Last Updated: 11/20/98 5:38 PM
Version: 1.10
Status: Approved for Distribution

Copyright 1998 Oracle Corporation. All Rights Reserved.


Table of Contents

Executive Summary
Preface
    Abstract
    Audience
    Disclaimer
Introduction
    New Option for Database Storage
    Traditional UNIX File Systems
    Journaled File Systems
    Raw Volumes
    VERITAS Quick I/O
Project Overview
    Selecting the Model
    System Configuration
        Database Server Hardware
        Database Server Software
        Simulated Load Server
    Database Configuration
        Performance Concerns when Using Large Capacity Disks
    Data Collection Techniques
        TestFrame
        Custom
Observed Results
    Introduction
    Response Time
    CPU State Results
    Memory Consumption Results
    I/O Measurements
    Database Measurements
        Database Buffer Cache Hit Ratio
        Overall Tablespace I/O Requests
        Overall Oracle Block Transfer Rate
    QIO vs. Raw Volumes Limiting Factor
    Comparative Measurements
        Memory Consumption
        CPU Waiting on I/O (%wio)
Administrative Issues
    Finding Information
    Selecting Storage Attributes
    Storage Layout
    Backup
    File Naming
Best Practices
    Conversion Techniques
        Current Limitations
        Conversion Theory
        Possible Techniques
        Risk Assessment
        Conversion Recommendations
    Backup and Restore Techniques
Concluding Remarks
Acknowledgments
Appendix A: VERITAS Quick I/O Overview
    Why Quick I/O?
    How Quick I/O Works
For More Information


Executive Summary

VERITAS Quick I/O has been developed to combine the performance advantages of raw volumes with the maintenance advantages of a journaled file system. Studies conducted at the Large Systems Support Belmont data center showed performance benefits similar to raw volumes. The availability of the traditional UNIX file commands greatly improves the ability of the technical staff to manage database files correctly and efficiently. VERITAS Quick I/O deserves serious consideration for implementation in any data center running Oracle databases on UNIX.


Preface

Abstract

The purpose of the study was to test the assertion that VERITAS Quick I/O file systems perform comparably to raw volumes, while providing greater flexibility and ease of administration.

A series of experiments was performed, driven by a simulator modeled after a large production online transaction processing (OLTP) database application. Collected performance data was analyzed for both raw volumes and VERITAS Quick I/O file systems to identify similarities or significant variations. Administrative differences between using raw volumes and VERITAS Quick I/O file systems were examined, and Best Practice recommendations were studied.

Audience

Database Administrators and System Administrators using Oracle 8.0.4 on Solaris 2.5.1 or higher.

Disclaimer

This paper is educational by nature; any concepts or techniques discussed herein should be thoroughly tested prior to implementation. The authors assume no liability whatsoever for any implementation of the material discussed herein.


Introduction

New Option for Database Storage

The introduction of the VERITAS Database Edition for Oracle, which contains the Quick I/O feature studied herein, changes the traditional set of tradeoffs between using raw volumes or file systems for an Oracle database. In many cases, it is now possible to obtain the performance characteristics of raw volumes without the associated maintenance difficulties. In the paragraphs that follow, we outline some of the major issues of the data storage options currently available.

Traditional UNIX File Systems

There are a variety of well-known UNIX commands available for supporting the traditional UNIX file system, such as ls, cp, mv, cpio, tar, and dump/restore. There are, however, a number of drawbacks to using the traditional UNIX file system (UFS) for database files. These include: susceptibility to database damage during a system crash; a performance bottleneck on the UNIX per-file write lock; double-caching operations; double-copy operations; and extended boot times after a system crash while the file system checks its integrity.

Double-caching is the duplication of memory in use for data in an Oracle buffer and a copy of the same data in an O/S level file system data cache. Double-copying is the extra data movement required to move data between an Oracle buffer, an O/S buffer, and the disk, rather than just moving the data between the Oracle buffer and the disk.

Journaled File Systems

The advent of journaled file systems greatly improved the ability of the traditional UNIX file system to survive system crashes without damage and reduced the time required to verify the integrity of the file system during the following boot phase. However, journaled file systems are still subject to the performance bottleneck associated with the UNIX per-file write lock, double-copy operations, and double-caching operations.

Raw Volumes

Raw volumes use the raw device or “character mode” non-buffered interface. The raw device interface is the simplest, most basic method for performing I/O between an application buffer and a physical device. Using the raw device interface eliminates performance bottlenecks due to buffering and copying or due to serialization on files or buffers. In addition, asynchronous I/O is only supported for the raw device interface that underlies raw volumes. As a consequence, using a raw volume can significantly reduce the process time spent waiting for I/O to complete. However, raw volumes are not files in a file system, so the usual UNIX commands for allocating or removing files, listing files, finding file sizes, and backing up files do not apply. Using raw volumes successfully therefore requires a more skilled and well-trained staff.

VERITAS Quick I/O

VERITAS Quick I/O has been developed to combine the performance advantages of raw volumes with the maintenance advantages of a journaled file system. The performance benefits are similar to raw volumes, and all the traditional UNIX file commands are usable. The availability of these commands greatly improves the ability of the technical staff to manage the database files correctly and efficiently. See Appendix A, “VERITAS Quick I/O Overview”, for further information regarding the Quick I/O feature.


Project Overview

This study had two main goals:

• To compare the performance of VERITAS Quick I/O files and raw volumes.

• To examine the administrative differences between using Quick I/O files and raw volumes.

The study was divided into two parts: performance and administration. For the performance portion we used a near-realistic OLTP workload model and system. For the administrative part of the study, we explored the range of tasks and questions affected by using file systems versus using raw volumes.

Selecting the Model

For our model, we selected a “near-production-realistic” Incident Tracking System (call-tracking application) OLTP workload model previously used in various studies conducted at the Oracle Large Systems Support (LSS) Belmont center. The model runs on two machines -- a database server and a load server producing a simulated load. The model used an 18 GB database with a maximum load of 1200 simulated users. The simulated users were generated on the load server using the preVue-C/S™ (v5) tool from Rational Software, Inc.

The load simulation model was built in previous work. It was constructed from actual message traffic collected, analyzed, and coded into program sequences using preVue tools. The simulated users’ behavior was parameterized (wait time, think time) to provide some control over the applied load. The model included a “Reference User” that performed large queries on the state of database entries. The response time of the Reference User (measured on the load server machine) was taken as a measure of the responsiveness and performance of the OLTP system.

In this study, we decreased the wait and think times to increase the applied load for a given number of simulated users. The absolute values of simulated users, response time, or throughput measures reported in this paper are all arbitrary – only the comparison of the two operating environments (Quick I/O file system vs. raw volumes) is significant.

System Configuration

The study required a database server, a network-resident load-generating machine, and several pieces of software. We describe here the hardware and software used in this study.


Database Server Hardware

We wanted a server machine capable of running the OLTP model application with a large load. We selected a Sun™ Ultra Enterprise 4000 with four processors running at 250 MHz, each with a 4 MB L2 cache, and 2 GB of memory. Two Sun™ Enterprise Network Array™ A5000 arrays on Fibre Channel were used. Each A5000 had fourteen 9.1 GB disks connected over a single Fibre Channel Arbitrated Loop (FC-AL) to the Ultra Enterprise 4000. The Ultra Enterprise 4000 was connected to the local area network using a 10BaseT (10 Mbps) link.

Database Server Software

We selected the Solaris version 2.5.1 operating system.

Oracle 8.0.4 was configured in a Dedicated Server Model using the table and index partitioning features.

We used VERITAS Database Edition for Oracle, Release 1.1.2, which contains the following software:

• VERITAS File System (VxFS) release 3.2.2, a high-performance, fast-recovery file system that is optimized for business-critical database applications and data-intensive workloads.

• VERITAS Volume Manager (VxVM) release 2.5.2, a disk management subsystem that supports disk striping, mirroring, and simplified disk management for improved data availability and higher performance.

• VERITAS Visual Administrator (VxVA) release 2.5, an easy-to-use graphical user interface for VxVM and VxFS.

• VERITAS Quick I/O™ for Databases release 1.1.2, a device driver that allows VxFS to support databases with raw disk performance.

TestFrame, an LSS in-house developed software tool, was used for data collection and data reduction. The tool consists of sets of csh scripts used in collecting system data, SQL scripts used in collecting Oracle database data, and csh scripts used in reducing the data into forms that can be easily imported into spreadsheet programs. The TestFrame package was extended with some custom scripts for this study. Microsoft Excel™ was used for spreadsheet calculations and plotting graphs.

Simulated Load Server

The simulated load server was an HP K400 with four processors and 1 GB of memory, running HP-UX 10.20. This was the machine running the preVue software. The near-production-realistic workload model was ported to this server and played back for each test.


The “Reference User” performed read-only queries at a variable rate. The response time for these queries was taken to measure the load stress on the database server while the remaining simulated users were performing their various transactions on the database. We monitored the machine to ensure it did not limit the number or activity of the simulated users, but we did not study its performance.

The network connecting the simulated load server to the database server was a 10BaseT (10 Mbps) Ethernet. Due to the small amount of data transfer between the load server and the database server, network traffic was not measured.

Database Configuration

The database considered for this study was a call-tracking application database running on Oracle 8.0.4. The data tablespaces, index tablespaces, redo log files, and control files were created on separate volumes to avoid disk I/O contention problems. The database was created with the Oracle 8 partitioning feature. The tables were partitioned into 2 to 15 parts depending on their type and size.

For good performance, the database was created on volumes striped 4 ways across the two A5000 disk arrays with a stripe width of 64 KB; however, the redo logs and control files were only striped 2 ways. When making file systems, we used the default file system parameters. The default VxFS block size was 1 KB.

We set OPTIMIZER_MODE = RULE, necessitated by the SQL used by the application model. The Oracle DB_BLOCK_SIZE parameter was set to 2 KB, a value determined to provide good performance in previous LSS studies using this database. The total System Global Area was 386 MB in the experimental runs reported in this study. The database was configured without an archive log.

Performance Concerns when Using Large Capacity Disks

The 9 GB capacity drives used in the A5000 have a 50% difference in transfer rate between the outer and inner cylinders, due to the difference in linear speed of the disk under the disk head. Because of this large performance difference, care was taken to ensure each tablespace used the same physical disk space, as closely as possible, when loaded in a file system or in a raw volume.

We saved the disk configuration information for the raw volume and file system configurations. We were then able to reconfigure between raw volumes and file systems by deleting the volumes for one configuration and re-making the volumes for the other configuration, using a shell script.
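As an illustration of this approach, the sketch below recreates one striped data volume and, for the file system runs, lays a VxFS file system on it. This is a minimal sketch only: the disk group, volume, and mount point names (datadg, data1, /ora/data1) are hypothetical, and the vxassist and mkfs options should be checked against the installed VxVM/VxFS release before use.

    #!/bin/sh
    # Hypothetical sketch: recreate the "data1" volume striped 4 ways with a
    # 64 KB stripe unit, matching the striping described for the data tablespaces.
    vxassist -g datadg make data1 4g layout=stripe ncolumn=4 stripeunit=64k

    # For the file system (QIO) configuration, add a VxFS file system and mount it.
    # For the raw volume configuration, skip these two steps and point Oracle at
    # /dev/vx/rdsk/datadg/data1 instead.
    mkfs -F vxfs /dev/vx/rdsk/datadg/data1
    mount -F vxfs /dev/vx/dsk/datadg/data1 /ora/data1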


Once the database was successfully created, we took a backup to disk. For each reported experimental run, we refreshed the database with its original contents from the backup.

Data Collection Techniques

The performance of the “raw volume” (“RAW”) and “Quick I/O file system” (“QIO”) based databases was measured by collecting statistical data on the database server. Data was collected for CPU usage, memory consumption, I/O activity, other UNIX metrics, and database activity.

Data was collected every $CYCLETIME seconds. We set CYCLETIME=300 seconds, for a 5-minute data collection period. The scripts timestamped each data point so the data could be correlated in time.
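The collection scripts themselves were part of the in-house TestFrame tool and are not reproduced in this paper. The fragment below is only a minimal sketch of the timestamped-sampling idea, assuming sar and standard shell utilities are available; the log file name and the choice of sar options are illustrative, not the ones actually used.

    #!/bin/sh
    # Minimal sketch: sample CPU state once per cycle and tag each sample with a
    # timestamp so it can be correlated with the other data streams.
    CYCLETIME=300
    while true
    do
        STAMP=`date '+%Y%m%d %H:%M:%S'`
        # one 1-second CPU sample (%usr, %sys, %wio, %idle)
        CPU=`sar -u 1 1 | tail -1`
        echo "$STAMP $CPU" >> /var/tmp/cpu_state.log
        sleep $CYCLETIME
    done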

The preVue model was run with new batches of users being added at defined time intervals. In these experiments, we selected a user increment of 100 users and a time interval of 30 minutes. We measured the system with a constant number of simulated users for 30 minutes, from 100 users up to our maximum load of 1200 users.

The preVue model programs also measured the Reference User response time. We set the measurement interval to 5 minutes, and the data was collected on the simulated load server. After the run, a data reduction program on the load server was used to produce a file of timestamps and response times for the Reference User.

TestFrame

The TestFrame tool consists of data collection scripts and data reduction scripts. The data collection scripts collect system data (from sar, ps, and uptime) and Oracle data (tablespace I/O counts, file I/O counts, latch, SGA, and other statistics).

The data reduction scripts produce files formatted in a regular text structure for easy importing into a spreadsheet. Reports were produced for CPU usage, CPU queue, tablespace I/O, file I/O, and database latch statistics.

Custom

We added a session count script to count the simulated users connected to the database, an SQL script to collect SGA hit ratio data, and a script to collect statistics on I/O operations at the level of VxVM volumes. This script used the vxstat utility.
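The per-volume I/O collection script is not listed in this paper; the following is a minimal sketch of how vxstat might be driven for that purpose. The disk group name (datadg) and log file location are hypothetical, and the exact vxstat output columns should be verified against the installed VxVM release.

    #!/bin/sh
    # Minimal sketch: record per-volume I/O statistics for all volumes in a
    # (hypothetical) disk group, one timestamped snapshot per invocation.
    DG=datadg
    LOG=/var/tmp/vxstat_volumes.log

    echo "==== `date '+%Y%m%d %H:%M:%S'` ====" >> $LOG
    # -g selects the disk group; -v reports statistics for volumes.
    vxstat -g $DG -v >> $LOG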


Observed Results

Introduction

During the course of this study, we produced a minimum of 80 graphed metrics for each execution of the model. A complete discussion of these is beyond the scope of this paper.

For most of the metrics studied, the graphs were so close as to be called “identical”. For supporting analysis, we will present only those graphs that show at a high level the similarities needed to assure the reader of accuracy. In addition, a few graphs are presented which relate findings across all four storage methodologies: UFS, VxFS, QIO, and RAW.

Regarding tuning, each combination of storage method and model run was not extensively optimized for performance. During the initial stages of the study, a combination of UNIX parameters, Oracle parameters, and model parameters was developed which generated “reasonable” metrics not related to the storage method in use. This mix of parameters was then held constant for all data runs, with only the type of file system being varied.

In the following sections, we present our chosen graphs in the following order:

• Response Time
• CPU Use
• Memory Consumption
• I/O Measurements
• Database Measurements
• Comparative Measurements

For each graph we present relevant observations and summarize our findings.


Response Time

In the following graph, the Average Response Time for the Reference User is plotted against the number of active users. Since the modeled workload increased the number of users in increments of 100 at 30-minute intervals, the data was adjusted to exclude any samples taken during the buildup to the next user count level. The effect of this was to take response samples when the load had stabilized at 100, 200, 300, etc., users. This technique was adopted as a strategy to minimize the time “drift” effect between the two systems as the server became busier and the amount of time required for the server to execute the Reference User query increased.

[Figure 1 — Reference User Average Response Time: QIO vs RAW. X-axis: Simulated Users (100–1200); Y-axis: Response Seconds, Logarithmic Scale. Series: QIO, RAW.]

As shown in Figure 1, the curves are quite similar. Up to the 1000th user, the curves are fairly linear; after the 1000th user, they steepen at a rapidly increasing rate. This rapid change in slope near the 1000th user corresponds to the resource bottleneck discussed later in the Comparative Measurements section.


CPU State Results

One of the metrics often used when discussing system performance is the amount of time the CPU spends in various processing modes. Most UNIX sar commands break this data into four categories: %usr, %sys, %wio, and %idle. %usr is the percent of time the CPU is executing user level operations. %sys is the percent of time the CPU is executing system level operations, such as pre- and post-I/O processing. %wio is the percentage of time the CPU is waiting for I/O to complete with no runnable processes. %idle is the percentage of time the CPU is available for additional work.

For presentation purposes, this area is divided into two graphs. The first graph shows %usr and %sys together, while the second graph shows %wio and %idle together.

[Figure 2a — CPU %usr, %sys: QIO vs RAW. X-axis: Simulated Users (100–1200); Y-axis: Percent. Series: QIO %usr, RAW %usr, QIO %sys, RAW %sys.]

As displayed in Figure 2a, above, the curves for %usr and %sys CPU activity are quite similar.


[Figure 2b — CPU %wio, %idle: QIO vs RAW. X-axis: Simulated Users (100–1200); Y-axis: Percent. Series: QIO %wio, RAW %wio, QIO %idle, RAW %idle.]

In Figure 2b above, below 800 users QIO appears to spend slightly less time in the %wio state, which also gives it a slightly higher %idle value. Due to time constraints, the team was unable to investigate this any further.


Memory Consumption Results

Another often-used metric is the rate of memory consumption. Memory is usually divided into real physical memory and virtual memory, or “swap” space in one or more files on disk. In Figure 3 below, the “swap” space is reduced by a scale factor of 10 (SF10) to fit the two curves on the graph.

[Figure 3 — Free Memory and Swapspace: QIO vs RAW. X-axis: Simulated Users (100–1200); Y-axis: Mbytes. Series: QIO-fmem, RAW-fmem, QIO-fswap (SF10), RAW-fswap (SF10).]

The memory consumption curves are quite similar. Once the real memory was used up after the 1000th user, the system began paging real memory, but did not reach the point of process swapping.

I/O Measurements

In general, the disk farm used by the test system was not a bottleneck in any of the simulations. The Device Queues were practically non-existent, the Device Percent Busy values were below 30%, and the Device Average Service Time was in an expected range for a modern disk system.


Rather than present the graphs necessary to cover all aspects of disk performance, we will use the Device Average Service Time metric and present two graphs contrasting the top four disk devices to illustrate our point (Figures 4a and 4b).

Device Average Service Time is the average amount of time it takes an I/O (both read and write) to complete once it has been handed to the disk device by the disk controller. As disk devices become more saturated, these times tend to increase.

As you review these graphs, please note that while there are spikes or troughs due to variations and timings of the simulation workload, the overall trend for these lines is relatively flat and close together. The relative flatness of the lines indicates there was no performance problem in the disk farm as the simulation load increased.

[Figure 4a — Device Average Service Time: QIO vs RAW. X-axis: Simulated Users (100–1200); Y-axis: Milliseconds. Series: QIO-data1, RAW-data1, QIO-index1, RAW-index1.]


[Figure 4b — Device Average Service Time: QIO vs RAW. X-axis: Simulated Users (100–1200); Y-axis: Milliseconds. Series: QIO-cust_data, RAW-cust_data, QIO-system, RAW-system.]

Database Measurements

Within the database, the team looked at metrics for File I/O activity, Tablespace I/O activity, Latch Contention, and some of the Cache Ratios for structures in the System Global Area (SGA).

In general, once again the various metrics were remarkably similar. Since this study is primarily an examination of the differences between QIO and raw volumes, we will present three metrics closely associated with the movement of data to and from disk devices. Those metrics are Database Buffer Cache Hit Ratio, Overall Tablespace I/O Request Rate, and Overall Oracle Block Transfer Rate. The Overall Oracle Block Transfer Rate will also be referred to as database “Throughput”.


Database Buffer Cache Hit Ratio

The database buffer cache hit ratio is a measure of how often a requested Oracle data block was found already available in the database block buffers. The higher this ratio, the more efficient the cache. Typically, the database buffer cache hit ratio starts out low and increases as the system workload populates the database block buffers with data. As can be seen in Figure 5 below, there is little difference between the curves.

[Figure 5 — Database Buffer Cache Hit Rate: QIO vs RAW. X-axis: Users (100–1200); Y-axis: Percent. Series: QIO, RAW.]
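The SGA hit ratio script used in the study is not listed here; the sketch below shows one conventional way to compute the buffer cache hit ratio from V$SYSSTAT using a shell wrapper around sqlplus, in keeping with the csh/SQL style of the TestFrame scripts. The connect string is a placeholder, and the statistic names should be confirmed against the Oracle release in use.

    #!/bin/sh
    # Minimal sketch: compute the database buffer cache hit ratio as
    #   1 - physical reads / (db block gets + consistent gets)
    # from the cumulative counters in V$SYSSTAT.
    sqlplus -s system/manager <<'EOF'
    SET PAGESIZE 0 FEEDBACK OFF
    SELECT ROUND(
             (1 - phy.value / (blk.value + con.value)) * 100, 2
           ) AS buffer_cache_hit_pct
      FROM v$sysstat phy, v$sysstat blk, v$sysstat con
     WHERE phy.name = 'physical reads'
       AND blk.name = 'db block gets'
       AND con.name = 'consistent gets';
    EOF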


Overall Tablespace I/O Requests

Another way to measure the similarity of I/O activity is to examine the total number of I/O requests per tablespace. In Figures 6a and 6b below, we have graphed the sum of Read and Write I/O Requests for the four most active tablespaces in our simulation database. The data is presented with two tablespaces per graph, using a logarithmic scale on the Y-axis. Once again, the important point to note is the similarity of the curves for the various tablespaces in both the QIO and raw volume simulations.

[Figure 6a — Tablespace Read & Write I/O Requests: QIO vs RAW. X-axis: Simulated Users (100–1200); Y-axis: Read & Write I/O Requests, Logarithmic Scale. Series: QIO - TEMP, RAW - TEMP, QIO - TS_TAR13, RAW - TS_TAR13.]


[Figure 6b — Tablespace Read & Write I/O Requests: QIO vs RAW. X-axis: Simulated Users (100–1200); Y-axis: Read & Write I/O Requests, Logarithmic Scale. Series: QIO - TS_TAR14, RAW - TS_TAR14, QIO - TS_TAR15, RAW - TS_TAR15.]

As shown in Figure 6a, TEMP is the busiest tablespace, and it follows a growth curve through the increasing number of simulated users. The other three tablespaces, TAR13, TAR14, and TAR15, appear to be accessed at a constant rate. This may seem to be unusual behavior for a load simulation.

The explanation is as follows: TEMP was the busiest tablespace and shows in the graph as expected. The next three tablespaces were accessed by the SQL query used by the Reference User. Since the Reference User performed long scans at regular intervals, the activity was relatively constant until overall system performance started to degrade. The lines sit at different levels on the Y axis because TAR14 held the most data of interest to the Reference User, TAR13 held less, and TAR15 held the least. Also, the pattern appears flat because the remaining tablespaces, which showed varying curve patterns, were not selected for these graphs.


Overall Oracle Block Transfer Rate

It is valuable to look at some measure of database “Throughput” as a means of comparing various simulation runs. For our purposes, we define throughput to be the sum of all Oracle Data Blocks Read and Written.

[Figure 7 — Database Throughput: QIO vs RAW. X-axis: Simulated Users (100–1200); Y-axis: Oracle Blocks Read & Written. Series: QIO, RAW.]

In Figure 7, raw volumes appear to have a slightly higher throughput measurement than QIO. This is more evident when we fit a linear trendline to the data, as shown in Figure 8.


[Figure 8 — Database Throughput, Linear Trendline: QIO vs RAW. X-axis: Simulated Users (100–1200); Y-axis: Oracle Blocks Read & Written. Series: Linear (RAW), Linear (QIO).]

If you examine the average throughput across all sample intervals, there was a 5.1% advantage to raw volumes.

Given there are a large number of potential tuning combinations, it is possible the same simulation running with tunable parameters optimized specifically for each storage methodology would produce different results. This particular simulation was run with “reasonable” parameters, not “optimized” parameters. Readers are encouraged to base any judgments concerning performance upon their own specific tests. Tuning each file system to the “nth” degree was outside the scope of this investigation.

QIO vs. Raw Volumes Limiting Factor

We present in this section what we believe to be the limiting factor of our simulations. For each simulation run, a point was eventually reached where some resource was depleted and the overall system performance degraded substantially, and at increasing rates. This was done deliberately to examine the behavior of the storage methods across a spectrum of machine performance from lightly loaded to saturated.


During our simulations, the type of I/O subsystem was not the limiting factor. In all the various simulations conducted, the availability of real memory was found to be the limiting factor most affecting the results of the simulations.

Figure 9 presents the scaled curves for Reference User Average Response, Average Database Throughput, and Average CPU %idle against the number of simulated users for the QIO and raw volume simulations.

[Figure 9 — Limiting Factor Correlation: Averaged QIO vs Averaged RAW. X-axis: Simulated Users (100–1200); Y-axis: Scaled Units. Series: Avg Resp, Avg %idle, Avg fmem (SF10), Avg Thput (SF1000).]

As real memory begins to become scarce around 1000 simulated users, all the curves begin to change slope. As more load was added past 1000 users, overall system performance and response time degraded at an increasing rate. As the system experienced increasing pressure for real memory, it spent more and more resources moving pages back and forth to the swap file, as well as searching for and rearranging free pages of real memory. Therefore, CPU %idle fell, response time worsened, and overall database throughput decreased as well.


Comparative Measurements

Though this is primarily a study of QIO versus raw volumes, there were certain comparisons of interest across all four storage methodologies. We present in this section two metrics measured across all four I/O subsystem types tested.

We actually ran the various simulations across all four I/O subsystem types: UNIX File Systems, VxFS journaled file systems, VERITAS Quick I/O, and Raw Volumes. While a complete discussion of all the differences is beyond the scope of this document, the memory consumption and CPU %wio metrics show interesting differences.

Memory Consumption

In general, more memory was consumed initially and at a faster rate over the range of the simulated user loads for UNIX File Systems and VxFS journaled file systems. As shown in Figure 10, the slope of the Memory Consumption curve for UFS/VxFS is steeper and the gap between the storage methodologies widens over time, causing the UFS and VxFS I/O subsystems to run out of real memory sooner than QIO or raw volumes.

The team believes this is most likely due to the effects of the buffering associated with the UFS and VxFS I/O subsystem types. For raw volumes and QIO, memory is consumed only by Oracle processes as the number of database users increases. This implies that more memory will be required to successfully scale to a large number of users for UFS and VxFS I/O subsystem types than for QIO or raw volumes.


[Figure 10 — Free Memory Comparison: UFS, VxFS, QIO & RAW. X-axis: Simulated Users (100–1200); Y-axis: Mbytes. Series: VxFS-fmem, UFS-fmem, QIO-fmem, RAW-fmem.]

CPU Waiting on I/O (%wio)

One of the most telling graphs of the entire study is the side-by-side comparison of the percentage of time the CPU spends waiting on I/O to complete in each of the four I/O subsystem types. Figure 11 depicts this by graphing CPU %wio for the four I/O subsystem types.


[Figure 11 — Comparison of %wio: UFS, VxFS, QIO & RAW. X-axis: Simulated Users (100–1200); Y-axis: Percent. Series: UFS %wio, VxFS %wio, QIO %wio, RAW %wio.]

The system spends significantly more time waiting for I/O to complete in UFS and VxFS, with UFS clearly the worst. The team believes these curve characteristics are a result of serializing on the UNIX file write lock and using synchronous, non-direct I/O.

A cautionary word is in order here. When converting from a file system to raw volumes using the Oracle Export and Import utilities, there is some amount of performance improvement gained simply from the reorganization of the tablespace in the conversion process. How much performance gain the reorganization of the tablespace provides has been periodically debated.

In our simulation, the database was reloaded to a consistent state prior to each data collection run from the same backup copy, via the UNIX “dd” command. This implies the state of the Oracle blocks was exactly the same at the start of each simulation. Therefore, the numbers above have any performance gain due to a tablespace reorganization factored out.


Administrative Issues

Using a file system is fundamentally different from using raw volumes. Using the Quick I/O feature is a small variation on using a file system. In this section we compare the administration issues seen when using file systems versus raw volumes.

There are several flavors of using raw volumes – using the physical disks directly, or using volumes defined logically on the physical disks with volume manager software such as VERITAS VxVM or Solaris Online DiskSuite. The use of physical disks is truly fraught with administrative headaches, many of which are avoided by using a volume manager. For the purposes of this study we compare using a raw volume defined under VxVM with using a VxFS file system.

We compare the administration of database storage tasks for:

♦ Finding Information
  • listing names and allocation sizes
  • determining free space in the on-line storage
♦ Selecting Storage Attributes
♦ Storage Layout
  • avoiding “hot spots”
  • reallocation
♦ Backup Methods
♦ File Naming

Finding Information

The vxprint command is used under VxVM to list the names and sizes of raw volumes, or to list the name, size, and other attributes of a selected volume. To find the amount of free space available, the “vxdg free” command can be issued. Alternatively, the VxVM Visual Administrator can be used to see the set of volumes and space allocations.

If a file system is used for database storage, ordinary UNIX commands that display the size of files and the space available in file systems can be used to get information about the files: ls can be used to list the files in a directory, and “ls -l” can be used to list the names, sizes, and date last modified for the files in a directory. These tools simplify administrative tasks.
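The contrast can be seen in a few commands. The disk group, volume, and mount point names below (datadg, /ora/data1) are hypothetical examples, not names from the study configuration, and flags should be checked against the installed releases.

    # Raw volumes under VxVM: list volumes and free space in a disk group.
    vxprint -g datadg -vt          # names, sizes, and states of the volumes
    vxdg -g datadg free            # free space remaining in the disk group

    # File system storage: the ordinary UNIX commands apply.
    ls -l /ora/data1               # names, sizes, and modification dates of the data files
    df -k /ora/data1               # space used and available in the file system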

Selecting Storage Attributes

A major concern when designing a database is the reliability of the storage and the performance experienced using the storage. The volume manager software can be used to select reliability and performance attributes of a volume, based on the physical devices available and the additional software mirroring, RAID-5, and striping required to provide the desired level of service. These features are available to users of raw volumes or file systems built on volumes.

Storage Layout

A central issue in designing a database is avoiding the creation of “hot spots” with a lot of I/O contention. For a raw volume, a tablespace is allocated on its own volume, and that volume must be allocated with awareness of any other volumes allocated to the same physical device.

Using a file system provides an additional level of flexibility in the layout of storage. Larger storage containers (volumes) can be used, which allows more spindles in striping and usually results in avoiding hot spots due to the way disk blocks are added to files.

Volume manager tools can be used when a raw volume must be changed in size, striping, location, or mirroring. When a file must be changed, it can be grown or shrunk without concerns about fitting into a particular piece of storage, since the file system is a single pool of disk blocks. If a performance problem is found with a particular set of files, copying the files to different locations is simpler than moving volumes.

Backup

When using raw volumes, backups are done using the dd command, or with specific volume manager features that allow a “mirror-breakoff” snapshot of a volume. Also, some storage devices provide their own vendor-specific facilities for doing backups below the volume manager layer.

When using a file system, backups can be done at the volume level, but there are additional backup options in the form of the cpio and tar commands, the dump (ufs_dump or vxfs_dump) commands, or vendor-specific backup tools. There are more tools available for managing file system backups than raw volume backups.

File Naming

The Quick I/O feature references a file using a special extension to the file name to designate it as a Quick I/O file. The extension adds the string “::cdev:vxfs:” to the end of the file name. A file referenced with this name extension will be opened as a raw (character) device. This name extension is most easily accomplished by using a symbolic link as the Oracle file name, with the symbolic link adding the name extension to the actual data file name.

This is a simple mechanism, but it introduces about the same amount of additional complexity and possibilities for confusion as the use of a raw volume. The VERITAS Database Edition for Oracle provides tools to change VxFS database files into Quick I/O files. One tool gets the list of database files from the database (“getdbfiles.sh”) and another tool converts the listed files into Quick I/O files by renaming each database file and inserting a symbolic link pointing to the new file name (“mkqio.sh”).


Best Practices

Conversion Techniques

In this section, we discuss in general terms some of the issues related to converting existing database files to use VERITAS Quick I/O features. We deliberately do not discuss detailed instructions or plans. Such material is best derived by the people responsible for any specific conversion effort in the safety and comfort of their own quality assurance or test facilities. We also do not discuss extensively the creation and use of VERITAS Quick I/O functionality for new databases, which is adequately covered in the VERITAS documentation set.

Current Limitations

The VERITAS Quick I/O functionality requires the base file system to be constructed with a VERITAS file system format. A VxFS file system is in the correct format, so the Quick I/O feature can be activated with relatively simple commands not requiring data movement. UNIX file systems and raw volumes are not in the required format. They require a “read out” of the data from the old structure and a “read in” of the data to the new structure.

We see this requirement for data movement as the largest issue affecting any conversion effort. For large, complex, or high-availability databases, this data movement will dramatically impact the viability of a conversion attempt.

Conversion Theory

To successfully convert to VERITAS Quick I/O, there are two requirements that must be satisfied:

• The underlying data file must be in VxFS format.

• The file must be accessed in the extended namespace format for the file device driver (fdd) to implement the Quick I/O behavior.

A pre-allocated VxFS file is in the proper format, whereas UNIX file systems and raw volumes are not. For these two storage methods, a data movement operation must take place.

In VERITAS terminology, the hierarchy of traditional directories, files, and links exists in the file system namespace; that is, they exist as directly accessible objects. Extended namespace is an area of the file system not directly accessible to users, but which can be accessed by applications. The VERITAS fdd driver that implements Quick I/O functionality recognizes one extended namespace: cdev. To activate Quick I/O, a pre-allocated VxFS file is referenced in an application by its extended namespace identifier: “file_name::cdev:vxfs”. Accesses to the file using this naming convention will be recognized by VxFS and passed to the fdd driver. The opened file will then be handled with Quick I/O functionality.

Typically, the extended namespace functionality is implemented by creating a hidden, pre-allocated file in a directory, for example “.cormac”. Then a symbolic link is created that references this hidden file in extended namespace format, for example “cormac ==> .cormac::cdev:vxfs:”. The symbolic link is then used in the Oracle database file name.
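Expressed as shell commands, the naming arrangement described above looks roughly like the following. The directory name is a hypothetical placeholder, the file name is the same illustrative one used in the text, and converting a live data file this way would of course be done only with the database shut down; the VERITAS-supplied scripts mentioned above are the supported route.

    # Sketch: turn an existing VxFS data file "cormac" into a Quick I/O file by
    # hiding the real file and pointing a symbolic link at it in extended
    # namespace (::cdev:vxfs:) format.
    cd /ora/data1                       # hypothetical data file directory
    mv cormac .cormac                   # the real, pre-allocated VxFS file becomes hidden
    ln -s .cormac::cdev:vxfs: cormac    # Oracle keeps using the name "cormac"

    # The link target is not a real file; the name is recognized by VxFS and
    # handled by the Quick I/O (fdd) driver when opened.
    ls -l cormac
    # cormac -> .cormac::cdev:vxfs: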

For Quick I/O to work, there can be no physical file on the system named using the extended namespace convention “file_name::cdev:vxfs:”. Since a special pseudo-device is created using this distinct naming convention when the file is opened, a real file with this naming convention will block Quick I/O functionality from being recognized by the VxFS driver.

VERITAS has provided several utilities to assist in either converting or creating Quick I/O file systems, which are fully described in the VERITAS documentation set. There is one caution the team discovered in working with the utilities: the utility that reads file names out of the database and creates scripts to be used in conversion gets its information from the v$datafile view; this approach leaves out the control files and the redo logs. Therefore, it is important not to forget to include these two groups of files in your conversion planning.

Possible Techniques

When considering all the various possibilities, the team examined two basic concepts: Conversion Using Database and Conversion Outside of Database. In the first, we examined possible conversion techniques that might be used with the database open and active. In the second, we looked at techniques with the database closed.

During our study, once we had our database running “reasonably” with the simulation, we made a copy of the entire database to alternate disk media using the dd command. We then copied this “base” image back to the new file system each time we converted, at the start of a new simulation series. We used the simplest possible approach, since we had the disk space available.
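For a raw volume configuration, this kind of base-image save and restore amounts to little more than dd between the volume devices and backup storage. The volume and backup path names below are hypothetical, and the block size is only a common choice, not the one used in the study.

    # Hypothetical sketch: save a base image of one data volume to alternate
    # disk media, then restore it before the next simulation run.
    dd if=/dev/vx/rdsk/datadg/data1 of=/backup/data1.img bs=1024k

    # ...later, with the database shut down, refresh the volume from the image:
    dd if=/backup/data1.img of=/dev/vx/rdsk/datadg/data1 bs=1024k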

Conversion Using Database

During this phase, the team explored ways to keep a database up and open for at least some applications or users while the conversion proceeded. We examined a number of possible schemes, for example doing “copy as select” operations, renaming data files, exp/imp, and data unload/reload.


It quickly became apparent that for a database with any degree of referential integrity constraints, the planning for these efforts becomes nightmarish. Consequently, the team does not generally recommend this approach, since the planning and testing alone may take longer than doing the conversion with other techniques, and the degree of risk exposure increases significantly.

The one technique using the database that we seriously considered was a technique for high-end databases with strict downtime restrictions running some type of disk mirroring. The base plan was as follows:

1) Make and verify alternate backups

2) Separate the mirrors

3) Run the primary side in archive log mode

4) Convert the secondary side database storage

5) Stop the database

6) Save the current redo and control files to alternate media

7) Copy them to the converted secondary mirror

8) Use the converted mirror as primary and apply the archive logs to “recover” the database to the current point in time

9) Perform another backup to alternate media

10) Re-silver the mirror with the converted side of the mirror as the primary

Unfortunately, we did not have time to test this plan. Therefore, it is important to test it thoroughly before implementation. There are risks due to the complexity, but for those databases that have strict downtime requirements, it may be viable. Outside of the complexity risks, the other important decision point is whether or not the time to roll forward through the archive logs for the database in question is in fact less than the time required to convert the data files using less exotic techniques. The saving grace of the plan is that the primary mirror is still intact and could be used to bring the database online quickly should the convert/restore operations disintegrate!

Conversion Outside of Database

In general, the team believes the conversion of data files using operating system utilities, after the applications have been stopped and the database closed, is probably the method of choice for most databases. The main constraints are the same for any large data movement operation: the type of alternate storage media used, the amount of alternate storage media available, and the speed of the alternate storage media.


Conversion techniques that use a copy of the disk farm for conversion will be the fastest, and they have the advantage of a complete copy of the database available as a backup. Conversions that do not have enough disk space to copy the entire structure will be restricted by how much disk-to-disk data movement can be done at once; in addition, if the conversion fails for some reason, it may take longer to restore from backup tapes. Conversions that must copy all the data to tape, then convert the files, and finally restore everything from tape will be strongly constrained by the efficiency of the particular tape storage system used. If such a conversion fails for some reason, the restore from backup tapes is similarly constrained.

Reader Caution

While the available time and sheer number of possible conversion techniques prevent a lot of detail in this section, the basic advice for a successful conversion has been said many times before in the realm of computers:

• Plan the conversion and fall-back approach carefully
• Test everything thoroughly
• Back up and verify everything
• Execute the conversion plan
• Execute the fall-back plan if necessary

Risk Assessment

Any decision to convert an existing database should include some measure of risk assessment. Factors such as time-to-recover requirements, the allowable window of downtime, and the cost to convert need to be weighed against the time required to complete the conversion, the possibility of failure, and the expected benefits. The recommendations below should be examined in the context of the risk assessment for a given conversion plan, not simply accepted as a “de facto standard” of any type.

Conversion Recommendations

When considering a conversion to Quick I/O from some other type of storage method, we see four general cases to cover: New Database, Existing VxFS Database, Existing UFS Database, and Existing Raw Volume Database. Each case is discussed as follows.

New Database

A VERITAS Quick I/O file system is highly recommended for the construction of new databases. However, for those applications where absolute performance is more important than ease of administration, some benchmark testing of the application on raw volumes and Quick I/O file systems should be used to help make the final decision.


Existing VxFS Database

Given the ease with which this conversion can be accomplished, this conversion is also highly recommended. The conversion proceeds quickly, since the only operations are file renames and link creation, thereby requiring little downtime. The Quick I/O performance is probably close enough to raw volumes to outweigh the extra data movement operations involved in converting VxFS to raw volumes.

Existing UFS Database

Since converting to any other storage method will require a data movement operation, the decision to use either Quick I/O or raw volumes will hinge on performance and administrative considerations. If administrative ease with good performance is the goal, then Quick I/O is the choice. Again, for those applications where absolute performance is more important than ease of administration, some benchmark testing of the application on raw volumes and Quick I/O file systems should be employed to help make the final decision.

Existing Raw Volume Database

In this case, one needs to examine a variety of factors, such as database complexity, size, downtime requirements, and training of staff. For those sites that can tolerate the downtime for data movement and would like the additional administrative flexibility of Quick I/O, conversion probably makes sense. For large mission-critical databases with strict downtime goals, trained staff, and stable procedures in place to work with raw volumes, the conversion effort may be too costly in terms of risk, downtime, and procedure rework.

Backup and Restore Techniques

In this section, we focus our discussion on base UNIX issues and commands. We deliberately do not discuss any third-party backup systems that may exist for dealing with database backups in some specialized manner.

It has long been accepted that only raw disk volumes can provide the best performance for an Oracle database, but at the cost of difficulties in manageability and ease of backup and recovery. We have seen that VERITAS Quick I/O for Oracle provides the benefit of using UNIX operating system commands, including cpio, tar, and file system dump/restore, for backup and recovery of a database; whereas only the dd command is available at the operating system level for backing up or restoring a database built on raw volumes.

The dd utility has the following disadvantages over the file system level backup and recovery utilities:

• Does not support logging the date and contents of backup information
• Does not provide selective file backup
• Does not provide a label mechanism
• Cannot copy a directory structure
• Does not support multi-volume copy operations

All of these are serious problems when managing a large database installation. In such an environment, the ability to verify what was backed up and when, to verify that physical media contains what it is thought to contain, and to back up arbitrarily large objects as a unit are all very important benefits.
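The practical difference is visible in the commands themselves. The sketch below contrasts a file-level backup of Quick I/O data files with the raw-volume equivalent; the mount point, disk group, volume, and tape device names are hypothetical.

    # File system (Quick I/O) database: ordinary file-level tools apply.
    cd /ora/data1
    tar cvf /dev/rmt/0 .                     # back up all data files in this directory to tape
    tar tvf /dev/rmt/0                       # list the tape contents to verify what was backed up

    # Raw volume database: only a whole-volume image copy is possible.
    dd if=/dev/vx/rdsk/datadg/data1 of=/dev/rmt/0 bs=1024k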


Concluding Remarks

By conducting performance experiments, manageability studies, and administrative studies, we tested the assertion that VERITAS Quick I/O performs comparably to raw volumes, while providing greater flexibility and ease of administration. We are impressed by the detailed level at which the performance equivalence holds, and pleased with the improved manageability evident when using the file system compared to the raw volume configuration.

We believe that any installation using file systems and experiencing performance problems should consider using Quick I/O rather than raw volumes. We also believe that those installing new databases should seriously consider using the VERITAS Database Edition for Oracle for the combination of performance and manageability it provides.


Acknowledgments

The authors would like to acknowledge the following people without whom, in various ways, this work would not have been possible. From VERITAS, we would like to thank: Helen Cha for her central role in establishing the framework for this joint study and her continuing support; Rei Lee for his time and effort in helping us set up the Quick I/O experiments and reviewing our findings; John Colgrove for his enthusiastic support for the study and help in analyzing database performance issues; and Alex Miroschnichenko, who provided critical management support.

From Oracle Large Systems Support, we would like to thank: Darryl Presley for his role in establishing the joint study and his contribution as a senior partner and sponsor of the study; Vasu Manepally for providing assistance in laying out and organizing the database for the experiment; Teri Bommarito for her editorial help; Bill Cowden for providing critical managerial support; and Kim McElroy for administrative help.


Appendix A: VERITAS Quick I/O Overview

Why Quick I/O?

VERITAS Quick I/O addresses the performance problems seen when using file system files for a database and the administrative issues seen when using raw volumes. Oracle I/O performance may suffer when using file systems due to any of the following:

• Asynchronous I/O is not supported for files.
• “Write” operations must be done synchronously (using O_DSYNC) to avoid UNIX file write-caching.
• File accesses require using a per-file read/write serialization lock.
• Double-caching of data in both application and system caches wastes memory.
• Double-copying of data wastes CPU time.

The last two of these problems, double-caching and double-copying, are eliminated by using Direct I/O, which moves data directly between the application buffer and the disk. These problems are all solved if the file access is treated as an access to a raw (character) volume, as is done in Quick I/O.

How Quick I/O Works

The Quick I/O product provides a character-mode device driver and a file system namespace mechanism. Files with names tagged for Quick I/O are recognized by the VERITAS file system (VxFS) and converted to raw-mode device opens. On the first access, a pseudo-device is created to represent the file. It is given the major number of the Quick I/O driver and a minor number identifying the specific file.

The Quick I/O driver and VxFS coordinate to ensure that the block map for the file is stable and available to the Quick I/O driver. The file system will not attempt to reference the file without coordination with the Quick I/O driver. These measures provide for efficient raw-mode file access, while providing correct file access semantics when the file is accessed with a read or write system call.

When using a file accessed via the Quick I/O interface, Oracle does not have to deal with the UNIX buffered write problem, and the file system cache is not used. There is no duplication of data, and data is moved directly between the application buffer and the disk without double-copying. In addition, Oracle can use asynchronous I/O to a Quick I/O file, which usually provides significant performance benefits. The database accesses the database files as if they were raw volumes, yet the database administrator manages them as if they were regular files.


For More Information

To receive more information regarding the VERITAS Quick I/O study, please contact:

Large Systems Support          Tel: (650) 506-2952
Oracle Corporation             Fax: (650) 506-7584
20 Davis Drive                 Email: [email protected]
Belmont, CA 94002              Web Site: www.oracle.com/support/lss/

Oracle Large Systems Support is the premium service of Oracle Support Services devoted to mitigating the risk in implementing unproven system configurations. The LSS initiative is a partnership between the service divisions at Oracle, enterprise systems platform vendors, and Oracle Business Alliance Partners. The LSS mission is to proactively ensure customers’ large enterprise systems are reliable and supportable.