
Technical white paper

HP Data Accelerator Solution for Oracle Database: A Recommended Architecture for OLTP with Business Continuity

Internal PCIe IO Accelerator flash storage with an option for high availability

Table of contents

Executive summary
Overview and benefits
Data Accelerator with business continuity
Components of the Reference Architecture
Components tested
HP PCIe IO Accelerator
IO Accelerator types: SLC and MLC
Single-server test results
Low-level I/O stress tests
Device loading levels and the so-called “write cliff” effect
OLTP workload tests
Evaluating the optimum placement of Oracle redo logs
Tuning by favoring the Oracle log writer process
Single-server results analysis: conclusions and comparisons
Discussion and analysis of findings
Comparing to “HP High-Performance Oracle Database Solution”
Data Accelerator Solution: some comparable performance data points
Hybrid storage: a “best of both” approach
Hybrid configuration 1: use flash for “hot” database files
Hybrid configuration 2: Oracle Database Smart Flash Cache
Continuity considerations: increasing availability
Supporting Oracle Real Application Clusters (RAC)
Data replication to a standby server using a disaster-tolerant approach
An HP Data Accelerator architecture with Oracle Data Guard
A brief introduction to Oracle Data Guard and its pre-requisites
Data Guard and application failover: not all apps are created equal
Tested approach to incorporating Data Guard with Data Accelerator
Results summary (Data Guard testing)
Conclusions from Data Guard results
Summary
Appendix 1: HP PCIe IO Accelerators
Appendix 2: Oracle Advanced Compression
For more information


Executive summary

The HP Data Accelerator Solution for Oracle Database delivers extreme performance for OLTP database workloads without an extreme price tag, without high operating costs, and without forever locking you, or your IT budget, into an all-Oracle server solution.

We start with an enterprise-class HP ProLiant server, the industry’s most popular and comprehensive x86 server offering, and we replace the traditional external mechanical storage with HP IO Accelerators, internal PCIe-based high-capacity non-volatile flash-memory modules, in order to offload the database’s most active data sets, or to house the entire database. Because IO Accelerator storage provides orders of magnitude lower latency than traditional storage, the result is a database server which can deliver more than four times the transaction throughput of a competitive Oracle Exadata at half the acquisition cost and one-quarter the operational cost.

Because the storage is internal to the server, the configuration is very compact – requiring 70% less data center space than comparable traditional server configurations – and is substantially simpler than a competitive Exadata with its complex web of interconnected components. Indeed, unlike Exadata, an inflexible product which locks customers into all-Oracle solutions, the HP Data Accelerator Solution is based on open standards: you can choose from multiple software versions, operating systems, and patch levels (or even choose a database other than Oracle!) to meet the needs of your applications and your budget. You can update, expand, or extend your HP server environment as your needs change; you have your choice of support and service offerings; and you can change your software stack or even re-purpose the server as needed, leveraging the HP solution over time. And this solution, with its capacity for multi-terabyte databases, was built from the ground up for OLTP workloads rather than being a re-tread of a data warehouse design.

And while their advanced error-correcting capabilities make HP IO Accelerator modules extremely reliable, this solution can also be combined with database-replication tools such as Oracle Data Guard to create a multi-server solution which guards against hardware or database failures by eliminating single points of failure.

This paper describes the basics of the Data Accelerator solution, including the HP PCIe IO Accelerator flash modules which are the keystone of the solution. We discuss the characterization studies we undertook to evaluate this solution, both in a standalone (single-server) configuration and in a redundant dual-server configuration which uses Oracle Data Guard to replicate the database across a dedicated high-speed 10GbE LAN connection. This paper is intended to be used with planned future Configuration Briefs that will document potential Data Accelerator configurations for workloads of various sizes.

Target audience: This HP white paper was written for IT professionals who use, program, manage, or administer Oracle databases (or who manage people who do so), and specifically for those who design, evaluate, or recommend new IT solutions.

This white paper describes testing performed in April-September 2011, with further testing performed February through April 2012.

Overview and benefits

The HP Data Accelerator architecture is based on HP IO Accelerator memory array modules: enterprise-class flash-based solid-state storage media which are deployed internally within HP ProLiant servers. HP IO Accelerators are available in three form factors (PCIe half-height, PCIe full-height, and a mezzanine card for blade servers) and two NAND cell types (single-level or multi-level cell; SLC or MLC). Storage capacities range from 160GB for a half-height SLC card to 1.28TB for a full-height MLC card; generation-2 IO Accelerator cards, released just as this document was going to press, offer nearly double these capacities. For more information about SLC and MLC variants of the HP IO Accelerator, see “IO Accelerator types: SLC and MLC”, below.

Potentially 13+TB of raw space can be configured using generation-1 IO Accelerator cards, which would accommodate a 6+TB uncompressed database when mirrored or a much larger database if Oracle Advanced Compression were used; even higher capacities will be possible using the newly introduced generation-2 IO Accelerator cards, which generally support twice the capacity of the earlier modules. (Note: all data reported in this paper is based on the use of generation-1 IO Accelerator cards.)

The HP Data Accelerator solution can deliver excellent OLTP performance above and beyond a traditionally designed server-with-external-storage combination: The HP IO Accelerator modules allow the database to be located in high-speed memory arrays rather than traditional fabric-based rotational storage media. The resulting substantial improvement in latency and seek times practically eliminates the bottleneck represented by the storage and retrieval of data to/from the database, allowing the full performance potential of the server to be realized.


Storing a database (or part of it) on solid-state memory-based devices instead of traditional mechanical media is not a new concept, but recent advances in size, reliability, and performance have made the concept much more viable and broadly applicable, and have led to a resurgence of interest in such solutions.

HP’s solution portfolio for Oracle databases includes several reference architectures based on the workload, availability, and features customers require. The HP Data Accelerator approach is excellent for transactional databases (up to about 6-7 TB uncompressed using generation-1 IO Accelerators alone) with high random I/O workloads, and where cost and manageability are top priorities. HP also offers the High Performance Database Solution, providing additional scalability and clustering options for performance-based OLTP workloads; the HP Scalable Warehouse Solution for business intelligence and analytic workloads exhibiting high sequential I/O activity; and the Unified Database Solution, which slashes database deployment time from weeks to hours and cuts IT costs by letting you move resources on the fly to meet changing business needs. Each of these solutions addresses a different combination of organizational needs, so customers can choose the one that provides the most value for the proposed implementation.

Note that Oracle Advanced Compression, a feature introduced with Oracle Database 11g, can be used to great advantage with the HP Data Accelerator Solution to reduce the on-disk footprint of an Oracle database by as much as 40-60%, without significant reduction in database performance (based on tests previously performed by HP). Oracle Advanced Compression can thus significantly increase the effective capacity of IO Accelerator storage for Oracle databases.

As with any reference architecture, the solution documented here represents one way – a recommended way, but not the only way – to architect a high-performance Oracle database server using an enterprise-class server equipped with IO Accelerator modules. The results discussed here reflect our evaluations of, and the best practices we learned from, the specific configuration(s) described below.

Data Accelerator with business continuity

The single-server Data Accelerator configuration tested for this paper (and described in the following sections) represents HP’s highest level of single-server reliability available in an x86 package, built on HP components and including the advanced error-correction capabilities of the HP PCIe IO Accelerator modules discussed below. However, any solution constructed from a single server – with the notable exception of a NonStop server – will necessarily include potential single points of failure.

Eliminating the potential single points of failure requires the construction of a redundant multi-server solution in which the data is available to the redundant server. Since the data storage in a Data Accelerator configuration is internal to the server, a redundant solution will necessarily involve:

A secondary (redundant) server with its own storage for a second copy of the database

A software tool which effectively synchronizes the primary and secondary copies of the database

High-performance, low-latency networking infrastructure connecting the two servers to optimize the efficiency of the synchronization process

While more than one database-synchronization software tool is available, HP chose to employ Oracle Data Guard in a demonstrator configuration that illustrates how to eliminate single points of failure with the HP Data Accelerator solution. Details about our Data Accelerator with Oracle Data Guard architecture, along with more information about high-availability solutions and considerations in general, can be found towards the end of this document in the section entitled “Continuity considerations: increasing availability”.

Components of the Reference Architecture

The major components which comprise the (single-server) HP Data Accelerator Solution are:

An HP ProLiant server

– With internal disks for OS, user files, and Oracle installation

HP PCIe IO Accelerator modules

– Internally installed in the server

– Used for storage of the database


Software

– Red Hat Enterprise Linux

– Oracle Database

– Oracle Data Guard (for higher-availability configurations)

We deliberately do not specify any particular version of the Oracle database – any version (indeed, any database software at all!) that is supported on x86 Linux would be acceptable in this configuration. This allows great flexibility that is not present in competing Oracle appliances, which require one specific database version with no possibility of diverging from it. The HP Data Accelerator Solution is thus much better suited for users who do not wish to be locked into a particular Oracle version, or for applications which are sensitive to the database version and/or require certain database patches to be installed (the Oracle appliances also do not allow customization, including database patches).

Figure 1. HP PCIe IO Accelerator modules

Components tested

In order to test the proposed single-system reference architecture, a configuration was assembled following the above blueprint. The particulars of the configuration which was used in the testing are as follows:

HP ProLiant DL980 server

– Eight Intel® “Westmere” E7-4870 processors (80 cores; 2.4GHz with 30MB cache)

o Hyper-Threading enabled (80 cores = 160 threads)

– Two 300GB 10K RPM SAS disks, structured as RAID 1+0

– 1 TB system RAM

HP PCIe IO Accelerator modules

– Four 160GB SLC-type modules (in half-height PCIe slots 12, 13, 15, 16)

– Eight 1.28TB MLC-type modules (in full-height PCIe slots 2, 3, 5, 6, 7, 8, 9, 11)

Software

– RHEL 5.6, kernel 2.6.18-238.el5

– Oracle Database 11gR2 (11.2.0.2)

o Oracle Automatic Storage Management (ASM)

The differences between “SLC” and “MLC” and the ramifications of each are explained below.

In a Data Accelerator environment, the system is configured as a self-contained Oracle database server, with the operating system and Oracle software installed conventionally on traditional internal rotational hard disks. The PCIe-based IO Accelerator modules, installed in the server’s PCIe card cage, act as primary storage for the database itself. (See figure 2)

Note that the IO Accelerator storage pool acts in the role of traditional external storage arrays – the IO Accelerator cards, which are presented to the OS as standard block devices, are used to store the database and nothing else. The other system components (internal disks and system memory) perform the same roles that they would in a traditional architecture – there are no radical departures from the familiar.

It should be pointed out that IO Accelerators, like other storage products, are named based on the decimal definition of gigabytes – 1,000,000,000 bytes – and terabytes – 1,000 gigabytes. If we consider these products’ capacities according to the customary binary definition of GB and TB (1024x1024x1024 bytes per GB and 1024GB per TB), then a “1.28TB” IO Accelerator contains approximately 1192GB of addressable space, and the eight 1.28TB MLC IO Accelerators configured in the system we tested therefore provided a total of 9536GB of MLC space. For comparison, the “300GB” internal disk we used for the operating system actually provides 279GB of addressable space. Throughout this document, while we use the nominal capacities to refer to the different “models” of IO Accelerator modules themselves, when we discuss actual storage capacities and the size of databases or files, we will always use the traditional 1024-based mathematics.
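For readers who want to check the arithmetic, the short calculation below (our illustration; the capacities are simply those quoted above) converts the nominal decimal capacities into the 1024-based figures used in this paper.

```python
# Convert nominal (decimal) capacities into 1024-based GB, reproducing the
# figures quoted above. Values are the cards and disk used in our test system.
GIB = 1024 ** 3                        # bytes per binary gigabyte

mlc_card_bytes = 1.28e12               # "1.28TB" MLC IO Accelerator (decimal bytes)
os_disk_bytes = 300e9                  # "300GB" internal SAS disk (decimal bytes)

per_card_gb = int(mlc_card_bytes / GIB)
print(per_card_gb, "GB addressable per MLC card")           # 1192
print(8 * per_card_gb, "GB across eight MLC cards")         # 9536
print(int(os_disk_bytes / GIB), "GB on the internal disk")  # 279
```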

Figure 2. Placement of software and data components on the server configured for the Database Accelerator Solution

HP PCIe IO Accelerator

The HP IO Accelerator for ProLiant servers, the heart of the Data Accelerator solution, is a direct-attach flash-memory (solid state) storage PCIe card solution for application performance enhancement. Based on NAND flash technology, these devices, placed in the PCIe slots of a ProLiant server, provide terabytes of internal non-volatile storage that can be employed by the OS in the same manner as a traditional disk storage device. Data stored on an IO Accelerator can be accessed significantly faster than the data on a traditional mechanical storage device, with its “move the arm to the right track and wait for the data to come under the head” delays, plus the presence of the IO Accelerator on the PCIe bus eliminates the latencies associated with traversing complex Fibre Channel storage networks to an external storage array. (See figure 3, which compares the IO Accelerator – “PCIe flash disks” – to other types of storage.)

Note that HP and other storage vendors have long offered a solid-state disk (SSD) option for large storage arrays: the SSD is supplied in a form factor that duplicates the electronics of the array’s rotational disk modules, allowing it to be plugged into the back end of the array. The array benefits from a pool of SSD storage which is faster than rotational disk storage, but the latencies associated with the SAN that connects the array to the server are not eliminated. The IO Accelerator differs from a SAN-based SSD in that it is installed directly inside the server on the PCIe bus, much closer to the processors and therefore much faster to access.

And, thanks to recent advances in technology that have brought costs down significantly, the IO Accelerator is relatively inexpensive to purchase!


Figure 3. A comparison of different storage technologies showing the new flash-based solid-state storage tier with approximate (ballpark) capacities and access times

New generation-2 IO Accelerator cards released just prior to the publication of this paper offer nearly double the capacity of the older generation-1 IO Accelerator cards discussed throughout this paper. Clearly, these new cards can significantly increase the capacity of any IO Accelerator-based configuration, but we have not tested or evaluated these cards for the HP Data Accelerator Solution and will thus not be discussing them further in this paper.

Generation-1 IO Accelerator cards are available in nominal sizes of 160-320 GB for low-profile IO Accelerator cards, and 320GB to 1.28TB for full-height cards. Here are some key specifications for the 1.28TB MLC card which was used in our testing:

150,000 (512-byte) random read/write I/O operations per second (75% read, 25% write), with latency reported at under 40 microseconds.

185,000 (512-byte) random read-only I/O operations per second

IO Accelerator types: SLC and MLC

NAND flash storage, the main component of the HP IO Accelerator, is available in two varieties, single-level cell (SLC) and multi-level cell (MLC). HP thus offers SLC-based and MLC-based IO Accelerator modules; each has its distinct advantages:

MLC modules allow for the highest storage density at the lowest cost.

SLC modules are lower density (less data stored per unit area) but consume less power, can operate at higher temperatures, exhibit faster write/erase speeds, and can endure a significantly greater number of write/erase cycles than MLC modules.

MLC memory is known to be subject to a higher rate of raw bit errors, but the HP PCIe IO Accelerator’s 39-bit error correction guards against such errors.

The different characteristics of MLC and SLC modules make MLC ideal for storing large quantities of data with moderate write/erase demands – such as the bulk of an OLTP database. SLC would be ideal for highly write-intensive data, such as the Oracle redo logs.

Our test server was equipped with eight 1.28-terabyte MLC modules – these are full-height PCIe cards – for a total of 10.24 terabytes of MLC storage. In addition, there were four 160-gigabyte half-height SLC modules, for a total of 640 gigabytes of SLC storage.


Figure 4. SLC and MLC flash memory and the Oracle database objects assigned to each in the test configuration

Single-server test results

As part of our efforts to validate the feasibility of the Data Accelerator solution, a variety of tests were conducted. Low-level I/O stress tests allowed us to better understand the optimum performance characteristics of the HP IO Accelerator modules, and a battery of OLTP-workload tests allowed us to conduct “what-if” analyses of various configurations and to evaluate the solution while running real-world OLTP transactions.

Low-level I/O stress tests

We performed low-level random I/O tests to determine the maximum supportable I/O rates for these devices in what we would expect to be a typical OLTP environment. The tests were done using a low-level I/O exerciser tool, with the 8k-byte block size and the 80-20 read-write ratio that’s typical of OLTP databases. Results, shown in figure 5, were about 90,000 I/O operations per second (IOPS) per card. (These tests were conducted on the eight 1.28TB MLC IO Accelerator cards which would be used to store our database – all but the redo logs – in the OLTP workload testing which was to follow.)

Figure 5. Raw IOPS testing – eight 1.28TB MLC IO Accelerators. (“0.5” refers to using only one of the two partitions which constitute a single IO Accelerator card; each IO Accelerator card is typically presented to the operating system as two devices, each of which comprises half the capacity of the IO Accelerator card.)
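We do not name the exerciser tool here, but the sketch below (our illustration, not the actual tool; the device path, single thread, and run length are simplifying assumptions) shows the general shape of such a test: random 8 KiB requests against a raw block device with an 80/20 read/write mix. A production-grade exerciser would use O_DIRECT with aligned buffers and many concurrent workers to reach the per-card rates shown in figure 5.

```python
# Minimal random-I/O exerciser sketch: 8 KiB requests, 80% reads / 20% writes.
# WARNING: this writes random data to the target device; use a scratch device only.
import os
import random
import time

DEVICE = "/dev/fioa1"     # hypothetical IO Accelerator block device
BLOCK = 8 * 1024          # 8 KiB, the block size used in the tests above
READ_FRACTION = 0.80      # 80/20 read/write mix
DURATION = 60             # seconds to run

def exercise(path):
    fd = os.open(path, os.O_RDWR)
    size = os.lseek(fd, 0, os.SEEK_END)      # device size in bytes
    blocks = size // BLOCK
    payload = os.urandom(BLOCK)              # one reusable 8 KiB write buffer
    ops = 0
    deadline = time.time() + DURATION
    while time.time() < deadline:
        offset = random.randrange(blocks) * BLOCK
        if random.random() < READ_FRACTION:
            os.pread(fd, BLOCK, offset)      # random 8 KiB read
        else:
            os.pwrite(fd, payload, offset)   # random 8 KiB write
        ops += 1
    os.close(fd)
    print(f"{ops / DURATION:,.0f} IOPS (single thread, buffered I/O)")

if __name__ == "__main__":
    exercise(DEVICE)
```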


Device loading levels and the so-called “write cliff” effect

Flash memory has a unique I/O characteristic: a memory location which is “used” (initialized with a stored value) cannot be re-written until it is first erased. To reduce the performance impact of this extra erase operation, a “cleanup” process operates in the background, performing physical erase operations on data that has been released at the OS level (when it is freed by deletion or if it is the “old” data replaced during an update). The goal is to maintain a large pool of pre-erased memory locations available to be written on demand.

Write-performance degradation will occur if there’s not enough clean memory to accommodate the write load; the likelihood of such degradation depends on whether the demand for new locations exceeds the supply. This in turn depends on the write rate, the speed of the cleanup process, and how much empty (unused or unformatted) space is available. In short, if the device is largely empty or unformatted (and thus there are plenty of clean locations available), the device should not be hindered in satisfying write requests; if it is full, then write-performance degradation may occur if the write rate is very high.

This characteristic of flash memory is sometimes referred to as “the write cliff”: when demand for fresh memory exceeds the rate at which memory is cleaned, write rates can fall off significantly.
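To make the mechanism concrete, here is a deliberately simplified toy model (our illustration, not a simulation of any actual device): writes are served from a pool of pre-erased blocks which a background cleaner replenishes at a fixed rate, and once the pool is exhausted the achievable write rate falls to the cleaning rate.

```python
# Toy model of the "write cliff": a pool of pre-erased blocks is consumed by
# writes and replenished by a background cleaner at a fixed rate. All numbers
# are illustrative, not measurements of any real device.
def achieved_write_rates(demand_iops, clean_iops, spare_blocks, seconds):
    """Return the write IOPS actually served in each second of the run."""
    pool = spare_blocks
    served_per_second = []
    for _ in range(seconds):
        served = min(demand_iops, pool + clean_iops)   # limited by clean blocks available
        pool = max(0, pool + clean_iops - served)      # pool drains when demand > cleaning
        served_per_second.append(served)
    return served_per_second

# Plenty of spare (pre-erased/unformatted) space: demand is met for the whole run.
print(achieved_write_rates(20000, 15000, spare_blocks=600000, seconds=5)[-1])  # 20000

# Little spare space: the pool empties and writes fall back to the cleaning rate.
print(achieved_write_rates(20000, 15000, spare_blocks=20000, seconds=5)[-1])   # 15000
```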

As part of our evaluation of the Data Accelerator solution concept, we undertook some elementary tests to confirm our understanding of the write-degradation issue, and judge its impact on performance. This evaluation consisted of a 30-minute run of a raw I/O exerciser issuing random writes against four 160GB HP IO Accelerator (SLC) modules. The test was conducted first on a device that was 62.5% formatted (the 160GB card was formatted to present only 100GB of available storage); the device was then fully formatted (to 160GB) and the test repeated. The results are shown in Figure 6.

Figure 6. How to avoid the “write cliff” by keeping the device partially empty/unformatted (or keeping write rates low). Data is the average number of I/O operations per second (IOPS) to each of the four IO Accelerator cards.

Both the fully and the partially formatted devices performed relatively well at the start of the test. As time went by, the full device began to experience write-performance degradation, whereas the partially-empty device had no trouble keeping up with demand throughout the test. (Note that while there is clear evidence of write-rate degradation, it can hardly be described as a “write cliff”. Even so, this is a term that is used commonly in the industry to describe this phenomenon.)

As we would learn later, the write rates generated by our OLTP workload would not be sufficiently high to trigger this “write cliff” effect even on a fully formatted HP IO Accelerator device, but we nevertheless undertook further tests to determine the optimal placement of Oracle’s redo logs, the most write-intensive component of a typical OLTP database. We discuss these tests in the next section.


It was our conclusion from the above tests that limiting the formatted size of the IO Accelerator devices is not necessary, because typical OLTP workloads would not be expected to generate write rates high enough to trigger the write-cliff effect. Only for special-case uses where sustained write I/O rates greater than 15,000 per second per device are expected would we advise reducing the formatted capacity of the device to avoid the write cliff.

OLTP workload tests

While the previous tests were simple raw I/O tests, the ones that follow were full-blown database tests involving a fully populated Oracle database. The workload and database were based on a popular industry-standard OLTP benchmark, but the tests were run with the intent of characterizing the type of performance a typical customer running a similar workload might achieve.

The database used in these tests was a 30,000-warehouse database, sized at 3TB total. The working-set footprint of the database was 100% (transactions during the test exercised data across the full 3TB data space).

Due to legal restrictions imposed by Oracle, we are not able to publish the actual results (transaction throughput), but we can discuss our experiences and the lessons learned from the testing. (We do share the results of other performance tests later in this paper.)

Evaluating the optimum placement of Oracle redo logs

In the first OLTP workload tests, we decided to evaluate whether we could improve performance by controlling the location of the Oracle redo logs. We compared three configurations:

No special location for the redo logs – same IO Accelerator cards, in the same partitions, as the data

Move the redo logs to separate partitions on the same cards (from the perspective of the OS and Oracle Automatic Storage Management, the redo logs would be assigned to separate devices and a separate ASM diskgroup, but these devices map back to the same physical cards as the partitions storing the data).

Move the redo logs to separate devices entirely. For these tests, we added four 160GB low-profile SLC cards to our test configuration, and moved the Oracle redo logs to these devices.

Figure 7. Finding the best placement for the Oracle redo logs (y-axis: relative throughput) – see the section following figure 8 for an explanation of “load generators”

Clearly, the best results were obtained when the redo logs were located on dedicated storage devices. This is no surprise, since separating the redo logs has been an accepted Oracle best practice for many years.
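As a concrete illustration of the third configuration, the sketch below (hypothetical group numbers, size, and connection details; the python-oracledb driver is used here simply as a convenient way to issue the SQL) adds new redo log groups in a dedicated “+REDO” ASM diskgroup built on the SLC devices. The original groups in the data diskgroup would then be dropped once they become inactive.

```python
# Hypothetical sketch: place Oracle redo logs in a dedicated ASM diskgroup
# ("+REDO", built on the SLC IO Accelerator devices), separate from the MLC
# diskgroup holding the data files.
import oracledb

conn = oracledb.connect(user="sys", password="change_me",
                        dsn="dbhost/ORCL", mode=oracledb.AUTH_MODE_SYSDBA)
cur = conn.cursor()

for group in (11, 12, 13, 14):
    # Each new group gets an Oracle-managed member file in the +REDO diskgroup.
    cur.execute(f"ALTER DATABASE ADD LOGFILE GROUP {group} ('+REDO') SIZE 1G")

# After switching logs so the old groups become inactive, drop them with:
#   ALTER DATABASE DROP LOGFILE GROUP <n>
conn.close()
```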


Tuning by favoring the Oracle log writer process

Once we determined the optimum placement for the Oracle redo logs (see above), we decided to test two commonly available and generally accepted tuning methods which have proven effective in past Oracle tests, individually and in combination. Both involve allocating more resources to the Oracle log writer (LGWR), a critical process which can sometimes act as a governor on the amount of work the system can do. The methods tested (a sketch of applying them follows the list) were:

a) Dedicate an entire processor core to the log writer process by pinning it to a single core and preventing other processes from running on this core

b) Raise the execution priority of the log writer process so it does not have to compete for CPU resources

c) Do both of these at the same time
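The sketch below (our illustration, not HP’s test harness; the SID, core number, and priority value are assumptions) shows one way methods (a) and (b) could be applied from the operating system on a current Linux/Python environment. Note that pinning by itself does not keep other processes off the chosen core or its Hyper-Thread sibling; achieving that isolation was the difficult part on RHEL 5.6, as discussed below.

```python
# Hypothetical sketch: pin the Oracle log writer (LGWR) to one core and raise
# its scheduling priority. Requires root (or CAP_SYS_NICE) for the priority change.
import os
import subprocess

ORACLE_SID = "ORCL"       # example instance name
DEDICATED_CORE = 7        # example core reserved for LGWR

def find_lgwr_pid(sid):
    """Locate the log writer's PID for the given instance via pgrep."""
    out = subprocess.run(["pgrep", "-f", f"ora_lgwr_{sid}"],
                         capture_output=True, text=True, check=True)
    return int(out.stdout.split()[0])

pid = find_lgwr_pid(ORACLE_SID)

# (a) Pin LGWR to one core. This does NOT evict other processes from that core
#     (or from its Hyper-Thread sibling); that requires cpusets/cgroups as well.
os.sched_setaffinity(pid, {DEDICATED_CORE})

# (b) Raise LGWR's scheduling priority (a lower nice value means higher priority).
os.setpriority(os.PRIO_PROCESS, pid, -10)
```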

The results of our tests appear below (figure 8).

Figure 8. OLTP workload tests, 3TB database located on 12 IO Accelerators (8 MLC for data; 4 SLC for redo); y-axis: relative throughput

The tests using the “c” tuning method (dedicate a CPU core to, and raise the priority of, the log writer process) produced the best throughput.

Note that we’ve labeled the “LGWR prioritized” test results as “recommended”. Because of functional limitations of the Linux version we tested, a great deal of manual effort was necessary to give the log-writer process exclusive access to a single processor core. We thus do not recommend that customers attempt the “core pinning” method if they are running Red Hat Enterprise Linux 5.6 (kernel 2.6.18). (Newer Linux kernels include a feature – “cgroups” – which we believe would give the LGWR process exclusive use of a processor core without a great deal of work on the part of a system administrator.)

So if the log-writer process had exclusive use of a single core (in the “core pinning” test), why did raising its priority as well (in the “Pinning + LGWR” test) result in greater throughput? Both methods attempted to do the same thing – give the log writer as many CPU cycles as it could consume – and if either method were fully successful, adding the other should not have raised the throughput (if the LGWR process truly had exclusive access to its assigned processor core, then its priority should not matter, since no other processes would affect its execution). The fact that the two methods had different effects on performance (and that the two of them combined for an even greater effect) suggests that neither method is quite watertight. This has to do with the effects of Hyper-Threading (in which each processor core appears to the OS as if it were two processors) and the resulting need for – and difficulty of – keeping work off the “other half” (companion Hyper-Thread) of the core dedicated to the log writer.


A note about the load generators: these processes, which could be loosely interpreted as “artificial users” in that they submit OLTP transactions which drive the database workload, would normally be run on one or more external servers, submitting their transactions through the network to the server under test. In our tests, however, the load generators were run on the server-under-test itself, which added CPU overhead that’s unrelated to the Oracle workload we were observing. While it’s possible that better transaction throughput could have been achieved without this extra load on the CPUs, it was an Oracle bottleneck and not a lack of CPU power that was the limiting factor in our tests.

Single-server results analysis: conclusions and comparisons

Our interpretation of the above results follows, along with additional information to help you better understand the HP Data Accelerator Solution.

Discussion and analysis of findings

Our test results allowed us to draw a number of conclusions about the HP Data Accelerator Solution for Oracle Database. First, we were very pleased with the sustained throughput results, which compare very favorably to similar tests we have conducted using traditional SAN-based mechanical storage arrays; the HP PCIe IO Accelerator storage solution was clearly responsible for a significant improvement in performance. Low-level I/O tests confirmed that the IO Accelerators have the raw-I/O muscle to accommodate OLTP-type workloads, including the ability to handle heavy write rates typical of Oracle redo log activity, without being affected by the “write cliff” issue that is anecdotally associated with flash memory storage.

These results, in conjunction with other tests of the Data Accelerator solution as well as of the database-on-flash concept itself (see “Data Accelerator Solution: some comparable performance data points“, below), give HP great confidence in the power of the HP Data Accelerator Solution.

Comparing to “HP High-Performance Oracle Database Solution”

HP also offers another Oracle Database solution using flash storage for the database: the HP High-Performance Oracle Database Solution (HPDBS). Based on externally connected HP VMA flash arrays, this solution provides up to 80TB of RAID-protected flash storage with performance characteristics very similar to those of the Data Accelerator Solution.

Comparing the two solutions, the Data Accelerator has the smaller physical footprint (being self-contained within a single ProLiant chassis, whereas the HPDBS involves one to eight 3U-sized external arrays in addition to the host server), uses less power, and requires less cooling. Both solutions are based on the latest HP ProLiant servers and therefore derive the full benefit of a powerful server largely freed from the chains of a classic disk bottleneck. The HPDBS of course has greater database capacity, but slightly greater latencies than the Data Accelerator solution (mainly because the VMA arrays are located farther from the processor than the IO Accelerators). Finally, the HPDBS’s external VMA arrays are potentially sharable among multiple servers, introducing the possibility of high-availability clustering (Oracle RAC is not supported at this time but may be supported in the future); for the Data Accelerator Solution, sharing of the internal IO Accelerators is not possible.

The HPDBS brings all the same advantages as the Data Accelerator Solution when compared to expensive competing solutions from Oracle (see next section).

Whether to choose the Data Accelerator Solution or the HPDBS for your Oracle database is first and foremost a matter of database size but may also involve logistical preferences (internal vs. external storage; physical space requirement; cooling and power issues).

Data Accelerator Solution: some comparable performance data points

Various tests conducted in HP’s labs and in competitive customer situations offer further confirmation of the excellent performance characteristics of the basic concept: an HP ProLiant server equipped with flash storage.

Although these tests do not involve the exact architecture of the Data Accelerator Solution, their excellent results validate the concept and lend great weight to HP’s faith in the capabilities of the Data Accelerator Solution.


A customer benchmark

HP has been involved in several customer benchmark exercises where a flash-equipped HP ProLiant server was put up against a more expensive competitive solution in a closely controlled test involving real customer workloads.

In one such benchmark, the HP Data Accelerator solution, configured with ten HP IO Accelerators (1.28TB each), achieved more than FOUR TIMES the transaction throughput of the competitive Exadata solution from Oracle. The HP Data Accelerator solution was half the price of the competition, and will cost 75% less to operate (that is, the Exadata’s operational cost would be four times that of the HP solution).

A laboratory performance evaluation

HP’s performance labs have undertaken an internal evaluation of the potential OLTP capacity of a ProLiant server equipped with flash storage using a popular industry-standard OLTP benchmark. The results of this test, which were extremely favorable compared to those of other solutions in the same class (especially in the price-performance category), give HP great confidence in the use of flash for database storage. The test was performed using a 29TB Oracle database, all of which was accommodated in flash storage.

Because Oracle prohibits third parties (including customers) from reporting the results of performance tests, we are not able to disclose our results, but we offer this (admittedly subjective) characterization as additional evidence that our flash-based configuration is capable of handling highly demanding OLTP workloads.

Hybrid storage: a “best of both” approach

A reference architecture depicts one possible method of configuring or architecting a solution; the architecture we’ve discussed and reviewed here is one way to configure an Oracle Database server based on the HP IO Accelerator modules, but other approaches are quite valid, and we will briefly discuss a few such alternatives here.

Hybrid configuration 1: use flash for “hot” database files

The Data Accelerator Solution approach could easily be modified to accommodate databases which would be too large to fit comfortably in a single server’s worth of IO Accelerator devices by taking a hybrid approach: IO Accelerator storage could be used to store only the database’s “hottest” objects (redo logs plus the most frequently accessed tablespaces), while the rest of the database is located on traditional external SAN-based mechanical storage arrays. While this approach would be less compact than an all-flash Data Accelerator configuration, and would require a bit more care in its initial layout, various conceptual tests we’ve run in the past suggest that a hybrid architecture such as this could provide substantial performance benefits compared to an all-mechanical storage configuration. Since some of the server’s I/O slots would have to be used for Fibre Channel connectivity, the flash capacity of the server would be somewhat less than an all-flash solution (since Fibre Channel adapters are low-profile cards, however, this would be less of a reduction on servers equipped with a pool of dedicated low-profile slots).

Figure 9 illustrates how such a hybrid approach might look.


Figure 9. A hybrid configuration using both HP IO Accelerator storage and external RAID array storage

Hybrid configuration 2: Oracle Database Smart Flash Cache

Another hybridized approach which would potentially deliver the value of flash performance in combination with the flexibility and large storage capacity of traditional external storage would be to use HP IO Accelerator flash memory configured with Oracle’s Database Smart Flash Cache. This Oracle database feature, which is supported only with Oracle Linux and not Red Hat, increases the effective size of the database buffer cache by using flash memory as a secondary cache: objects which would have been aged out of Oracle’s main memory-based buffer cache are instead migrated to the Smart Flash Cache, where they are available for read-only use. This greatly expands the number of objects stored in cache, which eliminates the need for read operations from the external physical disk whenever those objects must be accessed. (Updates to these objects, however, require that the data be returned to the main memory-resident buffer cache; even in this case, physical disk I/O is avoided since the object can be restored from the Smart Flash Cache.) In this scenario, the entire database (including redo logs) would be housed on traditional external storage; it would just be the Oracle Database Smart Flash Cache that is configured to use the HP PCIe IO Accelerator modules. For that reason, the degree of performance improvement from this approach is limited to the number of disk reads which can be eliminated thanks to the Smart Flash Cache – write performance would not benefit at all.
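Configuring Database Smart Flash Cache amounts to pointing two initialization parameters at the flash device. The sketch below (hypothetical device path, cache size, and connection details; python-oracledb is used simply to issue the SQL) shows the general idea.

```python
# Hypothetical sketch: point Oracle Database Smart Flash Cache at an IO
# Accelerator device. The change is written to the SPFILE and takes effect at
# the next instance restart.
import oracledb

conn = oracledb.connect(user="sys", password="change_me",
                        dsn="dbhost/ORCL", mode=oracledb.AUTH_MODE_SYSDBA)
cur = conn.cursor()

cur.execute("ALTER SYSTEM SET db_flash_cache_file = '/dev/fioa1' SCOPE=SPFILE")
cur.execute("ALTER SYSTEM SET db_flash_cache_size = 320G SCOPE=SPFILE")
conn.close()
```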

(Note that Oracle Database Smart Flash Cache is not the same as Exadata Smart Flash Cache; the former is a feature of Oracle Database 11gR2 that is limited to Oracle Linux and Solaris; the latter is a feature of Oracle Database 11gR2 that is limited to Exadata.)

Previous HP testing with Oracle Database Smart Flash Cache determined that the performance advantage of this approach is less than it would be if an equivalent amount of main memory were added to the server and the primary Oracle database buffer cache were increased to take advantage of that additional memory. In order for Smart Flash Cache to return worthwhile performance benefits, the Smart Flash Cache would have to be configured to be larger than the maximum main memory capacity for the server. (Our previous testing showed a performance penalty for configuring too much Smart Flash Cache, however.)

Continuity considerations: increasing availability

The HP Data Accelerator Solution is based on a single-instance Oracle database, which represents a potential single point of failure; without further infrastructure to provide continuity, the database would become unavailable in the event of a database or server fault until that fault is rectified. To address this, any traditional business-continuity configuration or method that is supported for Oracle databases on Linux servers is applicable to a configuration based on the HP Data Accelerator Solution, with the notable exception of Oracle Real Application Clusters.


Supporting Oracle Real Application Clusters (RAC)

Real Application Clusters (sometimes redundantly called “RAC clusters”) require shared storage between multiple servers: the database, stored on the shared storage, is thereby available for multiple servers, acting as nodes of an Oracle cluster, to access. Since HP IO Accelerator modules are internal to the server, they are not sharable; when any part of the database is stored on an HP IO Accelerator, it cannot be accessed by other servers, so a RAC cluster is not compatible with a Data Accelerator configuration which has the database located on IO Accelerator modules.

The only possible approach which would permit the construction of a RAC environment in combination with HP IO Accelerator cards would be to use the “hybrid configuration 2” that we just discussed: store the entire database on shared external storage, with the HP IO Accelerator storage used exclusively for Oracle Database Smart Flash Cache (which is indeed supported for use with RAC).

Otherwise, if RAC is absolutely required with a flash-storage database server, the HP High-Performance Database Solution is the better choice because its flash-based database storage is potentially sharable between servers.

Data replication to a standby server using a disaster-tolerant approach

Replication – the act of maintaining a physically separate copy of the active database on a separate set of storage devices – is generally considered a “disaster-tolerant” rather than a “high-availability” (HA) solution largely because it:

Is feasible over the large distances generally required for applications to survive all but global-scale disasters

Performs at a level generally lower than typical HA solutions, which can provide significantly better scalability characteristics. Database replication over long distances introduces latency-related performance issues which HA solutions do not have to deal with.

However, there is theoretically no reason that a solution designed to provide disaster tolerance could not be used between servers in the same data center to provide a higher level of availability. The close proximity of the secondary server, presumably, would result in various improvements (lower latency, faster failover) over a traditional disaster-tolerance-at-a-great-distance solution.

Oracle Data Guard, a replication-and-recovery technology that is generally used to provide rapid disaster recovery to distant data centers, is an example of a tool that can indeed be used within a local campus or even within a single data center without as much latency impact when distances are shortened. An Oracle Data Guard configuration is much simpler, much easier to manage, and much less costly than Oracle RAC, while providing excellent recovery-from-failure times.

The following section describes an architecture for a Data Accelerator solution which uses Oracle Data Guard to replicate the database to a secondary (standby) server co-located with the primary server, thereby eliminating single points of failure.

An HP Data Accelerator architecture with Oracle Data Guard

It’s one thing to state that data replication techniques could theoretically be used with the Data Accelerator Solution (see previous section), and entirely another thing to prove that such an approach actually works. Thus, HP’s Oracle lab set out to assemble a no-single-failure-point configuration using Oracle Data Guard to replicate an Oracle database from a Data Accelerator-equipped primary server to a similarly equipped standby. The purpose of this section is to state categorically that the approach is quite valid, as well as to discuss its performance implications.

But first, it would be advantageous to introduce a few basic concepts and requirements of Data Guard and some of its related terminology. Figure 10 depicts the major components of an Oracle instance as they relate to transaction logging, and will be useful in the discussion of Data Guard in the next section.


Figure 10. How Data Guard uses the transaction-logging architecture of an Oracle database to replicate transactions to a standby database. When transactions make changes to the database, the user processes write their changes to database buffers in system memory and also record those changes in log buffers. While the database writer (DBWR) processes write the changes from the database buffers out to the database files in the background, the log writer (LGWR) process immediately writes the log entries to a “redo log” which can be used to re-create the transaction in case of a failure. Because the redo logs are circular and must eventually be re-used, an archiver process (ARCH) copies the contents of the redo logs out to “archive logs” which provide a historical record of all database changes (and which can be used to resurrect a failed database in conjunction with database-file backups created at some point in the past). Oracle Data Guard 11gR2 copies the redo information directly from the log buffers across a network to a standby server, where it is applied to a standby copy of the database.

A brief introduction to Oracle Data Guard and its pre-requisites

In a nutshell, Data Guard exploits the transaction-logging architecture of an Oracle database by copying transaction log (“redo”) entries over a network to one or more standby servers and applying those records to a copy of the database on the remote server(s). This replication results in a duplicate of the primary database on the standby server(s), albeit one that may lag slightly behind the primary. Data Guard ensures that redo entries are applied to the standby in strict chronological sequence (necessary to ensure data integrity) even if the transmitted redo information arrives out of order. However, depending on the replication methodology chosen, some transactions may be lost if the primary server crashes. More specifically, Data Guard supports the following methodologies, known as “protection modes” (a brief example of selecting one follows the list):

Maximum performance (default): Performance of the primary database is minimally impacted, because all transactions behave the same as they would in a non-Data-Guard configuration (the transactions complete when all associated redo entries are written to the redo log on the primary). For replication purposes, redo information is asynchronously copied to the standby (remote) server. When a failure of the primary database/server occurs, there may be transactions which have completed on the primary but whose redo has not arrived at the standby server; these transactions will thus be lost when the standby database becomes the primary. This could be as few as the last (most recent) transaction or two, but the nature of networks is that packets may arrive out of order, so gaps may develop in the received redo entries. Because changes must be applied to the standby database in the exact order in which they happened on the primary, the sequential application of redo to the standby database will stop with the last complete record before any such gap. That last complete record will represent the last (chronologically newest) transaction that will be reproduced in the standby database when a failure occurs; the transactions in the “gap”, and all subsequent transactions, will be lost. (This gap problem would be expected to be far less significant for a dedicated, high-speed, low-latency local-area network than for a slower, potentially shared, higher-latency, longer-distance wide-area network, resulting in far fewer transactions lost when failure occurs, but no guarantees can be made in max-performance mode.)


Maximum availability: Transactions do not complete on the primary database until the corresponding redo information has been propagated to at least one standby database; redo information is transmitted synchronously. However, when a network issue causes the primary server to lose touch with the standby server(s), Data Guard starts behaving like “maximum performance” mode (transactions are allowed to complete even though redo information has not been propagated to the standby). This allows the primary database to remain available despite the outage (redo information is saved for asynchronous transport when network connectivity resumes). This mode guarantees zero data loss (the standby database is synchronized with the primary with no lag time) as long as synchronicity can be maintained with the standby, but results in slower transaction completion times. Availability of the primary database is not affected by a loss of network connectivity to the standby server(s), but transaction loss could occur if a failure were to happen while the standby is out of touch.

Maximum protection: This mode is similar to “maximum availability” in that redo information is transmitted synchronously and transactions on the primary do not complete until their redo information has been propagated to at least one standby database; however, when the standby server is not reachable, the primary database prevents the possibility of a data loss situation by becoming unavailable (it actually shuts down).
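For reference, the protection mode is set at the database level on the primary; the sketch below (illustrative only; python-oracledb is used simply to issue the SQL) switches a configuration to maximum availability. The same change can also be made through Data Guard Broker (DGMGRL).

```python
# Hypothetical sketch: raise the Data Guard protection mode on the primary.
# Prerequisite (not shown): a synchronous (SYNC) redo transport destination
# must already be configured before raising the mode above maximum performance.
import oracledb

conn = oracledb.connect(user="sys", password="change_me",
                        dsn="primary-host/ORCL", mode=oracledb.AUTH_MODE_SYSDBA)
cur = conn.cursor()

cur.execute("ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY")
conn.close()
```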

Data Guard offers two basic types of standby databases: physical and logical; the difference is due to the method in which the redo information is applied to the standby database. A physical standby is an exact physical duplicate of the primary database; at any point in time, the two are exactly identical save for the effects of whatever transactions have most recently occurred on the primary but have not yet been applied to the standby. Redo information is applied directly to the standby database – that is, the standard Oracle database recovery process (reading the redo records in sequential order, and changing the affected data to reflect the results of each represented transaction) is continuously used to update the standby. (Data Guard can also be used to create a “snapshot standby”, which is a physical standby in which the redo information is not applied until a later time, allowing test scenarios to be carried out on the snapshot copy. At any time, the snapshot standby can be restored to a full physical standby by backing out the test transactions and applying the accumulated redo.)

A logical standby differs from a physical standby in that redo information is not applied directly to the standby but is rather converted into equivalent SQL statements which are then executed on the standby database to re-create the original transactions. (In simpler terms, a logical standby is kept up to date by re-creating the transactions from the primary – “UPDATE Employees SET Address='3000 Hanover St', City='Palo Alto' WHERE Address='19420 Homestead Road' AND City=’Cupertino’” – whereas a physical standby simply changes the affected data directly: “change rows X,Y, and Z to ‘3000 Hanover Street Palo Alto..’”) The primary and logical standby databases are not physically identical but are functionally equivalent: the same query executed on both would yield the same results, although very possibly the pattern of I/O operations performed by Oracle to obtain those results would be different. A logical standby has some advantages – it is available for read/write access, whereas a physical standby can only be accessed for reading (see next section regarding Active Data Guard); it can be patched or upgraded and then become the primary while the “main site” (former primary) is similarly upgraded (rolling upgrade) – but the use of certain Oracle data types precludes the use of logical standby.

Data Guard licensing and add-ons, including “Active Data Guard”

Data Guard is part of, and included with, Oracle Database Enterprise Edition (it is not available with lesser editions of Oracle). An added-fee enhancement to Data Guard, called “Active Data Guard”, allows a physical standby database to be opened for read-only access at the same time as redo information is being applied (without Active Data Guard, the database can only be accessed while the application of redo is suspended). Logical standby databases, however, may be accessed (read/write) at any time without any extra license requirements. The Active Data Guard option is not necessary to make a Data Accelerator solution highly available, but it’s a useful option that allows the standby database to be put to work as a secondary query server, potentially offloading some transactions from the main database server.

Redo-Transport Compression is another added-fee option offered by Oracle that is often used with Data Guard, chiefly benefitting environments whose primary and standby servers are separated by great distances and relatively slow (high-latency or low-bandwidth) and/or expensive (high cost per byte) wide-area networks.

Data Guard: associated components and prerequisites

Certain Oracle products and/or sub-products and/or configuration modes are either required to be used or are recommended for use with Oracle Data Guard:

Data Guard Broker – Oracle’s management framework for Data Guard; permits creation, monitoring, and management of Data Guard environments through its GUI (which is incorporated into Oracle Enterprise Manager) or through its command-line interface, DGMGRL. This is highly recommended, though not required.


Enterprise Manager Grid Control – Provides access to the Data Guard management pages that are part of Data Guard Broker. Enterprise Manager is generally installed on a server that is reserved for management tools, separate from the server(s) used to run the database. Highly recommended (required if Data Guard Broker’s GUI is desirable).

Data Guard Observer (aka Fast-Start Failover Observer) – a process which runs on a separate server and which monitors the primary and standby environments. When the primary is unreachable for more than a certain (configurable) amount of time, Observer initiates a “Fast-Start Failover”: a completely automatic failover operation. Optional but highly recommended in conjunction with Data Guard Broker.

Oracle Restart – a new Oracle 11gR2 feature which enforces the proper launch order (in accordance with component dependencies) of the various Oracle components needed when a database service is started up. It can also be used to monitor critical components and restart them when failure occurs. Highly recommended; required in conjunction with DG Observer and especially when transparent application failover (see next section) is enabled.

Database archivelog mode – the standard Oracle mechanism for preserving redo information that forms the primary and most elementary method of restoring a database in the event of failure. All Oracle write operations (inserts, updates, and deletes) are logged in the Oracle redo logs, but by default, these logs are overwritten over time. In order to preserve the record of executed transactions so that the database can be fully re-created at any time, archivelog mode is enabled, which causes the redo information to be archived. Archivelog mode is quite commonly used for all but the most unimportant databases, and is a requirement for Oracle Data Guard (a brief sketch of enabling it, together with force logging, follows this list).

Database force-logging mode – Oracle requires that this database option be enabled whenever a standby database is in use. This option ensures that all writes to the primary database are logged through Oracle’s redo logging mechanism. Specifically, the FORCE LOGGING clause of ALTER DATABASE overrides any NOLOGGING options which may have been set on any individual database objects. The NOLOGGING option was provided to allow for certain operations – usually those which could be easily reproduced in the event of problems, notably bulk inserts – to be performed without the performance impact of generating redo data. FORCE LOGGING is required when a standby database is in use because it is only through the redo-log information that database changes to the primary database are propagated to the standby database(s).

Flashback Database and Fast Recovery Area – Oracle's Flashback Database option is used to quickly restore a database to its state at a previous (usually fairly recent) point in time; it is typically used to "erase" the effects of major manual errors (accidentally dropped tables, for example) or other issues that might impede a database's availability. It can be used with Data Guard to restore a failed primary database (after the former standby has switched to the primary role) and, with fairly minimal effort, enable that former primary to begin functioning as a standby database. As such, it serves a very convenient purpose for Data Guard environments and is highly recommended. (The Fast Recovery Area, FRA, is a storage area – known as the "Flash Recovery Area" prior to Oracle Database 11gR2 – that is reserved for the "flashback logs" and other components required to support the Flashback Database capability. An FRA must be configured into the database if Oracle Flashback Database is enabled.) A minimal SQL sketch showing how these database-level prerequisites might be enabled appears after this list.

Oracle Net – the same Oracle network stack which supports communication between client processes and the database server (through the use of the Oracle listener processes) is used in a Data Guard environment to effect communication between the primary and standby databases.
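The following is the SQL sketch referred to above, showing one way the database-level prerequisites (archivelog mode, force logging, a Fast Recovery Area, and Flashback Database) might be enabled. The disk group name and FRA size are hypothetical and should be adjusted to the environment.

    SQL> SHUTDOWN IMMEDIATE
    SQL> STARTUP MOUNT
    SQL> ALTER DATABASE ARCHIVELOG;
    SQL> ALTER DATABASE OPEN;
    SQL> ALTER DATABASE FORCE LOGGING;
    SQL> ALTER SYSTEM SET DB_RECOVERY_FILE_DEST_SIZE = 2000G SCOPE=BOTH;
    SQL> ALTER SYSTEM SET DB_RECOVERY_FILE_DEST = '+FRA' SCOPE=BOTH;
    SQL> ALTER DATABASE FLASHBACK ON;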

More information about Data Guard

For more information about Data Guard, see the Oracle Data Guard article on Wikipedia, or Oracle’s manual, “Oracle Data Guard Concepts and Administration”, specifically section 1, “Introduction to Oracle Data Guard”. (http://docs.oracle.com/cd/B28359_01/server.111/b28294/concepts.htm)

Data Guard and application failover: not all apps are created equal

Some applications are better suited to Data Guard than others: when a failure occurs and a standby takes over the role of the primary (production) database, some applications are particularly adept at reconfiguring themselves to the new location of the database with a minimum of manual intervention and a minimum of delay, while others may have to be manually reconfigured in order to connect to the new primary database.

Data Guard supports a number of features designed to allow applications to be simply or even automatically failed over to the new primary database after a failure occurs (for example, Fast Application Notification, Transparent Application Failover, Fast Connection Failover). Some application programmers have taken advantage of these features, and some have not.

While some Oracle applications – such as PeopleSoft and Siebel – are designed to take advantage of these special features and generally can be expected to fail over gracefully in the event of a database failover, other applications (notably E-Business Suite applications) may require some degree of manual intervention to ensure that they will function properly after a database failover. We highly recommend that you verify the suitability of your applications for Oracle Data Guard before committing to a Data Guard implementation.

Tested approach to incorporating Data Guard with Data Accelerator

In order to prove our contention that Data Guard could be used effectively with the Data Accelerator solution, we set up a co-located dual-server configuration, duplicated our database from one server (the "primary") to the other (the "standby"), and configured Data Guard to replicate all transactions from the primary to the standby. Both servers, of course, were equipped as Data Accelerator servers, with the database fully installed on IO Accelerator flash devices, as described earlier in this paper. In keeping with good HA practices, the two servers were given redundant power and network connections (along with other standard measures for avoiding single points of failure). Oracle ASM was used to manage database storage and was set to "normal redundancy" mode (data plus one redundant copy).

Since the goal was to optimize performance by minimizing latency across the network to the standby server, we set up a dedicated 10Gb Ethernet LAN in place of what would traditionally be a wide-area network connection between the two servers, and tested both failover capabilities (planned and unplanned) and the impact of Data Guard on the performance of the primary server under an OLTP workload. We evaluated only Data Guard's physical standby capability, but we did so using both the "maximum availability" (synchronous redo transport) and "maximum performance" (asynchronous redo transport) protection modes. We also tested the impact on the primary database's performance of accessing the standby database (including using "Active Data Guard" to allow read access to the standby database even while incoming redo data continued to be applied).
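When Data Guard Broker is in use, the protection mode and redo-transport mode are switched with a pair of Broker commands. A minimal sketch, using a hypothetical standby named "stbydb":

Maximum availability (synchronous transport):

    DGMGRL> EDIT DATABASE 'stbydb' SET PROPERTY LogXptMode='SYNC';
    DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;

Maximum performance (asynchronous transport):

    DGMGRL> EDIT DATABASE 'stbydb' SET PROPERTY LogXptMode='ASYNC';
    DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxPerformance;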

To comply with Data Guard requirements and recommendations, we additionally made the following changes to the environment that we had used for the single-server tests described earlier in this paper:

– Invoked archivelog mode and enabled "force logging" on the database.

– Set up a Fast Recovery Area and enabled Flashback Database capabilities on the database.

– Installed and configured Oracle Data Guard and set up a physical standby database on the second server (a Broker configuration sketch follows this list).

– Set up a dedicated management server and installed and configured Oracle Enterprise Manager (with Data Guard Broker) and Data Guard Observer. An OLTP workload-generation tool was also installed on this system.
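The Broker configuration sketch referred to above: once the physical standby has been instantiated and Oracle Net connectivity is in place, the Broker configuration can be created from DGMGRL roughly as follows. The configuration name, database names, and connect identifiers are hypothetical.

    $ dgmgrl sys/<password>@primdb
    DGMGRL> CREATE CONFIGURATION 'dac_dg' AS
              PRIMARY DATABASE IS 'primdb' CONNECT IDENTIFIER IS primdb;
    DGMGRL> ADD DATABASE 'stbydb' AS
              CONNECT IDENTIFIER IS stbydb MAINTAINED AS PHYSICAL;
    DGMGRL> ENABLE CONFIGURATION;
    DGMGRL> SHOW CONFIGURATION;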

The two database servers were each configured with twelve HP PCIe IO Accelerator modules – the same number as in the single-server tests – four 320GB SLC modules and eight 1.28TB MLC modules per server. While the total aggregate size of the flash storage pool was thus slightly greater than in the previous tests (whose four SLC modules were the 160GB models), the total capacity available for database storage was actually less than in the original single-server tests because:

– ASM "normal redundancy" (two copies of the data instead of one) was used.

– A Fast Recovery Area was instituted, which required a substantial allocation of space (we allocated 2.5TB to the "data" disk group and 2.0TB to the "FRA" disk group).

The maximum size of the database (data, indexes, temp space, undo, etc., but excluding redo) supportable by this test configuration was thus 2.5TB.
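A rough capacity accounting (per server, using decimal units) shows how the 2.5TB figure follows from the numbers above:

    Raw flash pool:         4 x 320GB (SLC) + 8 x 1,280GB (MLC)  =  ~11.5TB
    ASM normal redundancy:  ~11.5TB / 2                          =  ~5.75TB usable
    Allocated:              2.5TB ("data") + 2.0TB ("FRA")       =  4.5TB
    Net database capacity:  up to ~2.5TB (excluding redo), with some headroom remaining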

Each server’s built-in 1GbE NICs were configured to provide redundant connections to the public network, and an additional pair of 10GbE NICs was added to serve as the redundant high-speed network dedicated to Data Guard replication. Oracle Database 11gR2 was once again used as the database platform, and Red Hat Enterprise Linux (RHEL) was the operating system, though version 5.7 was used for these tests (compared to 5.6 for the single-server tests).

To summarize:

Two HP ProLiant DL980 servers, each with:

– Eight Intel "Westmere" E7-4870 processors (2.4GHz with 30MB cache; 80 cores total)

o Hyper-Threading enabled (80 cores = 160 threads)

– Two 300GB 10K RPM SAS disks, structured as RAID 1+0

– 1 TB system RAM

– Four embedded NICs (two of which were used for the public network) plus two PCIe 10GbE NICs for inter-server connectivity

– HP PCIe IO Accelerator modules: four 320GB SLC-type modules, eight 1.28TB MLC-type modules

– Software:

o RHEL 5.7, kernel 2.6.18-274.el5

o Oracle Database 11gR2 v.2

o Oracle Automatic Storage Management (ASM) in "normal redundancy" mode

o Oracle Data Guard

An HP P2000 was added for backup storage; it was connected to the standby database server.

One HP ProLiant server was used as a management server (Oracle Enterprise Manager, Data Guard Broker, Observer, and the OLTP workload generator).

The following diagram (Figure 11) depicts the configuration we used for these tests.

Figure 11. Configuration used for testing Data Accelerator-equipped servers with Oracle Data Guard. NICs 1 and 2 on each server were 1Gb Ethernet used for the public network; the 10GbE NICs (3-4) comprised the dedicated, redundant interconnect for Data Guard.


Results summary (Data Guard testing)

Our testing had two goals:

– To verify that Data Guard works properly with the database installed on IO Accelerator flash modules (that is, that Data Guard behaves just as it does when the database is installed on traditional rotational storage).

– To learn about the performance characteristics of a co-located pair of Data Guard-protected servers (that is, with the standby server in the same rack as the primary rather than halfway across the world, and connected by a high-speed dedicated network rather than a slower wide-area network) under a representative OLTP workload. Our hope was that the reduction in latency would be sufficient to allow synchronous transport mode to be used with minimal performance impact.

Our functional testing found that Data Guard worked flawlessly with the Data Accelerator configuration; however, the reduction in latency afforded by the standby server's proximity to the primary was not sufficient to overcome the performance penalty of synchronous redo propagation.

Functional testing

Once Data Guard and all of its prerequisite Oracle components were installed and configured, the database was copied to the standby server, and Data Guard replication was established, a series of functional tests was conducted, mainly to verify that switchover (a planned reversal of roles, in which the standby becomes the primary) and failover (an unplanned role change) functioned properly and in a timely manner.

Our functional tests measured switchover/failover times in the 19-25 second range (that is, the role change from standby to primary was complete in 25 seconds or less), and role changes completed successfully for switchovers, intentional failovers, and even a few unintentional failovers that occurred on their own during testing. When Data Guard Broker was used, switchovers and failovers completed automatically, with every step executed in the proper order.
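For reference, with the Broker in place a role change is a single command, and fast-start failover hands the decision to the Observer. A minimal sketch (the database name is hypothetical; a FAILOVER is issued from a session connected to the standby):

    DGMGRL> SWITCHOVER TO 'stbydb';
    DGMGRL> FAILOVER TO 'stbydb';
    DGMGRL> ENABLE FAST_START FAILOVER;
    DGMGRL> START OBSERVER;

In our configuration the Observer ran on the dedicated management server, so START OBSERVER was issued from a DGMGRL session on that host.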

A zero-transaction-loss solution with pretty good performance

Performance tests of Data Guard's "maximum availability" mode (synchronous redo transport) revealed that, for a typical OLTP workload with a substantial number of write/update transactions, there was a significant performance impact from requiring every write/update transaction to wait until its redo had been written to the standby system before the transaction could complete (see Figure 13).

So while synchronous transport guarantees that no transaction will be lost due to server failure (because no transaction is considered complete on the primary until its redo information is written to the standby server), the overall performance of any solution which includes synchronous Data Guard will be less than optimal.

An extreme-performance solution with pretty good transaction-loss protection

Performance tests of Data Guard’s “maximum performance” mode (asynchronous redo transport) established that the performance impact of asynchronous transport was very small (see Figure 12).

The tradeoff, as we explained earlier, is that asynchronous transport cannot guarantee that every transaction completed on the primary database will have been written to the standby database when a failure occurs. Transactions that changed the primary database in the last seconds before a failure might not have made it to the standby and would therefore be lost. While this situation would probably be unacceptable for many workloads, there are certainly quite a few applications in which the potential loss of a few transactions in the last seconds before a failure is considered an acceptable risk in light of the cost involved in protecting them. We discuss this further in the Conclusions from Data Guard results section, below.

Performance impact of accessing standby database (Active Data Guard) including performing backups

We also conducted tests to determine the performance impact (on the primary system's performance) of various activities on the standby server. Specifically, we found that there was only the slightest difference (less than 3%, which is practically "noise") among the following standby-side conditions:

– No redo transport (standby shut down)

– Standby in mounted mode

– Standby in open mode

– Redo transport active with no redo apply

– Redo transport active with redo apply

– Data Guard Broker not configured, configured but disabled, or enabled

– Redo transport completely interrupted by bringing down the dedicated network

– Database backup running on the standby

Figure 12. Performance of a Data Accelerator server with Data Guard protection to a standby server, comparing asynchronous to synchronous redo transport modes.

Conclusions from Data Guard results

Our experience with these tests, and their results, allowed us to conclude that Data Guard does indeed allow the construction of a highly available Data Accelerator configuration with no single point of failure – one that reliably protects the contents of an Oracle database against software or hardware failures that incapacitate the database on, or prevent client access to, the main database server.

The nature of Data Guard is such that total protection (zero transaction loss) is a trade-off against performance, however. When every single transaction – even those that might have occurred in the last seconds before a failure – must be guaranteed to be recorded in the database, synchronous redo transport is the only choice, and the application's performance will pay a price for it. How much of a price depends on the application: typical OLTP applications, with a liberal share of database-changing write/update transactions, will see a moderate performance penalty, while predominantly read-only workloads that are light on writes will be much less affected.

On the other hand, Data Guard with asynchronous redo transport provides protection for all but the last few transactions before a failure, and it does so without much effect on the performance of the application. Many workloads will be considered adequately protected even if those last few transactions before a failure are subject to loss, for a variety of possible reasons:

– The application itself provides some level of additional protection, or the nature of the write/update transactions is such that they are easily repeatable in the event of a failure (data loads from an external source that is retained or archived outside the database server, for example, such as a user uploading photos to a photo-sharing site, where users are expected or required to retain copies of their photos even after they are uploaded).

– The economic impact of individual transactions is minute. Examples of this type of situation might be a company's internal (non-customer-facing) applications such as asset administration, or a social-networking site (such as a photo-sharing or blogging site). In the latter case, it is the serving of ad impressions that drives revenue; the very rare loss of a few transactions immediately after they are executed is not only likely to be noticed right away by the user ("hey, that last blog entry didn't get posted") and therefore re-executed ("let me just try that again"), but no revenue would be lost because of it. Much the same argument applies to the former case: a company worker would be likely to notice the effects of a failover and to verify (and re-execute, if necessary) their most recent transactions.

– The cost of protecting the very small number of transactions that might occasionally be lost in a relatively rare failure is much higher than the value of the transactions potentially lost (that is, the cost of mitigating the risk is far greater than the risk itself).

Summary

The HP Data Accelerator Solution is a viable, highly performant concept that has been proven both in the lab and by real customers with real OLTP workloads. For databases which can be completely – or even partially – accommodated within a (redundantly mirrored) storage domain of up to about 6.5TB (or even larger when generation-2 IO Accelerator cards are employed), storing the database (or frequently accessed parts of it) on internal flash storage yields significant performance advantages over traditional SAN-based external mechanical storage devices, yet the resulting server architecture is elegantly simple, small in footprint, and very value-oriented. And, using a data-replication approach, it can be configured in a redundant mode that eliminates single points of failure, as demonstrated by our work with Oracle Data Guard.

While Oracle positions Exadata as the answer to every database problem, HP's Data Accelerator solution offers a very high-performance, very reasonably priced, and very flexible solution targeted at OLTP applications with multi-terabyte or smaller databases.

Appendix 1: HP PCIe IO Accelerators

The following table lists the HP PCIe IO Accelerator modules supported by HP as of May 2012, including the “generation 2” IO Accelerator products which were announced in mid-May of 2012.

The testing described in this paper was conducted before the release of the generation-2 modules; all capacities referenced throughout this document are based on the use of generation-1 modules. Clearly, the use of the new generation-2 IO Accelerator modules can greatly increase the overall storage capacity of a Data Accelerator server.

Nominal capacity*   Flash type   Part number   Generation   Card size   Min. driver version   Slot   Aux. power needed**
160 GB              SLC          600278-B21    1            half        2.x                   x4     N
320 GB              MLC          600279-B21    1            half        2.x                   x4     N
320 GB              SLC          600281-B21    1            full        2.x                   x4     N
365 GB              MLC          673642-B21    2            half        3.x                   x4     N
640 GB              MLC          600282-B21    1            full        2.x                   x4     N
785 GB              MLC          673644-B21    2            half        3.x                   x4     N
1205 GB             MLC          673646-B21    2            half        3.x                   x4     N
1280 GB             MLC          641027-B21    1            full        2.x                   x4     N
2410 GB             MLC          673648-B21    2            full        3.x                   x8     Y

* 1 GB = 10^9 bytes

** Auxiliary power is drawn either from the card slot (by enabling the driver option) or from an extra power cable internal to the server


Appendix 2: Oracle Advanced Compression

Many customers are looking for solutions that provide a means for reducing the size of their rapidly growing databases without negatively affecting their end user performance. Oracle 11gR2 offers integrated database compression to address this requirement.

We often think of compression as a trade-off between performance and storage: compression reduces the amount of storage required, but the overhead of compressing and decompressing makes things slower. However, while there is always some CPU overhead involved in compression, the effect on table-scan I/O can be favorable: if a table is reduced in size, it requires fewer I/O operations to read.

Prior to Oracle Database 11g, table compression could only be achieved when the table was created, rebuilt or when using direct load operations. However, in 11gR2, the Advanced Compression option allows data to be compressed when manipulated by standard DML (Data Manipulation Language). The data compression feature in Oracle 11gR2 Enterprise Edition reduces the size of tables and indexes while providing full row-level locking for updates. There are two types of compression.

1. Row compression enables storing fixed-length data types in a variable-length storage format.

2. Page compression is a superset of row compression. It minimizes the storage of redundant data on the page by storing commonly-occurring byte patterns on the page once, and then referencing these values for respective columns.

Oracle’s Advanced Compression offers three distinct levels: low, medium, and high. HP and Oracle recommend using the “low” method for best overall OLTP workload performance when data compression is desired. Oracle has provided a compression algorithm specifically designed to work with OLTP type workloads. This recommendation is based upon tests performed by HP and Oracle on industry-standard x86 hardware (see the reference at the end of this document, “For more information”). Users may wish to evaluate other compression options to determine if the “medium” or “high” setting offers superior performance for their specific workload.

As one would expect, Oracle Advanced Data Compression was very effective at reducing disk utilization of traditional storage arrays. The result was improved data transfer from storage into the database instance for processing and reduced I/O wait overhead. Testing conducted by HP’s Oracle Alliances team showed that Advanced Data Compression scaled linearly across the full range of CPU cores on HP 8-socket servers. All indications are that data compression will have the same positive effect on the performance of OLTP solutions implemented using the Data Accelerator approach, while at the same time expanding the effective capacity of the IO Accelerator storage.

Information about the cost of Oracle Advanced Compression can be found at Oracle’s online store at oracle.com.


For more information

HP solutions for Oracle: hp.com/go/oracle

HP PCIe IO Accelerator web page: hp.com/go/ioaccelerator

HP High Performance Database Solution white paper: “HP recommended configurations for online transaction processing: ProLiant DL980 G7, VMA-series Memory Array (VMA) and Oracle 11gR2 database”: http://h20195.www2.hp.com/V2/GetDocument.aspx?docname=4AA3-2367ENW

HP Scalable Warehouse Solution white paper: “HP reference configuration for Scalable Warehouse Solution for Oracle: HP DL980 G7 and P2000 G3 MSA”: http://h20195.www2.hp.com/V2/GetDocument.aspx?docname=4AA3-8244ENW

Oracle Advanced Compression HP white paper: “Oracle Database Compression with HP DL785 and EVA: a scalability study”: http://h20195.www2.hp.com/V2/GetDocument.aspx?docname=4AA1-0234ENW

Oracle’s Data Guard documentation, “Oracle Data Guard Concepts and Administration”, specifically section 1, “Introduction to Oracle Data Guard”: http://docs.oracle.com/cd/E11882_01/server.112/e25608/concepts.htm

Oracle’s “Client Failover Best Practices for Highly Available Oracle Databases” describes failover technologies (FAN, TAF, and FCF) available to applications which connect to Data Guard-protected databases: oracle.com/technetwork/database/features/availability/maa-wp-11gr2-client-failover-173305.pdf

To help us improve our documents, please provide feedback at hp.com/solutions/feedback.

Get connected hp.com/go/getconnected

Current HP driver, support, and security alerts delivered directly to your desktop

© Copyright 2011, 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

Oracle is a registered trademark of Oracle and/or its affiliates. Intel is a trademark of Intel Corporation in the U.S. and other countries.

4AA3-7748ENW, Created October 2011, Updated August 2012, Rev. 3