Consolidating OLTP Workloads on Dell™ PowerEdge™ 11G Servers
A Dell Technical White Paper
Database Solutions Engineering By Zafar Mahmood
Dell Product Group
July 2009
THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND.
© 2009 Dell Inc. All rights reserved. Reproduction in any manner whatsoever without the express written permission of Dell, Inc. is strictly forbidden. For more information, contact Dell.
Dell and the DELL logo are trademarks of Dell Inc. Intel and Core i7 are registered trademarks of Intel Corporation in the U.S. and other countries. EMC is a registered trademark of EMC Corporation. Oracle is a registered trademark of Oracle Corporation. Quest Software and Benchmark Factory are registered trademarks of Quest Software, Inc. Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell disclaims proprietary interest in the marks and names of others.
EXECUTIVE SUMMARY
The Dell™ enterprise portfolio is evolving to incorporate better-performing, more energy-efficient, and more highly available products. With the introduction of Dell's latest server product line, customers have an opportunity to improve their total cost of ownership by consolidating distributed legacy environments. This is the second white paper discussing server consolidation on the Dell 11G product line. The previous white paper discussed the consolidation of DSS workloads on Dell PowerEdge™ 11G servers: http://www.dell.com/downloads/global/solutions/database_11g_consolidate.pdf?c=ec&l=en&s=gen
This white paper focuses on Online Transaction Processing (OLTP) workloads and consolidation.
Dell strives to simplify IT infrastructure by consolidating legacy production environments to reduce data center complexity while still meeting customers' needs. The tools and procedures described in this white paper can help administrators test, compare, validate, and implement the latest hardware and database solution bundles. Dell established these procedures and guidelines based on lab experiments and database workload simulations performed by the Dell Database Solutions Engineering team. Using the tools and procedures described in this document, customers can not only select the appropriate database solution hardware and software stack, but also tune the solution to help reduce total cost of ownership for the database workloads they choose to run. The intended audience of this white paper includes database administrators, IT managers, and system consultants.
Table of Contents
EXECUTIVE SUMMARY
TABLE OF CONTENTS
INTRODUCTION
TEST METHODOLOGY
TEST CONFIGURATION
RESULTS
    Consolidation Factor
SUMMARY
REFERENCES
INTRODUCTION
This white paper concentrates on server consolidation for Oracle databases running OLTP workloads on legacy platforms. An enterprise database system may be running a DSS, OLTP, or mixed workload. OLTP workloads typically send thousands of small I/O requests from the database servers to the backend storage subsystem. The large number of I/O requests characteristic of OLTP workloads means that the backend storage subsystem must have a sufficient number of disks to handle the requests coming from the hosts. A typical 15K RPM disk can service around 180-200 I/O requests per second (IOPS).
OLTP database systems typically service hundreds or thousands of concurrent users. An example of this type of system is a travel reservation system with a large number of customers and agents performing online travel reservations or checking available flights and flight schedules. The OLTP database transactions performed by these thousands of concurrent users translate into tens of thousands of I/O requests to the backend storage subsystem, depending on the nature of the transactions. For example, an Oracle AWR report reveals that a typical TPCC transaction results in approximately 70 physical database I/O requests if the database size is around 300 GB and less than 1% of the data is in the Oracle System Global Area (SGA) cache. The database host CPUs can only be efficiently utilized if the backend storage subsystem is configured with a sufficient number of disks to handle this volume of I/O requests; otherwise, the database host CPUs exhibit large IOWAIT times instead of doing useful work. In that scenario, consolidating, upgrading, or migrating to a faster database server, or scaling the number of CPUs or the memory, does not help. The correct approach is to first scale the backend disk subsystem appropriately to handle the I/O requests, and then move to the next stage of CPU and memory sizing, as we discuss later in this white paper.
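As a rough illustration of this sizing arithmetic, the back-of-the-envelope calculation below uses the figures quoted above (about 70 physical I/Os per transaction, and 180-200 IOPS per 15K RPM disk); the target transaction rate is a hypothetical input, and RAID write penalties are deliberately ignored:

```python
import math

# Figures from the discussion above: ~70 physical I/Os per TPCC-style
# transaction (300 GB database, <1% cached in the SGA), and ~180-200
# IOPS per 15K RPM disk. RAID 10 write penalties are ignored here.
IOS_PER_TRANSACTION = 70
IOPS_PER_15K_DISK = 180        # conservative end of the 180-200 range

def required_spindles(transactions_per_second: float) -> int:
    """Minimum number of 15K RPM disks needed to absorb the I/O demand."""
    total_iops = transactions_per_second * IOS_PER_TRANSACTION
    return math.ceil(total_iops / IOPS_PER_15K_DISK)

# A hypothetical 50 TPS workload generates 3500 IOPS and needs
# at least 20 spindles before any RAID overhead is considered.
print(required_spindles(50))
```

This is only a first-order estimate; writes against a RAID 10 group cost two backend I/Os each, so a production sizing exercise should weight the read/write mix accordingly.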
Server consolidation can be defined as maximizing the efficiency of computer server resources, thereby minimizing the associated power/cooling, rack footprint and licensing costs. It essentially solves a fundamental problem—called server sprawl—in which multiple, under‐utilized servers take up more space and consume more power resources than the workload requirement indicates.
Consider a two-node Oracle RAC database hosted on two eighth-generation (8G) PowerEdge 2850 dual-socket, single-core or dual-core servers running Oracle 10g Release 2. Dell recently announced the availability of its 11G server product line, equipped with a chipset designed to support Intel® Xeon® 5500 series processors, QuickPath Interconnect, DDR3 memory technology, and PCI Express Generation 2. The natural replacement for eighth-generation 2U Dell servers is the 2U Dell PowerEdge R710, which supports dual-socket, quad-core processors. The R710 also supports two different types of energy-efficient CPUs, and it is designed with a highly efficient overall architecture.
The goal of this study is to determine if a multi‐node Oracle RAC cluster can be replaced with a cluster consisting of fewer PowerEdge 11G nodes, and still process the OLTP workload faster with less power consumption and lower Oracle RAC licensing fees. The savings in RAC licensing fees may be utilized to efficiently configure and scale the backend storage system with enough I/O modules and disks to remove the I/O bottlenecks that are almost always an issue in an OLTP environment. Also, based on the results of this study, one may determine how many distributed standalone legacy environments running OLTP workloads can be consolidated on a single Oracle RAC solution running on Dell R710 servers.
Figure 1: System Architecture
TEST METHODOLOGY
Dell's solution engineers used the Quest Software® Benchmark Factory® TPCC workload to test the legacy system, and then reran a similar workload on a test environment running the PowerEdge 11G servers. The TPCC workload provided by the Benchmark Factory schema simulates an order entry system consisting of multiple warehouses, with tables populated according to the scale factor defined during table creation. The legacy database was configured with a scale factor of 3000, which created 900 million and 300 million rows in the New Order and Stock tables, respectively. The total database size that resulted from this scale factor was around 290 GB. Once the database was populated, we started with 200 concurrent users randomly running transactions against the legacy database and increased the user load to 1000 in increments of 200 users, while making sure that the average query response time always stayed below 2 seconds. The average query response time of an OLTP database environment is the average time it takes for an OLTP transaction to complete and deliver its results to the end user who initiated it; this metric was chosen as the basis for the Service Level Agreement (SLA) we maintained throughout our testing. The average query response time is the most important factor when it comes to fulfilling end-user requirements, and it establishes the performance criteria for an OLTP database. The backend storage subsystem, a Dell/EMC® CX4-960 storage array, was configured with 10 15K RPM 146 GB disks in a RAID 10 configuration.
The test methodology used is as follows:
1. To simulate the legacy production environment, we selected a two-node Oracle 10g R2 RAC cluster running on two PowerEdge 2850 single-core, dual-socket 3.4 GHz machines connected to a CX4-960 that had a 400 GB LUN for the DATA and a 100 GB LUN for the database SYSTEM ASM disk groups. We also created a 2 GB LUN with partitions to host the voting disk and the Oracle Cluster Registry (OCR).
2. We applied the Oracle 10g R2 patch set 4 (10.2.0.4) to the legacy server simulated production environment.
3. We loaded TPCC schema test data with a scale factor of 3000 into the legacy server simulated production environment.
4. After data population, we used the Oracle Data Pump to export data at the schema level to avoid a data reload after each test iteration.
expdp system/oracle@racdb1 SCHEMAS=quest CONTENT=all directory=export;
5. We ran our first test iteration on the legacy RAC environment, starting with a user load of 200. The user load was increased in 200-user increments while we constantly monitored the average query response time. Once the average query response time rose above 2 seconds, the test was stopped.
6. In an OLTP environment, once the back-end spindles are saturated beyond 200 IOPS, they start exhibiting large I/O latency, which results in large IOWAIT at the host CPUs and a large average query response time. Once our legacy environment reached an average query response time of more than 2 seconds, we decided to double the number of spindles for our DATA ASM disk group, rebalance the TPCC data across the additional disks, and perform another test iteration to see if we could lower the average query response time below 2 seconds with a user load higher than 1000.
7. We took an Automatic Workload Repository (AWR) snapshot of database activity in the legacy production environment while running the peak user load for later analysis.
8. Using Quest Benchmark Factory, we populated the Oracle 11g single-node environment with the same TPCC scale factor and back-end disk configuration as the legacy environment. We used a PowerEdge R710 server to simulate our test environment.
9. The same user load was then run on the test environment to determine the transactions per second and the average query response time of the 11G test environment, and the average response time was compared against the legacy production environment. Again, with the base configuration consisting of 10 disks for the DATA disk group, the average query response time crossed our predefined SLA of 2 seconds at a 1000 user load.
10. Similar to step 6, the back-end disks of the 11G test environment were doubled and incorporated into the existing DATA ASM disk group. Another test iteration was performed to determine whether we could support a higher user load while keeping the average query response time below 2 seconds.
11. We decided that if the memory or the data disks of the 11G test environment became a bottleneck, they would be scaled further and additional user load would be applied until the 11G server CPUs became the bottleneck. For this purpose, the 11G test environment was tested with additional memory configurations of 18 GB and 36 GB to support the additional user load while staying below the 2-second response time. Similarly, the back-end spindles were again scaled with an additional 10 spindles, for a total of 30 data disks, to bring the response time back down to 2 seconds or less whenever the disks became the bottleneck.
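The eleven steps above amount to a simple control loop: ramp the user load in 200-user increments and, whenever the 2-second SLA is breached, scale whichever resource is the bottleneck until the CPUs themselves saturate. A sketch of that decision logic, with hypothetical callables standing in for Benchmark Factory runs and OS-level monitoring (sar/iostat), might look like this:

```python
SLA_SECONDS = 2.0
USER_STEP = 200

def find_sustainable_load(run_workload, cpu_idle_pct, iowait_pct,
                          user_time_pct, add_disks, add_memory,
                          max_users=3000):
    """Highest user load that meets the SLA before the CPUs saturate.

    run_workload(users) returns the measured AQRT in seconds; the other
    callables probe and scale the environment (all hypothetical stand-ins
    for benchmark runs and sar/iostat monitoring).
    """
    users = USER_STEP
    while users <= max_users:
        if run_workload(users) <= SLA_SECONDS:
            users += USER_STEP               # SLA held: ramp the load
        elif cpu_idle_pct() < 5:
            return users - USER_STEP         # CPU-bound: stop scaling
        elif iowait_pct() > user_time_pct():
            add_disks(10)                    # I/O-bound: grow the DATA group
        else:
            add_memory()                     # memory-bound: next DIMM config
    return max_users
```

The same loop covers both environments: the legacy RAC exits early because adding disks no longer lowers the AQRT, while the 11G server keeps absorbing load until its single socket becomes the limit.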
Figures 2 and 3 below show a comparison in terms of transactions per second and average query response time between the legacy production and the 11G test environments using the base configuration of a 10-disk RAID 10 ASM disk group.
Figure 2: Base configuration TPS comparison between legacy and 11G environment
Figure 3: Base configuration Average Query Response Time comparison between legacy and 11G environment
In Figures 2 and 3, we see that both the legacy and the 11G environments exhibit similar performance in terms of transactions per second and average query response time. Do not be misled by these results. Further analysis of the CPU utilization in terms of USER and IOWAIT times revealed that the legacy production environment was exhibiting a higher USER-to-IOWAIT time ratio than the 11G test environment, as shown in Figures 4 and 5.
Figure 4: Base configuration CPU behavior for the legacy environment
Figure 5: Base configuration CPU behavior for the 11G test environment
These charts reveal something very interesting: the 11G test environment, with its faster CPU and more efficient overall design, was able to process the OLTP workload much faster than the legacy production environment, and exhibited a low USER-to-IOWAIT time ratio compared to the legacy production environment (1.7 for legacy vs. 0.24 for 11G at a 1000 user load). Since both environments had an identical storage configuration, the reason for the higher IOWAIT and lower USER CPU time on the 11G test environment was the faster processing power available in that environment. Overall, Figures 4 and 5 reveal that in order to take advantage of
the faster processing power of the 11G test environment, we need to remove the I/O bottleneck and reduce the IOWAIT time.
This revelation led to further tests and analysis, and we decided to verify our conclusions by alleviating some of the I/O bottlenecks in both our legacy production and 11G test environments by doubling the spindle count for our DATA disk group. This methodology for ascertaining the performance delta between two environments running OLTP workloads can provide reliable results without the large storage investment that would otherwise be required to remove I/O bottlenecks before studying database host performance. Figures 6 and 7 below show the test results on both environments after doubling the spindle count.
Figure 6: 20 disk configuration TPS comparison between legacy and 11G environment
Figure 7: 20 disk configuration Average Query Response Time comparison between legacy and 11G environment
As shown in the previous figures, the legacy production environment showed only marginal improvement in its average query response time even after doubling the spindle count; the performance delta was only (2.11-1.83)/1.83 = 15.3%. The 11G test environment, on the other hand, exhibited a (2.059-0.486)/0.486 = 323% performance delta. Also note that even after doubling the spindle count, we could not push the legacy environment beyond 1000 users without violating our SLA of 2 seconds AQRT (Average Query Response Time), while at the 1000 user load the 11G test environment exhibited only 0.486 seconds of AQRT. Evidently, the legacy environment cannot be scaled any further without additional processing power, which is only possible by adding RAC nodes to the cluster. Although adding RAC nodes to the legacy environment might fix the SLA violation, the cost implications would be enormous in terms of additional systems, SAN components, power, and RAC licenses.
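All of the deltas quoted in this paper follow the same (baseline − improved) / improved formula; a one-line helper reproduces them from the AQRT values above:

```python
def improvement_pct(baseline: float, improved: float) -> float:
    """Relative improvement of 'improved' over 'baseline', in percent."""
    return (baseline - improved) / improved * 100

# AQRT at a 1000 user load after doubling the DATA spindles (values
# from the text): the legacy RAC barely moves, the R710 leaps ahead.
print(f"legacy: {improvement_pct(2.11, 1.83):.1f}%")    # ~15.3%
print(f"R710:   {improvement_pct(2.059, 0.486):.1f}%")  # ~323.7%
```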
On the other hand, Figures 6 and 7 reveal that the 11G test environment exhibited only 0.486 seconds of average query response time, so it was only logical to increase the user load and determine how many additional users could be sustained on our single-node 11G test server, populated with a single quad-core processor, without violating our SLA of a 2-second query response time. So far, our 11G server had been configured with 12 GB of RAM, with two banks of all three 2 GB DDR3 memory channels populated to achieve the optimal memory configuration for a single socket. At a 1000 user load, the server, which was configured with 2.5 GB of Oracle target memory, was almost running out of RAM to make additional user connections. So we decided to increase the RAM to 18 GB by populating all three banks of the three channels with 2 GB DDR3 RDIMMs. During this iteration, we increased the user load on the 11G test environment all the way to 1600 users while monitoring our established SLA of 2 seconds of AQRT or less to ensure no violations. The results are shown in Figures 8 and 9 below.
Figure 8: 12GB and 18GB configuration TPS comparison on 11G environment (20 disks)
Figure 9: 12 GB and 18GB configuration AQRT comparison on 11G environment (20 disks)
The above figures reveal that after increasing the RAM on our 11G test environment, not only were we able to sustain a higher user load, but we also improved our average query response time at 1000 users by (0.486-0.383)/0.383 = 26.8%, while the TPS remained the same up to a 1000 user load. At a 1600 user load, our average query response time reached 1.866 seconds, which is almost equal to the average query response time of our legacy production environment at 1000 users with 20 spindles and 16 GB of RAM. We decided not to increase the load on the 11G test environment any further at this point, since the next increment, an 1800 user load, violated our SLA of 2 seconds.
To summarize our results so far, the test environment running Oracle 11g Release 1 with one Intel® Xeon® X5570 processor was able to handle 600 more users than the legacy two-node cluster running on eighth-generation PowerEdge 2850 servers while maintaining our SLA of a 2-second average query response time. The 11G test environment also exhibited a (76.46-47.6)/47.6 = 60.6% increase in the resulting TPS. We can look at the performance gain from two different dimensions: average query response time improvement at the same user load, or the TPS improvement that results from being able to increase the user load while maintaining the SLA. From the perspective of average query response time improvement at a 1000 user load, we see a (2.11-0.383)/0.383 = 450% performance gain. From the perspective of TPS improvement, the 11G environment exhibits a 60% performance gain while sustaining additional user load and maintaining the SLA. Figures 10 and 11 below display the performance gain in both TPS and average query response time.
Figure 10: AQRT comparison between legacy(8GB+8GB) and 11G(18GB) environment (20 disks)
Figure 11: TPS comparison between legacy(8GB+8GB) and 11G(18GB) environment (20 disks)
It is worth noting that our legacy RAC environment exhibited only a marginal decrease in IOWAIT and increase in USER CPU time as a result of doubling the number of spindles hosting the DATA disk group. On the other hand, our 11G test environment exhibited a (67.1-48.33)/48.33 = 38.8% decrease in IOWAIT time and a (21.41-16.34)/16.34 = 31% increase in USER time, which shows that the system spends more time performing useful work as the backend storage subsystem is scaled with additional disks. The comparison is shown in Figures 12 and 13 below.
Figure 12: CPU time comparison for the legacy environment after scaling disks

Figure 13: CPU time comparison for the 11G environment after scaling disks

Since we encountered a decrease in the IOWAIT time and a proportional increase in the CPU USER time after scaling the disks for our 11G test environment, we decided to increase the spindle count from 20 to 30 for the 11G test environment to find out whether there was additional room for growth in terms of additional user load, TPS, and AQRT. In order to support additional user load, we also scaled our memory further from 18 GB to 36 GB, scaled the backend disks to 30, and reran the test iteration. The results are shown in Figures 14 and 15 below:
Figure 14: TPS comparison for 11G environment after scaling disks from 20 to 30
Figure 15: AQRT comparison for 11G environment after scaling disks from 20 to 30
Figure 15 reveals that the addition of spindles further improved the AQRT; at 1600 users, we saw an improvement of (1.866-0.563)/0.563 = 231%. The next logical step was to further increase the user load on the 11G test environment and determine at what user load our SLA of a 2-second response time would be violated. Figures 16 and 17 below show both the maximum user load that can be sustained on our 11G test environment without violating the SLA, and the performance gain after scaling the RAM from 18 GB to 36 GB:
Figure 16: 18 GB and 36 GB configuration TPS comparison on 11G environment (30 disks)
Figure 17: 18 GB and 36 GB configuration AQRT comparison on 11G environment (30 disks)
The above figures reveal that with 30 DATA disks and 36 GB of RAM, we were able to scale the user load to around 2500 users without violating our SLA, a (2500-1000)/1000 = 150% improvement over the base legacy environment in terms of user load.
TEST CONFIGURATION
Table 1 describes the complete software and hardware configuration that was used throughout testing on both the simulated legacy production environment and the 11G test environment.
Table 1: Test Configuration
Systems
  Legacy: Two PowerEdge 2850 2U servers
  11G test: One PowerEdge R710 2U server

Processors
  Legacy: Two Intel Xeon 3.40 GHz single-core CPUs per node; cache: L2 = 1 MB per CPU
  11G test: One Intel Xeon X5570 2.93 GHz quad-core CPU; cache: L2 = 4 x 256 KB, L3 = 8 MB

Memory
  Legacy: 8 GB DDR2 per node (16 GB total for 2 nodes)
  11G test: Iteration 1: 12 GB DDR3; Iteration 2: 18 GB DDR3; Iteration 3: 36 GB DDR3

Internal disks
  Legacy: Two 73 GB 3.5" SCSI, RAID 1
  11G test: Two 73 GB 2.5" SAS, RAID 1

Network
  Legacy: Two Intel 82544EI Gigabit Ethernet
  11G test: Four Broadcom® NetXtreme II BCM5709 Gigabit Ethernet

External storage
  Legacy: Dell/EMC CX4-960 with 10 x 146 GB Fibre Channel disks for the DATABASE disk group; DATA disk group: Iteration 1: 10 x 146 GB; Iteration 2: 20 x 146 GB
  11G test: Dell/EMC CX4-960 with 10 x 146 GB Fibre Channel disks for the DATABASE disk group; DATA disk group: Iteration 1: 10 x 146 GB; Iteration 2: 20 x 146 GB; Iteration 3: 30 x 146 GB

HBA
  Legacy: Two QLE2460 per node
  11G test: Two QLE2460

OS
  Legacy: Enterprise Linux® 4.6
  11G test: Enterprise Linux 5.2

Oracle software
  Legacy: Oracle 10g R2 (10.2.0.4); file system: ASM; disk groups: DATABASE, DATA; sga_target = 1600M; pga_target = 800M
  11G test: Oracle 11g R1 (11.1.0.6); file system: ASM; disk groups: DATABASE, DATA; memory_target = 2400M

Workload
  Legacy: Quest Benchmark Factory TPCC-like workload; scale factor: 3000; user connections: 200-1000
  11G test: Quest Benchmark Factory TPCC-like workload; scale factor: 3000; user connections: 200-1000 (12 GB RAM), 200-1600 (18 GB RAM), 200-3000 (36 GB RAM)
RESULTS
NOTE: The results we have provided are intended only for comparison of the two environments consisting of specific configurations in a lab environment. The results do not portray the maximum capabilities of any system, database software, or storage.
The following test results address questions regarding the limiting performance factors of the legacy environment running Oracle RAC, the capabilities to scale the 11G test environment, and the resulting consolidation factor of the 11G test environment. The goals of the test were to determine:
• The maximum performance capabilities of our legacy production environment, and to determine whether it can be efficiently scaled by adding additional resources.
• A baseline comparison of AQRT between the legacy environment and the 11G test environment, with a baseline configuration using the same backend disk configuration and the number of CPU cores. The baseline also defines an SLA of a maximum of 2 seconds of AQRT.
• The capabilities of the 11G test environment to scale after adding additional resources to determine the scale factor
• The consolidation factor resulting from migrating to PowerEdge 11G servers from PowerEdge 8G single‐core servers running OLTP workloads
Table 2 below summarizes the test results discussed in the test methodology section of this white paper. From the results we can see that our legacy production environment could not scale any further after adding additional disks. This is evident from the fact that the AQRT and IOWAIT time did not improve significantly even after doubling the spindle count. The only way to further improve performance would be to add RAC nodes as well as additional disks to scale the legacy environment, which could be cost prohibitive given the added license costs associated with Oracle RAC.
On the other hand, we can see that the 11G test environment, consisting of only one node and 12 GB of RAM, performed slightly better than the two legacy RAC nodes while also exhibiting huge scalability potential. As the data in Table 2 shows, the 11G test environment was repeatedly scaled with additional disks and memory, and each time it could either sustain additional user load or improve the TPS and AQRT at the same user load, depending on the usage model the customer adopts. Finally, during the test iteration configured with 30 DATA disks, 36 GB of RAM, and a 2600 user load, the 11G test environment started to exhibit bottlenecks similar to those the legacy RAC environment had shown: average CPU utilization rose above 97% with very little IDLE time. At this point, instead of further scaling the DATA disks or memory, it would be more beneficial to add a second quad-core CPU to increase the processing power. Once additional processing power has been added, you can continue to take advantage of the scalable architecture provided by the 11G environment by adding additional resources as needed.
Consolidation Factor
After a cursory analysis of the results, one could conclude that our 11G test environment, consisting of a single server running an Oracle 11g database, was able to handle the OLTP workload of a two-server legacy RAC environment while maintaining our SLA of 2 seconds AQRT. But one must not ignore the fact that our 11G R710 server was populated with only one quad-core CPU. From that perspective, we can extrapolate that a single 11G R710 server populated with two quad-core CPUs would be able to consolidate the workload of a four-node legacy RAC environment, provided that both environments are configured with adequate disk and memory resources so that those do not become the bottleneck.
Table 2: Results Summary
Our results also revealed that although the base comparison between the two environments showed almost identical performance in terms of AQRT, the legacy RAC environment could not be scaled any further even after removing the I/O bottleneck, while the 11G test environment still had plenty of room for growth in terms of CPU idle time. This behavior is depicted in Figure 18 below.
System DATA Disks
RAM User Load
Average CPU Utilization
User System IOWAIT IDLE AQRT
2850 Legacy RAC node1+node2
10 16 GB 1000 99.25 56.8 9.97 32.48 0.75 2.11
20 16 GB 1000 99.24 57.37 11.27 30.62 0.76 1.83
R710 (X5570) 10 12 GB 1000 85.87 16.34 2.43 67.10 14.13 2.05
20 12 GB 1000 78.59 21.41 2.84 48.33 27.42 0.486
20 18 GB 1000 66.25 18.02 2.32 45.91 33.75 0.383
30 18 GB 1000 63.74 21.23 2.57 39.94 36.26 0.287
30 36 GB 1000 48.24 13.13 2.21 32.9 51.76 0.159
20 18 GB 1600 88.29 22.81 3.82 61.66 11.71 1.866
30 18 GB 1600 84.17 29.82 4.07 50.28 15.83 0.563
30 36 GB 1600 74.65 21.09 3.83 49.73 25.45 0.372
30 36 GB 2400 94.58 33.83 5.55 55.2 5.42 1.43
30 36 GB 2600 97.29 37.67 8.62 51 2.95 2.481
30 36 GB 3000 97.15 38.67 6.33 52.15 2.85 3.419
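As a worked example of reading Table 2 against the study's 2-second AQRT SLA, the sketch below transcribes the R710 rows and picks the largest user load that still meets the SLA. This is an illustrative reading of the published numbers, not part of the test harness.

```python
# (disks, ram_gb, users, aqrt_seconds) transcribed from the
# R710 (X5570) rows of Table 2
R710_ROWS = [
    (10, 12, 1000, 2.05),
    (20, 12, 1000, 0.486),
    (20, 18, 1000, 0.383),
    (30, 18, 1000, 0.287),
    (30, 36, 1000, 0.159),
    (20, 18, 1600, 1.866),
    (30, 18, 1600, 0.563),
    (30, 36, 1600, 0.372),
    (30, 36, 2400, 1.43),
    (30, 36, 2600, 2.481),
    (30, 36, 3000, 3.419),
]

SLA_AQRT = 2.0  # seconds, the SLA used throughout the study

def max_load_within_sla(rows, sla=SLA_AQRT):
    """Return the row with the highest user load whose AQRT meets the SLA."""
    ok = [r for r in rows if r[3] <= sla]
    return max(ok, key=lambda r: r[2])

best = max_load_within_sla(R710_ROWS)
print(best)  # -> (30, 36, 2400, 1.43)
```

With 30 DATA disks and 36 GB of RAM, the single-socket R710 sustains 2400 users within the SLA; the 2600- and 3000-user iterations exceed it, which is where the text recommends adding CPU rather than more disks or memory.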
Figure 18: Adding disk and memory resources: CPU idle time behavior comparison between legacy and 11G environment
For example, in the 20-disk, 1000-user test iteration, the legacy RAC environment's CPUs exhibited only 0.76% idle time while the 11G test environment exhibited 27.42% idle time, while performing (1.83 − 0.486)/0.486 = 276% better, or 3.76 times faster, in terms of AQRT. From this perspective, our 11G test environment should be able to handle the workload of an almost seven-server legacy RAC environment. This translates into a consolidation factor of 7 to 1.
Of course, there are multiple ways to analyze these results. Another perspective is the scalability of the 11G environment compared to the legacy RAC environment. When additional disks were added, our legacy RAC environment exhibited only a 15.2% AQRT improvement, while the 11G test environment showed a (2.05 − 0.486)/0.486 = 321% improvement. In other words, with identical storage configurations, the 11G test environment exhibited more than 300% better scalability when adding disk resources.
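The consolidation-factor and scalability arithmetic above can be reproduced directly from the Table 2 AQRT values; the snippet below is a worked restatement of those calculations, nothing more.

```python
# AQRT values (seconds) taken from Table 2, 1000-user iterations
legacy_aqrt_10_disk = 2.11   # 2850 legacy RAC, 10 DATA disks
legacy_aqrt_20_disk = 1.83   # 2850 legacy RAC, 20 DATA disks
r710_aqrt_10_disk = 2.05     # R710, 10 DATA disks
r710_aqrt_20_disk = 0.486    # R710, 20 DATA disks

# Speed-up of the single R710 over the two-node legacy cluster at 20 disks
speedup = legacy_aqrt_20_disk / r710_aqrt_20_disk  # ~3.76x

# Two legacy nodes times the speed-up, floored conservatively -> ~7:1
consolidation_factor = int(2 * speedup)

# AQRT improvement from adding disks (10 -> 20) in each environment
legacy_gain = (legacy_aqrt_10_disk - legacy_aqrt_20_disk) / legacy_aqrt_20_disk  # ~15%
r710_gain = (r710_aqrt_10_disk - r710_aqrt_20_disk) / r710_aqrt_20_disk          # ~321%

print(round(speedup, 2), consolidation_factor)  # 3.77 7
```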
It is also important to note that Oracle Standard Edition RAC licensing is based on a maximum of 4 CPU sockets per cluster, irrespective of the number of cores per socket. From that perspective, you may replace the legacy Enterprise Edition RAC environment with a two-node 11G R710 Standard Edition RAC environment, each server populated with two quad-core sockets for a total of 16 cores, resulting in significant performance gains, energy savings, and substantial future scalability potential. We can also look at the results from the perspective of consolidating multiple distributed standalone legacy OLTP workloads onto a single Oracle RAC environment running on Dell R710 servers. From this angle, one may conclude that as many as fourteen standalone legacy nodes running OLTP workloads may be consolidated onto an Oracle 11g RAC environment consisting of two
Dell R710 servers, provided that the backend storage and memory are scaled according to the aggregate IOPS and concurrent user connections, respectively, as discussed in this study.
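A quick sanity check on the sizing arithmetic above, a sketch under the configuration the text assumes: two R710 nodes, each with two quad-core sockets, and the roughly 7:1 per-server consolidation factor estimated earlier.

```python
NODES = 2
SOCKETS_PER_NODE = 2           # fully populated R710
CORES_PER_SOCKET = 4           # quad-core Xeon
SE_RAC_SOCKET_CAP = 4          # Standard Edition RAC cap on sockets per cluster
CONSOLIDATION_PER_SERVER = 7   # ~7:1 factor estimated earlier in the text

total_sockets = NODES * SOCKETS_PER_NODE                  # 4 sockets
total_cores = total_sockets * CORES_PER_SOCKET            # 16 cores
legacy_nodes_replaced = NODES * CONSOLIDATION_PER_SERVER  # ~14 standalone nodes

assert total_sockets <= SE_RAC_SOCKET_CAP  # cluster stays within SE licensing
print(total_cores, legacy_nodes_replaced)  # 16 14
```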
SUMMARY
Database systems running Online Transaction Processing workloads require an optimal backend storage disk layout and disk quantity to efficiently service a large concurrent user population. Legacy servers running these workloads suffer from the architectural limitations of front-side bus (FSB) designs, which limit how efficiently CPU resources can be utilized; as a result, only a limited number of disks or amount of memory could be serviced by a CPU core on an FSB-based system. In this white paper we demonstrated that PowerEdge 11G servers, equipped with Intel Xeon 5500 series processors and a chipset designed for efficient I/O and processor interfacing, remove the FSB bottleneck and provide an ideal platform for consolidating legacy database environments. The R710 chipset supports Intel's Core i7 processor family, QuickPath Interconnect, DDR3 memory technology, and PCI Express Generation 2. This study also demonstrated that 11G servers offer large performance gains compared to older-generation servers with front-side-bus architectures, and that database systems running on PowerEdge 11G servers exhibit better scalability when additional resources, such as disks and memory, are added.
Customers running Oracle 9i or 10g RAC environments on legacy servers and storage can follow the guidelines and procedures outlined in this white paper to consolidate power-hungry RAC nodes into fewer, faster, more energy-efficient nodes. Consolidating legacy RAC nodes can also drive down Oracle licensing costs, and the resulting savings can be applied toward additional backend storage resources to improve average query response time, or toward disaster recovery sites and additional RAC test-bed environments for application development and testing. Paired with PowerEdge 11G servers, the reduced node count does not compromise performance. The result is less cluster overhead, simplified management, and positive movement toward the objective of simplifying IT and reducing complexity in data centers.
REFERENCES
• Consolidating DSS Workloads on Dell™ PowerEdge™ 11G Servers Using Oracle® 11g Database Replay
http://www.dell.com/downloads/global/solutions/database_11g_consolidate.pdf?c=us&cs=555&l=en&s=biz
• Oracle® Database Performance Tuning Guide 10g Release 2 (10.2) Part Number B14211‐03