
  • Oracle Real Application Testing Tips from Bank of America

Presenter: Keerti Prasad, Vice President, Bank of America

  • Disclaimer

All the views expressed in this presentation are the opinion of the author and do not necessarily reflect the views of Bank of America.

Speaker Info

    Keerti Prasad

    Working at Merrill Lynch/Bank of America for 6+ years on varied applications

    18+ years of experience with Oracle Database and its related technologies

  • Agenda

    Overview

    Role of Real Application Testing in Migrations

    Database Replay details

    Issues encountered during Database Replay

    Summary

    Q&A

  • Overview - What is Real Application Testing?

    Real Application Testing

    An Oracle tool to capture the actual workload being performed on a database and then replay it.

    Modules: Capture, Preprocess, Instantiate, Replay

    Components: DB Replay and SPA (SQL Performance Analyzer)
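    As a rough illustration of how the four modules map onto the database APIs, here is a minimal sketch; the capture/replay names and the CAPTURE_DIR directory object are hypothetical, and instantiation itself is a normal restore rather than an API call:

      -- 1. Capture: record the production workload (names are hypothetical).
      BEGIN
        DBMS_WORKLOAD_CAPTURE.START_CAPTURE(
          name => 'OLTP_CAPTURE',
          dir  => 'CAPTURE_DIR');     -- directory object holding capture files
      END;
      /
      -- ... the real workload runs ...
      EXEC DBMS_WORKLOAD_CAPTURE.FINISH_CAPTURE;

      -- 2. Preprocess: turn the capture files into replayable form
      --    (done once, on the test system running the target version).
      EXEC DBMS_WORKLOAD_REPLAY.PROCESS_CAPTURE(capture_dir => 'CAPTURE_DIR');

      -- 3. Instantiate: restore the source database onto the test system
      --    (e.g. RMAN duplicate) -- not an API call of these packages.

      -- 4. Replay: initialize, prepare and start, driven by wrc clients
      --    (a fuller replay sketch appears later in this deck).
      BEGIN
        DBMS_WORKLOAD_REPLAY.INITIALIZE_REPLAY(
          replay_name => 'OLTP_REPLAY',
          replay_dir  => 'CAPTURE_DIR');
      END;
      /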

  • Overview - Challenges

    Business often needs new features which are available only in the latest software

    Infrastructure needs to be on a supported version, or a cheaper alternative, triggering migration

    Application or infrastructure changes need rapid deployment for break-fixes or version changes

    Performance of the newer systems needs to be on par or better

    Any change needs to be tested; testing methods need to be fast, reliable and consistent

    We need to simulate actual production data/volumes on the testing environment

    Real Application Testing can achieve all of the above

  • Overview - Factors Driving Changes

    Application: break-fixes, increasing data volumes, new requirements

    OS Platforms: server patching/upgrades, third-party upgrades

    Database: upgrades, patching, parameter changes

    Storage/Network: datacenter moves

  • Overview - Application/Database Info

    Database:

    Database size is ~5 TB

    Workload can roughly be classified into 2 parts:

    OLTP transactions are done during business hours

    Batch processing is done after business hours, with some overlap

    SLA is 3 ms, and the rate of redo generation varies between 3-5 GB/hour

    Application:

    Some applications are vendor supplied and some are developed in house

  • Overview - Current Configuration

    [Architecture diagram: an Online RAC database and a Reporting RAC database kept in sync via Oracle GoldenGate (OGG); production support and end users work against the online database, while ad-hoc queries go to the reporting database; a "60+" figure appears on the slide without a recoverable unit]

  • Role of Real Application Testing in Migration - High Level

    Three major migrations done within the span of 3 years:

    No.  Date      RAT Testing  Change Details                                                              Issues
    ---  --------  -----------  --------------------------------------------------------------------------  ------
    1    Sep 2010  Not Used     Hardware, OS, Database, Application Migration/Upgrade/Release                Many
    2    May 2011  Not Used     OS, Database, Application Upgrade/Release                                    Many
    3    Aug 2012  Extensive    OS Upgrade, Database migration to 3-Node RAC, Application Upgrade/Release    None

  • Role of Real Application Testing in Migration 1

    Migration 1 (Oracle DB migration: 10.2.0.4 on Solaris to 11.2.0.1 on Linux)

    Types of changes:

    Hardware changes

    OS migration from Solaris to Linux

    Oracle upgrade from 10.2.0.4 to 11.2.0.1

    Application Upgrade/Release

    Some problems we encountered after migration:

    Optimizer related issues

    Library cache and mutex issues

    High version counts

    One-off patches were applied and a few parameters were changed

  • Role of Real Application Testing in Migration 2

    Migration 2 (Oracle upgrade 11.2.0.1 to 11.2.0.2)

    Types of changes:

    OS upgrade

    Oracle upgrade from 11.2.0.1 to 11.2.0.2

    Application Upgrade/Release

    Some problems we encountered after migration:

    Memory resizing operations

    Issues caused by cursor_sharing set to FORCE

    High version count issues

    One-off patches were applied and a few parameters were changed

    Real Application Testing was in a rollout phase

  • Role of Real Application Testing in Migration 3

    Migration 3 (Oracle migration from standalone to 3-node RAC)

    Types of changes:

    Hardware

    OS upgrade

    Oracle GI upgrade from 11.2.0.2 to 11.2.0.3

    Database migration from standalone to 3-node RAC without a database version change

    Application Upgrade/Release

    Real Application Testing was used extensively. The migration was accomplished with only minor issues, and the performance of the system was satisfactory.

  • Role of Real Application Testing in Migration - Problems Exposed by DB Replay

    Problems that occurred during migration which were exposed by DB Replay:

    11g child cursor issue: replay was going into a hang situation. High version counts from child cursors were the cause.

    Storage issue: replay was showing high I/O in some reports. Storage/hardware changes were made, which fixed the issue.

    Authentication issue: the Oracle database owner on the OS is authenticated by a third-party tool. During replay, clients encountered errors. On investigation, this was found to be caused by a bug in the authentication tool.
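    A quick way to spot the high-version-count cursors mentioned above is a query like the following against V$SQLAREA; this is a generic diagnostic sketch, and the threshold of 100 is an arbitrary assumption:

      -- List cursors with unusually high child-cursor (version) counts.
      -- The threshold of 100 is an assumption; tune it to your system.
      SELECT sql_id,
             version_count,
             executions,
             SUBSTR(sql_text, 1, 60) AS sql_text
      FROM   v$sqlarea
      WHERE  version_count > 100
      ORDER  BY version_count DESC;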

  • DB Replay details - Typical Cycle

    Typical DB Replay cycle:

    Workload is captured on the source (production) daily between 9am-5pm, for one week

    The source database is restored to the test system

    DB Replay is done for the first day; batches are run by the application team at night

    This is repeated for a week

    At the end, DB Replay reports, AWR comparison reports and batch load timings are analyzed

  • DB Replay details - Impact on Production System during Capture

    Impact during capture, and steps taken to minimize it:

    Load on the server is minimal: ~5% CPU

    Only OLTP traffic is captured during business hours, since batches can be easily reproduced

    Captures were done for a typical workload cycle (Mon-Fri, business hours). They were retaken ONLY when the database went through a release.

  • DB Replay details - Capture Tips

    We took a capture for a whole business day, but pre-processing ran into an issue, since it creates objects in the SYSAUX tablespace, which grew a lot. So the capture interval was set to create files of ~2 GB. The increase in divergence due to the split capture was minimal.

    To facilitate the above, a script was created to end the running capture and immediately start a new one (see the sketch after this list).

    We needed to capture only the users that performed OLTP, which is not easily reproducible, hence user filters were used.
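    A minimal sketch of such a restart script, under these assumptions: the capture name and CAPTURE_DIR directory object are hypothetical, the user filter names come from the capture report on the next slide, and with default_action => 'EXCLUDE' only sessions matching the INCLUDE filters are captured:

      -- One-time setup: define INCLUDE filters for the OLTP users.
      -- (If your version consumes filters when a capture starts,
      --  re-add them before each restart.)
      BEGIN
        DBMS_WORKLOAD_CAPTURE.ADD_FILTER(
          fname => 'APPSCHEM_USER_162',    fattribute => 'USER', fvalue => 'APPSCHEM_USER');
        DBMS_WORKLOAD_CAPTURE.ADD_FILTER(
          fname => 'APPSCHEMAA2_USER_162', fattribute => 'USER', fvalue => 'APPSCHEMAA2_USER');
      END;
      /

      -- Restart script: finish the capture in progress and immediately
      -- start a new one, so each capture's files stay around ~2 GB.
      BEGIN
        DBMS_WORKLOAD_CAPTURE.FINISH_CAPTURE;
        DBMS_WORKLOAD_CAPTURE.START_CAPTURE(
          name           => 'OLTP_CAPTURE_' || TO_CHAR(SYSDATE, 'YYYYMMDD_HH24MISS'),
          dir            => 'CAPTURE_DIR',
          default_action => 'EXCLUDE');  -- capture only sessions matching the INCLUDE filters
      END;
      /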

  • DB Replay Details - Capture Report

    Statistic Name                                  Value    % Total
    ----------------------------------------  -----------  ---------
    DB time (secs)                                48407.45      78.86
    Average Active Sessions                          13.44
    User calls captured                           13141577      89.62
    User calls captured with Errors                  43961
    Session logins                                     726       9.45
    Transactions                                    271724      90.02
    ----------------------------------------------------------------

    Top Service/Module Captured
    Service         Module                    % Activity  Action              % Action
    --------------  ------------------------  ----------  ------------------  --------
    APPCONN1.US.ML  UNNAMED                        55.22  UNNAMED                55.22
    APPCONN2.US.ML  UNNAMED                         6.30  UNNAMED                 6.30
    APPCONN1.US.ML  APP_SCHE.PKG_APPS_PFL.sp        5.62  Identify the Maste      5.62
                    PKG_APPS_PFL.SP_CALCULAT        1.62  Calculate EPI Alig      1.62
                    SQL*Plus                        1.52  UNNAMED                 1.52
    --------------------------------------------------------------------------------

    Workload Filters  DB: SOURCE  Snaps: 3008-3011
      # Filter Name              Type     Attribute  Value
    --- ------------------------ -------  ---------  --------------------------
      1 APPSCHEM_USER_162        INCLUDE  USER       APPSCHEM_USER
      2 APPSCHEMAA2_USER_162     INCLUDE  USER       APPSCHEMAA2_USER
    ----------------------------------------------------------------------------

    Callouts: major DB time / user calls / transactions captured; services that were captured; filtered users captured.
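    For reference, a text report like the one above can be generated from the API; this is a minimal sketch that assumes the most recent capture is the one of interest (an analogous DBMS_WORKLOAD_REPLAY.REPORT call produces the replay report shown on a later slide):

      -- Generate a text capture report for the most recent capture.
      -- Run with SET SERVEROUTPUT ON in SQL*Plus to see the output.
      DECLARE
        l_capture_id NUMBER;
        l_report     CLOB;
      BEGIN
        SELECT MAX(id) INTO l_capture_id FROM dba_workload_captures;

        l_report := DBMS_WORKLOAD_CAPTURE.REPORT(
                      capture_id => l_capture_id,
                      format     => DBMS_WORKLOAD_CAPTURE.TYPE_TEXT);

        DBMS_OUTPUT.PUT_LINE(DBMS_LOB.SUBSTR(l_report, 4000, 1));
      END;
      /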

  • DB Replay Details - Replay Details/Tips

    SCN was used for the synchronization parameter

    The calibration tool was used to calculate the number of clients; typically 15-25 clients were used for replay

    Remapping was achieved by defining a database service name for the clients

    Think time auto correct was used

    A script was used to run the replays

    Scale-up multiplier was set to 1, as Real Application Testing was used as a tool for performance testing and not stress testing (see the sketch after this list)
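    A minimal sketch of preparing and driving such a replay; the replay name, directory object, connection_id, host and service names are hypothetical, and the wrc invocations are shown as OS-shell comments:

      -- Assumes the capture was already preprocessed with:
      --   EXEC DBMS_WORKLOAD_REPLAY.PROCESS_CAPTURE('CAPTURE_DIR');
      BEGIN
        DBMS_WORKLOAD_REPLAY.INITIALIZE_REPLAY(
          replay_name => 'OLTP_REPLAY',       -- hypothetical name
          replay_dir  => 'CAPTURE_DIR');

        -- Remap captured connections to the replay service
        -- (connection_id 1 is illustrative; see DBA_WORKLOAD_CONNECTION_MAP).
        DBMS_WORKLOAD_REPLAY.REMAP_CONNECTION(
          connection_id     => 1,
          replay_connection => 'testhost:1521/replay_service');

        DBMS_WORKLOAD_REPLAY.PREPARE_REPLAY(
          synchronization         => 'SCN',   -- commit-order (SCN) synchronization
          think_time_auto_correct => TRUE,    -- keep replay pace close to capture
          scale_up_multiplier     => 1);      -- performance testing, not stress testing
      END;
      /

      -- From the OS shell: calibrate to size the client fleet, start the
      -- wrc clients (typically 15-25 here), then start the replay:
      --   wrc mode=calibrate replaydir=/u01/capture
      --   wrc system/<password>@replay_service mode=replay replaydir=/u01/capture
      EXEC DBMS_WORKLOAD_REPLAY.START_REPLAY;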

  • DB Replay Details - Replay Report

    Replay Options
    =================================================================
    Option Name              Value
    Synchronization          SCN
    Connect Time             0%
    Think Time               0%
    Think Time Auto Correct  TRUE
    Number of WRC Clients    15 (15 Completed, 0 Running)

    Replay Statistics
    =================================================================
    Statistic            Replay         Capture
    DB Time              32811.942 s    37622.648 s
    Avg Active Sessions  7.43           10.45
    User calls           12824968       12824968
    Network Time         2925862.226 s  .
    Think Time           766.466 s      .

    Replay Divergence Summary
    =================================================================
    Divergence Type                                 Count  % Total
    Session Failures During Replay                      0     0.00
    Errors No Longer Seen During Replay                79     0.00
    New Errors Seen During Replay                     532     0.00
    Errors Mutated During Replay                        0     0.00
    DMLs with Different Number of Rows Modified      3382     0.03
    SELECTs with Different Number of Rows Fetched  514558     4.01

    Replay errors were minor

  • DB Replay Details - Capture and Replay Compare

    The AWR compare report covers: information about AWR and time periods, main performance statistics, I/O to data and temp files, interconnect usage, and the ADDM report.

    Time Periods
                   Capture               Replay
    Start Time     2012-07-16 11:03:08   2012-07-16 17:45:34
    End Time       2012-07-16 12:03:51   2012-07-16 18:57:42
    Duration       1.01 hours            1.2 hours

    Main Performance Statistics
                          Capture total time (s)   Replay total time (s)
    Database Time         44009.17                 34604
    CPU Time              10567.09                 10623.24
    User I/O Wait Time    30913.41                 23118.46
    Cluster Wait Time     N/A                      0

                                       impact (sec)  avg active sessions  % active sessions
    "User I/O" wait class    Capture   30608         8.39                 69.31
                             Replay    22439         5.07                 63.49
    Global Cache Messaging   Capture   N/A           N/A                  N/A
                             Replay    849           0.19                 2.4
    PL/SQL Execution         Capture   1615          0.44                 3.66
                             Replay    2453          0.55                 6.94

    Interconnect Usage
                                             Capture   Replay
    average 8Kb ping time (ms)               Not RAC   667.56
    average fast grant time (ms)             Not RAC   0.24
    average fast block transfer time (ms)    Not RAC   0.58
    total GC blocks transferred              Not RAC   21228672
    total interconnect usage (Gbits)         Not RAC   4001.231
    interconnect throughput (Mb/s)           Not RAC   926.56

    I/O to Data and Temp Files
                                              Capture      Replay
    average time of single block read (ms)    5.95         3.8
    total number of single block reads        4537537      5144026
    total physical read                       59.515 G     72.303 G
    total physical write                      7.201 G      9.861 G
    read rate                                 16.71 M/s    16.743 M/s
    write rate                                2.022 M/s    2.284 M/s
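    For reference, a capture-vs-replay comparison like the one above can be generated directly from the replay API; a minimal sketch, assuming the latest replay is the one of interest and relying on the documented behavior that a NULL second replay ID compares the replay against its capture:

      -- Generate an HTML compare-period report for the latest replay.
      DECLARE
        l_replay_id NUMBER;
        l_report    CLOB;
      BEGIN
        SELECT MAX(id) INTO l_replay_id FROM dba_workload_replays;

        DBMS_WORKLOAD_REPLAY.COMPARE_PERIOD_REPORT(
          replay_id1 => l_replay_id,
          replay_id2 => NULL,          -- NULL => compare against the capture
          format     => DBMS_WORKLOAD_REPLAY.TYPE_HTML,
          result     => l_report);
      END;
      /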

  • DB Replay Details - Performance Comparison

    AWR Performance Comparison for a Replay

                                           ----- 1st (CAPTURE) -----   ------ 2nd (REPLAY) -----
    Event                    Wait Class    Waits      Time (s) Avg(ms)  Waits      Time (s) Avg(ms)  Performance
    db file sequential read  User I/O      4,263,441    37,343   8.76   2,868,731    11,541   4.02   Better
    direct path read         User I/O         78,285     3,551  45.41      32,702       613  18.75   Better
    log file sync            Commit          205,154     3,090  15.06      75,012       213   2.84   Better
    log file parallel write  System I/O      291,138     1,547   5.32     357,429       498   1.39   Better
    db file parallel write   System I/O      257,180     1,081   4.2      809,049       688   0.85   Better

  • DB Replay Details - Performance Comparison

    AWR Performance Comparison for a Replay with I/O issues

                                           ----- 1st (CAPTURE) -----   ------ 2nd (REPLAY) ------
    Event                    Wait Class    Waits      Time (s) Avg(ms)  Waits       Time (s) Avg(ms)  Performance
    db file sequential read  User I/O      5,243,296    34,811   6.64   16,932,235    26,133   1.54   Avg. better, but total wait high
    log file sync            Commit          275,891     3,554  12.88       78,157       152   1.94   Better
    log file parallel write  System I/O      368,717     1,698   4.6          593          1   0.4    Better
    read by other session    User I/O        702,920     1,616   2.3       204,975       428   2.09   Better
    db file scattered read   User I/O          2,204        50  23.11   19,227,535    12,628   0.66   Avg. better, but total wait high

  • DB Replay Details - Test Cycles

    Multiple cycles of OLTP captures and replays were done.

    Captures taken were minimal, and retaken only when the source had gone through a release cycle.

    Sometimes, after the restore, a guaranteed restore point was set on the target system, and the database was flashed back to run multiple cycles with different settings, saving restore time (see the sketch below).

    One-off patches applied were first tested with a DB Replay cycle.
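    A minimal sketch of the restore-point/flashback pattern described above; the restore point name is hypothetical, and flashback requires the database to be mounted:

      -- After restoring the test system, pin a guaranteed restore point.
      CREATE RESTORE POINT before_replay GUARANTEE FLASHBACK DATABASE;

      -- ... run a replay cycle, collect and analyze the reports ...

      -- Rewind the database instead of re-restoring it, then run the
      -- next cycle with different settings.
      SHUTDOWN IMMEDIATE
      STARTUP MOUNT
      FLASHBACK DATABASE TO RESTORE POINT before_replay;
      ALTER DATABASE OPEN RESETLOGS;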

  • Issues encountered during DB Replay

    Issue 1: Pre-processing made the system tablespace become full
    Fix: Run the capture in batches.

    Issue 2: ORA-15590: encountered an incomplete workload capture file
    Fix: As per Oracle, this could be ignored.

    Issue 3: ORA-600 [17183] when calling DBMS_WORKLOAD_REPLAY.PROCESS_CAPTURE
    Fix: Applied a DB Replay patch

  • Issues encountered during DB Replay (Contd.)

    Issue 4: Replay seemed to be hanging
    Fix: There were some minor changes in workload. Replay ran fine after stats were gathered on the database.

    Issue 5: Batches after replay complained of constraint violations
    Fix: The violations were for sequences. Once these were bumped up, the replay ran successfully (see the sketch below).
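    A minimal sketch of one common way to bump a sequence past the replayed data; the sequence, table and column names here are hypothetical:

      -- Jump the sequence ahead of the highest key the replay inserted,
      -- then restore the normal increment. Names are hypothetical.
      DECLARE
        l_max   NUMBER;
        l_next  NUMBER;
        l_dummy NUMBER;
      BEGIN
        SELECT MAX(order_id) INTO l_max FROM orders;  -- highest replayed key
        EXECUTE IMMEDIATE 'SELECT order_seq.NEXTVAL FROM dual' INTO l_next;

        IF l_max >= l_next THEN
          EXECUTE IMMEDIATE 'ALTER SEQUENCE order_seq INCREMENT BY '
                            || (l_max - l_next + 1);
          EXECUTE IMMEDIATE 'SELECT order_seq.NEXTVAL FROM dual' INTO l_dummy;
          EXECUTE IMMEDIATE 'ALTER SEQUENCE order_seq INCREMENT BY 1';
        END IF;
      END;
      /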

  • Issues encountered during DB Replay - Limitations

    Only SELECTs can be scaled up; DMLs can only be replayed once.

    Some parts of our application use XA connections. They are not yet supported in Real Application Testing.

    CTAS (CREATE TABLE AS SELECT) operations are not captured.

  • Oracle ACS for Bank of America

    Service Highlights:

    Comprehensive support offering with Solution Support Center

    Focused engineering team

    Fast-track solutions based on intimate knowledge of customer environment and business needs

    Proactive advice and knowledge transfer

    Performance and patch reviews

    Single point of contact for all technical issues: Technical Lead

    Single point of contact for all escalations: Service Delivery Manager (SDM)

    Onsite assistance

    Proof of Concept and Test Case assistance

    Value: maximize availability, mitigate risks, optimize performance, reduce costs

  • Summary - Lessons Learned

    Limitations of traditional testing methods:

    They have to be set up each testing cycle

    They require a lot of resources, and coordination is needed amongst all teams

    Baselines set can be questionable, making it difficult to compare results, remediate, or tune the system

    Last month our application in BofA migrated successfully to the RAC platform without any issues, unlike the previous migrations. Considering past migrations, this undoubtedly was due to a combination of traditional testing and Real Application Testing.

    Reports from DB Replay gave conclusive evidence of system performance, which boosted the DBA and application teams' confidence in the migration.

  • Q & A

    Questions?

  • Acknowledgements / Contact

    For general information on Advanced Customer Support Services, go to oracle.com/acs or e-mail [email protected]

    My email: Keerti Prasad ([email protected])

    Acknowledgements: Sridhar Kakumanu ([email protected]), Mohammad Awwal, Ph.D. ([email protected]), Jeremy Schirer ([email protected]), Bryant Bisnar ([email protected]), Stephen Ahn ([email protected])