TRANSCRIPT
Being Sure
Confident Consolidations with Oracle Real Application Testing 12c
Jeremiah Wilton, Cluster Technical Lead
September, 2013
Jeremiah Wilton [email protected]
• Working with Oracle since 1994 (v.5)
• Amazon’s first DBA 1997 – 2005
• Remote DBA and troubleshooter
• Technical leadership and mentoring
• Focus areas:
  – Recovery, repair and salvage
  – Systems design and architecture
  – Cloud computing
  – Consolidation and license economy
  – Availability and scalability
  – Service-time-oriented performance management
© 2013 Pythian Confidential
Who is Pythian?
• Service provider to data-driven businesses
• Employs only industry top talent for Data Infrastructure services
• Typically engaged by IT and Operations executives looking to address skill or resourcing gaps
• Two main service offerings:
  – Managed Services
    • 24x7x365 named team
    • Monitoring and rapid response
    • All senior resources
    • Complement in-house teams with breadth, depth, availability
    • Monthly services model
  – Consulting Services
    • Dedicated resource
Why is consolidation suddenly so big again?
• Part of Oracle’s push for license elasticity
• Allow customers to start small, grow into a larger footprint
• Allow customers to consolidate into smaller footprints
• Lower initial barriers to adoption of Oracle
• Dilute justification for migration to other technologies
Specialized Hardware Drives Consolidation
• Engineered systems provide special capabilities
• Oracle-designed/built infrastructure
• Smart Scan / Storage Cells
• Promoted as a single point of consolidation
Specialized Software Drives Consolidation
• 12c Multitenant – Designed expressly for consolidation
• OVM / hard partitioning – Enables licensing a subset of cores
• RAC – Enables data services to scale across infrastructure
The Cloud Drives Consolidation
• Allows customers to start small
• License economy on small VMs / instances
• Radically different resource profile than bare metal
• Business requires assurance of scale
Consolidation is rife with uncertainty
• Known workloads
• Unknown effect of coexistence
• Different I/O subsystems
• Different processors / architectures
• You can’t just sum active sessions; the result of concurrency is unpredictable
The need for testing real workloads
• Replaying real production workloads removes much of the uncertainty
• Unexpected side effects of coexistence and concurrency are revealed before consolidation
• Errors, regressions and bugs resulting from the new environment are revealed before consolidation
• Real Application Testing (Consolidated Database Replay)
RAT Basics
• Rationale
  – Change assurance
• Feature evolution
  – Backport captures and other features
  – Scale up, timeouts, reports, multiple workloads
• Cost
  – $11,500 per license, list price; ~¼ of Enterprise Edition
  – Same as the Partitioning and Advanced Compression options
• Two components:
  – SPA (SQL Performance Analyzer)
  – DB Replay
Consolidated Replay
• Multiple unrelated workloads can be executed at the same time
  – Separated by schema / objects
  – Separated by pluggable database
• 12c native feature
• 11gR2 patch-enabled feature
Only same-database consolidations need Consolidated DB Replay
• Classic DB Replay
  – Consolidation by instance
  – Consolidation by virtual machine
• Consolidated DB Replay
  – Consolidation into one database (by schema)
  – Consolidation into one database (by PDB)
Some sane approaches
• Capture workloads for the hours of the day you expect to represent peak workload
• Use an average active sessions approach
• Capture a workload long enough to represent real business, but short enough that you can test repeatedly
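The average active sessions bullet can be made concrete. As a sketch (not from the talk, and assuming the Diagnostics Pack license that ASH requires), average active sessions over an interval can be estimated by counting ASH samples, since each active session contributes roughly one sample per second:

```sql
-- Hedged sketch: estimate average active sessions (AAS) over the last
-- hour from ASH. Each active session is sampled about once per second,
-- so samples / elapsed seconds approximates AAS.
select count(*) / (60 * 60) as avg_active_sessions
  from v$active_session_history
 where sample_time > sysdate - 1/24;
```

Summing this figure across candidate databases gives only a first-order sizing estimate; concurrency effects make the combined result unpredictable, which is exactly why replay testing matters.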
A case study
• Two synthetic workloads, two separate DBs
  – Dell DVD Store 2
  – Dominic Giles’s Swingbench Order Entry
• Captured 36 minutes of peak workload from each
• Exported each DB as of the beginning of capture
• Imported each DB into a separate 12c PDB on a new host/DB
• Used Consolidated DB Replay to concurrently replay workloads against the PDBs
Details: Workload Capture (DS2)

create directory cap_ds2 as '/u01/app/oracle/admin/uw01/cap_ds2';

exec dbms_workload_capture.add_filter( -
  fname      => 'DS2USERFILTER', -
  fattribute => 'USER', -
  fvalue     => 'DS2')

exec dbms_workload_capture.start_capture( -
  name           => 'CAP_DS2', -
  dir            => 'CAP_DS2', -
  duration       => 2160, -
  default_action => 'EXCLUDE')

Wait ~36 minutes…
Capture only the DS2 user’s workload. Can also filter by:
• Instance
• Module
• Action
• Program
• Service
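If you need to stop a capture before its duration elapses, or want its AWR data for later comparison reports, DBMS_WORKLOAD_CAPTURE provides procedures for both. A sketch (the capture id here is hypothetical; look the real one up in DBA_WORKLOAD_CAPTURES, and verify the signatures against your version’s PL/SQL Packages reference):

```sql
-- Sketch: end the running capture explicitly rather than waiting
-- for the full duration, then export its AWR data for comparison.
exec dbms_workload_capture.finish_capture
exec dbms_workload_capture.export_awr(capture_id => 1)
```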
Details: Workload Capture (DS2)

Wait ~36 minutes…

select status, start_scn, dbtime
  from dba_workload_captures
 where name = 'CAP_DS2';

STATUS      START_SCN     DBTIME
---------- ---------- ----------
COMPLETED     5548016 1909135166

expdp directory=data_pump_dir \
  dumpfile=ds2.dmp \
  flashback_scn=5548016 \
  schemas=ds2
Get the SCN of the moment the capture started. Our copy for testing must be from that moment in time.
Make a copy as of the capture start SCN.
Details: Consolidate into PDB (DS2)

create pluggable database ds2 admin user ds2 identified by ds2 roles=(DBA);
alter pluggable database ds2 open;

connect ds2/ds2@localhost/ds2
create tablespace ds2 datafile size 2G;
alter user ds2 default tablespace ds2;
alter user ds2 quota unlimited on ds2;

impdp ds2/ds2@localhost/ds2 directory=dp_dir_2 \
  dumpfile=ds2.dmp remap_tablespace=ORDERTBS:DS2 \
  remap_tablespace=CUSTTBS:DS2 remap_tablespace=DS_MISC:DS2 \
  remap_tablespace=INDXTBS:DS2
For this demo, I used multitenant. I used Data Pump to move the data in. I also could have adopted (plugged in) the entire source DB as a PDB
Details: Workload Replay (Combined)

create restore point before_replay guarantee flashback database;

create directory cons_replay as '/u01/app/oracle/admin/orcl/cons_replay';
create directory ds2 as '/u01/app/oracle/admin/orcl/cons_replay/cap_ds2';
create directory soe as '/u01/app/oracle/admin/orcl/cons_replay/cap_soe';
I used a guaranteed restore point (GRP) so that I could test, tune, and repeat the test as many times as I wanted.
Consolidated Replay requires each of the workloads to be placed in subdirectories of a single workload directory.
Details: Workload Replay (Combined)

exec dbms_workload_replay.process_capture(capture_dir=>'DS2')
exec dbms_workload_replay.process_capture(capture_dir=>'SOE')
Before a workload can be replayed, it must be processed by the database version where it will be replayed. This procedure reads through the captured workload and creates several files containing metadata about the workload.
$ find /u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp*
/u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp12.1.0.1.0
/u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp12.1.0.1.0/wcr_data.extb
/u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp12.1.0.1.0/wcr_conn_data.extb
/u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp12.1.0.1.0/wcr_dep_graph.extb
/u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp12.1.0.1.0/wcr_references.extb
/u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp12.1.0.1.0/wcr_login.pp
/u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp12.1.0.1.0/wcr_scn_order.extb
/u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp12.1.0.1.0/wcr_commits.extb
/u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp12.1.0.1.0/wcr_calibrate.xml
/u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp12.1.0.1.0/wcr_seq_data.extb
/u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp12.1.0.1.0/wcr_process.wmd
Details: Workload Replay (Combined)

exec dbms_workload_replay.set_replay_directory( -
  replay_dir=>'CONS_REPLAY')

variable ds2 number
variable soe number

exec dbms_workload_replay.begin_replay_schedule('CONS_SCHEDULE')
exec :ds2 := dbms_workload_replay.add_capture('DS2')
exec :soe := dbms_workload_replay.add_capture('SOE')
exec dbms_workload_replay.end_replay_schedule

exec dbms_workload_replay.initialize_consolidated_replay( -
  replay_dir_obj=>'CONS_REPLAY', -
  schedule_name=>'CONS_SCHEDULE')
Point DB Replay at the combined directory
A replay schedule defines the set of workloads to be simultaneously replayed
The initialize step loads the metadata from the process step into the database
Details: Remap Connections

select schedule_cap_id, conn_id, capture_conn, replay_conn
  from dba_workload_connection_map;

SCHEDULE_CAP_ID CONN_ID CAPTURE_CONN                                 REPLAY_CONN
--------------- ------- -------------------------------------------- -----------
              1       1 (DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=uw01)(CID=(PROGRAM=C:\Users\wilton\TPG\OOW\ds2\oracleds2\ds2oracledriver.exe)(HOST=WILTON-WIN7PR-A)(USER=wilton)))(ADDRESS=(PROTOCOL=TCP)(HOST=127.0.0.1)(PORT=21521)))
              2       2 (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=127.0.0.1)(PORT=21521))(CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=wilton))(SERVICE_NAME=uw01)(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=wilton))))
Initialize loads the connection details exactly as they appeared in the production workload. For the replay clients to connect successfully, we must remap each of the original connections to point to the test system
Details: Remap Connections

exec dbms_workload_replay.remap_connection(schedule_cap_id=>1, connection_id=>1, -
  replay_connection=>'DS2')
exec dbms_workload_replay.remap_connection(schedule_cap_id=>2, connection_id=>2, -
  replay_connection=>'SOE')

select schedule_cap_id, conn_id, capture_conn, replay_conn
  from dba_workload_connection_map;

SCHEDULE_CAP_ID CONN_ID CAPTURE_CONN                                 REPLAY_CONN
--------------- ------- -------------------------------------------- -----------
              1       1 (DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=uw01)(CID=(PROGRAM=C:\Users\wilton\TPG\OOW\ds2\oracleds2\ds2oracledriver.exe)(HOST=WILTON-WIN7PR-A)(USER=wilton)))(ADDRESS=(PROTOCOL=TCP)(HOST=127.0.0.1)(PORT=21521)))  DS2
              2       2 (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=127.0.0.1)(PORT=21521))(CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=wilton))(SERVICE_NAME=uw01)(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=wilton))))  SOE
Details: Prepare the consolidated workload
exec dbms_workload_replay.prepare_consolidated_replay
The prepare step allows you to set the various parameters that control how replay will behave. Some of the configurable options:
• synchronization (SCN / OFF)
• connect_time_scale
• think_time_scale
• think_time_auto_correct
• capture_sts (SQL Tuning Set)
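As a sketch of what passing those options looks like (the parameter names below mirror those documented for the DBMS_WORKLOAD_REPLAY prepare procedures; confirm them against your version’s PL/SQL Packages reference):

```sql
-- Sketch: SCN-ordered commits, unscaled connect and think times.
exec dbms_workload_replay.prepare_consolidated_replay( -
  synchronization         => 'SCN', -
  connect_time_scale      => 100, -
  think_time_scale        => 100, -
  think_time_auto_correct => TRUE)
```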
Details: Replay the consolidated workload
wrc system/manager replaydir=/u01/app/oracle/admin/orcl/cons_replay
Wait for the replay to start (16:23:10)

wrc system/manager replaydir=/u01/app/oracle/admin/orcl/cons_replay
Wait for the replay to start (16:23:10)
Generally on a separate host (or hosts), acting as the “client” machine(s), run one or more instances of the Workload Replay Client (WRC) for each workload you need to replay. The client hosts also must have a copy of the processed workload to read from.
Details: Replay the consolidated workload
...Wait for the replay to start (16:23:10)

exec dbms_workload_replay.start_consolidated_replay

...
Replay client 1 started for scheduled capture 1 (16:23:20)
Replay client 1 finished (17:21:15)
...
Replay client 2 started for scheduled capture 2 (16:23:20)
Replay client 2 finished (17:16:35)
Upon issuing “start” on the database, the WRCs wake up and replay the workload.
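While the WRCs run, progress can be watched from another session using the standard replay views. A sketch:

```sql
-- Sketch: one row per replay; STATUS shows the replay's current
-- phase (for example IN PROGRESS or COMPLETED).
select id, name, status, start_time, end_time
  from dba_workload_replays
 order by id;
```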
Results
• First try
  – DS2 took 52 minutes to complete the 36-minute workload
  – SOE took 49 minutes to complete the 36-minute workload
• Where did the time go?
Some tuning
• If log file sync is the largest source of wait time, what are the log writer and its workers doing?
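One way to answer that question (a sketch, again assuming ASH and the Diagnostics Pack) is to look at what the log writer itself was waiting on during the replay window:

```sql
-- Sketch: top ASH wait events for the log writer background process
-- over the last hour. A NULL event means the session was on CPU.
select event, count(*) as samples
  from v$active_session_history
 where program like '%LGWR%'
   and session_type = 'BACKGROUND'
 group by event
 order by samples desc;
```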
Some tuning
• Change to faster storage for logs
  – Amazon EC2 Provisioned IOPS volume (4,000 IOPS)
• Flash back to the guaranteed restore point and repeat the test
  – DS2 took 34 minutes to complete the 36-minute workload
  – SOE took 29 minutes to complete the 36-minute workload
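The flash-back-and-repeat step uses the before_replay guaranteed restore point created earlier. A sketch of the sequence (run as SYSDBA; flashback requires the database to be mounted):

```sql
-- Sketch: rewind the test database to its pre-replay state so the
-- consolidated replay can be run again.
shutdown immediate
startup mount
flashback database to restore point before_replay;
alter database open resetlogs;
```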
Some tuning • Both apps now down to mainly CPU and I/O • Both apps complete the same work in less time • Remainder of the tuning is SQL (if necessary)
Conclusions
• When consolidating, it is useful to have data showing how workloads will perform when combined
• You can’t just sum average active sessions, because the results of concurrency are unpredictable (e.g., log file sync)
• Consolidated DB Replay is a good tool for testing consolidation using real workloads.