
Being Sure

Confident Consolidations with Oracle Real Application Testing 12c

Jeremiah Wilton, Cluster Technical Lead

September, 2013

Jeremiah Wilton [email protected]

•  Working with Oracle since 1994 (v.5)
•  Amazon’s first DBA, 1997–2005
•  Remote DBA and troubleshooter
•  Technical leadership and mentoring
•  Focus areas:
   –  Recovery, repair, and salvage
   –  Systems design and architecture
   –  Cloud computing
   –  Consolidation and license economy
   –  Availability and scalability
   –  Service-time-oriented performance management

© 2013 Pythian Confidential

Who is Pythian?

•  Service provider to data-driven businesses
•  Employ only industry top talent for Data Infrastructure services
•  Typically engaged by IT and Operations executives looking to address skill or resourcing gaps
•  Two main service offerings:
   –  Managed Services
      •  24x7x365 named team
      •  Monitoring and rapid response
      •  All senior resources
      •  Complement in-house staff with breadth, depth, availability
      •  Monthly services model
   –  Consulting Services
      •  Dedicated resource

Why is consolidation suddenly so big again?

•  Part of Oracle’s push for license elasticity

•  Allow customers to start small, grow into a larger footprint

•  Allow customers to consolidate into smaller footprints

•  Lower initial barriers to adoption of Oracle

•  Dilute justification for migration to other technologies

Cost Drives Consolidation (especially license cost)

Specialized Hardware Drives Consolidation

•  Engineered systems provide special capabilities
•  Oracle-designed/built infrastructure
•  Smart Scan / Storage Cells
•  Promoted as a single point of consolidation

Specialized Software Drives Consolidation

•  12c Multitenant
   –  Designed expressly for consolidation
•  OVM / hard partitioning
   –  Enables licensing a subset of cores
•  RAC
   –  Enables data services to scale across infrastructure

The Cloud Drives Consolidation

•  Allows customers to start small
•  License economy on small VMs / instances
•  Radically different resource profile than bare metal
•  Business requires assurance of scale

Consolidation is rife with uncertainty

•  Known workloads
•  Unknown effect of coexistence
•  Different I/O subsystems
•  Different processors / architectures
•  You can’t just sum active sessions; the result of concurrency is unpredictable
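
A toy illustration (all numbers invented for this sketch) of why summing each workload’s peak average active sessions (AAS) can mislead when the load patterns are complementary. Note the error can also run the other way: concurrency effects like the log file sync contention shown later in this deck can make the combined load worse than the sum.

```python
# Toy illustration (invented numbers): summing each workload's peak
# average active sessions (AAS) overstates the combined requirement
# when the two load patterns are complementary (peaks don't coincide).

ds2 = [0.2, 0.9, 0.3, 0.8, 0.2]   # hypothetical minute-by-minute AAS
soe = [0.8, 0.1, 0.7, 0.2, 0.9]   # peaks where DS2 is quiet

sum_of_peaks = max(ds2) + max(soe)                    # naive estimate
peak_of_sums = max(d + s for d, s in zip(ds2, soe))   # actual combined peak

# The combined peak is far below the naive sum of peaks
print(sum_of_peaks, peak_of_sums)
```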

Complementary and non-complementary load patterns

The need for testing real workloads

•  Testing real production workloads eliminates uncertainty
•  Unexpected side effects of coexistence and concurrency are revealed before consolidation
•  Errors, regressions, and bugs resulting from the new environment are revealed before consolidation
•  Real Application Testing (Consolidated Database Replay)

RAT Basics

•  Rationale
   –  Change assurance
•  Feature evolution
   –  Backport captures and other features
   –  Scale up, timeouts, reports, multiple workloads
•  Cost
   –  $11,500 per license list; ~¼ of Enterprise Edition
   –  Same as the Partitioning and Advanced Compression options
•  SPA
•  DB Replay
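
To put the list price in context, a rough sketch of the per-server arithmetic. The EE list price and the 0.5 x86 core factor are my assumptions (circa 2013), not from this deck; verify against the current price list.

```python
# Rough license arithmetic for the Real Application Testing (RAT) option.
# Assumed figures (circa 2013 -- verify against the current price list):
#   RAT option:          $11,500 per processor license (from this deck)
#   Enterprise Edition:  $47,500 per processor license (assumption)
#   x86 core factor:     0.5 (2 physical cores = 1 processor license)

rat_per_proc = 11_500
ee_per_proc = 47_500
core_factor = 0.5

cores = 8                                # hypothetical 8-core x86 server
proc_licenses = cores * core_factor      # -> 4 processor licenses

print(rat_per_proc / ee_per_proc)        # ~0.24, i.e. about 1/4 of EE
print(rat_per_proc * proc_licenses)      # RAT list cost for the whole server
```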

Consolidated Replay

•  Multiple unrelated workloads can be executed at the same time
   –  Separated by schema / objects
   –  Separated by pluggable database
•  12c native feature
•  11gR2 patch-enabled feature

Only same-database consolidations need Consolidated DB Replay

•  Classic DB Replay
   –  Consolidation by instance
   –  Consolidation by virtual machine
•  Consolidated DB Replay
   –  Consolidation into one database (by schema)
   –  Consolidation into one database (by PDB)

Some sane approaches

•  Capture workloads during the hours and days you expect to represent peak workload
•  Use an average-active-sessions approach
•  Capture a workload long enough to represent real business, but short enough that you can test repeatedly
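
To make the average-active-sessions arithmetic concrete, here is the calculation for the CAP_DS2 capture shown later in this deck. DBA_WORKLOAD_CAPTURES reports DBTIME in microseconds, and the 2,160-second duration comes from the capture command.

```python
# Average active sessions (AAS) = total DB time / elapsed wall-clock time.
# Figures are from this deck's CAP_DS2 capture; DBA_WORKLOAD_CAPTURES
# reports DBTIME in microseconds.

dbtime_us = 1_909_135_166    # DBTIME for CAP_DS2
elapsed_s = 2160             # capture duration (36 minutes)

aas = dbtime_us / 1_000_000 / elapsed_s
print(round(aas, 2))         # prints 0.88 -- under one average active session
```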

A case study

•  Two synthetic workloads, two separate DBs
   –  Dell DVD Store 2
   –  Dominic Giles’s Swingbench Order Entry
•  Captured 36 minutes of peak workload from each
•  Exported each DB as of the beginning of its capture
•  Imported each DB into a separate 12c PDB on a new host/DB
•  Used Consolidated DB Replay to concurrently replay the workloads against the PDBs

Details: Workload Capture (DS2)

create directory cap_ds2
    as '/u01/app/oracle/admin/uw01/cap_ds2';

exec dbms_workload_capture.add_filter( -
    fname=>'DS2USERFILTER', -
    fattribute=>'USER', -
    fvalue=>'DS2')

-- duration is in seconds: 2160 s = 36 minutes
exec dbms_workload_capture.start_capture( -
    name=>'CAP_DS2', -
    dir=>'CAP_DS2', -
    duration=>2160, -
    default_action=>'EXCLUDE')

Wait ~36 minutes…

Capture only the DS2 user’s workload. Can also filter by:

•  Instance
•  Module
•  Action
•  Program
•  Service

Details: Workload Capture (DS2)

Wait ~36 minutes…

select status, start_scn, dbtime
  from dba_workload_captures
 where name = 'CAP_DS2'

STATUS       START_SCN      DBTIME
----------  ----------  ----------
COMPLETED      5548016  1909135166

expdp directory=data_pump_dir \
      dumpfile=ds2.dmp \
      flashback_scn=5548016 \
      schemas=ds2

Get the SCN of the moment the capture started. Our copy for testing must be from that moment in time

Make a copy as of the capture start SCN

[Chart: Average Active Sessions (captured workloads) — SOE and DS2 series]

Details: Consolidate into PDB (DS2)

create pluggable database ds2
    admin user ds2 identified by ds2
    roles=(DBA);

alter pluggable database ds2 open;

connect ds2/ds2@localhost/ds2

create tablespace ds2 datafile size 2G;
alter user ds2 default tablespace ds2;
alter user ds2 quota unlimited on ds2;

impdp ds2/ds2@localhost/ds2 directory=dp_dir_2 \
      dumpfile=ds2.dmp remap_tablespace=ORDERTBS:DS2 \
      remap_tablespace=CUSTTBS:DS2 remap_tablespace=DS_MISC:DS2 \
      remap_tablespace=INDXTBS:DS2

For this demo, I used multitenant. I used Data Pump to move the data in. I also could have adopted (plugged in) the entire source DB as a PDB.

Details: Workload Replay (Combined)

create restore point before_replay
    guarantee flashback database;

create directory cons_replay
    as '/u01/app/oracle/admin/orcl/cons_replay';
create directory ds2
    as '/u01/app/oracle/admin/orcl/cons_replay/cap_ds2';
create directory soe
    as '/u01/app/oracle/admin/orcl/cons_replay/cap_soe';

I used a guaranteed restore point (GRP) so that I could test, tune, and repeat the test as many times as I wanted.

Consolidated Replay requires each of the workloads to be placed in subdirectories of a single workload directory.

Details: Workload Replay (Combined)

exec dbms_workload_replay.process_capture(capture_dir=>'DS2')
exec dbms_workload_replay.process_capture(capture_dir=>'SOE')

Before a workload can be replayed, it must be processed by the database version where it will be replayed. This procedure reads through the captured workload and creates several files containing metadata about the workload.

$ find /u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp*
/u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp12.1.0.1.0
/u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp12.1.0.1.0/wcr_data.extb
/u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp12.1.0.1.0/wcr_conn_data.extb
/u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp12.1.0.1.0/wcr_dep_graph.extb
/u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp12.1.0.1.0/wcr_references.extb
/u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp12.1.0.1.0/wcr_login.pp
/u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp12.1.0.1.0/wcr_scn_order.extb
/u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp12.1.0.1.0/wcr_commits.extb
/u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp12.1.0.1.0/wcr_calibrate.xml
/u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp12.1.0.1.0/wcr_seq_data.extb
/u01/app/oracle/admin/orcl/cons_replay/cap_soe/pp12.1.0.1.0/wcr_process.wmd

Details: Workload Replay (Combined)

exec dbms_workload_replay.set_replay_directory( -
    replay_dir=>'CONS_REPLAY')

variable ds2 number
variable soe number

exec dbms_workload_replay.begin_replay_schedule('CONS_SCHEDULE')
exec :ds2 := dbms_workload_replay.add_capture('DS2')
exec :soe := dbms_workload_replay.add_capture('SOE')
exec dbms_workload_replay.end_replay_schedule

exec dbms_workload_replay.initialize_consolidated_replay( -
    replay_dir_obj=>'CONS_REPLAY', -
    schedule_name=>'CONS_SCHEDULE')

Point DB Replay at the combined directory

A replay schedule defines the set of workloads to be simultaneously replayed

The initialize step loads the metadata from the process step into the database

Details: Remap Connections

select schedule_cap_id, conn_id, capture_conn, replay_conn
  from dba_workload_connection_map;

SCHEDULE_CAP_ID CONN_ID CAPTURE_CONN                               REPLAY_CONN
--------------- ------- ------------------------------------------ -----------
              1       1 (DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=
                        uw01)(CID=(PROGRAM=C:\Users\wilton\TPG\OOW
                        \ds2\oracleds2\ds2oracledriver.exe)(HOST=W
                        ILTON-WIN7PR-A)(USER=wilton)))(ADDRESS=(PR
                        OTOCOL=TCP)(HOST=127.0.0.1)(PORT=21521)))
              2       2 (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST
                        =127.0.0.1)(PORT=21521))(CONNECT_DATA=(CID
                        =(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)
                        (USER=wilton))(SERVICE_NAME=uw01)(CID=(PRO
                        GRAM=JDBC Thin Client)(HOST=__jdbc__)(USER
                        =wilton))))

Initialize loads the connection details exactly as they appeared in the production workload. For the replay clients to connect successfully, we must remap each of the original connections to point to the test system

Details: Remap Connections

exec dbms_workload_replay.remap_connection( -
    schedule_cap_id=>1, connection_id=>1, -
    replay_connection=>'DS2')

exec dbms_workload_replay.remap_connection( -
    schedule_cap_id=>2, connection_id=>2, -
    replay_connection=>'SOE')

select schedule_cap_id, conn_id, capture_conn, replay_conn
  from dba_workload_connection_map;

SCHEDULE_CAP_ID CONN_ID CAPTURE_CONN                               REPLAY_CONN
--------------- ------- ------------------------------------------ -----------
              1       1 (DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=  DS2
                        uw01)(CID=(PROGRAM=C:\Users\wilton\TPG\OOW
                        \ds2\oracleds2\ds2oracledriver.exe)(HOST=W
                        ILTON-WIN7PR-A)(USER=wilton)))(ADDRESS=(PR
                        OTOCOL=TCP)(HOST=127.0.0.1)(PORT=21521)))
              2       2 (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST  SOE
                        =127.0.0.1)(PORT=21521))(CONNECT_DATA=(CID
                        =(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)
                        (USER=wilton))(SERVICE_NAME=uw01)(CID=(PRO
                        GRAM=JDBC Thin Client)(HOST=__jdbc__)(USER
                        =wilton))))
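
With many captured connections, writing each remap_connection call by hand gets tedious. A hypothetical helper (not part of this deck; the two entries below mirror the DS2/SOE remaps shown above) could generate the SQL*Plus commands:

```python
# Sketch: generate DBMS_WORKLOAD_REPLAY.REMAP_CONNECTION commands from a
# mapping of (schedule_cap_id, conn_id) -> replay connect string.
# The two entries mirror the DS2/SOE remaps shown in this deck.

remaps = {
    (1, 1): "DS2",   # DS2 capture's connection -> DS2 PDB service
    (2, 2): "SOE",   # SOE capture's connection -> SOE PDB service
}

def remap_commands(remaps):
    """Return one SQL*Plus 'exec' line per connection remap."""
    return [
        "exec dbms_workload_replay.remap_connection("
        f"schedule_cap_id=>{cap}, connection_id=>{conn}, "
        f"replay_connection=>'{svc}')"
        for (cap, conn), svc in sorted(remaps.items())
    ]

for cmd in remap_commands(remaps):
    print(cmd)
```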

Details: Prepare the consolidated workload

exec dbms_workload_replay.prepare_consolidated_replay

The prepare step allows you to set the various parameters that control how replay will behave. Some of the configurable options:

•  synchronization (SCN / OFF)
•  connect_time_scale
•  think_time_scale
•  think_time_auto_correct
•  capture_sts (SQL Tuning Set)
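
As a back-of-the-envelope sketch of what the scaling knobs do (a simplified model, not Oracle's actual scheduling code): think_time_scale is a percentage applied to each captured think-time gap, so values below 100 compress client idle time and intensify the replayed load.

```python
# Simplified model of the think_time_scale replay parameter: a percentage
# applied to each captured think-time gap between client calls.
# (Illustrative only -- not Oracle's actual replay scheduler.)

def scaled_think_time(captured_gaps_s, think_time_scale=100):
    """Total replayed think time for a list of captured gaps (seconds)."""
    return sum(g * think_time_scale / 100 for g in captured_gaps_s)

gaps = [2.0, 5.0, 3.0]               # captured idle gaps, in seconds
print(scaled_think_time(gaps, 100))  # 10.0 -- replay paced as captured
print(scaled_think_time(gaps, 50))   # 5.0  -- idle time halved, load intensified
```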

Details: Replay the consolidated workload

wrc system/manager replaydir=/u01/app/oracle/admin/orcl/cons_replay

Wait for the replay to start (16:23:10)

wrc system/manager replaydir=/u01/app/oracle/admin/orcl/cons_replay

Wait for the replay to start (16:23:10)

Generally, on a separate host (or hosts) acting as the “client” machine(s), run one or more instances of the Workload Replay Client (WRC) for each workload you need to replay. The client hosts must also have a copy of the processed workload to read from.

Details: Replay the consolidated workload

...Wait for the replay to start (16:23:10)

exec dbms_workload_replay.start_consolidated_replay

...
Replay client 1 started for scheduled capture 1 (16:23:20)
Replay client 1 finished (17:21:15)

...
Replay client 2 started for scheduled capture 2 (16:23:20)
Replay client 2 finished (17:16:35)

Upon issuing “start” on the database, the WRCs wake up and replay the workload.

Results

•  First try
   –  DS2 took 52 minutes to complete the 36-minute workload
   –  SOE took 49 minutes to complete the 36-minute workload

•  Where did the time go?

Results

•  Log file sync wasn’t such a big deal on either app before the move to 12c PDBs and the new host

Some tuning

•  If log file sync is the largest source of wait time, what are the log writer and its workers doing?

Some tuning

•  Change to faster storage for redo logs
   –  Amazon EC2 Provisioned IOPS volume (4,000 IOPS)
•  Flash back to the guaranteed restore point and repeat the test
   –  DS2 took 34 minutes to complete the 36-minute workload
   –  SOE took 29 minutes to complete the 36-minute workload
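
The before-and-after numbers work out to a sizable improvement. Simple arithmetic on the replay times above:

```python
# Elapsed replay times (minutes) before and after moving redo logs to
# faster storage, taken from the results above; both workloads replay
# a 36-minute capture.

runs = {"DS2": (52, 34), "SOE": (49, 29)}

for app, (before, after) in runs.items():
    saved = (before - after) / before * 100
    print(f"{app}: {before} min -> {after} min ({saved:.0f}% less time)")
```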

Some tuning •  Both apps now down to mainly CPU and I/O •  Both apps complete the same work in less time •  Remainder of the tuning is SQL (if necessary)

Active Session Comparison (separate vs. consolidated)

Conclusions

•  When consolidating, it is useful to have data showing how workloads will perform when combined

•  You can’t just sum average active sessions, because the results of concurrency are unpredictable (log file sync)

•  Consolidated DB Replay is a good tool for testing consolidation using real workloads.

Questions

•  [email protected]