introduction to parallel execution 4119

Upload: agy-herlangga

Post on 10-Apr-2018


  • 8/8/2019 Introduction to Parallel Execution 4119

    1/48

    Tuning & Tracing Parallel Execution

    (An Introduction)

    Doug Burns ([email protected])

    Introduction

    Parallel Architecture

    Configuration

    Dictionary Views

    Tracing and Wait Events

    Conclusion

    Parallel Query Option introduced in 7.1
    - Now called Parallel Execution

    Parallel Execution splits a single large task into multiple smaller tasks which are handled by separate processes running concurrently.
    - Full Table Scans
    - Partition Scans
    - Sorts
    - Index Creation
    - And others

    Introduction


    A little history

    So why did so few sites implement PQO?

    Introduction

    - Lack of understanding

    - Leads to horrible early experiences

    - Community's resistance to change

    - Not useful in all environments

    - Needs time and effort applied to the initial design!

    Isn't Oracle's Instance architecture parallel anyway?

    Non-Parallel Architecture?

    Introduction


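    Parallel Architecture

    [Diagram: non-parallel vs. parallel execution]

    Non-Parallel: the User Process issues "select * from emp;" and a single Server Process reads the EMP table.

    Parallel (Degree 2): the User Process issues "select /*+ parallel(emp,2) */ * from emp;". The QC works with Slave 0 (reading the 1st half of EMP) and Slave 1 (reading the 2nd half of EMP).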

    The Degree of Parallelism (DOP) refers to the number of discrete threads of work

    The default DOP for an Instance is calculated as cpu_count * parallel_threads_per_cpu
    - Used if I don't specify a DOP in a hint or table definition

    The maximum number of PX slaves is DOP * 2
    - Plus the Query Coordinator
    - But this is per Data Flow Operation
    - And the slaves will be re-used

    Parallel Architecture
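The DOP arithmetic above can be checked against an instance's own settings. The query below is a minimal sketch, assuming the session can read v$parameter; the column aliases (threads_per_cpu, default_dop, max_slaves_per_dfo) are illustrative names, not Oracle's.

```sql
-- Sketch: derive the default DOP and the per-Data-Flow-Operation slave
-- ceiling from the two parameters named above. Aliases are illustrative.
SELECT TO_NUMBER(cpu.value)                            AS cpu_count,
       TO_NUMBER(tpc.value)                            AS threads_per_cpu,
       TO_NUMBER(cpu.value) * TO_NUMBER(tpc.value)     AS default_dop,
       TO_NUMBER(cpu.value) * TO_NUMBER(tpc.value) * 2 AS max_slaves_per_dfo
FROM   v$parameter cpu,
       v$parameter tpc
WHERE  cpu.name = 'cpu_count'
AND    tpc.name = 'parallel_threads_per_cpu';
```

With cpu_count = 1 and parallel_threads_per_cpu = 2, this gives a default DOP of 2 and at most 4 slaves (plus the Query Coordinator) for one Data Flow Operation.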

    Parallel Architecture

    Inter-process communication is through message buffers (also known as table queues)

    These can be stored in the shared pool or the large pool

    [Diagram: the User Process issues "select * from emp order by name;". Slave 2 (reading the 1st half of EMP) and Slave 3 (reading the 2nd half) feed Slave 0 (sorting A-P) and Slave 1 (sorting Q-Z), which return rows to the QC. The QC also acts as the RANGER.]

    Parallel Architecture

    This slide intentionally left blank

    Methods of invoking Parallel Execution

    Table / Index Level
        ALTER TABLE emp PARALLEL (DEGREE 2);

    Optimizer Hints
        SELECT /*+ PARALLEL(emp) */ *
        FROM emp;

    Statement Level
        ALTER INDEX emp_idx_1 REBUILD
        PARALLEL 8;

    Note: using Parallel Execution implies that you will be using the Cost-based Optimiser
    - As usual, appropriate statistics are vital

    Parallel Architecture
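After setting parallelism at the table or index level, it may be worth confirming what the dictionary has recorded. This is a sketch only, assuming the emp table and emp_idx_1 index from the examples above belong to the current schema.

```sql
-- Sketch: show the DEGREE stored for the example table and index.
-- DEGREE is held as a padded string and may read 'DEFAULT' rather than a number.
SELECT 'TABLE' AS object_type, table_name AS object_name, TRIM(degree) AS degree
FROM   user_tables
WHERE  table_name = 'EMP'
UNION ALL
SELECT 'INDEX', index_name, TRIM(degree)
FROM   user_indexes
WHERE  index_name = 'EMP_IDX_1';
```

A hint in the statement still takes precedence over the stored degree, so this only shows what un-hinted statements will pick up.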

    Configuration

    parallel_automatic_tuning
    - First introduced in Oracle 8i
    - This is the first parameter you should set - to TRUE
    - An alternative point of view: don't use it!
    - Deprecated in 10g and default is FALSE, but much of the same functionality is implemented
    - Ensures that message queues are stored in the Large Pool rather than the Shared Pool
    - It modifies the values of other parameters
    - As well as the 10g default values, the following sections show the values when parallel_automatic_tuning is set to TRUE on previous versions

    Configuration

    parallel_adaptive_multi_user
    - First introduced in Oracle 8
    - Default Value: FALSE (TRUE in 10g)
    - Automatic Tuning Default: TRUE
    - Designed for when PX is used for online work
    - As workload increases, new statements will have their degree of parallelism downgraded.

    Configuration

    Effective Oracle by Design - Tom Kyte
    "This provides the best of both worlds and what users expect from a system. They know that when it is busy, it will run slower."

    parallel_max_servers
    - Default: cpu_count * parallel_threads_per_cpu * 2 (if using automatic PGA management) * 5
      e.g. 1 CPU * 2 * 2 * 5 = 20 on my laptop
    - The maximum number of parallel execution slaves available for all sessions in this instance.
    - Watch out for the processes trap!

    parallel_min_servers
    - Default: 0
    - May choose to increase this if PX usage is constant, to reduce the overhead of starting and stopping slave processes.

    Configuration

    More on this subject in tomorrow's presentation
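To see how these limits sit together on a given instance, the parameters can be listed side by side. This is a sketch, assuming access to v$parameter and assuming that the "processes trap" refers to PX slaves counting against the instance-wide processes limit.

```sql
-- Sketch: compare the PX server limits with the instance-wide process limit.
-- ISDEFAULT shows whether a value was left at its default.
SELECT name, value, isdefault
FROM   v$parameter
WHERE  name IN ('cpu_count',
                'parallel_threads_per_cpu',
                'parallel_max_servers',
                'parallel_min_servers',
                'processes')
ORDER  BY name;
```

If parallel_max_servers plus the normal session and background load approaches processes, slaves may fail to start and statements can be downgraded.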

    parallel_execution_message_size
    - Default Value: 2148 bytes
    - Automatic Tuning Default: 4Kb
    - Maximum size of a message buffer
    - May be worth increasing to 8Kb, depending on wait event analysis.
    - However, small increases in message size could lead to large increases in large pool memory requirements
    - Remember that DOP² relationship and multiple sessions

    Configuration

    Metalink Note 201799.1 contains full details and guidance for setting all parameters

    Ensure that standard parameters are also set appropriately
    - large_pool_size
      - Modified by parallel_automatic_tuning
      - Calculation in Data Warehousing Guide
      - Can be monitored using v$sgastat
    - processes
      - Modified by parallel_automatic_tuning
    - sort_area_size
      - For best results use automatic PGA management
      - Be aware of _smm_px_max_size

    Configuration
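The v$sgastat monitoring mentioned above might look something like this. It is a sketch, assuming the PX message buffers are reported under a name beginning with PX (commonly 'PX msg pool').

```sql
-- Sketch: show how much memory the PX message buffers occupy, and in which pool.
SELECT pool, name, bytes
FROM   v$sgastat
WHERE  name LIKE 'PX%'
ORDER  BY pool, name;
```

The pool column shows whether the buffers are being taken from the large pool or the shared pool.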

    Dictionary Views

    Parallel-specific Dictionary Views

    SELECT table_name
    FROM dict
    WHERE table_name LIKE 'V%PQ%' OR table_name LIKE 'V%PX%';

    TABLE_NAME
    ------------------------------
    V$PQ_SESSTAT
    V$PQ_SYSSTAT
    V$PQ_SLAVE
    V$PQ_TQSTAT
    V$PX_BUFFER_ADVICE
    V$PX_SESSION
    V$PX_SESSTAT
    V$PX_PROCESS
    V$PX_PROCESS_SYSSTAT

    Also GV$PQ_SESSTAT and GV$PQ_TQSTAT with INST_ID

    Dictionary Views

    v$pq_sesstat
    - Provides statistics relating to the current session
    - Useful for verifying that a specific query is using parallel execution as expected

    SELECT * FROM v$pq_sesstat;

    STATISTIC                      LAST_QUERY SESSION_TOTAL
    ------------------------------ ---------- -------------
    Queries Parallelized                    1             1
    DML Parallelized                        0             0
    DDL Parallelized                        0             0
    DFO Trees                               1             1
    Server Threads                          3             0
    Allocation Height                       3             0
    Allocation Width                        1             0
    Local Msgs Sent                       217           217
    Distr Msgs Sent                         0             0
    Local Msgs Recv'd                     217           217
    Distr Msgs Recv'd                       0             0

    Dictionary Views

    v$pq_sysstat
    - The instance-level overview
    - Various values, including information to help set parallel_min_servers and parallel_max_servers
    - v$px_process_sysstat contains similar information

    SELECT * FROM v$pq_sysstat WHERE statistic LIKE 'Server%';

    STATISTIC                           VALUE
    ------------------------------ ----------
    Servers Busy                            0
    Servers Idle                            0
    Servers Highwater                       3
    Server Sessions                         3
    Servers Started                         3
    Servers Shutdown                        3
    Servers Cleaned Up                      0

    Dictionary Views

    v$pq_slave
    - Gives information on the activity of individual PX slaves
    - v$px_process contains similar information

    SELECT slave_name, status, sessions, msgs_sent_total, msgs_rcvd_total
    FROM v$pq_slave;

    SLAV STAT   SESSIONS MSGS_SENT_TOTAL MSGS_RCVD_TOTAL
    ---- ---- ---------- --------------- ---------------
    P000 BUSY          3             465             508
    P001 BUSY          3             356             290
    P002 BUSY          3             153              78
    P003 BUSY          3             108              63
    P004 IDLE          2             249              97
    P005 IDLE          2             246              97
    P006 IDLE          2             239              95
    P007 IDLE          2             249              96

    Dictionary Views

    v$pq_tqstat
    - Shows communication relationship between slaves
    - Must be executed from a session that's been using parallel operations - refers to this session

    Example 1 - Attendance Table (25,481 rows)

    break on dfo_number on tq_id

    SELECT /*+ PARALLEL(attendance, 4) */ *
    FROM attendance;

    SELECT dfo_number, tq_id, server_type, process, num_rows, bytes
    FROM v$pq_tqstat
    ORDER BY dfo_number DESC, tq_id, server_type DESC, process;

    DFO_NUMBER      TQ_ID SERVER_TYP PROCESS      NUM_ROWS      BYTES
    ---------- ---------- ---------- ---------- ---------- ----------
             1          0 Producer   P000             6605     114616
                          Producer   P001             6102     105653
                          Producer   P002             6251     110311
                          Producer   P003             6523     113032
                          Consumer   QC              25481     443612

    Dictionary Views

    Example 2 - with a sort operation

    SELECT /*+ PARALLEL(attendance, 4) */ *
    FROM attendance
    ORDER BY amount_paid;

    DFO_NUMBER      TQ_ID SERVER_TYP PROCESS      NUM_ROWS      BYTES
    ---------- ---------- ---------- ---------- ---------- ----------
             1          0 Ranger     QC                372      13322
                          Producer   P004             5744     100069
                          Producer   P005             6304     110167
                          Producer   P006             6303     109696
                          Producer   P007             7130     124060
                          Consumer   P000            15351     261380
                          Consumer   P001            10129     182281
                          Consumer   P002                0        103
                          Consumer   P003                1        120
                        1 Producer   P000            15351     261317
                          Producer   P001            10129     182238
                          Producer   P002                0         20
                          Producer   P003                1         37
                          Consumer   QC              25481     443612

    Dictionary Views

    So why the unbalanced slaves?
    - Check the list of distinct values in amount_paid

    SELECT amount_paid, COUNT(*)
    FROM attendance
    GROUP BY amount_paid
    ORDER BY amount_paid
    /

    AMOUNT_PAID   COUNT(*)
    ----------- ----------
            200          1
            850          1
            900          1
           1000          7
           1150          1
           1200      15340
           1995      10129
           4000          1

    Dictionary Views
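Skew like this is easier to spot when v$pq_tqstat is summarised rather than listed row by row. The query below is a sketch; the aliases (processes, min_rows, max_rows, total_rows) are illustrative, and it must be run in the same session as the parallel statement.

```sql
-- Sketch: summarise row distribution per table queue and server type.
-- A large gap between min_rows and max_rows points to unbalanced slaves.
SELECT dfo_number,
       tq_id,
       server_type,
       COUNT(*)      AS processes,
       MIN(num_rows) AS min_rows,
       MAX(num_rows) AS max_rows,
       SUM(num_rows) AS total_rows
FROM   v$pq_tqstat
GROUP  BY dfo_number, tq_id, server_type
ORDER  BY dfo_number DESC, tq_id, server_type DESC;
```

For Example 2, the Consumer line for table queue 0 would show min_rows 0 and max_rows 15351 out of 25481 total rows, which is the imbalance explained by the amount_paid values above.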

    v$px_session and v$px_sesstat
    - Query to show slaves and physical reads

    break on qcsid on server_set

    SELECT stat.qcsid, stat.server_set, stat.server#, nam.name, stat.value
    FROM v$px_sesstat stat, v$statname nam
    WHERE stat.statistic# = nam.statistic#
    AND nam.name = 'physical reads'
    ORDER BY 1,2,3

         QCSID SERVER_SET    SERVER# NAME                      VALUE
    ---------- ---------- ---------- -------------------- ----------
           145          1          1 physical reads                0
                                   2 physical reads                0
                                   3 physical reads                0
                        2          1 physical reads               63
                                   2 physical reads               56
                                   3 physical reads               61
                                     physical reads             4792

    Dictionary Views
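v$px_session on its own also shows whether each statement got the DOP it asked for. This is a minimal sketch using only the view's own columns.

```sql
-- Sketch: list each Query Coordinator with its slaves and compare the
-- requested DOP (req_degree) with the DOP actually granted (degree).
-- The QC's own row typically has no server_set or server# value.
SELECT qcsid, sid, server_group, server_set, server#, degree, req_degree
FROM   v$px_session
ORDER  BY qcsid, server_group, server_set, server#;
```

Rows where degree is lower than req_degree indicate a statement that has been downgraded.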

    v$px_process
    - Shows parallel execution slave processes, status and session information

    SELECT * FROM v$px_process;

    SERV STATUS           PID SPID                SID    SERIAL#
    ---- --------- ---------- ------------ ---------- ----------
    P001 IN USE            18 7680                144         17
    P004 IN USE            20 7972                146         11
    P005 IN USE            21 8040                148         25
    P000 IN USE            16 7628                150         16
    P006 IN USE            24 8100                151         66
    P003 IN USE            19 7896                152         30
    P007 AVAILABLE         25 5804
    P002 AVAILABLE         12 6772

    Dictionary Views

    Monitoring the SQL being executed by slaves

    set pages 0
    column sql_text format a60

    SELECT p.server_name, sql.sql_text
    FROM v$px_process p, v$sql sql, v$session s
    WHERE p.sid = s.sid
    AND p.serial# = s.serial#
    AND s.sql_address = sql.address
    AND s.sql_hash_value = sql.hash_value
    /

    9i Results

    P001 SELECT A1.C0 C0,A1.C1 C1,A1.C2 C2,A1.C3 C3,A1.C4 C4,A1.C5 C5,
         A1.C6 C6,A1.C7 C7 FROM :Q3000 A1 ORDER BY A1.C0

    10g Results

    P001 SELECT /*+ PARALLEL(attendance, 2) */ * FROM attendance
         ORDER BY amount_paid

    Dictionary Views

    Additional information in standard Dictionary Views
    - e.g. v$sysstat

    SELECT name, value FROM v$sysstat WHERE name LIKE 'PX%';

    NAME                                                VALUE
    ---------------------------------------------- ----------
    PX local messages sent                               4895
    PX local messages recv'd                             4892
    PX remote messages sent                                 0
    PX remote messages recv'd                               0

    Dictionary Views

    Monitoring the adaptive multi-user algorithm
    - We need to be able to check whether operations are being downgraded and by how much
    - Downgraded to serial could be a particular problem!

    SELECT name, value FROM v$sysstat WHERE name LIKE 'Parallel%'

    NAME                                               VALUE
    --------------------------------------------- ----------
    Parallel operations not downgraded                546353
    Parallel operations downgraded to serial             432
    Parallel operations downgraded 75 to 99 pct          790
    Parallel operations downgraded 50 to 75 pct         1454
    Parallel operations downgraded 25 to 50 pct         7654
    Parallel operations downgraded 1 to 25 pct         11873

    The same output, with the "downgraded to serial" row relabelled:

    P*ssed-off users                                     432

    Dictionary Views
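The raw counts are easier to judge as proportions. The query below is a sketch that expresses each bucket as a share of all parallel operations since instance startup; the pct_of_total alias is illustrative.

```sql
-- Sketch: express each downgrade bucket as a percentage of all parallel operations.
-- NULLIF avoids a divide-by-zero on an instance that has run no parallel operations.
SELECT name,
       value,
       ROUND(100 * value / NULLIF(SUM(value) OVER (), 0), 2) AS pct_of_total
FROM   v$sysstat
WHERE  name LIKE 'Parallel operations%'
ORDER  BY value DESC;
```

With the figures above, roughly 96% of operations were not downgraded and well under 1% were downgraded to serial, although a single serial downgrade can still matter a great deal to the user who gets it.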

    Statspack Example Report (Excerpt)
    - During overnight batch operation
    - Mainly Bitmap Index creation
    - Slightly difficult to read

    Parallel operations downgraded 1      0
    Parallel operations downgraded 25     0
    Parallel operations downgraded 50     7
    Parallel operations downgraded 75    38
    Parallel operations downgraded to     1
    Parallel operations not downgrade    22

    With one stream downgraded to serial, the rest of the schedule may depend on this one job.

    Dictionary Views

    Tracing and Wait Events

    Tracing Parallel Execution operations is more complicated than standard tracing
    - One trace file per slave (as well as the query coordinator)
    - Potentially 5 trace files even with a DOP of 2
    - May be in background_dump_dest or user_dump_dest (usually background_dump_dest)

    Tracing and Wait Events

    Optimizing Oracle Performance - Millsap and Holt
    "The remaining task is to identify and analyze all of the relevant trace files. This task is usually simple"

    Much simpler in 10g
    - Use trcsess to generate a consolidated trace file for QC and all slaves

    exec dbms_session.set_identifier('PX_TEST');

    REM tracefile_identifier is optional, but might make things easier for you
    alter session set tracefile_identifier='PX_TEST';

    exec dbms_monitor.client_id_trace_enable('PX_TEST');

    REM DO WORK

    exec dbms_monitor.client_id_trace_disable('PX_TEST');

    GENERATE THE CONSOLIDATED TRACE FILE AND THEN RUN IT THROUGH TKPROF

    trcsess output=/ora/admin/TEST1020/udump/PX_TEST.trc clientid=PX_TEST /ora/admin/TEST1020/udump/*px_test*.trc /ora/admin/TEST1020/bdump/*.trc

    tkprof /ora/admin/TEST1020/udump/PX_TEST.trc /ora/admin/TEST1020/udump/PX_TEST.out

    Tracing and Wait Events

    This is what one of the slaves looks like

    C:\oracle\product\10.2.0\admin\ORCL\udump>cd ../bdump

    C:\oracle\product\10.2.0\admin\ORCL\bdump>more orcl_p000_2748.trc

    *** SERVICE NAME:(SYS$USERS) 2006-03-07 10:57:29.812
    *** CLIENT ID:(PX_TEST) 2006-03-07 10:57:29.812
    *** SESSION ID:(151.24) 2006-03-07 10:57:29.812
    WAIT #0: nam='PX Deq: Msg Fragment' ela= 13547 sleeptime/senderid=268566527 passes=1 p3=0 obj#=-1 tim=3408202924
    =====================
    PARSING IN CURSOR #1 len=60 dep=1 uid=70 oct=3 lid=70 tim=3408244715 hv=1220056081 ad='6cc64000'
    select /*+ parallel(test_tab3, 2) */ count(*)
    from test_tab3
    END OF STMT

    Tracing and Wait Events

    Many more wait events and more time spent waiting
    - The various processes need to communicate with each other
    - Metalink Note 191103.1 lists the wait events related to Parallel Execution
    - But be careful of what Idle means

    Tracing and Wait Events

    Events indicating consumers or QC are waiting for data from producers
    - PX Deq: Execute Reply
    - PX Deq: Table Q Normal

    Although considered idle events, if these waits are excessive, it could indicate a problem in the performance of the slaves
    - Investigate the slave trace files

    Tracing and Wait Events

    Events indicating producers are quicker than consumers (or QC)
    - PX qref latch
    - Try increasing parallel_execution_message_size as this might reduce the communications overhead
    - Although it could make things worse if the consumer is just taking time to process the incoming data.

    Tracing and Wait Events

    Messaging Events
    - PX Deq Credit: need buffer
    - PX Deq Credit: send blkd

    Although there may be many waits, the time spent should not be a problem.
    - If it is, perhaps you have an extremely busy server that is struggling to cope
      - Reduce DOP?
      - Increase parallel_execution_message_size?
      - Don't use PX?

    Tracing and Wait Events

    Query Coordinator waiting for the slaves to parse their SQL statements
    - PX Deq: Parse Reply
    - If there are any significant waits for this event, this may indicate you have shared pool resource issues.
    - Or you've encountered a bug!

    Tracing and Wait Events

    Partial Message Event
    - PX Deq: Msg Fragment
    - May be eliminated or improved by increasing parallel_execution_message_size
    - Not an issue on recent tests

    Tracing and Wait Events

    Example - Excerpt from an overnight Statspack Report

                                                   Total Wait Avg wait  Waits
    Event                         Waits  Timeouts    Time (s)     (ms)   /txn
    direct path read          2,249,666         0     115,813       51   25.5
    PX Deq: Execute Reply       553,797    22,006      75,910      137    6.3
    PX qref latch                77,461    39,676      42,257      546    0.9
    library cache pin            27,877    10,404      31,422     1127    0.3
    db file scattered read    1,048,135         0      25,144       24   11.9

    Direct Path Reads
    - Sort I/O
    - Read-ahead
    - PX Slave I/O
    - The average wait time - SAN!

    Tracing and Wait Events

    PX Deq: Execute Reply
    - Idle event - QC waiting for a response from slaves
    - Some waiting is inevitable

    PX qref latch
    - Largely down to the extreme use of Parallel Execution
    - Practically unavoidable, but perhaps we could increase parallel_execution_message_size?

    Library cache pin?
    - Need to look at the trace files

    Tracing and Wait Events

    Conclusion

    Plan / Test / Implement
    - Asking for trouble if you don't!

    Hardware
    - It's designed to suck the server dry
    - Trying to squeeze a quart into a pint pot will make things slow down due to contention

    Tune the SQL first
    - All the old rules apply
    - The biggest improvements come from doing less unnecessary work in the first place
    - Even if PX does make things go quickly enough, it's going to use a lot more resources doing so

    Conclusion

    Don't use it for small, fast tasks
    - They won't go much quicker
    - They might go slower
    - They will use more resources

    Don't use it for online
    - Not unless it's a handful of users
    - With a predictable maximum number of concurrent activities
    - Who understand the implications and won't go crazy when something takes four times as long as normal!
    - It gives a false initial perception of high performance and isn't scalable
    - Okay, Tom, set parallel_adaptive_multi_user to TRUE

    Conclusion

    The slower your I/O sub-system, the more benefit you are likely to see from PX
    - But shouldn't you fix the underlying problem?
    - More on this in the next presentation

    Consider whether PX is the correct parallel solution for overnight batch operations
    - A single stream of parallel jobs?
    - Parallel streams of single-threaded jobs?
    - Unfortunately you'll probably have to do some work to prove your ideas!

    Conclusion

    Tuning & Tracing Parallel Execution
    (An Introduction)

    Doug Burns
    ([email protected])
    (oracledoug.blogspot.com)
    (doug.burns.tripod.com)