BPEL PM 11g Performance Tuning - 7

Upload: tushar

Post on 15-Oct-2015


DESCRIPTION

This is the seventh chapter of Oracle Fusion Middleware BPEL PM 11g Performance Tuning. This chapter covers EM Fusion Middleware Control and WLS Admin Console tuning.



    Contents

    DATA COLLECTION

    1 BEFORE LNP TEST
    1.1 INFRASTRUCTURE
    1.2 JVM-HOTSPOT- AND JVM-JROCKIT-
    1.3 WLS-THREADING-
    1.4 WLS-NETWORKIO-
    1.5 WLS-DATASOURCE-
    1.6 BPEL PM-
    1.7 TECHNOLOGY ADAPTERS
    1.8 BPEL PM COMPOSITE
    1.9 DATABASE

    2 DURING LNP TEST
    2.1 CPU AND RAM USAGE AND MEMORY AVAILABILITY
    2.2 JVM HEAP USAGE, GARBAGE COLLECTION AND MEMORY LEAKS, IF ANY
    2.3 DATABASE CONNECTIONS AVAILABILITY AND USAGE
    2.4 AVAILABILITY OF PERIPHERAL RESOURCES LIKE JMS QUEUES, AQ AND THEIR USAGE PATTERN

    3 AFTER LNP TEST
    3.1 GET FULL DUMP OF FEW IMPORTANT TABLES
    3.2 EXECUTE SQL QUERIES AS AND WHEN REQUIRED
    3.3 ANALYSIS OF AWR REPORT
    3.4 ANALYSIS OF SOA SERVER MEMORY USAGE

    4 REFERENCE

    Exhibits

    Exhibit 73: vmstat
    Exhibit 74: free
    Exhibit 75: pmap
    Exhibit 76: top
    Exhibit 77: sar -B
    Exhibit 78: meminfo
    Exhibit 79: mpstat
    Exhibit 80: mpstat 2 4
    Exhibit 81: df -m


    Data Collection

    Data collection is a very important aspect of any LnP testing effort. If the data collection

    effort is not directed in the right direction, the whole LnP testing effort may go down the

    drain and may yield false positives or false negatives.

    Data collection in any LnP testing is divided into three parts on the basis of time of collection:

    Before LnP Testing

    During LnP Testing

    Post LnP Testing

    Data collection needs to happen for all parts of the system which will participate in LnP

    testing. From the BPEL PM perspective, the focus is on:

    Operating System

    JVM

    WLS

    BPEL PM

    Database

    But in any enterprise deployment of BPEL, the system under test will have more parts, like

    JMS, boundary systems, OSB, gateways, load balancers, etc.

    1 Before LnP Test

    The data collection effort before LnP testing is targeted at understanding the

    pre-conditions. One of the best tools to record data is a spreadsheet. My favorite is MS

    Excel, but one can use Google Docs Sheets or any other.

    Refer to the PreLnP-Data.xlsx workbook template for collecting the initial conditions of LnP

    testing. PreLnP-Data.xlsx consists of the following sheets:

    Infrastructure

    JVM-HotSpot-

    JVM-jRockit-

    WLS-Threading-

    WLS-NetworkIO-

    WLS-DataSource-

    BPEL PM-

    Technology Adapters

    BPEL PM Composite

    Database


    Before peeking into each sheet, let us understand a few rules which are followed in this

    workbook.

    In each sheet, only the yellow cells need a value from the recorder.

    Blue cell values are calculated.

    In a combo box, one needs to select one value.

    For each sheet that carries a managed-server suffix in its name, one needs to create one

    copy per managed server; these tabs are colored green.

    1.1 Infrastructure

    As the name suggests, this sheet collects information pertaining to the infrastructure of the

    LnP environment. Primarily, this sheet lists the managed servers and the server and operating

    system level parameters. This sheet assumes the Linux operating system; if one is running

    some other operating system, then this sheet needs modification.

    1.2 JVM-HotSpot- and JVM-jRockit-

    Out of these two sheets, only one should survive in the real world because one will be using

    only one JVM. If the LnP environment is using the HotSpot JVM, then delete the JVM-jRockit-

    sheet, and vice versa. Then make a copy of the surviving sheet for each managed

    server in the LnP environment and update the names of the sheets accordingly.

    1.3 WLS-Threading-

    One should make a copy of this sheet for each managed server in the LnP environment and

    update the names of the sheets accordingly. This sheet lists details of Work Managers. When

    BPEL PM is installed, a Work Manager named SOAWorkManager is created by default, which can be

    used to manage threads for BPEL PM. The LnP environment might have a few more Work Managers;

    if yes, then list the details of those Work Managers as well.

    If the LnP environment is not using Work Managers, then fill in the details of the default

    thread pool.

    1.4 WLS-NetworkIO-

    One should make a copy of this sheet for each managed server in the LnP environment and

    update the names of the sheets accordingly. This sheet lists details of muxers.

    1.5 WLS-DataSource-

    One should make a copy of this sheet for each managed server in the LnP environment and

    update the names of the sheets accordingly. This sheet lists details of SOADataSource and

    SOALocalTxDataSource, which are used by BPEL PM. If the composites under consideration use

    some other data sources as well, list the details pertaining to those too.


    1.6 BPEL PM-

    One should make a copy of this sheet for each managed server in the LnP environment and

    update the names of the sheets accordingly. This sheet lists parameters related to BPEL PM

    threads, timeouts, audit and logging.

    1.7 Technology Adapters

    This sheet lists parameters related to the File & FTP and Database adapters. Other

    technology adapters are covered in the BPEL PM Composite sheet.

    1.8 BPEL PM Composite

    This sheet lists all composites under LnP test consideration, along with the parameters

    affecting performance at the composite level.

    1.9 Database

    This sheet lists attributes related to the databases which host SOAINFRA, MDS and any custom

    schemas. One may need to modify this sheet to incorporate additional parameters in the case

    of custom schemas.

    2 During LnP Test

    One needs to monitor the environment proactively during LnP testing to find any unusual

    behavior or gradual degradation of the system. One should monitor the following items during

    LnP testing:

    CPU and RAM usage and availability patterns

    Operating system resource availability and usage patterns (like handles to open files,

    open sockets, running processes, etc.)

    JVM heap usage, garbage collection and memory leaks, if any

    Database connections availability and usage

    Availability and usage patterns of peripheral resources like JMS queues, AQs, databases

    This book doesn't assume any specific toolset for monitoring purposes, and if a tool is a

    must in a particular scenario, then FREE is the keyword.

    2.1 CPU and RAM usage and memory availability

    Linux provides a few commands which can be utilized to monitor CPU, RAM and memory usage,

    both for the system as a whole and for specific processes.

    vmstat reports virtual memory statistics for the Linux operating system.


    Exhibit 73: vmstat

    Field descriptions for VM mode:

    procs
    r: the number of processes waiting for run time.
    b: the number of processes in uninterruptible sleep.

    memory
    swpd: the amount of virtual memory used.
    free: the amount of idle memory.
    buff: the amount of memory used as buffers.
    cache: the amount of memory used as cache.
    inact: the amount of inactive memory (-a option).
    active: the amount of active memory (-a option).

    swap
    si: amount of memory swapped in from disk (/s).
    so: amount of memory swapped to disk (/s).

    io
    bi: blocks received from a block device (blocks/s).
    bo: blocks sent to a block device (blocks/s).

    system
    in: the number of interrupts per second, including the clock.
    cs: the number of context switches per second.

    cpu (these are percentages of total CPU time)
    us: time spent running non-kernel code (user time, including nice time).
    sy: time spent running kernel code (system time).
    id: time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
    wa: time spent waiting for IO. Prior to Linux 2.5.41, shown as zero.
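    As a quick illustration, vmstat can be run repeatedly and the output captured for later comparison; the sample interval, count, and use of a shell variable here are arbitrary choices, not from the book:

```shell
# Take 3 samples at 2-second intervals; the first sample reports
# averages since boot, so the later samples are the interesting ones.
out=$(vmstat 2 3)
echo "$out"
```

    During a test run one would normally redirect this to a timestamped file instead of a variable, so the raw numbers survive for post-test analysis.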

    free reports the total amount of free and used physical memory in the system, as well as the

    buffers used by the kernel. The shared memory column should be ignored; it is obsolete.


    Exhibit 74: free

    pmap reports the memory map of a process or processes.

    Exhibit 75: pmap

    top provides a dynamic real-time view of a running system. It can display system summary information as well as a list of tasks currently being managed by the Linux kernel. The types of system summary information shown and the types, order and size of information displayed for tasks are all user-configurable, and that configuration can be made persistent across restarts.

    Exhibit 76: top

    sar -B reports statistics on paging.

    Exhibit 77: sar -B

    pgpgin/s: Total number of kilobytes the system paged in from disk per second.

    pgpgout/s: Total number of kilobytes the system paged out to disk per second.

    fault/s: Number of page faults (major + minor) made by the system per second (post 2.5 kernels only). This is not a count of page faults that generate I/O, because some page faults can be resolved without I/O.


    majflt/s: Number of major faults the system has made per second, those which have required loading a memory page from disk (post 2.5 kernels only).

    cat /proc/meminfo reports memory size and usage.

    Exhibit 78: meminfo

    mpstat reports CPU-related statistics. It has an option to execute repeatedly at a certain time interval.


    Exhibit 79: mpstat

    CPU: Processor number. The keyword all indicates that statistics are calculated as averages

    among all processors.

    %user: Show the percentage of CPU utilization that occurred while executing at the user level

    (application).

    %nice: Show the percentage of CPU utilization that occurred while executing at the user level

    with nice priority.

    %system: Show the percentage of CPU utilization that occurred while executing at the system

    level (kernel). Note that this does not include the time spent servicing interrupts or softirqs.

    %iowait: Show the percentage of time that the CPU or CPUs were idle during which the system

    had an outstanding disk I/O request.

    %irq: Show the percentage of time spent by the CPU or CPUs to service interrupts.

    %soft: Show the percentage of time spent by the CPU or CPUs to service softirqs. A softirq

    (software interrupt) is one of up to 32 enumerated software interrupts which can run on

    multiple CPUs at once.

    %idle: Show the percentage of time that the CPU or CPUs were idle and the system did not

    have an outstanding disk I/O request.

    mpstat 2 4 displays four reports of global statistics among all processors at two-second intervals.

    Exhibit 80: mpstat 2 4

    sar -u also reports CPU utilization. Its output is similar to mpstat.

    ps -eo pcpu,pid,user,args | sort -r -k1 | head -n 10 will list the top 10 CPU users.


    One should create a script which can be executed during LnP via cron to capture statistics

    pertaining to CPU, RAM and memory. These statistics should be stored in a file for later

    analysis and should also be made available to the monitoring team via email or a dashboard

    for on-the-fly actions.
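    A minimal sketch of such a cron-driven collector, assuming Linux /proc files; the file names, sampled fields and output directory are illustrative assumptions, not prescribed by the book:

```shell
#!/bin/sh
# collect_stats: append one timestamped sample of load, memory and
# top CPU consumers to a log file for later analysis.
collect_stats() {
    outdir=${1:-/tmp/lnp-stats}                  # illustrative output directory
    mkdir -p "$outdir"
    ts=$(date +%Y%m%d-%H%M%S)
    logfile="$outdir/stats-$ts.log"
    {
        echo "=== $ts load average ==="
        cat /proc/loadavg                        # run-queue pressure: 1/5/15 min
        echo "=== $ts memory ==="
        grep -E '^(MemTotal|MemFree|Buffers|Cached|SwapTotal|SwapFree):' /proc/meminfo
        echo "=== $ts top CPU consumers ==="
        ps -eo pcpu,pid,user,args | sort -rn -k1 | head -n 5
    } >> "$logfile"
    echo "$logfile"                              # report where the sample went
}

collect_stats /tmp/lnp-stats
```

    A crontab entry running this script every minute would accumulate one file per sample; emailing the files or feeding them to a dashboard is left to the tool of choice.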

    Disk space availability is often overlooked. Keep an eye on it.

    df -m

    Exhibit 81: df -m

    Keep checking log files for resource issues:

    cat /var/log/messages

    To list open files, use lsof. To list running processes, use ps. In both cases, one should

    write the output to a file and use wc to count the entries for later analysis.
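    Reduced to a sketch, those two listings can be turned into counts and appended to a log; the log path is an assumption, and lsof is skipped when it is not installed:

```shell
ts=$(date +%H:%M:%S)
logfile=/tmp/lnp-counts.log                   # illustrative path
nproc=$(ps -e | wc -l)                        # running processes (plus header line)
echo "$ts processes=$nproc" >> "$logfile"
if command -v lsof >/dev/null 2>&1; then
    nfiles=$(lsof 2>/dev/null | wc -l)        # open files visible to this user
    echo "$ts openfiles=$nfiles" >> "$logfile"
fi
tail -n 2 "$logfile"
```

    Sampled over a run, a simple plot of these counts makes descriptor leaks or runaway process creation obvious.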


    2.2 JVM heap usage, garbage collection and memory leaks, if any

    To monitor the JVM, one can use freely available tools and avoid the huge license costs

    associated with proprietary tools. The three favorite tools are jVisualVM, jConsole, and

    jRockit Mission Control. For details, refer to Appendix K.

    2.3 Database connections availability and usage

    Since BPEL PM uses the database extensively, usage monitoring of the underlying database is

    very important. For details, please refer to Appendix J.

    2.4 Availability of peripheral resources like JMS queues, AQ and their usage pattern

    In any enterprise-class deployment, JMS and AQ are very common. To monitor JMS and AQ, refer to Appendix J.

    3 After LnP Test

    After the LnP test, the primary source for data collection is the SOAINFRA schema. There are

    two approaches for that:

    Get a full dump of a few important tables

    Execute SQL queries as and when required

    3.1 Get full dump of few important tables

    This approach is useful when consecutive LnP tests are lined up in quick succession and

    operation team need to purge database to make space for latest LnP Test. This approach poses

    a inherent risk of losing some important data due to purge. This risk can be mitigated by

    archiving data before purging.

    In this approach, sometime file size of spreadsheet become unmanageable which forces

    breaking of data on the basis of time or composites which makes analysis little bit more

    complex.

    Tables under consideration are:

    COMPOSITE_INSTANCE
    COMPONENT_INSTANCE
    COMPOSITE_INSTANCE_FAULT
    REJECTED_MESSAGE
    REJECTED_MSG_NATIVE_PAYLOAD
    COMPOSITE_INSTANCE_ASSOC
    CUBE_INSTANCE
    AUDIT_TRAIL
    AUDIT_DETAILS
    WORK_ITEM
    DLV_MESSAGE
    DOCUMENT_DLV_MSG_REF
    XML_DOCUMENT

    3.2 Execute SQL queries as and when required

    This approach is preferred because of the flexibility of fetching data as and when required

    for analysis.

    A few of the useful queries are:

    To get the count of instances in different states for the BPEL composites within a time

    range (the empty '' literals here and below are placeholders for the start and end values):

    SELECT composite_name, COUNT(*),
           DECODE(state, 0, 'INITIATED', 1, 'OPEN_RUNNING', 2, 'OPEN_SUSPENDED',
                  3, 'OPEN_FAULTED', 4, 'CLOSED_PENDING_CANCEL', 5, 'CLOSED_COMPLETED',
                  6, 'CLOSED_FAULTED', 7, 'CLOSED_CANCELLED', 8, 'CLOSED_ABORTED',
                  9, 'CLOSED_STALE', 10, 'CLOSED_ROLLED_BACK', 'unknown') state
    FROM cube_instance
    WHERE TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI:SS') >= ''
      AND TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI:SS') <= ''
    GROUP BY composite_name, state

  • 14

    To check for undelivered invoke messages for a specific composite:

    SELECT * FROM dlv_message
    WHERE dlv_type = 1 AND state = 0 AND composite_name = ''

    To check for rejected messages for a specific composite:

    SELECT COUNT(*) FROM rejected_message WHERE composite_name = ''

    Time taken by each instance of a process to execute within a specified duration:

    SELECT cikey, conversation_id, parent_id, ecid, state, status, domain_name,
           composite_name, cmpst_id,
           TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI:SS') cdate,
           TO_CHAR(modify_date, 'YYYY-MM-DD HH24:MI:SS') mdate,
           EXTRACT(DAY FROM (modify_date - creation_date)) * 24 * 60 * 60
         + EXTRACT(HOUR FROM (modify_date - creation_date)) * 60 * 60
         + EXTRACT(MINUTE FROM (modify_date - creation_date)) * 60
         + EXTRACT(SECOND FROM (modify_date - creation_date)) execution_time
    FROM cube_instance
    WHERE TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI') >= ''
      AND TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI') <= ''

    List of all BPEL instances, their state, average, minimum, and maximum durations, and

    their counts:

    SELECT domain_name, component_name,
           DECODE(state, '0', 'INITIATED', '1', 'OPEN_RUNNING', '2', 'OPEN_SUSPENDED',
                  '3', 'OPEN_FAULTED', '4', 'CLOSED_PENDING_CANCEL', '5', 'CLOSED_COMPLETED',
                  '6', 'CLOSED_FAULTED', '7', 'CLOSED_CANCELLED', '8', 'CLOSED_ABORTED',
                  '9', 'CLOSED_STALE', '10', 'CLOSED_ROLLED_BACK') state,
           TO_CHAR(AVG((TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 12, 2)) * 60 * 60)
                     + (TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 15, 2)) * 60)
                     +  TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18, 4))), '999990.000') AVG,
           TO_CHAR(MIN((TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 12, 2)) * 60 * 60)
                     + (TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 15, 2)) * 60)
                     +  TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18, 4))), '999990.000') MIN,
           TO_CHAR(MAX((TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 12, 2)) * 60 * 60)
                     + (TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 15, 2)) * 60)
                     +  TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18, 4))), '999990.000') MAX,
           COUNT(1) count
    FROM cube_instance
    WHERE TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI:SS') >= ''
      AND TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI:SS') <= ''
      AND composite_name IN ('')
    GROUP BY domain_name, component_name, state

    To get information on currently running processes, including the shortest- and

    longest-running instances:

    SELECT * FROM (
      SELECT composite_name AS "ProcessName",
             TO_CHAR(MIN(creation_date), 'YYYY-MM-DD HH:MI') AS "EarliestDate",
             COUNT(*) AS "TotalRunningProcesses",
             TO_NUMBER(SUBSTR(MIN(sysdate - creation_date), 1,
                       INSTR(MIN(sysdate - creation_date), ' '))) AS "ShortestRunning (Days)",
             SUBSTR(MIN(sysdate - creation_date),
                    INSTR(MIN(sysdate - creation_date), ' ') + 1, 8) AS "ShortestRunning (Hours)",
             TO_NUMBER(SUBSTR(MAX(sysdate - creation_date), 1,
                       INSTR(MAX(sysdate - creation_date), ' '))) AS "LongestRunning (Days)",
             SUBSTR(MAX(sysdate - creation_date),
                    INSTR(MAX(sysdate - creation_date), ' ') + 1, 8) AS "LongestRunning (Hours)"
      FROM cube_instance
      WHERE state = 1
      GROUP BY composite_name
      ORDER BY "EarliestDate" DESC)

    To find a string in a message:

    SELECT BLOBCONVERTER(document), ci.cikey, ci.state, ci.status
    FROM cube_instance ci, dlv_message dlvm, document_dlv_msg_ref dlvmr, xml_document xmld
    WHERE ci.cikey = dlvm.cikey
      AND dlvm.message_guid = dlvmr.message_guid
      AND dlvmr.document_id = xmld.document_id
      AND modify_date BETWEEN TO_DATE('', 'YYYY-MM-DD HH24:MI:SS')
                          AND TO_DATE('', 'YYYY-MM-DD HH24:MI:SS')
      AND BLOBCONVERTER(document) LIKE '% %'

    To find the number of composites that took more than n, m and p seconds, along with the

    average, minimum and maximum time:

    DECLARE
      min_time    FLOAT;
      max_time    FLOAT;
      avg_time    FLOAT;
      count_n     NUMBER;
      count_m     NUMBER;
      count_p     NUMBER;
      input_date  VARCHAR2(20) := '';   -- start of the time window
      input_date1 VARCHAR2(20) := '';   -- end of the time window

      CURSOR c1 IS
        SELECT DISTINCT composite_name
          FROM cube_instance
         WHERE TO_DATE(TO_CHAR(creation_date, 'DD-MM-YYYY HH24:MI:SS'), 'DD-MM-YYYY HH24:MI:SS')
               BETWEEN TO_DATE(input_date, 'DD-MM-YYYY HH24:MI:SS')
                   AND TO_DATE(input_date1, 'DD-MM-YYYY HH24:MI:SS');
    BEGIN
      DBMS_OUTPUT.put_line('COMPOSITE NAME' || ',' || 'MIN TIME' || ',' || 'MAX TIME' || ',' ||
                           'AVG TIME' || ',' || '> n SECONDS' || ',' || '> m SECONDS' || ',' ||
                           '> p SECONDS');

      FOR i IN c1 LOOP
        SELECT MIN(TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18)))
          INTO min_time
          FROM cube_instance
         WHERE composite_name = i.composite_name
           AND TO_DATE(TO_CHAR(creation_date, 'DD-MM-YYYY HH24:MI:SS'), 'DD-MM-YYYY HH24:MI:SS')
               BETWEEN TO_DATE(input_date, 'DD-MM-YYYY HH24:MI:SS')
                   AND TO_DATE(input_date1, 'DD-MM-YYYY HH24:MI:SS');

        SELECT MAX(TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18)))
          INTO max_time
          FROM cube_instance
         WHERE composite_name = i.composite_name
           AND TO_DATE(TO_CHAR(creation_date, 'DD-MM-YYYY HH24:MI:SS'), 'DD-MM-YYYY HH24:MI:SS')
               BETWEEN TO_DATE(input_date, 'DD-MM-YYYY HH24:MI:SS')
                   AND TO_DATE(input_date1, 'DD-MM-YYYY HH24:MI:SS');

        SELECT AVG(TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18)))
          INTO avg_time
          FROM cube_instance
         WHERE composite_name = i.composite_name
           AND TO_DATE(TO_CHAR(creation_date, 'DD-MM-YYYY HH24:MI:SS'), 'DD-MM-YYYY HH24:MI:SS')
               BETWEEN TO_DATE(input_date, 'DD-MM-YYYY HH24:MI:SS')
                   AND TO_DATE(input_date1, 'DD-MM-YYYY HH24:MI:SS');

        -- n, m and p below are thresholds in seconds; substitute actual values.
        SELECT COUNT(*)
          INTO count_n
          FROM cube_instance
         WHERE composite_name = i.composite_name
           AND TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18)) > n
           AND TO_DATE(TO_CHAR(creation_date, 'DD-MM-YYYY HH24:MI:SS'), 'DD-MM-YYYY HH24:MI:SS')
               BETWEEN TO_DATE(input_date, 'DD-MM-YYYY HH24:MI:SS')
                   AND TO_DATE(input_date1, 'DD-MM-YYYY HH24:MI:SS');

        SELECT COUNT(*)
          INTO count_m
          FROM cube_instance
         WHERE composite_name = i.composite_name
           AND TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18)) > m
           AND TO_DATE(TO_CHAR(creation_date, 'DD-MM-YYYY HH24:MI:SS'), 'DD-MM-YYYY HH24:MI:SS')
               BETWEEN TO_DATE(input_date, 'DD-MM-YYYY HH24:MI:SS')
                   AND TO_DATE(input_date1, 'DD-MM-YYYY HH24:MI:SS');

        SELECT COUNT(*)
          INTO count_p
          FROM cube_instance
         WHERE composite_name = i.composite_name
           AND TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18)) > p
           AND TO_DATE(TO_CHAR(creation_date, 'DD-MM-YYYY HH24:MI:SS'), 'DD-MM-YYYY HH24:MI:SS')
               BETWEEN TO_DATE(input_date, 'DD-MM-YYYY HH24:MI:SS')
                   AND TO_DATE(input_date1, 'DD-MM-YYYY HH24:MI:SS');

        DBMS_OUTPUT.put_line(i.composite_name || ',' || min_time || ',' || max_time || ',' ||
                             avg_time || ',' || count_n || ',' || count_m || ',' || count_p);
      END LOOP;
    END;


    3.3 Analysis of AWR Report

    Analyze AWR report to get information on bottlenecks on database queries and objects. To get

    better understanding on AWR report, refer Appendix H.

    3.4 Analysis of SOA server memory usage

    Refer to Appendix J.

    4 Reference

    Linux man pages: http://www.linuxmanpages.com

    VisualVM: http://visualvm.java.net

    jVisualVM documentation: http://docs.oracle.com/javase/6/docs/technotes/tools/share/jvisualvm.html

    jConsole documentation: http://docs.oracle.com/javase/6/docs/technotes/guides/management/jconsole.html

    Monitor jRockit using jRockit Mission Control: http://docs.oracle.com/cd/E13222_01/wls/docs90/ConsoleHelp/taskhelp/monitoring/MonitorTheJRockitVirtualMachine.html

    jRockit documentation: http://www.oracle.com/technetwork/middleware/jrockit/overview/missioncontrol-whitepaper-june08-1-130357.pdf