BPEL PM 11g Performance Tuning - 7

Upload: tushar

Post on 15-Oct-2015


DESCRIPTION

This is the seventh chapter of Oracle Fusion Middleware BPEL PM 11g Performance Tuning. This chapter covers EM Fusion Middleware Control and WLS Admin Console tuning.



    Contents

    DATA COLLECTION

    1 BEFORE LNP TEST
    1.1 INFRASTRUCTURE
    1.2 JVM-HOTSPOT- AND JVM-JROCKIT-
    1.3 WLS-THREADING-
    1.4 WLS-NETWORKIO-
    1.5 WLS-DATASOURCE-
    1.6 BPEL PM-
    1.7 TECHNOLOGY ADAPTERS
    1.8 BPEL PM COMPOSITE
    1.9 DATABASE

    2 DURING LNP TEST
    2.1 CPU AND RAM USAGE AND MEMORY AVAILABILITY
    2.2 JVM HEAP USAGE, GARBAGE COLLECTION AND MEMORY LEAKS, IF ANY
    2.3 DATABASE CONNECTIONS AVAILABILITY AND USAGE
    2.4 AVAILABILITY OF PERIPHERAL RESOURCES LIKE JMS QUEUES, AQ AND THEIR USAGE PATTERN

    3 AFTER LNP TEST
    3.1 GET FULL DUMP OF FEW IMPORTANT TABLES
    3.2 EXECUTE SQL QUERIES AS AND WHEN REQUIRED
    3.3 ANALYSIS OF AWR REPORT
    3.4 ANALYSIS OF SOA SERVER MEMORY USAGE

    4 REFERENCE

    Exhibits

    Exhibit 73: vmstat
    Exhibit 74: free
    Exhibit 75: pmap
    Exhibit 76: top
    Exhibit 77: sar -B
    Exhibit 78: meminfo
    Exhibit 79: mpstat
    Exhibit 80: mpstat 2 4
    Exhibit 81: df -m


    Data Collection

    Data collection is a very important aspect of any LnP testing effort. If the data collection

    effort is not directed in the right direction, the whole LnP testing effort may go down the

    drain and may yield false positives or false negatives.

    Data collection in any LnP testing is divided into three parts on the basis of time of collection:

    Before LnP Testing

    During LnP Testing

    Post LnP Testing

    Data collection needs to happen for all parts of the system which will participate in LnP

    testing. From the BPEL PM perspective, the focus is on:

    Operating System

    JVM

    WLS

    BPEL PM

    Database

    But in any enterprise deployment of BPEL, the system under test will have more parts, like

    JMS, boundary systems, OSB, gateways, load balancers, etc.

    1 Before LnP Test

    The data collection effort before LnP testing is targeted at understanding the

    pre-conditions. One of the best tools to record data is a spreadsheet. My favorite is MS

    Excel, but one can use Google Docs Sheets or any other.

    Refer to the PreLnP-Data.xlsx workbook template for collecting the initial conditions of LnP

    testing. PreLnP-Data.xlsx consists of the following sheets:

    Infrastructure

    JVM-HotSpot-

    JVM-jRockit-

    WLS-Threading-

    WLS-NetworkIO-

    WLS-DataSource-

    BPEL PM-

    Technology Adapters

    BPEL PM Composite

    Database


    Before peeking into each sheet, let us understand a few rules which are followed in this

    workbook.

    In each sheet, only the yellow cells need a value from the recorder.

    Blue cell values are calculated.

    In a combo box, one needs to select one value.

    For each sheet that carries a managed-server suffix in its name, one needs to create one

    copy per managed server; these tabs are colored green.

    1.1 Infrastructure

    As the name suggests, this sheet collects information pertaining to the infrastructure of the

    LnP environment. Primarily, this sheet lists the managed servers and the server and operating

    system level parameters. This sheet assumes the Linux operating system; if one is running

    some other operating system, then this sheet needs modification.

    1.2 JVM-HotSpot- and JVM-jRockit-

    Out of these two sheets, only one should survive in the real world because one will be using

    only one JVM. If the LnP environment is using the HotSpot JVM, then delete the JVM-jRockit-

    sheet, and vice versa. Then make a copy of the surviving sheet for each managed

    server in the LnP environment and update the names of the sheets accordingly.

    1.3 WLS-Threading-

    One should make a copy of this sheet for each managed server in the LnP environment and

    update the names of the sheets accordingly. This sheet lists details of Work Managers. When

    BPEL PM is installed, a Work Manager named SOAWorkManager is created by default, which can be

    used to manage threads for BPEL PM. The LnP environment might have a few more Work Managers;

    if yes, then list the details of those Work Managers as well.

    If the LnP environment is not using Work Managers, then fill in the details of the default

    thread pool.

    1.4 WLS-NetworkIO-

    One should make a copy of this sheet for each managed server in the LnP environment and

    update the names of the sheets accordingly. This sheet lists details of muxers.

    1.5 WLS-DataSource-

    One should make a copy of this sheet for each managed server in the LnP environment and

    update the names of the sheets accordingly. This sheet lists details of SOADataSource and

    SOALocalTxDataSource, which are used by BPEL PM. If the composites under consideration use

    some other data sources as well, list the details pertaining to those too.


    1.6 BPEL PM-

    One should make a copy of this sheet for each managed server in the LnP environment and

    update the names of the sheets accordingly. This sheet lists parameters related to BPEL PM

    threads, timeouts, audit and logging.

    1.7 Technology Adapters

    This sheet lists parameters related to the File & FTP and Database adapters. Other

    technology adapters are covered in the BPEL PM Composite sheet.

    1.8 BPEL PM Composite

    This sheet lists all composites under LnP test consideration, along with the parameters

    affecting performance at the composite level.

    1.9 Database

    This sheet lists attributes related to the databases which host SOAINFRA, MDS and any custom

    schemas. One may need to modify this sheet to incorporate additional parameters in the case

    of custom schemas.

    2 During LnP Test

    One needs to monitor the environment proactively during LnP testing to find any unusual

    behavior or gradual degradation of the system. One should monitor the following items during

    LnP testing:

    CPU and RAM usage and availability patterns

    Operating system resource availability and usage patterns (like handles to open files,

    open sockets, running processes, etc.)

    JVM heap usage, garbage collection and memory leaks, if any

    Database connections availability and usage

    Availability and usage patterns of peripheral resources like JMS queues, AQs, databases

    This book doesn't assume any specific toolset for monitoring purposes, and if a tool is a

    must in a particular scenario, then FREE is the keyword.

    2.1 CPU and RAM usage and memory availability

    Linux provides a few commands which can be utilized to monitor CPU, RAM and memory usage,

    both for the system as a whole and for specific processes.

    vmstat reports virtual memory statistics for the Linux operating system.


    Exhibit 73: vmstat

    Field descriptions for VM mode:

    procs
    r: the number of processes waiting for run time.
    b: the number of processes in uninterruptible sleep.

    memory
    swpd: the amount of virtual memory used.
    free: the amount of idle memory.
    buff: the amount of memory used as buffers.
    cache: the amount of memory used as cache.
    inact: the amount of inactive memory (-a option).
    active: the amount of active memory (-a option).

    swap
    si: amount of memory swapped in from disk (/s).
    so: amount of memory swapped to disk (/s).

    io
    bi: blocks received from a block device (blocks/s).
    bo: blocks sent to a block device (blocks/s).

    system
    in: the number of interrupts per second, including the clock.
    cs: the number of context switches per second.

    cpu (these are percentages of total CPU time)
    us: time spent running non-kernel code (user time, including nice time).
    sy: time spent running kernel code (system time).
    id: time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
    wa: time spent waiting for IO. Prior to Linux 2.5.41, shown as zero.
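    As a quick illustration, vmstat can be run repeatedly and the output captured for later comparison; the sample interval, count, and use of a shell variable here are arbitrary choices, not from the book:

```shell
# Take 3 samples at 2-second intervals; the first sample reports
# averages since boot, so the later samples are the interesting ones.
out=$(vmstat 2 3)
echo "$out"
```

    During a test run one would normally redirect this to a timestamped file instead of a variable, so the raw numbers survive for post-test analysis.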

    free reports the total amount of free and used physical memory in the system, as well as the

    buffers used by the kernel. The shared memory column should be ignored; it is obsolete.


    Exhibit 74: free

    pmap reports the memory map of a process or processes.

    Exhibit 75: pmap

    top provides a dynamic real-time view of a running system. It can display system summary information as well as a list of tasks currently being managed by the Linux kernel. The types of system summary information shown and the types, order and size of information displayed for tasks are all user-configurable, and that configuration can be made persistent across restarts.

    Exhibit 76: top

    sar -B reports statistics on paging.

    Exhibit 77: sar -B

    pgpgin/s: Total number of kilobytes the system paged in from disk per second.

    pgpgout/s: Total number of kilobytes the system paged out to disk per second.

    fault/s: Number of page faults (major + minor) made by the system per second (post 2.5 kernels only). This is not a count of page faults that generate I/O, because some page faults can be resolved without I/O.


    majflt/s: Number of major faults the system has made per second, those which have required loading a memory page from disk (post 2.5 kernels only).

    cat /proc/meminfo reports memory size and usage.

    Exhibit 78: meminfo

    mpstat reports CPU-related statistics. It has an option to execute repeatedly at a certain time interval.


    Exhibit 79: mpstat

    CPU: Processor number. The keyword all indicates that statistics are calculated as averages

    among all processors.

    %user: Show the percentage of CPU utilization that occurred while executing at the user level

    (application).

    %nice: Show the percentage of CPU utilization that occurred while executing at the user level

    with nice priority.

    %system: Show the percentage of CPU utilization that occurred while executing at the system

    level (kernel). Note that this does not include the time spent servicing interrupts or softirqs.

    %iowait: Show the percentage of time that the CPU or CPUs were idle during which the system

    had an outstanding disk I/O request.

    %irq: Show the percentage of time spent by the CPU or CPUs to service interrupts.

    %soft: Show the percentage of time spent by the CPU or CPUs to service softirqs. A softirq

    (software interrupt) is one of up to 32 enumerated software interrupts which can run on

    multiple CPUs at once.

    %idle: Show the percentage of time that the CPU or CPUs were idle and the system did not

    have an outstanding disk I/O request.

    mpstat 2 4 displays four reports of global statistics among all processors at two-second intervals.

    Exhibit 80: mpstat 2 4

    sar -u also reports CPU utilization. Its output is similar to mpstat.

    ps -eo pcpu,pid,user,args | sort -r -k1 | head -n 10 will list the top 10 CPU users.


    One should create a script which can be executed during LnP via cron to capture statistics

    pertaining to CPU, RAM and memory. These statistics should be stored in a file for later

    analysis and should also be made available to the monitoring team via email or a dashboard

    for on-the-fly actions.
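    A minimal sketch of such a cron-driven collector, assuming Linux /proc files; the file names, sampled fields and output directory are illustrative assumptions, not prescribed by the book:

```shell
#!/bin/sh
# collect_stats: append one timestamped sample of load, memory and
# top CPU consumers to a log file for later analysis.
collect_stats() {
    outdir=${1:-/tmp/lnp-stats}                  # illustrative output directory
    mkdir -p "$outdir"
    ts=$(date +%Y%m%d-%H%M%S)
    logfile="$outdir/stats-$ts.log"
    {
        echo "=== $ts load average ==="
        cat /proc/loadavg                        # run-queue pressure: 1/5/15 min
        echo "=== $ts memory ==="
        grep -E '^(MemTotal|MemFree|Buffers|Cached|SwapTotal|SwapFree):' /proc/meminfo
        echo "=== $ts top CPU consumers ==="
        ps -eo pcpu,pid,user,args | sort -rn -k1 | head -n 5
    } >> "$logfile"
    echo "$logfile"                              # report where the sample went
}

collect_stats /tmp/lnp-stats
```

    A crontab entry running this script every minute would accumulate one file per sample; emailing the files or feeding them to a dashboard is left to the tool of choice.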

    Disk space availability is often overlooked. Keep an eye on it.

    df -m

    Exhibit 81: df -m

    Keep checking log files for resource issues:

    cat /var/log/messages

    To list open files, use lsof. To list running processes, use ps. In both cases, one should

    write the output to a file and use wc to count the entries for later analysis.
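    Reduced to a sketch, those two listings can be turned into counts and appended to a log; the log path is an assumption, and lsof is skipped when it is not installed:

```shell
ts=$(date +%H:%M:%S)
logfile=/tmp/lnp-counts.log                   # illustrative path
nproc=$(ps -e | wc -l)                        # running processes (plus header line)
echo "$ts processes=$nproc" >> "$logfile"
if command -v lsof >/dev/null 2>&1; then
    nfiles=$(lsof 2>/dev/null | wc -l)        # open files visible to this user
    echo "$ts openfiles=$nfiles" >> "$logfile"
fi
tail -n 2 "$logfile"
```

    Sampled over a run, a simple plot of these counts makes descriptor leaks or runaway process creation obvious.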


    2.2 JVM heap usage, garbage collection and memory leaks, if any

    To monitor the JVM, one can use freely available tools and avoid the huge license costs

    associated with proprietary tools. The three favorite tools are jVisualVM, jConsole, and

    jRockit Mission Control. For details, refer to Appendix K.

    2.3 Database connections availability and usage

    Since BPEL PM uses the database extensively, usage monitoring of the underlying database is

    very important. For details, please refer to Appendix J.

    2.4 Availability of peripheral resources like JMS queues, AQ and their usage pattern

    In any enterprise-class deployment, JMS and AQ are very common. To monitor JMS and AQ, refer to Appendix J.

    3 After LnP Test

    After the LnP test, the primary source for data collection is the SOAINFRA schema. There are

    two approaches for that:

    Get a full dump of a few important tables

    Execute SQL queries as and when required

    3.1 Get full dump of few important tables

    This approach is useful when consecutive LnP tests are lined up in quick succession and

    operation team need to purge database to make space for latest LnP Test. This approach poses

    a inherent risk of losing some important data due to purge. This risk can be mitigated by

    archiving data before purging.

    In this approach, sometime file size of spreadsheet become unmanageable which forces

    breaking of data on the basis of time or composites which makes analysis little bit more

    complex.

    Tables under consideration are:

    COMPOSITE_INSTANCE
    COMPONENT_INSTANCE
    COMPOSITE_INSTANCE_FAULT
    REJECTED_MESSAGE
    REJECTED_MSG_NATIVE_PAYLOAD
    COMPOSITE_INSTANCE_ASSOC
    CUBE_INSTANCE
    AUDIT_TRAIL
    AUDIT_DETAILS
    WORK_ITEM
    DLV_MESSAGE
    DOCUMENT_DLV_MSG_REF
    XML_DOCUMENT

    3.2 Execute SQL queries as and when required

    This approach is preferred because of the flexibility of fetching data as and when required

    for analysis.

    A few of the useful queries are:

    To get the count of instances in different states for the BPEL composites within a time

    range (the empty '' literals here and below are placeholders for the start and end values):

    SELECT composite_name, COUNT(*),
           DECODE(state, 0, 'INITIATED', 1, 'OPEN_RUNNING', 2, 'OPEN_SUSPENDED',
                  3, 'OPEN_FAULTED', 4, 'CLOSED_PENDING_CANCEL', 5, 'CLOSED_COMPLETED',
                  6, 'CLOSED_FAULTED', 7, 'CLOSED_CANCELLED', 8, 'CLOSED_ABORTED',
                  9, 'CLOSED_STALE', 10, 'CLOSED_ROLLED_BACK', 'unknown') state
    FROM cube_instance
    WHERE TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI:SS') >= ''
      AND TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI:SS') <= ''
    GROUP BY composite_name, state

  • 14

    To check for undelivered invoke messages for a specific composite:

    SELECT * FROM dlv_message
    WHERE dlv_type = 1 AND state = 0 AND composite_name = ''

    To check for rejected messages for a specific composite:

    SELECT COUNT(*) FROM rejected_message WHERE composite_name = ''

    Time taken by each instance of a process to execute within a specified duration:

    SELECT cikey, conversation_id, parent_id, ecid, state, status, domain_name,
           composite_name, cmpst_id,
           TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI:SS') cdate,
           TO_CHAR(modify_date, 'YYYY-MM-DD HH24:MI:SS') mdate,
           EXTRACT(DAY FROM (modify_date - creation_date)) * 24 * 60 * 60
         + EXTRACT(HOUR FROM (modify_date - creation_date)) * 60 * 60
         + EXTRACT(MINUTE FROM (modify_date - creation_date)) * 60
         + EXTRACT(SECOND FROM (modify_date - creation_date)) execution_time
    FROM cube_instance
    WHERE TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI') >= ''
      AND TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI') <= ''

    List of all BPEL instances, their state, average, minimum, and maximum durations, and

    their counts:

    SELECT domain_name, component_name,
           DECODE(state, '0', 'INITIATED', '1', 'OPEN_RUNNING', '2', 'OPEN_SUSPENDED',
                  '3', 'OPEN_FAULTED', '4', 'CLOSED_PENDING_CANCEL', '5', 'CLOSED_COMPLETED',
                  '6', 'CLOSED_FAULTED', '7', 'CLOSED_CANCELLED', '8', 'CLOSED_ABORTED',
                  '9', 'CLOSED_STALE', '10', 'CLOSED_ROLLED_BACK') state,
           TO_CHAR(AVG((TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 12, 2)) * 60 * 60)
                     + (TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 15, 2)) * 60)
                     +  TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18, 4))), '999990.000') AVG,
           TO_CHAR(MIN((TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 12, 2)) * 60 * 60)
                     + (TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 15, 2)) * 60)
                     +  TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18, 4))), '999990.000') MIN,
           TO_CHAR(MAX((TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 12, 2)) * 60 * 60)
                     + (TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 15, 2)) * 60)
                     +  TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18, 4))), '999990.000') MAX,
           COUNT(1) count
    FROM cube_instance
    WHERE TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI:SS') >= ''
      AND TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI:SS') <= ''
      AND composite_name IN ('')
    GROUP BY domain_name, component_name, state

    To get information on currently running processes, including the shortest- and

    longest-running instances:

    SELECT * FROM (
      SELECT composite_name AS "ProcessName",
             TO_CHAR(MIN(creation_date), 'YYYY-MM-DD HH:MI') AS "EarliestDate",
             COUNT(*) AS "TotalRunningProcesses",
             TO_NUMBER(SUBSTR(MIN(sysdate - creation_date), 1,
                       INSTR(MIN(sysdate - creation_date), ' '))) AS "ShortestRunning (Days)",
             SUBSTR(MIN(sysdate - creation_date),
                    INSTR(MIN(sysdate - creation_date), ' ') + 1, 8) AS "ShortestRunning (Hours)",
             TO_NUMBER(SUBSTR(MAX(sysdate - creation_date), 1,
                       INSTR(MAX(sysdate - creation_date), ' '))) AS "LongestRunning (Days)",
             SUBSTR(MAX(sysdate - creation_date),
                    INSTR(MAX(sysdate - creation_date), ' ') + 1, 8) AS "LongestRunning (Hours)"
      FROM cube_instance
      WHERE state = 1
      GROUP BY composite_name
      ORDER BY "EarliestDate" DESC)

    To find a string in a message:

    SELECT BLOBCONVERTER(document), ci.cikey, ci.state, ci.status
    FROM cube_instance ci, dlv_message dlvm, document_dlv_msg_ref dlvmr, xml_document xmld
    WHERE ci.cikey = dlvm.cikey
      AND dlvm.message_guid = dlvmr.message_guid
      AND dlvmr.document_id = xmld.document_id
      AND modify_date BETWEEN TO_DATE('', 'YYYY-MM-DD HH24:MI:SS')
                          AND TO_DATE('', 'YYYY-MM-DD HH24:MI:SS')
      AND BLOBCONVERTER(document) LIKE '% %'

    To find the number of composites that took more than n, m and p seconds, along with the

    average, minimum and maximum time:

    DECLARE
      min_time    FLOAT;
      max_time    FLOAT;
      avg_time    FLOAT;
      count_n     NUMBER;
      count_m     NUMBER;
      count_p     NUMBER;
      input_date  VARCHAR2(20) := '';   -- start of the time window
      input_date1 VARCHAR2(20) := '';   -- end of the time window

      CURSOR c1 IS
        SELECT DISTINCT composite_name
          FROM cube_instance
         WHERE TO_DATE(TO_CHAR(creation_date, 'DD-MM-YYYY HH24:MI:SS'), 'DD-MM-YYYY HH24:MI:SS')
               BETWEEN TO_DATE(input_date, 'DD-MM-YYYY HH24:MI:SS')
                   AND TO_DATE(input_date1, 'DD-MM-YYYY HH24:MI:SS');
    BEGIN
      DBMS_OUTPUT.put_line('COMPOSITE NAME' || ',' || 'MIN TIME' || ',' || 'MAX TIME' || ',' ||
                           'AVG TIME' || ',' || '> n SECONDS' || ',' || '> m SECONDS' || ',' ||
                           '> p SECONDS');

      FOR i IN c1 LOOP
        SELECT MIN(TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18)))
          INTO min_time
          FROM cube_instance
         WHERE composite_name = i.composite_name
           AND TO_DATE(TO_CHAR(creation_date, 'DD-MM-YYYY HH24:MI:SS'), 'DD-MM-YYYY HH24:MI:SS')
               BETWEEN TO_DATE(input_date, 'DD-MM-YYYY HH24:MI:SS')
                   AND TO_DATE(input_date1, 'DD-MM-YYYY HH24:MI:SS');

        SELECT MAX(TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18)))
          INTO max_time
          FROM cube_instance
         WHERE composite_name = i.composite_name
           AND TO_DATE(TO_CHAR(creation_date, 'DD-MM-YYYY HH24:MI:SS'), 'DD-MM-YYYY HH24:MI:SS')
               BETWEEN TO_DATE(input_date, 'DD-MM-YYYY HH24:MI:SS')
                   AND TO_DATE(input_date1, 'DD-MM-YYYY HH24:MI:SS');

        SELECT AVG(TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18)))
          INTO avg_time
          FROM cube_instance
         WHERE composite_name = i.composite_name
           AND TO_DATE(TO_CHAR(creation_date, 'DD-MM-YYYY HH24:MI:SS'), 'DD-MM-YYYY HH24:MI:SS')
               BETWEEN TO_DATE(input_date, 'DD-MM-YYYY HH24:MI:SS')
                   AND TO_DATE(input_date1, 'DD-MM-YYYY HH24:MI:SS');

        -- n, m and p below are thresholds in seconds; substitute actual values.
        SELECT COUNT(*)
          INTO count_n
          FROM cube_instance
         WHERE composite_name = i.composite_name
           AND TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18)) > n
           AND TO_DATE(TO_CHAR(creation_date, 'DD-MM-YYYY HH24:MI:SS'), 'DD-MM-YYYY HH24:MI:SS')
               BETWEEN TO_DATE(input_date, 'DD-MM-YYYY HH24:MI:SS')
                   AND TO_DATE(input_date1, 'DD-MM-YYYY HH24:MI:SS');

        SELECT COUNT(*)
          INTO count_m
          FROM cube_instance
         WHERE composite_name = i.composite_name
           AND TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18)) > m
           AND TO_DATE(TO_CHAR(creation_date, 'DD-MM-YYYY HH24:MI:SS'), 'DD-MM-YYYY HH24:MI:SS')
               BETWEEN TO_DATE(input_date, 'DD-MM-YYYY HH24:MI:SS')
                   AND TO_DATE(input_date1, 'DD-MM-YYYY HH24:MI:SS');

        SELECT COUNT(*)
          INTO count_p
          FROM cube_instance
         WHERE composite_name = i.composite_name
           AND TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18)) > p
           AND TO_DATE(TO_CHAR(creation_date, 'DD-MM-YYYY HH24:MI:SS'), 'DD-MM-YYYY HH24:MI:SS')
               BETWEEN TO_DATE(input_date, 'DD-MM-YYYY HH24:MI:SS')
                   AND TO_DATE(input_date1, 'DD-MM-YYYY HH24:MI:SS');

        DBMS_OUTPUT.put_line(i.composite_name || ',' || min_time || ',' || max_time || ',' ||
                             avg_time || ',' || count_n || ',' || count_m || ',' || count_p);
      END LOOP;
    END;


    3.3 Analysis of AWR Report

    Analyze AWR report to get information on bottlenecks on database queries and objects. To get

    better understanding on AWR report, refer Appendix H.

    3.4 Analysis of SOA server memory usage

    Refer to Appendix J.

    4 Reference

    Linux man pages: http://www.linuxmanpages.com

    VisualVM: http://visualvm.java.net

    jVisualVM documentation: http://docs.oracle.com/javase/6/docs/technotes/tools/share/jvisualvm.html

    jConsole documentation: http://docs.oracle.com/javase/6/docs/technotes/guides/management/jconsole.html

    Monitor jRockit using jRockit Mission Control: http://docs.oracle.com/cd/E13222_01/wls/docs90/ConsoleHelp/taskhelp/monitoring/MonitorTheJRockitVirtualMachine.html

    jRockit documentation: http://www.oracle.com/technetwork/middleware/jrockit/overview/missioncontrol-whitepaper-june08-1-130357.pdf