BPEL PM 11g Performance Tuning - 7
Post on 15-Oct-2015
Contents
Data Collection
1 Before LnP Test
1.1 Infrastructure
1.2 JVM-HotSpot- and JVM-jRockit-
1.3 WLS-Threading-
1.4 WLS-NetworkIO-
1.5 WLS-DataSource-
1.6 BPEL PM-
1.7 Technology Adapters
1.8 BPEL PM Composite
1.9 Database
2 During LnP Test
2.1 CPU and RAM usage and memory availability
2.2 JVM heap usage, garbage collection, and memory leaks, if any
2.3 Database connection availability and usage
2.4 Availability of peripheral resources like JMS queues and AQ, and their usage pattern
3 After LnP Test
3.1 Get a full dump of a few important tables
3.2 Execute SQL queries as and when required
3.3 Analysis of AWR Report
3.4 Analysis of SOA server memory usage
4 Reference
Exhibits
Exhibit 1: vmstat
Exhibit 2: free
Exhibit 3: pmap
Exhibit 4: top
Exhibit 5: sar -B
Exhibit 6: meminfo
Exhibit 7: mpstat
Exhibit 8: mpstat 2 4
Exhibit 9: df -m
Data Collection
Data collection is a very important aspect of any LnP testing effort. If the data collection effort is not directed in the right direction, the whole LnP testing effort may go down the drain and may result in false positives or false negatives.
Data collection in any LnP testing is divided into three parts on the basis of time of collection:
Before LnP Testing
During LnP Testing
Post LnP Testing
Data collection needs to happen for all parts of the system which will participate in LnP testing. From the BPEL PM perspective, the focus is on:
Operating System
JVM
WLS
BPEL PM
Database
But in any enterprise deployment of BPEL, the system under test will have more parts, like JMS, boundary systems, OSB, gateways, load balancers, etc.
1 Before LnP Test
The data collection effort before an LnP test is targeted at understanding the pre-conditions. One of the best tools to record data is a spreadsheet. My favorite is MS Excel, but one can use Google Docs Sheets or any other.
Refer to the PreLnP-Data.xlsx workbook template for collecting the initial conditions of LnP testing.
PreLnP-Data.xlsx consists of the following sheets:
Infrastructure
JVM-HotSpot-
JVM-jRockit-
WLS-Threading-
WLS-NetworkIO-
WLS-DataSource-
BPEL PM-
Technology Adapters
BPEL PM Composite
Database
Before peeking into each sheet, let us understand a few rules which are followed in this workbook.
In each sheet, only the yellow cells need values from the recorder.
Blue cell values are calculated.
In a combo box, one needs to select one value.
One needs to create copies of the sheets that carry a managed-server suffix in their names, one copy per managed server; these tabs are colored green.
1.1 Infrastructure
As the name suggests, this sheet collects information pertaining to the infrastructure of the LnP environment. Primarily, this sheet lists managed servers and server and operating-system level parameters. This sheet assumes the Linux operating system. If one is running some other operating system, then this sheet needs modifications.
1.2 JVM-HotSpot- and JVM-jRockit-
Out of these two sheets, only one should survive in the real world, because one will be using only one JVM. If the LnP environment is using the HotSpot JVM, then delete the JVM-jRockit- sheet, and vice versa. Now make a copy of the surviving sheet for each managed server in the LnP environment and update the names of the sheets accordingly.
1.3 WLS-Threading-
One should make a copy of this sheet for each managed server in the LnP environment and update the names of the sheets accordingly. This sheet lists details of Work Managers. When BPEL PM is installed, the SOAWorkManager Work Manager is created by default, which can be used to manage threads for BPEL PM. The LnP environment might have a few more Work Managers; if yes, then list the details of those Work Managers as well.
If the LnP environment is not using Work Managers, then fill in the details about the default thread pool.
1.4 WLS-NetworkIO-
One should make a copy of this sheet for each managed server in the LnP environment and update the names of the sheets accordingly. This sheet lists details of muxers.
1.5 WLS-DataSource-
One should make a copy of this sheet for each managed server in the LnP environment and update the names of the sheets accordingly. This sheet lists details of SOADataSource and SOALocalTxDataSource, which are used by BPEL PM. If the composites under consideration are using some other data sources as well, list the details pertaining to those too.
1.6 BPEL PM-
One should make a copy of this sheet for each managed server in the LnP environment and update the names of the sheets accordingly. This sheet lists parameters related to BPEL PM threads, timeouts, auditing, and logging.
1.7 Technology Adapters
This sheet lists parameters related to the File & FTP and Database adapters. Other technology adapters are covered in the BPEL PM Composite sheet.
1.8 BPEL PM Composite
This sheet lists all composites under LnP test consideration. It also lists parameters affecting performance at the composite level.
1.9 Database
This sheet lists attributes related to the databases which host the SOAINFRA, MDS, and any custom schemas. One may need to modify this sheet to incorporate additional parameters in the case of custom schemas.
2 During LnP Test
One needs to monitor the environment proactively during LnP testing to find any unusual behavior or gradual degradation of the system. One should monitor the following items during LnP testing:
CPU and RAM usage and availability patterns
Operating system resource availability and usage patterns (like handles to open files, open sockets, running processes, etc.)
JVM heap usage, garbage collection, and memory leaks, if any
Database connection availability and usage
Availability and usage patterns of peripheral resources like JMS queues, AQs, and databases
This book doesn't assume any specific toolset for monitoring purposes, and if a tool is a must in a particular scenario, then FREE is the key word.
2.1 CPU and RAM usage and memory availability
Linux provides a few commands which can be utilized to monitor CPU, RAM, and physical memory usage, both for the system as a whole and for a specific process.
vmstat reports virtual memory statistics for the Linux operating system.
Exhibit 1: vmstat
Field Description for VM Mode
procs
r: The number of processes waiting for run time.
b: The number of processes in uninterruptible sleep.
memory
swpd: the amount of virtual memory used.
free: the amount of idle memory.
buff: the amount of memory used as buffers.
cache: the amount of memory used as cache.
inact: the amount of inactive memory. (-a option)
active: the amount of active memory. (-a option)
swap
si: Amount of memory swapped in from disk (/s).
so: Amount of memory swapped to disk (/s).
io
bi: Blocks received from a block device (blocks/s).
bo: Blocks sent to a block device (blocks/s).
system
in: The number of interrupts per second, including the clock.
cs: The number of context switches per second.
cpu (these are percentages of total CPU time)
us: Time spent running non-kernel code. (user time, including nice time)
sy: Time spent running kernel code. (system time)
id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
wa: Time spent waiting for IO. Prior to Linux 2.5.41, shown as zero.
free reports the total amount of free and used physical memory in the system, as well as the buffers used by the kernel. The shared memory column should be ignored; it is obsolete.
Exhibit 2: free
pmap reports memory map of a process or processes.
Exhibit 3: pmap
top provides a dynamic real-time view of a running system. It can display system summary information as well as a list of tasks currently being managed by the Linux kernel. The types of system summary information shown and the types, order, and size of information displayed for tasks are all user-configurable, and that configuration can be made persistent across restarts.
Exhibit 4: top
sar -B reports statistics on page swapping.
Exhibit 5: sar-B
pgpgin/s: Total number of kilobytes the system paged in from disk per second.
pgpgout/s: Total number of kilobytes the system paged out to disk per second.
fault/s: Number of page faults (major + minor) made by the system per second (post 2.5 kernels only). This is not a count of page faults that generate I/O, because some page faults can be resolved without I/O.
majflt/s: Number of major faults the system has made per second, those which have required loading a memory page from disk (post 2.5 kernels only).
cat /proc/meminfo reports memory size and usage.
Exhibit 6: meminfo
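The fields above can be summarized with a quick awk one-liner; this is a minimal sketch, Linux-only, that approximates available memory as MemFree + Buffers + Cached:

```shell
# Minimal sketch: memory headroom from /proc/meminfo (Linux only).
# "available" here is approximated as MemFree + Buffers + Cached.
awk '/^MemTotal:/ {total = $2}
     /^MemFree:/  {free  = $2}
     /^Buffers:/  {buf   = $2}
     /^Cached:/   {cache = $2}
     END {
       avail = free + buf + cache
       printf "MemTotal: %d kB\n", total
       printf "Approx available: %d kB (%.1f%%)\n", avail, 100 * avail / total
     }' /proc/meminfo
```

Newer kernels expose a MemAvailable field directly, which is a better estimate when present.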
mpstat reports CPU-related statistics. It has an option to execute repeatedly at a given time interval.
Exhibit 7: mpstat
CPU: Processor number. The keyword all indicates that statistics are calculated as averages
among all processors.
%user: Show the percentage of CPU utilization that occurred while executing at the user level
(application).
%nice: Show the percentage of CPU utilization that occurred while executing at the user level
with nice priority.
%system: Show the percentage of CPU utilization that occurred while executing at the system
level (kernel). Note that this does not include the time spent servicing interrupts or softirqs.
%iowait: Show the percentage of time that the CPU or CPUs were idle during which the system
had an outstanding disk I/O request.
%irq: Show the percentage of time spent by the CPU or CPUs to service interrupts.
%soft: Show the percentage of time spent by the CPU or CPUs to service softirqs. A softirq
(software interrupt) is one of up to 32 enumerated software interrupts which can run on
multiple CPUs at once.
%idle: Show the percentage of time that the CPU or CPUs were idle and the system did not
have an outstanding disk I/O request.
mpstat 2 4 displays four reports of global statistics among all processors at two-second intervals.
Exhibit 8: mpstat 2 4
sar -u also reports CPU utilization. Its output is similar to mpstat's.
ps -eo pcpu,pid,user,args | sort -r -k1 | less lists processes sorted by CPU usage, highest first.
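A close variant does the sorting inside ps itself and keeps only the top entries; note that the --sort option is a Linux procps extension, not POSIX:

```shell
# Sketch: top 10 CPU consumers; --sort=-pcpu sorts descending by CPU share.
ps -eo pcpu,pid,user,args --sort=-pcpu | head -n 11   # 1 header line + 10 rows
```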
One should create a script which can be executed during LnP testing via cron to capture statistics pertaining to CPU, RAM, and memory. These statistics should be stored in a file for later analysis and should also be made available to the monitoring team via email or a dashboard for on-the-fly actions.
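A minimal sketch of such a capture script follows; the output path under /tmp and the five-minute cron schedule are assumptions, and the email/dashboard step is left out:

```shell
#!/bin/sh
# Sketch: append one timestamped CPU/RAM snapshot per run.
# Example crontab entry: */5 * * * * /path/to/capture-stats.sh
OUT="/tmp/lnp-stats-$(date +%Y%m%d).log"
{
  echo "==== $(date '+%Y-%m-%d %H:%M:%S') ===="
  uptime                                                # load averages
  if command -v free   >/dev/null; then free -m;    fi  # RAM usage
  if command -v vmstat >/dev/null; then vmstat 1 2; fi  # memory/CPU snapshot
} >> "$OUT"
```

One file per day keeps the archives small enough to attach to an email or feed into a dashboard.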
Disk space availability is often overlooked. Keep an eye on it:
df -m
Exhibit 9: df-m
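This check is easy to automate; here is a sketch that flags any file system above a usage threshold (the 90% value is an arbitrary assumption):

```shell
# Sketch: list file systems above a usage threshold using POSIX `df -P` output.
THRESHOLD=90
df -P | awk -v limit="$THRESHOLD" '
  NR > 1 {
    gsub(/%/, "", $5)                    # strip % from the Use% column
    if ($5 + 0 > limit) print $6, $5 "%"
  }'
```

Empty output means every file system is under the threshold.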
Keep checking log files for resource issues:
cat /var/log/messages
To list open files, use lsof. One should write the output to a file so it can be parsed (for example, counted with wc) later.
To list running processes, use ps. One should write the output to a file so it can be parsed later.
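The two snapshots can be combined into one capture step; a sketch, where the /tmp file names are assumptions:

```shell
# Sketch: snapshot processes and open files, then count lines with wc.
ps -e > /tmp/ps-snapshot.txt
if command -v lsof >/dev/null; then
  lsof > /tmp/lsof-snapshot.txt 2>/dev/null || true   # lsof may warn; keep going
fi
echo "processes: $(wc -l < /tmp/ps-snapshot.txt)"
```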
2.2 JVM heap usage, garbage collection, and memory leaks, if any
To monitor the JVM, one can use freely available tools and avoid the huge license costs associated with proprietary tools. The three favorite tools are jVisualVM, jConsole, and jRockit Mission Control. For details, refer to Appendix K.
2.3 Database connection availability and usage
Since BPEL PM uses the database extensively, usage monitoring of the underlying database is very important. For details, please refer to Appendix J.
2.4 Availability of peripheral resources like JMS queues and AQ, and their usage pattern
In any enterprise-class deployment, JMS and AQ are very common. To monitor JMS and AQ, refer to Appendix J.
3 After LnP Test
After an LnP test, the primary source for data collection is the SOAINFRA schema. There are two approaches for that:
Get a full dump of a few important tables
Execute SQL queries as and when required
3.1 Get a full dump of a few important tables
This approach is useful when consecutive LnP tests are lined up in quick succession and the operations team needs to purge the database to make space for the latest LnP test. This approach poses an inherent risk of losing some important data due to the purge. This risk can be mitigated by archiving the data before purging.
In this approach, the file size of the spreadsheet sometimes becomes unmanageable, which forces breaking up the data on the basis of time or composites, which makes analysis a little more complex.
The tables under consideration are:
COMPOSITE_INSTANCE
COMPONENT_INSTANCE
COMPOSITE_INSTANCE_FAULT
REJECTED_MESSAGE
REJECTED_MSG_NATIVE_PAYLOAD
COMPOSITE_INSTANCE_ASSOC
CUBE_INSTANCE
AUDIT_TRAIL
AUDIT_DETAILS
WORK_ITEM
DLV_MESSAGE
DOCUMENT_DLV_MSG_REF
XML_DOCUMENT
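One way to script the dump is to generate a sqlplus spool script, one output file per table. This sketch only writes the script; the file names are assumptions, and the actual sqlplus run against the SOAINFRA schema is left to the reader:

```shell
# Sketch: emit a sqlplus script that spools each table listed above.
# Run later as, e.g.: sqlplus soainfra_user@db @dump_tables.sql (credentials assumed).
TABLES="COMPOSITE_INSTANCE COMPONENT_INSTANCE COMPOSITE_INSTANCE_FAULT \
REJECTED_MESSAGE REJECTED_MSG_NATIVE_PAYLOAD COMPOSITE_INSTANCE_ASSOC \
CUBE_INSTANCE AUDIT_TRAIL AUDIT_DETAILS WORK_ITEM DLV_MESSAGE \
DOCUMENT_DLV_MSG_REF XML_DOCUMENT"
: > dump_tables.sql                      # start with an empty script
for t in $TABLES; do
  {
    echo "spool ${t}.out"
    echo "select * from ${t};"
    echo "spool off"
  } >> dump_tables.sql
done
echo "exit" >> dump_tables.sql
```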
3.2 Execute SQL queries as and when required
This approach is preferred because of the flexibility of fetching data as and when required for analysis.
A few useful queries follow. In each, the empty quoted strings are placeholders for values to be filled in:
To get the count of instances in different states for the BPEL composites within a time range:

select composite_name, count(*),
       DECODE(state, 0, 'INITIATED', 1, 'OPEN_RUNNING', 2, 'OPEN_SUSPENDED',
              3, 'OPEN_FAULTED', 4, 'CLOSED_PENDING_CANCEL', 5, 'CLOSED_COMPLETED',
              6, 'CLOSED_FAULTED', 7, 'CLOSED_CANCELLED', 8, 'CLOSED_ABORTED',
              9, 'CLOSED_STALE', 10, 'CLOSED_ROLLED_BACK', 'unknown') state
  from cube_instance
 where TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI:SS') >= ''
   and TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI') <= ''
 group by composite_name, state
-
14
To check for undelivered invoke messages (dlv_type = 1, state = 0) for a specific composite:

select * from dlv_message where dlv_type = 1 and state = 0 and composite_name = ''
To check for rejected messages for a specific composite:

select count(*) from rejected_message where composite_name = ''
Time taken by each instance of a process to execute within a specified duration:

select cikey, conversation_id, parent_id, ecid, state, status,
       domain_name, composite_name, cmpst_id,
       TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI:SS') cdate,
       TO_CHAR(modify_date, 'YYYY-MM-DD HH24:MI:SS') mdate,
       extract(day from (modify_date - creation_date)) * 24 * 60 * 60
     + extract(hour from (modify_date - creation_date)) * 60 * 60
     + extract(minute from (modify_date - creation_date)) * 60
     + extract(second from (modify_date - creation_date)) execution_time
  from cube_instance
 where TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI') >= ''
   and TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI') <= ''
List of all BPEL instances, their state, average, minimum, and maximum durations, and their counts:

select domain_name, component_name,
       DECODE(state, '0', 'INITIATED', '1', 'OPEN_RUNNING', '2', 'OPEN_SUSPENDED',
              '3', 'OPEN_FAULTED', '4', 'CLOSED_PENDING_CANCEL', '5', 'CLOSED_COMPLETED',
              '6', 'CLOSED_FAULTED', '7', 'CLOSED_CANCELLED', '8', 'CLOSED_ABORTED',
              '9', 'CLOSED_STALE', '10', 'CLOSED_ROLLED_BACK') state,
       TO_CHAR(AVG((TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 12, 2)) * 60 * 60)
                 + (TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 15, 2)) * 60)
                 +  TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18, 4))), '999990.000') AVG,
       TO_CHAR(MIN((TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 12, 2)) * 60 * 60)
                 + (TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 15, 2)) * 60)
                 +  TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18, 4))), '999990.000') MIN,
       TO_CHAR(MAX((TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 12, 2)) * 60 * 60)
                 + (TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 15, 2)) * 60)
                 +  TO_NUMBER(SUBSTR(TO_CHAR(modify_date - creation_date), 18, 4))), '999990.000') MAX,
       COUNT(1) count
  from cube_instance
 where TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI:SS') >= ''
   and TO_CHAR(creation_date, 'YYYY-MM-DD HH24:MI:SS') <= ''
   and composite_name IN ('')
 group by domain_name, component_name, state
To get information on currently running processes - shortest and longest running instances:

select * from (
  select composite_name AS "ProcessName",
         TO_CHAR(MIN(creation_date), 'YYYY-MM-DD HH:MI') AS "EarliestDate",
         COUNT(*) AS "TotalRunningProcesses",
         TO_NUMBER(SUBSTR(MIN(sysdate - creation_date), 1,
                   INSTR(MIN(sysdate - creation_date), ' '))) AS "ShortestRunning (Days)",
         SUBSTR(MIN(sysdate - creation_date),
                INSTR(MIN(sysdate - creation_date), ' ') + 1, 8) AS "ShortestRunning (Hours)",
         TO_NUMBER(SUBSTR(MAX(sysdate - creation_date), 1,
                   INSTR(MAX(sysdate - creation_date), ' '))) AS "LongestRunning (Days)",
         SUBSTR(MAX(sysdate - creation_date),
                INSTR(MAX(sysdate - creation_date), ' ') + 1, 8) AS "LongestRunning (Hours)"
    from cube_instance
   where state = 1
   group by composite_name
   order by "EarliestDate" DESC)
To find a string in a message:

select BLOBCONVERTER(DOCUMENT), ci.cikey, ci.state, ci.status
  from cube_instance ci, dlv_message dlvm, document_dlv_msg_ref dlvmr, xml_document xmld
 where ci.cikey = dlvm.cikey
   and dlvm.message_guid = dlvmr.message_guid
   and dlvmr.document_id = xmld.document_id
   and modify_date between to_date('', 'YYYY-MM-DD HH24:MI:SS')
                       and to_date('', 'YYYY-MM-DD HH24:MI:SS')
   and BLOBCONVERTER(DOCUMENT) like '% %'
To find the number of composites that took more than n, m, and p seconds, along with the average, minimum, and maximum times (n, m, and p are placeholders for the thresholds):

DECLARE
  min_time FLOAT;
  max_time FLOAT;
  avg_time FLOAT;
  count_n NUMBER;
  count_m NUMBER;
  count_p NUMBER;
  input_date VARCHAR2(20) := '';
  input_date1 VARCHAR2(20) := '';
  CURSOR c1 IS
    select DISTINCT composite_name from cube_instance
     where To_Date(To_Char(creation_date, 'DD-MM-YYYY HH24:MI:SS'), 'DD-MM-YYYY HH24:MI:SS')
           between To_Date(input_date, 'DD-MM-YYYY HH24:MI:SS')
               and To_Date(input_date1, 'DD-MM-YYYY HH24:MI:SS');
BEGIN
  Dbms_Output.put_line('COMPOSITE NAME' || ',' || 'MIN TIME' || ',' || 'MAX TIME' || ','
      || 'AVG TIME' || ',' || '> n SECONDS' || ',' || '> m SECONDS' || ',' || '> p SECONDS');
  FOR i IN c1 LOOP
    select Min(To_Number(Substr(To_Char(modify_date - creation_date), 18,
               Length(To_Char(modify_date - creation_date)))))
      into min_time from cube_instance
     where composite_name = i.composite_name
       and To_Date(To_Char(creation_date, 'DD-MM-YYYY HH24:MI:SS'), 'DD-MM-YYYY HH24:MI:SS')
           between To_Date(input_date, 'DD-MM-YYYY HH24:MI:SS')
               and To_Date(input_date1, 'DD-MM-YYYY HH24:MI:SS');
    select Max(To_Number(Substr(To_Char(modify_date - creation_date), 18,
               Length(To_Char(modify_date - creation_date)))))
      into max_time from cube_instance
     where composite_name = i.composite_name
       and To_Date(To_Char(creation_date, 'DD-MM-YYYY HH24:MI:SS'), 'DD-MM-YYYY HH24:MI:SS')
           between To_Date(input_date, 'DD-MM-YYYY HH24:MI:SS')
               and To_Date(input_date1, 'DD-MM-YYYY HH24:MI:SS');
    select Avg(To_Number(Substr(To_Char(modify_date - creation_date), 18,
               Length(To_Char(modify_date - creation_date)))))
      into avg_time from cube_instance
     where composite_name = i.composite_name
       and To_Date(To_Char(creation_date, 'DD-MM-YYYY HH24:MI:SS'), 'DD-MM-YYYY HH24:MI:SS')
           between To_Date(input_date, 'DD-MM-YYYY HH24:MI:SS')
               and To_Date(input_date1, 'DD-MM-YYYY HH24:MI:SS');
    select Count(*) into count_n from cube_instance
     where composite_name = i.composite_name
       and To_Number(Substr(To_Char(modify_date - creation_date), 18,
               Length(To_Char(modify_date - creation_date)))) > n
       and To_Date(To_Char(creation_date, 'DD-MM-YYYY HH24:MI:SS'), 'DD-MM-YYYY HH24:MI:SS')
           between To_Date(input_date, 'DD-MM-YYYY HH24:MI:SS')
               and To_Date(input_date1, 'DD-MM-YYYY HH24:MI:SS');
    select Count(*) into count_m from cube_instance
     where composite_name = i.composite_name
       and To_Number(Substr(To_Char(modify_date - creation_date), 18,
               Length(To_Char(modify_date - creation_date)))) > m
       and To_Date(To_Char(creation_date, 'DD-MM-YYYY HH24:MI:SS'), 'DD-MM-YYYY HH24:MI:SS')
           between To_Date(input_date, 'DD-MM-YYYY HH24:MI:SS')
               and To_Date(input_date1, 'DD-MM-YYYY HH24:MI:SS');
    select Count(*) into count_p from cube_instance
     where composite_name = i.composite_name
       and To_Number(Substr(To_Char(modify_date - creation_date), 18,
               Length(To_Char(modify_date - creation_date)))) > p
       and To_Date(To_Char(creation_date, 'DD-MM-YYYY HH24:MI:SS'), 'DD-MM-YYYY HH24:MI:SS')
           between To_Date(input_date, 'DD-MM-YYYY HH24:MI:SS')
               and To_Date(input_date1, 'DD-MM-YYYY HH24:MI:SS');
    Dbms_Output.put_line(i.composite_name || ',' || min_time || ',' || max_time || ','
        || avg_time || ',' || count_n || ',' || count_m || ',' || count_p);
  END LOOP;
END;
3.3 Analysis of AWR Report
Analyze the AWR report to get information on bottlenecks in database queries and objects. To gain a better understanding of the AWR report, refer to Appendix H.
3.4 Analysis of SOA server memory usage
Refer to Appendix J.
4 Reference
Linux man pages: http://www.linuxmanpages.com
Visual VM: http://visualvm.java.net
jVisualVM documentation: http://docs.oracle.com/javase/6/docs/technotes/tools/share/jvisualvm.html
jConsole documentation: http://docs.oracle.com/javase/6/docs/technotes/guides/management/jconsole.html
Monitor jRockit using jRockit Mission Control: http://docs.oracle.com/cd/E13222_01/wls/docs90/ConsoleHelp/taskhelp/monitoring/MonitorTheJRockitVirtualMachine.html
jRockit documentation: http://www.oracle.com/technetwork/middleware/jrockit/overview/missioncontrol-whitepaper-june08-1-130357.pdf