Practical Advice and Guidance on the use of PeopleTools Performance Monitor
David Kurtz
Go-Faster Consultancy Ltd.
www.go-faster.co.uk
Practical PPM ©2009 www.go-faster.co.uk 2
Who Am I?
• Oracle Database Specialist
  – Independent consultant
• System Performance tuning
  – PeopleSoft ERP
  – Oracle RDBMS
• Book
  – www.psftdba.com
• UKOUG
  – PeopleSoft Technology SIG Committee
Resources
• If you can’t hear me, say so now.
• Please feel free to ask questions as we go along.
• The presentation is available from
  – UKOUG Conference Website
  – www.go-faster.co.uk
  – blog.psftdba.com
Using PeopleSoft Performance Monitor
• Very Quick Overview
• Performance Tuning the Performance Monitor
• PPM Bugs & Fixes
• General Analyses
  – Events
  – Components
  – Network Latency
• Performance Trace Demo
Think!
• The way you think about performance is more important than the tools you use.
  – Mostly, you need to focus on time.
• Further reading:
  – The Goal – Eli Goldratt
  – Optimizing Oracle Performance – Millsap & Holt
    • www.method-r.com
    • http://oreilly.com/catalog/9780596005276/preview.html
PPM – Key Features
• PeopleTools is used to monitor PeopleTools
  – You don’t have to buy any more software
  – Monitoring system vs. monitored systems
    • Can and should use one monitoring system for many monitored systems
    • E.g. consolidated Portal/Application
  – Any version >= 8.44 can monitor any version
    • When it works!
[Architecture diagram: a Monitored System and a Monitoring System, each drawn as Browser (presentation & JavaScript), http / https, Web Server (presentation logic, PIAServlet), Tuxedo Message, Application Server (application logic; APPQ, PSAPPSRV), SQL, DBMS (application data & meta-data). Additional labelled components: PSMONITORSRV, PSPPMSRV, PPMIServlet, MonitorServlet, and Screen Paint / JavaScript in the browser.]
Performance Monitor Metrics
• Transactions
  – User activities in PIA that cause communications with application server
  – Sampled
  – Enabled to form a trace
• Events
  – Periodic samples
  – Usually initiated by monitoring agents
  – e.g. CPU, Tuxedo counters
Performance Monitor Transactions
• User activity in PIA
• Performance Monitoring Unit (PMU)
  – Hierarchy of transactions
• Similar to Oracle event 10046 trace
  – Recursive actions
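The PMU hierarchy can be pictured as a tree walk. What follows is a minimal Python sketch, not PeopleSoft code. It assumes each PMU row carries an instance ID and a parent instance ID (as PM_INSTANCE_ID and PM_PARENT_INST_ID do in the transaction tables) and that a parent ID of 0 marks the top of a tree. The dictionary keys and the sample durations are invented for illustration.

```python
from collections import defaultdict


def index_children(pmus):
    """Group PMU rows by the instance ID of their parent."""
    children = defaultdict(list)
    for pmu in pmus:
        children[pmu["parent_inst_id"]].append(pmu)
    return children


def render(children, parent_id=0, depth=0):
    """Return indented lines, one per PMU, walking the tree depth first."""
    lines = []
    for pmu in children.get(parent_id, []):
        lines.append("  " * depth + f'{pmu["trans_defn_id"]} ({pmu["duration"]})')
        lines.extend(render(children, pmu["instance_id"], depth + 1))
    return lines


# Hypothetical rows: a PIA request (101) with nested Jolt (115),
# Tuxedo service (400) and ICPanel (401) PMUs.
sample_pmus = [
    {"instance_id": 1, "parent_inst_id": 0, "trans_defn_id": 101, "duration": 1322},
    {"instance_id": 2, "parent_inst_id": 1, "trans_defn_id": 115, "duration": 1200},
    {"instance_id": 3, "parent_inst_id": 2, "trans_defn_id": 400, "duration": 1100},
    {"instance_id": 4, "parent_inst_id": 3, "trans_defn_id": 401, "duration": 1000},
]

for line in render(index_children(sample_pmus)):
    print(line)
```

Reading the output top down is similar to reading recursive calls in a 10046 trace: each level of indentation is work done on behalf of the line above it.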
Transactions
• Stored to PSPMTRANSCURR table
  – As PMUs are closed, they are moved to PSPMTRANSHIST
  – Later deleted or archived to PSPMTRANSARCH
• ERD downloadable from Metalink
  – And you will need to get to grips with it.
ERD of Transaction
[Entity-relationship diagram of PSPMTRANSHIST (C). Related tables: PSPMSYSDEFN (B), PSPMAGENT (A), PSPMTRANSDEFN (E), PSPMCONTEXTDEFN (C1, C2, C3) and PSPMMETRICDEFN (M1 to M7). Join columns shown: PM_SYSTEMID; PM_AGENTID; PM_TRANS_DEFN_SET, PM_TRANS_DEFN_ID; PM_CONTEXTID_n (n=1-3); PM_METRICIDn (n=1-7). One relationship is labelled "search criteria".]
Metrics
• Metric IDs specified on transaction definition PSPMTRANSDEFN
  – Metric types defined on PSPMMETRICDEFN
• Type 1: Counters (including timers)
  – Metric 4: Total Servlet Request time (ms)
• Type 2: Gauges
  – Metric 102: %CPU Used
• Type 3: Numeric Identifier
  – Metric 20: HTTP response code
• Type 4: String Identifier
  – Metric 27: File Name
Transaction 101
• Reported at entry and exit of PIA servlet
  – Context 1: Action=View Page
  – Context 2: IP Address=10.0.0.3
  – Context 3: Session ID=AN7tpzSwpZc4kt9k8 . . .
  – Additional Description: http://go-faster-3:7201/psc/ps/EMPLOYEE/HRMS/c/UTILITIES.PTPERF_TEST.GBL
Transaction 101
• 4 metrics
  – Metric 19: Response Size (bytes) = 17613
  – Metric 20: Response Code = 200
  – Metric 22: Static Content Count = 0
  – Metric 23: Is this a Pagelet? = 0
Transaction Query Results
PM_TOP_INST_ID PM_INSTANCE_ID PM_PARENT_INST_ID DBNAME
PM_HOST_PORT
PM_DOMAIN_NAME PM_AGENT_TYPE
PM_INSTANCE PM_AGENT_STRT_DTTM PM_MON_STRT_DTTM
OPRID PM_PERF_TRACE PM_PROCESS_ID
PM_TRANS_DEFN_ID DESCR60
'CONTEXT1:'||C.PM_CONTEXTID_1||'-'||C1.PM_CONTEXT_LABEL||'='||C.PM_CONTEXT_VALUE …
PM_TRANS_DURATION
'METRIC1:'||M1.PM_METRICLABEL||'='||C.PM_METRIC_VALUE1 …
PM_ADDTNL_DESCR
--------------------------------------------------------------------------------
824633721163 824633721163 0 HR88
go-faster-3:7201:7202
ps WEBSERVER
-1 16:12:07 14.06.2004 16:12:09 14.06.2004
PS PS: 2004-06-14 16:01:11 0
101 Reported at entry and exit of PIA servlet
Context1:3-Session ID=AN7tpzSwpZc4kt9k8QNaCcYUWWh9FaFt!1963244185!1087224685145
Context2:2-IP Address=10.0.0.3
Context3:1-Action=View Page
1322
Metric1:Response Size (bytes)=17613
Metric2:Response Code=200
Metric3:Static Content Count=0
Metric4:Is this a Pagelet?=0
Metric5:=0
Metric6:=0
Metric7:=
http://go-faster-3:7201/psc/ps/EMPLOYEE/HRMS/c/UTILITIES.PTPERF_TEST.GBL
Events
• Do not have an explicit context
  – Collecting agent provides context
• Stored in PSPMEVENTHIST
  – Later deleted or archived to PSPMEVENTARCH
ERD of Events
[Entity-relationship diagram of PSPMEVENTHIST (C). Related tables: PSPMEVENTDEFN (E), PSPMSYSDEFN (B), PSPMAGENT (A) and PSPMMETRICDEFN (M1 to M7). Join columns shown: PM_AGENTID; PM_EVENT_DEFN_SET, PM_EVENT_DEFN_ID; PM_SYSTEMID; PM_METRICIDn (n=1-7). One relationship is labelled "search criteria".]
Event Query Results
DBNAME PM_HOST_PORT
PM_AGENT_TYPE PM_DOMAIN_NAME
PM_INSTANCE PM_AGENT_DTTM PM_INSTANCE_ID
PM_EVENT_DEFN_ID DESCR60
'METRIC1:'||M1.PM_METRICLABEL||'='||C.PM_METRIC_VALUE1
'METRIC2:'||M2.PM_METRICLABEL||'='||C.PM_METRIC_VALUE2
'METRIC3:'||M3.PM_METRICLABEL||'='||C.PM_METRIC_VALUE3
'METRIC4:'||M4.PM_METRICLABEL||'='||C.PM_METRIC_VALUE4
'METRIC5:'||M5.PM_METRICLABEL||'='||C.PM_METRIC_VALUE5
'METRIC6:'||M6.PM_METRICLABEL||'='||C.PM_METRIC_VALUE6
'METRIC7:'||M7.PM_METRICLABEL||'='||C.PM_METRIC_VALUE7
PM_ADDTNL_DESCR
--------------------------------------------------------------------------------
HR88 go-faster-3:7201:7202
WEBSERVER ps
-1 16:12:08 14.06.2004 824633721166
600 PSPING metrics fowarded from browser
Metric1:Network Latency (ms)=435
Metric2:WebServer Latency (ms)=100
Metric3:AppServer Latency (ms)=561
Metric4:DB Latency (millisecs)=451
Metric5:=0
Metric6:=0
Metric7:IP Address=10.0.0.3
PS;AN7tpzSwpZc4kt9k8QNaCcYUWWh9FaFt!1963244185!1087224685145
Tuning Performance Monitor
• Some of the delivered analytics do not perform well with even moderate data volumes
  – Set up the monitoring system to self-monitor
  – Then you can generate PPM traces on the analytics
  – You will need additional indexes
    • http://blog.psftdba.com/2006/04/performance-tuning-performance-monitor.html (YMMV)
Purge Process
• Data normally held in history tables
  – PSPMTRANSHIST, PSPMEVENTHIST
• Clone tables
  – PSPMTRANSHISTCL, PSPMEVENTHISTCL
• PPM writes to tables specified in PSPMTABLEMAP
  – Archive process switches this to clone tables
select * from pspmtablemap;
PM_TRANS_TBL_NAME  PM_EVENT_TBL_NAME
------------------ ------------------
PSPMTRANSHIST      PSPMEVENTHIST
PSPMTABLEMAP
• Archive/Purge switches PPM destination
  – Prevents concurrent INSERT and DELETE/Query operations
  – Saves read consistency problems on Oracle
  – Saves page locks on other databases
• PPM appears not to collect data during this processing
  – But it is written to clone tables
  – Archive process moves it to main history tables after purge
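The switch between history and clone tables can be modelled in a few lines. This is a toy Python sketch of the behaviour described on the slide, not the delivered archive program. The class, its method names and the `age_days` key are hypothetical, and the exact point at which the real process switches the map back is an assumption.

```python
class PpmStore:
    """Toy model of a history table, its clone, and the table map."""

    def __init__(self):
        self.tables = {"PSPMTRANSHIST": [], "PSPMTRANSHISTCL": []}
        # Stands in for the PM_TRANS_TBL_NAME column of PSPMTABLEMAP.
        self.trans_tbl_name = "PSPMTRANSHIST"

    def insert(self, row):
        """PPM writes to whichever table the map currently names."""
        self.tables[self.trans_tbl_name].append(row)

    def archive(self, max_age_days, arrivals=()):
        """Purge old history while new rows are diverted to the clone."""
        hist = self.tables["PSPMTRANSHIST"]
        clone = self.tables["PSPMTRANSHISTCL"]
        # 1. Redirect new inserts to the clone table.
        self.trans_tbl_name = "PSPMTRANSHISTCL"
        # Rows that arrive while the purge is running land in the clone.
        for row in arrivals:
            self.insert(row)
        # 2. Purge the history table with no concurrent inserts against it.
        hist[:] = [r for r in hist if r["age_days"] < max_age_days]
        # 3. Move the diverted rows into the history table.
        hist.extend(clone)
        clone.clear()
        # 4. Point the map back at the history table (assumed ordering).
        self.trans_tbl_name = "PSPMTRANSHIST"


store = PpmStore()
store.insert({"id": 1, "age_days": 10})
store.insert({"id": 2, "age_days": 1})
store.archive(max_age_days=7, arrivals=[{"id": 3, "age_days": 0}])
print([r["id"] for r in store.tables["PSPMTRANSHIST"]])
```

While step 2 runs, the history table looks frozen to anyone querying it, which matches the observation that PPM appears not to collect data during the purge.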
Purge Process Can’t Keep Up
• Platform-generic query leads to full scan
  – Even if data has been deleted (manually), high water marks (HWM) on tables are not reset
• Customisation
  – Oracle-specific statement
  – May need to rebuild HIST tables to reset HWM
    • In which case manually set PSPMTABLEMAP to clones, rebuild history tables, run archive process
Performance Fix for Purge
• Vanilla code
&TransHistSQL.Open("SELECT … AND %DateTimeDiff(X.PM_MON_STRT_DTTM, %CurrentDateTimeIn) >= (PM_MAX_HIST_AGE * 24 * 60)");
• Expands to
… AND ROUND((CAST(( SYSDATE) AS DATE) - CAST((X.PM_MON_STRT_DTTM) AS DATE)) * 1440, 0) >= (PM_MAX_HIST_AGE * 24 * 60)
• My suggestion
… AND X.PM_MON_STRT_DTTM < SYSDATE - Z.PM_MAX_HIST_AGE
• See blog entry http://blog.psftdba.com/2008/05/performance-tuning-performance-monitor.html
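The two predicates select almost the same rows. The suggested form leaves the column bare on one side of the comparison, which generally lets the database consider an index on it. The Python sketch below only checks the arithmetic; it is not the PeopleCode or the SQL itself. The function names and sample dates are invented, and the rounding is written to mimic ROUND(x, 0) for positive values.

```python
import math
from datetime import datetime, timedelta


def vanilla_purge(start, now, max_hist_age_days):
    """Delivered form: age in whole minutes >= maximum age in minutes."""
    age_minutes = math.floor((now - start).total_seconds() / 60 + 0.5)
    return age_minutes >= max_hist_age_days * 24 * 60


def suggested_purge(start, now, max_hist_age_days):
    """Suggested form: compare the start time with a cut-off date."""
    return start < now - timedelta(days=max_hist_age_days)


now = datetime(2009, 12, 1, 12, 0, 0)   # stands in for SYSDATE
max_age = 7                             # stands in for PM_MAX_HIST_AGE

cases = {
    "8 days old": now - timedelta(days=8),
    "6 days old": now - timedelta(days=6),
    "10 seconds short of 7 days": now - timedelta(days=7) + timedelta(seconds=10),
}
for label, start in cases.items():
    print(label, vanilla_purge(start, now, max_age), suggested_purge(start, now, max_age))
```

The forms disagree only for rows within about half a minute of the cut-off, because the delivered expression rounds to whole minutes and uses >= rather than <. For a purge with a retention period measured in days, that difference is unlikely to matter.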
How much data?
• Control sampling
  – Proportion of transactions collected
    • Depends upon activity on system
    • On busy self-service system as little as 1 in 5000
  – Event sampling frequency
    • For each agent
    • 5 minutes – 15 minutes
    • Depends on whether you want to be able to see short-lived behaviours.
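A rough volume estimate follows directly from these two settings. The Python sketch below is back-of-envelope arithmetic only. The agent count, the daily PMU count and the one-event-per-sample default are illustrative assumptions, and real agents may report several event types per interval.

```python
def event_rows_per_day(agents, interval_seconds, events_per_sample=1):
    """Event rows per day if every agent reports once per interval."""
    samples_per_day = 86400 // interval_seconds
    return agents * samples_per_day * events_per_sample


def sampled_pmus_per_day(pmus_per_day, sample_one_in):
    """Top-level transactions kept per day at a 1-in-N sampling rate."""
    return pmus_per_day // sample_one_in


# Hypothetical system: 20 agents and 10 million PIA requests per day.
print(event_rows_per_day(agents=20, interval_seconds=300))   # 5-minute interval
print(event_rows_per_day(agents=20, interval_seconds=900))   # 15-minute interval
print(sampled_pmus_per_day(pmus_per_day=10_000_000, sample_one_in=5000))
```

Moving from a 15-minute to a 5-minute interval triples the event volume, which is the price of being able to see short-lived behaviour.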
Recent problems in PT8.49
• Prior to patch 8.49.14
  – Ports left open in close_wait
  – Unix systems run out of ports
    • Get ‘Application Server is down’ errors
  – No limit on Windows
    • But the system does progressively slow down
  – POC 752524 applied to 8.49.06
    • Tuxedo connections capped at 121-127
Outstanding problems in 8.49.14
• Tuxedo queuing not reported
  – Events 300 and 301
• Tuxedo connections not reported
  – Event 300
Practical Examples
• Simple Graphs of Events
• Cumulative Frequency Distributions
• Network Latency
• Performance Trace
Simple Event Graphs
• You set an event collection interval
• All domains collect at that interval
  – But each has its own clock
  – Each collects at different times.
• If you have multiple web/app servers?
  – Need to aggregate for system-wide view
  – Interpolate between points?
    • (PL/SQL package: see notes for this slide)
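One way to get a system-wide figure from domains that sample at different moments is to interpolate each domain's series onto a common set of times and then add them up. The Python sketch below illustrates that idea with simple linear interpolation. It is not the PL/SQL package mentioned on the slide, and the function names and sample values are hypothetical.

```python
from bisect import bisect_right


def interpolate(samples, t):
    """Linearly interpolate a time-sorted list of (time, value) pairs at t.

    Returns None if t falls outside the sampled range.
    Assumes sample times are strictly increasing.
    """
    if not samples or t < samples[0][0] or t > samples[-1][0]:
        return None
    times = [s[0] for s in samples]
    i = bisect_right(times, t)
    if i == len(samples):          # t equals the last sample time
        return samples[-1][1]
    t0, v0 = samples[i - 1]
    t1, v1 = samples[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def system_wide(domains, grid):
    """Sum interpolated values across all domains at each grid time.

    A grid point gets None unless every domain has data covering it.
    """
    totals = []
    for t in grid:
        values = [interpolate(samples, t) for samples in domains.values()]
        totals.append(None if None in values else sum(values))
    return totals


# Two hypothetical domains sampling on different clocks (time in seconds).
domains = {
    "domain_a": [(0, 0.0), (10, 10.0)],
    "domain_b": [(5, 100.0), (15, 200.0)],
}
print(system_wide(domains, grid=[0, 5, 10]))
```

Refusing to report a total when any domain has no data at that time avoids a misleading dip in the aggregated graph.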
Simple Event Graphs
• Raw data from PSPMEVENTHIST or PSPMEVENTARCH
  – Extract into working storage tables
    • Possibly two levels
  – Aggregating as you go
JVM% Free
[Chart: y-axis "%JVM Used", 0 to 100; x-axis weekly from Sat 11.10.08 to Sat 8.11.08; series CS_PROD C]
JVM Sessions
[Chart: y-axis "JVM Sessions", 0 to 2500; x-axis weekly from Sat 11.10.08 to Sat 8.11.08; series CS_PROD C]
JVM Busy Threads
[Chart: y-axis "JVM Busy Threads", 0 to 80; x-axis weekly from Sat 11.10.08 to Sat 8.11.08; series CS_PROD C]
Application Server Requests
[Chart: y-axis "Requests / Sample Period", 0 to 4000; x-axis weekly from Sat 11.10.08 0:00 to Sat 8.11.08 0:00; series CS_PROD C PSAPPSRV]
Jolt Message Sizes
• Transaction 115
  – Size of Jolt messages into and out of Tuxedo
  – Message written to disk
    • If message larger than specified size
    • or would cause queue to become ¾ full
  – Default queue size is 64Kb
    • Kernel parameter (Windows too)
  – Most systems need 128-256Kb
Cumulative Frequency – ntile()
SELECT <key>, pctile, MIN(<value>)
FROM (
SELECT <key>, <value>,
NTILE(100)
OVER (PARTITION BY <key> ORDER BY <value>) AS pctile
FROM <table>
)
GROUP BY <key>, pctile
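To see what the NTILE query produces, the bucketing can be reproduced outside the database. The Python sketch below emulates the way NTILE hands out rows (bucket sizes differ by at most one, with the larger buckets first) and then takes the minimum value in each bucket. The function names are hypothetical and the input is made-up data, not PPM output.

```python
def ntile_buckets(n_rows, n_buckets):
    """Return the 1-based bucket number for each of n_rows ordered rows."""
    base, extra = divmod(n_rows, n_buckets)
    assignment = []
    for bucket in range(1, n_buckets + 1):
        # The first `extra` buckets each take one additional row.
        size = base + (1 if bucket <= extra else 0)
        assignment.extend([bucket] * size)
    return assignment


def percentile_minimums(values, n_buckets=100):
    """Map each bucket number to the smallest value that falls in it."""
    ordered = sorted(values)
    minimums = {}
    for bucket, value in zip(ntile_buckets(len(ordered), n_buckets), ordered):
        # Values arrive in ascending order, so the first one seen is the minimum.
        minimums.setdefault(bucket, value)
    return minimums


# Made-up message sizes 1..200: two rows land in each of 100 buckets.
profile = percentile_minimums(range(1, 201))
print(profile[1], profile[50], profile[100])
```

Plotting bucket number against the minimum value gives the cumulative frequency curve: the value at bucket 90, for example, is the size that roughly 90% of messages do not exceed.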
[Chart: y-axis "Jolt Message Size (bytes)", log scale 1,000 to 10,000,000; x-axis "%Tile", 0 to 100; series JOLT_BYTES_SEND, JOLT_BYTES_RCVD]
Anatomy of a Transaction
• Simple PIA Transaction
  • 101 – PIA entry/exit
    – 115 – Jolt Message
      • 400 – Tuxedo Service
        – 401 ICPanel
        – 410 ICScript
Anatomy of a Transaction
• Portal PIA Transaction
  – PMU is consolidated across databases.
  • 106 – Portlet
    – 115 – Jolt Message
      • 400 – Tuxedo Service
        – 401 ICPanel
        – 410 ICScript
Transaction Duration Distribution
[Chart: y-axis "Duration (ms)", log scale 0.001 to 100; x-axis "%Tile", 0 to 100; series DUR401, DUR400, DUR115, DUR101, QDUR]
Individual Components
• Now try the same analysis for specific components
  – Determine the top-n components by cumulative execution time
    • PPM analytic uses only transaction 401
      – Doesn’t take web server or queuing time into account.
    • I prefer to use transactions 101 and 106
      – But you have to join the transactions.
      – Component identification from transaction 401 contexts
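The join described above can be sketched as follows: take the component name from the 401 PMU in each tree, then charge the duration of the enclosing 101 or 106 PMU to that component. This is an illustrative Python sketch, not the query used for the slides. The dictionary keys are simplified stand-ins for columns such as PM_TOP_INST_ID, PM_TRANS_DEFN_ID and PM_TRANS_DURATION, the `component` key stands for whichever 401 context holds the component name, and the sketch assumes one 401 PMU per tree.

```python
from collections import defaultdict


def top_components(pmus, n=3):
    """Rank components by total duration of their 101/106 PMUs."""
    # Which component does each PMU tree belong to? Taken from its 401 PMU.
    component_of = {
        p["top_inst_id"]: p["component"]
        for p in pmus
        if p["trans_defn_id"] == 401
    }
    totals = defaultdict(float)
    for p in pmus:
        tree = p["top_inst_id"]
        if p["trans_defn_id"] in (101, 106) and tree in component_of:
            totals[component_of[tree]] += p["duration"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical PMUs: three requests, two of them for the same component.
sample = [
    {"top_inst_id": 1, "trans_defn_id": 101, "duration": 1000, "component": None},
    {"top_inst_id": 1, "trans_defn_id": 401, "duration": 600, "component": "COMP_A"},
    {"top_inst_id": 2, "trans_defn_id": 101, "duration": 500, "component": None},
    {"top_inst_id": 2, "trans_defn_id": 401, "duration": 300, "component": "COMP_B"},
    {"top_inst_id": 3, "trans_defn_id": 101, "duration": 700, "component": None},
    {"top_inst_id": 3, "trans_defn_id": 401, "duration": 650, "component": "COMP_A"},
]
print(top_components(sample))
```

Ranking on the 101 duration rather than the 401 duration means web server and queuing time count towards each component's total.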
UC_SAQ_PHOTO.GBL, UC_SAQ_PHOTOGRAPH2, #ICOK
• Component that uploads an attachment
[Chart: y-axis "Duration (ms)", log scale 0.001 to 1000; x-axis "%Tile", 0 to 100; series DUR401, DUR400, DUR115, DUR101, QDUR; annotation: "This area is Time spent in web server JVM"]
UC_SAQ_PHOTO.GBL UC_SAQ_PHOTOGRAPH2 #ICOK
• Component that uploads an attachment
[Chart: y-axis "Sizes (bytes)", log scale 1,000 to 10,000,000; x-axis "%Tile", 0 to 100; series JOLT_BYTES_SEND, JOLT_BYTES_RCVD, COMPONENT_BUFFER; annotation: "Jolt Message to App Server increases – large attachments"]
UC_ENROL_QCKAPPRVL.GBL, UC_ENROL_QCKAPPRVL, Launch Page/Search Page
• Typical Component
[Chart: y-axis "Duration (ms)", log scale 0.01 to 1000; x-axis "%Tile", 0 to 100; series DUR401, DUR400, DUR115, DUR101, QDUR; annotations: "Time spent in ICPanel Service – possibly database or PeopleCode", "Interesting jump in response time"]
UC_CG_INQUIRY.GBL, UC_CG_INQUIRY, Click PeopleCode Command Button for Field UC_CG_WRK.REFRESH_BTN
• Custom PeopleCode Button
[Chart: y-axis "Duration (ms)", log scale 0.001 to 1000; x-axis "%Tile", 0 to 100; series DUR401, DUR400, DUR115, DUR101, QDUR; annotations: "Time spent in ICPanel Service – possibly database or PeopleCode", "Significant jump in response time"]
UC_SAQ_PHOTO.GBL UC_SAQ_PHOTOGRAPH2 #ICCancel
• Component that uploads attachment
[Chart: y-axis "Duration (ms)", log scale 0.001 to 1000; x-axis "%Tile", 0 to 100; series DUR401, DUR400, DUR115, DUR101, QDUR; annotation: "Time spent in web server JVM"]
Network Latency
• Most transactions are sampled
• But three transactions are always recorded
  – See PSPMTRANSDEFN.PM_SAMPLING_ENABLE
  – 108: User Session logout, expiration, timeout, or error
  – 109: User Session began (user logged in)
  – 116: Redirected round trip time (network latency)
Transaction 116
• Network round trip from web server to browser and back again
  – Includes network transmission time
  – Browser response time
  – Client IP address
    • Although that could be load balancer or NAT
  – Operator ID
    • LOCATION from HR database?
Analysis by Client IP
• IP addresses are of routers, not clients
Mon Jun 30                                                        page 1
                      Login Durations by Client IP
IP Address          MIN    AVG    MED        MAX          VAR NUM_EVENTS
---------------- ------ ------ ------ ---------- ------------ ----------
174.149.127.223   0.000  0.128  0.094      0.390         18.3         12
174.149.1.200     0.047  0.356  0.321      3.046         57.8        290
174.149.126.149   0.015  0.604  0.156     14.077      1,835.0       1288
192.168.1.171     0.032  0.692  0.157      6.392      2,064.6       1226
174.149.126.147   0.031  0.712  0.156     15.062      2,322.9       1302
192.168.1.172     0.031  0.742  0.172     61.321      5,218.6       1234
192.168.1.170     0.047  0.748  0.172     12.989      2,342.9       1274
174.149.126.148   0.031  0.762  0.141     44.918      3,963.8       1300
193.113.139.184   0.422  5.380  5.423     20.951        589.8        803
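A report of this shape is a group-by over the transaction 116 durations. The Python sketch below shows the calculation on made-up data; it is not the SQL behind the report above. The function name and input layout are hypothetical, and the variance here is a plain sample variance, which may not be how the VAR column was derived.

```python
from statistics import mean, median, variance


def summarise_by_ip(rows):
    """Summarise (ip_address, duration) pairs per IP, ordered by average."""
    by_ip = {}
    for ip, duration in rows:
        by_ip.setdefault(ip, []).append(duration)
    report = []
    for ip, durations in by_ip.items():
        report.append({
            "ip": ip,
            "min": min(durations),
            "avg": mean(durations),
            "med": median(durations),
            "max": max(durations),
            # Sample variance needs at least two observations.
            "var": variance(durations) if len(durations) > 1 else 0.0,
            "num_events": len(durations),
        })
    return sorted(report, key=lambda r: r["avg"])


# Made-up login redirect durations in seconds.
rows = [
    ("10.0.0.3", 0.25), ("10.0.0.4", 1.0), ("10.0.0.3", 0.5),
    ("10.0.0.3", 0.75), ("10.0.0.4", 3.0),
]
for line in summarise_by_ip(rows):
    print(line)
```

Comparing the median with the average is the useful part: an address whose average is far above its median has a few very slow logins rather than a uniformly slow link.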
HR location of OPRID
• First three lines are different network topology
• Operator may not actually be at stated location
  * especially home workers
Mon Jun 30                                                        page 1
               Login Durations by Location (min 10 logins)
LOCATION   CITY             Cty    MIN    AVG    MED     MAX        VAR EVENTS
---------- ---------------- --- ------ ------ ------ ------- ---------- ------
EXT-00-IN  Bangalore        IND  0.109  5.488  5.468  61.321    5,099.0    688
EXT-00-CZ  Brno             CZE  0.109  4.991  5.218  20.951    2,125.1    555
EXT-BK-UK  Buckingham       GBR  0.062  0.862  0.281   6.578    3,100.8    224
HOME       Home Worker*     GBR  0.031  0.299  0.188   9.109      377.7    489
INT-L3     Liverpool        GBR  0.062  0.211  0.110   2.406      140.6     40
INT-BK     Buckingham       GBR  0.032  0.233  0.078   1.266      114.4     30
INT-WS     Walsall          GBR  0.031  0.086  0.063   0.359        3.5     46
Login Redirect Duration
[Chart: y-axis "Duration (s)", 0 to 1; x-axis "Time", daily from Fri 27.6.08 00:00 to Fri 4.7.08 00:00; annotation: "After we fixed the problem…"]
Analytics: Top Components
Performance Trace
• Generates a group of PMUs for activity in a user session
  – Choose an ID to identify records later
Performance Trace
Performance Monitoring Unit
• Look at PMU Tree
• Demonstration