Showing posts with label Instrumentation. Show all posts
Showing posts with label Instrumentation. Show all posts

Thursday, November 02, 2017

nVision Performance Tuning: 5. Additional Instrumentation of nVision

This blog post is part of a series that discusses how to get optimal performance from PeopleSoft nVision reporting as used in General Ledger.

One of the challenges of tuning and monitoring nVision is to be able to identify each report being run. Calls to Oracle instrumentation package dbms_application_info  were added to the component processor in PeopleTools 8.50, and to Application Engine in PeopleTools 8.52.  However, COBOL, nVision, and SQR were never instrumented.
Therefore, there is still a place for the psftapi package and trigger.  When a PeopleSoft batch process starts, it sets the status on its request record on the Process Scheduler request record, psprcsrqst to 7, thus indicating that it is processing.  A trigger on that table fires on that update and calls the psftapi package.  The package sets module and action to the process name and process instance, and also stores the process instance number in a package global variable that can be read with another procedure in the package.  Every scheduled process will have module and action set to something meaningful.  Any PeopleSoft instrumentation will simply overwrite these values.  A sessions module and action are picked up by Oracle monitoring tools, in particular, they are also stored in the Active Session History (ASH).
However, nVision reports always run as the same process name, either NVSRUN for a single report, RPTBOOK for a report book of many reports, or DRILLDWN for a nVision drill-down query.  Knowing the process instance is useful because then we can look up the operator and the run control ID
However, we also need to know the report ID being run.  When each individual nVision report starts it queries the runtime parameters from the PS_NVS_REPORT PeopleTools table.  There is no update, so it is not possible to capture this with a DML trigger.  Instead, you can create a fine-grained audit1 policy on the query with a PL/SQL handler.  The handler package is then invoked by the audit policy.
BEGIN
  DBMS_FGA.ADD_POLICY(
   object_schema      => 'SYSADM',
   object_name        => 'PS_NVS_REPORT',
   policy_name        => 'PS_NVS_REPORT_SEL',
   handler_module     => 'GFC_FGA_NVISION_HANDLER',
   enable             =>  TRUE,
   statement_types    => 'SELECT',
   audit_trail        =>  DBMS_FGA.DB + DBMS_FGA.EXTENDED);
END;
/
The handler package runs in the session that triggered the audit.  It can access the audit record and extract the string of colon-separated bind variables thus obtaining the report ID and business unit.  It updates the session action attribute in the same way as psftapi.sql.
CREATE OR REPLACE PROCEDURE sysadm.gfc_fga_nvision_handler
(object_schema VARCHAR2
,object_name   VARCHAR2
,policy_name   VARCHAR2)
AS
  l_sqlbind VARCHAR2(4000);
  l_parm1   VARCHAR2(30);
  l_parm2   VARCHAR2(30);
  l_parm3   VARCHAR2(30);
  l_parm4   VARCHAR2(30);
BEGIN
  BEGIN
    SELECT x.lsqlbind
    ,      SUBSTR(x.lsqlbind,x.start1,LEAST(30,NVL(x.end1,x.lensqlbind+1)-x.start1)) parm1
    ,      SUBSTR(x.lsqlbind,x.start2,LEAST(30,NVL(x.end2,x.lensqlbind+1)-x.start2)) parm2
    ,      SUBSTR(x.lsqlbind,x.start3,LEAST(30,NVL(x.end3,x.lensqlbind+1)-x.start3)) parm3
    ,      SUBSTR(x.lsqlbind,x.start4,LEAST(30,NVL(x.end4,x.lensqlbind+1)-x.start4)) parm4
    INTO   l_sqlbind, l_parm1, l_parm2, l_parm3, l_parm4
    FROM   (
      SELECT l.*
      ,      NULLIF(REGEXP_INSTR(lsqlbind,' #[0-9]+\([0-9]+\)\:',1,1,1,'i'),0) start1
      ,      NULLIF(REGEXP_INSTR(lsqlbind,' #[0-9]+\([0-9]+\)\:',1,2,0,'i'),0) end1
      ,      NULLIF(REGEXP_INSTR(lsqlbind,' #[0-9]+\([0-9]+\)\:',1,2,1,'i'),0) start2
      ,      NULLIF(REGEXP_INSTR(lsqlbind,' #[0-9]+\([0-9]+\)\:',1,3,0,'i'),0) end2
      ,      NULLIF(REGEXP_INSTR(lsqlbind,' #[0-9]+\([0-9]+\)\:',1,3,1,'i'),0) start3
      ,      NULLIF(REGEXP_INSTR(lsqlbind,' #[0-9]+\([0-9]+\)\:',1,4,0,'i'),0) end3
      ,      NULLIF(REGEXP_INSTR(lsqlbind,' #[0-9]+\([0-9]+\)\:',1,4,1,'i'),0) start4
      ,      NULLIF(REGEXP_INSTR(lsqlbind,' #[0-9]+\([0-9]+\)\:',1,5,1,'i'),0) end4
      ,      LENGTH(lsqlbind) lensqlbind
      FROM   sys.fga_log$ l
    ) x
    WHERE  x.sessionid = USERENV('SESSIONID')
    AND    x.entryid   = USERENV('ENTRYID')
    AND    x.obj$name  = 'PS_NVS_REPORT';
  EXCEPTION
    WHEN no_data_found THEN
      RAISE_APPLICATION_ERROR(-20000,'GFC_FGA_NVISION_HANDER: No Audit Row');
  END;

  IF l_parm4 IS NULL THEN
    l_parm4 := l_parm3;
    l_parm3 := l_parm2;
    l_parm2 := l_parm1;
  END IF;

  IF l_parm4 IS NULL THEN
    l_parm4 := l_parm3;
    l_parm3 := l_parm2;
  END IF;

  IF l_parm4 IS NULL THEN
    l_parm4 := l_parm3;
  END IF;

  dbms_output.put_line(l_sqlbind);
  dbms_output.put_line(l_parm1);
  dbms_output.put_line(l_parm2);
  dbms_output.put_line(l_parm3);
  dbms_output.put_line(l_parm4);

  dbms_application_info.set_action(SUBSTR('PI='||psftapi.get_prcsinstance()||':'||l_parm4||':'||l_parm3,1,64));
--EXECUTE IMMEDIATE 'ALTER SESSION SET TRACEFILE_IDENTIFIER=''PI='||psftapi.get_prcsinstance()||':'||l_parm4||':'||l_parm3||'''';
END;
/
The action attribute is then picked up by the ASH data.  It is easy to extract the report ID and business unit from the action string with regular expressions, as in this example query.
set lines 160 trimspool on
column module format a12
column action format a32
column client_id format a12 
column prcsinstance format a9 heading 'Process|Instance'
column business_unit format a8 heading 'Business|Unit'
column report_id format a10
select DISTINCT module, action, client_id
,      REGEXP_SUBSTR(h.action,'[[:digit:]]+') prcsinstance
,      substr(regexp_substr(h.action,':([[:alnum:]])+',1,2),2) business_unit
,      substr(regexp_substr(h.action,':([A-Za-z0-9_-])+',1,1),2) report_id
from   v$active_session_History h
where  program like 'PSNVS%'
/
Here you can see how process instance, report ID and business unit are held in action and how they can be extracted.  Now, it is possible to profile ASH data for nVision processes, find the long-running SQL and determine which layout it came from.
                                                           Process   Business
MODULE       ACTION                           CLIENT_ID    Instance  Unit     REPORT_ID
------------ -------------------------------- ------------ --------- -------- ----------
RPTBOOK      PI=1780508:GBGL224S:UK001        GBNVISION    1780508   UK001    GBGL113S
RPTBOOK      PI=1780509:GBGL010E:UK002        GBNVISION    1780509   UK002    GBGL010E
RPTBOOK      PI=1780502:GBGL91PF:UK001        GBNVISION    1780502   UK001    GBGL91PF
RPTBOOK      PI=1780502:GBGL91FR:UK001        GBNVISION    1780502   UK001    GBGL91FR
RPTBOOK      PI=1780502:GBGL91GB:UK001        GBNVISION    1780502   UK001    GBGL91GB
RPTBOOK      PI=1780502:GBGL91DM:UK002        GBNVISION    1780502   UK002    GBGL91DM
RPTBOOK      PI=1780506:GBEXP2AM:UK001        GBNVISION    1780506   UK001    GBEXP2AM
RPTBOOK      PI=1780509:Processing            GBNVISION    1780509            Processing
RPTBOOK      PI=1780500:GBGL113S:UK003        GBNVISION    1780500   UK003    GBGL113S
RPTBOOK      PI=1780509:GBGL010E:UK000        GBNVISION    1780508   UK000    GBGL010E 
This code in this blog is available on github.

Other recommendations

  • Create an index on SYS.FGA_LOG$ to support the query in the FGA handler package.
CREATE INDEX sys.fga_log$_obj$name
ON sys.fga_log$ (obj$name, sessionid, entryid)
TABLESPACE sysaux PCTFREE 1 COMPRESS 1 
/
  • Put a regular purge of the FGA_LOG$ table in place, to purge rows after, say, 31 days.  Otherwise, it will grow indefinitely, one row will be added for every nVision report run.
DELETE FROM fga_log$ 
WHERE obj$name = 'PS_NVS_REPORT' 
AND ntimestamp# 
  • Move SYS.AUD$ and SYS.FGA_LOG$ from the SYSTEM tablespace to another ASSM tablespace using the instructions in Oracle support note 1328239.1.
1. My thanks to Tyler McPheeters for the original idea of using FGA

Friday, November 23, 2012

PeopleTools 8.52 Application Engine sets MODULE and ACTION

Update 4.3.2015: See also Undocumented Application Engine Parameter: EnableAEMonitoring

I have written and spoken often about the huge importance of instrumentation in an application. PeopleSoft introduced internal instrumentation to PeopleTools 8.44 for its own Performance Monitor that works on any platform. Since PeopleTools 7.53, when running on an Oracle database, PeopleTools also calls the Oracle supplied package dbms_application_info package to set the module and action for the session. The values set were not particularly useful, so I wrote a PL/SQL package (psftapi) and trigger to set the module and action attributes when a process started.

In PeopleTools 8.50 this instrumentation was enhanced to set module and action to the component and page name in the PIA. These values appear in Oracle Enterprise Manager and Active Session History (ASH). They can be used to profile components.

Now, in PeopleTools 8.52 Application Engine has been enhanced to set the action to the full Application Engine step name.

To illustrate what happens, I have produced an Oracle Extended SQL Trace for the process, but I have shown only the lines in a SQL Trace file with 3 asterisks. So you can see MODULE and ACTION being set (and some other timestamp information). The lines in italic were emitted when module and action were set by my psftapi package, in which I simply set the MODULE to the Application Engine program name.
*** 2012-11-22 21:47:38.282
*** SESSION ID:(7.2237) 2012-11-22 21:47:38.282
*** CLIENT ID:(PS) 2012-11-22 21:47:38.282
*** SERVICE NAME:(HCM91) 2012-11-22 21:47:38.282
*** MODULE NAME:(PSPMCSOSUM) 2012-11-22 21:47:38.282
*** ACTION NAME:(PI=867:Processing) 2012-11-22 21:47:38.282
*** MODULE NAME:(PSAE.PSPMCSOSUM.1448) 2012-11-22 21:47:38.419
*** ACTION NAME:(PSPMCSOSUM) 2012-11-22 21:47:38.419
*** ACTION NAME:(PSPMCSOSUM.MAIN.STATS.S) 2012-11-22 21:47:38.420
*** 2012-11-22 21:47:38.588
*** ACTION NAME:(PSPMCSOSUM.MAIN.CTL.P) 2012-11-22 21:47:38.919
*** ACTION NAME:(PSPMCSOSUM.MAIN.SELECT.C) 2012-11-22 21:47:38.937
*** ACTION NAME:(PSPMCSOSUM.GETCNT.CNT.P) 2012-11-22 21:47:38.963
…
*** ACTION NAME:(PSPMCSOSUM.MAIN.SELECT.C) 2012-11-22 21:48:15.168
*** ACTION NAME:(PSPMCSOSUM.GETCNT.CNT.P) 2012-11-22 21:48:15.168
*** ACTION NAME:(PSPMCSOSUM.MAIN.Close.P) 2012-11-22 21:48:15.197
*** ACTION NAME:(PI=867:Success) 2012-11-22 21:48:15.201

Application Engine also sets the MODULE attribute.  In the above example it was set to PSAE.PSPMCSOSUM.1448.  The string is composed of three parts.
  • PSAE indicates that it was running Application Engine. It sets the same value for both stand-alone and Tuxedo server Application Engine processes.
  • PSPMCSOSUM was the name of the Application Engine Program
  • 1448 was the operating system process ID of the Application Engine process.  This value is recorded in the column SESSIONIDNUM on the table PSPRCSQUE.
The inclusion of the SESSIONIDNUM is useful because it is possible to determine the process instance.  If you have multiple instances of the same Application Engine program running concurrently, you can work out which ASH samples correspond to which process instance by matching the session ID.

SELECT  r.prcsinstance, h.module, h.action, sum(1) ash_secs
FROM v$active_Session_history h
, psprcsque q
, psprcsrqst r
WHERE r.prcsinstance = q.prcsinstance
AND h.module = 'PSAE.'||q.prcsname||'.'||q.sessionidnum
AND h.sample_time BETWEEN r.begindttm AND NVL(r.enddttm,SYSDATE)
and r.prcsinstance = 867
GROUP BY r.prcsinstance, h.module, h.action
ORDER BY ash_secs DESC
/

PRCSINSTANCE MODULE               ACTION                    ASH_SECS
------------ -------------------- ------------------------- --------
         867 PSAE.PSPMCSOSUM.1448 PSPMCSOSUM.GETCNT.CNT.P          7
         867 PSAE.PSPMCSOSUM.1448 PSPMCSOSUM.MAIN.STATS.S          1

Unfortunately, this change to MODULE also means that it is no longer possible to enable trace by setting a watchpoint (as I blogged previously).  It no longer works with Application Engine because it is not possible to predict the value of the SESSIONIDNUM in MODULE.

I am testing with 8.52.12 and have found a significant problem. Application Engine doesn't set the action on DO SELECT statements. The first statement in the trace extract below does come from PSPMCSOSUM.MAIN.CTL PeopleCode, however, the second SQL statement actually comes from PSPMCSOSUM.MAIN.SELECT.D, but the absence of another action line would lead you to believe the statement came from the preceding PeopleCode step.
*** ACTION NAME:(PSPMCSOSUM.MAIN.CTL.P) 2012-11-22 21:47:38.919
WAIT #0: nam='SQL*Net message to client' ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=-40016373 tim=257220814597
WAIT #0: nam='SQL*Net message from client' ela= 740 driver id=1413697536 #bytes=1 p3=0 obj#=-40016373 tim=257220815371
CLOSE #348647416:c=0,e=33,dep=0,type=1,tim=257220815447
=====================
PARSING IN CURSOR #348528704 len=98 dep=0 uid=45 oct=3 lid=45 tim=257220815496 hv=1424819941 ad='7ff2559e550' sqlid='98af7ppafu1r5'
END OF STMT
PARSE #348528704:c=0,e=35,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1148792852,tim=257220815496
EXEC #348528704:c=0,e=30,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1148792852,tim=257220815572
WAIT #348528704: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-40016373 tim=257220815612
FETCH #348528704:c=0,e=2824,p=0,cr=16,cu=0,mis=0,r=1,dep=0,og=1,plh=1148792852,tim=257220818462
STAT #348528704 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=16 pr=0 pw=0 time=2828 us)'
STAT #348528704 id=2 cnt=3075 pid=1 pos=1 obj=228740 op='INDEX FULL SCAN PS_PSPMCSO_CHART (cr=16 pr=0 pw=0 time=333 us cost=0 size=13 card=1)'
WAIT #348528704: nam='SQL*Net message from client' ela= 123 driver id=1413697536 #bytes=1 p3=0 obj#=-40016373 tim=257220818671
FETCH #348528704:c=0,e=1,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=1148792852,tim=257220818696
WAIT #348528704: nam='SQL*Net message to client' ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=-40016373 tim=257220818714
WAIT #348528704: nam='SQL*Net message from client' ela= 702 driver id=1413697536 #bytes=1 p3=0 obj#=-40016373 tim=257220819434
CLOSE #349389936:c=0,e=16,dep=0,type=1,tim=257220819496
=====================
PARSING IN CURSOR #349702624 len=555 dep=0 uid=45 oct=3 lid=45 tim=257220819548 hv=1888777338 ad='7ff2559ce90' sqlid='5phgqq9s98x3u'
SELECT DISTINCT OPRID , TO_CHAR(CAST((LOGINDTTM) AS TIMESTAMP),'YYYY-MM-DD-HH24.MI.SS.FF') , TO_CHAR(CAST((LOGOUTDTTM) AS TIMESTAMP),'YYYY-MM-DD-HH24.MI.SS.FF') , PM_SIGNON_TYPE , PM_SIGNOFF_TYPE FROM PSPMCSOLOG_VW WHERE '1' <> '2' AND TO_DATE(TO_CHAR(LOGINDTTM,'YYYY-MM-DD'),'YYYY-MM-DD') >= TO_DATE('2005-07-06','YYYY-MM-DD') AND TO_DATE(TO_CHAR(LOGOUTDTTM,'YYYY-MM-DD'),'YYYY-MM-DD') >= TO_DATE('2005-07-06','YYYY-MM-DD') AND OPRID NOT IN ( SELECT OPRID FROM PSPMCSOCHRTXOP) AND LOGIPADDRESS NOT IN ( SELECT LOGIPADDRESS FROM PSPMCSOCHRTXIP) ORDER BY 2
END OF STMT
PARSE #349702624:c=0,e=35,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4226533831,tim=257220819547
EXEC #349702624:c=0,e=47,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4226533831,tim=257220819669
WAIT #349702624: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-40016373 tim=257220819723
WAIT #349702624: nam='SQL*Net more data to client' ela= 76 driver id=1413697536 #bytes=8145 p3=0 obj#=-40016373 tim=257220831271
FETCH #349702624:c=15600,e=11621,p=0,cr=214,cu=0,mis=0,r=201,dep=0,og=1,plh=4226533831,tim=257220831375
WAIT #349702624: nam='SQL*Net message from client' ela= 988 driver id=1413697536 #bytes=1 p3=0 obj#=-40016373 tim=257220832412
WAIT #0: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-40016373 tim=257220832481
WAIT #0: nam='SQL*Net message from client' ela= 40 driver id=1413697536 #bytes=1 p3=0 obj#=-40016373 tim=257220832541
*** ACTION NAME:(PSPMCSOSUM.MAIN.SELECT.C) 2012-11-22 21:47:38.937

However, MAIN.SELECT.D duly appears in the AE batch timings report.


                               C o m p i l e    E x e c u t e    F e t c h        Total           
SQL Statement                  Count   Time     Count   Time     Count   Time     Time    
------------------------------ ------- -------- ------- -------- ------- -------- --------
…
AE Program: PSPMCSOSUM

MAIN.CTL.D                           1      0.0       1      0.0       2      0.0      0.0
MAIN.SELECT.D                        1      0.0       1      0.0     556      0.0      0.0
MAIN.STATS.S                         1      0.0       1      0.5       0      0.0      0.5
…

This is a serious problem  If you profile the top SQL statements in an Application Engine (using either ASH as I have done above, or by profiling a trace file and looking for the top SQL statement by searching through the raw trace file) it will lead you to the wrong conclusion!  Time spent in PSPMCSOSUM.MAIN.SELECT.D will be accounted as having been spent in PSPMCSOSUM.GETCNT.CNT.P.  It is not until you look for the specific SQL statement in Application Designer that you realise that the source code doesn't match the instrumentation.
 
Nevertheless, this instrumentation is a very welcome and significant improvement for Application Engine. It will help to localise performance problems. However, until the omissions are fixed, you will need to be aware of their implications and avoid falling into the traps.

My thanks to Wolfgang Breitling for telling me about the feature.

Thursday, June 26, 2008

Discrepancy in Timings between Application Engine and Oracle SQL Trace

I would like to draw attention to the risk of uncritically accepting performance information from either a tool or an instrumented program. It is always necessary to consider whether you can believe what you are being told, or whether something is lying to you.

In this case, I want to highlight in the Application Engine Timings report. Take the following example from Time and Labor. The AE Timings report claims that the process spends 91.5% of the execution time in SQL. This would lead you to believe that this process is affected by the performance of the SQL, and that you need to do some SQL performance tuning.

---------------------------------------------------------------------------------
Total run time : 1411.6
Total time in application SQL : 1291.7 Percent time in application SQL : 91.5%
Total time in PeopleCode : 119.4 Percent time in PeopleCode : 8.5%
Total time in cache : 0.1 Number of calls to cache : 3
---------------------------------------------------------------------------------

However, all is not quite as it appears. Faced with a potential SQL problem, I usually turn to Oracle SQL Trace. The following is taken from a TKPROF analysis of the trace file for the same process.

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 153883 1.80 2.41 0 4 0 0
Execute 192806 429.94 464.96 8398 276000 588543 123140
Fetch 52740 2.75 3.55 590 202419 0 15265
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 399429 434.49 470.93 8988 478423 588543 138405

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 196521 0.00 0.13
SQL*Net message from client 196521 0.32 849.98
...
SQL*Net more data to client 114884 0.00 1.91
SQL*Net more data from client 250825 0.05 1.88
...

The database reports 849.98s for the idle event SQL*Net message from client, and only 470.93s on non-idle activity. So only 36% of the total elapsed time is spent in the database, the other 64% is time that the Application Engine process is busy executing code.

Why the discrepancy?

This is certainly a very chatty process. There are nearly 200,000 round-trip SQL*net messages between AE and Oracle. It could be a network latency problem. However, very little time is spent on the SQL*Net more data events so it is unlikely that there is a significant network component.

Let's look at the statements that have the longest times in AE timing report .

                         PeopleSoft Application Engine Timings
(All timings in seconds)

C o m p i l e E x e c u t e F e t c h Total
SQL Statement Count Time Count Time Count Time Time
------------------------------ ------- -------- ------- -------- ------- -------- --------
PeopleCode

DELETE PS_SCH_ADHOC_DTL 36540 60.6 0 0.0 60.6
INSERT PS_SCH_ADHOC_DTL 71887 885.6 0 0.0 885.6
Record.SelectByKey SCH_ADHOC_DTL 36596 118.0 36596 0.0 118.0
--------
1291.7

In less than 24 minutes, some statement have been executed very frequently. So, there were lots of SQL*Net messages, because there were lots of SQL statements. These SQL statements are submitted from PeopleCode. It is very likely we are in some sort of procedural loop, and therein lies the problem.

PeopleSoft programs can only measure the time taken to execute SQL from the perspective of the client program. In this case the timings are calculated by instrumentation inside Application Engine and PeopleCode. Meanwhile, the timings in the Oracle trace are obtained from instrumentation within the Oracle kernel.

There is still a fair amount of code in PeopleSoft that executes during the time counted as execution of SQL Statements. This is not just the SQL*Net libraries. I think that the timings include most of the time it takes for sqlexec() and Row set functions to execute. The discrepancy with the Oracle time mounts with every SQL operation.

In this case, there is very little to be gained by tuning the SQL. We can't do much about the PeopleCode because it is delivered vanilla functionality. All we can do is to bring more CPU to bear upon the problem by running multiple concurrent instances of the process.