Friday, March 13, 2009

Using Oracle Enterprise Manager (Grid Control) with PeopleSoft

If you use Oracle Grid Control to monitor your PeopleSoft system, here is a simple tip that will help you identify batch processes.

Oracle provides two columns on the session information (v$session) to hold context information. They provide a PL/SQL package DBMS_APPLICATION_INFO, which has procedures to read and update these values. The idea is that application developers will instrument their programs and update these values. Oracle’s Applications (that it has developed itself), such as E-Business Suite does this. PeopleSoft was rather slow to make use of this. They do set the module and action, but not to very useful values.

However, you can create a trigger on the Process Scheduler request table that will update these values when a process starts.

(Updated 19.4.2009) I have created a PL/SQL package psftapi that contains several procedures that I have used from triggers and other PL/SQL programs. It contains a function that sets the ACTION for the session with the process instance and the description of the status.
...
PROCEDURE set_action
(p_prcsinstance INTEGER
,p_runstatus VARCHAR2
) IS
l_runstatus VARCHAR2(10 CHAR);
BEGIN
BEGIN
SELECT x.xlatshortname
INTO l_runstatus
FROM psxlatitem x
WHERE x.fieldname = 'RUNSTATUS'
AND x.fieldvalue = p_runstatus
AND x.eff_status = 'A'
AND x.effdt = (
SELECT MAX(x1.effdt)
FROM psxlatitem x1
WHERE x1.fieldname = x.fieldname
AND x1.fieldvalue = x.fieldvalue
AND x1.effdt <= SYSDATE); EXCEPTION WHEN no_data_found THEN l_runstatus := 'Status:'||p_runstatus; END; sys.dbms_application_info.set_action( action_name => SUBSTR('PI='||p_prcsinstance||':'||l_runstatus,1,32) );
END set_action;
...

This procedure can be called from a trigger:
CREATE OR REPLACE TRIGGER sysadm.psftapi_store_prcsinstance
BEFORE UPDATE OF runstatus ON sysadm.psprcsrqst
FOR EACH ROW
WHEN ((new.runstatus IN('3','7','8','9','10') OR
old.runstatus IN('7','8')) AND new.prcstype != 'PSJob')
BEGIN
IF :new.runstatus = '7' THEN
psftapi.set_prcsinstance(p_prcsinstance => :new.prcsinstance);
psftapi.set_action(p_prcsinstance=>:new.prcsinstance
,p_runstatus=>:new.runstatus
,p_prcsname=>:new.prcsname);
ELSIF psftapi.get_prcsinstance() = :new.prcsinstance THEN
psftapi.set_action(p_prcsinstance=>:new.prcsinstance
,p_runstatus=>:new.runstatus);
END IF;
EXCEPTION WHEN OTHERS THEN NULL; --exception deliberately coded to suppress all exceptions
END;
/
What is the benefit? The MODULE and ACTION show up in Grid Control. So now you can immediately identify the name and Process Instance of those expensive processes.
Screenshot from Oracle Enterprise Manager
Unfortunately, it is not possible to do anything similar for sessions created by the Application Server. So all you know is what session belongs to what kind of server process. The Client Information is set at the top of each service, so you know the PeopleSoft Operator ID, but that is all.

It would be nice if perhaps the Component name and PeopleCode context was written to MODULE and ACTION. But it isn’t.

Updated 9.9.11: PeopleTools 8.50 does exactly this, there is another posting on this subject.

Thursday, March 12, 2009

Minimum Number of Application Server Processes

I have had two conversations recently about what happens if you have only a single PSAPPSRV process in a domain. One of which was on the DBA Forum.

Basically, you should always have at least two instances of any server process that has a non-zero recycle count.

It is rare to see only one PSAPPSRV process in Application Server domains that support the PIA, but customers who use the Integration Broker often have separate Application Server domains for the publication and subscription servers. These domains are often not heavily used, in which case they have been configured with just one of each server process.

This advice applies to the PSAPPSRV, PSQRYSRV, PSBRKHND, PSSUBHND, PSANALYTICSRV servers

The exceptions are
  • PSSAMSRV is only used by Windows clients in 3-tier mode (nVision and PS/Query)
  • PSMSGDSP, only a single process can be configured
  • PSAESRV, because in the Process Scheduler each PSAESRV has its own queue.
The problem occurs when the server process recycles. This occurs when the number of services handled reaches the recycle count. When the only remaining server process on a shared queue shuts down the queue is also deleted, and the advertised services are removed from the Tuxedo Bulletin Board. If a service requests arrives in the application server domain before the new server process has started, and updated the bulletin board with advertised processes, the Jolt handler (JSH) will determine that the service request is not advertised and will raise an error.

It is quite simple to demonstrate this in PeopleSoft. In my demo system, I set the recycle count on PSAPPSRV to just 10 and the minimum number of servers to 1.

[PSAPPSRV]
;=========================================================================
; Settings for PSAPPSRV
;=========================================================================

;-------------------------------------------------------------------------
; UBBGEN settings
Min Instances=1
Max Instances=2
Service Timeout=300

;-------------------------------------------------------------------------
; Number of services after which PSAPPSRV will automatically restart.
; If the recycle count is set to zero, PSAPPSRV will never be recycled.
; The default value is 5000.
; Dynamic change allowed for Recycle Count
Recycle Count=10

It is not long until the PSAPPSRV process recycles, and you get this message in the application server log.

PSAPPSRV.2140 (10) [03/11/09 06:55:15 PTWEBSERVER@GO-FASTER-4](0) Recycling server after 10 services

You can also see in the shutdown message in the TUXLOG file.

The last line is the error message from the JSH process that cannot enqueue the service request because the Application Server is down. If you suspect that you have been getting this problem look for that error message.

065655.GO-FASTER-4!BBL.2444.1760.0: LIBTUX_CAT:541: WARN: Server APPSRV/1 terminated
065655.GO-FASTER-4!BBL.2444.1760.0: LIBTUX_CAT:550: WARN: Cleaning up restartable server APPSRV/1
065655.GO-FASTER-4!cleanupsrv.1756.1204.-2: 03-11-2009: Tuxedo Version 8.1, 32-bit
065655.GO-FASTER-4!cleanupsrv.1756.1204.-2: CMDTUX_CAT:542: ERROR: Cannot find service to which to forward request
065655.GO-FASTER-4!cleanupsrv.1756.1204.-2: server APPSRV/1: CMDTUX_CAT:551: INFO: server removed
065655.GO-FASTER-4!JSH.2192.4860.-2: JOLT_CAT:1043: "ERROR: tpcall() call failed, tperrno = 6"

Hence, you should always have at least two PSAPPSRVs process, so that queue is not removed, and the other server(s) can handle requests. Of course there is a small chance that two servers could recycle at the same time, but that is very unlikely.

Thursday, March 05, 2009

Automatically Granting Privileges on Newly Created Tables

I saw an interesting question on the Oracle-L forum: We have PeopleSoft applications that create tables on the fly. Developers want access to those tables that will be created on the fly, in case the process that creates it ends abnormally. I looked into granting via DDL triggers, it seemed like granting access via them is a problem. Is there a way to grant access other than doing 'grant select any'.

I am finding it increasingly common for developers and support staff not to have direct access to the PeopleSoft OwnerID schema (SYSADM) in even development environments, but using personal database logins. They need to have SELECT privilege on tables. The problem described above also occurs when table is rebuilt by Application Designer. When it is dropped the granted privileges disappear with the table.

It is certainly true that you cannot issue DDL in DDL trigger on the same object that caused the trigger to fire. You will get an error caused by a deadlock in the recursive SQL.

CREATE OR REPLACE TRIGGER gfc_grant AFTER CREATE ON sysadm.schema
DECLARE
l_cmd VARCHAR2(1000 CHAR);
BEGIN
IF ora_dict_obj_type = 'TABLE' THEN
l_cmd := 'GRANT SELECT ON '||ora_dict_obj_owner||'.'||ora_dict_obj_name||' TO gofaster';
dbms_output.put_line('DDL:'||l_cmd);
EXECUTE IMMEDIATE l_cmd;
END IF;
END;
/
show errors
set serveroutput on
DDL:GRANT SELECT ON SYSADM.T TO gofaster
CREATE TABLE t (a NUMBER)
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-30511: invalid DDL operation in system triggers
ORA-06512: at line 9

Some sites have regular maintenance jobs that recreate any missing privileges.

However, there is a way to have the privileges automatically recreated soon after the table is built. You could use a DDL trigger to submit a job to the Oracle job scheduler to grant the privileges. You can't submit DDL directly via the job scheduler, so you need a procedure to which you can pass the DDL as a string parameter, and then execute it as dynamic SQL in the procedure.

CREATE OR REPLACE PROCEDURE myddl
(p_ddl IN VARCHAR2) IS
BEGIN
EXECUTE IMMEDIATE p_ddl;
END;
/

Then the DDL trigger can submit a job to call this procedure with the GRANT command in the parameter.

CREATE OR REPLACE TRIGGER gfc_grant
AFTER CREATE ON sysadm.schema
DECLARE
l_jobno NUMBER;
BEGIN
IF ora_dict_obj_type = 'TABLE' THEN
dbms_job.submit(l_jobno,'myddl(''GRANT SELECT ON '||ora_dict_obj_owner||'.'||ora_dict_obj_name||' TO gofaster'');');
END IF;
END;
/

If I create this table

CREATE TABLE t (a NUMBER);

I get this job

SELECT * FROM dba_jobs
/

JOB LOG_USER
---------- ------------------------------------------------------------------------------------------
PRIV_USER
------------------------------------------------------------------------------------------
SCHEMA_USER                                                                                LAST_DATE
------------------------------------------------------------------------------------------ -------------------
LAST_SEC                 THIS_DATE           THIS_SEC                 NEXT_DATE
------------------------ ------------------- ------------------------ -------------------
NEXT_SEC                 TOTAL_TIME BRO
------------------------ ---------- ---
INTERVAL
---------------------------------------------------------------------------------------------------------------
FAILURES
----------
WHAT
---------------------------------------------------------------------------------------------------------------
60 SYSADM
SYSADM
SYSADM
19:04:52 05/03/2009
19:04:52                          0 N
null

myddl('GRANT SELECT ON SYSADM.T TO gofaster');


After the job has run, which should normally only be a few seconds, I get these privileges

SELECT * FROM user_tab_privs WHERE table_name = 'T';

GRANTEE
------------------------------------------------------------------------------------------
OWNER
------------------------------------------------------------------------------------------
TABLE_NAME
------------------------------------------------------------------------------------------
GRANTOR
------------------------------------------------------------------------------------------
PRIVILEGE
---------------------------------------------------------------------------------------------------------------
GOFASTER
SYSADM
T
SYSADM
SELECT


Added 2.4.2009: Following this posting it was put to me that you could get Application Designer to build scripts with the commands to add the privilege by adding a second command to the create table DDL model. See part 2.

Friday, February 27, 2009

Performance Benefits of ReUse Statement Flag in Application Engine

I have achieved some significant performance improvements in some Application Engine programs by just enabling the ReUse Statement flag on certain steps. I thought I would share a recent example of how effective this can be. I don't think I can improve on the description of this feature in PeopleBooks: 
"One of the key performance features of PeopleSoft Application Engine is the ability to reuse SQL statements by dedicating a persistent cursor to that statement. Unless you select the ReUse property for a SQL action, %BIND fields are substituted with literal values in the SQL statement. The database has to recompile the statement every time it is executed. However, selecting ReUse converts any %BIND fields into real bind variables (:1, :2, and so on), enabling PeopleSoft Application Engine to compile the statement once, dedicate a cursor, and re-execute it with new data multiple times. This reduction in compile time can result in dramatic improvements to performance. In addition, some databases have SQL statement caching. Every time they receive SQL, they compare it against their cache of previously executed statements to see if they have seen it before. If so, they can reuse the old query plan. This works only if the SQL text matches exactly. This is unlikely with literals instead of bind variables." 
In fact, most databases do this, and Oracle certainly does. 
On Oracle, you could enable CURSOR_SHARING. Then Oracle effectively replaces the literals with bind variables at parse time. However, I certainly would not recommend doing this database-wide. Whenever I have tried this on a PeopleSoft system, it has had severe negative effects elsewhere. I have enabled cursor sharing at session level for specific batch programs (using a trigger), but even then it is not always beneficial. 
Instead, I do recommend using the ReUse Statement flag wherever possible. It cannot just be turned on indiscriminately, the same section in PeopleBooks goes on to describe some limitations (which is probably why the default value for the flag is false). To illustrate the kind of improvement you can obtain, here is a real-life example. 
This is an extract from the batch timings report at the end of the Application Engine trace file. We are interested in statements with the high compile count. ReUse Statement is not enabled on these 4 steps. They account for more that 50% of the overall execution time.
                          PeopleSoft Application Engine Timings
                              (All timings in seconds)

        C o m p i l e    E x e c u t e    F e t c h        Total
SQL Statement                  Count   Time     Count   Time     Count   Time     Time
------------------------------ ------- -------- ------- -------- ------- -------- --------
99XxxXxx.Step02.S                 8453      2.8    8453    685.6       0      0.0    688.4
99XxxXxx.Step03.S                 8453      5.0    8453   2718.8       0      0.0   2723.8
99XxxXxx.Step05.S                 8453      0.9    8453    888.4       0      0.0    889.3
99XxxXxx.Step06.S                 8453      0.4    8453     17.4       0      0.0     17.8

------------------------------------------------------------------------------------------
Total run time                :     8416.4
Total time in application SQL :     8195.0   Percent time in application SQL :       97.4%
Total time in PeopleCode      :      192.7   Percent time in PeopleCode      :        2.3%
Total time in cache           :        8.7   Number of calls to cache        :       8542
------------------------------------------------------------------------------------------
Now, I have enabled ReUse Statement on these steps. I have not changed anything else.
                         C o m p i l e    E x e c u t e    F e t c h        Total
SQL Statement                  Count   Time     Count   Time     Count   Time     Time
------------------------------ ------- -------- ------- -------- ------- -------- --------
99XxxXxx.Step02.S                    1      0.0    8453    342.3       0      0.0    342.3
99XxxXxx.Step03.S                    1      0.0    8453     83.3       0      0.0     83.3
99XxxXxx.Step05.S                    1      0.0    8453      8.7       0      0.0      8.7
99XxxXxx.Step06.S                    1      0.0    8453      7.6       0      0.0      7.6
------------------------------------------------------------------------------------------
Total run time                :     5534.1
Total time in application SQL :     5341.7   Percent time in application SQL :       96.5%
Total time in PeopleCode      :      190.8   Percent time in PeopleCode      :        3.4%
Total time in cache           :        1.1   Number of calls to cache        :         90
------------------------------------------------------------------------------------------
Notice that:
  • The number of compilations for each step has gone down to 1, though the number of executions remains the same
  • The execution time for the first three statements has fallen by nearly 90%.
  • The improvement in the 4th statement is quite modest because it did not contain any bind variables, but clearly, some of the time reported as execution time by Application Engine is associated with the preparation of a new SQL statement.
To emphasise the point, lets look at the effect on the database. The following are extracts from the TKPROF output for Oracle SQL trace files for these processes. First the TKPROF without ReUse Statement:
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse   101063   2600.60    2602.83       6197     661559          4           0
Execute 101232   1817.96    3787.17    1572333   73729207   10617830     4770112
Fetch    96186    385.41    1101.47     374425   25986600          0       96285
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   298481   4803.97    7491.48    1952955  100377366   10617834     4866397

Misses in library cache during parse: 25498
Misses in library cache during execute: 90

Elapsed times include waiting on following events:
Event waited on                             Times   Max. Wait  Total Waited
----------------------------------------   Waited  ----------  ------------
db file sequential read                   1199472        0.36       2601.83
SQL*Net message from client                130345        1.57        296.50
db file scattered read                       8816        0.39        171.47

OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse   100002     13.51      13.57         17        820         94           0
Execute 131495     30.00      31.31       7025      29277      21164       74315
Fetch   141837    218.77     295.49     159969    3039304         12      519406
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   373334    262.28     340.38     167011    3069401      21270      593721

160446  user  SQL statements in session.
70478  internal SQL statements in session.
230924  SQL statements in session.
And now with ReUse Statement set on only those four steps
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse    67238     10.24      10.75         47       4415          9           0
Execute 101160   1650.25    4040.88    1766325  129765633   11160830     4781797
Fetch    96123    385.50    1024.50     372737   26097251          0      103844
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   264521   2045.99    5076.14    2139109  155867299   11160839     4885641

Misses in library cache during parse: 73
Misses in library cache during execute: 21

Elapsed times include waiting on following events:
Event waited on                             Times   Max. Wait  Total Waited
----------------------------------------   Waited  ----------  ------------
db file sequential read                   1506834        0.61       2839.19
SQL*Net message from client                130312        1.53        258.81
db file scattered read                       8782        0.37        147.01

OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse     1331      0.46       0.46          0        173         16           0
Execute   4044      2.72       5.82      12923      33374      24353      113323
Fetch     5697      8.38      13.43      15550      55431         12       13449
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    11072     11.56      19.72      28473      88978      24381      126772

67425  user  SQL statements in session.
3154  internal SQL statements in session.
70579  SQL statements in session.
  • Nearly all the saving is in parse time of non-recursive statements, the rest is the reduction of recursive SQL because there is less parsing.
  • There is less parsing because there are fewer different SQL statements submitted by Application Engine. The number of user statements has fallen from 160446 to 67425.
  • The number of misses on the library cache has fallen from 25498 to just 73.
  • There has been a reduction in SQL*Net message from client (database idle time) from 296 seconds to 253 because the Application Engine program spends less time compiling SQL statements.
Conclusion Enabling ReUse Statement can have a very significant effect on the performance of Application Engine batches. It is most effective when SQL statements with %BIND() variables are executed within loops. Otherwise, for each execution of the loop, Application Engine must recompile the SQL statement with different bind variable values, which the database will treat as a new statement that must be parsed. SQL parsing is CPU intensive. Reducing excessive parse also reduces CPU consumption on the database server. It can also reduce physical I/O to the database catalogue. On PeopleSoft 8.x applications that use Unicode, the overhead of parsing is magnified by the use of length checking constraints on all character columns. This is no longer an issue in version 9 applications which use character semantics. If you use Oracle's Automatic Memory Management, excessive parsing can cause the database to allocate more memory to the Shared Pool at the expense of the Block Buffer Cache. This in turn can increase physical I/O and can degrade query performance. Bind Variables are a good thing. You should use them. Therefore, ReUse Statement is also a good thing. You should use that too!

Thursday, February 26, 2009

Do You Need More Temporary Table Instances?

When an Application Engine loads a program prior to execution, it attempts to allocate an instance of each temporary record specified in the program to itself. If the allocation of a table fails because there are no available instances, Application Engine is will use the shared instance (unless the program is configured to abort if non-shared tables cannot be assigned). In this case it will write an entry to the message log to warn that the shared instance of the record has been used.

When processes use the shared tables performance is likely to be degraded by contention on the table. The %TruncateTable() PeopleCode macro generates a DELETE by process instance on the shared table instead of a TRUNCATE command.
The problem is that unless you look in the message log, you will not know that this is happening. However, it easy to write a query that will look at the message log table and report whenever this has occurred.

REM tr_moreinst.sql
select  p.message_parm recname, r.prcsname
,  count(*) occurances
,  max(l.dttm_stamp_sec) last_occurance
,  max(p.process_instance) process_instance
from  ps_message_log l
,  ps_message_logparm p
 left outer join psprcsrqst r
 on r.prcsinstance = p.process_instance
where  l.message_set_nbr = 108
and    l.message_nbr = 544
and    p.process_instance = l.process_instance
and    p.message_seq = l.message_seq
and    l.dttm_stamp_sec >= sysdate - 7
group by p.message_parm, r.prcsname
order by 1,2
/
This report tells you which programs failed to allocated instances of which record, how many times that has happened within the last 7 days.
            Processes Unable to Allocate Non-Shared Temporary Record

                                                                  Last
Record          Process                  Last                   Process
Name            Name         Occurrences Occurrence            Instance
--------------- ------------ ----------- ------------------- ----------
TL_ABS_WRK      TL_TIMEADMIN           4 08:24:39 01/01/2009      12345
TL_ATTND_HST1   TL_TIMEADMIN          10 08:23:40 01/01/2009      12345
TL_COMP_BAL     TL_TIMEADMIN          11 08:23:40 01/01/2009      12345
...
NB: Just because an Application Engine could not allocate a non-shared table, does not automatically imply that you need more instances of that record. It could be that other processes had failed, but the temporary tables are still allocated to the process until the process is either restarted and completes successfully, or the process is deleted or cancelled.
You might choose to create some spare instances of records to allow for failed processes, but if you do not clear failed processes you will eventually run out of instances.