Sunday, December 21, 2008

Poor performance of PSPMSESSIONS_VW view affects Performance Monitor System Monitor Component

The System Performance Monitor component (PSPMSYSHEALTH; navigation: PeopleTools -> Performance Monitor -> System Performance) gives an overview of each system monitored by the PeopleSoft Performance Monitor. However, the poor performance of the view PSPMSESSIONS_VW can severely affect this component: as transaction history builds up, the component may not respond within the Tuxedo service timeout.

Below is an extract from a PeopleTools trace. The query ran until the Tuxedo service timed out; Tuxedo terminated application server process 31944 and spawned a new process (ID 3598), and the user session received an error.
PSAPPSRV.31944 (70)   1-783    11.38.38    0.000072
Cur#1.31944.PMONITOR RC=0 Dur=0.000040 COM
Stmt=select count(*) from pspmsessions_vw where pm_agentid = :1
PSAPPSRV.31944 (70)   1-784    11.38.38    0.000008
Cur#1.31944.PMONITOR RC=0 Dur=0.000001
Bind-1 type=19 length=3 value=689
PSAPPSRV.3598 (112)   1-132    11.42.39    314.495972
Cur#1.3598.PMONITOR RC=0 Dur=0.000095 COM
Stmt=SELECT VERSION FROM PSVERSION WHERE OBJECTTYPENAME = 'SYS'

Even when each query finishes within a reasonable amount of time, the query can be run several times by the component.

This is the definition of PSPMSESSIONS_VW delivered by PeopleSoft.
SELECT T3.PM_CONTEXT_VALUE1
, T3.PM_AGENTID
, T3.PM_AGENT_STRT_DTTM
FROM PSPMTRANSHIST T3
WHERE T3.PM_TRANS_DEFN_ID = 116
AND PM_TRANS_STATUS = '1'
AND PM_TRANS_DURATION <> 0
AND T3.PM_CONTEXT_VALUE1 IN (
SELECT T.PM_CONTEXT_VALUE1
FROM PSPMTRANSHIST T
WHERE T.PM_TRANS_DEFN_SET = 1
AND T.PM_TRANS_DEFN_ID = 109
AND T.PM_MON_STRT_DTTM > %TimeAdd(%CurrentDateTimeIn, -720)
AND T.PM_PARENT_INST_ID <> PM_TOP_INST_ID
AND T.PM_CONTEXT_VALUE1 NOT IN (
SELECT T2.PM_CONTEXT_VALUE1
FROM PSPMTRANSHIST T2
WHERE T2.PM_TRANS_DEFN_SET = 1
AND T2.PM_TRANS_DEFN_ID = 108
AND T2.PM_CONTEXT_VALUE1 = T.PM_CONTEXT_VALUE1
AND T2.PM_MON_STRT_DTTM > %TimeAdd(%CurrentDateTimeIn, -720)))

Firstly, accurate object statistics are required on the transaction history table and its indexes.
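For example, something like this (a sketch; I assume the tables are in the SYSADM schema):

BEGIN
  sys.dbms_stats.gather_table_stats(
    ownname => 'SYSADM'
   ,tabname => 'PSPMTRANSHIST'
   ,cascade => TRUE); --also gather statistics on the indexes
END;
/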

An additional index is required on PSPMTRANSHIST:
CREATE INDEX PSPPSPMTRANSHIST ON PSPMTRANSHIST
(PM_TRANS_DEFN_SET
,PM_TRANS_DEFN_ID
,PM_CONTEXT_VALUE1
,PM_MON_STRT_DTTM)
TABLESPACE PSINDEX PCTFREE 0
PARALLEL NOLOGGING COMPRESS 3
/
ALTER INDEX PSPPSPMTRANSHIST NOPARALLEL LOGGING
/

Finally, I changed the view:
  • The IN() operators have been changed to EXISTS() sub-queries. The new index supports the efficient execution of these sub-queries.
  • The sub-queries are now both correlated back to the main query on T3.
  • ROWNUM criteria have been added to restrict the number of rows the sub-queries can return.

SELECT T3.PM_CONTEXT_VALUE1 /*Session ID*/
, T3.PM_AGENTID
, T3.PM_AGENT_STRT_DTTM
FROM PSPMTRANSHIST T3
WHERE T3.PM_TRANS_DEFN_ID = 116 /*Redirect after login*/
AND T3.PM_TRANS_DEFN_SET = 1 /*added*/
AND T3.PM_TRANS_STATUS = '1'
AND T3.PM_TRANS_DURATION <> 0
AND EXISTS(
SELECT 'x'
FROM PSPMTRANSHIST T
WHERE T.PM_TRANS_DEFN_SET = 1
AND T.PM_TRANS_DEFN_ID = 109 /*User Session Began*/
AND T.PM_MON_STRT_DTTM > %TimeAdd(%CurrentDateTimeIn, -720)
AND T.PM_PARENT_INST_ID <> T.PM_TOP_INST_ID
AND T.PM_CONTEXT_VALUE1 = T3.PM_CONTEXT_VALUE1
AND NOT EXISTS(
SELECT 'x'
FROM PSPMTRANSHIST T2
WHERE T2.PM_TRANS_DEFN_SET = 1
AND T2.PM_TRANS_DEFN_ID = 108 /*User Session Ended*/
AND T2.PM_MON_STRT_DTTM > %TimeAdd(%CurrentDateTimeIn, -720)
AND T2.PM_CONTEXT_VALUE1 = T3.PM_CONTEXT_VALUE1
AND ROWNUM <= 1)
AND ROWNUM <= 1)

The effect of these changes is a significant improvement in the performance of the query. I ran a test query for a single agent on a system with over 3 million rows in PSPMTRANSHIST:
  • Original view: 24,895 consistent gets
  • With the new index: 22,547 consistent gets
  • With the new index and the view changes: 85 consistent gets
Your mileage may vary, but for me this made the difference between the component being usable and not.

Thursday, August 14, 2008

How to Clear the Application Server Cache Without Shutting it Down

Updated 4.9.2008: It is often necessary to clear the physical cache files on the PeopleSoft application server. You would think that, after all the time that PeopleTools has been around, development would have sorted out the problems in object version numbering. Sometimes, recently migrated changes do not take effect until the cache is cleared, and Global Support will nearly always ask you to clear the cache if you are experiencing any kind of problem in the PIA.

Normally, you would shut down the application server, delete the physical cache directories and restart the application server. From PeopleTools 8.48, each application server process uses a single dedicated set of physical cache files in a directory whose name includes the name and ID number of the server process (previously a server process could use any unlocked set of cache directories). This allowed PeopleSoft to add an option to the psadmin utility that triggers each PSAPPSRV process to clear out its own physical cache (my thanks to the anonymous question asked after the original version of this posting that reminded me). This option also causes each server process to be recycled, and it is the recommended and supported way to clear the physical cache files. It can also be invoked from the command line:
psadmin -c purge -d <domain> [-noarch | -arch <archive_directory>] [-log <"log_comments">]
where 'domain' specifies the domain name in PS_HOME,
   'archive_directory' specifies the location to which the purged cache is quarantined, and
   'log_comments' specifies any comments to be added to the purge cache log entry.
However, even in previous versions, there has always been a way to invalidate all physical cache files. Any cached object older than the value of LASTREFRESHDTTM on the table PSSTATUS (it was on a different table prior to PeopleTools 8) is purged from the cache when the process that references that cache starts. Therefore, if that value is updated to the current time, the entire cache will be purged.
UPDATE PSSTATUS
SET    LASTREFRESHDTTM = SYSDATE
/
COMMIT
/
Sometimes, developers also have to clear the physical cache used by Application Designer on their clients. Updating PSSTATUS also clears two-tier client caches. In fact, this behaviour is left over from the days when PeopleTools was a two-tier application and it was necessary to clear cache files on users' desktop computers.
  • If you have multiple Application Servers on a single database, then you can shut each one down in turn without any loss of service. The users will fail over to the surviving servers. However, this can result in the load being unevenly distributed and could overload one server.
  • In small environments, and this includes most development and test systems, there is only a single application server. Shutting down an application server requires downtime and can be disruptive.
If an application server has at least two instances of a server process, then each process can be recycled one by one with no loss of service, because the other processes can handle incoming requests. You could do this manually by entering the stop and boot commands in tmadmin (I discussed this in Chapter 13 of PeopleSoft for the Oracle DBA, but didn't combine it with the idea of clearing the application server cache). I have produced a script, tuxcycle.sh (available from GitHub), that uses the Tuxedo command line interface utility, tmadmin, to find out which PSAPPSRV processes are running, and then sequentially recycles each of them.

I don't think I would completely automate these two operations into a single process, nor would I run them unattended, but you can now clear cache files without taking a system away from users or developers. Remember that users will experience a reduction in performance after clearing the cache, because it needs to be rebuilt by querying the PeopleTools tables.

The Process Scheduler, Master Process Scheduler, Distribution Server, and Application Engine processes also have physical caches. These are also cleared when LASTREFRESHDTTM is updated and the processes are restarted. However, you should not shut down the Process Scheduler while processes are executing. If Application Engine server processes (PSAESRV) are used, they will shut down and any Application Engine program they are running will be killed. Any stand-alone processes, including psae, will be killed by the Process Scheduler when it is restarted.

Monday, August 04, 2008

Serious fault with PeopleSoft Performance Monitor fixed in PeopleTools 8.49.14

Regular readers of my ramblings will know that I spend a lot of my time working with the PeopleSoft Performance Monitor (PPM). It is an extremely valuable source of performance metrics that are mostly not available elsewhere, or that are otherwise difficult to obtain and process.

In PeopleTools 8.49, PPM has a serious fault. It is very similar to the scenario reported for PeopleTools 8.44 in GSC Solution 200769188: PerfMon: Too many httpd processes spawned/sockets opened when using PerfMon.

"When a 404 is returned from the monitor server the calling agent code only checks the HTTP return code but does not flush the input stream. This causes the connection to remain open until the GC finally closes the input steam which closes the open connection. This is easily detected by the increasing list of sockets in CLOSE_WAIT state [, as reported by netstat]. On NT there is no hard limit on file descriptors and therefore a crash is not noticed. On Unix platforms this can cause the process to run out of file descriptors before the JVM GC has a chance to run and free up the sockets."

Hence, when running the application server on Unix, you will eventually reach a point where JSH processes cannot open new ports and PIA users receive the 'Application Server is Down' message. The system can thus become unavailable until the application server is rebooted. To prevent this occurring, the PPM must be deconfigured by setting the Monitor URL to 'NONE'.

According to PeopleSoft Global Support this problem will be resolved in PeopleTools 8.49 patch 14.

Tuesday, July 01, 2008

Sequence Number Allocation in PeopleSoft

In PeopleSoft for the Oracle DBA, Chapter 8 (p168-170), in a section called Sequence Numbers and Concurrency, I discussed the implications of how sequence numbers are allocated in HR8.4 as a part of the save-time processing in the PeopleCode FUNCLIB_HR.EMPLID.FieldFormula.

[FUNCLIB_HR.EMPLID.FieldFormula]
SQLExec("Update PS_INSTALLATION Set EMPLID_LAST_EMPL = EMPLID_LAST_EMPL + 1");
SQLExec("Select EMPLID_LAST_EMPL From PS_INSTALLATION", &EMPLID);


The problem is that while this update remains uncommitted, nobody else can obtain a new employee ID because the row is locked; the design has scalability implications. PeopleSoft applications generally get away with it because the update is performed as a part of save-time processing, which is committed as soon as it completes. Thus the lock is held for only a short time, and never while waiting for a human response. It also executes in the application server, which should be close to the database, so there is little network latency.

However, I was looking at this same PeopleCode in HR9.0 and I noticed that it has changed. The change appears to have been introduced in HR8.8, and uses a new built-in PeopleCode function, GetNextNumberWithGapsCommit(record.field, max_number, increment [, WHERE_Clause, paramlist]), that was introduced in PeopleTools 8.44.

PeopleBooks describes the function thus:
"Use the GetNextNumberWithGapsCommit function to return the sequence number value plus increment for the given field residing in the given record. This function also enables you to specify a SQL Where clause as part of the function for maintaining multiple sequence numbers in a single record.

This function is typically used for obtaining a new sequence number for the application ... The sequence number (record.field ) is incremented right away and it doesn't hold any database internal row lock beyond the execution of this function.

Note. A secondary database connection is used to increment and retrieve record.field. The default behaviour is to keep the secondary database connection persistent in order to improve performance for the next GetNextNumberWithGapsCommit usage. If the database administrator finds the persistent connection too high an overhead for the production environment (which should not be the case since PeopleSoft uses application server to multiplex the database connection), the database administrator can change the default behaviour to use an on-demand connection method. The persistent second connection is disabled using DbFlags bit eight in the application server and process scheduler configuration files. The second connection can be completely disabled using DbFlags bit four in the application server and process scheduler configuration files

The following restrictions apply to the GetNextNumberWithGapsCommit function:
  • The function is enabled for Application Engine programs running in stand-alone mode (via PSAE). This function is disabled for Application Engine programs that execute through the PeopleSoft Application Engine server (PSAESRV).
  • This function is enabled for PeopleSoft Application Engine programs running under Unix System Services on z/OS.
  • PeopleSoft does not recommend using both the GetNextNumberWithGapsCommit function and the GetNextNumber function in the same application, on the same table, in the same unit of work. This can lead to lock contention or deadlocking.
  • For a DB2 z/OS database, isolate the table that contains the sequence number to its own tablespace and set the locksize parameter to row."

Hence, if you are migrating data into a system via a Component Interface (e.g. employees from a legacy system where new EMPLIDs will be allocated), and you need to run multiple concurrent processes to achieve throughput, it is advantageous to use PSAE rather than PSAESRV processes; otherwise they are likely to serialise.

This is the PeopleTools trace produced when I added a new person to an HR system. You can see the second connection to the database created at line 51933, and then the row is updated, queried and immediately committed. The lock is held for only 35ms (on my laptop), instead of being held until all the save-time processing completes.

PSAPPSRV.4180 1-51932 23.26.14 0.000000 >>> start-ext Nest=01 assign_employee_id FUNCLIB_HR.EMPLID.FieldFormula
PSAPPSRV.4180 1-51933 23.26.14 0.019000 Cur#2.4180.HCM89 RC=0 Dur=0.015000 Connect=Secondry/HCM89/SYSADM/
PSAPPSRV.4180 1-51934 23.26.14 0.000000 SamCreateSecondary ---- Successful obtain Second DB connection
PSAPPSRV.4180 1-51935 23.26.14 0.000000 Cur#2.4180.HCM89 RC=0 Dur=0.000000 COM Stmt=UPDATE PS_INSTALLATION SET EMPLID_LAST_EMPL = EMPLID_LAST_EMPL + 1
PSAPPSRV.4180 1-51936 23.26.14 0.028000 Cur#2.4180.HCM89 RC=0 Dur=0.028000 EXE
PSAPPSRV.4180 1-51937 23.26.14 0.000000 Cur#2.4180.HCM89 RC=0 Dur=0.000000 COM Stmt=SELECT EMPLID_LAST_EMPL FROM PS_INSTALLATION
PSAPPSRV.4180 1-51938 23.26.14 0.005000 Cur#2.4180.HCM89 RC=0 Dur=0.005000 EXE
PSAPPSRV.4180 1-51939 23.26.14 0.000000 Cur#2.4180.HCM89 RC=0 Dur=0.000000 Fetch
PSAPPSRV.4180 1-51940 23.26.14 0.002000 Cur#2.4180.HCM89 RC=0 Dur=0.002000 Commit
PSAPPSRV.4180 1-51941 23.26.14 0.000000 > GetNextNumberWithGapsCommit info ( INSTALLATION, EMPLID_LAST_EMPL, (null) )


This approach is certainly an improvement over the previous behaviour. It is similar to an Oracle Sequence without any caching which increments a value stored on SYS.SEQ$ in an autonomous transaction which it immediately commits without affecting the main transaction (except that Oracle does this in the same session, and here PeopleSoft have created a separate session).
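For comparison, this is a minimal PL/SQL sketch of the same pattern implemented as an autonomous transaction (the function name is my own invention; remember that PeopleSoft actually uses a second session rather than an autonomous transaction):

CREATE OR REPLACE FUNCTION next_emplid RETURN NUMBER IS
  PRAGMA AUTONOMOUS_TRANSACTION;
  l_next NUMBER;
BEGIN
  UPDATE ps_installation
  SET    emplid_last_empl = emplid_last_empl + 1
  RETURNING emplid_last_empl INTO l_next;
  COMMIT; --releases the row lock immediately, without affecting the caller's transaction
  RETURN l_next;
END;
/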

Also, like an Oracle sequence, if the main transaction fails, you are left with a gap in your numbers. Some of the dictionary definitions of the word 'sequence' imply a fixed increment or decrement between values (usually 1). In some ways, it would be more helpful to think of this as a way of generating a unique identifier rather than a sequence number.

By looking at v$session you can see that a second session is created for the same client process.
Session  Serial Program                          Client    Shadow
     ID  Number Name                             Process   Process
------- ------- -------------------------------- --------- -------
Client Info                                                 LOGON_TIME
----------------------------------------------------------- -------------------
    147      35 PSAPPSRV.exe                     4180:7720 5824
PS,,GO-FASTER-4.london.go-faster.co.uk,HCM89,PSAPPSRV.exe,  23:08:40 18/06/2008

    139     367 PSAPPSRV.exe                     4180:7720 5824
                                                            23:26:14 18/06/2008

The DbFlags field in the Application Server and Process Scheduler parameter files controls how PSAPPSRV and PSAE create and terminate the secondary session.
;-------------------------------------------------------------------------
; DbFlags Bitfield
;
; Bit Flag
; --- ----
; 4 - Disable Second DB Connection
; 8 - Disable Persistent Secondary DB Connection
DbFlags=0

By default, the second session persists until the PSAPPSRV session terminates, either when the application server is shut down or when the PSAPPSRV process recycles on reaching the recycle_count specified in the application server configuration file.

Obviously, disabling the second database connection would effectively negate the advantages of this new functionality.

There is additional overhead to the second connection, and if DbFlags is set to 8 the secondary session is shut down after the incremented sequence number has been returned. However, in a large, highly active system where sequence numbers are requested frequently, this would lead to frequent creation of secondary sessions, and the process allocating a new sequence number would also have to wait for the new session to be created.

Note that CLIENT_INFO is not set in the secondary session. However, it is possible to copy the CLIENT_INFO from the primary session using an AFTER LOGON trigger (which can also be downloaded from the Go-Faster website).
REM second_client_info.sql
rem (c)Go-Faster Consultancy 2008

spool second_client_info

rem requires following privileges to be granted explicitly
GRANT EXECUTE ON sys.dbms_application_info TO sysadm;
GRANT SELECT ON sys.v_$session TO sysadm;
GRANT SELECT ON sys.v_$process TO sysadm;
GRANT SELECT ON sys.v_$mystat TO sysadm;

CREATE OR REPLACE TRIGGER sysadm.connect_secondary_session
AFTER LOGON
ON sysadm.schema
DECLARE
l_client_info VARCHAR2(64);
BEGIN
SELECT SUBSTR(p.client_info||'Secondary Session',1,64)
INTO l_client_info
FROM v$session p --primary session
, v$session s --secondary session
WHERE s.program = p.program --same program
AND s.process = p.process --same process
AND s.machine = p.machine --same machine
AND s.user# = p.user# --same user
AND s.sid != p.sid --different session
AND s.client_info IS NULL --client info not already set
AND s.sid = (SELECT m.sid
FROM v$mystat m
WHERE rownum = 1)
AND p.logon_time < s.logon_time); --primary session logged on earlier
sys.dbms_application_info.set_client_info(client_info=>l_client_info);
sys.dbms_application_info.set_action(action_name=>'Secondary Session');

EXCEPTION WHEN NO_DATA_FOUND THEN NULL;
END;
/

show errors
The result is that when the second session is created, the trigger fires and copies the client info from the first session, appending the string 'Secondary Session' (truncated to the 64-character limit of CLIENT_INFO, hence ',Second' in the output below).
LOGON_TIME          Client Info
------------------- ----------------------------------------------------------------
21:44:37 26/06/2008 PS,,GO-FASTER-4.london.go-faster.co.uk,HCM89,PSAPPSRV.exe,
08:01:29 01/07/2008 PS,,GO-FASTER-4.london.go-faster.co.uk,HCM89,PSAPPSRV.exe,Second

Thursday, June 26, 2008

Discrepancy in Timings between Application Engine and Oracle SQL Trace

I would like to draw attention to the risk of uncritically accepting performance information from either a tool or an instrumented program. It is always necessary to consider whether you can believe what you are being told, or whether something is lying to you.

In this case, I want to highlight the Application Engine Timings report. Take the following example from Time and Labor. The AE Timings report claims that the process spends 91.5% of the execution time in SQL. This would lead you to believe that the process is affected by the performance of the SQL, and that you need to do some SQL performance tuning.

---------------------------------------------------------------------------------
Total run time                 :    1411.6
Total time in application SQL  :    1291.7   Percent time in application SQL : 91.5%
Total time in PeopleCode       :     119.4   Percent time in PeopleCode      :  8.5%
Total time in cache            :       0.1   Number of calls to cache        : 3
---------------------------------------------------------------------------------

However, all is not quite as it appears. Faced with a potential SQL problem, I usually turn to Oracle SQL Trace. The following is taken from a TKPROF analysis of the trace file for the same process.

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse   153883      1.80       2.41          0          4          0          0
Execute 192806    429.94     464.96       8398     276000     588543     123140
Fetch    52740      2.75       3.55        590     202419          0      15265
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total   399429    434.49     470.93       8988     478423     588543     138405

Elapsed times include waiting on following events:
Event waited on                          Times Waited  Max. Wait  Total Waited
---------------------------------------- ------------ ---------- ------------
SQL*Net message to client                      196521       0.00          0.13
SQL*Net message from client                    196521       0.32        849.98
...
SQL*Net more data to client                    114884       0.00          1.91
SQL*Net more data from client                  250825       0.05          1.88
...

The database reports 849.98s for the idle event SQL*Net message from client, and only 470.93s of non-idle activity. So only 36% of the total elapsed time is spent in the database; the other 64% is time that the Application Engine process is busy executing its own code.

Why the discrepancy?

This is certainly a very chatty process. There are nearly 200,000 round-trip SQL*Net messages between AE and Oracle. It could be a network latency problem. However, very little time is spent on the SQL*Net more data events, so it is unlikely that there is a significant network component.

Let's look at the statements with the longest times in the AE timings report.

                         PeopleSoft Application Engine Timings
(All timings in seconds)

                                 C o m p i l e    E x e c u t e    F e t c h       Total
SQL Statement                     Count    Time    Count    Time    Count    Time    Time
------------------------------  ------- -------- ------- -------- ------- -------- --------
PeopleCode

DELETE PS_SCH_ADHOC_DTL                           36540     60.6        0     0.0     60.6
INSERT PS_SCH_ADHOC_DTL                           71887    885.6        0     0.0    885.6
Record.SelectByKey SCH_ADHOC_DTL                  36596    118.0    36596     0.0    118.0
                                                                                 --------
                                                                                   1291.7

In less than 24 minutes, some statements have been executed tens of thousands of times. So there were lots of SQL*Net messages because there were lots of SQL statements. These SQL statements are submitted from PeopleCode, so it is very likely we are in some sort of procedural loop, and therein lies the problem.

PeopleSoft programs can only measure the time taken to execute SQL from the perspective of the client program. In this case, the timings are calculated by instrumentation inside Application Engine and PeopleCode, whereas the timings in the Oracle trace are obtained from instrumentation within the Oracle kernel.

There is still a fair amount of PeopleSoft code that executes during the time counted as execution of SQL statements. This is not just the SQL*Net libraries; I think the timings include most of the time it takes for SQLExec() and row set functions to execute. The discrepancy with the Oracle time mounts with every SQL operation: here, the 1291.7s that AE attributes to SQL exceeds the 470.93s the database actually spent working by roughly 820s, which closely matches the 849.98s the database spent idle, waiting for the client.

In this case, there is very little to be gained by tuning the SQL. We can't do much about the PeopleCode because it is delivered, vanilla functionality. All we can do is bring more CPU to bear on the problem by running multiple concurrent instances of the process.

Tuesday, June 03, 2008

Oracle Optimizer Statistics and Optimizer Dynamic Sampling with PeopleSoft Temporary Records

PeopleSoft Temporary Records are used for working storage during Application Engine programs. Typically, AE programs truncate and repopulate the tables before using them. PeopleSoft recognised the need to keep the statistics on these tables in line with the data that they contain, and so used the %UpdateStats macro in many places in delivered programs to update the statistics.

However, frequently gathering statistics on even small tables can become time-consuming. Recently, I have been working on PeopleSoft Time and Labor, which makes heavy use of temporary records. In a single execution of TL_TIMEADMIN, several tables associated with temporary records are truncated, repopulated and analyzed many times. I have discussed elsewhere the problem of excessive use of truncate giving rise to local write wait.

Oracle also recognised this problem, and in version 9 of the database they introduced Optimizer Dynamic Sampling, where the database samples the data to generate statistics at statement parse time.

I am still testing, but on Oracle 10gR2 (version 10.2.0.3) I have obtained improvements in performance and stability of T&L AE processes by:
  1. Deleting optimizer statistics on tables associated with temporary records, in order to force the optimizer to sample at parse time.
  2. Locking optimizer statistics to prevent the %UpdateStats macro from putting them back. Tables with locked statistics are also omitted by GATHER_SCHEMA_STATS and GATHER_DATABASE_STATS (unless the force option is specified), and hence also by the delivered maintenance window job that refreshes stale statistics.
  3. Implementing an alternative DDL model that uses a PL/SQL packaged function to suppress the error raised when attempting to collect statistics on a table whose statistics are locked (see %UpdateStats() -v- Optimizer Dynamic Sampling, and the sketch after the list of sampling levels below). This also addresses the mix-up in the DDL models.
  4. Setting OPTIMIZER_DYNAMIC_SAMPLING to 4 at instance level. I have certainly had problems with this parameter at its default of 2.
The dynamic sampling levels are described in the Performance Tuning Guide 14.5.6.4.
  • Level 2: Apply dynamic sampling to all unanalyzed tables.
  • Level 3: As Level 2, plus all tables for which standard selectivity estimation used a guess for some predicate that is a potential dynamic sampling predicate.
  • Level 4: As Level 3, plus all tables that have single-table predicates that reference 2 or more columns.
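Points 3 and 4 can be sketched as follows (the wrapper procedure name and signature are my own inventions, not the delivered DDL model; when statistics are locked, dbms_stats raises ORA-20005, which the wrapper suppresses):

ALTER SYSTEM SET optimizer_dynamic_sampling = 4 SCOPE=BOTH;

CREATE OR REPLACE PROCEDURE gather_stats_wrapper
(p_ownname VARCHAR2
,p_tabname VARCHAR2
) IS
  e_stats_locked EXCEPTION;
  PRAGMA EXCEPTION_INIT(e_stats_locked, -20005); --ORA-20005: object statistics are locked
BEGIN
  dbms_stats.gather_table_stats(ownname=>p_ownname, tabname=>p_tabname);
EXCEPTION
  WHEN e_stats_locked THEN NULL; --statistics are locked, so rely on dynamic sampling instead
END;
/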
So the next stage is to identify working storage records and their associated tables.
I started off looking for tables that had recently been analysed.


The following script identifies all instances of the tables associated with temporary records, and then deletes and locks their statistics. I started by restricting it to a list of specific tables, but I think it would be perfectly reasonable to take this approach with all temporary records.


BEGIN
 FOR x IN (
  SELECT /*+LEADING(o i r v)*/ t.table_name, t.last_analyzed, t.num_rows
  ,      s.stattype_locked
  FROM pstemptblcntvw i
    INNER JOIN psrecdefn r
    ON r.recname = i.recname
    AND r.rectype = '7'
  , psoptions o
  , user_tables t
     LEFT OUTER JOIN user_tab_statistics s
     ON  s.table_name = t.table_name
     AND s.partition_name IS NULL
  , (SELECT rownum row_number
     FROM   psrecdefn 
     WHERE  ROWNUM <= 100) v                 
  WHERE  v.row_number <= i.temptblinstances + o.temptblinstances
  AND    t.table_name = DECODE(r.sqltablename,' ','PS_'||r.recname,r.sqltablename)
                      ||DECODE(v.row_number*r.rectype,100,'',LTRIM(TO_NUMBER(v.row_number))) 
/*---------------------------------------------------------------------            
--AND    r.recname IN('TL_PMTCH1_TMP' --TL_TA000600.SLCTPNCH.STATS1.S…
--                   ,'TL_PMTCH2_TMP' --TL_TA000600.CALC_DUR.STATS1.S…)
-----------------------------------------------------------------------*/
  AND   (/*  t.num_rows        IS NOT NULL --not analyzed 
        OR   t.last_analyzed   IS NOT NULL --not analyzed
        OR*/ s.stattype_locked IS     NULL --stats not locked
        ) 
) LOOP
  IF x.last_analyzed IS NOT NULL THEN --delete stats
   dbms_output.put_line('Deleting Statistics on '||user||'.'||x.table_name);
   dbms_stats.delete_table_stats(ownname=>user,tabname=>x.table_name,force=>TRUE);
  END IF;
  IF x.stattype_locked IS NULL THEN --lock stats
   dbms_output.put_line('Locking Statistics on '||user||'.'||x.table_name); 
   dbms_stats.lock_table_stats(ownname=>user,tabname=>x.table_name);
  END IF;
 END LOOP;
END;
/
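
A quick way to verify the result (a sketch using the same 10g dictionary view as the script above):

SELECT table_name, last_analyzed, stattype_locked
FROM   user_tab_statistics
WHERE  stattype_locked IS NOT NULL
AND    partition_name IS NULL;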


Updated 11.2.2009: I have updated my advice on the use of Optimizer Dynamic Sampling (see %UpdateStats -v- Optimizer Dynamic Sampling). I still consider this to be a useful feature, but I have found scenarios where Oracle did not choose as good a plan as it did with explicitly gathered statistics. Therefore, I still suggest locking statistics on temporary working storage records, but where batch programs have been coded to explicitly update statistics, dbms_stats should be called with the force option to override the lock. I have updated my DDL model wrapper script accordingly.

The scripts in this posting can be downloaded from my website.

Monday, May 26, 2008

Performance Tuning the Performance Monitor Archive Process

I have been working with a PeopleSoft Performance Monitor (PPM) system that is collecting data for 7 monitored systems, three of which are very active. We want to keep data for a rolling 7 days, but we reached the point where the PPM archive process could not archive rows as fast as they were arriving! I have, of course, reduced the sample frequency of events and transactions, but the PPM is still receiving between 20 and 30 million rows of history data per week.

It is also worth remarking that since the application servers, web servers and Process Schedulers were moved to commodity Intel hardware, there are now a larger number of these components, and so PPM receives more event data.

I have taken a look inside the PPM archive process. PPM history data is removed by a PeopleCode program (PSPM_ARCHIVE.ARCHIVE.GBL.default.ARCPCODE.OnExecute) that is called from the Application Engine.

Rows of transaction data to be archived are identified by the following SQL (there is a similar statement for event data):

&TransHistSQL.Open("SELECT X.PM_INSTANCE_ID, X.PM_TRANS_DEFN_SET, X.PM_TRANS_DEFN_ID, X.PM_AGENTID, X.PM_TRANS_STATUS, X.OPRID, X.PM_PERF_TRACE, X.PM_CONTEXT_VALUE1, X.PM_CONTEXT_VALUE2, X.PM_CONTEXT_VALUE3, X.PM_CONTEXTID_1, X.PM_CONTEXTID_2, X.PM_CONTEXTID_3, X.PM_PROCESS_ID, %DateTimeOut(X.PM_AGENT_STRT_DTTM), %DateTimeOut(X.PM_MON_STRT_DTTM), X.PM_TRANS_DURATION, X.PM_PARENT_INST_ID, X.PM_TOP_INST_ID, X.PM_METRIC_VALUE1, X.PM_METRIC_VALUE2, X.PM_METRIC_VALUE3, X.PM_METRIC_VALUE4, X.PM_METRIC_VALUE5, X.PM_METRIC_VALUE6, X.PM_METRIC_VALUE7, X.PM_ADDTNL_DESCR, Z.PM_ARCHIVE_MODE FROM PSPMTRANSHIST X, PSPMAGENT Y, PSPMSYSDEFN Z WHERE X.PM_AGENTID=Y.PM_AGENTID AND Y.PM_SYSTEMID=Z.PM_SYSTEMID AND (Z.PM_ARCHIVE_MODE='1' OR Z.PM_ARCHIVE_MODE='2') AND %DateTimeDiff(X.PM_MON_STRT_DTTM, %CurrentDateTimeIn) >= (PM_MAX_HIST_AGE * 24 * 60)");

Let's look at the last condition in the WHERE clause. It uses the %DateTimeDiff PeopleCode macro to calculate the age of the row of data, and it expands to:

… AND ROUND((CAST(( SYSDATE) AS DATE) - CAST((X.PM_MON_STRT_DTTM) AS DATE)) * 1440, 0) >= (PM_MAX_HIST_AGE * 24 * 60)

However, putting a function around a column prevents the database from using any index on that column. In this case, it is not even possible to create a function-based index to satisfy the expression, because it contains SYSDATE.
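
To illustrate, I would expect an attempt such as this sketch to fail with ORA-01743 (only pure functions can be indexed), because SYSDATE is not deterministic:

CREATE INDEX pspmtranshist_fbi ON pspmtranshist
(ROUND((CAST(SYSDATE AS DATE) - CAST(pm_mon_strt_dttm AS DATE)) * 1440, 0))
/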

The reaction of Oracle's Optimizer is to full scan the PSPMTRANSHIST table. With millions of rows of data in this table, a full scan is simply not going to perform adequately.

Instead, I would rather code something simple but Oracle-specific (and if I am going to do Oracle-specific date arithmetic, then there is no need to use %CurrentDateTimeIn):

… AND X.PM_MON_STRT_DTTM <= SYSDATE - PM_MAX_HIST_AGE


The PeopleCode can be adjusted to be sensitive to the database platform, like this:


/*-- SELECT THE ROWS FROM PSPMTRANSHIST ELIGIBLE FOR ARCHIVING */
If %DbType = "ORACLE" Then /*dmk 19.5.2008 oracle specific performance tuning*/
&TransHistSQL.Open("SELECT X.PM_INSTANCE_ID, X.PM_TRANS_DEFN_SET, X.PM_TRANS_DEFN_ID, X.PM_AGENTID, X.PM_TRANS_STATUS, X.OPRID, X.PM_PERF_TRACE, X.PM_CONTEXT_VALUE1, X.PM_CONTEXT_VALUE2, X.PM_CONTEXT_VALUE3, X.PM_CONTEXTID_1, X.PM_CONTEXTID_2, X.PM_CONTEXTID_3, X.PM_PROCESS_ID, %DateTimeOut(X.PM_AGENT_STRT_DTTM), %DateTimeOut(X.PM_MON_STRT_DTTM), X.PM_TRANS_DURATION, X.PM_PARENT_INST_ID, X.PM_TOP_INST_ID, X.PM_METRIC_VALUE1, X.PM_METRIC_VALUE2, X.PM_METRIC_VALUE3, X.PM_METRIC_VALUE4, X.PM_METRIC_VALUE5, X.PM_METRIC_VALUE6, X.PM_METRIC_VALUE7, X.PM_ADDTNL_DESCR, Z.PM_ARCHIVE_MODE FROM PSPMTRANSHIST X, PSPMAGENT Y, PSPMSYSDEFN Z WHERE X.PM_AGENTID=Y.PM_AGENTID AND Y.PM_SYSTEMID=Z.PM_SYSTEMID AND (Z.PM_ARCHIVE_MODE='1' OR Z.PM_ARCHIVE_MODE='2') AND X.PM_MON_STRT_DTTM <= SYSDATE - PM_MAX_HIST_AGE");


and similarly for event data


/*-- SELECT THE ROWS FROM PSPMEVENTHIST ELIGIBLE FOR ARCHIVING */
If %DbType = "ORACLE" Then
/*dmk 19.5.2008 - oracle specific performance tuning*/
&EventHistSQL.Open("SELECT X.PM_INSTANCE_ID, X.PM_EVENT_DEFN_SET, X.PM_EVENT_DEFN_ID, X.PM_AGENTID, %DateTimeOut(X.PM_AGENT_DTTM), %DateTimeOut(X.PM_MON_DTTM), X.PM_PROCESS_ID, X.PM_FILTER_LEVEL,X.PM_METRIC_VALUE1, X.PM_METRIC_VALUE2, X.PM_METRIC_VALUE3, X.PM_METRIC_VALUE4, X.PM_METRIC_VALUE5, X.PM_METRIC_VALUE6, X.PM_METRIC_VALUE7, X.PM_ADDTNL_DESCR, Z.PM_ARCHIVE_MODE FROM PSPMEVENTHIST X, PSPMAGENT Y, PSPMSYSDEFN Z WHERE X.PM_AGENTID=Y.PM_AGENTID AND Y.PM_SYSTEMID=Z.PM_SYSTEMID AND (Z.PM_ARCHIVE_MODE='1' OR Z.PM_ARCHIVE_MODE='2') AND X.PM_MON_DTTM <= SYSDATE - PM_MAX_HIST_AGE");


The resulting query needs an index on PM_AGENTID and PM_MON_STRT_DTTM (on both PSPMTRANSHIST and PSPMEVENTHIST). There are delivered indexes that lead on PM_AGENTID and have PM_MON_STRT_DTTM further down the key, but a dedicated index is beneficial.
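Something like this sketch (the index name is my own choice, and the storage options should follow your own standards):

CREATE INDEX pspmtranshist_arch ON pspmtranshist
(pm_agentid
,pm_mon_strt_dttm)
TABLESPACE psindex PCTFREE 0
/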

Sunday, May 18, 2008

Enabling Oracle Database Trace on PeopleSoft processes with a Trigger (improved)

In Chapter 11 of PeopleSoft for the Oracle DBA, I suggested using a database trigger to enable Oracle's SQL Trace facility. Then, in Chapter 14, I introduced a further trigger to disable trace for Application Engine processes, in case they were run within a PSAESRV server process.

This was not a good idea, because the cursors opened by the Application Engine program are still open when the AE program completes and the process updates its status from 7 (processing) to 9 (success) or 3 (error). The status update causes the unset_trace trigger to fire and disable SQL Trace. Consequently, the STAT lines for these cursors are never emitted to the trace file. The STAT lines contain the execution plans and a wealth of other information.

Updated 26.4.2021: Instead, I have a new version of the trace trigger (trace_trigger.sql can be downloaded from my Github repository of PeopleSoft scripts).
spool trace_trigger
rem (c)Go-Faster Consultancy Ltd. 2008

rem 1.5.2008 - moved logic to disable trace from unset_trace trigger into main trigger 
rem because cursors in AE not shut until after status is changed away from processing

ROLLBACK;
----------------------------------------------------------------
REM explicit grants by sys required on following privileges
----------------------------------------------------------------
GRANT ALTER SESSION TO sysadm;
GRANT EXECUTE ON sys.dbms_monitor TO sysadm;
GRANT EXECUTE ON sys.dbms_application_info TO SYSADM;
----------------------------------------------------------------

CREATE OR REPLACE TRIGGER sysadm.set_trace
BEFORE UPDATE OF runstatus ON sysadm.psprcsrqst
FOR EACH ROW
WHEN (new.runstatus = 7 AND old.runstatus != 7 AND new.prcstype != 'PSJob') --(1)
DECLARE
  l_waits BOOLEAN := TRUE; --(2)
  l_binds BOOLEAN := FALSE;
BEGIN
  --set module and action whether we are tracing or not
  sys.dbms_application_info.set_module( --(3)
 module_name => :new.prcsname,
 action_name => 'PI='||:new.prcsinstance
  );
   
  IF ( :new.runcntlid LIKE 'TRACEME%' --(5)
  ----------------------------------------------------------------
  --code conditions for enabling trace here instead of when clause
  ----------------------------------------------------------------
  --  OR (    SUBSTR(:new.prcsname,1,3) = 'TL_'
  --      AND :new.rqstdttm <= TO_DATE('20080509','YYYYMMDD'))
  ----------------------------------------------------------------
     ) THEN

    --if we are going to trace, then set tracefile identifier
    EXECUTE IMMEDIATE 'ALTER SESSION SET TRACEFILE_IDENTIFIER = '''|| --(4)
      TRANSLATE(:new.prcstype     ,' -','__')||'_'||
      TRANSLATE(:new.prcsname     ,' -','__')||'_'||
      :new.prcsinstance||'_'||
       TRANSLATE(:new.servernamerun,' -','__')||
       '''';

    EXECUTE IMMEDIATE 'ALTER SESSION SET TIMED_STATISTICS = TRUE';
    EXECUTE IMMEDIATE 'ALTER SESSION SET MAX_DUMP_FILE_SIZE = 2097152'; --1Gb
    EXECUTE IMMEDIATE 'ALTER SESSION SET STATISTICS_LEVEL=ALL';

    ----------------------------------------------------------------
    --logic to determine whether you want to trace binds also
    ----------------------------------------------------------------
    IF :new.runcntlid LIKE 'TRACE%BIND%' THEN --(6)
      l_binds := TRUE;
    END IF;
    ----------------------------------------------------------------

    sys.dbms_monitor.session_trace_enable(waits=>l_waits,binds=>l_binds); --(7)
--  EXECUTE IMMEDIATE 'ALTER SESSION SET EVENTS ''10046 TRACE NAME CONTEXT FOREVER, LEVEL 12''';

  ELSIF :new.prcstype = 'Application Engine' THEN --(8)
    --explicitly disable trace if application server process
    sys.dbms_monitor.session_trace_disable;

    --reset max dump file size AFTER disabling trace
    EXECUTE IMMEDIATE 'ALTER SESSION SET MAX_DUMP_FILE_SIZE = 5M';

    --if not tracing, then reset tracefile identifier
    EXECUTE IMMEDIATE 'ALTER SESSION SET TRACEFILE_IDENTIFIER = '''''; --(4)
  END IF;
EXCEPTION WHEN OTHERS THEN NULL;
END;
/

show errors

DROP TRIGGER sysadm.unset_trace;

rem test that the trigger fires by updating something
UPDATE  sysadm.psprcsrqst new
SET     runstatus = 7
WHERE   runstatus != 7
AND     new.prcstype != 'PSJob'
AND    rownum <= 1
--AND 1=2
;

ROLLBACK;
spool off
1. The trigger fires when any process begins. Previously I used logic in the WHEN clause to restrict when it fired.
2. Wait events are collected in the trace by default.
3. Irrespective of whether trace is enabled, the trigger also calls the dbms_application_info package to set the module name to the PeopleSoft Process Name, and action to the PeopleSoft Process Instance number.
4. When trace is enabled, the tracefile identifier is set so that the trace file name includes the Process Type, Process Name, Process Instance number, and the name of the Process Scheduler. (Updated 6.2.2009: previous versions of this trigger set the tracefile identifier irrespective of whether trace was enabled. Unfortunately, that caused a small trace file to be generated for every process.)
5. Logic in the trigger determines whether SQL Trace will be enabled. The supplied version of the trigger traces processes whose run control ID begins with 'TRACEME'.
6. Additional logic has been added to also enable bind variables to be captured if the run control begins with 'TRACEBIND'.
7. The dbms_monitor package is new in Oracle 10g and is the only officially supported way to enable extended trace. If using the trigger on an earlier version of the database, this call should be replaced with the commented-out code that sets event 10046.
8. If a process is not to be traced and it is an Application Engine program, then trace is explicitly disabled. Thus, if a PSAESRV process has traced an AE program, trace is disabled when it next handles a request. This has the advantage that the STAT lines for the cursors are written to the trace file; however, the disadvantage is that the process continues to trace between requests, and the time between requests is reported as 'SQL*Net message from client'.

Wednesday, April 30, 2008

Bulk Re-Sending Batch Output to Report Repository

If output files from a batch process fail to post to the report repository, you can request that they are resent from within Process Monitor. However, if you have had a problem with posting report output for a period of time, you could have a large number of processes that have not posted, and it isn't reasonable to request that each of them is resent individually. I have not found anything delivered that reposts all unposted content.

If you trace Process Monitor while resending content for a single process, you can see what is updated to force a resend.

PSAPPSRV.20142 (1880)    1-16     11.07.51    0.000039 Cur#2.20142.PMONITOR RC=0 Dur=0.000025 COM Stmt=UPDATE PSPRCSRQST SET DISTSTATUS = :1 WHERE PRCSINSTANCE = :2
PSAPPSRV.20142 (1880)    1-17     11.07.51    0.000009 Cur#2.20142.PMONITOR RC=0 Dur=0.000001 Bind-1 type=2 length=1 value=7
PSAPPSRV.20142 (1880)    1-18     11.07.51    0.000008 Cur#2.20142.PMONITOR RC=0 Dur=0.000001 Bind-2 type=8 length=4 value=458
PSAPPSRV.20142 (1880)    1-19     11.07.51    0.000886 Cur#2.20142.PMONITOR RC=0 Dur=0.000023 COM Stmt=UPDATE PSPRCSQUE SET DISTSTATUS = :1 WHERE PRCSINSTANCE = :2
PSAPPSRV.20142 (1880)    1-20     11.07.51    0.000009 Cur#2.20142.PMONITOR RC=0 Dur=0.000001 Bind-1 type=2 length=1 value=7
PSAPPSRV.20142 (1880)    1-21     11.07.51    0.000008 Cur#2.20142.PMONITOR RC=0 Dur=0.000001 Bind-2 type=8 length=4 value=458
PSAPPSRV.20142 (1880)    1-22     11.07.51    0.000424 Cur#2.20142.PMONITOR RC=0 Dur=0.000027 COM Stmt=UPDATE PS_CDM_LIST SET DISTSTATUS = '8',TRANSFERINSTANCE = 0 WHERE PRCSINSTANCE = :1 AND DISTSTATUS <> '5'
PSAPPSRV.20142 (1880)    1-23     11.07.51    0.000009 Cur#2.20142.PMONITOR RC=0 Dur=0.000001 Bind-1 type=8 length=4 value=458

Hence it is easy to build a simple PL/SQL loop to request that all unsent content is reposted.
BEGIN
 FOR x IN (
  SELECT prcsinstance
  FROM   psprcsrqst
  WHERE  diststatus = '4' --not posted
  AND    runstatus = '9' --successful processes
 ) LOOP
  UPDATE psprcsrqst
  SET    diststatus = '7'
  WHERE  prcsinstance = x.prcsinstance;

  UPDATE psprcsque
  SET    diststatus = '7'
  WHERE  prcsinstance = x.prcsinstance;

  UPDATE ps_cdm_list
  SET    diststatus = '8'
  ,      transferinstance = 0
  WHERE  prcsinstance = x.prcsinstance
  AND    diststatus <> '5'; --5 = posted
 END LOOP;
END;
/
COMMIT
/
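
The same effect can also be achieved with three set-based updates (a sketch; note that the child tables are updated first, while PSPRCSRQST still identifies the unposted processes):

UPDATE psprcsque q
SET    diststatus = '7'
WHERE  EXISTS(
       SELECT 'x'
       FROM   psprcsrqst r
       WHERE  r.prcsinstance = q.prcsinstance
       AND    r.diststatus = '4' --not posted
       AND    r.runstatus = '9'); --successful

UPDATE ps_cdm_list l
SET    diststatus = '8'
,      transferinstance = 0
WHERE  l.diststatus <> '5' --not already posted
AND    EXISTS(
       SELECT 'x'
       FROM   psprcsrqst r
       WHERE  r.prcsinstance = l.prcsinstance
       AND    r.diststatus = '4'
       AND    r.runstatus = '9');

UPDATE psprcsrqst
SET    diststatus = '7'
WHERE  diststatus = '4'
AND    runstatus = '9';

COMMIT;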

As always, the standard rules for updating PeopleTools tables apply. Don't do it unless you are sure you know what you are doing, because it isn't supported, and if it goes wrong you are on your own!

Wednesday, January 09, 2008

Global Temporary Tables and PeopleSoft Temporary Records

At a round table at the UKOUG 2007 conference, the question of reducing redo during Application Engine batch programs came up. For example, in both Global Payroll and Time & Labor, AE programs shuffle data through a series of working storage tables before putting it into the final tables.

Using any RDBMS (not just Oracle) as a temporary working store is not efficient. The database will certainly store the data, but it will go to a lot of time and trouble to make sure that it is stored securely and can be recovered in the event of a disaster. In an Oracle database, every insert into these working storage tables generates redo, and the user process must wait for the redo to be written to disk when the session commits. Excessive redo generation can become a performance issue in its own right; it can also impact a Data Guard DR solution and the time taken to perform incremental backup and recovery. Yet it is not necessary to be able to recover the data in the working storage tables, only the final result data.

Effect of Frequently Truncating Tables

I have observed significant amounts of time spent truncating working storage tables during batch programs, especially when concurrent processes frequently truncate them. This is a typical example from a Statspack report.
Top 5 Timed Events                                Avg %Total
~~~~~~~~~~~~~~~~~~                               wait   Call
Event                           Waits Time (s)   (ms)   Time
----------------------------- ------- -------- ------ ------
local write wait                5,756    3,955    687   52.6
enq: RO - fast object reuse     1,500    2,685   1790   35.7

From Metalink, Oracle Note 334822.1 sums up the situation: "Processes that involve temporary tables being truncated and repopulated in multiple, concurrent batch streams may present this situation. The underlying problem is we have to write the object's dirty buffers to disk prior to actually truncating or dropping the object. This ensures instance recoverability and avoids a stuck recovery. It seems at first glance perfectly reasonable to simply truncate a temporary table, then repopulate for another usage. And then to do the temporary populate/truncate operations in concurrent batches to increase throughput. However, in reality the concurrent truncates get bogged down as dbwr gets busy flushing those dirty block buffers from the buffer cache. You will see huge CI enqueue waits. The multiple truncate operations in concurrent streams absolutely kill throughput. This is specially critical with large buffers.

"There was also a discussion in Bug: 4147840 (non-publish) where a PeopleSoft process was causing this behaviour because of the above explanation and they seemed to fix it by changing some PeopleSoft code to implement delete rather than truncate on small temporary tables."

But if we go back to deleting rows from these tables, we will copy their contents to the redo log. Furthermore, deleting data does not reset the high water mark on the table.

GTTs

One option is to recreate the working storage tables as Oracle Global Temporary Tables (GTTs). There are two major advantages to GTTs.
  • They do not generate redo, although they do generate undo. Most of the time, batch processes insert data into these tables, and the undo for an insert is just the instruction to delete the row, so it is much smaller. If the table is cleared by truncation rather than by deleting the data, virtually no undo is generated. The problem with local write wait and contention on the RO enqueue does not occur with GTTs.
  • For each session that references the same GTT, Oracle creates a separate physical copy of the table in the temporary segment. The Global Payroll engine is written in COBOL, and so it uses ordinary PeopleSoft records for working storage. If multiple 'streams' are used to run different sets of data simultaneously, the processes share the same working storage tables. Each session has to scan through and discard working storage data belonging to other sessions, and when updating or deleting from temporary tables Oracle may have to generate read-consistent copies of the data blocks. The same happens in Application Engine when there are no unlocked instances of working storage tables available. This effect can severely restrict the ability to scale the number of concurrent processes. If these tables were recreated as GTTs, these problems would be resolved, and the number of payroll streams could be scaled without creating any contention between them.
There are some other aspects of GTTs that are important to understand.
  • Because a copy of the GTT is created for each referencing database session, GTTs cannot be used to pass data between database sessions, nor can they be used for any on-line process in PeopleSoft, because there is no guarantee which application server process will handle which service request. So, for example, you cannot use GTTs to hold dynamic nVision selectors if you plan to run the nVision Windows client in 3-tier mode: the service request that populates the dynamic selector may be executed by one PSQRYSRV process, but the service that runs the query may be handled by another.
  • The GTT will persist until either the end of the transaction or the end of the session, depending on how it is created. In PeopleSoft, GTTs should be created with ON COMMIT PRESERVE ROWS, because we do not want to lose the data when the process commits (see the sketch after this list).
  • Although it is possible to collect optimizer statistics on GTTs, only one set can be stored on each table, so one session could end up using statistics collected by another session (see my demonstration of this behaviour). It is probably better to remove statistics from the GTTs and rely upon Optimizer Dynamic Sampling (I discussed this in my article in Issue 31 of Oracle Scene magazine).
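As a minimal illustration of the ON COMMIT PRESERVE ROWS clause mentioned above (the record name and columns are hypothetical):

CREATE GLOBAL TEMPORARY TABLE ps_dmk_wrk_tmp
(process_instance NUMBER       NOT NULL
,emplid           VARCHAR2(11) NOT NULL
) ON COMMIT PRESERVE ROWS
/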
GTTs work well in batch programs in PeopleSoft. They can be introduced easily into SQR and COBOL processes because those processes run and then terminate: when the process terminates, the database session is closed and Oracle automatically drops the physical copy of the GTT from the temporary segment, relinquishing the space consumed. In Application Engine the position is a little more complicated.
  • Some AE programs can be restarted after a failure. They save their state information in the database as they progress, and can be restarted from the point at which they last committed. However, working storage data will not be preserved in a GTT, either because the database session will have terminated or because the program will be restarted on a different PSAESRV server process. Therefore, restart must be disabled before GTTs can be introduced, and this involves a customisation to the program.
  • Ideally, AE programs that use GTTs should be run by the stand-alone psae executable, not a PSAESRV server process. PSAESRV server processes were introduced in PeopleTools 8.44, when the Process Scheduler became a fully fledged Tuxedo domain. These Tuxedo server processes each maintain a persistent connection to the database, so they do not have to sign into the database for every AE program request; they are only recommended for systems with a very large number of short AE program requests, such as CRM. However, it is advantageous for the GTTs to be cleared out at the end of each process, otherwise they continue to consume space in the temporary segment, and even delivered AE programs are not always conscientious about this.
There are operational aspects to disabling restart on Application Engine programs. Often rows of data are allocated to a specific process by writing the process instance on those rows. When a process fails, it can leave these rows allocated to that process and require some manual tidying up. This is a significant discouragement to disabling restart and implementing GTTs, which leaves us with the 'local write wait' problem.

PeopleTools Temporary Records

PeopleSoft recognised the contention problems caused by different processes sharing the same working storage table and introduced Temporary Records in PeopleTools 8.0. On Oracle, sharing merely degrades performance due to the overhead of producing read-consistent copies of data blocks, but on other database platforms read consistency was achieved with page-level locks (although the subsequent release of SQL Server 2005, with its multi-versioning option, gave it similar functionality to Oracle). Version 8 applications also saw a move away from COBOL towards Application Engine, with its new capability to run PeopleCode, for batch processing (although Global Payroll was an exception to this). Temporary records were designed for use in AE processes. Each Application Engine program must declare which temporary tables it will use (stored on PSAEAPPLTEMPTBL) and how many instances of it will be run concurrently (stored on PSAEAPPLDEFN). A PeopleTools Temporary Record is created as an ordinary database table, but Application Designer creates multiple copies of the table. It uses the view PSTEMPTBLCNTVW to calculate the total number of instances of the temporary table required by all AE programs that use it, and then adds on the number of Global Instances (stored on PSOPTIONS).
Take, for example, the Time & Labor working storage record TL_IPT3. The build script generated by Application Designer creates the table PS_TL_IPT3 as expected, but the record is used by Application Engine program TL_TIMEADMIN, which specifies 10 instances, and there are 3 global instances, so it also creates PS_TL_IPT31 through PS_TL_IPT313. So I get 14 copies of the table and its indexes. It will therefore come as no surprise that Temporary Records are one of the major reasons why there are so many tables in a PeopleSoft database. In a vanilla demo HR database, there are over 6400 temporary tables out of a total of 17600 tables!

Now, what if, having met the above criteria, I want to recreate my Temporary Records as Global Temporary Tables? It is tempting to think that there is no longer any need to have multiple instances of the record, but there is some special behaviour for temporary records coded into PeopleTools. If you run out of instances of temporary tables, you use the table without the numeric suffix, and this is treated as a shared temporary table. Thus temporary records should have the field PROCESS_INSTANCE as a part of their unique key, and if they do not you get a warning when you try to save them in Application Designer.
However, there is also a change in the behaviour of the %TruncateTable macro. It is fast and efficient to clear a working storage table with TRUNCATE rather than DELETE: it generates only a trivial amount of redo and resets the high water mark on the table. The PeopleCode macro %TruncateTable evaluates to a TRUNCATE command on database platforms that support the command.
%TruncateTable (%Table(TL_IPT3))
If you are working on a temporary instance of a table, then you don't have to worry about anyone else, and the macro evaluates, as expected, to a TRUNCATE command (this test was performed on an Oracle database).
-- 11.44.57 .......(TL_TA_RULES.C1_00200.Step02) (SQL)
TRUNCATE TABLE SYSADM.PS_TL_IPT34
/
-- Row(s) affected: 0
COMMIT
/
However, if you have run out of temporary table instances and are working on the shared table, %TruncateTable exhibits a special behaviour: it evaluates to a DELETE statement and automatically appends criteria for the current process instance number.
-- 11.52.30 .......(TL_TA_RULES.C1_00200.Step02) (SQL)
DELETE FROM PS_TL_IPT3 WHERE PROCESS_INSTANCE = 1085
/
-- Row(s) affected: 0
COMMIT
/
Putting it all Together

As is always the case in PeopleSoft, the challenge is to use an advantageous database feature without an unacceptable degree of customisation. I have considered and discounted the possibility of converting the Temporary Records to permanent records. That would require customisation of every working storage table, and would change the behaviour of the %TruncateTable macro, preventing multiple instances of the AE program from executing concurrently. There is no doubt that it is nearly always preferable to clear out a working storage table by truncating it rather than deleting from it: when you delete data, you end up copying it to the undo segment, which in turn generates yet more redo. So, we need to keep all the instances of the Temporary Records, so that we keep the normal behaviour of %TruncateTable, but we need to make all of them into GTTs, including the shared instance, so that we save the redo overhead and can still truncate them to release space back to the temporary segment and reset the high water mark. The following query lists the Temporary Records that are not referenced by any Application Engine program with restart enabled, together with the number of instances of each table, including the global instances.
SELECT r.recname
,      o.temptblinstances+NVL(c.temptblinstances,0) temptblinstances
FROM   psoptions o
,      psrecdefn r
LEFT OUTER JOIN pstemptblcntvw c ON c.recname = r.recname
WHERE  r.rectype = 7
AND NOT EXISTS( --omit tables where restart not disabled on referencing AE
SELECT 'x'
FROM   psaeappltemptbl t
,      psaeappldefn a
WHERE  t.recname = r.recname
AND    a.ae_applid = t.ae_applid
AND    a.ae_disable_restart = 'N' --restart not disabled
AND    ROWNUM = 1)
DDL for GTTs in App Designer

So now we turn to the question of how to generate the DDL to build the GTTs. It is probably not a good idea to implement this in development and unit test databases. One of the problems with GTTs is that you cannot see the data that somebody else has stored in their copy of a GTT, which can make debugging rather difficult. So I suggest doing this only in configuration, user acceptance and performance test databases before implementing it in production. It is possible to get Application Designer to generate the DDL, but it requires changes to the DDL Models, to comment out the physical storage clauses and to insert the GLOBAL TEMPORARY keywords, and it also requires a customisation to each record to specify DDL overrides. The DDL Models to create tables and indexes on Oracle should be changed as follows.

1: Create Table:
CREATE **GLOBTEMP** TABLE [TBNAME] ([TBCOLLIST]) **GTSPCOM1** TABLESPACE [TBSPCNAME] STORAGE (INITIAL **INIT** NEXT **NEXT** MAXEXTENTS **MAXEXT** PCTINCREASE **PCT**) PCTFREE **PCTFREE** PCTUSED **PCTUSED** **GTSPCOM2**;
2: Create Index
CREATE [UNIQUE] **BITMAP** INDEX [IDXNAME] ON [TBNAME] ([IDXCOLLIST]) **GTSPCOM1** TABLESPACE **INDEXSPC** STORAGE (INITIAL **INIT** NEXT **NEXT** MAXEXTENTS **MAXEXT** PCTINCREASE **PCT**) PCTFREE **PCTFREE** PARALLEL NOLOGGING **GTSPCOM2**;
The three new parameters (in bold) should default to a blank string in the DDL model.
  • GLOBTEMP will be used to specify the keyword GLOBAL TEMPORARY.
  • GTSPCOM1 will be used to open a comment in front of the physical table attributes, which it is not legitimate to specify on a GTT. This variable will also be used to specify the GTT clause 'ON COMMIT PRESERVE ROWS'.
  • GTSPCOM2 will be used to close the comment at the end of the physical table attributes.
DDL overrides need to be specified on the table and on all of its indexes.
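Working back from the sample DDL output later in this article, the override values on each record would be along the following lines (this is my inference from the generated DDL, not a delivered setting):
On the table: GLOBTEMP = 'GLOBAL TEMPORARY'; GTSPCOM1 = 'ON COMMIT PRESERVE ROWS /*'; GTSPCOM2 = '*/'
On each index: GTSPCOM1 = '/*'; GTSPCOM2 = '*/'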
It would be very tedious to add all these overrides manually, so I have produced two PL/SQL scripts that update the PeopleTools tables directly.
  • globtemp.sql: implements GTTs for all temporary records where there is no restartable AE program that references the table.
  • globtemp-undo.sql: reverts those records to ordinary permanent tables where there is no restartable AE program (so that the changes can be deliberately undone for testing).
Both scripts maintain the PeopleTools object version numbers so that the changed records cache properly in Application Designer. The scripts also maintain the items in an Application Designer project called GLOBTEMP; the project should be created manually before the scripts are run. Neither script commits, so that the person running it can decide whether to commit or roll back the changes.
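The version-number maintenance follows the standard PeopleTools pattern for direct updates to the PeopleTools tables; a minimal sketch (the record name is a hypothetical example, and the actual scripts may differ in detail):
UPDATE PSVERSION SET VERSION = VERSION + 1
WHERE OBJECTTYPENAME IN('SYS','RDM')
/
UPDATE PSLOCK SET VERSION = VERSION + 1
WHERE OBJECTTYPENAME IN('SYS','RDM')
/
UPDATE PSRECDEFN
SET VERSION = (SELECT VERSION FROM PSVERSION WHERE OBJECTTYPENAME = 'RDM')
WHERE RECNAME = 'GPCHAL074_TMP' /*hypothetical record name*/
/
Sample output - this shows only the shareable instance of the table.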
DROP TABLE PS_GPCHAL074_TMP
/
CREATE GLOBAL TEMPORARY TABLE PS_GPCHAL074_TMP (PROCESS_INSTANCE
DECIMAL(10) NOT NULL,
FROMDATE DATE,
TO_DATE DATE) ON COMMIT PRESERVE ROWS /* TABLESPACE GPAPP STORAGE
(INITIAL 40000 NEXT 100000 MAXEXTENTS UNLIMITED PCTINCREASE 0)
PCTFREE 10 PCTUSED 80 */
/
CREATE UNIQUE  INDEX PS_GPCHAL074_TMP ON PS_GPCHAL074_TMP
(PROCESS_INSTANCE,
FROMDATE,
TO_DATE) /* TABLESPACE PSINDEX STORAGE (INITIAL 40000 NEXT 100000
MAXEXTENTS UNLIMITED PCTINCREASE 0) PCTFREE 10  PARALLEL NOLOGGING */
/
ALTER INDEX PS_GPCHAL074_TMP NOPARALLEL LOGGING
/

Unfortunately, the last ALTER INDEX command in the script raises an Oracle error - ORA-14451: unsupported feature with temporary table.
This command is automatically appended to the CREATE INDEX commands and cannot be removed, so this error should simply be ignored.
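The error is easy to reproduce in isolation. A minimal sketch (DMK_GTT and DMK_GTT_IDX are hypothetical names):
CREATE GLOBAL TEMPORARY TABLE dmk_gtt (a NUMBER) ON COMMIT PRESERVE ROWS
/
CREATE INDEX dmk_gtt_idx ON dmk_gtt (a)
/
ALTER INDEX dmk_gtt_idx NOPARALLEL LOGGING
/
-- ORA-14451: unsupported feature with temporary table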

Tuesday, January 08, 2008

Oracle/PeopleSoft have mixed up DDL Models used by %UpdateStats from PeopleTools 8.48

Last May, I wrote about Changes to DDL Models in PeopleTools 8.48. DDL models 4 and 5 are used by the %UpdateStats PeopleCode macro. Previously, PeopleSoft had delivered these models with ANALYZE TABLE commands. Now, in line with long standing Oracle RDBMS guidance, they call DBMS_STATS (see $PS_HOME/script/ddlora.dms). I certainly welcome that change.
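For reference, the models delivered before PeopleTools 8.48 were essentially as follows (indicative rather than a verbatim extract of the old ddlora.dms):
4,2,0,0,$long
ANALYZE TABLE [TBNAME] ESTIMATE STATISTICS;
//
5,2,0,0,$long
ANALYZE TABLE [TBNAME] COMPUTE STATISTICS;
//
So %UpdateStats([table],LOW) has always implied the cheaper estimate, and %UpdateStats([table],HIGH) the full compute.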

However, I have recently noticed that the DDL models have been swapped over. I have commented on this elsewhere, but I felt it needed a separate posting.

I am certain that this is a mistake, but it is at least one that can easily be corrected by PeopleSoft customers. The problem is not obvious because the 'full compute' DDL model actually uses only a 1% sample, and the automatic sample size calculated by Oracle is usually within an order of magnitude of that value, though it is often greater than 1%.

So let me be absolutely clear here:

  • Model 4 is used by %UpdateStats([table],LOW);
  • Model 5 is used by %UpdateStats([table],HIGH);

This can easily be verified. I wrote a simple Application Engine program that collected statistics on two tables via the %UpdateStats macro, having implemented the delivered DDL models as specified in ddlora.dms. This is the Application Engine trace file:

...
-- 17.35.40 .(DMK.MAIN.Step01) (SQL)
RECSTATS PSLOCK LOW
/
-- Row(s) affected: 1
/
/
Restart Data CheckPointed
/
COMMIT
/
...
-- 17.35.41 .(DMK.MAIN.Step02) (SQL)
RECSTATS PSVERSION HIGH
/
-- Row(s) affected: 1
/
/
Restart Data CheckPointed
/
COMMIT
/
...


Unfortunately, the %UpdateStats macro is not fully traced in the PeopleTools trace either; it reports only the same information as the AE trace file. The only way I know to find out exactly what is being submitted to the database is to enable Oracle SQL trace and look in the trace file.

...
=====================
PARSING IN CURSOR #2 len=155 dep=0 uid=39 oct=47 lid=39 tim=259825298401 hv=1993983003 ad='6c1382b4'
BEGIN DBMS_STATS.GATHER_TABLE_STATS (ownname=> 'SYSADM', tabname=> 'PSLOCK', estimate_percent=> 1, method_opt=> 'FOR ALL COLUMNS SIZE 1', cascade=> TRUE); END;
END OF STMT
PARSE #2:c=31250,e=386098,p=25,cr=105,cu=0,mis=1,r=0,dep=0,og=1,tim=259825298396
...
=====================
PARSING IN CURSOR #2 len=193 dep=0 uid=39 oct=47 lid=39 tim=259826057420 hv=2784637395 ad='6c138098'
BEGIN DBMS_STATS.GATHER_TABLE_STATS (ownname=> 'SYSADM', tabname=>'PSVERSION', estimate_percent=> dbms_stats.auto_sample_size, method_opt=> 'FOR ALL INDEXED COLUMNS SIZE 1', cascade=> TRUE); END;
END OF STMT
PARSE #2:c=0,e=2195,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=259826057415
...


So you can see that RECSTATS LOW corresponds to the 1% sample size, while RECSTATS HIGH corresponds to the default automatic estimate.
If you look in ddlora.dms you can see that model 4 is the 1% sample and model 5 is the default estimate:

...
4,2,0,0,$long
DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=> [TBNAME], estimate_percent=> 1, method_opt=> 'FOR ALL COLUMNS SIZE 1', cascade=> TRUE);
//
5,2,0,0,$long
DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=> [TBNAME], estimate_percent=> dbms_stats.auto_sample_size, method_opt=> 'FOR ALL INDEXED COLUMNS SIZE 1', cascade=> TRUE);
//
...


So you can also see how the DDL models have been confused. As I have commented, I consider the 1% sample to be a typographical error: it should really have been 100, indicating a full compute. I think it would make sense to change the ddlora.dms script to read as follows:

...
INSERT INTO PSDDLMODEL (
STATEMENT_TYPE,
PLATFORMID,
SIZING_SET,
PARMCOUNT,
MODEL_STATEMENT)
VALUES(
:1,
:2,
:3,
:4,
:5)
\
$DATATYPES NUMERIC,NUMERIC,NUMERIC,NUMERIC,CHARACTER
...
4,2,0,0,$long
DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=> [TBNAME], estimate_percent=> dbms_stats.auto_sample_size, method_opt=> 'FOR ALL COLUMNS SIZE AUTO', cascade=> TRUE);
//
5,2,0,0,$long
DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=> [TBNAME], estimate_percent=> 1, method_opt=> 'FOR ALL COLUMNS SIZE AUTO', cascade=> TRUE);
//
/
...


Or, if you use the wrapper SQL that I proposed in %UpdateStats() -v- Optimizer Dynamic Sampling:

...
4,2,0,0,$long
wrapper.ps_stats(p_ownname=> [DBNAME], p_tabname=> [TBNAME], p_estimate_percent=> 0);
//
5,2,0,0,$long
wrapper.ps_stats(p_ownname=> [DBNAME], p_tabname=> [TBNAME], p_estimate_percent=> 1);
//
...
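For illustration, a wrapper with the behaviour these models assume might look like the following minimal sketch. This is an assumed implementation, not the package from that earlier post: p_estimate_percent => 0 requests the automatic sample size, and any other value is passed straight through.
CREATE OR REPLACE PACKAGE wrapper AS
  PROCEDURE ps_stats(p_ownname VARCHAR2, p_tabname VARCHAR2, p_estimate_percent NUMBER DEFAULT 0);
END wrapper;
/
CREATE OR REPLACE PACKAGE BODY wrapper AS
  PROCEDURE ps_stats(p_ownname VARCHAR2, p_tabname VARCHAR2, p_estimate_percent NUMBER DEFAULT 0) IS
  BEGIN
    dbms_stats.gather_table_stats(
      ownname          => p_ownname,
      tabname          => p_tabname,
      estimate_percent => CASE p_estimate_percent
                          WHEN 0 THEN dbms_stats.auto_sample_size /*0 means automatic sample size*/
                          ELSE p_estimate_percent END,
      method_opt       => 'FOR ALL COLUMNS SIZE AUTO',
      cascade          => TRUE);
  END ps_stats;
END wrapper;
/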


I have kept the 1% sample size for the compute model, but there is no reason why you could not choose a larger value. If you wanted %UpdateStats([table],HIGH) to continue to mean a full compute, then the value really should be 100. It is really a matter of how long you are prepared to spend gathering statistics during batch programs. However, a higher sample size will not necessarily produce statistics that lead to a better execution plan!

Exactly how Oracle calculates the default sample size is not published; values in the range 0.5% to 10% are typical. Perversely, a 1% sample will usually be smaller than the Oracle default sample size, so in the delivered DDL models %UpdateStats([table],HIGH) will usually use a larger sample than %UpdateStats([table],LOW)! I simply cannot believe that this is what was intended.

You can see the value that Oracle calculates for auto_sample_size by tracing the dbms_stats call and looking for the SAMPLE clause in the recursive SQL, e.g.:

... from "SYSADM"."PSPCMNAME" sample ( .9490170771) t
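To capture this for yourself, the following sketch enables SQL trace around a dbms_stats call (run in SQL*Plus; the trace file appears in the directory given by the user_dump_dest parameter):
ALTER SESSION SET sql_trace = TRUE
/
BEGIN
  dbms_stats.gather_table_stats(ownname => 'SYSADM', tabname => 'PSPCMNAME',
    estimate_percent => dbms_stats.auto_sample_size);
END;
/
ALTER SESSION SET sql_trace = FALSE
/
-- then search the trace file for the recursive SQL containing a SAMPLE clause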