Wednesday, January 09, 2008

Global Temporary Tables and PeopleSoft Temporary Records

At a round table at the UKOUG 2007 conference, the question of how to reduce redo generation during Application Engine batch programs came up. For example, in both Global Payroll and Time & Labor, AE programs shuffle data through a series of working storage tables before putting it into the final tables.

Using any RDBMS (not just Oracle) as a temporary working store is not efficient. The database will certainly store that data, but it will go to a lot of time and trouble to make sure that it is stored securely and can be recovered in the event of a disaster. In an Oracle database, every insert into these working storage tables generates redo, and the user process must wait for that redo to be written to disk when the session commits. Excessive redo generation can become a performance issue in its own right; it can also impact a Data Guard DR solution and the time taken to perform incremental backup and recovery. However, it is not necessary to be able to recover the data in the working storage tables, only the final result data.
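As a quick illustration, you can see how much redo your own session generates while populating a working storage table by sampling the session statistics before and after the insert (a sketch; v$mystat and v$statname are standard Oracle dynamic performance views):

SELECT n.name, s.value
FROM   v$mystat s
,      v$statname n
WHERE  s.statistic# = n.statistic#
AND    n.name = 'redo size'
/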

Effect of Frequently Truncating Tables

I have observed significant amounts of time spent truncating working storage tables during batch programs, especially when concurrent processes frequently truncate them. This is a typical example from a Statspack report.
Top 5 Timed Events                                Avg %Total
~~~~~~~~~~~~~~~~~~                               wait   Call
Event                           Waits Time (s)   (ms)   Time
----------------------------- ------- -------- ------ ------
local write wait                5,756    3,955    687   52.6
enq: RO - fast object reuse     1,500    2,685   1790   35.7
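If you suspect this is happening on your own system, the cumulative statistics for the RO enqueue can be checked directly (a sketch; v$enqueue_stat is a standard dynamic performance view):

SELECT eq_type, total_req#, total_wait#, cum_wait_time
FROM   v$enqueue_stat
WHERE  eq_type = 'RO'
/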

From Metalink, Oracle Note 334822.1 sums up the situation: "Processes that involve temporary tables being truncated and repopulated in multiple, concurrent batch streams may present this situation. The underlying problem is we have to write the object's dirty buffers to disk prior to actually truncating or dropping the object. This ensures instance recoverability and avoids a stuck recovery. It seems at first glance perfectly reasonable to simply truncate a temporary table, then repopulate for another usage. And then to do the temporary populate/truncate operations in concurrent batches to increase throughput. However, in reality the concurrent truncates get bogged down as dbwr gets busy flushing those dirty block buffers from the buffer cache. You will see huge CI enqueue waits. The multiple truncate operations in concurrent streams absolutely kill throughput. This is specially critical with large buffers.

"There was also a discussion in Bug: 4147840 (non-publish) where a PeopleSoft process was causing this behaviour because of the above explanation and they seemed to fix it by changing some PeopleSoft code to implement delete rather than truncate on small temporary tables."

But if we go back to deleting rows from these tables, we will effectively copy their contents into the undo segment and hence into the redo log. Furthermore, deleting data will not reset the high water mark on the table.
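To make the difference concrete, here is a minimal sketch using a hypothetical working storage table PS_WORK_TBL:

-- every deleted row is copied to the undo segment, which in turn generates redo,
-- and the high water mark is left where it was
DELETE FROM PS_WORK_TBL
/
-- trivial redo and undo, and the high water mark is reset
TRUNCATE TABLE PS_WORK_TBL
/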

GTTs

One option is to recreate working storage tables as Oracle Global Temporary Tables (GTT). There are two major advantages to GTTs.
  • They do not generate redo, although they do generate undo. Most of the time batch processes will be inserting data into these tables. The undo generated on an insert is just the instruction to delete the data, and so is much smaller. If the table is cleared by truncation rather than by deleting the data, there is virtually no undo generated. The problem with local write wait and contention on the RO enqueue does not occur with GTTs.
  • For each session that references the same GTT, Oracle creates a separate physical copy of the table in the temporary segment. The Global Payroll engine is written in COBOL, and so it uses ordinary PeopleSoft records for working storage. If multiple 'streams' are used to run different sets of data simultaneously, the processes share the same working storage tables. Each session has to scan through and discard working storage data belonging to other sessions, and when updating or deleting from these tables Oracle may have to generate read-consistent copies of the data blocks. The same happens in Application Engine when there are no unlocked instances of working storage tables available. This effect can severely restrict the ability to scale the number of concurrent processes. If these tables were recreated as GTTs, these problems would be resolved, and the number of payroll streams could be scaled without creating any contention between them.
There are some other aspects of GTTs that it is important to understand:
  • Because a copy of the GTT is created for each referencing database session, GTTs cannot be used to pass data between database sessions, nor can they be used for any on-line process in PeopleSoft because there is no guarantee which application server process will handle which service request. So, for example, you cannot use GTTs to hold dynamic nVision selectors if you plan to run nVision windows client in 3-tier mode. The service request that populates the dynamic selector may be executed by one PSQRYSRV process, but the service to run the query may be run by another.
  • The data in a GTT persists until either the end of the transaction or the end of the session, depending on how the GTT is created. In PeopleSoft, GTTs should be created ON COMMIT PRESERVE ROWS because we do not want to lose the data when the process commits.
  • Although it is possible to collect optimizer statistics on GTTs, only one set can be stored on each table, so one session could end up using statistics collected by another session (see demonstration of this behaviour). It is probably better to remove statistics from the GTTs and rely upon Optimizer Dynamic Sampling, as shown in the sketch after this list (I discussed this in my article in Issue 31 of Oracle Scene magazine).
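For example, the statistics on a GTT could be deleted and then locked so that nothing gathers them again and the optimizer falls back on dynamic sampling (a sketch; DBMS_STATS.LOCK_TABLE_STATS requires Oracle 10g or later, and PS_TL_IPT3 is used purely as an example table):

BEGIN
  -- remove any statistics already gathered on the GTT
  dbms_stats.delete_table_stats(ownname=> 'SYSADM', tabname=> 'PS_TL_IPT3');
  -- prevent statistics being gathered again, so dynamic sampling is used instead
  dbms_stats.lock_table_stats(ownname=> 'SYSADM', tabname=> 'PS_TL_IPT3');
END;
/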
GTTs work well in batch programs in PeopleSoft. They can be introduced easily into SQR and COBOL processes because those processes run and then terminate. When the process terminates, the database session is closed, and Oracle automatically drops the physical copy of the GTT from the temporary segment, relinquishing the space consumed. In Application Engine the position is a little more complicated.
  • Some AE programs can be restarted after a failure. They save their state information in the database as they progress, and can be restarted from the point at which they last committed. But working storage data held in a GTT will not be preserved across such a restart, either because the database session will have terminated or because the program may be restarted on a different PSAESRV server process. Therefore, restart must be disabled before GTTs can be introduced. This involves a customisation to the program.
  • Ideally, AE programs that use GTTs should be run by a stand-alone psae executable, not a PSAESRV server process. PSAESRV server processes were introduced in PeopleTools 8.44 when the Process Scheduler became a fully fledged Tuxedo domain. These Tuxedo server processes each maintain a persistent connection to the database, so they do not have to sign into the database for every AE program request. They are only recommended for systems with a very large number of short AE program requests, such as CRM. However, it is advantageous for the GTTs to be cleared out at the end of each process (see the sketch after this list), otherwise they will continue to have an overhead in the temporary segment. Even delivered AE programs are not always conscientious about this.
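One way to do that is to add a final housekeeping step to the AE program that explicitly clears each GTT it has used before the program ends (a sketch; WORK_TBL is a hypothetical working storage record):

%TruncateTable (%Table(WORK_TBL))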
There are operational aspects to disabling restart on Application Engine programs. Often rows of data are allocated to a specific process by writing the process instance on those rows. When a process fails it can leave these rows allocated to that process, and require some manual tidying up. This is a significant discouragement to disabling restart and implementing GTTs, which leaves us with the 'local write wait' problem.

PeopleTools Temporary Records

PeopleSoft recognised the contention problems caused by different processes sharing the same working storage table and introduced Temporary Records in PeopleTools 8.0. On Oracle the sharing merely degrades performance, due to the overhead of producing read-consistent copies of data blocks, but on other database platforms read consistency is achieved by using page-level locks (although the subsequent release of SQL Server 2005, with its multi-versioning option, gave it similar functionality to Oracle). Version 8 applications also saw a move away from COBOL towards Application Engine, with its new capability to run PeopleCode, for batch processing (although Global Payroll was an exception to this). Temporary records were designed for use in AE processes. Each Application Engine program must declare which temporary tables it will use (stored on PSAEAPPLTEMPTBL) and how many instances of each will be run concurrently (stored on PSAEAPPLDEFN). A PeopleTools Temporary Record is created as an ordinary database table, but Application Designer creates multiple copies of the table. It uses the view PSTEMPTBLCNTVW to calculate the total number of instances of the temporary table required by all AE programs that use it, and then adds on the number of Global Instances (stored on PSOPTIONS).
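That calculation can be reproduced for a single record with a query like this (a sketch; it uses the same PeopleTools tables as the fuller query later in this article, and TL_IPT3 is the record discussed in the next paragraph):

SELECT r.recname
,      o.temptblinstances+NVL(c.temptblinstances,0) temptblinstances
FROM   psoptions o
,      psrecdefn r
LEFT OUTER JOIN pstemptblcntvw c ON c.recname = r.recname
WHERE  r.recname = 'TL_IPT3'
/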
Take, for example, the Time & Labor working storage record TL_IPT3. The build script generated by Application Designer creates the table PS_TL_IPT3 as expected, but the record is used by Application Engine program TL_TIMEADMIN, which specifies 10 instances, and there are 3 global instances, so it also creates PS_TL_IPT31 through PS_TL_IPT313. So I get 14 copies of the table and its indexes. It will therefore come as no surprise that Temporary Records are one of the major reasons why there are so many tables in a PeopleSoft database. In a vanilla demo HR database, there are over 6400 temporary tables out of a total of 17600 tables!

Now, what if, having met the above criteria, I want to recreate my Temporary Records as Global Temporary Tables? It is tempting to think that there is no longer any need to have multiple instances of the record, but there is some special behaviour for temporary records coded into PeopleTools. If an AE program runs out of instances of a temporary table, it uses the table without the numeric suffix, and this is treated as a shared temporary table. Thus temporary records should have the field PROCESS_INSTANCE as a part of their unique key, and if they do not you get a warning when you try to save them in Application Designer.
However, there is also a change in the behaviour of the %TruncateTable macro. It is fast and efficient to clear a working storage table with TRUNCATE rather than DELETE: it generates only a trivial amount of redo and resets the high water mark on the table. The PeopleCode macro %TruncateTable evaluates to a TRUNCATE command on database platforms that support that command.
%TruncateTable (%Table(TL_IPT3))
If you are working on a dedicated temporary instance of the table then you don't have to worry about anyone else, and this macro evaluates, as expected, to a TRUNCATE command (this test was performed on an Oracle database).
-- 11.44.57 .......(TL_TA_RULES.C1_00200.Step02) (SQL)
TRUNCATE TABLE SYSADM.PS_TL_IPT34
/
-- Row(s) affected: 0
COMMIT
/
However, if you have run out of temporary table instances and you are working on the shared table, %TruncateTable exhibits a special behaviour: it evaluates to a DELETE statement and automatically appends criteria to specify the current process instance number.
-- 11.52.30 .......(TL_TA_RULES.C1_00200.Step02) (SQL)
DELETE FROM PS_TL_IPT3 WHERE PROCESS_INSTANCE = 1085
/
-- Row(s) affected: 0
COMMIT
/
Putting it all Together

As is always the case in PeopleSoft, the challenge is to use an advantageous database feature without an unacceptable degree of customisation in PeopleSoft. I have considered and discounted the possibility of converting the Temporary Records to permanent records. That would require customisation of every working storage table, and it would change the behaviour of the %TruncateTable macro, preventing multiple instances of the AE program from executing concurrently.

There is no doubt that it is nearly always preferable to clear out a working storage table by truncating it rather than deleting from it. When you delete data you end up copying it to the undo segment, which in turn generates yet more redo.

So, we need to keep all the instances of the Temporary Records, so that we keep the normal behaviour of %TruncateTable, but we need to make all of them into GTTs, including the shared instance, so that we save the redo overhead and can still truncate them to release space back to the temporary segment and reset the high water mark. The following query lists the Temporary Records that are not referenced by any Application Engine program on which restart is still enabled, together with the number of instances of each table, including the global instances.
SELECT r.recname
,      o.temptblinstances+NVL(c.temptblinstances,0) temptblinstances
FROM   psoptions o
,      psrecdefn r
LEFT OUTER JOIN pstemptblcntvw c ON c.recname = r.recname
WHERE  r.rectype = 7
AND NOT EXISTS( --omit tables referenced by any AE program on which restart has not been disabled
SELECT 'x'
FROM   psaeappltemptbl t
,      psaeappldefn a
WHERE  t.recname = r.recname
AND    a.ae_applid = t.ae_applid
AND    a.ae_disable_restart = 'N' --restart not disabled
AND    ROWNUM = 1)
DDL for GTTs in App Designer

So now we turn to the question of how to generate the DDL to build the GTTs. It is probably not a good idea to implement this in development and unit test databases. One of the problems with GTTs is that you cannot see the data that another session has stored in its copy of a GTT, which can make debugging rather difficult. So I suggest that this should only be done in configuration, user acceptance and performance test databases before implementing in production.

It is possible to get Application Designer to generate the DDL, but it requires changes to the DDL Models to comment out the physical storage clauses and to insert the GLOBAL TEMPORARY keywords into the DDL, and it also requires a customisation to each record to specify DDL overrides. The DDL Models to create tables and indexes on Oracle should be changed as follows.

1: Create Table:
CREATE **GLOBTEMP** TABLE [TBNAME] ([TBCOLLIST]) **GTSPCOM1** TABLESPACE [TBSPCNAME] STORAGE (INITIAL **INIT** NEXT **NEXT** MAXEXTENTS **MAXEXT** PCTINCREASE **PCT**) PCTFREE **PCTFREE** PCTUSED **PCTUSED** **GTSPCOM2**;
2: Create Index
CREATE [UNIQUE] **BITMAP** INDEX [IDXNAME] ON [TBNAME] ([IDXCOLLIST]) **GTSPCOM1** TABLESPACE **INDEXSPC** STORAGE (INITIAL **INIT** NEXT **NEXT** MAXEXTENTS **MAXEXT** PCTINCREASE **PCT**) PCTFREE **PCTFREE** PARALLEL NOLOGGING **GTSPCOM2**;
The three new parameters (in bold) should default to a blank string in the DDL model.
  • GLOBTEMP will be used to specify the keyword GLOBAL TEMPORARY.
  • GTSPCOM1 will be used to open a comment in front of the physical table attributes, because it is not legitimate to specify these storage parameters on a GTT. This variable will also be used to specify the GTT clause 'ON COMMIT PRESERVE ROWS'.
  • GTSPCOM2 will be used to close the comment at the end of physical table attributes.
DDL overrides need to be specified on the table and on each of its indexes.
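Working back from the sample output below, the DDL override values are along these lines (a sketch of what the generated DDL implies, rather than delivered configuration):

GLOBTEMP: GLOBAL TEMPORARY
GTSPCOM1: ON COMMIT PRESERVE ROWS /*   (on the table; just /* on each of its indexes)
GTSPCOM2: */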
It would be very tedious to add all these overrides manually, so I have produced two PL/SQL scripts that update the PeopleTools tables directly.
  • globtemp.sql: implements GTTs for all temporary records where there is no restartable AE program that references the table.
  • globtemp-undo.sql: reverts the record to normal tables where there is no restartable AE program (so that changes can be deliberately undone for testing).
Both scripts maintain PeopleTools object version numbers so that the tables cache properly in Application Designer. The scripts also maintain the items in an Application Designer project called GLOBTEMP; the project should be created manually before the script is run. The scripts do not commit, so that the person running them can decide whether to commit or roll back the changes.

Sample output - this shows only the shareable table:
DROP TABLE PS_GPCHAL074_TMP
/
CREATE GLOBAL TEMPORARY TABLE PS_GPCHAL074_TMP (PROCESS_INSTANCE
DECIMAL(10) NOT NULL,
FROMDATE DATE,
TO_DATE DATE) ON COMMIT PRESERVE ROWS /* TABLESPACE GPAPP STORAGE
(INITIAL 40000 NEXT 100000 MAXEXTENTS UNLIMITED PCTINCREASE 0)
PCTFREE 10 PCTUSED 80 */
/
CREATE UNIQUE  INDEX PS_GPCHAL074_TMP ON PS_GPCHAL074_TMP
(PROCESS_INSTANCE,
FROMDATE,
TO_DATE) /* TABLESPACE PSINDEX STORAGE (INITIAL 40000 NEXT 100000
MAXEXTENTS UNLIMITED PCTINCREASE 0) PCTFREE 10  PARALLEL NOLOGGING */
/
ALTER INDEX PS_GPCHAL074_TMP NOPARALLEL LOGGING
/

Unfortunately the last ALTER INDEX command in the script raises an Oracle error - ORA-14451: unsupported feature with temporary table.
But this command is appended automatically to the create index DDL and cannot be removed. The error should simply be ignored.

Tuesday, January 08, 2008

Oracle/PeopleSoft have mixed up DDL Models used by %UpdateStats from PeopleTools 8.48

Last May, I wrote about Changes to DDL Models in PeopleTools 8.48. DDL models 4 and 5 are used by the %UpdateStats PeopleCode macro. Previously, PeopleSoft had delivered these models with ANALYZE TABLE commands. Now, in line with long standing Oracle RDBMS guidance, they call DBMS_STATS (see $PS_HOME/script/ddlora.dms). I certainly welcome that change.

However, I have recently noticed that the DDL models have been swapped over. I have commented on this elsewhere, but I felt it needed a separate posting.

I am certain that this is a mistake, but it is at least one that can easily be corrected by PeopleSoft customers. The problem is not obvious because the 'full compute' DDL model actually uses only a 1% sample, and the automatic sample size calculated by Oracle is usually within an order of magnitude of that value, though it is often greater than 1%.
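You can check which model statements are actually installed in your database by querying the PeopleTools table directly (a sketch; platform ID 2 is Oracle in the delivered data):

SELECT statement_type, model_statement
FROM   psddlmodel
WHERE  statement_type IN (4,5)
AND    platformid = 2
/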

So let me be absolutely clear here that:

  • Model 4 is used by %UpdateStats([table],LOW);


  • Model 5 is used by %UpdateStats([table],HIGH);


This can be easily verified. I wrote a simple Application Engine program that collected statistics on two tables via the %UpdateStats macro, having implemented the delivered DDL models as specified in ddlora.dms. This is the Application Engine trace file.

    ...
    -- 17.35.40 .(DMK.MAIN.Step01) (SQL)
    RECSTATS PSLOCK LOW
    /
    -- Row(s) affected: 1
    /
    /
    Restart Data CheckPointed
    /
    COMMIT
    /
    ...
    -- 17.35.41 .(DMK.MAIN.Step02) (SQL)
    RECSTATS PSVERSION HIGH
    /
    -- Row(s) affected: 1
    /
    /
    Restart Data CheckPointed
    /
    COMMIT
    /
    ...


    Unfortunately, the SQL generated by the %UpdateStats macro does not appear in the PeopleTools trace either (it reports the same information as the AE trace file). The only way I know to find out what is actually being submitted to the database is to enable Oracle SQL trace and look in the trace file.

    ...
    =====================
    PARSING IN CURSOR #2 len=155 dep=0 uid=39 oct=47 lid=39 tim=259825298401 hv=1993983003 ad='6c1382b4'
    BEGIN DBMS_STATS.GATHER_TABLE_STATS (ownname=> 'SYSADM', tabname=> 'PSLOCK', estimate_percent=> 1, method_opt=> 'FOR ALL COLUMNS SIZE 1', cascade=> TRUE); END;
    END OF STMT
    PARSE #2:c=31250,e=386098,p=25,cr=105,cu=0,mis=1,r=0,dep=0,og=1,tim=259825298396
    ...
    =====================
    PARSING IN CURSOR #2 len=193 dep=0 uid=39 oct=47 lid=39 tim=259826057420 hv=2784637395 ad='6c138098'
    BEGIN DBMS_STATS.GATHER_TABLE_STATS (ownname=> 'SYSADM', tabname=>'PSVERSION', estimate_percent=> dbms_stats.auto_sample_size, method_opt=> 'FOR ALL INDEXED COLUMNS SIZE 1', cascade=> TRUE); END;
    END OF STMT
    PARSE #2:c=0,e=2195,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=259826057415
    ...


    So, you can see that RECSTATS HIGH corresponds to the default estimate, but RECSTATS LOW corresponds to the 1% sample size.
    If you look in ddlora.dms you can see that model 4 is the 1% sample and model 5 is the default estimate.

    ...
    4,2,0,0,$long
    DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=> [TBNAME], estimate_percent=> 1, method_opt=> 'FOR ALL COLUMNS SIZE 1', cascade=> TRUE);
    //
    5,2,0,0,$long
    DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=> [TBNAME], estimate_percent=> dbms_stats.auto_sample_size, method_opt=> 'FOR ALL INDEXED COLUMNS SIZE 1', cascade=> TRUE);
    //
    ...


    So you can also see how the DDL models have been confused. As I have commented, I consider the 1% sample to be a typographical error that should really have been 100%, indicating a full compute. I think it would make sense to change the ddlora.dms script to read as follows:

    ...
    INSERT INTO PSDDLMODEL (
    STATEMENT_TYPE,
    PLATFORMID,
    SIZING_SET,
    PARMCOUNT,
    MODEL_STATEMENT)
    VALUES(
    :1,
    :2,
    :3,
    :4,
    :5)
    \
    $DATATYPES NUMERIC,NUMERIC,NUMERIC,NUMERIC,CHARACTER
    ...
    4,2,0,0,$long
    DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=> [TBNAME], estimate_percent=>dbms_stats.auto_sample_size, method_opt=> 'FOR ALL COLUMNS SIZE AUTO', cascade=> TRUE);
    //
    5,2,0,0,$long
    DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=> [TBNAME], estimate_percent=> 1, method_opt=> 'FOR ALL COLUMNS SIZE AUTO', cascade=> TRUE);
    //
    /
    ...


    Or, if you use the wrapper SQL that I proposed in %UpdateStats() -v- Optimizer Dynamic Sampling

    ...
    4,2,0,0,$long
    wrapper.ps_stats(p_ownname=> [DBNAME], p_tabname=> [TBNAME], p_estimate_percent=> 0);
    //
    5,2,0,0,$long
    wrapper.ps_stats(p_ownname=> [DBNAME], p_tabname=> [TBNAME], p_estimate_percent=> 1);
    //
    ...


    I have kept the 1% sample size for the compute model, but there is no reason why you could not choose a larger value. If you wanted %UpdateStats([table],HIGH) to continue to mean a full compute, then the value really should be 100%. It is really a matter of how long you want to spend analysing statistics on tables during batch programs. However, a higher sample size will not necessarily produce statistics that will lead to a better execution plan!

    Exactly how Oracle calculates the default sample size is not published, but values in the range 0.5% to 10% are typical. In a perverse sense, a 1% sample will usually be smaller than the Oracle default sample size, so in the delivered DDL models %UpdateStats([table],HIGH) will usually use a larger sample size than %UpdateStats([table],LOW)! However, I simply cannot believe that this is what was intended.

    You can see the value that Oracle calculates for auto_sample_size by tracing the dbms_stats call, and looking for the sample clause in the recursive SQL. Eg.

    ... from "SYSADM"."PSPCMNAME" sample ( .9490170771) t
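    A minimal way to produce such a trace yourself is to enable SQL trace in your session before calling dbms_stats (a sketch, run from SQL*Plus; PSPCMNAME is just the table used in the example above):

    ALTER SESSION SET sql_trace = TRUE
    /
    BEGIN
      dbms_stats.gather_table_stats(ownname=> 'SYSADM', tabname=> 'PSPCMNAME',
        estimate_percent=> dbms_stats.auto_sample_size);
    END;
    /
    ALTER SESSION SET sql_trace = FALSE
    /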