Friday, January 30, 2009

Managing Changes to the Number of Instances of Temporary Tables used in Application Engine Programs

When you run multiple copies of either the same Application Engine program, or different Application Engines that happen to use the same temporary work records, you need to worry about how many instances of the program or programs are likely to run concurrently, and hence how many instances of the temporary table to build.

But how do you manage these tables when you change the number of instances in the Application Engine properties?

How many tables are built for each Temporary Record?

The number of instances of each Application Engine program is set in the properties for that program (and stored on the PeopleTools table PSAEAPPLDEFN in the column TEMPTBLINSTANCES).

This screenshot is the Application Engine Program Properties for TL_TIMEADMIN. It is delivered configured for up to 10 concurrent instances.


The number of tables that are built for each Temporary Record is the sum of the instances in all the Application Engine programs in which the record is used, plus the number of Global Instances (in set up on the PeopleTools Options page.

Let’s take the record TL_PROF_LIST as an example. It is defined as a Temporary Record in 8 different Application Engine programs (in my HCM8.9 demo system).

SELECT a.recname, a.ae_applid, b.temptblinstances
FROM   psaeappltemptbl a, psaeappldefn b
WHERE  a.ae_applid = b.ae_applid
AND    a.recname = 'TL_PROF_LIST'
ORDER BY a.recname
/

RECNAME         AE_APPLID    TEMPTBLINSTANCES
--------------- ------------ ----------------
TL_PROF_LIST    TL_AGG_SECTN                1
TL_PROF_LIST    TL_OUTTCD                   5
TL_PROF_LIST    TL_PUB_TM1                  5
TL_PROF_LIST    TL_SCHHRSRPT                5
TL_PROF_LIST    TL_SCHRES_AE                9
TL_PROF_LIST    TL_ST_LIB                   5
TL_PROF_LIST    TL_TIMEADMIN               10
TL_PROF_LIST    TL_TRPROFILE               10

So, across all the programs 50 temporary tables are required.

SELECT * FROM pstemptblcntvw 
WHERE  recname = 'TL_PROF_LIST'
/

RECNAME         TEMPTBLINSTANCES
--------------- ----------------
TL_PROF_LIST                  50

The system has three global instances.

SELECT temptblinstances
FROM   psoptions
/

TEMPTBLINSTANCES
----------------
               3

So, Application Designer will build 54 tables based on this record.
  • 3 ‘global’ instances for Application Engines without a Process Instance number. The table names will be suffixed with numbers 1 to 3.
  • 50 ‘private’ instances for the number of requested instances of the Application Engines. The table names will be suffixed with numbers 4 to 53
  • 1 ‘shared’ version without a suffix (in other words the usual name) which is used if there is no instance no already allocated to a process instance.
And this is the top and bottom of the script to build the tables only generated by Application Designer.

CREATE TABLE PS_TL_PROF_LIST (PROCESS_INSTANCE DECIMAL(10) NOT NULL,
EMPLID VARCHAR2(11) NOT NULL,
EMPL_RCD SMALLINT NOT NULL,
START_DT DATE,
END_DT DATE) TABLESPACE TLWORK STORAGE (INITIAL 40000 NEXT 100000
MAXEXTENTS UNLIMITED PCTINCREASE 0) PCTFREE 10 PCTUSED 80
/
…
CREATE TABLE PS_TL_PROF_LIST53 (PROCESS_INSTANCE DECIMAL(10) NOT NULL,
EMPLID VARCHAR2(11) NOT NULL,
EMPL_RCD SMALLINT NOT NULL,
START_DT DATE,
END_DT DATE) TABLESPACE TLWORK STORAGE (INITIAL 40000 NEXT 100000
MAXEXTENTS UNLIMITED PCTINCREASE 0) PCTFREE 10 PCTUSED 80
/


Are there any temporary tables that have not been built but that should be built?

If you increase the number of instances of temporary tables on an Application Engine program then you may need to build the extra temporary tables. This query reports the missing tables.



I dropped PS_TL_PROF_LIST42, but I also increased the number of instance of TL_TIMEADMIN from 10 to 11. So the query reports that instances 42 and 54 of this table are missing, and all the other temporary tables for TL_TIMEADMIN also report a missing instance.

RECNAME           INSTANCE TEMPTBLINSTANCES
--------------- ---------- ----------------
…
TL_PMTCH_TMP1           19               19
TL_PMTCH_TMP2           19               19
TL_PROF_LIST            42               54
TL_PROF_LIST            54               54
TL_PROF_WRK             38               38
TL_PT_FINAL             29               29
…


The remedy is very simple. Application Designer will build a script for just the missing tables.

CREATE TABLE PS_TL_PROF_LIST42 (PROCESS_INSTANCE DECIMAL(10) NOT NULL,
EMPLID VARCHAR2(11) NOT NULL,
EMPL_RCD SMALLINT NOT NULL,
START_DT DATE,
END_DT DATE) TABLESPACE TLWORK STORAGE (INITIAL 40000 NEXT 100000
MAXEXTENTS UNLIMITED PCTINCREASE 0) PCTFREE 10 PCTUSED 80
/

Are there any temporary tables built that should not be built?

If you have reduced the number of temporary tables, then you may need to drop the excess tables.

This query reports tables that are beyond the number required.




I built the missing tables for the previous example, but then I reduced the number of instances on TL_TIMEADMIN back to 10. Now, the query reports that there is an extra table for each record beyond the number defined.

RECNAME         INSTANCE TEMPTBLINSTANCES TABLE_NAME
------------- -------- ---------------- ------------------
…
TL_PMTCH_TMP1       19               18 PS_TL_PMTCH_TMP119
TL_PMTCH_TMP2       19               18 PS_TL_PMTCH_TMP219
TL_PROF_LIST        54               53 PS_TL_PROF_LIST54
TL_PROF_WRK         38               37 PS_TL_PROF_WRK38
TL_PT_FINAL         29               28 PS_TL_PT_FINAL29
…


The problem is that Application Designer will not generate the DDL to drop any of these tables. If you reduce the number of temporary table instances, you will have to drop these tables yourself. However, now, it is easy to adjust the above query to produce the DDL to drop the tables.

SELECT 'DROP TABLE '||t.table_name||' PURGE;' cmd
FROM   …

The query then produces these commands.

…
DROP TABLE PS_TL_PMTCH_TMP119 PURGE;
DROP TABLE PS_TL_PMTCH_TMP219 PURGE;
DROP TABLE PS_TL_PROF_LIST54 PURGE;
DROP TABLE PS_TL_PROF_WRK38 PURGE;
DROP TABLE PS_TL_PT_FINAL29 PURGE;
…


The scripts in this posting can be downloaded from my website.

Monday, January 26, 2009

Factors Affecting Performance of Concurrent Truncate of Working Storage Tables

A longer version of this posting, with detailed experimental results, is available on my website.

Over the past year, I have seen problems with Local Write Wait in the Oracle database on two different Oracle systems. One occasion was in a PeopleSoft Time and Labour batch process, the other was in a custom PL/SQL process in non-PeopleSoft system.

In both cases, normal tables in the databases were being used for temporary working storage before that data was then written to another table. The content of the working storage tables was then cleared out by periodically truncating them. In order to increase overall batch throughput, several instances of the program were run in parallel. The resulting concurrent truncate operations contended with each other, and the processes did not scale well.

I have written about this subject previously. These problems have prompted me to do some research and testing, I am now able to make definite recommendations.

Oracle Note 334822.1 (which I have also quoted before) provides a good technical description of the database’s internal behaviour. Put simply; truncate (and drop) operations serialise. Only one process can truncate at any one time. If you have multiple concurrent processes all trying to truncate their own working storage tables, you could experience performance problems. Such processes not scale well as the number of concurrent processes increases.

Real Problems

In the case of the non-PeopleSoft PL/SQL process, I was able to recreate the working storage tables as Global Temporary Tables (GTTs) that deleted the rows on commit because the process committed only when each transaction was complete. Local write wait totally disappeared in this case. Temporary objects do not need to be recovered, so this mechanism does not apply to them.

The PeopleSoft scenario involved one of the ‘Time & Labor’ batch processes, TL_TIMEADMIN. However, GTTs cannot easily be introduced into the T&L batches because there are ‘restartable’. Therefore, the contents of temporary working storage tables need to be preserved after the process and its session terminates. This precludes the use of GTTs.

The combination of Local Write Wait and enq: RO - fast object reuse accounted for 31% of the total response time. This is a significant proportion of the total response time.
  • local write wait occurs, as the name suggests, when the session is waiting for its own write operations. The RO enqueue is used to protect the buffer cache chain while it is scanned for dirty blocks in an object for the database writer to then write to the data files.
  • enq: RO - fast object reuse occurs when a process waits to acquire the RO enqueue, in other words, while somebody else is truncating or dropping an object.
Two factors affect the time for which the RO enqueue is held:
  1. The time taken to write the blocks to disk. Processes that are frequently truncating temporary working storage are also doing a lot of DML operations to populate the working storage and other tables. The disks under the data files are going to be busy. If the disk becomes a bottleneck, the duration of the local write wait will certainly increase.
  2. The time taken to scan the buffer cache for dirty blocks to be written to disk and flushed from cache. The larger the buffer cache, the longer it will take to find these blocks.
The Metalink note also suggests using a different block size, saying that "a separate buffer pool for temporary tables will also reduce RO enqueue". It is not clear whether it is more important to have a different block size or a separate buffer pool. I wanted to find out which factor was more important.

Tests

I created a simple test to model the behaviour of T&L. I created pairs of simple tables, populated one of each pair, and then repeatedly copied the data back and forth between them, truncating the source after the copy. The test script has evolved into a PL/SQL package procedure, mainly so that the tests could be submitted to and run concurrently by the Oracle job scheduler. There are also procedures to create, populate, and drop the pairs of working storage tables. The scripts can be downloaded from my website.

I was able to run the same controlled test in a variety of scenarios. I have run the tests on Oracle 10.2.0.3 on various platforms with similar results. A detailed set of results are available in the longer version of this document on my website.

General Recommendations

If you have to store temporarily working data in a database table, it is much better to use a Global Temporary Table, although the design of the application may preclude this. It is not possible to do this with data used by restartable Application Engine processes, because the contents of the GTT would be lost when the process terminates.

The Metalink note references unpublished bug 414780 in which a PeopleSoft customer reported this problem, but “they seemed to fix it by changing some PeopleSoft code to implement delete rather than truncate on small temporary tables”. However, my tests show that this probably degraded performance further. The individual delete statements take longer than the truncate operations, and the overall test times increased. Although the truncate operations serialise on the RO enqueue and wait for local writes, this is still better than deleting the data and waiting for the undo information to be written to the redo log. Furthermore, although the truncate operations did not scale well, the delete operations exhibited negative scalability for the same volumes and concurrency. They became bottlenecked on redo log.
  • Using a recycle pool of the same block size as the rest of the database was not effective; possibly because these pools use the same LRU latches.
  • Using a larger non-default block size improved performance of truncate. The performance with 32Kb blocks was better than with 16Kb.
  • Using a larger uniform extent size also improved performance for just the truncate operations and the test as a whole. Fewer, larger extents were involved, and hence less time was spent on CPU and row cache lock. The overall throughput truncate operations degraded as the number of processes increased, although, the throughput of the test as whole did scale.
  • The presence or absence of indexes did not have a significant effect on the relative test timings, and does not alter my advice.
  • The effect of truncating with the REUSE STORAGE option is less clear cut. There are no waits on row cache lock because the blocks do not have to be cleared out of the buffer cache, but on the other hand more time is spent on local write wait because all the dirty blocks have to be written to disk, hence the RO enqueue is held for longer and more time is spent on enq: RO - fast object reuse. If you are using an AUTOALLOCATE tablespace then you would be better to use REUSE STORAGE option, but generally you would be slightly better to use a larger uniform extent size and not to use the REUSE STORAGE option.
PeopleSoft Recommendations

Over time, PeopleSoft batch processing has moved slightly away from SQR and COBOL. These types of process cannot be restarted, and so tables used for temporary working storage within the process can usually be recreated as Global Temporary Tables. This will produce better performance and scalability that any option that involves retaining the permanent table.

However, we are seeing more processing in PeopleSoft applications done with Application Engine. If restart has been disabled for an Application Engine program, then temporary records can also be rebuilt as Global Temporary Tables, because their contents does not need to be preserved for another session to pick up.

Otherwise, move the temporary records and their indexes to tablespace with a 32Kb block size. The change of assigned tablespace can be managed within Application Designer, and released like any other patch or customisation. A 32Kb buffer cache must be created in the database instance. Sizing this is going to be a trade-off between how much memory can be taken from other activities to cache just working storage tables, and how much physical I/O you are going to have to wait for. Oracle’s Automatic Shared Memory Management is of no assistance here (until Oracle 11g), the KEEP, RECYCLE, and other block size buffer caches must be sized manually (see Oracle Reference Manual for SGA_TARGET).

No change to the application code is required. There is no performance improvement to be obtained by customising the application code, either to add the REUSE STORAGE option to the TRUNCATE TABLE commands, nor to use DELETE commands instead.

Added 4.7.2009:Tests on Oracle 10.2.0.3 have shown that there is an advantage to putting the working storage tables into a Non-ASSM tablespace. ASSM introduces additional blocks to map the storage in a segment. These are also maintained during a truncate. No having to maintain these blocks saved me 25% on the elapsed run time on some T&L batch processes.

Oracle Bug 4224840/4260477

Added 2.4.2009:Unfortunately, nothing is quite as simple as it seems. If you have a transaction that locks more than 4095 rows in a 32Kb block you can encounter block corruption (bug 4224840). The fix/workaround in Oracle 10g (bug 4260477) is that a transaction will fail with this message before the corruption occurs.

There is an excellent explanation of this problem, and a test script to reproduce it, on Hermant's Oracle DBA Blog

ORA-08007: Further changes to this block by this transaction not allowed

This error will not be resolved until Oracle 11g. However, it does not occur with smaller block sizes. The workaround is either to commit more frequently, or to move the table concerned back to a tablespace with a smaller block size. I have run into this with Time & Labor in a particular scenario.

Tuesday, January 06, 2009

PeopleSoft's PSADMIN Role

Recently, I have had a number of conversations about the privileges that are granted to the PeopleSoft OWNERID account (usually SYSADM) via the PSADMIN role.

A posting on the PeopleSoft DBA Forum, referenced Tanel Poder's blog entry: Oracle Security: All your DBAs are SYSDBAs and can have full OS access. Essentially, there is a security bug in Oracle where users with IMP_FULL_DATABASE and BECOME_USER can gain SYSDBA access. This hole has been closed up a patch delivered in the July 2008 Critical Patch Update.

The PSADMIN role has both of these privileges, as well as another 22 that are not required for normal operation of a PeopleSoft system.

There are two morals to this story:
  1. There are good reasons to keep up to date with Oracle's CPUs.
  2. PSADMIN has privileges that if given to application accounts could compromise database security.
I think that PSADMIN could be stripped down to just the following 8 privileges:
  • ANALYZE ANY
  • ALTER SESSION
  • CREATE SESSION
  • CREATE TABLE
  • CREATE TRIGGER
  • CREATE_VIEW
  • SELECT_CATALOG_ROLE
  • ALTER USER
You only need CREATE USER when creating the PeopleSoft database with the Data Mover import. SELECT_CATALOG_ROLE is not in the delivered PSADMIN, but is required if you remove IMP_FULL_DATABASE and EXP_FULL_DATABASE.

Each privilege is discussed in Chapter 3 of PeopleSoft for the Oracle DBA.