Tuesday, July 28, 2009

Automatic Segment Space -v- Freelist Management for PeopleSoft Temporary Records

Earlier this year, I wrote about some research into the effects of concurrent TRUNCATE TABLE operations in concurrent PeopleSoft Application Engine process. Since then I have been prompted to look at the effect of Automatic Segment Space -v- Freelist (or Manual) Management.

ASSM was introduced in Oracle 9i for Locally Managed Tablespaces, and from Oracle 10g it is default option, It was designed for high concurrency systems, and avoids some of the problems, such as contention on header blocks, that can occur with Freelist Management. It uses tracking blocks to indicate utilisation of data blocks. The ASSM structure is similar to a B-tree index (see Tanel Poder's presentation: Freelists -v- ASSM in Oracle 9i).

Normally, using ASSM is preferable in multi-user OLTP systems. The reduction in contention on the freelists should outweigh the additional I/O for the tracking blocks.

However, non-shared instances of PeopleSoft Temporary Records are only used by a single Application Engine process at any one time. Therefore, there would be no contention on freelists for these objects, and so DML operations on these objects will not derive much benefit from ASSM. In fact, the contrary could easily be true. The additional tracking blocks are also maintained during DML and are cleared out by local writes during a truncate operation. Truncate operations are serialised in Oracle on the RO enqueue. If you run multiple concurrent Application Engine programs you can get contention between the truncate operations. The additional I/O on the ASSM tracking blocks causes the truncate to take longer and in extreme cases can aggravate truncate contention.

The local write operations that occur during truncate operations cannot be deferred. Significant truncate contention can be a symptom of disk contention. However, switching back to Freelist Management saves the I/O to the tracking blocks, and so improves the performance of truncate.

I ran a test where I truncated a table with 5000 rows. I examined an Oracle trace with waits of the truncate. I tested it in an ASSM tablespace, and Freelist Managed tablespace. I tested truncating just the table on its own, and sometimes with an index.































Number of Wait Events

Table only, without any indexes

With a single primary key index

Tablespace Typedb file sequential readenq: RO - fast object reuselocal writesdb file sequential readenq: RO - fast object reuselocal writes
Automatic Segment Space Management

8

1

4

16

2

9

Freelist Management

3

1

1

6

2

3


You can see from this table that the truncates in the ASSM tablespace required more single block reads (db_file_sequential_read) and more local writes.

Recommendation

In addition to my previous recommendation to move the all (non-shared instances) of temporary working storage tables, and their indexes, to tablespaces with a larger block size, I also recommend that those tablespaces should be specified with MANUAL segment space management.

Thursday, July 23, 2009

PeopleSoft and the Oracle Recycle Bin

If you are running PeopleSoft on Oracle 10g, what do you do about the Recycle Bin? It is a new feature in Oracle 10g, and it is enabled by default. So you are using it, unless you have taken a decision to the contrary.

It works just like the Windows recycle bin. You can drop a table and then flash it back (they didn't call it UNDROP because Oracle marketing now calls everything Flashback). So when you drop a table, Oracle marks it as dropped, and renames it with a system generated name beginning with BIN$. You can look at the contents of the Recycle Bin through a catalogue view.

>create table t (a number);
>drop table t;
>select * from user_recyclebin

OBJECT_NAME ORIGINAL_NAME OPERATION TYPE
------------------------------ -------------------------------- --------- -------------------------
TS_NAME CREATETIME DROPTIME DROPSCN
------------------------------ ------------------- ------------------- ----------
PARTITION_NAME CAN CAN RELATED BASE_OBJECT PURGE_OBJECT SPACE
-------------------------------- --- --- ---------- ----------- ------------ ----------
BIN$ZCLja8iAA9LgRAAOf+3h5A==$0 T DROP TABLE
PSDEFAULT 2009-03-02:13:35:12 2009-03-02:13:35:14 9.7561E+12
YES YES 776642 776642 776642 4


If you don't want a table to go into the recycle bin when you drop it, you can code DROP TABLE ... PURGE - but PeopleSoft doesn't do this.

PeopleSoft alter scripts usually drop and recreate tables. In a production system this is often the best option, otherwise you run the risk of causing rows to migrate to other blocks when you add new columns. In one system, I found 17000 objects in the recycle bin, occupying 1.3Gb.

My Opinion

Personally, I would disable the recycle bin by setting the initialisation parameter RECYCLEBIN = OFF in all PeopleSoft environments, with the possible exception of the development environment.

The RECYCLEBIN parameter can also be set dynamically at session or system level. You could perhaps turn it on prior to upgrade/data migration procedures, and then when satisfied turn it off again and manually purge the recycle bin.

I think Oracle features should be used knowingly. It doesn't matter whether you decide to use a feature or not. It is important that in making that decision you have thought about how to deal with the implications, rather than be surprised later.

Tuesday, July 14, 2009

Oracle Terminated Connection Timeout

I have recently come across situations on two different PeopleSoft sites where ad-hoc queries continue to run on the Oracle database server long after the Application Server process, which is the Oracle session client, has terminated. Often, queries perform poorly because they are poorly coded, but that is another story. To help guard against this situation Oracle has mechanism called Terminated Connection Timeout (also known as Dead Connection Detection (DCD) when it was introduced in Net8).

Oracle Support Note 615782.1 explains the mechanism. "DCD is initiated on the server when a connection is established. At this time SQL*Net reads the SQL*Net parameter files and sets a timer to generate an alarm. The timer interval is set by providing a non-zero value in minutes for the SQLNET.EXPIRE_TIME parameter in the sqlnet.ora file on the database server side. When the timer expires, SQL*Net on the server sends a 'probe' packet, essentially an empty SQL*Net packet, to the client. If the client end of the connection is still active, the probe is discarded, and the timer mechanism is reset. If the client has terminated abnormally, the server will receive an error from the send call issued for the probe, and SQL*Net on the server will signal the operating system to release the connection's resources."

Thus, if a PeopleSoft operator initiates an ad-hoc query that runs on the PSQRYSRV server for longer than the ICQuery service time-out (default 1200 seconds), then Tuxedo will terminate and restart the busy server process. However, the query will continue run on the database server until the current fetch operation returns. In the case of a query performing a large sort or hash operation, it might be a long time before the first row is returned. All the while, the query is continuing to consume resources on the database.

PeopleTools 8.44, also introduced the ability to kill a query via the Query Monitor that had reached a maximum run time. This is one of the functions of the PSMONITORSRV server process (see Oracle Support Note 624339.1. The maximum run time is specified in a permission list (see Security Administration PeopleBook Permission List Query Profile) and then the ability to kill queries that have timed out can be enabled or disabled system-wide (see PeopleSoft Query PeopleBook Query Administration. It will kill the application server process that submitted the query, but for the same reasons, the query may continue to run on the database.

Thus, setting Terminated Connection Timeout is not merely a good idea for a PeopleSoft system running on Oracle, it is effectively mandatory. Otherwise. some PeopleSoft functionality simply won't work as intended.

What is an appropriate value for SQLNET.EXPIRE_TIME?

The value for this parameter is the time between successive SQL*Net probes sent by the Oracle shadow server process to the client. Setting it is a balance between the maximum time that a query can be left to consume resources after a client process terminates, against the additional overhead of every client process sending a probe every few minutes.

The SQL*Net documents often talk about additional network traffic generated by DCD. This was a consideration in the past on client-server applications that ran across a wide area network. However, it is rarely a consideration in relatively modern systems such as PeopleTools 8, the database connections are made by Application Server and Process Scheduler, which are usually physically close to the database server.

The time-out can be set independently of any of the other time-outs for the Application Server and Web Server. Documents on Metalink often suggest 5 or 10 minutes, and I don't think that is unreasonable.

My thanks to Colin Kilpatrick who prompted me to look at this again.

Monday, July 06, 2009

Dynamically Switching PeopleSoft Temporary Records between Global Temporary and Normal Tables during Application Engine Programs

I am working on a Time & Labour system. We run the main T&L process in different modes. There is a company wide overnight batch process, but individual parts of the company can run the process for their section during the day to process exceptions and to generate schedules. This is done with a custom AE program that calls the delivered TL_TIMEADMIN.

Running on Oracle 10gR2, we have faced performance problems caused by contention between concurrent truncate operations in these processes (see Factors Affecting Performance of Concurrent Truncate of Working Storage Tables).

One way to evade this problem is to convert the working storage tables to be Global Temporary Tables (GTTs). The contention problem caused by serialisation of truncate operations does not occur with GTTs. Last year I wrote a blog entry (Global Temporary Tables and PeopleSoft Temporary Records) that discussed when it was possible to use GTTs in Application Engine (AE) programs. Essentially, you can only use GTTs when restart is disabled for all AE programs that reference a temporary records.

So we could consider running the small but frequent daytime processes with restart disabled. Then we can make the non-shared instances of the temporary records into Global Temporary tables for the non-restartable daytime processes. However, we want to continue to run the overnight processes with restart enabled so that we have the option to restart a process that fails overnight from where it crashed, rather than going back to the beginning. The same non-shared instances may need to be Global Temporary during the day but normal recoverable tables by night.

Previously, I have only converted tables to Global Temporary where they are not referenced by a restartable AE program. I have now devised a way of switching a table to being a GTT if it is allocated to a process for which restart is disabled, and switching it back to a normal table if not. This is the best of both worlds.

When a non-shared instance of a temporary record is allocated to a particular process instance, a row is inserted into PS_AETEMPTBLMGR. The value of the restart disable flag is also stored in PS_AETEMPTBLMGR. I have created a trigger that switches a non-shared instance of a temporary record from a normal table to a GTT if restart is disabled, or switches it back to a normal table if restart is enabled. The trigger will create the GTT and any indexes on it if necessary, and will rename the tables as necessary so that the correct version has the default name expected by the process.

So if, for example, I have instance number 11 of a record called WRK_SHIFT_TAO, then the table will be called PS_WRK_SHIFT_TAO11. If that is allocated to a non-restartable AE program, the trigger will check for a table called GT_WRK_SHIFT_TAO11. If it doesn't exist the trigger will create it, dynamically generating the DDL with the dbms_metadata package. Thus, the structure of the GTT will be identical to PS_WRK_SHIFT_TAO11, the leading PS will also be changed to GT. The same indexes will also be created. The original normal table PS_WRK_SHIFT_TAO11 will be renamed to XX_WRK_SHIFT_TAO11, and then the GTT, at that point called GT_WRK_SHIFT_TAO11 will be renamed to PS_WRK_SHIFT_TAO11. The indexes are not renamed. If later the same instance is allocated to a restartable process, the change will be reversed by renaming again. The GTT will not be dropped so that it does not need to be recreated again the next time the non-restartable program uses it.

All DDL generated and executed by the trigger is written to the Message Log for that process (click on the screen shot to enlarge).

The shared instance of a record (the one without a instance number suffix) is never rebuilt as a Global Temporary table because it is possible that a restartable and non-restartable process might both use the shared instance at the same time.

One complication is how to handle changes to the temporary records. Application Designer will only create normal tables. So, if the table is to be rebuilt, them it needs to be switched back to a normal table, and the corresponding GTT created by the trigger should then be dropped. The Application Designer can recreate the table in the usual way. Next time the non-restartable AE runs, it will recreate the GTT with the new structure.

An updated version of T_LOCK is available which handles PeopleSoft temporary records and prevent DDL on PS_ tables with corresponding GT_ tables. The commands which swap these tables back and forth are explicitly permitted by the new version of this trigger.