Wednesday, June 27, 2007

Data Guard Implications of NOLOGGING operations from PeopleTools 8.48

Stuff changes. I keep finding more changes in PT8.48.

From PeopleTools 8.48, PeopleSoft builds all indexes with the NOLOGGING option and then alters them back to logged objects. This is done to reduce redo generation during the index build, which can save time and can be useful in development environments.

However, this has implications when running Oracle Data Guard. If an operation is not logged on the primary, it will not be replicated to the standby databases.

Here is a simple test. We will create a table on the primary database, but the index will be created NOLOGGING, which is exactly what PeopleSoft does. I have deliberately set the database to allow NOLOGGING operations:

ALTER DATABASE NO FORCE LOGGING;

CREATE TABLE t (a number);

INSERT INTO t
SELECT rownum FROM dba_objects
WHERE ROWNUM <= 1000;

CREATE UNIQUE INDEX t_idx ON t (a) NOLOGGING;

ALTER INDEX t_idx LOGGING;

ALTER SYSTEM SWITCH LOGFILE;

Switching the log file forces the logged changes to be shipped to the standby database. Now open the standby database in read-only mode.
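On a physical standby managed from SQL*Plus (rather than through the Data Guard broker), that means something like the following sketch: cancel managed recovery and then open the database read only.

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE OPEN READ ONLY;

Then count the rows in the table with a full table scan: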

SELECT /*+ FULL(t)*/ COUNT(a) FROM t;

COUNT(A)
----------
1000

So there are 1000 rows in the table, but if I force Oracle to count the rows in the index using a hint, I get a corrupt block error:

SELECT /*+ INDEX_FFS(t t_idx)*/ COUNT(a) FROM t
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 1, block # 134419)
ORA-01110: data file 1: '/u01/oradata/hcm89/system01.dbf'
ORA-26040: Data block was loaded using the NOLOGGING option

This could happen every time you build an index using a script generated by Application Designer. The status of the corrupt index is still VALID, so you have no way of knowing that the problem exists until you open your standby database and try to run on it (and because you can't run PeopleSoft against a standby database in read-only mode, you are unlikely to find out until you really need the standby).
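You can at least detect on the primary that unlogged operations have occurred by querying v$datafile. For example (a sketch; it shows which data files have been affected, not which objects):

SELECT name, unrecoverable_change#, unrecoverable_time
FROM v$datafile
WHERE unrecoverable_change# > 0;

A non-zero unrecoverable change number shows that a NOLOGGING operation has at some time been performed against that data file, and hence that the corresponding blocks on the standby may be marked as corrupt.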

So how can you prevent this from happening in the first place?

You could alter the DDL models to remove the NOLOGGING option from the CREATE INDEX DDL model. However, the same argument applies to any NOLOGGING operation performed in the database: you are likely to get corrupt objects in your standby database. Although I know of no other NOLOGGING operations in a vanilla PeopleSoft database, if any are introduced during development then DBAs are unlikely to be able to change them.
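For reference, the DDL models are held in the PeopleTools table PSDDLMODEL, and the index model can be inspected with a query along these lines (I am assuming here that statement type 2 is the CREATE INDEX model and platform ID 2 is Oracle; check the values in your own system):

SET LONG 10000
SELECT model_statement
FROM psddlmodel
WHERE statement_type = 2 /* assumed: index model */
AND platformid = 2 /* assumed: Oracle */;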

It was for exactly this reason that Oracle provided the ability to force NOLOGGING operations to generate redo information. It is set with the following command:

ALTER DATABASE FORCE LOGGING;

Now the database will log all operations regardless, and all objects and operations will be fully replicated to the standby database. However, nothing is free. NOLOGGING is often used to improve performance by reducing redo, for example when populating working storage tables that will never need to be recovered. These performance improvements will be lost when logging is forced.
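Incidentally, you can check whether force logging is already in effect on a database by querying v$database:

SELECT force_logging FROM v$database;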

Database Login Information In Your SQL*Plus Prompt

Have you ever accidentally run a script in SQL*Plus against the wrong database?

It is common to have many databases in PeopleSoft development environments, and developers must often connect to different databases. To help reduce the chance of accidentally working in the wrong environment, I have a login script for SQL*Plus that puts information about my session into the command prompt. When you connect to a database, SQL*Plus automatically runs %ORACLE_HOME%/sqlplus/admin/glogin.sql. I have put my own login script, gfclogin.sql, into the same directory. It is called from glogin.sql like this:

@@gfclogin.sql

The double @ means that the called script is in the same directory as the calling script. gfclogin.sql runs a number of SQL queries and stores the output in the SQL prompt. Note that from SQL*Plus version 10 the login script is run on every connection; previously it was only run for the initial connection. This has one negative side effect: from SQL*Plus 10, reconnecting will overwrite any SQL statement in the SQL*Plus command buffer.
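To illustrate the technique, here is a much simplified sketch of such a script. This is not the full gfclogin.sql; it builds the prompt from SYS_CONTEXT, which needs no special privileges, and assumes Oracle 10g or later:

REM simplified sketch: set the prompt to user.sid.dbname
SET TERMOUT OFF
COLUMN gfc_prompt NEW_VALUE gfc_prompt
SELECT LOWER(USER)||'.'||
       SYS_CONTEXT('USERENV','SID')||'.'||
       SYS_CONTEXT('USERENV','DB_NAME')||'>' gfc_prompt
FROM dual;
SET SQLPROMPT "&gfc_prompt "
SET TERMOUT ON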

Some of the queries in the script only work if the connecting user has SELECT_CATALOG_ROLE (or individual privileges to query v$session, v$process and v$database). If the user doesn't have these privileges, the prompt will just contain the user name:

SCOTT>

On a PeopleSoft database it will also report the name of the PeopleSoft database from PS.PSDBOWNER, thus:

SYSADM-HCM89>

If the user has SELECT_CATALOG_ROLE, it will also show the session ID, session serial number, database name, the name of the OS user running the database, and the node name of the database server.

SCOTT.147:4264.GOFASTER.SYSTEM.GO-FASTER-4>

or

SYSADM-HCM89.148:5420.HCM89.SYSTEM.GO-FASTER-4>

Friday, June 15, 2007

Unix Process Limits for PeopleSoft

I have been having some trouble with long-running Component Interface/Application Engine processes. The amount of memory allocated to the process continues to grow throughout the life of the process, and eventually the process hits the limit imposed by Unix. This leads to two modes of error.

First, you will get an error message like this in the Application Engine log file:

SQL error. Stmt #: Error Position: Return: 8007 - Insufficient memory available

The process might continue to run, or it might fail (I am not sure what makes the difference here), but if it terminates you will get a message like this, also in the Application Engine log:

<Unix PID>: 1181303384: PSPAL::Abort: Unrecoverable signal received
<Unix PID>: 1181303384: PSPAL::Abort: Location: /vob/peopletools/src/pspal/exception_sigaction.cpp:553: PSPAL::SigactionSignalHandler::UnrecoverableSignalHandler(int, siginfo_t *, void *)
<Unix PID>: 1181303384: PSPAL::Abort: Generating process state report to <PS HOME>appserv/prcs/<Process Scheduler Name>/LOGS/psae.<Unix PID>/process_state.txt
<Unix PID>: 1181303384: PSPAL::Abort: Recursive call to Abort detected. Terminating process now.

A core file is also produced.

Application Engine seems to continue to allocate memory but not relinquish it again. I have found support notes on Customer Connection that recommend use of the CollectGarbage() function in PeopleCode in order to 'remove any unreachable application objects created by Application Classes'. However, my tests suggest that this function does not cause any memory to be released by the process.

Another suggestion from Customer Connection was to raise the Unix limit on the soft data segment with ulimit -d, and this has resolved the memory errors. The ulimit -d command can be used to set soft limits in the Unix shell, but they cannot be set higher than the hard limits. Hence it is necessary to set unlimited parameters in /etc/security/limits (this file can only be read or updated by the superuser, root).

It is common to disable the limits for Oracle, as shown in the example below, and I think the same should be done for PeopleSoft accounts. Machine-wide parameter settings are set in the default section; individual parameters can then be overridden for individual users. The following example is taken from AIX 5.3. The sizes are in multiples of 512-byte blocks, and a value of -1 means that the parameter is unlimited.

default:
        fsize = 2097151
        core = 2097151
        cpu = -1
        data = 262144
        rss = 65536
        stack = 65536
        nofiles = 4000

root:
        fsize = -1
        data = 4194303
        rss = 4194303
        nofiles = 4000

oracle:
        fsize = -1
        data = -1
        stack = -1

psoftusr:
        data = -1

Changes to the limits take effect when the user next logs onto Unix, so after changing them you need to log out of and back into the Unix account and then restart the process scheduler.