Wednesday, May 30, 2007

PeopleTools 8.48 Process Scheduler Behaviour

Things change, and I recently got caught out by one of those changes.

In PeopleTools 8.4x, the Process Scheduler became a proper Tuxedo domain. By default, it is configured to run Application Engine server processes (PSAESRV). These are persistent Tuxedo server processes that handle Application Engine requests. This change was introduced to help CRM systems that typically run a very large number of Application Engine programs. Because the server process is persistent it saves the overhead of starting a new Application Engine process. An obvious side effect of this change in architecture is that if you shut the Process Scheduler down, it terminates the PSAESRV processes, and cancels any Application Engine program that is running.

You can choose to configure the Process Scheduler without PSAESRV processes, and it will start individual PSAE processes for each Application Engine request.

In previous versions of PeopleTools, it was possible to shut the Process Scheduler down and any running processes (except PSAESRVs) would continue to run. This is not the case in PeopleTools 8.48. On shutdown, the Process Scheduler attempts to cancel the PSAE process. If it is not successful the status of the process goes to Error, but the process continues to run.

PSPRCSRV.7150 (0) [...](3)    Server: PSUNX checking status...
PSPRCSRV.7150 (0) [...](3) Server action mode: Ok (looking for requests)
PSPRCSRV.7150 (0) [...](3) Server: PSUNX looking for work
PSPRCSRV.7150 (0) [...](3) Checking Process cancels...
PSPRCSRV.7150 (0) [...](3) Process 69 is still running as Session ID 7552
PSPRCSRV.7150 (0) [...](3) Application Engine : 1:3
PSPRCSRV.7150 (0) [...](0) Server is shutting down


So, the Process Scheduler knows that process 69 is still running but continues to shutdown

PSPRCSRV.1360 (0) [...](1) =================================Error===============================
PSPRCSRV.1360 (0) [...](1) Process 69 is marked 'Initiated' or 'Processing' but can not detect status of PID
PSPRCSRV.1360 (0) [...](3) Updating process instance's status to Error.
PSPRCSRV.1360 (0) [...](2) Process Name: PSQUERY
PSPRCSRV.1360 (0) [...](2) Process Type: Application Engine
PSPRCSRV.1360 (0) [...](2) Session Id: 7552
PSPRCSRV.1360 (0) [...](2) =====================================================================
PSPRCSRV.1360 (0) [...](3) Number of requests removed from Task Active List: 1
PSPRCSRV.1360 (0) [...](3) Server action mode: Initializing


But when it starts up again it cannot detect the PID of the process, and so it marks the status as 3 (Error).

I'm not sure exactly when this change was introduced, but this is the first time I have encountered it.

Tuesday, May 08, 2007

Column default values in PeopleTools 8.48 DDL

PeopleSoft’s Mobile Synchronization Framework is designed to support off-line access to a subset of the PeopleSoft application. Application Designer has been able to generate database triggers to maintain Timestamp and System ID fields on tables that support synchronization of mobile clients since PeopleTools 8.44. In 8.48 it is now possible to set column defaults on character and numeric fields via Application Designer. However, this has potential for much wider application in PeopleSoft applications.

There are two new checkboxes on the record properties dialogue box.


If the User -> Server box is checked, Application Designer will add column defaults to the fields in the generated create and alter table DDL. As an example, I created the following record in Application Designer.



And this was the DDL that Application Designer generated.

CREATE TABLE PS_DMK (
EMPLID VARCHAR2(11) DEFAULT ' ' NOT NULL,
EFFDT DATE,
SEX VARCHAR2(1) DEFAULT 'U' NOT NULL,
ACTION_DT DATE NOT NULL,
ARRIVAL_TIME DATE NOT NULL,
ACTION_DTTM DATE,
AA_PLAN_YR SMALLINT DEFAULT 42 NOT NULL,
ACCRUED_AMOUNT DECIMAL(15, 2) DEFAULT 0 NOT NULL)


• All of the numeric and character columns now have default values, but none of the date columns have defaults.
• If a literal value is specified in Application Designer it is carried forward to the DDL, otherwise a character field defaults to a single space and a numeric field to zero
• Whether a field is required or not does not affect the default value,
It is a pity that the system variables to specify current date or time do not get converted to SYSDATE in the DDL, thus


ACTION_DT DATE DEFAULT TRUNC(SYSDATE) NOT NULL,
ARRIVAL_TIME DATE DEFAULT SYSDATE NOT NULL,


I assume that this is because it is platform specific, but then so is the rest of the field list.

There is another problem to using the Up Sync check box. When checked, you are prevented you from making a field required. The unchecked box is greyed out, but fields that are already required remain so.

I think that the ability to specify column defaults could have advantages during batch processing. All numeric and character columns in PeopleSoft (with minor exceptions) are NOT NULL in the database. Thus, if a process is inserting a row into that table, it must provide a value for every mandatory column. Otherwise, you generate an Oracle error.

ORA-01400: cannot insert NULL into ("SYSADM"."table name"."field name")

Specifying a default value will prevent this error. It would also save developers from having to add this default values to their code, thus simplifying development and maintenance.

Many batch processes in PeopleSoft process working storage or reporting tables. Often they insert rows, filling in key and some other columns, and then update other columns later on. If these columns have defaults they do not need to be referenced in the insert statement.

When a new column is added to a table, it is necessary to find every place where a row is inserted, and add a value. Specifying a column default would save adding a space or zero to those inserts.

Wednesday, May 02, 2007

%UpdateStats() -v- Optimizer Dynamic Sampling

My previous post about the changes to DDL models in PeopleTools 8.48 made me to think about whether %UpdateStats() PeopleCode macro is the best solution to managing statistics on working storage tables in Oracle.

Optimizer Dynamic Sampling was introduced in Oracle 9.0.2. as a solution to the same problem. When a query is compiled Oracle can collect some optimiser statistics based upon a small random sample of blocks for tables that do not have statistics and that meet certain other criteria depending upon the parameter OPTIMIZER_DYNAMIC_SAMPLING. In Oracle 10g the default value for this parameter changed from 1 to 2 and so dynamic sampling applies to ALL unanalysed tables.

Thus, it should be possible to resolve the problem of incorrect statistics on a working storage table without explicitly collecting statistics during an Application Engine program, and therefore without needing a code change to add %UpdateStats(). Instead, simply delete statistics from the table, and lock them. A subsequent GATHER_SCHEMA_STATS will skip any locked tables. When a query references the table it will dynamically sample statistics and use them in determining the execution plan.

However, there is one more problem to overcome. GATHER_TABLE_STATS will raise an exception on a table with locked statistics. If you want to use Dynamic Sampling on a table where %UpdateStats() is already used to update the statistics, the PeopleCode macro will raise an exception that will cause Application Engine programs to terminate with an error. The workaround is to encapsulate GATHER_TABLE_STATS in a procedure that handles the exception, and reference the procedure in the DDL model. It is not possible to put a PL/SQL block in DDL model.


CREATE OR REPLACE PACKAGE BODY wrapper AS
PROCEDURE ps_stats(p_ownname VARCHAR2, p_tabname VARCHAR2, p_estpct NUMBER) IS
 table_stats_locked EXCEPTION;
 PRAGMA EXCEPTION_INIT(table_stats_locked,-20005);
 l_temporary VARCHAR2(1 CHAR);
 l_force BOOLEAN := TRUE;
BEGIN
 BEGIN
  SELECT temporary
  INTO   l_temporary
  FROM   all_tables
  WHERE  owner = p_ownname
  AND    table_name = p_tabname
  ;
 EXCEPTION WHEN no_data_found THEN
  RAISE_APPLICATION_ERROR(-20001,'Table '||p_ownname||'.'||p_tabname||' does not exist');
 END;

 IF l_temporary = 'Y' THEN
  l_force := FALSE; --don't force stats collect on GTTs
 ELSE
  l_force := TRUE; --don't force stats collect on GTTs
 END IF;

 IF p_estpct = 0 THEN
  sys.dbms_stats.gather_table_stats
  (ownname=>p_ownname
  ,tabname=>p_tabname
  ,estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE
  ,method_opt=>'FOR ALL COLUMNS SIZE AUTO'
  ,cascade=>TRUE
  ,force=>l_force
  );
 ELSE
  sys.dbms_stats.gather_table_stats
  (ownname=>p_ownname
  ,tabname=>p_tabname
  ,estimate_percent=>p_estpct
  ,method_opt=>'FOR ALL COLUMNS SIZE AUTO'
  ,cascade=>TRUE
  ,force=>l_force
  );
 END IF;
EXCEPTION
 WHEN table_stats_locked THEN NULL;
END ps_stats;
END wrapper;
/

At this time I have no data to determine which method is more likely to produce the better execution plan. However, when performance problems occur in production they are instinctively routed to the DBA, who is likely to have difficulty introducing a code change at short notice. Dynamic Sampling has some clear advantages.

Update 12.2.2009: Since writing this note I have come across some scenarios on Oracle 10.2.0.3 where the optimiser does not chose the best execution plan with dynamically sampled statistics, even with OPTIMIZER_DYNAMIC_SAMPLING set to the highest level, and I have had to explicitly collect statistics on working storage tables.

I have adjusted the wrapper procedure to call dbms_stats with the force option on permanent tables (so it collects statistics on tables whose statistics are locked and doesn't raise an exception, although it does not use the force option on Global Temporary Tables.

I still recommend that statistics should still be locked on tables related to PeopleSoft temporary records. The rationale for this is that you don't want any schema- or database-wide process that gathers statistics to process any working storage tables. Either the data in the table at the time the schema-wide statistics are gathered is not the same as when process runs and so you still want to use dynamic sampling, or else the process that populates the table has been coded to explicitly call %UpdateStats, and the new version of the wrapper will update these statistics.

Changes to DDL Models in PeopleTools 8.48

A recent thread on Oracle-L led me to look at how Oracle has changed the way that PeopleTools 8.48 collects Oracle Cost-Based Optimiser statistics. It now uses DBMS_STATS instead of the ANALYZE command. This has also caused me to reconsider some options for managing statistics for a PeopleSoft system.

Application Engine programs can collect Cost-Based Optimiser statistics on specific tables by calling the %UpdateStats([,high/low]); PeopleCode macro. This uses one of two DDL models depending on whether the high or low option is specified. However, these DDL models only exist for Oracle and DB2/MVS. %UpdateStats() has no function on other platforms.

This was PeopleSoft’s platform generic solution (before their takeover by Oracle, and before Dynamic Sampling was available in the Oracle database) to the very real problem that occurs when statistics on a working storage or reporting table, that is emptied, repopulated and used during a batch process, do not accurately represent the content of the table and hence cause the optimiser to choose an inappropriate execution plan. PeopleSoft provided a method of refreshing the statistics during the process, and introduced new DDL models because each database platform would have its own command. However, this approach relies upon developers to add the %UpdateStats() macro for every occasion where data has changed sufficiently to require refreshing the statistics. Unfortunately, developers are not always best placed to make that decision. There are still plenty of places in delivered PeopleSoft code where this macro could be usefully added.

Up to PeopleTools 8.47, PeopleSoft delivered two DDL models that used the ANALYZE command. The %UpdateStats(,high) ran a full compute of the table:

ANALYZE TABLE [TBNAME] COMPUTE STATISTICS;

While %UpdateStats(,low) estimated statistics with the default sample size:

ANALYZE TABLE [TBNAME] ESTIMATE  STATISTICS;

From PeopleTools 8.48, these DDL models now call the Oracle supplied PL/SQL package DBMS_STATS. The high option still performs a full compute of statistics.

DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=>[TBNAME], estimate_percent=>1, method_opt=> 'FOR ALL COLUMNS SIZE 1',cascade=>TRUE);

While the low option estimates statistics with the sample size determined by the pseudo-variable. DBMS_STATS.AUTO_SAMPLE_SIZE.

DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=>[TBNAME], estimate_percent=> dbms_stats.auto_sample_size, method_opt=> 'FOR ALL INDEXED COLUMNS SIZE 1',cascade=>TRUE);

So it would appear that PeopleSoft now follow the recommendations that Oracle have been making since version 8i of the database to use DBMS_STATS instead of the ANALYZE command. This is certainly a step in the right direction. It also makes good sense to use the automatic sample size. ESTIMATE_PERCENT defaults to DBMS_STATS.AUTO_SAMPLE_SIZE from Oracle 10g. Previously it was NULL, which caused a full compute.

However, there is a problem.

PeopleSoft have chosen to specify the METHOD_OPT as FOR ALL INDEXED COLUMNS SIZE 1. If you have specified histograms on any of your columns, or generated them automatically with DBMS_STATS, the PeopleSoft command will remove them from indexed columns and will leave any histograms on unindexed columns unchanged, and potentially out of date.

The default in Oracle 9i is FOR ALL COLUMNS SIZE 1. This removes all histograms on all columns, although this is at least the same behaviour as the ANALYZE command.

In Oracle 10g, METHOD_OPT defaults to FOR ALL COLUMNS SIZE AUTO. The Oracle manual states that the database ‘…determines the columns to collect histograms based on data distribution and the workload of the columns’. So, Oracle may remove histograms if it judges that they are not necessary.

I have no hesitation in recommending that value for METHOD the delivered DDL models should be changed. I would suggest using FOR ALL COLUMNS SIZE REPEAT.

Addendum 12.6.2007: I have noticed that the DDL models change again in PT8.48.07. The full compute is now for ALL COLUMNS, but the estimate is still for ALL INDEXED COLUMNS! Closer, but still no cigar!