The PeopleSoft DBA Blog: June 2008

Thursday, June 26, 2008

Discrepancy in Timings between Application Engine and Oracle SQL Trace

I would like to draw attention to the risk of uncritically accepting performance information from either a tool or an instrumented program. It is always necessary to consider whether you can believe what you are being told, or whether something is lying to you.

In this case, I want to highlight in the Application Engine Timings report. Take the following example from Time and Labor. The AE Timings report claims that the process spends 91.5% of the execution time in SQL. This would lead you to believe that this process is affected by the performance of the SQL, and that you need to do some SQL performance tuning.

---------------------------------------------------------------------------------
Total run time                : 1411.6
Total time in application SQL : 1291.7   Percent time in application SQL :  91.5%
Total time in PeopleCode      :  119.4   Percent time in PeopleCode      :   8.5%
Total time in cache           :    0.1   Number of calls to cache        :     3
---------------------------------------------------------------------------------

However, all is not quite as it appears. Faced with a potential SQL problem, I usually turn to Oracle SQL Trace. The following is taken from a TKPROF analysis of the trace file for the same process.

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse   153883      1.80       2.41          0          4          0           0
Execute 192806    429.94     464.96       8398     276000     588543      123140
Fetch    52740      2.75       3.55        590     202419          0       15265
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   399429    434.49     470.93       8988     478423     588543      138405

Elapsed times include waiting on following events:
Event waited on                             Times   Max. Wait  Total Waited
----------------------------------------   Waited  ----------  ------------
SQL*Net message to client                  196521        0.00          0.13
SQL*Net message from client                196521        0.32        849.98
...
SQL*Net more data to client                114884        0.00          1.91
SQL*Net more data from client              250825        0.05          1.88
...

The database reports 849.98s for the idle event SQL*Net message from client, and only 470.93s on non-idle activity. So only 36% of the total elapsed time is spent in the database, the other 64% is time that the Application Engine process is busy executing code.

Why the discrepancy?

This is certainly a very chatty process. There are nearly 200,000 round-trip SQL*net messages between AE and Oracle. It could be a network latency problem. However, very little time is spent on the SQL*Net more data events so it is unlikely that there is a significant network component.

Let's look at the statements that have the longest times in AE timing report .

                         PeopleSoft Application Engine Timings
                              (All timings in seconds)

                               C o m p i l e    E x e c u t e    F e t c h        Total       
SQL Statement                  Count   Time     Count   Time     Count   Time     Time
------------------------------ ------- -------- ------- -------- ------- -------- --------
PeopleCode

DELETE PS_SCH_ADHOC_DTL                           36540     60.6       0      0.0     60.6
INSERT PS_SCH_ADHOC_DTL                           71887    885.6       0      0.0    885.6
Record.SelectByKey SCH_ADHOC_DTL                  36596    118.0   36596      0.0    118.0
                                                                                  --------
                                                                                    1291.7

In less than 24 minutes, some statement have been executed very frequently. So, there were lots of SQL*Net messages, because there were lots of SQL statements. These SQL statements are submitted from PeopleCode. It is very likely we are in some sort of procedural loop, and therein lies the problem.

PeopleSoft programs can only measure the time taken to execute SQL from the perspective of the client program. In this case the timings are calculated by instrumentation inside Application Engine and PeopleCode. Meanwhile, the timings in the Oracle trace are obtained from instrumentation within the Oracle kernel.

There is still a fair amount of code in PeopleSoft that executes during the time counted as execution of SQL Statements. This is not just the SQL*Net libraries. I think that the timings include most of the time it takes for sqlexec() and Row set functions to execute. The discrepancy with the Oracle time mounts with every SQL operation.

In this case, there is very little to be gained by tuning the SQL. We can't do much about the PeopleCode because it is delivered vanilla functionality. All we can do is to bring more CPU to bear upon the problem by running multiple concurrent instances of the process.

PeopleSoft Temporary Records are used for working storage during Application Engine programs. Typically, AE programs truncate and repopulate the tables before using them. PeopleSoft recognised the need to keep the statistics on these tables in line with the data that they contain, and so used the %UpdateStats macro in many places in delivered programs to update the statistics.

However, frequently gathering statistics on even small tables can become time consuming. Recently, I have been working on PeopleSoft Time and Labor. This makes heavy use of temporary records. In a single execution of TL_TIMEADMIN, several tables associated with temporary records are truncated, repopulated and analyzed many times. I discussed the problem with excessive use of truncate elsewhere giving rise to Local Write Wait.

Oracle also recognised this problem, and in version 9 of the database they introduced Optimizer Dynamic Sampling, where the database samples the data to generate statistics at statement parse time.

I am still testing, but on Oracle 10gR2 (version 10.2.0.3) I have obtained improvements in performance and stability of T&L AE processes by:

Deleting optimizer statistics on tables associated with temporary record in order to force the optimizer to sample at parse time
Locking optimizer statistics to prevents the %UpdateStats macro from putting them back on. Tables with locked statistics are also omitted by GATHER_SCHEMA_STATS and GATHER_DATABASE_STATS (unless the force option is specified) and hence also by the delivered maintenance window job to refresh stale statistics.
Implementing alternative DDL model that uses a PL/SQL packaged function to suppress the error when attempting to collect statistics on table whose statistics are locked (see %UpdateStats() -v- Optimizer Dynamic Sampling). This also addresses the the mix-up in the DDL models
The final piece of the puzzle has been to set OPTIMIZER_DYNAMIC_SAMPLING to 4 at instance level. I certainly have had problems with this parameter set to the default of 2.

The dynamic sampling levels are described in the Performance Tuning Guide 14.5.6.4.

Level 2: Apply dynamic sampling to all unanalyzed tables.
Level 3: As Level 2, plus all tables for which standard selectivity estimation used a guess for some predicate that is a potential dynamic sampling predicate.
Level 4: As Level 3, plus all tables that have single-table predicates that reference 2 or more columns.

So the next stage is to identify working storage records and their associated tables.
I started off looking for tables that had recently been analysed.

ttitle 'Candidate for Locking - based on recent batches'
SELECT DISTINCT r.recname
FROM   psrecdefn r
,       (
SELECT A.RECNAME
,      SUM(B.TEMPTBLINSTANCES) TEMPTBLINSTANCES
FROM   PSAEAPPLTEMPTBL A
,      PSAEAPPLDEFN B
WHERE  A.AE_APPLID = B.AE_APPLID
AND    A.AE_APPLID LIKE 'TL%'
GROUP BY A.RECNAME
) /*pstemptblcntvw*/ i
, psoptions o
, user_tables t
, (SELECT rownum-1 row_number
  FROM   all_objects WHERE ROWNUM <= 100) v   WHERE  r.rectype IN(0,7)   AND    r.recname = i.recname   AND    v.row_number           <= i.temptblinstances + o.temptblinstances   AND    t.table_name       = DECODE(r.sqltablename,' '              ,'PS_'||r.recname,r.sqltablename)||        DECODE(v.row_number*r.rectype,0,''              ,LTRIM(TO_NUMBER(v.row_number)))    AND    t.last_analyzed > SYSDATE - 1
/
ttitle off

The following script identifies all instances of temporary tables associated with temporary records, and then deletes and locks the statistics. I started by restricting it to list of specific tables, but I think it would be perfectly reasonable to take this approach with all temporary records.

BEGIN
 FOR x IN (
  SELECT /*+LEADING(o i r v)*/ t.table_name, t.last_analyzed, t.num_rows
  ,      s.stattype_locked
  FROM pstemptblcntvw i
    INNER JOIN psrecdefn r
    ON r.recname = i.recname
    AND r.rectype = '7'
  , psoptions o
  , user_tables t
     LEFT OUTER JOIN user_tab_statistics s
     ON  s.table_name = t.table_name
     AND s.partition_name IS NULL
  , (SELECT rownum row_number
     FROM   psrecdefn 
     WHERE  ROWNUM <= 100) v                 
  WHERE  v.row_number <= i.temptblinstances + o.temptblinstances
  AND    t.table_name = DECODE(r.sqltablename,' ','PS_'||r.recname,r.sqltablename)
                      ||DECODE(v.row_number*r.rectype,100,'',LTRIM(TO_NUMBER(v.row_number))) 
/*---------------------------------------------------------------------            
--AND    r.recname IN('TL_PMTCH1_TMP' --TL_TA000600.SLCTPNCH.STATS1.S…
--                   ,'TL_PMTCH2_TMP' --TL_TA000600.CALC_DUR.STATS1.S…)
-----------------------------------------------------------------------*/
  AND   (/*  t.num_rows        IS NOT NULL --not analyzed 
        OR   t.last_analyzed   IS NOT NULL --not analyzed
        OR*/ s.stattype_locked IS     NULL --stats not locked
        ) 
) LOOP
  IF x.last_analyzed IS NOT NULL THEN --delete stats
   dbms_output.put_line('Deleting Statistics on '||user||'.'||x.table_name);
   dbms_stats.delete_table_stats(ownname=>user,tabname=>x.table_name,force=>TRUE);
  END IF;
  IF x.stattype_locked IS NULL THEN --lock stats
   dbms_output.put_line('Locking Statistics on '||user||'.'||x.table_name); 
   dbms_stats.lock_table_stats(ownname=>user,tabname=>x.table_name);
  END IF;
 END LOOP;
END;
/

Updated 11.2.2009: I have updated my advice on the use of Optimiser Dynamic Sampling (see %UpdateStats -v- Optimizer Dynamic Sampling. I still consider this to be a useful feature, but I have found scenarios where Oracle has not chosen a better plan that it did choose with explicitly gathered statistics. Therefore, I still suggest locking statistics on temporary working storage record, but where batch programs have been coded to explicitly update statistics then dbms_stats should be called with the force option to override the lock. I have updated my
DDL model wrapper script accordingly.

The scripts in this posting can be downloaded from my website

The PeopleSoft DBA Blog

Thursday, June 26, 2008

Discrepancy in Timings between Application Engine and Oracle SQL Trace

Tuesday, June 03, 2008

Oracle Optimizer Statistics and Optimizer Dynamic Sampling with PeopleSoft Temporary Records

Announcements

My Book

Oracle ACE Director

BlueSky/Twitter

Featured post

nVision Performance Tuning: Table of Contents

Blog Archive

Popular Posts

PeopleSoft Administrator Podcasts

Tags

Other PeopleSoft Blogs

Other Oracle Blogs

OakTable Network

©David Kurtz

The PeopleSoft DBA Blog

Thursday, June 26, 2008

Discrepancy in Timings between Application Engine and Oracle SQL Trace

Tuesday, June 03, 2008

Oracle Optimizer Statistics and Optimizer Dynamic Sampling with PeopleSoft Temporary Records

Announcements

My Book

Oracle ACE Director

BlueSky/Twitter

Featured post

nVision Performance Tuning: Table of Contents

Blog Archive

Popular Posts

PeopleSoft Administrator Podcasts

Tags

Subscribe

Other PeopleSoft Blogs

Other Oracle Blogs

OakTable Network

©David Kurtz