
Friday, October 04, 2024

Cursor Sharing in Scheduled Processes: 4. How to Identify Candidate Processes for Cursor Sharing

This is the last in a series of 4 posts about the selective use of cursor sharing in scheduled processes in PeopleSoft.

In this article, I look at a method to identify candidate processes for cursor sharing.  It is then necessary to test whether cursor sharing is actually beneficial.

My example is based on nVision reports in a PeopleSoft Financials system, but the technique can be applied to other processes and is not even limited to PeopleSoft.  I use nVision reports as the example because their behaviour varies from report to report, depending upon how they are written and the nature of the reporting trees they use.  Some nVision reports benefit from cursor sharing, for others it makes little difference, and for some it is detrimental.
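
As covered earlier in this series, cursor sharing is not changed database-wide; it is enabled selectively, typically by a trigger that issues a session-level command for just the target process.  The setting itself is simply:

ALTER SESSION SET cursor_sharing = FORCE;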

As always, Active Session History (ASH) is your friend.  First, you need to know which ASH data relates to which process, so you need to enable PeopleSoft instrumentation (see Effective PeopleSoft Performance Monitoring) and install my psftapi package and trigger to enable instrumentation of Cobol, nVision and SQR.

Candidates for Cursor Sharing

Use ASH to identify candidate processes by calculating the following measures for each process (see the sketch below this list).
  • Elapsed time of the process from the first to the last ASH sample.  This is not the elapsed duration of the client process, but it will be a reasonable approximation. Otherwise, you can get the exact duration from the process request record (PSPRCSRQST).
  • Total database time for a process (all ASH samples).
  • Total time that a process is restrained by the resource manager (where EVENT is 'resmgr:cpu quantum').
  • Total database time spent on CPU (where EVENT is null).
  • Total database time spent on SQL parse (where the IN_PARSE flag is set to Y).
  • Number of distinct SQL IDs.
  • Number of distinct force matching signatures.
Look for processes with high elapsed time, of which a significant proportion is spent on both CPU and SQL parse.  This should correlate with processes where there are many more SQL IDs than force matching signatures.
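
The following is a minimal sketch of how these measures can be derived from the ASH data in AWR.  It assumes the standard 10-second AWR sample interval, and that module and action identify the process (as set up by the instrumentation above); my published queries are more elaborate and also join to the process request record.

SELECT h.module, h.action
,      10*COUNT(*) ash_secs
,      10*SUM(CASE WHEN h.event = 'resmgr:cpu quantum' THEN 1 ELSE 0 END) resmgr_secs
,      10*SUM(CASE WHEN h.event IS NULL THEN 1 ELSE 0 END) cpu_secs
,      10*SUM(CASE WHEN h.in_parse = 'Y' THEN 1 ELSE 0 END) parse_secs
,      COUNT(DISTINCT h.sql_id) sql_ids
,      COUNT(DISTINCT h.force_matching_signature) fms
FROM   dba_hist_active_sess_history h
WHERE  h.sample_time >= SYSDATE-7
GROUP BY h.module, h.action
ORDER BY parse_secs DESC
/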

Is Cursor Sharing Enabled Already?

It is possible to determine whether cursor sharing is already set for a process, although this is not explicitly recorded.  
  • If cursor sharing is not enabled then the number of distinct SQL_IDs should be greater than the number of distinct force-matching signatures. This may not be the case if you don't have enough ASH samples, but then the program probably doesn't consume enough time for it to be worth considering cursor sharing.
  • If the number of SQL_IDs is equal to the number of force matching signatures then cursor sharing is probably enabled, but again this could be unreliable if the number of ASH samples is low (and close to the number of SQL IDs).
  • It should be impossible for the number of distinct SQL IDs to be less than the number of distinct force matching signatures, but it can happen due to quirks in ASH sampling.
I have coded this into my queries.  It will be reasonably accurate if you have several ASH samples per SQL ID.  Otherwise, you may detect false positives.
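
In outline, the test is just a comparison of the two distinct counts for each process.  This simplified fragment of the logic in my queries shows the idea:

SELECT h.module, h.action
,      CASE WHEN COUNT(DISTINCT h.sql_id) > COUNT(DISTINCT h.force_matching_signature)
            THEN 'EXACT' /*literal SQL: several SQL_IDs per signature*/
            ELSE 'FORCE' /*equal or fewer (a sampling quirk): probably shared*/
       END inferred_cursor_sharing
FROM   dba_hist_active_sess_history h
WHERE  h.sample_time >= SYSDATE-7
GROUP BY h.module, h.action
/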

Sample Queries and Output

I have written a couple of queries that I have published on GitHub.  They happen to be specific to nVision, but can easily be extended to other processes.
  • The first query calculates average values for each process/run control ID combination within the AWR retention period (high_parse_nvision_avg.sql)
Having implemented cursor sharing for a particular process, it is necessary to watch it over time and decide whether the change has been effective. The metrics shown below come from a real system (although the actual run control IDs have been changed).
  • All the timings for NVS_RPTBOOK_1 have come down significantly. The number of SQL_IDs has dropped from 238 to 11.  The number of force matching signatures has also dropped, but that is because we have fewer ASH samples and some statements are no longer sampled at all.  Cursor sharing is beneficial and can be retained.
  • However, this is not the case for the second process. Although NVS_RPTBOOK_2 looked like a good candidate for cursor sharing, and the parse time has indeed come down, all the other durations have gone up.  The cursor sharing setting will have to be removed for this report.
                                            Cursor           Avg StdDev    Avg StdDev    Avg StdDev    Avg StdDev    Avg StdDev   Avg
                                    Cursor  Sharing   Num   Elap   Elap    ASH    ASH ResMgr ResMgr  Parse  Parse    CPU    CPU   SQL  Avg
OPRID      RUNCNTLID                Sharing Setting Procs   Secs   Secs   Secs   Secs   Secs   Secs   Secs   Secs   Secs   Secs   IDs  FMS
---------- ------------------------ ------- ------- ----- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ----- ----
…
NVISION    NVS_RPTBOOK_1            EXACT   FORCE      33   3691   1062   2687   1071    741    702   2232    932   1791    479   238   16
                                    FORCE   FORCE      13   1623    377    664    394    357    373     43     19    353     85    11   12
…
           NVS_RPTBOOK_2            EXACT   EXACT      39   3696   1435   3316   1431   1038    927   1026    661   2042    611   137   27
                                    FORCE   EXACT       7   4028   2508   3676   2490   1333   1563     17     12   2275    939    19   19

It is always worth looking at individual process executions.  

We can see that cursor sharing was introduced on 31st July.  Even though there is a lot of variance in runtimes due to variances in data volumes and other system activities, it is clear that cursor sharing is beneficial for this process.

                                                                                                                                       Cursor
                              Process R                                              Elap    ASH ResMgr  Parse    CPU   SQL        ASH Sharing Cursor  Parse   S:F
OPRID      RUNCNTLID         Instance S  MIN_SAMPLE_TIME      MAX_SAMPLE_TIME        Secs   Secs   Secs   Secs   Secs   IDs  FMS  Samp Setting Sharing     % Ratio
---------- ---------------- --------- -- -------------------- -------------------- ------ ------ ------ ------ ------ ----- ---- ----- ------- ------- ----- -----
NVISION    NVS_RPTBOOK_1     12447036 9  21.07.2024 21.03.25  21.07.2024 21.47.02    2645   1543    174   1297   1277   145   17   150 FORCE   EXACT      84   8.5
                             12452568 9  22.07.2024 21.02.04  22.07.2024 21.41.03    2373   1413    123   1188   1250   133   13   138 FORCE   EXACT      84  10.2
                             12458455 9  23.07.2024 21.07.15  23.07.2024 21.52.25    2759   1587     51   1372   1423   152   14   155 FORCE   EXACT      86  10.9
                             12465042 9  24.07.2024 20.58.08  24.07.2024 21.50.19    3154   2100    369   1782   1557   201   18   205 FORCE   EXACT      85  11.2
                             12471732 9  25.07.2024 21.25.34  25.07.2024 22.46.32    4885   3861   1946   3318   1843   333   14   377 FORCE   EXACT      86  23.8
                             12477118 9  26.07.2024 22.41.07  26.07.2024 23.26.07    2730   1791    113   1526   1586   173   14   174 FORCE   EXACT      85  12.4
                             12479163 9  27.07.2024 23.13.40  28.07.2024 00.01.23    2917   1688    161   1513   1260   156   14   164 FORCE   EXACT      90  11.1
                             12480710 9  28.07.2024 21.47.44  28.07.2024 22.29.08    2529   1586    205   1320   1238   149   12   154 FORCE   EXACT      83  12.4
                             12487744 9  29.07.2024 21.47.44  29.07.2024 22.51.05    3834   2815    797   2292   1843   248   16   273 FORCE   EXACT      81  15.5
                             12495417 9  30.07.2024 22.57.13  30.07.2024 23.46.48    3015   2084    307   1869   1592   200   15   203 FORCE   EXACT      90  13.3
…
                             12501446 9  31.07.2024 21.27.51  31.07.2024 21.51.18    1478    461     72     31    389    10   11    45 FORCE   FORCE       7   0.9
                             12507769 9  01.08.2024 21.44.01  01.08.2024 22.05.56    1387    357    100     21    246     7    8    34 FORCE   FORCE       6   0.9
                             12513527 9  02.08.2024 21.02.27  02.08.2024 21.27.47    1538    635    236     31    400    11   12    62 FORCE   FORCE       5   0.9
                             12515368 9  03.08.2024 22.12.50  03.08.2024 22.40.03    1682    686    143     51    532     9   10    67 FORCE   FORCE       7   0.9
                             12516959 9  04.08.2024 21.38.01  04.08.2024 21.57.00    1263    266            51    266     8    9    26 FORCE   FORCE      19   0.9
                             12522863 9  05.08.2024 21.14.36  05.08.2024 21.48.40    2082   1167    727     51    430    14   13   114 FORCE   EXACT       4   1.1
                             12529263 9  06.08.2024 21.02.59  06.08.2024 21.39.47    2223   1300    900     51    389    12   13   126 FORCE   FORCE       4   0.9
                             12535782 9  07.08.2024 21.08.23  07.08.2024 21.37.48    1774    974    585     52    379    12   13    94 FORCE   FORCE       5   0.9
                             12541727 9  08.08.2024 21.07.43  08.08.2024 21.40.54    2014   1085    809     51    276    16   17   106 FORCE   FORCE       5   0.9
                             12547232 9  09.08.2024 21.27.28  09.08.2024 21.47.08    1213    236            31    236     8    9    23 FORCE   FORCE      13   0.9
…
Note that on 5th August the report erroneously claims that cursor sharing went back to EXACT.  This is because there are more SQL_IDs than force matching signatures.  Again, this is a quirk of ASH sampling. 

Monday, December 02, 2019

Practical Application Performance Tuning: An nVision Case Study

I gave this presentation at the UKOUG Techfest 19 conference.  It is closely based on a previous presentation about PeopleSoft nVision performance tuning and uses the experience of a PeopleSoft project as a case study, so I am posting it both here on my PeopleSoft blog and on my Oracle blog.
This video was produced as a part of the preparation for this session.  The slide deck is also available on my website.

Learning about and understanding the principles and mechanics of the Oracle database is fundamentally important for both DBAs and developers. It is one of the reasons we still attend physical conferences.
This presentation tells the story of a performance tuning project for the GL reporting on a Financials system on an engineered system. It required various techniques and features to be brought to bear. Having a theoretical understanding of how the database and various features work allowed us to make reasonable predictions about whether they would be effective in our environment. Some ideas were discounted, some were taken forward.
We will look at instrumentation, ASH, statistics collection, partitioning, hybrid columnar compression, Bloom filtering, and SQL profiles. All of them played a part in the solution; some added further complications that had to be worked around, some had to be carefully integrated with the application, and some required reconfiguration of the application in order to work properly.
Ultimately, performance improvement is an experimental science, and it requires a similar rigorous thought process.

Friday, October 24, 2014

Minimising Parse Time in Application Engine with ReUseStatement

This article explains how to determine the overhead of using literal values rather than bind variables in SQL submitted by PeopleSoft Application Engine programs. Using a combination of PeopleSoft Application Engine Batch Timings and Oracle Active Session History (ASH), it is possible to determine where it is most effective to alter this behaviour by setting the ReUseStatement attribute.

ReUse Statement Flag

I originally wrote about the Performance Benefits of ReUseStatement in Application Engine over 5 years ago, and the problem is no less significant today than it was then.  There are still many places in the delivered PeopleSoft application that would benefit from it.  However, it is not possible simply to set the attribute across all Application Engine programs because it will cause errors in steps where SQL is dynamically generated, so each case must be considered.  Where the ReUse attribute is changed, it must be tested and migrated like any other customisation.

Recently, I have been encountering problems in a number of places on a Financials system which were resolved by enabling ReUseStatement.  So I started by calculating how much time was lost by not setting it.

Application Engine Batch Timings

If an Application Engine step is not reused, then it is compiled prior to every execution, during which the Application Engine bind variables are resolved to literals. The number and duration of compilations and executions are reported in the Batch Timings, the production of which is controlled with the TraceAE flag in the Process Scheduler configuration file (psprcs.cfg).

;-------------------------------------------------------------------------
; AE Tracing Bitfield
;
; Bit       Type of tracing
; ---       ---------------
...
; 128       - Timings Report to AET file
...
; 1024      - Timings Report to tables
...
TraceAE=1152
;------------------------------------------------------------------------

Whatever other TraceAE flags you set, I would always recommend that you set 128 and 1024 so that batch timings are always emitted to both file and database.  The overhead of enabling them is negligible because they are managed in memory at run time and physically written once when the Application Engine program ends.

NB: The settings in the configuration file can be overridden by command line options.  If you specify the -TRACE parameter in the Process Scheduler definitions, remember to also set these flags.

Batch timings are written to the AE Trace file at the end of an Application Engine program, and to the PS_BAT_TIMINGS PeopleTools tables at the successful end of an Application Engine program.
It can be useful to have batch timings in the trace file of an Application Engine program that failed, because they are not written to the database.  As your system runs, you will build up batch timings for all of your successful Application Engine processes (which will be most of them).  This is a useful source of performance metrics.
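
For example, this simple illustrative query lists the most recently completed timed processes from the header table:

SELECT * FROM (
 SELECT l.process_instance, l.process_name
 ,      l.begindttm, l.enddttm
 ,      l.time_elapsed/1000 elapsed_secs
 FROM   ps_bat_timings_log l
 ORDER BY l.enddttm DESC)
WHERE rownum <= 10
/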

Compilations, Execution and ReUse

In this example, the number of compilations is equal to the number of executions because ReUseStatement is not set.

                          PeopleSoft Application Engine Timings
                              (All timings in seconds)

                               C o m p i l e    E x e c u t e    F e t c h        Total
SQL Statement                  Count   Time     Count   Time     Count   Time     Time
------------------------------ ------- -------- ------- -------- ------- -------- --------
99XxxXxx.Step02.S                 8453      2.8    8453    685.6       0      0.0    688.4
...

With ReUseStatement enabled, there is now only a single compilation, and most of the time is saved in execution, not compilation.

                               C o m p i l e    E x e c u t e    F e t c h        Total
SQL Statement                  Count   Time     Count   Time     Count   Time     Time
------------------------------ ------- -------- ------- -------- ------- -------- --------
99XxxXxx.Step02.S                    1      0.0    8453    342.3       0      0.0    342.3
...

So we can use the batch timings to identify steps where ReUseStatement is not set because they have as many compilations as executions, and then we can profile the top statements.

Profile Compilations

This query produces a simple profile of batch timings for statements. 
  • In sub-query x it extracts batch timings for statements with more than one compilation in processes that ended in the last 7 days.
  • There is a long-standing bug in batch timings where negative timings can be returned when the clock that returns milliseconds recycles back to zero every 2^16 milliseconds.  Sub-query y calculates an adjustment that is applied in sub-query z if the timing is negative.
  • Arbitrarily, I am only looking at statements with more than a total of 10000 compilations.

REM ReUseCand.sql
REM (c)Go-Faster Consultancy Ltd. 2014
COLUMN detail_id FORMAT a32
COLUMN step_time FORMAT 999990 HEADING 'AE|Step|Secs'
COLUMN compile_count HEADING 'AE|Compile|Count'
COLUMN execute_count HEADING 'AE|Execute|Count'
COLUMN processes HEADING 'Num|Process|Instances'
COLUMN process_name HEADING 'Process|Name'
SPOOL ReUseCand
WITH x AS (
SELECT  l.process_instance, l.process_name
,  l.time_elapsed/1000 time_elapsed
, l.enddttm-l.begindttm diffdttm
,  d.bat_program_name||'.'||d.detail_id detail_id
, d.compile_count, d.compile_time/1000 compile_time
, d.execute_time/1000 execute_time
FROM ps_bat_Timings_dtl d
, ps_bat_timings_log l
WHERE d.process_instance = l.process_instance
AND d.compile_count = d.execute_count 
AND d.compile_count > 1 
AND l.enddttm > SYSDATE-7
), y as (
SELECT  x.*
, GREATEST(0,60*(60*(24*EXTRACT(day FROM diffdttm)
                             +EXTRACT(hour FROM diffdttm))
                             +EXTRACT(minute FROM diffdttm))
                             +EXTRACT(second FROM diffdttm)-x.time_elapsed) delta 
FROM  x)
, z as (
SELECT process_instance, process_name, detail_id
,  CASE WHEN time_elapsed < 0 THEN time_elapsed+delta  
               ELSE time_elapsed END time_elapsed
,  compile_count
, CASE WHEN compile_time < 0 THEN compile_time+delta
      ELSE compile_time END AS compile_time
, CASE WHEN execute_time < 0 THEN execute_time+delta
      ELSE execute_time END AS execute_time
FROM y
), a as (
SELECT  process_name, detail_id
, SUM(compile_time+execute_time) step_time
, SUM(compile_count) compile_count
, COUNT(DISTINCT process_instance) processes
FROM  z
GROUP BY process_name, detail_id)
SELECT * FROM a
WHERE compile_count >= 10000 
ORDER BY step_time DESC
/
SPOOL OFF

So now I have a list of steps with lots of compilations.  I know how long they took, but I don't know how much time I might save by enabling ReUseStatement.  Enabling it will save some time in Application Engine, but it will also cut down database parse time.  So now I need to determine the parse time from ASH data.

Process                                         Step    Compile    Process
Name         DETAIL_ID                          Secs      Count  Instances
------------ -------------------------------- ------ ---------- ----------
AP_PSTVCHR   AP_PSTCOMMON.ACCT_UPD.Step02.S    12001      40704         10
AP_VCHRBLD   APVEDTMOVE.UPDQTY03.Step03.S       4313      49536         28
FS_VATUPDFS  FS_VATUPDFS.Seq0-b.DJ300-2.S       4057     203704          3
AP_VCHRBLD   APVEDTMOVE.UPDQTY03.Step02.S       2799      49536         28
PC_BI_TO_PC  PC_BI_TO_PC.UPD_PRBI.UPDATE.S      1974      23132         10
FS_VATUPDFS  FS_VATUPDFS.Seq0-a.X0001.D         1960      37368          3
GL_JEDIT_0   FS_CEDT_ECFS.iTSELog.iTSELog.S     1628      13104        519
AP_APY2015   AP_APY2015.V_CREATE.Step14.H       1192      11318         19

This query is based on the previous one, but includes scalar queries on the ASH data for each step.
  • WARNING: This query can run for a long period because it has to scan ASH data for each entry in BAT_TIMINGS_DTL.
  • This time, in sub-query x, I am looking at a rolling 7-day period up to the last whole hour, because by then the ASH data will have been flushed to the AWR repository.
  • In sub-query y there are two scalar queries that retrieve the DB time spent on hard parse, and all DB time for each batch timing entry.  These queries count 10 seconds for each distinct sample ID to estimate elapsed time rather than total DB time.
  • The query does not group by process name because one step can be called from many Application Engine programs and I want to aggregate the total time across all of them.

REM ReUseCandASH.sql
REM (c)Go-Faster Consultancy Ltd. 2014
COLUMN processes       HEADING 'Num|Process|Instances'
COLUMN process_name    HEADING 'Process|Name'
COLUMN detail_id       FORMAT a32
COLUMN step_time       HEADING 'AE|Step|Secs'    FORMAT 999990
COLUMN compile_count   HEADING 'AE|Compile|Count'
COLUMN execute_count   HEADING 'AE|Execute|Count'
COLUMN hard_parse_secs HEADING 'Hard|Parse|Secs' FORMAT 99990
COLUMN ash_secs        HEADING 'DB|Time'         FORMAT 99990
SPOOL ReUseCandASH
WITH x AS (
SELECT  l.process_instance, l.process_name
,  l.time_elapsed/1000 time_elapsed
, l.begindttm, l.enddttm
, l.enddttm-l.begindttm diffdttm
,  d.bat_program_name||'.'||d.detail_id detail_id
, d.compile_count, d.compile_time/1000 compile_time
, d.execute_time/1000 execute_time
FROM ps_bat_timings_dtl d
, ps_bat_timings_log l
WHERE d.process_instance = l.process_instance
AND d.compile_count = d.execute_count
AND d.compile_count > 1
AND l.enddttm >= TRUNC(SYSDATE-7,'HH24')
AND l.enddttm < TRUNC(SYSDATE,'HH24') 
), y as (
SELECT  x.*
, GREATEST(0,60*(60*(24*EXTRACT(day    FROM diffdttm)
                             +EXTRACT(hour   FROM diffdttm))
                             +EXTRACT(minute FROM diffdttm))
                             +EXTRACT(second FROM diffdttm)-x.time_Elapsed) delta
FROM  x)
, z as  (
SELECT process_instance, process_name, detail_id
,  CASE WHEN time_elapsed < 0 THEN time_elapsed+delta 
             ELSE time_elapsed END AS time_elapsed
,  compile_count
, CASE WHEN compile_time < 0 THEN compile_time+delta
      ELSE compile_time END AS compile_time
, CASE WHEN execute_time < 0 THEN execute_time+delta
      ELSE execute_time END AS execute_time
, (
 SELECT 10*COUNT(DISTINCT h.sample_id) 
 FROM psprcsque q
 , dba_hist_snapshot x
  , dba_hist_active_sess_history h
 WHERE q.prcsinstance = y.process_instance
  AND  x.begin_interval_time <= y.enddttm
  AND  X.END_INTERVAL_TIME >= y.begindttm
  AND  h.sample_time between y.begindttm and y.enddttm
  AND  h.SNAP_id = x.SNAP_id
  AND  h.dbid = x.dbid
  AND  h.instance_number = x.instance_number
  AND  h.module = 'PSAE.'|| y.process_name||'.'||q.sessionidnum
  AND  h.action = y.detail_id
 AND h.in_hard_parse = 'Y'
 ) hard_parse_secs
, (
 SELECT 10*COUNT(DISTINCT h.sample_id)
 FROM psprcsque q
 , dba_hist_snapshot x
  , dba_hist_active_sess_history h
 WHERE q.prcsinstance = y.process_instance
  AND  x.begin_interval_time <= y.enddttm
  AND  X.END_INTERVAL_TIME >= y.begindttm
  AND  h.sample_time between y.begindttm and y.enddttm
  AND  h.SNAP_id = X.SNAP_id
  AND  h.dbid = x.dbid
  AND  h.instance_number = x.instance_number
  AND  h.module = 'PSAE.'|| y.process_name||'.'||q.sessionidnum
  AND  h.action = y.detail_id
 ) ash_secs
FROM  y
), a AS (
SELECT  /*process_name ,*/ detail_id
, SUM(compile_time+execute_time) step_time
, SUM(compile_count) compile_count
, COUNT(DISTINCT process_instance) processes
, SUM(hard_parse_secs) hard_parse_secs
, SUM(ash_secs) ash_secs
FROM  z
GROUP BY /*process_name,*/ detail_id)
SELECT  a.* 
FROM  a
WHERE  compile_count >= 10000
ORDER BY step_time DESC
/
spool off

Now we can also see how much time the database is spending on hard parse on each step, and it is this time that is likely to be saved by enabling ReUseStatement.
However, before we can enable the ReUseStatement attribute we must first manually inspect each step in Application Designer and determine whether doing so would break anything.  The Comment column in this profile was added manually as I did that.  Some statements I can change, some I have to accept the overhead.

                                   Step    Compile    Process      Parse         DB
DETAIL_ID                          Secs      Count  Instances       Secs       Time Comment
-------------------------------- ------ ---------- ---------- ---------- ---------- ---------------------------------------
AP_PSTCOMMON.ACCT_UPD.Step02.S    12001      40704         10      11820      11920 Set ReUseStatement
FS_CEDT_ECMB.4EditCDT.uValCDT.S    5531      10289        679        620       5870 Dynamic SQL, can't set ReUseStatement
APVEDTMOVE.UPDQTY03.Step03.S       4306      49471         27       4020       4100 Set ReUseStatement
FS_VATUPDFS.Seq0-b.DJ300-2.S       4057     203704          3       3150       3860 Dynamic SQL, can't set ReUseStatement
FS_CEDT_ECFS.iTSELog.iTSELog.S     3332      19073        716       2130       3520 Dynamic SQL, can't set ReUseStatement
APVEDTMOVE.UPDQTY03.Step02.S       2796      49471         27       2730       2820 Set ReUseStatement
PC_BI_TO_PC.UPD_PRBI.UPDATE.S      1974      23132         10        230       1920 Set ReUseStatement 
FS_VATUPDFS.Seq0-a.X0001.D         1960      37368          3          0          0 Dynamic SQL, can't set ReUseStatement
FS_CEDT_ECMB.4uAnchCT.uAnchCDT.S   1319      10289        679        510       1290 Dynamic SQL, can't set ReUseStatement
AP_APY2015.V_CREATE.Step14.H       1169      11094         19          0          0 Set ReUseStatement
GL_JETSE.GA100.CHKEDT.S            1121      15776        569        860        930 Dynamic SQL, can't set ReUseStatement
FS_CEDT_ECMB.iTSELog.iTSELog.S      988      10289        679        450        990 Dynamic SQL, can't set ReUseStatement
FS_CEDT_ECMB.uMarkVal.uMarkVal.S    711      10289        679         50        670 Dynamic SQL, can't set ReUseStatement
FS_CEDT_ECMB.uMarkInv.uMarkInv.S    668      10289        679         40        790 Dynamic SQL, can't set ReUseStatement
  • Due to a bug in the instrumentation of Application Engine, the session's action attribute is not set for Do Select (type D) and Do When (type H) steps.  ASH data cannot therefore be matched for them.
  • More DB Time is reported for FS_CEDT_ECMB.uMarkInv.uMarkInv.S than is reported by batch timings.  This is a consequence of ASH sampling, where we count 10 seconds for each sample.

Conclusion

Setting ReUseStatement is very simple because it doesn't involve changing SQL, but there are lots of places where it could be set.  This technique picks out the relatively few places where doing so could potentially have a significant effect.

Wednesday, September 03, 2014

Who is using this index?

Or, to put it another way, I want to change or drop this index, who and what will I impact?

The Challenge 

The problem that I am going to outline is certainly not exclusive to PeopleSoft, but I am going to illustrate it with examples from PeopleSoft. I often find tables with far more indexes than are good for them.
  • The Application Designer tool makes it very easy for developers to add indexes to tables. Sometimes, too easy!
  • Sometimes, DBAs are too quick to unquestioningly follow the advice of the Oracle tuning advisor to add indexes.
Recently, I have been working on 3 different PeopleSoft Financials systems where I have found major tables with a host of indexes.

There are several concerns:
  • Indexes are maintained during data modification. The more indexes you have, the greater the overhead. 
  • The more indexes you have, particularly if they lead on the same columns, the more likely Oracle is to use the wrong one, resulting in poorer performance.
  • There is of course also a space overhead for each index, but this is often of less concern. 
If you can get rid of an index, Oracle no longer has to store, maintain, or use it.

In some cases, I have wanted to remove unnecessary indexes, and in others to adjust indexes. However, this immediately raises the question of where are these indexes used, and who will be impacted by the change. Naturally, I turn to the Active Session History (ASH) to help me find the answers. 

Index Maintenance Overhead during DML

ASH reports the object number, file number, block number and (from 11g) row number within the block being accessed by physical file operations. However, the values reported in v$active_session_history (and the other ASH views) are not reliable for other events because they are merely left over from the previous file event that reported them. So, we can profile the amount of time spent on physical I/O on different tables and indexes, but not the other forms of DB Time, such as CPU time, spent accessing the blocks in the buffer cache.

Let me take an extreme example from PeopleSoft Global Payroll. The table PS_GP_RSLT_ACUM is one of the principal result tables. It has only a single unique index (with the same name). The table is populated with the simplest of insert statements.
INSERT /*GPPRDMGR_I_ACUM*/ INTO PS_GP_RSLT_ACUM
(CAL_RUN_ID ,EMPLID ,EMPL_RCD ,GP_PAYGROUP ,CAL_ID ,RSLT_SEG_NUM ,PIN_NUM ,EMPL_RCD_ACUM ,ACM_FROM_DT ,ACM_THRU_DT ,USER_KEY1 ,USER_KEY2 ,USER_KEY3 ,USER_KEY4 ,USER_KEY5 ,USER_KEY6 ,SLICE_BGN_DT ,SLICE_END_DT ,COUNTRY ,ACM_TYPE ,ACM_PRD_OPTN ,CALC_RSLT_VAL ,CALC_VAL ,USER_ADJ_VAL ,PIN_PARENT_NUM ,CORR_RTO_IND ,ORIG_CAL_RUN_ID ,SEQ_NUM8 ,VALID_IN_SEG_IND ,CALLED_IN_SEG_IND )
VALUES
(:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,:14,:15,:16,:17,:18,:19,:20,:21,:22,:23,:24,:25,:26,:27,:28,:29,:30)
I can profile the ASH data for just this statement over the last week on a production system. Note that DBA_OBJECTS and DBA_DATA_FILES are outer joined to the ASH data and only matched for events like 'db file%'.
SELECT o.object_type, o.object_name
,      f.tablespace_name, NVL(h.event,'CPU+CPU Wait') event
,      SUM(10) ash_Secs
FROM dba_hist_Active_sess_history h
  LEFT OUTER JOIN dba_objects o
    ON o.object_id = h.current_obj#
   AND h.event like 'db file%'
  LEFT OUTER JOIN dba_data_files f
    ON f.file_id = h.current_file#
   AND h.event like 'db file%'
WHERE h.sql_id = '4ru0618dswz3y'
AND   h.sample_time >= sysdate-7
GROUP BY o.object_type, o.object_name, h.event, f.tablespace_name
ORDER BY ash_secs DESC
/
A full payroll calculation inserts over 3 million rows on this particular system. The calculation is run incrementally several times per week, during which old rows are deleted and newly recalculated rows inserted.  Looking at just this insert statement:
  • 30% of the time is spent on CPU operations; we cannot profile that time further with ASH.
  • 38% of the time is spent reading from the table and index, yet this is a simple INSERT … VALUES statement.
OBJECT_TYPE         OBJECT_NAME        TABLESPACE_NAME EVENT                      ASH_SECS
------------------- ------------------ --------------- ------------------------ ----------
                                                       CPU+CPU Wait                   1040
                                       UNDOTBS1        db file sequential read         900
INDEX SUBPARTITION  PS_GP_RSLT_ACUM    GP201408IDX     db file sequential read         750
TABLE SUBPARTITION  PS_GP_RSLT_ACUM    GP201408TAB     db file sequential read         550
                                                       gc current grant 2-way           70
                                                       cursor: pin S wait on X          60
                                                       db file sequential read          10
                                                       buffer exterminate               10
                                                       row cache lock                   10
                                                                                ----------
                                                                                      3400 
More time is spent reading the index than the table.  That is not a surprise.  When you insert a row into a table, you also insert it into the index. Rows in index leaf blocks are ordered by the key columns, and the new entry must go into the right place, so you have to read down the index from the root block, through the branch blocks, to find the correct leaf block for the new entry.
[Digression: Counter-intuitively index compression can improve DML performance. It does for this index.  The overhead of the compression processing can be outweighed by the savings in physical I/O.  It depends.]
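
For an ordinary non-partitioned index, key compression of the leading columns can be tested with a simple rebuild (my_index is a hypothetical name; a partitioned index, such as the one in this example, has to be recreated with the COMPRESS clause instead):

ALTER INDEX my_index REBUILD COMPRESS 2;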

Profile Physical I/O by Object 

I can twist this query around and profile DB time by object for 'db file%' events:
SELECT o.object_type, o.object_name, sum(10) ash_secs
FROM   dba_hist_active_sess_history h
,      dba_objects o
WHERE  o.object_id = h.current_obj#
AND    h.event LIKE 'db file%'
AND    h.sample_time > sysdate-7
GROUP  BY o.object_type, o.object_name
ORDER BY ash_Secs DESC
Now I can see upon which objects the most time is spent on physical I/O.
OBJECT_TYP OBJECT_NAME          ASH_SECS
---------- ------------------ ----------
TABLE      PS_ITEM                101130
INDEX      PS_WS_ITEM              98750
TABLE      PS_PROJ_RESOURCE        97410
TABLE      PS_BI_LINE              85040
INDEX      PSAPSAPMSGSUBCON        75070
TABLE      PS_BI_HDR               37230
TABLE      PS_RS_ASSIGNMENT        29460
INDEX      PS_PSAPMSGPUBHDR        23230
INDEX      PS_BI_ACCT_ENTRY        21490
TABLE      PS_VOUCHER              21330
TABLE      PS_VCHR_ACCTG_LINE      21250
TABLE      PS_BI_ACCT_ENTRY        18860
…
                              ----------
sum                              1382680
This is a worthwhile exercise; it shows the sources of physical I/O in an application.

However, if you want to find where an index is used, then this query will also identify SQL_IDs where the index is either used in the query or maintained by DML. If I am interested in looking for places where changing or deleting an index could have an impact then I am only interested in SQL query activity. ASH samples that relate to index maintenance are a false positive. Yet, I cannot simply eliminate ASH samples where the SQL_OPNAME is not SELECT because the index may be used in a query within the DML statement.

Another problem with this method is that it matches ASH data to objects by object ID. If someone has rebuilt an index, then its object number changes.

A different approach is required.

Index Use from SQL Plans Captured by AWR 

During an AWR snapshot, the top-n SQL statements by each SQL criterion in the AWR report (Elapsed Time, CPU Time, Parse Calls, Shareable Memory, Version Count) are captured (see dbms_workload_repository). The SQL plans are exposed by the view DBA_HIST_SQL_PLAN.

On PeopleSoft systems, I generally recommend decreasing the snapshot interval from the default of 60 minutes to 15. The main reason is that SQL gets aged out of the library cache very quickly in PeopleSoft systems. They generate lots of dynamic code, often with literal values rather than bind variables. Cursor sharing is not recommended for PeopleSoft, so different literal values result in different SQL_IDs. The dynamic code also results in different SQL_IDs even with cursor sharing. Therefore, increasing the snapshot frequency means that you will capture more SQL statements. This will increase the total volume of the AWR repository simply because there are more snapshots. However, the overall volume of ASH data captured does not change; it just gets copied to the repository earlier.
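
The snapshot interval is changed with the delivered dbms_workload_repository package; both the interval and retention parameters are specified in minutes.  For example:

EXECUTE dbms_workload_repository.modify_snapshot_settings(interval => 15);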

On DBA_HIST_SQL_PLAN the object ID, owner, type, and name are recorded, so I can find the plans which referenced a particular object. I am going to carry on with the example from a PeopleSoft Financials system and look at indexes on the PS_PROJ_RESOURCE table.

These are some of the indexes on PS_PROJ_RESOURCE. We have 4 indexes that all lead on PROCESS_INSTANCE. I suspect that not all are essential, but I need to work out what is using them, and which one I should retain.
                          Col
INDEX_NAME                Pos COLUMN_NAME          COLUMN_EXPRESSION
------------------ ---------- -------------------- ----------------------------------
…
PSJPROJ_RESOURCE            1 PROCESS_INSTANCE
                            2 BUSINESS_UNIT_GL
                            3 BUSINESS_UNIT
                            4 PROJECT_ID
                            5 ACTIVITY_ID
                            6 CUST_ID

PSLPROJ_RESOURCE            1 PROCESS_INSTANCE
                            2 EMPLID
                            3 EMPL_RCD
                            4 TRANS_DT

PSMPROJ_RESOURCE            1 PROCESS_INSTANCE
                            2 BUSINESS_UNIT
                            3 PROJECT_ID
                            4 ACTIVITY_ID
                            5 RESOURCE_ID

PSNPROJ_RESOURCE            1 PROCESS_INSTANCE
                            2 BUSINESS_UNIT
                            3 TIME_RPTG_CD
…
I find it easier to extract the ASH data to my own working storage table. For each index on PS_PROJ_RESOURCE, I am going to extract a distinct list of plan hash values. I will then extract all ASH data for those plans. Note that I have not joined on SQL_ID to DBA_HIST_SQL_PLAN. That is because different SQL_IDs can produce the same execution plan. The plan is equally valid for all SQL_IDs that produce it, not just the one where the SQL_ID also matches.
DROP TABLE my_ash purge
/
CREATE TABLE my_ash COMPRESS AS
WITH p AS (
  SELECT DISTINCT p.plan_hash_value, p.object#, p.object_owner, p.object_type, p.object_name
  FROM      dba_hist_sql_plan p
  WHERE     p.object_name like 'PS_PROJ_RESOURCE'
 AND p.object_type LIKE 'INDEX%'
  AND p.object_owner = 'SYSADM'
 )
SELECT p.object# object_id, p.object_owner, p.object_type, p.object_name
,      h.*
FROM   dba_hist_active_sess_history h
,      p
WHERE  h.sql_plan_hash_value = p.plan_hash_value
/ 
I am fortunate that PeopleSoft is a well-instrumented application. Module and Action are set to fairly sensible values that will tell me the whereabouts in the application to which the ASH sample relates. In the following query, I have omitted any ASH data generated by SQL*Plus, Toad, or SQL Developer, and also any generated by Oracle processes to prevent statistics collection jobs from being included.
Set pages 999 lines 150 trimspool on
break on object_name skip 1 
compute sum of ash_secs on object_name
column ash_secs heading 'ASH|Secs' format 9999999
column module format a20
column action format a32
column object_name format a18
column max_sample_time format a19 heading 'Last|Sample'
column sql_plans heading 'SQL|Plans' format 9999
column sql_execs heading 'SQL|Execs' format 99999
WITH h AS (
  SELECT   object_name
  ,  CASE WHEN h.module IS NULL THEN REGEXP_SUBSTR(h.program,'[^.@]+',1,1)
                 WHEN h.module LIKE 'PSAE.%' THEN REGEXP_SUBSTR(h.module,'[^.]+',1,2)
                         ELSE REGEXP_SUBSTR(h.program,'[^.@]+',1,1)
   END as module
  ,  CASE WHEN h.action LIKE 'PI=%' THEN NULL
        ELSE h.action
                    END as action
  , CAST(sample_time AS DATE) sample_time
  ,  sql_id, sql_plan_hash_value, sql_exec_id
  FROM    my_ash h
)
SELECT  object_name, module, action
, sum(10) ash_secs
, COUNT(DISTINCT sql_plan_hash_value) sql_plans
, COUNT(DISTINCT sql_id||sql_plan_hash_value||sql_exec_id) sql_execs
, MAX(sample_time) max_sample_time
FROM     h
WHERE NOT LOWER(module) IN('oracle','toad','sqlplus','sqlplusw')
AND       NOT LOWER(module) LIKE 'sql%'
GROUP BY object_name, module, action
ORDER BY SUBSTR(object_name,4), object_name, ash_Secs desc
/
Spool off
I now have a profile of how much each index is used. In this particular case, I found something using every index.  It is possible that you will not find anything that uses some indexes.
                                                                             ASH   SQL    SQL Last
OBJECT_NAME        MODULE               ACTION                              Secs Plans  Execs Sample
------------------ -------------------- -------------------------------- ------- ----- ------ -------------------
…
PSJPROJ_RESOURCE   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step24.S        7300     1     66 06:32:57 27/08/2014
                   PC_PRICING           GFCPBINT_AE.CallmeA.Step24.S          40     1      2 08:38:57 22/08/2014
******************                                                       -------
sum                                                                         7340

PSLPROJ_RESOURCE   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step28.S        1220     1     53 06:33:17 27/08/2014
******************                                                       -------
sum                                                                         1220

PSMPROJ_RESOURCE   PC_TL_TO_PC          GFCPBINT_AE.XxBiEDM.Step07.S          60     2      6 18:35:18 20/08/2014
******************                                                       -------
sum                                                                           60

PSNPROJ_RESOURCE   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step26.S        6720     1     49 18:53:58 26/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step30.S        3460     1     60 06:33:27 27/08/2014
                   GFCOA_CMSN           GFCOA_CMSN.01INIT.Step01.S          2660     1     47 19:19:40 26/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step06.S        1800     1     52 18:53:28 26/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeG.Step01.S        1740     1     61 06:34:17 27/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step02.S        1680     1     24 18:53:18 26/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step10.S        1460     1     33 17:26:26 22/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step08.S         920     1     26 17:26:16 22/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step36.S         460     1     18 18:26:38 20/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step09.S         420     1     16 06:33:07 27/08/2014
                   PC_PRICING           GFCPBINT_AE.CallmeG.Step01.S         200     1     10 08:09:55 22/08/2014
                   PC_AP_TO_PC          GFCPBINT_AE.CallmeH.Step00A.S        170     1     17 21:53:26 21/08/2014
                   PC_PRICING           GFCPBINT_AE.CallmeA.Step36.S          20     1      1 08:02:46 05/08/2014
                   PC_PRICING           GFCPBINT_AE.CallmeA.Step30.S          20     1      1 13:42:48 04/08/2014
                   PC_PRICING           GFCPBINT_AE.CallmeA.Step06.S          20     1      1 15:58:35 28/07/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Pseudo.S          20     1      1 19:45:11 06/08/2014
******************                                                       -------
sum                                                                        21770
…
The next stage is to look at individual SQL statements. This query looks for the SQL statements that use a particular index on PS_PROJ_RESOURCE. If the statement that consumed the most time was not captured, it just picks another statement with the same plan.
  • I have found that sometimes a plan is captured by AWR, but the SQL statement is not. Personally, I think that is a bug. Working around it has made the following query quite complicated.
Break on object_name skip 1 
column ash_secs heading 'ASH|Secs' format 9999999
Set long 50000
Column cmd Format a200
Spool dmk

WITH h AS (
  SELECT  h.object_name
  ,       CASE WHEN h.module IS NULL THEN REGEXP_SUBSTR(h.program,'[^.@]+',1,1)
               WHEN h.module LIKE 'PSAE.%' THEN REGEXP_SUBSTR(h.module,'[^.]+',1,2)
               ELSE REGEXP_SUBSTR(h.program,'[^.@]+',1,1)
          END as module
  ,       CASE WHEN h.action LIKE 'PI=%' THEN NULL
               ELSE h.action
          END as action
  ,       h.sql_id, h.sql_plan_hash_value
  ,       t.command_type --not null if plan and statement captured
  FROM    my_ash h
  LEFT OUTER JOIN (
    SELECT t1.*
    FROM dba_hist_sqltext t1
    ,    dba_hist_sql_plan p1
   WHERE t1.sql_id = p1.sql_id
    AND p1.id = 1
    ) t
  ON   t.sql_id = h.sql_id
   AND  t.dbid = h.dbid
  WHERE   h.object_name IN('PSMPROJ_RESOURCE')
  AND     h.object_Type = 'INDEX'
  AND     h.object_owner = 'SYSADM'
  And     NOT LOWER(module) IN('oracle','toad','sqlplus','sqlplusw')
  AND     NOT LOWER(module) LIKE 'sql%'
), x AS ( --aggregate DB time by object and statement
SELECT    object_name, sql_id, sql_plan_hash_value
, sum(10) ash_secs
,  10*COUNT(command_type) sql_secs  --DB time for captured statements only
FROM      h
WHERE NOT LOWER(module) IN('oracle','toad','sqlplus','sqlplusw')
AND       NOT LOWER(module) LIKE 'sql%'
GROUP BY  object_name, sql_id, sql_plan_hash_value
), y AS ( --rank DB time per object and plan
SELECT  object_name, sql_id, sql_plan_hash_value
,  ash_secs
, SUM(ash_secs) OVER (PARTITION BY object_name, sql_plan_hash_value) plan_ash_secs
, row_number() OVER (PARTITION BY object_name, sql_plan_hash_value ORDER BY sql_Secs DESC) ranking
FROM x
), z AS (
SELECT object_name
, CASE WHEN t.sql_text IS NOT NULL THEN y.sql_id
       ELSE (SELECT t1.sql_id
             FROM   dba_hist_sqltext t1
             ,      dba_hist_sql_plan p1
             WHERE  t1.sql_id = p1.sql_id
             AND    p1.plan_hash_value = y.sql_plan_hash_value
             AND    rownum = 1) --if still cannot find statement just pick any one
  END AS sql_id
, y.sql_plan_hash_value, y.plan_ash_secs
, CASE WHEN t.sql_text IS NOT NULL THEN t.sql_text 
       ELSE (SELECT t1.sql_Text
             FROM   dba_hist_sqltext t1
             ,      dba_hist_sql_plan p1
             WHERE  t1.sql_id = p1.sql_id
             AND    p1.plan_hash_value = y.sql_plan_hash_value
             AND    rownum = 1) --if still cannot find statement just pick any one
  END AS sql_text
from y
 left outer join dba_hist_sqltext t
 on t.sql_id = y.sql_id
WHERE ranking = 1 --captured statement with most time
)
SELECT *
--'SELECT * FROM table(dbms_xplan.display_awr('''||sql_id||''','||sql_plan_hash_value||',NULL,''ADVANCED''))/*'||object_name||':'||plan_ash_Secs||'*/;' cmd
FROM z
ORDER BY object_name, plan_ash_secs DESC
/
Spool off
So now I can see the individual SQL statements.
PSJPROJ_RESOURCE   f02k23bqj0xc4          3393167302          7340 UPDATE PS_PROJ_RESOURCE C SET (C.Operating_Unit, C.CHARTFIELD1, C.PRODUCT, C.CLA
                                                                   SS_FLD, C.CHARTFIELD2, C.VENDOR_ID, C.contract_num, C.contract_line_num, …

PSLPROJ_RESOURCE   2fz0gcb2774y0           821236869          1220 UPDATE ps_proj_resource p SET p.deptid = NVL (( SELECT j.deptid FROM ps_job j WH
                                                                   ERE j.emplid = p.emplid AND j.empl_rcd = p.empl_rcd AND j.effdt = ( SELECT MAX (…

PSMPROJ_RESOURCE   96cdkb7jyq863           338292674            50 UPDATE PS_GFCBI_EDM_TA04 a SET a.GFCni_amount = ( SELECT x.resource_amount FROM
                                                                   PS_PROJ_RESOURCE x WHERE x.process_instance = …

                   1kq9rfy8sb8d4          4135884683            10 UPDATE PS_GFCBI_EDM_TA04 a SET a.GFCni_amount = ( SELECT x.resource_amount FROM
                                                                   PS_PROJ_RESOURCE x WHERE x.process_instance = …

PSNPROJ_RESOURCE   ga2x2u4jw9p0x          2282068749          6760 UPDATE PS_PROJ_RESOURCE P SET (P.RESOURCE_TYPE, P.RESOURCE_SUB_CAT) = …

                   9z5qsq6wrr7zp          3665912247          3500 UPDATE PS_PROJ_RESOURCE P SET P.TIME_SHEET_ID = …
If I replace the last select clause with the commented line, then I can generate the commands to extract the statement and plan from the AWR repository.
SELECT * FROM table(dbms_xplan.display_awr('45ggt0yfrh5qp',3393167302,NULL,'ADVANCED'))/*PSJPROJ_RESOURCE:7340*/;

SELECT * FROM table(dbms_xplan.display_awr('8ntxj3694r6kg',821236869,NULL,'ADVANCED'))/*PSLPROJ_RESOURCE:1220*/;

SELECT * FROM table(dbms_xplan.display_awr('96cdkb7jyq863',338292674,NULL,'ADVANCED'))/*PSMPROJ_RESOURCE:50*/;

SELECT * FROM table(dbms_xplan.display_awr('1kq9rfy8sb8d4',4135884683,NULL,'ADVANCED'))/*PSMPROJ_RESOURCE:10*/;

SELECT * FROM table(dbms_xplan.display_awr('ga2x2u4jw9p0x',2282068749,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:6760*/;
SELECT * FROM table(dbms_xplan.display_awr('9z5qsq6wrr7zp',3665912247,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:3500*/;
SELECT * FROM table(dbms_xplan.display_awr('b28btd6k3x8jt',1288409804,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:3060*/;
SELECT * FROM table(dbms_xplan.display_awr('avs70c19khxmw',2276811880,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:2660*/;
SELECT * FROM table(dbms_xplan.display_awr('b78qhsch85g4a',1019599680,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:1960*/;
SELECT * FROM table(dbms_xplan.display_awr('65kq2v1ubybps',3138703971,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:1820*/;
SELECT * FROM table(dbms_xplan.display_awr('1dj17ra70c801',1175874548,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:1460*/;
SELECT * FROM table(dbms_xplan.display_awr('3w71v896s7m5d',3207074729,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:500*/;
SELECT * FROM table(dbms_xplan.display_awr('35mz5bw7p5ubw',2447377432,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:170*/;
Ultimately, I have needed to look through the SQL plans that use an index to decide whether I need to keep that index or decide whether the statement would perform adequately using another index. In this case, on this particular system, I think the index PSMPROJ_RESOURCE would be adequate for this statement, and I would consider dropping PSLPROJ_RESOURCE.
SELECT * FROM table(dbms_xplan.display_awr('8ntxj3694r6kg',821236869,NULL,'ADVANCED'))/*PSLPROJ_RESOURCE:1220*/;
--------------------
UPDATE ps_proj_resource p SET p.deptid = NVL (( SELECT j.deptid FROM
ps_job j WHERE j.emplid = p.emplid AND j.empl_rcd = p.empl_rcd AND
j.effdt = ( SELECT MAX (j1.effdt) FROM ps_job j1 WHERE j1.emplid =
j.emplid AND j1.empl_rcd = j.empl_rcd AND j1.effdt <= p.trans_dt) AND
j.effseq = ( SELECT MAX (j2.effseq) FROM ps_job j2 WHERE j2.emplid =
j.emplid AND j2.empl_rcd = j.empl_rcd AND j2.effdt = j.effdt)),
p.deptid ) 
WHERE p.process_instance = … 
AND EXISTS ( SELECT
j.deptid FROM ps_job j WHERE j.emplid = p.emplid AND j.empl_rcd =
p.empl_rcd AND j.effdt = ( SELECT MAX (j1.effdt) FROM ps_job j1 WHERE
j1.emplid = j.emplid AND j1.empl_rcd = j.empl_rcd AND j1.effdt <=
p.trans_dt) AND j.effseq = ( SELECT MAX (j2.effseq) FROM ps_job j2
WHERE j2.emplid = j.emplid AND j2.empl_rcd = j.empl_rcd AND j2.effdt =
j.effdt))

Plan hash value: 821236869

-----------------------------------------------------------------------------------------
| Id  | Operation            | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT     |                  |       |       | 63104 (100)|          |
|   1 |  UPDATE              | PS_PROJ_RESOURCE |       |       |            |          |
|   2 |   INDEX RANGE SCAN   | PSLPROJ_RESOURCE |   365 | 11315 |    22   (0)| 00:00:01 |
|   3 |    INDEX RANGE SCAN  | PSAJOB           |     1 |    23 |     3   (0)| 00:00:01 |
|   4 |     SORT AGGREGATE   |                  |     1 |    20 |            |          |
|   5 |      INDEX RANGE SCAN| PSAJOB           |     1 |    20 |     3   (0)| 00:00:01 |
|   6 |     SORT AGGREGATE   |                  |     1 |    23 |            |          |
|   7 |      INDEX RANGE SCAN| PSAJOB           |     1 |    23 |     3   (0)| 00:00:01 |
|   8 |   INDEX RANGE SCAN   | PSAJOB           |     1 |    29 |     3   (0)| 00:00:01 |
|   9 |    SORT AGGREGATE    |                  |     1 |    20 |            |          |
|  10 |     INDEX RANGE SCAN | PSAJOB           |     1 |    20 |     3   (0)| 00:00:01 |
|  11 |    SORT AGGREGATE    |                  |     1 |    23 |            |          |
|  12 |     INDEX RANGE SCAN | PSAJOB           |     1 |    23 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
…
I carried on with the examination of SQL statements and execution plans to determine whether each index is really needed or another index (or even no index at all) would do as well.  This decision also requires some background knowledge about the application. Eventually, I decided that I want to drop the J, L, and N indexes on PROJ_RESOURCE and just keep M. 

Limitations of Method

AWR does not capture all SQL statements, nor all SQL plans. First the SQL has to be in the library cache, and then it must be one of the top-n. A SQL statement that is efficient because it uses an appropriate index may not be captured, and so will not be detected by this approach. This might lead you to erroneously believe that the index could be dropped.
ASH data is purged after the AWR retention period, by default 8 days. If an index is only used by a process that has not run within the retention period, then it will not be detected by this approach. This is another reason to retain ASH and AWR in a repository for a longer period. I have heard 400 days suggested, so that you have ASH for a year and a month.
  • However, this also causes the SYSAUX tablespace to become very large, so I would suggest regularly moving the data to a separate database. I know one customer who has built a central AWR repository for all their production and test databases and automated regular transfer of data. That repository has been of immense diagnostic value.
[Update] This analysis will not detect index use in support of constraint validation (PeopleSoft doesn't use database referential integrity constraints).  As Mark Farnham points out below, that may be a reason for retaining a particular index.

Getting Rid of Indexes 

Obviously, any index changes need to be tested carefully in all the places that reference the index, but on the other hand, it is not viable to do a full regression test every time you want to change an index.
Therefore, if all the testing is successful and you decide to go ahead and drop the index in production, you might prefer to make it invisible first for a while before actually dropping it. It is likely that the indexes you choose to examine are large and will take time to rebuild. An invisible index will not be used by the optimizer, but it will continue to be maintained during DML. If there are any unfortunate consequences, you can immediately make the index visible without having to rebuild it.
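
Both operations are simple DDL and do not involve a rebuild.  For example, for one of the indexes discussed above:

ALTER INDEX sysadm.pslproj_resource INVISIBLE;
ALTER INDEX sysadm.pslproj_resource VISIBLE;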

Thursday, November 04, 2010

PeopleTools 8.50 uses DBMS_APPLICATION_INFO to Identify Database Sessions

I recently worked on a PeopleTools 8.50 system in production for the first time and was able to make use of the new Oracle specific instrumentation in PeopleTools.

PeopleTools now uses the DBMS_APPLICATION_INFO package to set module and action session attributes.  This data is then copied into the Active Session History (ASH).
  • Within the PIA, the application server sets module and action to the name of the current component and the page within that component.
  • For Integration Broker messages they are set to service and queue name.
  • For Application Engine processes the module is set to PSAE and the action is set to the program name.
The first two of these three changes are very welcome, but I think the instrumentation of Application Engine is inadequate. I have a better suggestion which can be implemented with a database trigger.

Enterprise Manager
In Oracle Enterprise Manager, I can see the top SQL statements and group them by Module and Action, so I can now see which components are consuming the most time.  I can also produce an AWR report for a specific component or page within a component.

OEM Screenshot of a PeopleTools 8.50 system.

I can query the ASH data to profile which pages consume the most time on the database.
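
The profile below came from a query along these lines (a sketch: the time range and filters are assumptions, and 10 seconds are counted per AWR ASH sample):

SELECT h.module, h.action, 10*COUNT(*) ash_secs
FROM   dba_hist_active_sess_history h
WHERE  h.sample_time >= SYSDATE-7
AND    h.module IS NOT NULL
GROUP BY h.module, h.action
ORDER BY ash_secs DESC
/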

MODULE                 ACTION                   ASH_SECS 
---------------------- ---------------------- ---------- 
RECV_PO                PO_PICK_ORDERS                240 
XXX_REQ_INQUIRY        xyzzy                         170 
XXX_REQ_WRKLST         XXX_REQ_WORKLIST              170 
VCHR_EXPRESS           VCHR_LINE_RECV_WRK            170 
XXX_FIN_WORKLIST       XXX_FIN_WORKLIST              160 
VCHR_EXPRESS           VCHR_EXPRESS1                 160 
PURCHASE_ORDER_EXP     PO_EXPRESS                    140 
XXX_HOME_PAGE          XXX_HOME_PAGE                 140 
RECV_PO                RECV_WPO                      130 
VCHR_EXPRESS           xyzzy                         120 
XXX_PUR_WORKLIST       XXX_PUR_WRKLST                120 
CDM_RPT                CDM_RPT_INDEX                 100 
…
----------
sum                                                  2820

(Updated 19.11.2010) For some components the action is set to ‘xyzzy’. This seems to be a default value set when the component is opened, but before any of the pages are processed.  Therefore, it refers to activity in the search dialogue, including the processing of:
  • look-ups to obtain values for search criteria
  • SQL issued during SearchSave PeopleCode to validate the search criteria
  • the query on the Component Search record
Batch Processes
Now, I want to look at what happens in batch processes. In previous versions of PeopleTools, the module was set to the same value as the program name, and action was left blank. This is not particularly helpful. In 8.50 module is set to PSAE for Application Engine processes, and action is set to the name of the program.

MODULE                       ACTION                      ASH_SECS
---------------------------- ------------------------- ----------
PSNVS.EXE                                                   10220
PSAE                         AP_PSTPYMNT                     3100
PSAE                         AP_MATCH                        2690
sqrw.exe                                                     1770
PSAE                         PO_RECVACCR                     1390
PSQRYSRV.exe                                                  880
PSAE                         FS_STREAMLN                      870
PSPUBDSP.exe                                                  850
PSBRKDSP.exe                                                  740
PSPRCSRV.exe                                                  690
PSSUBDSP.exe                                                  620

This is certainly better than in previous versions. However, its weakness is that if I have multiple concurrent instances of the same process, although I could tell that the ASH data had come from different sessions, I would not be able to determine which data came from which process. This situation could occur, for example, in Global Payroll if ‘streaming’ is configured, where the payroll calculation is broken into many processes that run concurrently.

In another blog posting, Using Oracle Enterprise Manager (Grid Control) with PeopleSoft, I proposed a trigger that writes the name of the PeopleSoft process to module, and the Process Instance number into action. Furthermore, that trigger works for all processes scheduled by the Process Scheduler.  I think it is still useful in PeopleTools 8.50.
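
In outline, the idea of that trigger is as follows.  This is a minimal sketch, not the full published trigger (the trigger name is mine, and it assumes RUNSTATUS '7' indicates that the request has started processing):

CREATE OR REPLACE TRIGGER sysadm.set_prcs_instrumentation
BEFORE UPDATE OF runstatus ON sysadm.psprcsrqst
FOR EACH ROW
WHEN (new.runstatus = '7' AND old.runstatus <> '7')
BEGIN
  /*process name into module, process instance number into action*/
  dbms_application_info.set_module(
    module_name => :new.prcsname
   ,action_name => 'PI='||:new.prcsinstance);
END;
/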