Sunday, October 11, 2009

Aggregating & Purging Batch Timings

Application Engine can collect timing information for the programs it executes. These 'batch timings' can be written to a log file and/or to tables in the database. I always recommend that this is enabled in all environments. The runtime overhead is very low, and this data is extremely valuable for determining the performance of a system over a period of time, and for identifying the pieces of SQL or PeopleCode that account for the most time. The timing data collected for individual processes can be viewed directly within the Process Monitor component.

The Process Scheduler purge process does not delete batch timings, so this data remains in the database indefinitely, although it can no longer be accessed via the Process Monitor. Over time, on a busy system, a large volume of data can accumulate. In some ways this is a good thing: there are good reasons to purge the Process Scheduler as aggressively as the business will permit, and the batch timings can still be analysed afterwards by direct SQL query. However, the sheer volume of data is likely to result in queries that take quite a while to execute. After a while, you are less likely to be interested in the performance of individual processes, and more likely to want to aggregate the data. So, it makes sense to hold the data at least partly aggregated.

I have produced a very simple Application Engine program (GFC_TIM_ARCH) to address this problem. This program is available for download from the Go-Faster website as a PeopleTools Application Designer Project. Please note that this process has been written using Oracle RDBMS specific SQL syntax.

  • Application Engine stores batch timing data in three tables: PS_BAT_TIMINGS_LOG, PS_BAT_TIMINGS_FN and PS_BAT_TIMINGS_DTL.
  • I have created three new tables: PS_GFC_TIMINGS_LOG, PS_GFC_TIMINGS_FN and PS_GFC_TIMINGS_DTL.
  • GFC_TIM_ARCH aggregates the data in each of these tables by the day on which the process began, by the process name, and (where applicable) by the 'detail_id' column. The aggregated data is put into the GFC_TIMINGS% tables, and the original data is removed from the BAT_TIMINGS% tables.
  • It finds the earliest three days for which timing data exists that is older than the longest Process Scheduler retention limit. The idea is that the process should be run daily (the delivered 'Daily Purge' recurrence is suitable), but if it doesn't run for some reason it will catch up the next day.
  • Only one instance of GFC_TIM_ARCH is permitted to run at any one time.
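
The aggregate-then-purge logic described above can be sketched roughly as follows. This is only an illustration, not the GFC_TIM_ARCH program itself: it uses Python with an in-memory SQLite database rather than Oracle, and the table layout is a deliberately simplified, hypothetical one (a single detail table carrying its own begin_day column, with only two measures). The function name archive_oldest_days and the retention_days parameter are likewise assumptions made for the example.

```python
import sqlite3
from datetime import date, timedelta


def create_tables(conn):
    """Create simplified, hypothetical stand-ins for the detail tables."""
    conn.execute(
        "CREATE TABLE bat_timings_dtl (process_instance INTEGER, begin_day TEXT, "
        "process_name TEXT, detail_id TEXT, execute_count INTEGER, execute_time REAL)"
    )
    conn.execute(
        "CREATE TABLE gfc_timings_dtl (begin_day TEXT, process_name TEXT, "
        "detail_id TEXT, num_rows INTEGER, execute_count INTEGER, execute_time REAL)"
    )


def archive_oldest_days(conn, today, retention_days, max_days=3):
    """Aggregate and purge up to max_days of the oldest eligible timing data.

    Only days older than the retention limit are eligible. Returns the list
    of days (ISO date strings) that were processed in this run.
    """
    cutoff = (today - timedelta(days=retention_days)).isoformat()
    days = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT begin_day FROM bat_timings_dtl "
            "WHERE begin_day < ? ORDER BY begin_day LIMIT ?",
            (cutoff, max_days),
        )
    ]
    for day in days:
        # One transaction per day, so the insert and the delete
        # either both succeed or both roll back.
        with conn:
            conn.execute(
                "INSERT INTO gfc_timings_dtl "
                "(begin_day, process_name, detail_id, num_rows, execute_count, execute_time) "
                "SELECT begin_day, process_name, detail_id, "
                "COUNT(*), SUM(execute_count), SUM(execute_time) "
                "FROM bat_timings_dtl WHERE begin_day = ? "
                "GROUP BY begin_day, process_name, detail_id",
                (day,),
            )
            conn.execute("DELETE FROM bat_timings_dtl WHERE begin_day = ?", (day,))
    return days
```

Calling archive_oldest_days once a day processes at most three days per run, so a missed run is made up on the next one. Calling it repeatedly until it returns an empty list corresponds to clearing a backlog.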

On one system, where I tested this process, BAT_TIMINGS_DTL was growing by over 1 million rows per day. This became around 5000 rows per day in GFC_TIMINGS_DTL.

If you have been running with batch timings for a while, then when you first introduce this process you will probably have a large backlog of data to be aggregated and purged. The easiest option is to run this process repeatedly until the data has been processed (possibly using the recurrence that causes a program to run every minute). After the backlog has been cleared, the BAT_TIMINGS% tables should be rebuilt or shrunk in order to release the space left in the tables by the deleted rows. This will help queries that scan the BAT_TIMINGS_DTL record. Otherwise, these scans still have to read the emptied space, because it is below the table's High Water Mark.
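
Because each run handles at most three days of data, it is easy to estimate how many runs a backlog will need. The small sketch below does that arithmetic. The function name is made up for the example, and the estimate ignores the few extra days that become eligible while the backlog is being cleared.

```python
import math


def runs_to_clear_backlog(backlog_days, days_per_run=3):
    """Number of runs needed when each run processes at most days_per_run days."""
    return math.ceil(backlog_days / days_per_run)
```

For a hypothetical backlog of 400 days, runs_to_clear_backlog(400) gives 134 runs.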

Once the backlog has been cleared, the GFC_TIM_ARCH process can run daily and, notwithstanding variations in the load on the system, the rolling volume of data retained in the BAT_TIMINGS% tables should be fairly constant, so it should not be necessary to rebuild them frequently. Space freed by the daily delete should be reused by new rows that are inserted into the tables as AE processes run.
