Monday, May 26, 2008

Performance Tuning the Performance Monitor Archive Process

I have been working with a PeopleSoft Performance Monitor (PPM) system that is collecting data for 7 monitored systems. Three of those systems are very active. We want to keep data for a rolling 7 days. We reached the point where the PPM archive process could not rows as fast as they were arriving! I have, of course, reduced the sample frequency of events and transactions, but the PPM is receiving between 20 and 30 million rows of history data per week.

It is also worth remarking here that since the application servers, web servers and Process Schedulers were moved to commodity Intel hardware there are now a larger number of these components, and PPM is receiving more event data.

I have taken a look inside the PPM archive process. PPM history data is removed by a PeopleCode program (PSPM_ARCHIVE.ARCHIVE.GBL.default.ARCPCODE.OnExecute) that is called from the Application Engine.

Rows of transaction date to be archived are identified by the following SQL (and there is a similar one for event data):

&TransHistSQL.Open("SELECT X.PM_INSTANCE_ID, X.PM_TRANS_DEFN_SET, X.PM_TRANS_DEFN_ID, X.PM_AGENTID, X.PM_TRANS_STATUS, X.OPRID, X.PM_PERF_TRACE, X.PM_CONTEXT_VALUE1, X.PM_CONTEXT_VALUE2, X.PM_CONTEXT_VALUE3, X.PM_CONTEXTID_1, X.PM_CONTEXTID_2, X.PM_CONTEXTID_3, X.PM_PROCESS_ID, %DateTimeOut(X.PM_AGENT_STRT_DTTM), %DateTimeOut(X.PM_MON_STRT_DTTM), X.PM_TRANS_DURATION, X.PM_PARENT_INST_ID, X.PM_TOP_INST_ID, X.PM_METRIC_VALUE1, X.PM_METRIC_VALUE2, X.PM_METRIC_VALUE3, X.PM_METRIC_VALUE4, X.PM_METRIC_VALUE5, X.PM_METRIC_VALUE6, X.PM_METRIC_VALUE7, X.PM_ADDTNL_DESCR, Z.PM_ARCHIVE_MODE FROM PSPMTRANSHIST X, PSPMAGENT Y, PSPMSYSDEFN Z WHERE X.PM_AGENTID=Y.PM_AGENTID AND Y.PM_SYSTEMID=Z.PM_SYSTEMID AND (Z.PM_ARCHIVE_MODE='1' OR Z.PM_ARCHIVE_MODE='2') AND %DateTimeDiff(X.PM_MON_STRT_DTTM, %CurrentDateTimeIn) >= (PM_MAX_HIST_AGE * 24 * 60)");

Lets look at the last condition in the WHERE clause (in bold). It uses the %DateTimeDiff PeopleCode macro to calculate the age of the row of data. This expands to.

… AND ROUND((CAST(( SYSDATE) AS DATE) - CAST((X.PM_MON_STRT_DTTM) AS DATE)) * 1440, 0) >= (PM_MAX_HIST_AGE * 24 * 60)

However, putting a function around a column prevents the database from using any index on that column. In this case it is not possible to create an function-based index to satisfy this expression because it contains SYSDATE.

The reaction of Oracle's Optimizer is to full scan the PSPMTRANSHIST table. With millions of rows of data in this table, a full scan is simply not going to perform adequately.

Instead, I would rather code something simple, but Oracle specific (and if I am going to do Oracle specific date arithmetic then there is no need to use %CurrentDateTimeIn).

… AND X.PM_MON_STRT_DTTM <= SYSDATE - PM_MAX_HIST_AGE


The PeopleCode can be adjusted to be sensitive to the database platform like this.


/*-- SELECT THE ROWS FROM PSPMTRANSHIST ELIGLIBLE FOR ARCHIVING */
If %DbType = "ORACLE" Then /*dmk 19.5.2008 oracle specific performance tuning*/
&TransHistSQL.Open("SELECT X.PM_INSTANCE_ID, X.PM_TRANS_DEFN_SET, X.PM_TRANS_DEFN_ID, X.PM_AGENTID, X.PM_TRANS_STATUS, X.OPRID, X.PM_PERF_TRACE, X.PM_CONTEXT_VALUE1, X.PM_CONTEXT_VALUE2, X.PM_CONTEXT_VALUE3, X.PM_CONTEXTID_1, X.PM_CONTEXTID_2, X.PM_CONTEXTID_3, X.PM_PROCESS_ID, %DateTimeOut(X.PM_AGENT_STRT_DTTM), %DateTimeOut(X.PM_MON_STRT_DTTM), X.PM_TRANS_DURATION, X.PM_PARENT_INST_ID, X.PM_TOP_INST_ID, X.PM_METRIC_VALUE1, X.PM_METRIC_VALUE2, X.PM_METRIC_VALUE3, X.PM_METRIC_VALUE4, X.PM_METRIC_VALUE5, X.PM_METRIC_VALUE6, X.PM_METRIC_VALUE7, X.PM_ADDTNL_DESCR, Z.PM_ARCHIVE_MODE FROM PSPMTRANSHIST X, PSPMAGENT Y, PSPMSYSDEFN Z WHERE X.PM_AGENTID=Y.PM_AGENTID AND Y.PM_SYSTEMID=Z.PM_SYSTEMID AND (Z.PM_ARCHIVE_MODE='1' OR Z.PM_ARCHIVE_MODE='2') AND X.PM_MON_STRT_DTTM <= SYSDATE - PM_MAX_HIST_AGE


and similarly for event data


/*-- SELECT THE ROWS FROM PSPMEVENTHIST ELIGLIBLE FOR ARCHIVING */
If %DbType = "ORACLE" Then
/*dmk 19.5.2008 - oracle specific performance tuning*/
&EventHistSQL.Open("SELECT X.PM_INSTANCE_ID, X.PM_EVENT_DEFN_SET, X.PM_EVENT_DEFN_ID, X.PM_AGENTID, %DateTimeOut(X.PM_AGENT_DTTM), %DateTimeOut(X.PM_MON_DTTM), X.PM_PROCESS_ID, X.PM_FILTER_LEVEL,X.PM_METRIC_VALUE1, X.PM_METRIC_VALUE2, X.PM_METRIC_VALUE3, X.PM_METRIC_VALUE4, X.PM_METRIC_VALUE5, X.PM_METRIC_VALUE6, X.PM_METRIC_VALUE7, X.PM_ADDTNL_DESCR, Z.PM_ARCHIVE_MODE FROM PSPMEVENTHIST X, PSPMAGENT Y, PSPMSYSDEFN Z WHERE X.PM_AGENTID=Y.PM_AGENTID AND Y.PM_SYSTEMID=Z.PM_SYSTEMID AND (Z.PM_ARCHIVE_MODE='1' OR Z.PM_ARCHIVE_MODE='2') AND X.PM_MON_DTTM <= SYSDATE - PM_MAX_HIST_AGE


The resulting query needs an index on PM_AGENTID and PM_MON_STRT_DTTM (on both PSPMTRANSHIST and PSPMEVENTHIST). There are delivered indexes that lead on the PM_AGENTID and have PM_MON_STRT_DTTM further down, but a dedicated index is beneficial.

2 comments :

Brian said...

Did this change to the code end up being successful for you?

Trying to speed up this job in DB2 and the query takes 120 seconds to purge 1000 rows.

David Kurtz said...

This change works well on Oracle. It removes the function from around an indexed column which permits the index to be used. I expect you would have to write it slightly differently for DB2, but the concept should be the same.