The PeopleSoft DBA Blog: Management of Long Running PS/Queries Cancelled by Resource Manager CPU Limit

I have written previously about using the Oracle database resource manager to prioritise the allocation of CPU to different processes in a PeopleSoft system. I proposed a sample resource plan that can be used as a starting point to build a resource plan that meets a system's specific objectives and requirements.

see PSFT_PLAN: A Sample Oracle Database Resource Manager Plan for PeopleSoft
The script to implement the sample PSFT_PLAN resource plan is available from GitHub.

This post looks at

How to configure the resource manager to cancel long-running queries,
What happens when it does and what PeopleSoft users experience,
How the system administrators can monitor such queries,
What action could they take?

Configuring SQL Cancellation in a Resource Manager Consumer Group

The sample resource plan, PSFT_PLAN, contains various server consumer groups. It relies upon MODULE and ACTION being set by enabling PeopleSoft instrumentation (EnableAEMonitoring=1) and/or the psftapi_store_prcsinstance trigger on PSPRCSRQST (see Effective PeopleSoft Performance Monitoring),

PSQUERY_BATCH_GROUP

Applies to scheduled PSQUERY Application Engine Programs
Limited to 4 hours on CPU (or estimated at >= 4 hours)

PSQUERY_ONLINE

Applies to queries run online via the PeopleSoft Internet Architecture (PIA).
There is no resource manager limit for this consumer group.

The PIA has a session timeout (default 20 minutes).
The ICQuery Tuxedo service that runs the queries also has a timeout (default 20 minutes)

When the resource manager cancels a SQL call, it simply raises an Oracle error that appears in a PIA message box without further explanation. It is better to let the PIA timeouts handle online queries in a more controlled fashion.
To prevent queries continuing to run on the database after the client has been terminated

Enable Terminated Connection Timeout by setting SQLNET.EXPIRE_TIME to 5 to 10 minutes in SQLNET.ORA.
Set timeout CPU timeout to match PIA/ICQuery timeout. However, if on Exadata only enable SQL Quarantine for scheduled PS/Queries (see A Resource Manager CPU Time Limit for PS/Queries Executed On-line in the PIA)

LOW_LIMITED_GROUP

Applies to SQL*Plus, SQL Developer and Toad.

Limited to 2 hours on CPU (or estimated at >= 2 hours)

Recommendations:

Users should generally be encouraged to schedule queries that will take more than a few minutes to run on the process scheduler.
Resist the temptation to increase either the PIA inactivity timeout or ICQuery service timeout from the delivered setting of 20 minutes.

Plan Directives

Plan directives are created with DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE.

  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    'PSFT_PLAN', 'PSQUERY_BATCH_GROUP'
    ,mgmt_p6 => 1
    ,switch_group => 'CANCEL_SQL'
    ,switch_time => 14400
    ,switch_estimate => TRUE 
    ,switch_for_call => TRUE
  );

  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    'PSFT_PLAN', 'PSQUERY_ONLINE_GROUP'
    ,mgmt_p6 => 90
  );

  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    'PSFT_PLAN', 'LOW_LIMITED_GROUP'
    ,mgmt_p8 => 1
    ,switch_group => 'CANCEL_SQL'
    ,switch_time => 7200
    ,switch_elapsed_time => 7200
    ,switch_estimate => TRUE 
    ,switch_for_call => TRUE
  );

Four parameters control cancellation behaviour.

SWITCH_GROUP specifies the consumer group to which the session is switched when a switch condition is met. If the group switches to CANCEL_SQL the current call is cancelled, raising error ORA-00400.
SWITCH_TIME specified the number of seconds on CPU (not elapsed time).
If SWITCH_ESTIMATE is true, the resource manager also switches group if the estimated run time is greater than the switch time
SWITCH_FOR_CALL is set to true so that if the consumer group is switched, it is then restored to the original consumer group at the end of the top call. Thus a persistent session is not permanently switched. This is important if switching an application engine server (PSAESRV) session.

Cancellation Behaviour

The resource manager can cancel long-running queries in these consumer groups raising ORA-00040: active time limit exceeded - call aborted

The query may be cancelled immediately because the estimated execution time is greater than the limit.
Otherwise, it is quite likely to run for an elapsed time that is greater than the CPU time limit. The database session is only on CPU if the event reported on the session is NULL. Otherwise, it is doing something else.
Some time will be consumed in the client process, during which the database session will be idle waiting for the next fetch request from the client. This is usually reported as event SQL*Net message from client.
Some of the database time may not be on CPU because it may be doing something else, such as physical IO.
The database session may be held back by the resource manager allocating CPU to higher priority processes, in which case again the session will not be on CPU, and will report being on event resmgr: cpu quantum.

Querying the PeopleSoft Message Log

The full message text is stored in multiple pieces in PS_MESSAGE_LOGPARM and must be reconstructed so that it can be searched for the error code. I demonstrated this technique in another blog post: Querying the PeopleSoft Message Log with SQL.

For this analysis, I have made some alterations to the message log query (see message_log_checker-psquery.sql).

This query is restricted to messages generated by PSQUERY processes that did not run to success (not run status 9).
PeopleSoft messages are typically defined with up to 9 substitution variables, but long SQL statements can have many more entries in PS_MESSAGE_LOGPARM. So the PL/SQL function in this query simply appends any additional log parameter rows beyond the 9 substitution variables to the end of the generated string.
Once the message has been generated we can look for one of the error messages associated with the resource manager terminating a query:

ORA-00040: active time limit exceeded - call aborted
ORA-56955: quarantined plan used

It is necessary to filter by message number because even in PS/Query users can write invalid SQL that produces other error messages. However, all this text processing for each row retrieved makes the query quite slow.

Here is an example output. User USR001 ran a private query MY_TEST2 with run control 42. It ran for 20772s (5h 46m) until terminated by the resource manager. As explained above, the 4-hour limit is on CPU time that will be less than the elapsed time.

                                                         Public/
                                                          Private                                Ru   Exec                              Msg Msg  Msg
     P.I. DBNAME  OPRID    RUNCNTLID                      Query   QRYNAME                        St   Secs DTTM_STAMP_SEC               Seq Set  Nbr
--------- ------- -------- ------------------------------ ------- ------------------------------ -- ------ ---------------------------- --- --- ----
MSG
---------------------------------------------------------------------------------------------------------------------------------------------------- 
 12395311 FSPROD  USR001   42                             Private MY_TEST2                       10  20772 10-FEB-25 04.41.47.384694 PM   1  65   30
File: C:\PT860P13B_2403250500-retail\peopletools\src\pssys\qpm.cppSQL error. Stmt #: 8526  Error Position: 189  Return: 40 - ORA-00040: active time
 limit exceeded - call abortedFailed SQL stmt: SELECT A.LEDGER, A.FISCAL_YEAR, A.BUSINESS_UNIT, …
                                                                                                           10-FEB-25 04.41.47.421800 PM   2  50  380
Error in running query because of SQL Error, Code=40, Message=ORA-00040: active time limit exceeded - call aborted
…

From Oracle 19c on Exadata, timed-out statements are automatically quarantined. If a quarantined statement is run and a quarantined execution plan is generated, then error ORA-56955 is generated immediately. Therefore, it can also be detected in the logs. The query searches for both messages.

                                                          Public/
                                                          Private                                Ru   Exec                              Msg Msg  Msg
     P.I. DBNAME  OPRID    RUNCNTLID                      Query   QRYNAME                        St   Secs DTTM_STAMP_SEC               Seq Set  Nbr
--------- ------- -------- ------------------------------ ------- ------------------------------ -- ------ ---------------------------- --- --- ----
MSG
----------------------------------------------------------------------------------------------------------------------------------------------------
 12319513 FSPROD  USR002   Transactions                   Public  GBR_JRNL_LINE_DTL_ACCT         10     25 13-FEB-25 11.13.35.746644 PM   1  65   30
File: C:\PT860P13B_2403250500-retail\peopletools\src\pssys\qpm.cppSQL error. Stmt #: 8526  Error Position: 2783  Return: -8581 - ORA-56955: quarant
ined plan usedFailed SQL stmt: SELECT A.FISCAL_YEAR, A.ACCOUNTING_PERIOD, A.JRNL_HDR_STATUS, B.LED
…
                                                                                                           13-FEB-25 11.13.35.814112 PM   2  50  380
Error in running query because of SQL Error, Code=-8581, Message=ORA-56955: quarantined plan used

I discuss automatic SQL quarantine for PS/Query in a subsequent blog.

Opinion

So far, I have explained how to set a maximum CPU time limit for PS/Queries in a resource manager consumer group and how to detect cancelled PS/Queries by examining the message log.

The final stage is to close the feedback loop and go back to the users producing the queries, find out what they are trying to do, and why the queries are running for such a long time.

The PeopleSoft DBA Blog

Thursday, February 20, 2025

Management of Long Running PS/Queries Cancelled by Resource Manager CPU Limit

Configuring SQL Cancellation in a Resource Manager Consumer Group

Plan Directives

Cancellation Behaviour

Querying the PeopleSoft Message Log

Opinion

No comments :

Announcements

My Book

Oracle ACE Director

BlueSky/Twitter

Featured post

nVision Performance Tuning: Table of Contents

Blog Archive

Popular Posts

PeopleSoft Administrator Podcasts

Tags

Other PeopleSoft Blogs

Other Oracle Blogs

OakTable Network

©David Kurtz

The PeopleSoft DBA Blog

Thursday, February 20, 2025

Management of Long Running PS/Queries Cancelled by Resource Manager CPU Limit

Configuring SQL Cancellation in a Resource Manager Consumer Group

Plan Directives

Cancellation Behaviour

Querying the PeopleSoft Message Log

Opinion

No comments :

Announcements

My Book

Oracle ACE Director

BlueSky/Twitter

Featured post

nVision Performance Tuning: Table of Contents

Blog Archive

Popular Posts

PeopleSoft Administrator Podcasts

Tags

Subscribe

Other PeopleSoft Blogs

Other Oracle Blogs

OakTable Network

©David Kurtz