Showing posts with label PS/Query. Show all posts
Showing posts with label PS/Query. Show all posts

Tuesday, June 24, 2025

Optimising Journal Line Queries: 2. Exadata System Statistics

This is the second of a series of five articles that examine the challenges posed by typical queries on the journal line table (PS_JRNL_LN).

  1. Problem Statement
  2. Exadata System Statistics
  3. Partitioning
  4. Compression
  5. Conclusion

Exadata System Statistics 

Many other people have written notes about how Oracle's optimizer costs a full table scan.  This is a selection:

Roughly speaking, the cost calculated by the optimizer that we see in an execution plan is an estimate of the time taken to perform an operation, where the unit of time is the duration of a single block read.  Although that statement is an oversimplification.  There are various guesses and assumptions built into the optimizer's calculation.  The cost-based optimizer looks for the cheapest plan, that ought to be the fastest to execute.  However, in many cases, cost does not correspond to execution time.

Full Scan Cost

The cost of a full table scan is made up of an I/O cost (the time taken to read the blocks from disk) and a CPU cost (the time taken to process the rows).  The I/O cost is the number of multi-block read operations, multiplied by the ratio of the duration of a multi-block read to a single-block read.

  • IO Cost = (HWM / MBRC) . (MREADTIM / SREADTIM)

Where

  • HWM = the high water mark of the segment expressed as a number of blocks
  • MBRC = average multi-block read count for sequential read, in blocks (see parameter DB_FILE_MULTIBLOCK_READ_COUNT).
  • MREADTIME = average time to perform a multi-block read at once (sequential read), in milliseconds
  • SREADTIME = average time to read a single block (random read), in milliseconds

See PL/SQL Packages and Types Reference: DBMS_STATS

The single and multi-block read times are derived from two system statistics, the block size and the multi-block read count.

  • SREADTIM = IOSEEKTIM + DB_BLOCK_SIZE / IOTFRSPEED
  • MREADTIM = IOSEEKTIM + (DB_BLOCK_SIZE * MBRC) / IOTFRSPEED

Where

  • IOSEEKTIM = Seek time + latency time + operating system overhead time, in milliseconds (default 10ms).
  • IOTFRSPEED = I/O transfer speed in bytes per millisecond (or if you prefer KBytes/second)
  • DB_BLOCK_SIZE = block size of the segment (usually 8Kb)

System statistics can be gathered based on actual system behaviour using DBMS_STATS, or set to pre-defined values using DBMS_STATS.GATHER_SYSTEM_STATS.  Over the years many blogs, forums and presentations have discussed the merits or otherwise of collecting or setting system statistics.  

Oracle's position is set out in the Oracle Optimizer Blog: Should You Gather System Statistics?  It can be summarised as: 

  • Do not gather your own system statistics.
  • Use the Oracle-provided defaults.
  • Except on Exadata, where you can consider using the Exadata defaults, and perhaps not even then on a mixed workload. You will have to test this for yourself.

On any Oracle system, the default system statistics can be reset with 

exec DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD');

This sets the system statistics as follows:

  • MBRC=8
  • IOSEEKTIM=10
  • IOTFRSPEED=10

Thus:

SREADTIM = IOSEEKTIM + DB_BLOCK_SIZE / IOTFRSPEED
         = 10 + 8192 / 4096
         = 12 (ms)

MREADTIM = IOSEEKTIM + (DB_BLOCK_SIZE * MBRC) / IOTFRSPEED
         = 10 + (8192 * 8 ) / 4096
         = 10 + 16 
         = 26 (ms)

However, on Exadata, you can set 'system statistics take into account the unique capabilities of Oracle Exadata, such as large I/O size and high I/O throughput

exec DBMS_STATS.GATHER_SYSTEM_STATS('EXADATA');

Some system statistics are then set differently:

  • MBRC=128
  • IOSEEKTIM=10
  • IOTFRSPEED=204800

Thus

SREADTIM = IOSEEKTIM + DB_BLOCK_SIZE / IOTFRSPEED
         = 10 + 8192 / 204800
         = 10.04 (ms)

MREADTIM = IOSEEKTIM + (DB_BLOCK_SIZE * MBRC) / IOTFRSPEED
         = 10 + (8192 . 128 ) / 204800
         = 10 + 5.1200 
         = 15.12000 (ms)

Now, I can put these numbers back into the formula Oracle uses to calculate the I/O cost of a full scan.

  • IO Cost = (HWM / MBRC) . (MREADTIM / SREADTIM)

Let us suppose that we are going to read 100M blocks.  The I/O cost of that scan will be very different with Exadata system statistics rather than the normal default system statistics.

Normal IO Cost  = (100000000/8) . (26/12)
                = 27,083,333.3
 
Exadata IO Cost = (100000000/128) . (15.12/10.04)
                = 1,176,543.8

Thus, introducing Exadata system statistics significantly reduces the cost of the full scan, making the database more likely to use a full scan than index lookups.  That may or may not be desirable.

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                |   428K|    93M|       |  2834K  (8)| 00:01:51 |       |       |
|   1 |  HASH GROUP BY                 |                |   428K|    93M|   111M|  2834K  (8)| 00:01:51 |       |       |
|*  2 |   HASH JOIN                    |                |   428K|    93M|    46M|  2819K  (8)| 00:01:51 |       |       |
|   3 |    JOIN FILTER CREATE          | :BF0001        |   428K|    41M|       |  1476   (7)| 00:00:01 |       |       |
|   4 |     PART JOIN FILTER CREATE    | :BF0000        |   428K|    41M|       |  1476   (7)| 00:00:01 |       |       |
|*  5 |      TABLE ACCESS STORAGE FULL | PS_JRNL_HEADER |   428K|    41M|       |  1476   (7)| 00:00:01 |       |       |
|   6 |    JOIN FILTER USE             | :BF0001        |  1120K|   136M|       |  2817K  (8)| 00:01:51 |       |       |
|   7 |     PARTITION RANGE JOIN-FILTER|                |  1120K|   136M|       |  2817K  (8)| 00:01:51 |:BF0000|:BF0000|
|*  8 |      TABLE ACCESS STORAGE FULL | PS_JRNL_LN     |  1120K|   136M|       |  2817K  (8)| 00:01:51 |:BF0000|:BF0000|
-------------------------------------------------------------------------------------------------------------------------

If I look at the same example query that I used earlier, then with the Exadata default system statistics, the cost has come down significantly (from 66M to 2817K).  It is a significant improvement, but it is still greater than the cost of the nested loop (1730K).  Therefore, for this query, I still only get this execution plan if I hint the statement to force it.  I still need to make the full scan cheaper.

Different queries will have different costs and will flip between the nested loop and Full scan/Bloom filter/hash join at different points.

Non-Exadata System Statistics 

If you are not running on Exadata, then the advice from Oracle is clear and unambiguous: use the default system statistics that can be reset with 
exec DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD');
You will have to consider other techniques to reduce the cost of the full table scan.

Monday, June 23, 2025

Optimising Journal Line Queries: 1. Problem Statement

In each PeopleSoft product, certain tables usually grow to become the largest in the implementations at most customers. The challenges they present and the options for dealing with them are also common to most systems.  Most PeopleSoft Financials systems use General Ledger.  In General Ledger, the ledger, summary ledger and journal line tables are usually the largest tables, and present the biggest challenges.

This is the first of five articles that examine the challenges typically presented by queries on the journal line table (PS_JRNL_LN).

  1. Problem Statement
  2. Exadata System Statistics
  3. Partitioning
  4. Compression
  5. Conclusion

Problem Statement

In General Ledger, we typically see many queries on the ledger (or summary ledger) tables and also queries in the application, drill-down queries in nVision reporting, and ad-hoc PS/Queries that query details of journals posted to the ledger. Below is part of a typical query. The statement and execution plans below were taken from a PeopleSoft Financials system.  It is running on Oracle 19c on Exadata.  Making use of Exadata features will also be a topic.
SELECT A.FISCAL_YEAR, A.ACCOUNTING_PERIOD, B.BUSINESS_UNIT, B.JOURNAL_ID, TO_CHAR(B.JOURNAL_DATE,'YYYY-MM-DD'), B.LEDGER, B.ACCOUNT
, B.PRODUCT, B.PROJECT_ID, B.CHARTFIELD1, B.CHARTFIELD2, B.CURRENCY_CD, B.AFFILIATE, SUM( B.MONETARY_AMOUNT), B.FOREIGN_CURRENCY
, SUM( B.FOREIGN_AMOUNT), A.REVERSAL_CD, A.REVERSAL_ADJ_PER, A.JRNL_HDR_STATUS, TO_CHAR(A.POSTED_DATE,'YYYY-MM-DD'), A.OPRID, A.DESCR254
, B.DEPTID, A.SOURCE, B.ALTACCT, TO_CHAR(CAST((A.DTTM_STAMP_SEC) AS TIMESTAMP),'YYYY-MM-DD-HH24.MI.SS.FF'), B.LINE_DESCR 
FROM PS_JRNL_HEADER A, PS_JRNL_LN B 
WHERE (A.BUSINESS_UNIT = B.BUSINESS_UNIT 
AND A.JOURNAL_ID = B.JOURNAL_ID 
AND A.JOURNAL_DATE = B.JOURNAL_DATE 
AND A.UNPOST_SEQ = B.UNPOST_SEQ 
AND A.JRNL_HDR_STATUS IN('P','V','U') 
AND A.FISCAL_YEAR IN (2024) 
AND A.ACCOUNTING_PERIOD BETWEEN 1 AND 12 
AND B.CHARTFIELD1 IN ('1234567','1234568','1234569')
AND B.LEDGER IN ('LEDGER')) 
GROUP BY A.FISCAL_YEAR, A.ACCOUNTING_PERIOD, B.BUSINESS_UNIT, B.JOURNAL_ID, B.JOURNAL_DATE, B.LEDGER, B.ACCOUNT, B.PRODUCT
, B.PROJECT_ID, B.CHARTFIELD1, B.CHARTFIELD2, B.CURRENCY_CD, B.AFFILIATE, B.FOREIGN_CURRENCY, A.REVERSAL_CD, A.REVERSAL_ADJ_PER
, A.JRNL_HDR_STATUS, A.POSTED_DATE, A.OPRID, A.DESCR254, B.DEPTID, A.SOURCE, B.ALTACCT, A.DTTM_STAMP_SEC, B.LINE_DESCR
  • The journal line table (PS_JRNL_LN) is joined to its parent, the journal header table (PS_JRNL_HEADER), by the 4 key columns on the journal header (BUSINESS_UNIT, JOURNAL_ID, JOURNAL_DATE, UNPOST_SEQ).
  • There are criteria on both the journal header and line tables.
    • The number of journal line rows per header is usually highly variable, and it also varies from customer to customer depending on the shape of their data.  It is not unusual to see thousands of journal line rows per header row.  Filtering it by FISCAL_YEAR and perhaps also ACCOUNTING_PERIOD could be very effective.  However, these columns are on PS_JRNL_HEADER, and not on PS_JRNL_LN. 
    • Queries often include criteria on other attribute columns on PS_JRNL_LN.  However, these columns are not indexed by default, though many customers add such indexes.
Below is the execution plan that Oracle produced for this query.  The optimizer chose to full scan the PS_JRNL_HEADER table, expecting to find 428,000 rows, and then do an index look-up on PS_JRNL_LN for each header row.  The optimizer predicts that the query will run for about 68s.  In practice, this query runs for over an hour, spending most of its time on the index lookup of PS_JRNL_LN by its unique index of the same name.
------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                |   428K|    93M|       |  1730K  (1)| 00:01:08 |       |       |
|   1 |  HASH GROUP BY                      |                |   428K|    93M|   111M|  1730K  (1)| 00:01:08 |       |       |
|   2 |   NESTED LOOPS                      |                |   428K|    93M|       |  1715K  (1)| 00:01:07 |       |       |
|   3 |    NESTED LOOPS                     |                |   428K|    93M|       |  1715K  (1)| 00:01:07 |       |       |
|*  4 |     TABLE ACCESS STORAGE FULL       | PS_JRNL_HEADER |   428K|    41M|       |  1476   (7)| 00:00:01 |       |       |
|   5 |     PARTITION RANGE ITERATOR        |                |     1 |       |       |     3   (0)| 00:00:01 |   KEY |   KEY |
|*  6 |      INDEX RANGE SCAN               | PS_JRNL_LN     |     1 |       |       |     3   (0)| 00:00:01 |   KEY |   KEY |
|*  7 |    TABLE ACCESS BY LOCAL INDEX ROWID| PS_JRNL_LN     |     1 |   128 |       |     4   (0)| 00:00:01 |     1 |     1 |
------------------------------------------------------------------------------------------------------------------------------
If I add hints to the SQL statement forcing it to full scan PS_JRNL_HEADER and generate a Bloom filter that it will apply during a full scan of PS_JRNL_LN, then the cost of the execution plan goes up (from 1730K to 66M), and the estimated run time goes up to 43 minutes, due to the cost of the full scan.  However, the actual execution time comes down to under 3 minutes.
-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                |   428K|    93M|       |    66M  (1)| 00:43:30 |       |       |
|   1 |  HASH GROUP BY                 |                |   428K|    93M|   111M|    66M  (1)| 00:43:30 |       |       |
|*  2 |   HASH JOIN                    |                |   428K|    93M|    46M|    66M  (1)| 00:43:30 |       |       |
|   3 |    JOIN FILTER CREATE          | :BF0001        |   428K|    41M|       | 32501   (1)| 00:00:02 |       |       |
|   4 |     PART JOIN FILTER CREATE    | :BF0000        |   428K|    41M|       | 32501   (1)| 00:00:02 |       |       |
|*  5 |      TABLE ACCESS STORAGE FULL | PS_JRNL_HEADER |   428K|    41M|       | 32501   (1)| 00:00:02 |       |       |
|   6 |    JOIN FILTER USE             | :BF0001        |  1132K|   137M|       |    66M  (1)| 00:43:28 |       |       |
|   7 |     PARTITION RANGE JOIN-FILTER|                |  1132K|   137M|       |    66M  (1)| 00:43:28 |:BF0000|:BF0000|
|*  8 |      TABLE ACCESS STORAGE FULL | PS_JRNL_LN     |  1132K|   137M|       |    66M  (1)| 00:43:28 |:BF0000|:BF0000|
-------------------------------------------------------------------------------------------------------------------------
I could solve the problem by adding hints either directly to the statement or with a SQL profile or SQL patch.  However, this is one of many statements, and the users will continuously produce more with the ad-hoc PS/Query tool.  I need a generic solution.
The Cost-Based Optimizer chose the nested loop plan because it calculated that it was cheaper.  However, that does not correspond to the change in the actual execution time.  The plan that was executed more quickly cost more, indicating a problem with the cost model.
  • Why is the full scan so expensive?  
  • How can I make it cheaper?

Thursday, May 08, 2025

Logging Run Controls and Bind Variables for Scheduled PS/Queries

This blog proposes additional logging for scheduled PS/Queries so that long-running queries can be reconstructed and analysed.

Previous blog posts have discussed limiting PS/Query runtime with the resource manager (see Management of Long Running PS/Queries Cancelled by Resource Manager CPU Limit).  From 19c, on Engineered Systems only, the 'Oracle Database automatically quarantines the plans for SQL statements terminated by … the Resource Manager for exceeding resource limits'.  SQL Quarantine is enabled by default in Oracle 19c on Exadata (unless patch 30104721 is applied that backports the new 23c parameters, see Oracle Doc ID 2635030.1: 19c New Feature SQL Quarantine - How To Stop Automatic SQL Quarantine).

What is the Problem?

SQL Quarantine prevents a query from executing.  Therefore, AWR will not capture the execution plan.  AWR will also purge execution plans where an execution has not been captured within the AWR retention period.  The original long-running query execution that was quarantined, if captured by AWR, will be aged out because it will not execute again.

If we want to investigate PS/Queries that produced execution plans that exceeded the runtime limit and were then quarantined, we need to reproduce the execution plan, either with the EXPLAIN PLAN FOR command or by executing the query in a session where the limited resource manager consumer group does not apply.

However, PS/Queries with bind variables present a challenge.  A PS/Query run with different bind variables can produce different execution plans.  One execution plan might be quarantined and so never complete, while another may complete within an acceptable time.  

In AWR, a plan is only captured once for each statement.  Therefore, it is possible to find one set of bind variables for each plan, although there may be many sets of bind variables that all produce the same execution plan.  However, we cannot obtain Oracle bind variables for quarantined execution plans that did not execute.  To regenerate their execution plans, we need another way to obtain their bind variables.

This problem occurs more generally where the Diagnostics Pack is not available, then it is not possible to reconstruct long-running queries without additional logging or tracing.

Solution

Scheduled PS/Queries are executed by the PSQUERY application engine.  The name of the query and the bind variables are passed via two run control records.  Users typically reuse an existing run control but provide different bind variable values.  I propose to introduce two tables to hold a copy of the data in these tables for each process instance.
  • PS_QUERY_RUN_CNTRL: Scheduled Query Run Control.  This record identifies the query executed.  Rows in this table will be copied to PS_QRYRUN_CTL_HST.
  • PS_QUERY_RUN_PARM: Scheduled Query Run Parameters.  This record holds the bind variables and the values passed to the query.  The table contains a row for each bind variable for each execution.  Rows in this table will be copied to PS_QRYRUN_PARM_HST

Two database triggers manage the history tables:

  • A database trigger that fires when the run status of the request is updated to '7' (processing).  It copies rows for the current run control into two corresponding history tables.  Thus, we will have a log of every bind variable for every scheduled query.
  • A second database trigger will fire when a PSQUERY request record is deleted.  It deletes the corresponding rows from these history tables.

When a PS/Query produces a quarantined execution plan, the PSQUERY process terminates with error ORA-56955: quarantined plan used (see Quarantined SQL Plans for PS/Queries).  Now we can obtain the bind variables that resulted in attempts to execute a quarantined query execution plan.

Implementation

The following script (ps_query_run_cntrl_hist_trigger.sql) creates the tables and triggers.  

REM ps_query_run_cntrl_hist_trigger.sql
REM 21.4.2025 - trigger and history tables to capture 
set echo on serveroutput on timi on
clear screen
spool ps_query_run_cntrl_hist_trigger
rollback;

CREATE TABLE SYSADM.PS_QRYRUN_CTL_HST 
  (PRCSINSTANCE INTEGER  DEFAULT 0 NOT NULL,
   OPRID VARCHAR2(30)  DEFAULT ' ' NOT NULL,
   RUN_CNTL_ID VARCHAR2(30)  DEFAULT ' ' NOT NULL,
   DESCR VARCHAR2(30)  DEFAULT ' ' NOT NULL,
   QRYTYPE SMALLINT  DEFAULT 1 NOT NULL,
   PRIVATE_QUERY_FLAG VARCHAR2(1)  DEFAULT 'N' NOT NULL,
   QRYNAME VARCHAR2(30)  DEFAULT ' ' NOT NULL,
   URL VARCHAR2(254)  DEFAULT ' ' NOT NULL,
   ASIAN_FONT_SETTING VARCHAR2(3)  DEFAULT ' ' NOT NULL,
   PTFP_FEED_ID VARCHAR2(30)  DEFAULT ' ' NOT NULL) TABLESPACE PTTBL
/
CREATE UNIQUE  iNDEX SYSADM.PS_QRYRUN_CTL_HST 
ON SYSADM.PS_QRYRUN_CTL_HST (PRCSINSTANCE) TABLESPACE PSINDEX PARALLEL NOLOGGING
/
ALTER INDEX SYSADM.PS_QRYRUN_CTL_HST NOPARALLEL LOGGING
/
CREATE TABLE SYSADM.PS_QRYRUN_PARM_HST 
  (PRCSINSTANCE INTEGER  DEFAULT 0 NOT NULL,
   OPRID VARCHAR2(30)  DEFAULT ' ' NOT NULL,
   RUN_CNTL_ID VARCHAR2(30)  DEFAULT ' ' NOT NULL,
   BNDNUM SMALLINT  DEFAULT 0 NOT NULL,
   FIELDNAME VARCHAR2(18)  DEFAULT ' ' NOT NULL,
   BNDNAME VARCHAR2(30)  DEFAULT ' ' NOT NULL,
   BNDVALUE CLOB) TABLESPACE PSIMAGE2 
/
CREATE UNIQUE  iNDEX SYSADM.PS_QRYRUN_PARM_HST 
ON SYSADM.PS_QRYRUN_PARM_HST (PRCSINSTANCE, BNDNUM) TABLESPACE PSINDEX PARALLEL NOLOGGING
/
ALTER INDEX SYSADM.PS_QRYRUN_PARM_HST NOPARALLEL LOGGING
/
  • PSQUERY is not a restartable Application Engine program.  Therefore, there is no risk of duplicate inserts into the history tables.
  • The exception handlers in the triggers deliberately suppress any error, in case that causes the process scheduler to crash.
CREATE OR REPLACE TRIGGER sysadm.query_run_cntrl_hist_ins
BEFORE UPDATE OF runstatus ON sysadm.psprcsrqst
FOR EACH ROW
WHEN (new.runstatus ='7' AND old.runstatus != '7' AND new.prcsname = 'PSQUERY' AND new.prcstype = 'Application Engine')
BEGIN
  INSERT INTO PS_QRYRUN_CTL_HST 
  (PRCSINSTANCE, OPRID, RUN_CNTL_ID, DESCR ,QRYTYPE, PRIVATE_QUERY_FLAG, QRYNAME, URL, ASIAN_FONT_SETTING, PTFP_FEED_ID)
  SELECT :new.prcsinstance, OPRID, RUN_CNTL_ID, DESCR ,QRYTYPE, PRIVATE_QUERY_FLAG, QRYNAME, URL, ASIAN_FONT_SETTING, PTFP_FEED_ID 
  FROM ps_query_run_cntrl WHERE oprid = :new.oprid AND run_cntl_id = :new.runcntlid;
  
  INSERT INTO PS_QRYRUN_PARM_HST
  (PRCSINSTANCE, OPRID, RUN_CNTL_ID, BNDNUM, FIELDNAME, BNDNAME, BNDVALUE) 
  SELECT :new.prcsinstance prcsinstance, OPRID, RUN_CNTL_ID, BNDNUM, FIELDNAME, BNDNAME, BNDVALUE
  FROM ps_query_run_parm WHERE oprid = :new.oprid AND run_cntl_id = :new.runcntlid;

  EXCEPTION WHEN OTHERS THEN NULL; --exception deliberately coded to suppress all exceptions 
END;
/

CREATE OR REPLACE TRIGGER sysadm.query_run_cntrl_hist_del
BEFORE DELETE ON sysadm.psprcsrqst
FOR EACH ROW
WHEN (old.prcsname = 'PSQUERY' AND old.prcstype = 'Application Engine')
BEGIN
  DELETE FROM PS_QRYRUN_CTL_HST WHERE prcsinstance = :old.prcsinstance;
  DELETE FROM PS_QRYRUN_PARM_HST WHERE prcsinstance = :old.prcsinstance;

  EXCEPTION WHEN OTHERS THEN NULL; --exception deliberately coded to suppress all exceptions
END;
/ 
show errors

spool off

Example

When a query is scheduled to run on the process scheduler, the bind variables are specified through this generic dialogue.

Scheduled Query Bind Variable Diaglogue

Once the PSQUERY process has started (it immediately commits its update to RUNSTATUS), these values are written to the new history tables.

select * from ps_qryrun_ctl_hst;

PRCSINSTANCE OPRID    RUN_CNTL_ID  DESCR                             QRYTYPE P QRYNAME                        URL                                                ASI PTFP_FEED_ID                  
------------ -------- ------------ ------------------------------ ---------- - ------------------------------ -------------------------------------------------- --- ------------------------------
    12345678 ABCDEF   042225       Journal Line Detail - Account           1 N XXX_JRNL_LINE_DTL_ACCT         https://xxxxxxx.yyyyy.com/psp/XXXXXXX/EMPLOYEE/ERP                                   

select * from ps_qryrun_ctl_hst;

PRCSINSTANCE OPRID    RUN_CNTL_ID  BNDNUM FIELDNAME          BNDNAME              BNDVALUE                      
------------ -------- ------------ ------ ------------------ -------------------- ------------------------------
    12345678 ABCDEF   042225            1 bind1              BUSINESS_UNIT        354XX 
    12345678 ABCDEF   042225            2 bind2              BUSINESS_UNIT        354XX 
    12345678 ABCDEF   042225            3 FISCAL_YEAR        FISCAL_YEAR          2025 
    12345678 ABCDEF   042225            4 ACCOUNTING_PD_FROM ACCOUNTING_PD_FROM   2 
    12345678 ABCDEF   042225            5 ACCOUNTING_PD_TO   ACCOUNTING_PD_TO     2 
    12345678 ABCDEF   042225            6 bind6              ACCOUNT              23882XXXXX 
    12345678 ABCDEF   042225            7 bind7              ACCOUNT              23882XXXXX 
    12345678 ABCDEF   042225            8 bind8              ALTACCOUNT           23882XXXXX 
    12345678 ABCDEF   042225            9 bind9              ALTACCOUNT           23882XXXXX

Conclusion

If the query is quarantined, PSQUERY will terminate with error ORA-56955: quarantined plan used. The SQL statement can be extracted from the message log, and the execution plan can be generated with the EXPLAIN PLAN FOR command, using the bind variable values captured in the history tables.

Note: The signature of the SQL Quarantine directive is the exact matching signature of the SQL text (it can be generated from the SQL text with dbms_sqltune.sqltext_to_signature).  There can be multiple PLAN_HASH_VALUEs for the same signature (because there can be multiple execution plans for the same SQL). Verify that the FULL_PLAN_HASH_VALUE of the execution plan generated with the captured bind variables corresponds to the PLAN_HASH_VALUE of a SQL Quarantine directive.

Monday, February 24, 2025

Quarantined SQL Plans for PS/Queries

This follows on from my previous post, Management of Long Running PS/Queries Cancelled by Resource Manager.

From 19c, on Engineered Systems only (such as Exadata and Exadata Cloud Service) the 'Oracle Database automatically quarantines the plans for SQL statements terminated by … the Resource Manager for exceeding resource limits.

The Resource Manager can set a maximum estimated execution time for a SQL statement, for example, 20 minutes. If a statement execution exceeds this limit, then the Resource Manager terminates the statement. However, the statement may run repeatedly before being terminated, wasting 20 minutes of resources each time it is executed.

Starting in Oracle Database 19c, if a statement exceeds the specified resource limit, then the Resource Manager terminates the execution and “quarantines” the plan. To quarantine the plan means to put it on a blacklist of plans that the database will not execute. Note that the plan is quarantined, not the statement itself.'

[Oracle SQL Tuning Guide: 4.7 About Quarantined SQL Plans]

When an attempt is made to execute a quarantined execution plan an error is produced: ORA-56955: quarantined plan used.

Oracle does not log timed-out or quarantined queries.  On V$SQL and V$SQLSTAT, AVOIDED_EXECUTIONS records the number of times a SQL query has been prevented from running.  However, this will not stay in the library cache long on a PeopleSoft system, due to the continual parse of dynamically generated SQL statements.  As of Oracle 19.20, it is not recorded in AWR on DBA_HIST_SQLSTAT.  

If an error condition occurs during a PSQUERY process run on the process scheduler, the process terminates with an error.  The SQL statement and the error message are recorded in the Application Engine message log.  As demonstrated in the previous blog, we can detect such failures by inspecting the message log of the PSQUERY process that did not finish successfully (ie. it has an end time, but the run status does not indicate success).

Matching Quarantine Directives to Cancelled Queries

Quarantine directives are visible via DBA_SQL_QUARANTINE, including SQL text and execution plan hash value.

It would be useful to know which quarantine directive relates to which query.  However, it is not easy to match the SQL in the PeopleSoft message log entries with that in the quarantine entries. The SQL text in the message log can have multiple spaces. These are stripped out in the DBA_SQL_QUARANTINE view where the normalised SQL statement is visible.  

The timestamp of creation and last execution of the quarantine directive is stored on it, but matching these to when the query was running can result in false positives.

Also, you cannot tell which quarantine directive was created by which consumer group.  The maximum CPU timeout is recorded on DBA_SQL_QUARANTINE.  In my example, it is only possible to distinguish the originating consumer group because the two consumer groups happen to have different timeouts.

A method that matches exactly, but only returns partial rows is to:

  • Obtain ASH data for queries terminated by the resource manager.  It can be matched by timestamp, MODULE, and ACTION (provided that EnableAEMonitoring is enabled).
    • Profile the ASH to find the statement that took the longest during each PSQUERY process, and that is almost certain to be the SQL query.  Thus obtaining the SQL_ID, SQL Plan Hash Value and consumer group ID.  It is also possible to determine the total database time for the query, and the database time spent on CPU.
    • The consumer group name can then be obtained from DBA_HIST_RSRC_CONSUMER_GROUP
  • Obtain the SQL text for the long-running query.  It would also have to be captured by an AWR snapshot.  This does often occur because it was a long-running SQL, but it is not certain.
  • The signature for the SQL statement (not the force-matching signature) can be derived using the SQLTEXT_TO_SIGNATURE function in DBMS_SQLTUNE.  This can be matched to the signature recorded in DBA_SQL_QUARANTINE.
  • You can have multiple quarantine directives for the same signature (i.e. the same SQL statement), each with a different plan hash value.  
    • NB: The plan hash value on DBA_SQL_QUARANTINE is the adaptive plan hash value (with all of its possible plan alternatives), and therefore it matches SQL_FULL_PLAN_HASH_VALUE in the ASH data, and not SQL_PLAN_HASH_VALUE (the plan that actually executed).
Note that
  • When a query executes until timed-out, producing ORA-00040, you usually can find the SQL statement in the AWR repository and so generate the signature to exactly match the quarantine record.
  • When an attempt is made to run a quarantined statement and execution plan, you usually cannot find the SQL statement because it hasn't run for long enough to produce an ASH sample.  Even when it has, you also have to rely on the statement having been captured previously by AWR.  Those conditions only come together occasionally.
This matching process is done by this query: message_log_checker-psquery2.sql.  Below is a sample output.  
  • We can see the quarantine directives that were created when the resource manager cancelled a query, raising error ORA-00040: active time limit exceeded - call aborted.  
  • However, where quarantine directives have prevented SQL from executing, raising error ORA-56955: quarantined plan used, the ASH data from the event that originally created the directive has since been purged, so we cannot use it to match directives.
Mon Feb 24                                                                                                                                                           page    1
                                                     PS/Queries terminated by Resource Manager/quarantined Execution Plan

                                                          Public/                                                              ASH
                                                          Private                                Ru Oracle      Exec    ASH    CPU Message Log
     P.I. DBNAME  OPRID    RUNCNTLID                      Query   QRYNAME                        St Err. #      Secs   Secs   Secs Date/Time Stamp              SQL_ID
--------- ------- -------- ------------------------------ ------- ------------------------------ -- --------- ------ ------ ------ ---------------------------- -------------
   SQL Plan   Full Plan                                                                                      CPU
 Hash Value  Hash Value Consumer Group Name                   SIGNATURE Quarantine Name                      Time  Quarantine Created           Quarantine Last Executed
----------- ----------- ------------------------- --------------------- ------------------------------------ ----- ---------------------------- ----------------------------
…
 31452465 FSPROD  USR0001  GBR_JRNL_LN_DETAIL_ACCT        Public  GBR_JRNL_LN_DETAIL_ACCT        10 ORA-56955     36     10     10 20-FEB-25 06.28.03.578218 PM 0wm9g6xkys12h
 4009529842   653370716 PSQUERY_BATCH_GROUP         5584654620166156419

 31451318 FSPROD  USR0002  GBR_JRNL_LN_DETAIL_ACCT        Public  GBR_JRNL_LN_DETAIL_ACCT        10 ORA-56955     36               20-FEB-25 02.36.38.590841 PM


 31451292 FSPROD  USR0002  GBR_JRNL_LN_DETAIL_ACCT        Public  GBR_JRNL_LN_DETAIL_ACCT        10 ORA-56955     36               20-FEB-25 02.30.51.777351 PM


 31438602 FSPROD  USR0003  1                              Private DK_GBR_GL_DETAIL_NEW           10 ORA-00040  28316  28275  14203 18-FEB-25 11.39.19.502412 PM 5qrbrf775whky
 3446094907  3491308607 PSQUERY_BATCH_GROUP        16266909742923016361 SQL_QUARANTINE_f3gxc76u48u59d019243f 14400 18-FEB-25 11.49.33.091081 PM

 31437925 FSPROD  USR0004  16                             Private TB_TEST2                       10 ORA-00040  17684  17654  17541 18-FEB-25 06.09.14.060615 PM 06xqrgj18wp05
 4256462904  2062199471 PSQUERY_BATCH_GROUP         6341132966559464532 SQL_QUARANTINE_5h01uuscnrg2n7aeaaaaf 14400 18-FEB-25 06.17.20.679769 PM

 31437907 FSPROD  USR0004  16                             Private TB_TEST2                       10 ORA-00040  17694  17695  17592 18-FEB-25 06.04.05.942470 PM 4yurn75y2p0t2
 3232504707   121066138 PSQUERY_BATCH_GROUP         4966087806133732884 SQL_QUARANTINE_49usqjjc001hn0737529a 14400 18-FEB-25 06.17.24.869185 PM
 …

Thursday, February 20, 2025

Management of Long Running PS/Queries Cancelled by Resource Manager CPU Limit

I have written previously about using the Oracle database resource manager to prioritise the allocation of CPU to different processes in a PeopleSoft system.  I proposed a sample resource plan that can be used as a starting point to build a resource plan that meets a system's specific objectives and requirements.

This post looks at 

  • How to configure the resource manager to cancel long-running queries,
  • What happens when it does and what PeopleSoft users experience,
  • How the system administrators can monitor such queries,
  • What action could they take?

Configuring SQL Cancellation in a Resource Manager Consumer Group

The sample resource plan, PSFT_PLAN, contains various server consumer groups.  It relies upon MODULE and ACTION being set by enabling PeopleSoft instrumentation (EnableAEMonitoring=1) and/or the psftapi_store_prcsinstance trigger on PSPRCSRQST (see Effective PeopleSoft Performance Monitoring),
  • PSQUERY_BATCH_GROUP
    • Applies to scheduled PSQUERY Application Engine Programs
    • Limited to 4 hours on CPU (or estimated at >= 4 hours)
  • PSQUERY_ONLINE
    • Applies to queries run online via the PeopleSoft Internet Architecture (PIA).
    • There is no resource manager limit for this consumer group.  
      • The PIA has a session timeout (default 20 minutes).  
      • The ICQuery Tuxedo service that runs the queries also has a timeout (default 20 minutes)
    • When the resource manager cancels a SQL call, it simply raises an Oracle error that appears in a PIA message box without further explanation.  It is better to let the PIA timeouts handle online queries in a more controlled fashion.
    • To prevent queries continuing to run on the database after the client has been terminated
  • LOW_LIMITED_GROUP
  • Applies to SQL*Plus, SQL Developer and Toad.
  • Limited to 2 hours on CPU (or estimated at >= 2 hours)

  • Recommendations: 

    • Users should generally be encouraged to schedule queries that will take more than a few minutes to run on the process scheduler.  
    • Resist the temptation to increase either the PIA inactivity timeout or ICQuery service timeout from the delivered setting of 20 minutes. 

    Plan Directives

    Plan directives are created with DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE
      DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
        'PSFT_PLAN', 'PSQUERY_BATCH_GROUP'
        ,mgmt_p6 => 1
        ,switch_group => 'CANCEL_SQL'
        ,switch_time => 14400
        ,switch_estimate => TRUE 
        ,switch_for_call => TRUE
      );
    
      DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
        'PSFT_PLAN', 'PSQUERY_ONLINE_GROUP'
        ,mgmt_p6 => 90
      );
    
      DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
        'PSFT_PLAN', 'LOW_LIMITED_GROUP'
        ,mgmt_p8 => 1
        ,switch_group => 'CANCEL_SQL'
        ,switch_time => 7200
        ,switch_elapsed_time => 7200
        ,switch_estimate => TRUE 
        ,switch_for_call => TRUE
      );
    Four parameters control cancellation behaviour.
    • SWITCH_GROUP specifies the consumer group to which the session is switched when a switch condition is met.  If the group switches to CANCEL_SQL the current call is cancelled, raising error ORA-00400.
    • SWITCH_TIME specified the number of seconds on CPU (not elapsed time).
    • If SWITCH_ESTIMATE is true, the resource manager also switches group if the estimated run time is greater than the switch time
    • SWITCH_FOR_CALL is set to true so that if the consumer group is switched, it is then restored to the original consumer group at the end of the top call.  Thus a persistent session is not permanently switched. This is important if switching an application engine server (PSAESRV) session.

    Cancellation Behaviour

    The resource manager can cancel long-running queries in these consumer groups raising ORA-00040: active time limit exceeded - call aborted 
    • The query may be cancelled immediately because the estimated execution time is greater than the limit.  
    • Otherwise, it is quite likely to run for an elapsed time that is greater than the CPU time limit. The database session is only on CPU if the event reported on the session is NULL.  Otherwise, it is doing something else.
    • Some time will be consumed in the client process, during which the database session will be idle waiting for the next fetch request from the client.  This is usually reported as event SQL*Net message from client.
    • Some of the database time may not be on CPU because it may be doing something else, such as physical IO.
    • The database session may be held back by the resource manager allocating CPU to higher priority processes,  in which case again the session will not be on CPU, and will report being on event resmgr: cpu quantum.

    Querying the PeopleSoft Message Log

    The full message text is stored in multiple pieces in PS_MESSAGE_LOGPARM and must be reconstructed so that it can be searched for the error code.  I demonstrated this technique in another blog post: Querying the PeopleSoft Message Log with SQL.

    For this analysis, I have made some alterations to the message log query (see message_log_checker-psquery.sql).

    • This query is restricted to messages generated by PSQUERY processes that did not run to success (not run status 9).
    • PeopleSoft messages are typically defined with up to 9 substitution variables, but long SQL statements can have many more entries in PS_MESSAGE_LOGPARM.  So the PL/SQL function in this query simply appends any additional log parameter rows beyond the 9 substitution variables to the end of the generated string.
    • Once the message has been generated we can look for one of the error messages associated with the resource manager terminating a query:
      • ORA-00040: active time limit exceeded - call aborted
      • ORA-56955: quarantined plan used

    It is necessary to filter by message number because even in PS/Query users can write invalid SQL that produces other error messages.  However, all this text processing for each row retrieved makes the query quite slow.

    Here is an example output.  User USR001 ran a private query MY_TEST2 with run control 42.  It ran for 20772s (5h 46m) until terminated by the resource manager.  As explained above, the 4-hour limit is on CPU time that will be less than the elapsed time.
                                                             Public/
                                                              Private                                Ru   Exec                              Msg Msg  Msg
         P.I. DBNAME  OPRID    RUNCNTLID                      Query   QRYNAME                        St   Secs DTTM_STAMP_SEC               Seq Set  Nbr
    --------- ------- -------- ------------------------------ ------- ------------------------------ -- ------ ---------------------------- --- --- ----
    MSG
    ---------------------------------------------------------------------------------------------------------------------------------------------------- 
     12395311 FSPROD  USR001   42                             Private MY_TEST2                       10  20772 10-FEB-25 04.41.47.384694 PM   1  65   30
    File: C:\PT860P13B_2403250500-retail\peopletools\src\pssys\qpm.cppSQL error. Stmt #: 8526  Error Position: 189  Return: 40 - ORA-00040: active time
     limit exceeded - call abortedFailed SQL stmt: SELECT A.LEDGER, A.FISCAL_YEAR, A.BUSINESS_UNIT, …
                                                                                                               10-FEB-25 04.41.47.421800 PM   2  50  380
    Error in running query because of SQL Error, Code=40, Message=ORA-00040: active time limit exceeded - call aborted
    …
    From Oracle 19c on Exadata, timed-out statements are automatically quarantined.  If a quarantined statement is run and a quarantined execution plan is generated, then error ORA-56955 is generated immediately.  Therefore, it can also be detected in the logs.  The query searches for both messages.
                                                              Public/
                                                              Private                                Ru   Exec                              Msg Msg  Msg
         P.I. DBNAME  OPRID    RUNCNTLID                      Query   QRYNAME                        St   Secs DTTM_STAMP_SEC               Seq Set  Nbr
    --------- ------- -------- ------------------------------ ------- ------------------------------ -- ------ ---------------------------- --- --- ----
    MSG
    ----------------------------------------------------------------------------------------------------------------------------------------------------
     12319513 FSPROD  USR002   Transactions                   Public  GBR_JRNL_LINE_DTL_ACCT         10     25 13-FEB-25 11.13.35.746644 PM   1  65   30
    File: C:\PT860P13B_2403250500-retail\peopletools\src\pssys\qpm.cppSQL error. Stmt #: 8526  Error Position: 2783  Return: -8581 - ORA-56955: quarant
    ined plan usedFailed SQL stmt: SELECT A.FISCAL_YEAR, A.ACCOUNTING_PERIOD, A.JRNL_HDR_STATUS, B.LED
    …
                                                                                                               13-FEB-25 11.13.35.814112 PM   2  50  380
    Error in running query because of SQL Error, Code=-8581, Message=ORA-56955: quarantined plan used
    I discuss automatic SQL quarantine for PS/Query in a subsequent blog.

    Opinion

    So far, I have explained how to set a maximum CPU time limit for PS/Queries in a resource manager consumer group and how to detect cancelled PS/Queries by examining the message log.

    The final stage is to close the feedback loop and go back to the users producing the queries, find out what they are trying to do, and why the queries are running for such a long time.

    Tuesday, November 26, 2024

    PeopleSoft PS/Query: Finding Users' Cartesian Joins

    Many PeopleSoft users like its ad hoc query tool because they can write their own queries directly on the system, without having to learn to write structured query language (SQL), or getting a developer to write it for them.   

    What is the Problem?

    It is easy for users to create poor queries, that either don't work as intended or can run for long periods, even indefinitely, consuming resources without ever producing results.  This can consume significant amounts of CPU, and in the cloud, that is mostly what you pay for!  The effect can be mitigated with the database's resource manager, but it is better not to do it in the first place.

    One cause of long-running queries that I come across is missing join criteria leading the database to perform Cartesian Merge Joins.  I should stress that not all Cartesian joins are evil.  For example, in some data warehouse queries (e.g. GL nVision reporting), it can be a very effective strategy to Cartesian join dimension tables before visiting the fact table, especially if you can use Bloom filter a full scan on the fact table.  It works well with parallel query, and on engineered systems this can also be pushed down to the storage cells.

    Finding Execution Plans with Cartesian Joins

    The following query profiles database time by execution plan from ASH for SQL statements from PS/Queries run via the PSQUERY application engine program on a process scheduler.  It returns the longest-running statement for each execution plan.

    The data is generated and processed through several common table expressions.

    • R returns the PSQUERY processes that ran in the time window of interest
    • P returns the execution plans captured by AWR that generate Cartesian products for which the SQL text is also captured.  
    • X returns the ASH data for Cartesian join executions. When P is joined with the ASH data, then we just get the queries that performed Cartesian joins.
    • Y sums and groups the ASH data by statement and process
    • Z sums the data by execution plan and identifies the longest-running SQL statement for that plan.

    REM qry_cartesianplans.sql
    WITH r as ( /*processes of interest*/
    SELECT /*+MATERIALIZE*/ r.oprid, r.prcsinstance, r.prcsname, r.begindttm, r.enddttm
    ,      DECODE(c.private_query_flag,'Y','Private','N','Public') private_query_flag, c.qryname
    FROM   psprcsrqst r
           LEFT OUTER JOIN ps_query_run_cntrl c ON c.oprid = r.oprid AND c.run_cntl_id = r.runcntlid
    WHERE prcsname = 'PSQUERY'
    AND r.begindttm >= trunc(SYSDATE)-0+8/24
    AND r.begindttm <= trunc(SYSDATE)-0+19/24
    ), p as ( /*known Cartesian plans with SQL text*/
    SELECT /*+MATERIALIZE*/ p.plan_hash_value, MAX(p.options) options
    FROM   dbA_hist_sql_plan p
    ,      dba_hist_sqltext t
    WHERE  t.sql_id = p.sql_id
    AND    (p.id = 0 OR p.options = 'CARTESIAN')
    GROUP BY p.plan_hash_Value
    ), x AS ( /*ASH for processes*/
    SELECT /*+materialize leading(r x)*/  r.prcsinstance, r.oprid, r.private_query_flag, r.qryname
    ,      h.event, x.dbid, h.sample_id, h.sample_time, h.instance_number
    ,      CASE WHEN h.module IS NULL       THEN REGEXP_SUBSTR(h.program, '[^@]+',1,1)
                WHEN h.module LIKE 'PSAE.%' THEN REGEXP_SUBSTR(h.module, '[^.]+',1,2) 
                ELSE                             REGEXP_SUBSTR(h.module, '[^.@]+',1,1) 
           END AS module
    ,      h.action
    ,      NULLIF(h.top_level_sql_id, h.sql_id) top_level_sql_id
    ,      h.sql_id, h.sql_plan_hash_value, h.force_matching_signature, h.sql_exec_id
    ,      h.session_id, h.session_serial#, h.qc_instance_id, h.qc_Session_id, h.qc_Session_serial#
    ,      f.name, p.options
    ,      NVL(usecs_per_row,1e7) usecs_per_row
    ,      CASE WHEN p.plan_hash_value IS NOT NULL THEN NVL(usecs_per_row,1e7) ELSE 0 END usecs_per_row2
    FROM   dba_hist_snapshot x
    ,      dba_hist_active_sess_history h
           LEFT OUTER JOIN p ON p.plan_hash_value = h.sql_plan_hash_value
           LEFT OUTER JOIN dba_sql_profiles f ON h.force_matching_signature = f.signature
    ,      r
    ,      sysadm.psprcsque q
    WHERE  h.SNAP_id = X.SNAP_id
    AND    h.dbid = x.dbid
    AND    h.instance_number = x.instance_number
    AND    x.end_interval_time >= r.begindttm
    AND    x.begin_interval_time <= NVL(r.enddttm,SYSDATE)
    AND    h.sample_time BETWEEN r.begindttm AND NVL(r.enddttm,SYSDATE)
    AND    q.prcsinstance = r.prcsinstance
    AND    (  (h.module = r.prcsname AND h.action like 'PI='||r.prcsinstance||':Processing')
           OR  h.module like 'PSAE.'||r.prcsname||'.'||q.sessionidnum)
    ), y as( /*profile time by statement/process*/
    SELECT prcsinstance, oprid, private_query_flag, qryname, sql_plan_hash_value, sql_id, force_matching_signature, name
    ,      dbid, module, action, top_level_sql_id
    ,      count(distinct qc_session_id||qc_session_serial#||sql_id||sql_exec_id) execs
    ,      sum(usecs_per_row)/1e6 ash_Secs
    ,      sum(usecs_per_Row2)/1e6 awr_secs
    ,      avg(usecs_per_row)/1e6*count(distinct sample_time) elapsed_secs
    ,      count(distinct instance_number||session_id||session_serial#) num_procs
    ,      max(options) options
    FROM   x 
    GROUP BY prcsinstance, oprid, private_query_flag, qryname, sql_plan_hash_value, sql_id, force_matching_signature, name
    ,      dbid, module, action, top_level_sql_id, qc_instance_id, qc_session_id, qc_session_serial#
    ), z as ( /*find top statement per plan and sum across all executions*/
    SELECT row_number() over (partition by force_matching_signature, sql_plan_hash_value order by awr_secs desc) plan_seq
    ,      prcsinstance, oprid, name, private_query_flag, NVL(qryname,action) qryname, options
    ,      sql_id, sql_plan_hash_Value, force_matching_signature
    ,      count(distinct sql_id) over (partition by force_matching_signature, sql_plan_hash_value) sql_ids
    ,      sum(execs) over (partition by force_matching_signature, sql_plan_hash_value) plan_execs
    ,      sum(ash_Secs) over (partition by force_matching_signature, sql_plan_hash_value) plan_ash_secs
    ,      sum(awr_Secs) over (partition by force_matching_signature, sql_plan_hash_value) plan_awr_secs
    ,      sum(elapsed_Secs) over (partition by force_matching_signature, sql_plan_hash_value) elap_secs
    ,      sum(num_procs) over (partition by force_matching_signature, sql_plan_hash_value) max_procs
    FROM   y
    )
    Select z.*, z.plan_ash_secs/z.elap_secs eff_para
    from   z
    where  plan_seq = 1
    and    sql_id is not null
    and    plan_ash_secs >= 300
    ORDER BY plan_ash_secs DESC
    FETCH FIRST 50 ROWS ONLY
    /
    There are two SQL statements for the same private query. XXX_GL_BJU run by user BXXXXXX that exhibited a Cartesian join.

    Plan    Plan
         Process                                          Private                                                           SQL Plan        Force Matching  SQL   Plan     ASH     AWR Elapsed  Max  Eff.
     #  Instance OPRID     NAME                           Query   QRYNAME                        OPTIONS   SQL_ID         Hash Value             Signature  IDs  Execs    Secs    Secs Seconds  Prc  Para
    -- --------- --------- ------------------------------ ------- ------------------------------ --------- ------------- ----------- --------------------- ---- ------ ------- ------- ------- ---- -----
     1  12344342 NXXXXXX                                  Public  XXX_TRIAL_BALANCE_BY_BU_XXX_V2           c4zfcub2bnju8  2128864041   4468535744829993986    4      4  103473  103473  103473    4   1.0
     1  12344471 FXXXXXX                                  Public  XXXAM_FIN_GL_AP                          d8jnxzmgx20mq  4189069557  16033793374717384734    1      1   32599   32599   32599    1   1.0
     1  12344448 VXXXXXX                                  Private XXX1_LEDGERBAL1_UPRDAC_XXXX1             ftn7nz1xafh5z           0  15193759933860031914    2      2   20615   20615   20615    2   1.0
     1  12345574 BXXXXXX                                  Private XXX_GL_BJU                     CARTESIAN ab2v91h9zj3hv   603930234   4189289347608449750    1      1   16862   16862   16862    1   1.0
     1  12345681 BXXXXXX                                  Private XXX_GL_BJU                     CARTESIAN 05tphb379fu8j   603930234   6203431496815450503    1      1   15452   15452   15452    1   1.0
     1  12345852 WXXXXXX                                  Public  XXXINSOLVENTS_JRNL_DETAIL                51aw4ahxba0gq  3918624993  11145663850623390044    1      1   13435   13435   13435    1   1.0
     1  12345863 CXXXXXX                                  Public  XXX_COMMUTATIONS_JRNL_DTL                7q9kt75bh35dg           0  11985643849566057390    1      1   13283   13283   13283    1   1.0
     1  12344773 WXXXXXX                                  Private XXX_COMMUTATION_JRNL_DETAIL_2            361gck3w3mak7           0  18367721225324700858    1      2   12883   12883   12883    2   1.0
     1  12344682 DXXXXXX                                  Private COMBINED_JE_DETAIL_DV                    2gchgaf465ku5           0   5375582220398622005    1      1    9279    9279    9279    1   1.0
     1  12345618 DXXXXXX                                  Private COMBINED_JE_DETAIL_DV_NO_AFF             2q2faj9c6003u           0  15355473744647942117    1      1    5079    5079    5079    1   1.0
    …
    The SQL statement and execution plan can be extracted from AWR using DBMS_XPLAN.DISPLAY_WORKLOAD_REPSITORY.  

    SELECT * FROM table(dbms_xplan.display_workload_repository('ab2v91h9zj3hv',603930234,'ADVANCED +ADAPTIVE'));
    

    In this example, there are two similar SQL statements, with different force matching signatures, that produce the same execution plan.  The difference is that one has an IN list of 3 accounts, and the other has an equi-join to just one account.  This is enough to produce a different force matching signature.  This is why I often group ASH data by execution plan hash value.  Even if the SQL statement is different, if the execution plan is the same, then the issues and solutions tend to be the same.

    The statements have been reformated to make them easier to read.  Both are just joins between two objects.  There are criteria on PS_JRNL_DRILL_VW (a view on PS_JRNL_LN), but there are no join criteria between it and its parent table JRNL_HEADER, thus a meaningless Cartesian product that joins every journal line to every journal header was created and sorted.

    SQL_ID ab2v91h9zj3hv 
    --------------------
    SELECT A.BUSINESS_UNIT, A.JOURNAL_ID,
    TO_CHAR(A.JOURNAL_DATE,'YYYY-MM-DD'), B.DESCR254, A.ACCOUNT,
    A.LINE_DESCR, SUM( A.MONETARY_AMOUNT), A.LEDGER, B.ACCOUNTING_PERIOD,
    B.SOURCE, B.OPRID, A.PRODUCT, A.CLASS_FLD, A.PROGRAM_CODE,
    A.CHARTFIELD1, A.CHARTFIELD3, A.CURRENCY_CD, A.FOREIGN_CURRENCY 
    FROM PS_JRNL_DRILL_VW A, PS_JRNL_HEADER B 
    WHERE ( A.BUSINESS_UNIT IN('12341','12347') 
    AND A.LEDGER IN ('CORE','LOCAL_ADJ','LOCAL_ADJ2') 
    AND A.ACCOUNT IN ('1234510040','1234510000','1234510060') 
    AND A.ACCOUNTING_PERIOD BETWEEN 1 AND 12 AND A.FISCAL_YEAR = 2023) 
    GROUP BY A.BUSINESS_UNIT, A.JOURNAL_ID, A.JOURNAL_DATE, B.DESCR254, A.ACCOUNT,
    A.LINE_DESCR, A.LEDGER, B.ACCOUNTING_PERIOD, B.SOURCE, B.OPRID,
    A.PRODUCT, A.CLASS_FLD, A.PROGRAM_CODE, A.CHARTFIELD1, A.CHARTFIELD3,
    A.CURRENCY_CD, A.FOREIGN_CURRENCY ORDER BY 11

    SQL_ID 05tphb379fu8j
    --------------------
    SELECT A.BUSINESS_UNIT, A.JOURNAL_ID,
    TO_CHAR(A.JOURNAL_DATE,'YYYY-MM-DD'), B.DESCR254, A.ACCOUNT,
    A.LINE_DESCR, SUM( A.MONETARY_AMOUNT), A.LEDGER, B.ACCOUNTING_PERIOD,
    B.SOURCE, B.OPRID, A.PRODUCT, A.CLASS_FLD, A.PROGRAM_CODE,
    A.CHARTFIELD1, A.CHARTFIELD3, A.CURRENCY_CD, A.FOREIGN_CURRENCY 
    FROM PS_JRNL_DRILL_VW A, PS_JRNL_HEADER B 
    WHERE ( A.BUSINESS_UNIT IN('12341','12347') 
    AND A.LEDGER IN ('CORE','LOCAL_ADJ','LOCAL_ADJ2') 
    AND A.ACCOUNT = '1234510000' 
    AND A.ACCOUNTING_PERIOD BETWEEN 1 AND 12 
    AND A.FISCAL_YEAR = 2023) 
    GROUP BY A.BUSINESS_UNIT, A.JOURNAL_ID, A.JOURNAL_DATE, B.DESCR254, A.ACCOUNT
    , A.LINE_DESCR, A.LEDGER, B.ACCOUNTING_PERIOD, B.SOURCE, B.OPRID
    , A.PRODUCT, A.CLASS_FLD,A.PROGRAM_CODE, A.CHARTFIELD1, A.CHARTFIELD3
    , A.CURRENCY_CD, A.FOREIGN_CURRENCY ORDER BY 11
    
    Line 2 of the execution plan reports a MERGE JOIN CARTESIAN operation that feeds into the SORT GROUP operation at line 1.

    Plan hash value: 603930234
    
    ---------------------------------------------------------------------------------------------------------------------------------------------------
    | Id  | Operation                                        | Name           | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop | Inst   |IN-OUT|
    ---------------------------------------------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                                 |                |       |       | 84648 (100)|          |       |       |        |      |
    |   1 |  SORT GROUP BY                                   |                |    57 |  9063 | 84648   (1)| 00:00:04 |       |       |        |      |
    |   2 |   MERGE JOIN CARTESIAN                           |                |    57 |  9063 | 84647   (1)| 00:00:04 |       |       |        |      |
    |   3 |    NESTED LOOPS                                  |                |     1 |   145 |  1636   (0)| 00:00:01 |       |       |        |      |
    |   4 |     VIEW                                         | PS_JRNL_HEADER |   112 |  4032 |  1188   (0)| 00:00:01 |       |       |        |      |
    |   5 |      UNION-ALL                                   |                |       |       |            |          |       |       |        |      |
    |   6 |       REMOTE                                     | PS_JRNL_HEADER |    76 |  5624 |    18   (0)| 00:00:01 |       |       | FSARC~ | R->S |
    |   7 |       INLIST ITERATOR                            |                |       |       |            |          |       |       |        |      |
    |   8 |        TABLE ACCESS BY INDEX ROWID BATCHED       | PS_JRNL_HEADER | 16679 |   586K| 11634   (1)| 00:00:01 |       |       |        |      |
    |*  9 |         INDEX RANGE SCAN                         | PSEJRNL_HEADER | 16679 |       |   347   (0)| 00:00:01 |       |       |        |      |
    |  10 |     VIEW                                         | PS_JRNL_LN     |     1 |   109 |     4   (0)| 00:00:01 |       |       |        |      |
    |  11 |      UNION-ALL PARTITION                         |                |       |       |            |          |       |       |        |      |
    |* 12 |       FILTER                                     |                |       |       |            |          |       |       |        |      |
    |  13 |        REMOTE                                    | PS_JRNL_LN     |     1 |   217 |     5   (0)| 00:00:01 |       |       | FSARC~ | R->S |
    |* 14 |       FILTER                                     |                |       |       |            |          |       |       |        |      |
    |  15 |        PARTITION RANGE SINGLE                    |                |     1 |   109 |     5   (0)| 00:00:01 |   KEY |   KEY |        |      |
    |* 16 |         TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PS_JRNL_LN     |     1 |   109 |     5   (0)| 00:00:01 |   KEY |   KEY |        |      |
    |* 17 |          INDEX RANGE SCAN                        | PS_JRNL_LN     |     1 |       |     4   (0)| 00:00:01 |   KEY |   KEY |        |      |
    |  18 |    BUFFER SORT                                   |                |  7749K|   103M| 84644   (1)| 00:00:04 |       |       |        |      |
    |  19 |     VIEW                                         | PS_JRNL_HEADER |  7749K|   103M| 83011   (1)| 00:00:04 |       |       |        |      |
    |  20 |      UNION-ALL                                   |                |       |       |            |          |       |       |        |      |
    |  21 |       REMOTE                                     | PS_JRNL_HEADER |  5698K|  1880M| 50467   (1)| 00:00:02 |       |       | FSARC~ | R->S |
    |  22 |       TABLE ACCESS STORAGE FULL                  | PS_JRNL_HEADER |  2050K|    86M| 32544   (1)| 00:00:02 |       |       |        |      |
    ---------------------------------------------------------------------------------------------------------------------------------------------------
    A profile of database time for that execution plan by event and plan line ID shows that most of the time is spent on line 1, sorting the output of the Cartesian product.
                                                                                                                        Stmt    Stmt
       SQL Plan SQL Plan                                                                  H   E I     ASH                ASH    Elap
     Hash Value  Line ID EVENT                                                            P P x M    Secs  ELAP_SECS    Secs    Secs
    ----------- -------- ---------------------------------------------------------------- - - - - ------- ---------- ------- -------
      603930234        1 CPU+CPU Wait                                                     N N Y N  217091 23405.3608  299088   32314
                      18 direct path read temp                                            N N Y N   64395 7034.44748  299088   32314
                      18 CPU+CPU Wait                                                     N N Y N   16998 1812.39445  299088   32314
                       1 ASM IO for non-blocking poll                                     N N Y N     195 20.4802032  299088   32314
                      21 CPU+CPU Wait                                                     N N Y N     195   20.47995  299088   32314
                      16 CPU+CPU Wait                                                     N N Y N     113   10.24021  299088   32314
                         CPU+CPU Wait                                                     N N Y N     103   10.25244  299088   32314
    This query never finished because the Cartesian product was so large.  The time recorded was spent in two executions that were eventually cancelled by system operators.

    The answer in this particular case is to fix the code.  We have to go back to the user, explain why it is necessary to join parent and child tables and get them to correct their PS/Query.

    Finding PS/Queries Without Joins on Related Records

    In PeopleSoft, the parent of a child record is recorded on PSRECDEFN in the column PARENTRECNAME.  However, this does not translate into a foreign key relationship in any database supported by PeopleSoft.  This is part of PeopleSoft's original platform-agnosticism.  Not all databases previously supported by PeopleSoft supported database enforced referential integrity.  Therefore it never became part of the implementation, and there is no guarantee that the applications were written in such a way to honour foreign-key constraints (i.e. insert parents before children, delete children before parents etc.).

    The below query looks at pairs of parent-child records in each select block of each PS/Query and counts the number of key columns for which there are criteria on the child record that are joined to the parent record.  It is restricted to just the journal header/line tables and views.

    It returns rows where no joined key columns are found.  These queries are therefore suspected of being faulty.  However, there may be false positives where child records are joined to grandparents rather than immediate parents.  Such an approach in SQL is perfectly valid, and can even result in better performance.  

    WITH x as (
    SELECT r1.oprid, r1.qryname, r1.selnum
    , r1.rcdnum rcdnum1, r1.recname recname1, r1.corrname corrname1
    , r2.rcdnum rcdnum2, r2.recname recname2, r2.corrname corrname2
    , (SELECT count(*) 
       FROM psqryfield qf1 --INNER JOIN psrecfielddb f1 ON f1.recname = r1.recname AND f1.fieldname = qf1.fieldname
       ,    psqryfield qf2 INNER JOIN psrecfielddb f2 ON f2.recname = r2.recname AND f2.fieldname = qf2.fieldname 
                                                     AND MOD(f2.useedit,2)=1 /*key fields only*/
       , psqrycriteria c
       WHERE qf1.oprid = r1.oprid AND qf1.qryname = r1.qryname AND qf1.selnum = r1.selnum AND qf1.recname = r1.recname AND qf1.fldrcdnum = r1.rcdnum
       AND   qf2.oprid = r2.oprid AND qf2.qryname = r2.qryname AND qf2.selnum = r2.selnum AND qf2.recname = r2.recname AND qf2.fldrcdnum = r2.rcdnum
       AND    c.oprid = r1.oprid AND c.qryname = r1.qryname AND  c.selnum = r1.selnum 
       AND   (  (c.lcrtselnum = r1.selnum AND c.lcrtfldnum = qf1.fldnum AND c.r1crtselnum = r2.selnum AND c.r1crtfldnum = qf2.fldnum)
             OR (c.lcrtselnum = r2.selnum AND c.lcrtfldnum = qf2.fldnum AND c.r1crtselnum = r1.selnum AND c.r1crtfldnum = qf1.fldnum))
    -- AND rownum = 1
      ) num_key_fields
    FROM psrecdefn r
    , psqryrecord r1
      INNER JOIN psqryrecord r2 ON r1.oprid = r2.oprid AND r1.qryname = r2.qryname AND r1.selnum = r2.selnum AND r1.rcdnum != r2.rcdnum --AND r1.corrname < r2.corrname
    WHERE r.recname = r2.recname AND r.parentrecname = r1.recname
    )
    SELECT x.* FROM x
    WHERE num_key_fields = 0
    AND recname1 IN('JRNL_HEADER')
    AND recname2 IN('JRNL_LN','JRNL_DRILL_VW')
    ORDER BY 1,2,3
    /

    However, these queries may not have been run recently.  Users tend to write queries, save a modification as a new version, and then abandon the old version.

                                              Sel Rec1                    Cor Rec2                    Cor #Key
    OPRID     QRYNAME                           #    # Record 1           #1     # Record             #2  Flds
    --------- ------------------------------ ---- ---- ------------------ --- ---- ------------------ --- ----
              2_XX_CHI_JOURNAL_MES2_RE          1    1 JRNL_HEADER        A      2 JRNL_LN            B      0
              12300_GL_ACCOUNT_DETAIL           1    1 JRNL_HEADER        A      2 JRNL_LN            B      0
              123_DK                            1    1 JRNL_HEADER        A      2 JRNL_LN            B      0
              123_NEW                           1    1 JRNL_HEADER        A      2 JRNL_LN            B      0
              12345_ACCRUAL_JE_DETAILS          1    1 JRNL_HEADER        A      2 JRNL_LN            C      0
              12345_ACCRUAL_JE_DETAILS_V2       1    1 JRNL_HEADER        A      2 JRNL_LN            C      0
              12345_ACCRUAL_JE_DETAILS_V3       1    1 JRNL_HEADER        A      2 JRNL_LN            C      0
              12345_HARDSOFT_JE_DETAILS_V3      1    1 JRNL_HEADER        A      2 JRNL_LN            C      0
              12345_BM_CURR_ACTIVITY2           1    1 JRNL_HEADER        A      2 JRNL_LN            B      0
              AAIC_CBP_POOLS                    1    1 JRNL_HEADER        A      2 JRNL_LN            B      0
    …
    

    I demonstrated how to identify long-running PS/Queries on the process schedulers in an earlier blog post. The following query merges in that query, so that it only considers queries that have run on a process scheduler within the purge period, for which join criteria may be missing.  They are sorted by descending execution time.

    REM qry_missingjoins.sql
    WITH q1 as (
    SELECT r.prcsinstance
    , r.oprid runoprid, r.runcntlid
    , DECODE(c.private_query_flag,'Y','Private','N','Public') private_query_flag
    , DECODE(c.private_query_flag,'Y',r.oprid,' ') oprid
    , c.qryname
    , CAST(begindttm AS DATE) begindttm
    , CAST(enddttm AS DATE) enddttm
    , runstatus
    , (CAST(NVL(enddttm,SYSDATE) AS DATE)-CAST(begindttm AS DATE))*86400 exec_Secs
    FROM psprcsrqst r
      LEFT OUTER JOIN ps_query_run_cntrl c ON c.oprid = r.oprid AND c.run_cntl_id = r.runcntlid
    WHERE prcsname = 'PSQUERY'
    AND dbname IN(select DISTINCT dbname from ps.psdbowner)
    --AND r.begindttm >= trunc(SYSDATE)-2+8/24
    --AND r.begindttm <= trunc(SYSDATE)-2+19/24
    ), q as (
    Select /*+MATERIALIZE*/ oprid, qryname
    , SUM(exec_secs) exec_secs
    , COUNT(*) num_execs
    , COUNT(DECODE(runstatus,'9',1,NULL)) complete_execs
    , COUNT(DISTINCT runoprid) runoprids
    FROM q1
    GROUP BY oprid, qryname
    ), x as (
    SELECT r1.oprid, r1.qryname, r1.selnum
    , r1.rcdnum rcdnum1, r1.recname recname1, r1.corrname corrname1
    , r2.rcdnum rcdnum2, r2.recname recname2, r2.corrname corrname2
    , (SELECT count(*) 
       FROM psqryfield qf1 --INNER JOIN psrecfielddb f1 ON f1.recname = r1.recname AND f1.fieldname = qf1.fieldname
       ,    psqryfield qf2 INNER JOIN psrecfielddb f2 ON f2.recname = r2.recname AND f2.fieldname = qf2.fieldname AND MOD(f2.useedit,2)=1
       , psqrycriteria c
       WHERE qf1.oprid = r1.oprid AND qf1.qryname = r1.qryname AND qf1.selnum = r1.selnum AND qf1.recname = r1.recname AND qf1.fldrcdnum = r1.rcdnum
       AND   qf2.oprid = r2.oprid AND qf2.qryname = r2.qryname AND qf2.selnum = r2.selnum AND qf2.recname = r2.recname AND qf2.fldrcdnum = r2.rcdnum
       AND    c.oprid = r1.oprid AND  c.qryname = r1.qryname AND  c.selnum = r1.selnum 
       AND   (  (c.lcrtselnum = r1.selnum AND c.lcrtfldnum = qf1.fldnum AND c.r1crtselnum = r2.selnum AND c.r1crtfldnum = qf2.fldnum)
             OR (c.lcrtselnum = r2.selnum AND c.lcrtfldnum = qf2.fldnum AND c.r1crtselnum = r1.selnum AND c.r1crtfldnum = qf1.fldnum))
       AND rownum = 1
      ) num_key_fields
    FROM psrecdefn r
    , psqryrecord r1
      INNER JOIN psqryrecord r2 ON r1.oprid = r2.oprid AND r1.qryname = r2.qryname AND r1.selnum = r2.selnum AND r1.rcdnum != r2.rcdnum --AND r1.corrname < r2.corrname
    WHERE r.recname = r2.recname AND r.parentrecname = r1.recname
    )
    SELECT /*+LEADING(Q)*/ q.*, x.selnum
    , x.rcdnum1, x.recname1, x.corrname1
    , x.rcdnum2, x.recname2, x.corrname2, x.num_key_fields
    FROM x
      INNER JOIN q ON q.oprid = x.oprid AND q.qryname = x.qryname
    WHERE num_key_fields = 0
    AND exec_secs >= 600
    ORDER BY exec_secs desc
    
    /

    Now I have a list of candidate queries that have been used recently and may be missing joins that I investigate further.

                                                                                              Sel Rec1                    Cor Rec2                    Cor #Key
    OPRID     QRYNAME                         EXEC_SECS  NUM_EXECS COMPLETE_EXECS  RUNOPRIDS    #    # Record 1           #1     # Record 2           #2  Flds
    --------- ------------------------------ ---------- ---------- -------------- ---------- ---- ---- ------------------ --- ---- ------------------ --- ----
    UKXXXXXXX AR_VENDOR_LOCATION_DETAILB         264317        361            360          1    1    1 VENDOR             A      8 VNDR_LOC_SCROL     H      0
              XX_COL_MOV_ALT_ACCT2_PERIO         193692       2096           2051         14    1    1 JRNL_HEADER        A      3 OPEN_ITEM_GL       C      0
              APC_123_LEDGER_ACTIVITY_BY_BU      151438       2959           2938         73    2    1 JRNL_HEADER        B      2 JRNL_LN            C      0
    MXXXXXX   MT_AUSTRALIA_TAX_PMTS              137471         36             28          1    1    1 JRNL_HEADER        A      2 JRNL_LN            B      0
              XX_PAN_ASIA_JOURNALS_REF           135825         48             47          4    1    1 JRNL_HEADER        A      5 JRNL_OPENITM_VW    E      0
              XXX_STKCOMP_LIFE                   120537        526            523          1    1    1 JRNL_HEADER        A      2 JRNL_LN            B      0
              XXX_123_TB_LEDGER_BAL_BU           100848       2093           2044         17    3    1 JRNL_HEADER        B      2 JRNL_LN            C      0
    KXXXXXX   XXX_JRNL_LIST_AUDIT_KL              99843        489            482          1    1    2 JRNL_HEADER        B      1 JRNL_DRILL_VW      A      0
              XXX_JE_ID_QUERY                     86106        156            151          1    1    1 JRNL_HEADER        A      2 JRNL_LN            C      0
              XXX_ACTIVITY_DETAILS_2              85356        336            302          5    1    1 JRNL_HEADER        A      2 JRNL_LN            B      0
    …

    Anyone can inspect any public queries, but you must be logged in as the owner of a private query to be able to see it.

    The scripts in this article can be downloaded from GitHub davidkurtz/psscripts.

    Monday, November 25, 2024

    PeopleSoft PS/Query: Identify Long Running Queries (on Process Schedulers)

    This is the first of a series in which I will share some of my PeopleSoft scripts, and explain how they work.
    Many PeopleSoft users like its ad hoc query tool because they can write their own queries directly on the system, without having to learn to write structured query language (SQL), or getting a developer to write it for them.  

    What is the Problem?

    This tool is disliked and even feared by database administrators (DBAs) and system administrators, because it is easy for users to create poor queries, that either don't work as intended or can run for long periods, sometimes indefinitely, without even producing results.  This can consume significant amounts of CPU.

    Managing Queries Scheduled on the Process Scheduler

    The PSQUERY application engine program runs queries on the process scheduler. Users should be encouraged to use this rather than running them online.  
    Queries run online via the PeopleSoft Internet Architecture (PIA) cannot be managed.  
    • There is no limit to the number of queries that users can initiate concurrently.  
    • The number that can actually execute concurrently is limited by the number of PSQRYSRV processes in each application server domain.  Any additional requests will simply queue up in Tuxedo.
    • It is possible to set maximum execution times in the PeopleSoft configuration, on the ICQuery service on the PSQRYSRV server in the application server.  
    It is easier to manage and monitor the queries run in PSQUERY processes on the process scheduler.  They don't put any load on the PIA, but they put load on the database.
    • A system-wide maximum number of concurrently executing instances of the application engine program can be set on the process definition.
    • A maximum number of concurrently executing instances of the application engine program per process scheduler can be set (by using a process class).
    • The application engine, or its process class, can be given a lower priority so that other queued processes are run in preference.

    PS/Queries run either in the PIA or on the process scheduler can be mapped to low-priority consumer groups in an Oracle database resource manager plan so that they do not starve the rest of the system of CPU (see PeopleSoft DBA Blog: PSFT_PLAN: A Sample Oracle Database Resource Manager Plan for PeopleSoft).
    A maximum run time, or maximum estimated run time, can be defined for a consumer group.  If the limit is breached an Oracle error is raised: ORA-00040: active time limit exceeded - call aborted. In the PIA, the error message is simply presented to the user.  The scheduled PSQUERY application engine process will terminate and the error will be logged.  In both cases, the user has to recognise the error message and understand what it means.  Otherwise, they will raise the issue with support.
    The various methods of setting maximum execution time limits are quite blunt instruments.  They are essentially one-size-fits-all approaches.  Typically, some queries are expected to run for a long time, and then the limits must be set to accommodate them. 

    Queries Scheduled on the Process Scheduler

    I can query who has run which queries, and how long they ran for.  Simply outer join the run control record for the PSQUERY application engine (PS_QUERY_RUN_CNTL) to the process scheduler request table (PSPRCSRQST).

    In this case, I am interested in 
    the top 50 PS/Queries by cumulative execution
    with a cumulative execution time of over 5 minutes (300s)
    that were scheduled yesterday between 8am and 7pm
    REM qry_missingjoins.sql
    WITH x as (
    SELECT r.prcsinstance, r.oprid, r.runcntlid
    ,      DECODE(c.private_query_flag,'Y','Private','N','Public') private_query_flag, c.qryname
    ,      CAST(begindttm AS DATE) begindttm
    ,      CAST(enddttm AS DATE) enddttm
    ,      runstatus
    ,      (CAST(NVL(enddttm,SYSDATE) AS DATE)-CAST(begindttm AS DATE))*86400 exec_Secs
    FROM   psprcsrqst r
      LEFT OUTER JOIN ps_query_run_cntrl c ON c.oprid = r.oprid AND c.run_cntl_id = r.runcntlid
    WHERE  prcsname = 'PSQUERY'
    AND    r.begindttm >= TRUNC(SYSDATE)-0+8/24 /*from 8am*/
    AND    r.begindttm <= TRUNC(SYSDATE)-0+19/24 /*to 7pm*/
    )
    SELECT x.* FROM x
    WHERE  exec_Secs >= 300 /*Over 5 minutes*/
    ORDER BY exec_secs desc /*descending order of elapsed time*/
    FETCH FIRST 50 ROWS ONLY /*top 50 ROWS ONLY*/
    /
    I now have a profile of top queries that I can use to direct further investigation.
      Process                                           Private                                                                    Run     Exec
     Instance OPRID      RUNCNTLID                      Query   QRYNAME                        BEGINDTTM         ENDDTTM           Stat    Secs
    --------- ---------- ------------------------------ ------- ------------------------------ ----------------- ----------------- ---- -------
     12344471 F******    ***AM_FIN_GL_AP                Public  ***AM_FIN_GL_AP                10:06:21 19.**.** 19:08:52 19.**.** 8      32551
     12344342 N******    ownxxxxxxxxxxxx                Public  ***_TRIAL_BALANCE_BY_BU_***_V2 09:41:58 19.**.** 18:20:09 19.**.** 10     31091
     12344336 N******    ojnxxxxxxxxxx                  Public  ***_TRIAL_BALANCE_BY_BU_***    09:40:27 19.**.** 16:51:11 19.**.** 10     25844
     12345209 N******    eowxxxxxxxxxxxxx               Public  ***_TRIAL_BALANCE_BY_BU_***    12:41:17 19.**.** 19:08:30 19.**.** 8      23233
     12345213 N******    iwoxxxxxxxxxxxxx               Public  ***_TRIAL_BALANCE_BY_BU_***_V2 12:41:53 19.**.** 19:08:56 19.**.** 8      23223
     12345574 B******    gl                             Private ***_GL_BJU                     14:27:32 19.**.** 19:08:59 19.**.** 8      16887
     12345681 B******    gl                             Private ***_GL_BJU                     14:51:06 19.**.** 19:09:02 19.**.** 8      15476
     12345852 W******    insolvents                     Public  ***INSOLVENTS_JRNL_DETAIL      15:24:41 19.**.** 19:09:04 19.**.** 8      13463
    …
                                                                                                                                        -------
    sum                                                                                                                                  268112
    Notes: 
    • Some details have been redacted from this real-world example.
    • The result is not guaranteed to be completely accurate.  A user might have reused a run control record and can only get the current value.
    • This and other scripts can be downloaded from GitHub davidkurtz/psscripts.