Tuesday, June 24, 2025

Optimising Journal Line Queries: 2. Exadata System Statistics

This is the second of a series of five articles where I look at the challenges presented by typical queries on the journal line table (PS_JRNL_LN).

  1. Problem Statement
  2. Exadata System Statistics
  3. Partitioning
  4. Compression
  5. Conclusion

Exadata System Statistics 

Many other people have written notes about how Oracle's optimizer costs a full table scan.  This is a selection:

Roughly speaking, the cost calculated by the optimizer that we see in an execution plan is an estimate of the time taken to perform an operation, where the unit of time is the duration of a single block read.  Although that statement is an oversimplification.  There are various guesses and assumptions built into the optimizer's calculation.  The cost-based optimizer looks for the cheapest plan, that ought to be the fastest to execute.  However, in many cases, cost does not correspond to execution time.

Full Scan Cost

The cost of a full table scan is made up of an I/O cost (the time taken to read the blocks from disk) and a CPU cost (the time taken to process the rows).  The I/O cost is the number of multi-block read operations, multiplied by the ratio of the duration of a multi-block read to a single-block read.

  • IO Cost = (HWM / MBRC) . (MREADTIM / SREADTIM)

Where

  • HWM = the high water mark of the segment expressed as a number of blocks
  • MBRC = average multi-block read count for sequential read, in blocks (see parameter DB_FILE_MULTIBLOCK_READ_COUNT).
  • MREADTIME = average time to perform a multi-block read at once (sequential read), in milliseconds
  • SREADTIME = average time to read a single block (random read), in milliseconds

See PL/SQL Packages and Types Reference: DBMS_STATS

The single and multi-block read times are derived from two system statistics, the block size and the multi-block read count.

  • SREADTIM = IOSEEKTIM + DB_BLOCK_SIZE / IOTFRSPEED
  • MREADTIM = IOSEEKTIM + (DB_BLOCK_SIZE * MBRC) / IOTFRSPEED

Where

  • IOSEEKTIM = Seek time + latency time + operating system overhead time, in milliseconds (default 10ms).
  • IOTFRSPEED = I/O transfer speed in bytes per millisecond (or if you prefer KBytes/second)
  • DB_BLOCK_SIZE = block size of the segment (usually 8Kb)

System statistics can be gathered based on actual system behaviour using DBMS_STATS, or set to pre-defined values using DBMS_STATS.GATHER_SYSTEM_STATS.  Over the years many blogs, forums and presentations have discussed the merits or otherwise of collecting or setting system statistics.  

Oracle's position is set out in the Oracle Optimizer Blog: Should You Gather System Statistics?  It can be summarised as: 

  • Do not gather your own system statistics.
  • Use the Oracle-provided defaults.
  • Except on Exadata, where you can consider using the Exadata defaults, and perhaps not even then on a mixed workload. You will have to test this for yourself.

On any Oracle system, the default system statistics can be reset with 

exec DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD');

This sets the system statistics as follows:

  • MBRC=8
  • IOSEEKTIM=10
  • IOTFRSPEED=10

Thus:

SREADTIM = IOSEEKTIM + DB_BLOCK_SIZE / IOTFRSPEED
         = 10 + 8192 / 4096
         = 12 (ms)

MREADTIM = IOSEEKTIM + (DB_BLOCK_SIZE * MBRC) / IOTFRSPEED
         = 10 + (8192 * 8 ) / 4096
         = 10 + 16 
         = 26 (ms)

However, on Exadata, you can set 'system statistics take into account the unique capabilities of Oracle Exadata, such as large I/O size and high I/O throughput

exec DBMS_STATS.GATHER_SYSTEM_STATS('EXADATA');

Some system statistics are then set differently:

  • MBRC=128
  • IOSEEKTIM=10
  • IOTFRSPEED=204800

Thus

SREADTIM = IOSEEKTIM + DB_BLOCK_SIZE / IOTFRSPEED
         = 10 + 8192 / 204800
         = 10.04 (ms)

MREADTIM = IOSEEKTIM + (DB_BLOCK_SIZE * MBRC) / IOTFRSPEED
         = 10 + (8192 . 128 ) / 204800
         = 10 + 5.1200 
         = 15.12000 (ms)

Now, I can put these numbers back into the formula Oracle uses to calculate the I/O cost of a full scan.

  • IO Cost = (HWM / MBRC) . (MREADTIM / SREADTIM)

Let us suppose that we are going to read 100M blocks.  The I/O cost of that scan will be very different with Exadata system statistics rather than the normal default system statistics.

Normal IO Cost  = (100000000/8) . (26/12)
                = 27,083,333.3
 
Exadata IO Cost = (100000000/128) . (15.12/10.04)
                = 1,176,543.8

Thus, introducing Exadata system statistics significantly reduces the cost of the full scan, making the database more likely to use a full scan than index lookups.  That may or may not be desirable.

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                |   428K|    93M|       |  2834K  (8)| 00:01:51 |       |       |
|   1 |  HASH GROUP BY                 |                |   428K|    93M|   111M|  2834K  (8)| 00:01:51 |       |       |
|*  2 |   HASH JOIN                    |                |   428K|    93M|    46M|  2819K  (8)| 00:01:51 |       |       |
|   3 |    JOIN FILTER CREATE          | :BF0001        |   428K|    41M|       |  1476   (7)| 00:00:01 |       |       |
|   4 |     PART JOIN FILTER CREATE    | :BF0000        |   428K|    41M|       |  1476   (7)| 00:00:01 |       |       |
|*  5 |      TABLE ACCESS STORAGE FULL | PS_JRNL_HEADER |   428K|    41M|       |  1476   (7)| 00:00:01 |       |       |
|   6 |    JOIN FILTER USE             | :BF0001        |  1120K|   136M|       |  2817K  (8)| 00:01:51 |       |       |
|   7 |     PARTITION RANGE JOIN-FILTER|                |  1120K|   136M|       |  2817K  (8)| 00:01:51 |:BF0000|:BF0000|
|*  8 |      TABLE ACCESS STORAGE FULL | PS_JRNL_LN     |  1120K|   136M|       |  2817K  (8)| 00:01:51 |:BF0000|:BF0000|
-------------------------------------------------------------------------------------------------------------------------

If I look at the same example query that I used earlier, then with the Exadata default system statistics, the cost has come down significantly (from 66M to 2817K).  It is a significant improvement, but it is still greater than the cost of the nested loop (1730K).  Therefore, for this query, I still only get this execution plan if I hint the statement to force it.  I still need to make the full scan cheaper.

Different queries will have different costs and will flip between the nested loop and Full scan/Bloom filter/hash join at different points.

Non-Exadata System Statistics 

If you are not running on Exadata, then the advice from Oracle is clear and unambiguous: use the default system statistics that can be reset with 
exec DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD');
You will have to consider other techniques to reduce the cost of the full table scan.

Monday, June 23, 2025

Optimising Journal Line Queries: 1. Problem Statement

In each PeopleSoft product, certain tables usually grow to become the largest in the implementations at most customers. The challenges they present and the options for dealing with them also are common to most systems.  Most PeopleSoft Financials systems use General Ledger.  In General Ledger, the ledger, summary ledger and journal line tables are usually the largest tables, and present the biggest challenges.

This is the first of five articles where I look at the challenges typically presented by queries on the journal line table (PS_JRNL_LN).

  1. Problem Statement
  2. Exadata System Statistics
  3. Partitioning
  4. Compression
  5. Conclusion

Problem Statement

In General Ledger, we typically see many queries on the ledger (or summary ledger) tables and also queries in the application, drill-down queries in nVision reporting, and ad-hoc PS/Queries that query details of journals posted to the ledger. Below is part of a typical query. The statement and execution plans below were taken from a PeopleSoft Financials system.  It is running on Oracle 19c on Exadata.  Making use of Exadata features will also be a topic.
SELECT A.FISCAL_YEAR, A.ACCOUNTING_PERIOD, B.BUSINESS_UNIT, B.JOURNAL_ID, TO_CHAR(B.JOURNAL_DATE,'YYYY-MM-DD'), B.LEDGER, B.ACCOUNT
, B.PRODUCT, B.PROJECT_ID, B.CHARTFIELD1, B.CHARTFIELD2, B.CURRENCY_CD, B.AFFILIATE, SUM( B.MONETARY_AMOUNT), B.FOREIGN_CURRENCY
, SUM( B.FOREIGN_AMOUNT), A.REVERSAL_CD, A.REVERSAL_ADJ_PER, A.JRNL_HDR_STATUS, TO_CHAR(A.POSTED_DATE,'YYYY-MM-DD'), A.OPRID, A.DESCR254
, B.DEPTID, A.SOURCE, B.ALTACCT, TO_CHAR(CAST((A.DTTM_STAMP_SEC) AS TIMESTAMP),'YYYY-MM-DD-HH24.MI.SS.FF'), B.LINE_DESCR 
FROM PS_JRNL_HEADER A, PS_JRNL_LN B 
WHERE (A.BUSINESS_UNIT = B.BUSINESS_UNIT 
AND A.JOURNAL_ID = B.JOURNAL_ID 
AND A.JOURNAL_DATE = B.JOURNAL_DATE 
AND A.UNPOST_SEQ = B.UNPOST_SEQ 
AND A.JRNL_HDR_STATUS IN('P','V','U') 
AND A.FISCAL_YEAR IN (2024) 
AND A.ACCOUNTING_PERIOD BETWEEN 1 AND 12 
AND B.CHARTFIELD1 IN ('1234567','1234568','1234569')
AND B.LEDGER IN ('LEDGER')) 
GROUP BY A.FISCAL_YEAR, A.ACCOUNTING_PERIOD, B.BUSINESS_UNIT, B.JOURNAL_ID, B.JOURNAL_DATE, B.LEDGER, B.ACCOUNT, B.PRODUCT
, B.PROJECT_ID, B.CHARTFIELD1, B.CHARTFIELD2, B.CURRENCY_CD, B.AFFILIATE, B.FOREIGN_CURRENCY, A.REVERSAL_CD, A.REVERSAL_ADJ_PER
, A.JRNL_HDR_STATUS, A.POSTED_DATE, A.OPRID, A.DESCR254, B.DEPTID, A.SOURCE, B.ALTACCT, A.DTTM_STAMP_SEC, B.LINE_DESCR
  • The journal line table (PS_JRNL_LN) is joined to its parent, the journal header table (PS_JRNL_HEADER), by the 4 key columns on the journal header (BUSINESS_UNIT, JOURNAL_ID, JOURNAL_DATE, UNPOST_SEQ).
  • There are criteria on both the journal header and line tables.
    • The number of journal line rows per header is usually highly variable, and it also varies from customer to customer depending on the shape of their data.  It is not unusual to see thousands of journal line rows per header row.  Filtering it by FISCAL_YEAR and perhaps also ACCOUNTING_PERIOD could be very effective.  However, these columns are on PS_JRNL_HEADER, and not on PS_JRNL_LN. 
    • Queries often include criteria on other attribute columns on PS_JRNL_LN.  However, these columns are not indexed by default, though many customers add such indexes.
Below is the execution plan that Oracle produced for this query.  The optimizer chose to full scan the PS_JRNL_HEADER table, expecting to find 428,000 rows, and then do an index look-up on PS_JRNL_LN for each header row.  The optimizer predicts that the query will run for about 68s.  In practice, this query runs for over an hour, spending most of its time on the index lookup of PS_JRNL_LN by its unique index of the same name.
------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                |   428K|    93M|       |  1730K  (1)| 00:01:08 |       |       |
|   1 |  HASH GROUP BY                      |                |   428K|    93M|   111M|  1730K  (1)| 00:01:08 |       |       |
|   2 |   NESTED LOOPS                      |                |   428K|    93M|       |  1715K  (1)| 00:01:07 |       |       |
|   3 |    NESTED LOOPS                     |                |   428K|    93M|       |  1715K  (1)| 00:01:07 |       |       |
|*  4 |     TABLE ACCESS STORAGE FULL       | PS_JRNL_HEADER |   428K|    41M|       |  1476   (7)| 00:00:01 |       |       |
|   5 |     PARTITION RANGE ITERATOR        |                |     1 |       |       |     3   (0)| 00:00:01 |   KEY |   KEY |
|*  6 |      INDEX RANGE SCAN               | PS_JRNL_LN     |     1 |       |       |     3   (0)| 00:00:01 |   KEY |   KEY |
|*  7 |    TABLE ACCESS BY LOCAL INDEX ROWID| PS_JRNL_LN     |     1 |   128 |       |     4   (0)| 00:00:01 |     1 |     1 |
------------------------------------------------------------------------------------------------------------------------------
If I add hints to the SQL statement forcing it to full scan PS_JRNL_HEADER and generate a Bloom filter that it will apply during a full scan of PS_JRNL_LN, then the cost of the execution plan goes up (from 1730K to 66M), and the estimated run time goes up to 43 minutes, due to the cost of the full scan.  However, the actual execution time comes down to under 3 minutes.
-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                |   428K|    93M|       |    66M  (1)| 00:43:30 |       |       |
|   1 |  HASH GROUP BY                 |                |   428K|    93M|   111M|    66M  (1)| 00:43:30 |       |       |
|*  2 |   HASH JOIN                    |                |   428K|    93M|    46M|    66M  (1)| 00:43:30 |       |       |
|   3 |    JOIN FILTER CREATE          | :BF0001        |   428K|    41M|       | 32501   (1)| 00:00:02 |       |       |
|   4 |     PART JOIN FILTER CREATE    | :BF0000        |   428K|    41M|       | 32501   (1)| 00:00:02 |       |       |
|*  5 |      TABLE ACCESS STORAGE FULL | PS_JRNL_HEADER |   428K|    41M|       | 32501   (1)| 00:00:02 |       |       |
|   6 |    JOIN FILTER USE             | :BF0001        |  1132K|   137M|       |    66M  (1)| 00:43:28 |       |       |
|   7 |     PARTITION RANGE JOIN-FILTER|                |  1132K|   137M|       |    66M  (1)| 00:43:28 |:BF0000|:BF0000|
|*  8 |      TABLE ACCESS STORAGE FULL | PS_JRNL_LN     |  1132K|   137M|       |    66M  (1)| 00:43:28 |:BF0000|:BF0000|
-------------------------------------------------------------------------------------------------------------------------
I could solve the problem by adding hints either directly to the statement or with a SQL profile or SQL patch.  However, this is one of many statements, and the users will continuously produce more with the ad-hoc PS/Query tool.  I need a generic solution.
The Cost-Based Optimizer chose the nested loop plan because it calculated that it was cheaper.  However, that does not correspond to the change in the actual execution time.  The plan that was executed more quickly cost more, indicating a problem with the cost model.
  • Why is the full scan so expensive?  
  • How can I make it cheaper?

Thursday, May 08, 2025

Logging Run Controls and Bind Variables for Scheduled PS/Queries

This blog proposes additional logging for scheduled PS/Queries so that long-running queries can be reconstructed and analysed.

Previous blog posts have discussed limiting PS/Query runtime with the resource manager (see Management of Long Running PS/Queries Cancelled by Resource Manager CPU Limit).  From 19c, on Engineered Systems only, the 'Oracle Database automatically quarantines the plans for SQL statements terminated by … the Resource Manager for exceeding resource limits'.  SQL Quarantine is enabled by default in Oracle 19c on Exadata (unless patch 30104721 is applied that backports the new 23c parameters, see Oracle Doc ID 2635030.1: 19c New Feature SQL Quarantine - How To Stop Automatic SQL Quarantine).

What is the Problem?

SQL Quarantine prevents a query from executing.  Therefore, AWR will not capture the execution plan.  AWR will also purge execution plans where an execution has not been captured within the AWR retention period.  The original long-running query execution that was quarantined, if captured by AWR, will be aged out because it will not execute again.

If we want to investigate PS/Queries that produced execution plans that exceeded the runtime limit and were then quarantined, we need to reproduce the execution plan, either with the EXPLAIN PLAN FOR command or by executing the query in a session where the limited resource manager consumer group does not apply.

However, PS/Queries with bind variables present a challenge.  A PS/Query run with different bind variables can produce different execution plans.  One execution plan might be quarantined and so never complete, while another may complete within an acceptable time.  

In AWR, a plan is only captured once for each statement.  Therefore, it is possible to find one set of bind variables for each plan, although there may be many sets of bind variables that all produce the same execution plan.  However, we cannot obtain Oracle bind variables for quarantined execution plans that did not execute.  To regenerate their execution plans, we need another way to obtain their bind variables.

This problem occurs more generally where the Diagnostics Pack is not available, then it is not possible to reconstruct long-running queries without additional logging or tracing.

Solution

Scheduled PS/Queries are executed by the PSQUERY application engine.  The name of the query and the bind variables are passed via two run control records.  Users typically reuse an existing run control but provide different bind variable values.  I propose to introduce two tables to hold a copy of the data in these tables for each process instance.
  • PS_QUERY_RUN_CNTRL: Scheduled Query Run Control.  This record identifies the query executed.  Rows in this table will be copied to PS_QRYRUN_CTL_HST.
  • PS_QUERY_RUN_PARM: Scheduled Query Run Parameters.  This record holds the bind variables and the values passed to the query.  The table contains a row for each bind variable for each execution.  Rows in this table will be copied to PS_QRYRUN_PARM_HST

Two database triggers manage the history tables:

  • A database trigger that fires when the run status of the request is updated to '7' (processing).  It copies rows for the current run control into two corresponding history tables.  Thus, we will have a log of every bind variable for every scheduled query.
  • A second database trigger will fire when a PSQUERY request record is deleted.  It deletes the corresponding rows from these history tables.

When a PS/Query produces a quarantined execution plan, the PSQUERY process terminates with error ORA-56955: quarantined plan used (see Quarantined SQL Plans for PS/Queries).  Now we can obtain the bind variables that resulted in attempts to execute a quarantined query execution plan.

Implementation

The following script (ps_query_run_cntrl_hist_trigger.sql) creates the tables and triggers.  

REM ps_query_run_cntrl_hist_trigger.sql
REM 21.4.2025 - trigger and history tables to capture 
set echo on serveroutput on timi on
clear screen
spool ps_query_run_cntrl_hist_trigger
rollback;

CREATE TABLE SYSADM.PS_QRYRUN_CTL_HST 
  (PRCSINSTANCE INTEGER  DEFAULT 0 NOT NULL,
   OPRID VARCHAR2(30)  DEFAULT ' ' NOT NULL,
   RUN_CNTL_ID VARCHAR2(30)  DEFAULT ' ' NOT NULL,
   DESCR VARCHAR2(30)  DEFAULT ' ' NOT NULL,
   QRYTYPE SMALLINT  DEFAULT 1 NOT NULL,
   PRIVATE_QUERY_FLAG VARCHAR2(1)  DEFAULT 'N' NOT NULL,
   QRYNAME VARCHAR2(30)  DEFAULT ' ' NOT NULL,
   URL VARCHAR2(254)  DEFAULT ' ' NOT NULL,
   ASIAN_FONT_SETTING VARCHAR2(3)  DEFAULT ' ' NOT NULL,
   PTFP_FEED_ID VARCHAR2(30)  DEFAULT ' ' NOT NULL) TABLESPACE PTTBL
/
CREATE UNIQUE  iNDEX SYSADM.PS_QRYRUN_CTL_HST 
ON SYSADM.PS_QRYRUN_CTL_HST (PRCSINSTANCE) TABLESPACE PSINDEX PARALLEL NOLOGGING
/
ALTER INDEX SYSADM.PS_QRYRUN_CTL_HST NOPARALLEL LOGGING
/
CREATE TABLE SYSADM.PS_QRYRUN_PARM_HST 
  (PRCSINSTANCE INTEGER  DEFAULT 0 NOT NULL,
   OPRID VARCHAR2(30)  DEFAULT ' ' NOT NULL,
   RUN_CNTL_ID VARCHAR2(30)  DEFAULT ' ' NOT NULL,
   BNDNUM SMALLINT  DEFAULT 0 NOT NULL,
   FIELDNAME VARCHAR2(18)  DEFAULT ' ' NOT NULL,
   BNDNAME VARCHAR2(30)  DEFAULT ' ' NOT NULL,
   BNDVALUE CLOB) TABLESPACE PSIMAGE2 
/
CREATE UNIQUE  iNDEX SYSADM.PS_QRYRUN_PARM_HST 
ON SYSADM.PS_QRYRUN_PARM_HST (PRCSINSTANCE, BNDNUM) TABLESPACE PSINDEX PARALLEL NOLOGGING
/
ALTER INDEX SYSADM.PS_QRYRUN_PARM_HST NOPARALLEL LOGGING
/
  • PSQUERY is not a restartable Application Engine program.  Therefore, there is no risk of duplicate inserts into the history tables.
  • The exception handlers in the triggers deliberately suppress any error, in case that causes the process scheduler to crash.
CREATE OR REPLACE TRIGGER sysadm.query_run_cntrl_hist_ins
BEFORE UPDATE OF runstatus ON sysadm.psprcsrqst
FOR EACH ROW
WHEN (new.runstatus ='7' AND old.runstatus != '7' AND new.prcsname = 'PSQUERY' AND new.prcstype = 'Application Engine')
BEGIN
  INSERT INTO PS_QRYRUN_CTL_HST 
  (PRCSINSTANCE, OPRID, RUN_CNTL_ID, DESCR ,QRYTYPE, PRIVATE_QUERY_FLAG, QRYNAME, URL, ASIAN_FONT_SETTING, PTFP_FEED_ID)
  SELECT :new.prcsinstance, OPRID, RUN_CNTL_ID, DESCR ,QRYTYPE, PRIVATE_QUERY_FLAG, QRYNAME, URL, ASIAN_FONT_SETTING, PTFP_FEED_ID 
  FROM ps_query_run_cntrl WHERE oprid = :new.oprid AND run_cntl_id = :new.runcntlid;
  
  INSERT INTO PS_QRYRUN_PARM_HST
  (PRCSINSTANCE, OPRID, RUN_CNTL_ID, BNDNUM, FIELDNAME, BNDNAME, BNDVALUE) 
  SELECT :new.prcsinstance prcsinstance, OPRID, RUN_CNTL_ID, BNDNUM, FIELDNAME, BNDNAME, BNDVALUE
  FROM ps_query_run_parm WHERE oprid = :new.oprid AND run_cntl_id = :new.runcntlid;

  EXCEPTION WHEN OTHERS THEN NULL; --exception deliberately coded to suppress all exceptions 
END;
/

CREATE OR REPLACE TRIGGER sysadm.query_run_cntrl_hist_del
BEFORE DELETE ON sysadm.psprcsrqst
FOR EACH ROW
WHEN (old.prcsname = 'PSQUERY' AND old.prcstype = 'Application Engine')
BEGIN
  DELETE FROM PS_QRYRUN_CTL_HST WHERE prcsinstance = :old.prcsinstance;
  DELETE FROM PS_QRYRUN_PARM_HST WHERE prcsinstance = :old.prcsinstance;

  EXCEPTION WHEN OTHERS THEN NULL; --exception deliberately coded to suppress all exceptions
END;
/ 
show errors

spool off

Example

When a query is scheduled to run on the process scheduler, the bind variables are specified through this generic dialogue.

Scheduled Query Bind Variable Diaglogue

Once the PSQUERY process has started (it immediately commits its update to RUNSTATUS), these values are written to the new history tables.

select * from ps_qryrun_ctl_hst;

PRCSINSTANCE OPRID    RUN_CNTL_ID  DESCR                             QRYTYPE P QRYNAME                        URL                                                ASI PTFP_FEED_ID                  
------------ -------- ------------ ------------------------------ ---------- - ------------------------------ -------------------------------------------------- --- ------------------------------
    12345678 ABCDEF   042225       Journal Line Detail - Account           1 N XXX_JRNL_LINE_DTL_ACCT         https://xxxxxxx.yyyyy.com/psp/XXXXXXX/EMPLOYEE/ERP                                   

select * from ps_qryrun_ctl_hst;

PRCSINSTANCE OPRID    RUN_CNTL_ID  BNDNUM FIELDNAME          BNDNAME              BNDVALUE                      
------------ -------- ------------ ------ ------------------ -------------------- ------------------------------
    12345678 ABCDEF   042225            1 bind1              BUSINESS_UNIT        354XX 
    12345678 ABCDEF   042225            2 bind2              BUSINESS_UNIT        354XX 
    12345678 ABCDEF   042225            3 FISCAL_YEAR        FISCAL_YEAR          2025 
    12345678 ABCDEF   042225            4 ACCOUNTING_PD_FROM ACCOUNTING_PD_FROM   2 
    12345678 ABCDEF   042225            5 ACCOUNTING_PD_TO   ACCOUNTING_PD_TO     2 
    12345678 ABCDEF   042225            6 bind6              ACCOUNT              23882XXXXX 
    12345678 ABCDEF   042225            7 bind7              ACCOUNT              23882XXXXX 
    12345678 ABCDEF   042225            8 bind8              ALTACCOUNT           23882XXXXX 
    12345678 ABCDEF   042225            9 bind9              ALTACCOUNT           23882XXXXX

Conclusion

If the query is quarantined, PSQUERY will terminate with error ORA-56955: quarantined plan used. The SQL statement can be extracted from the message log, and the execution plan can be generated with the EXPLAIN PLAN FOR command, using the bind variable values captured in the history tables.

Note: The signature of the SQL Quarantine directive is the exact matching signature of the SQL text (it can be generated from the SQL text with dbms_sqltune.sqltext_to_signature).  There can be multiple PLAN_HASH_VALUEs for the same signature (because there can be multiple execution plans for the same SQL). Verify that the FULL_PLAN_HASH_VALUE of the execution plan generated with the captured bind variables corresponds to the PLAN_HASH_VALUE of a SQL Quarantine directive.

Thursday, March 06, 2025

A Resource Manager CPU Time Limit for PS/Queries Executed On-line in the PIA

 In previous posts, I have proposed:

This blog looks at whether a limit could also be set for PS/Queries run via the PeopleSoft Internet Architecture (PIA).

The main objection to having the database terminate a PS/Query running in the PIA is that the resulting Oracle error message will be displayed in the browser without any further explanation.  Therefore, I think it is better to allow the PIA to handle termination of the query.  However, I also think that it would be prudent to prevent queries continuing to run in background after the client session has been terminated.

Timeout Settings for PS/Query

The inactivity timeout in the PIA, is delivered at 20 minutes (1200 seconds), with a warning 2 minutes earlier.  Then the user's session in the PIA is terminated.

There are timeouts on every Tuxedo service.  In the PeopleSoft Application Server configuration, the service timeout is specified for each type of application server/queue.  

…
[PSQRYSRV]
;=========================================================================
; Settings for PSQRYSRV
;=========================================================================

;-------------------------------------------------------------------------
; UBBGEN settings
Min Instances=3
Max Instances=3
Service Timeout=1200
All the services advertised on the queue are given that timeout.  
…
ICQuery         SRVGRP=APPSRV
                LOAD=50 PRIO=50
{QUERYSRV}
                SVCTIMEOUT={$PSQRYSRV\Service Timeout}
{QUERYSRV}
{!QUERYSRV}
                SVCTIMEOUT={$PSAPPSRV\Service Timeout}
{!QUERYSRV}
                BUFTYPE="ALL"
…

PS/Queries are run by the ICQuery service that is advertised on PSQRYSRV server if configured, and whose service timeout is also delivered set to 1200s.  Otherwise, it is advertised on PSAPPSRV whose timeout is 300s.  

Recommendations
  • PSQRYSRV should always be configured if PS/Queries are to be run online in an application server domain.  Partly, so that it has the longer timeout, and partly so that long running PS/Queries do not block short on-line requests.
  • Please avoid the temptation to increase either of these timeouts.  If a query needs to run for more than 20 minutes, then it should be run on the process scheduler.

When the service timeout is reached, the Tuxedo server process will terminate.  This may not terminate the query on the database until the current fetch operation completes.  If a query involves a large group or sort operation, it can be a long time before the first fetch returns.

Oracle Terminated Connection Timeout (sometimes known as Dead Connect Detection) should be configured by setting SQLNET.EXPIRE_TIME in the SQLNET.ORA file.  Then database shadow process periodically sends a probe to the otherwise idle client, and if the client doesn't respond the session terminates.  However, this process is not perfect.

Therefore, it would be prudent to set a timeout in the consumer group for online PS/Query sessions.  In PSFT_PLAN, that is the PSQUERY_ONLINE_GROUP consumer group.  We don't want the timeout to terminate the session before either ICQuery service times out, nor do we want the query to run on afterwards.  Therefore, the consumer group timeout should be set to the same value as the PSQRYSRV timeout, so also 1200 seconds.

  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    'PSFT_PLAN', 'PSQUERY_ONLINE_GROUP'
    ,mgmt_p6 => 90
    ,switch_group => 'CANCEL_SQL'
    ,switch_time => 1200
    ,switch_for_call => TRUE
  );

If you change the default inactivity timeout, these three settings should all be set to the same value.

If you are not running on Exadata, then it is safe to set this timeout without any further configuration.

Disabling SQL Quarantine on Exadata for PS/Query While Using it Elsewhere

However, on Exadata on 19c, the database will automatically create SQL quarantine directives for statements that exceed the CPU time limit specified in the consumer group and are terminated with error ORA-00040: active time limit exceeded - call aborted. It may take a few minutes for the database to create the quarantine directive - see 2634990.1: 19c New Feature SQL Quarantine Not Work.  Then, the next time the same query generates the same execution plan, it will immediately be terminated with error ORA-56955: quarantined plan used.  Again, we don't want such SQL errors produced in the PIA.

In Oracle 21c, two new parameters have been introduced to control SQL Quarantine.

  • OPTIMIZER_CAPTURE_SQL_QUARANTINE enables or disables the automatic creation of SQL Quarantine configurations.  The default value is FALSE.  
  • OPTIMIZER_USE_SQL_QUARANTINE determines whether the optimizer considers SQL Quarantine configurations when choosing an execution plan for a SQL statement.  The default value is TRUE.

The parameters can be backported to Oracle 19.3 or later by applying patch 30104721 (see Oracle Doc ID 2635030.1: 19c New Feature SQL Quarantine - How To Stop Automatic SQL Quarantine, and Go-Faster Oracle Blog: New Parameters In 21c To Control Automatic SQL Quarantine Can Be Backported To 19c).  Both these parameters can be set at system and session level.

If you want to take advantage of SQL Quarantine, you have to enable it.  However, I suggest leaving the new parameters at their default values at system level, even though this means a change of behaviour in 19c when the patch is applied.

  • OPTIMIZER_CAPTURE_SQL_QUARANTINE = FALSE
  • OPTIMIZER_USE_SQL_QUARANTINE = TRUE

Then set both parameters to TRUE at session level in the session for 

  • PSQUERY processes run on the process scheduler.
  • SQL*Plus and SQL Developer processes.

Ensure they are not set for PIA application server processes

  • PSAPPSRV
  • PSQRYSRV

CPU Timeouts in the Sample PSFT_PLAN Resource Plan

In my sample resource plan for PeopleSoft, three consumer groups now have timeouts.  SQL Quarantine works in conjunction with consumer groups that have CPU timeouts.  I want SQL Quarantine disabled in PSQUERY_ONLINE_GROUP, but enabled in PSQUERY_BATCH_GROUP and LOW_LIMITED_GROUP.

  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    'PSFT_PLAN', 'PSQUERY_ONLINE_GROUP'
    ,mgmt_p6 => 90
    ,switch_group => 'CANCEL_SQL'
    ,switch_time => 1200 /*same as ICQuery service timeout*/
    ,switch_elapsed_time => 1200
    ,switch_estimate => FALSE /*do not timeout on basis of estimated time*/
    ,switch_for_call => TRUE
  );
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    'PSFT_PLAN', 'PSQUERY_BATCH_GROUP'
    ,mgmt_p6 => 1
    ,switch_group => 'CANCEL_SQL'
    ,switch_time => 14400
    ,switch_estimate => TRUE 
    ,switch_for_call => TRUE
  );
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    'PSFT_PLAN', 'LOW_LIMITED_GROUP'
    ,mgmt_p8 => 1
    ,switch_group => 'CANCEL_SQL'
    ,switch_time => 7200
    ,switch_elapsed_time => 7200
    ,switch_estimate => TRUE 
    ,switch_for_call => TRUE
  );

The consumer groups are mapped to sessions by program name, module and action.

DBMS_RESOURCE_MANAGER.set_consumer_group_mapping (attribute => DBMS_RESOURCE_MANAGER.CLIENT_PROGRAM    , value => 'PSQRYSRV%'                 , consumer_group => 'PSQUERY_ONLINE_GROUP');
DBMS_RESOURCE_MANAGER.set_consumer_group_mapping (attribute => DBMS_RESOURCE_MANAGER.MODULE_NAME_ACTION, value => 'QUERY_MANAGER.QUERY_VIEWER', consumer_group => 'PSQUERY_ONLINE_GROUP');

DBMS_RESOURCE_MANAGER.set_consumer_group_mapping (attribute => DBMS_RESOURCE_MANAGER.MODULE_NAME       , value => 'PSQUERY'                   , consumer_group => 'PSQUERY_BATCH_GROUP');
DBMS_RESOURCE_MANAGER.set_consumer_group_mapping (attribute => DBMS_RESOURCE_MANAGER.MODULE_NAME       , value => 'PSAE.PSQUERY.%'            , consumer_group => 'PSQUERY_BATCH_GROUP');

DBMS_RESOURCE_MANAGER.set_consumer_group_mapping (attribute => DBMS_RESOURCE_MANAGER.CLIENT_PROGRAM    , value => 'SQL Developer%'            , consumer_group => 'LOW_LIMITED_GROUP');
DBMS_RESOURCE_MANAGER.set_consumer_group_mapping (attribute => DBMS_RESOURCE_MANAGER.CLIENT_PROGRAM    , value => 'sqlplus%'                  , consumer_group => 'LOW_LIMITED_GROUP');
DBMS_RESOURCE_MANAGER.set_consumer_group_mapping (attribute => DBMS_RESOURCE_MANAGER.CLIENT_PROGRAM    , value => 'Toad%'                     , consumer_group => 'LOW_LIMITED_GROUP');

Enabling SQL Quarantine for Scheduled PSQUERY Processes

Initialisation parameters can be set for processes run on the process scheduler using a trigger on the PSPRCSRQST table.  This technique was described in a previous blog post: Setting Oracle Session Parameters for Specific Process Scheduler Processes.  

When a PeopleSoft process is initiated on the process scheduler, the first thing it does is to update its RUNSTATUS on PSPRCSRQST to '7' indicating that it is processing.  The SET_PRCS_SESS_PARM_TRIG trigger fires on that transition.  It dynamically generates ALTER SESSION commands for the metadata that matches the current process.  

  • Script to create trigger set_prcs_sess_parm_trg.sql 
  • Example script to create metadata: set_prcs_sess_parm.sql.  It includes the following statement that will create metadata for PSQUERY to set the new SQL Quarantine parameters if they are available and if running on Exadata.

----------------------------------------------------------------------------------------------------
--Settings for SQL Quarantine in PSQuery on Exadata -- note SQL checks that parameter is valid
----------------------------------------------------------------------------------------------------
DELETE FROM sysadm.PS_PRCS_SESS_PARM where prcsname = 'PSQUERY' 
AND param_name IN('optimizer_capture_sql_quarantine','optimizer_use_sql_quarantine')
/
----------------------------------------------------------------------------------------------------
INSERT INTO sysadm.PS_PRCS_SESS_PARM (prcstype, prcsname, oprid, runcntlid, keyword, param_name, parmvalue)
with n as ( --returns a row on Exadata only
SELECT  COUNT(DISTINCT cell_name) num_exadata_cells
FROM    v$cell
HAVING  COUNT(DISTINCT cell_name)>0
), x (param_name, keyword, parmvalue) as ( --returns rows if parameters available
select  name, 'SET', 'TRUE' 
from    v$parameter, n
where   name IN('optimizer_capture_sql_quarantine','optimizer_use_sql_quarantine')
), y (prcstype, prcsname, oprid, runcntlid) as (
select  prcstype, prcsname, ' ', ' ' 
from    ps_prcsdefn
where   prcsname = 'PSQUERY'
)
select  y.prcstype, y.prcsname, y.oprid, y.runcntlid, x.keyword, x.param_name, x.parmvalue
from    x,y
/

Enabling/Disabling SQL Quarantine for Other Processes at Logon

This is done with a AFTER LOGON trigger and another metadata data table that is similar to the scheduled process trigger.  The mappings in the metadata in this script must match the mappings for the consumer groups where automatic SQL quarantine capture is required.

TL;DR

  • Set the timeout for the consumer group for PS/Query to be the same value as the ICQuery Tuxedo service timeout (usually also the same value as the PIA inactivity timeout).
  • SQL Quarantine is only available on Exadata.  On other platforms the query just runs to the CPU timeout, by which time the ICQuery service has already timed out.
  • If on Exadata, then 
    • Apply patch 30104721 on 19c to backport new 21c parameters.
    • Leave the quarantine parameters at their default values at system-level
    • Set both parameters to TRUE at session level in the session for 
        • PSQUERY processes run on the process scheduler.
        • SQL*Plus and SQL Developer processes.

Monday, February 24, 2025

Quarantined SQL Plans for PS/Queries

This follows on from my previous post, Management of Long Running PS/Queries Cancelled by Resource Manager.

From 19c, on Engineered Systems only (such as Exadata and Exadata Cloud Service) the 'Oracle Database automatically quarantines the plans for SQL statements terminated by … the Resource Manager for exceeding resource limits.

The Resource Manager can set a maximum estimated execution time for a SQL statement, for example, 20 minutes. If a statement execution exceeds this limit, then the Resource Manager terminates the statement. However, the statement may run repeatedly before being terminated, wasting 20 minutes of resources each time it is executed.

Starting in Oracle Database 19c, if a statement exceeds the specified resource limit, then the Resource Manager terminates the execution and “quarantines” the plan. To quarantine the plan means to put it on a blacklist of plans that the database will not execute. Note that the plan is quarantined, not the statement itself.'

[Oracle SQL Tuning Guide: 4.7 About Quarantined SQL Plans]

When an attempt is made to execute a quarantined execution plan an error is produced: ORA-56955: quarantined plan used.

Oracle does not log timed-out or quarantined queries.  On V$SQL and V$SQLSTAT, AVOIDED_EXECUTIONS records the number of times a SQL query has been prevented from running.  However, this will not stay in the library cache long on a PeopleSoft system, due to the continual parse of dynamically generated SQL statements.  As of Oracle 19.20, it is not recorded in AWR on DBA_HIST_SQLSTAT.  

If an error condition occurs during a PSQUERY process run on the process scheduler, the process terminates with an error.  The SQL statement and the error message are recorded in the Application Engine message log.  As demonstrated in the previous blog, we can detect such failures by inspecting the message log of the PSQUERY process that did not finish successfully (ie. it has an end time, but the run status does not indicate success).

Matching Quarantine Directives to Cancelled Queries

Quarantine directives are visible via DBA_SQL_QUARANTINE, including SQL text and execution plan hash value.

It would be useful to know which quarantine directive relates to which query.  However, it is not easy to match the SQL in the PeopleSoft message log entries with that in the quarantine entries. The SQL text in the message log can have multiple spaces. These are stripped out in the DBA_SQL_QUARANTINE view where the normalised SQL statement is visible.  

The timestamp of creation and last execution of the quarantine directive is stored on it, but matching these to when the query was running can result in false positives.

Also, you cannot tell which quarantine directive was created by which consumer group.  The maximum CPU timeout is recorded on DBA_SQL_QUARANTINE.  In my example, it is only possible to distinguish the originating consumer group because the two consumer groups happen to have different timeouts.

A method that matches exactly, but only returns partial rows is to:

  • Obtain ASH data for queries terminated by the resource manager.  It can be matched by timestamp, MODULE, and ACTION (provided that EnableAEMonitoring is enabled).
    • Profile the ASH to find the statement that took the longest during each PSQUERY process, and that is almost certain to be the SQL query.  Thus obtaining the SQL_ID, SQL Plan Hash Value and consumer group ID.  It is also possible to determine the total database time for the query, and the database time spent on CPU.
    • The consumer group name can then be obtained from DBA_HIST_RSRC_CONSUMER_GROUP
  • Obtain the SQL text for the long-running query.  It would also have to be captured by an AWR snapshot.  This does often occur because it was a long-running SQL, but it is not certain.
  • The signature for the SQL statement (not the force-matching signature) can be derived using the SQLTEXT_TO_SIGNATURE function in DBMS_SQLTUNE.  This can be matched to the signature recorded in DBA_SQL_QUARANTINE.
  • You can have multiple quarantine directives for the same signature (i.e. the same SQL statement), each with a different plan hash value.  
    • NB: The plan hash value on DBA_SQL_QUARANTINE is the adaptive plan hash value (with all of its possible plan alternatives), and therefore it matches SQL_FULL_PLAN_HASH_VALUE in the ASH data, and not SQL_PLAN_HASH_VALUE (the plan that actually executed).
Note that
  • When a query executes until timed-out, producing ORA-00040, you usually can find the SQL statement in the AWR repository and so generate the signature to exactly match the quarantine record.
  • When an attempt is made to run a quarantined statement and execution plan, you usually cannot find the SQL statement because it hasn't run for long enough to produce an ASH sample.  Even when it has, you also have to rely on the statement having been captured previously by AWR.  Those conditions only come together occasionally.
This matching process is done by this query: message_log_checker-psquery2.sql.  Below is a sample output.  
  • We can see the quarantine directives that were created when the resource manager cancelled a query, raising error ORA-00040: active time limit exceeded - call aborted.  
  • However, where quarantine directives have prevented SQL from executing, raising error ORA-56955: quarantined plan used, the ASH data from the event that originally created the directive has since been purged, so we cannot use it to match directives.
Mon Feb 24                                                                                                                                                           page    1
                                                     PS/Queries terminated by Resource Manager/quarantined Execution Plan

                                                          Public/                                                              ASH
                                                          Private                                Ru Oracle      Exec    ASH    CPU Message Log
     P.I. DBNAME  OPRID    RUNCNTLID                      Query   QRYNAME                        St Err. #      Secs   Secs   Secs Date/Time Stamp              SQL_ID
--------- ------- -------- ------------------------------ ------- ------------------------------ -- --------- ------ ------ ------ ---------------------------- -------------
   SQL Plan   Full Plan                                                                                      CPU
 Hash Value  Hash Value Consumer Group Name                   SIGNATURE Quarantine Name                      Time  Quarantine Created           Quarantine Last Executed
----------- ----------- ------------------------- --------------------- ------------------------------------ ----- ---------------------------- ----------------------------
…
 31452465 FSPROD  USR0001  GBR_JRNL_LN_DETAIL_ACCT        Public  GBR_JRNL_LN_DETAIL_ACCT        10 ORA-56955     36     10     10 20-FEB-25 06.28.03.578218 PM 0wm9g6xkys12h
 4009529842   653370716 PSQUERY_BATCH_GROUP         5584654620166156419

 31451318 FSPROD  USR0002  GBR_JRNL_LN_DETAIL_ACCT        Public  GBR_JRNL_LN_DETAIL_ACCT        10 ORA-56955     36               20-FEB-25 02.36.38.590841 PM


 31451292 FSPROD  USR0002  GBR_JRNL_LN_DETAIL_ACCT        Public  GBR_JRNL_LN_DETAIL_ACCT        10 ORA-56955     36               20-FEB-25 02.30.51.777351 PM


 31438602 FSPROD  USR0003  1                              Private DK_GBR_GL_DETAIL_NEW           10 ORA-00040  28316  28275  14203 18-FEB-25 11.39.19.502412 PM 5qrbrf775whky
 3446094907  3491308607 PSQUERY_BATCH_GROUP        16266909742923016361 SQL_QUARANTINE_f3gxc76u48u59d019243f 14400 18-FEB-25 11.49.33.091081 PM

 31437925 FSPROD  USR0004  16                             Private TB_TEST2                       10 ORA-00040  17684  17654  17541 18-FEB-25 06.09.14.060615 PM 06xqrgj18wp05
 4256462904  2062199471 PSQUERY_BATCH_GROUP         6341132966559464532 SQL_QUARANTINE_5h01uuscnrg2n7aeaaaaf 14400 18-FEB-25 06.17.20.679769 PM

 31437907 FSPROD  USR0004  16                             Private TB_TEST2                       10 ORA-00040  17694  17695  17592 18-FEB-25 06.04.05.942470 PM 4yurn75y2p0t2
 3232504707   121066138 PSQUERY_BATCH_GROUP         4966087806133732884 SQL_QUARANTINE_49usqjjc001hn0737529a 14400 18-FEB-25 06.17.24.869185 PM
 …