This follows on from my previous post, Management of Long Running PS/Queries Cancelled by Resource Manager.
From 19c, on Engineered Systems only (such as Exadata and Exadata Cloud Service) the 'Oracle Database automatically quarantines the plans for SQL statements terminated by … the Resource Manager for exceeding resource limits.
The Resource Manager can set a maximum estimated execution time for a SQL statement, for example, 20 minutes. If a statement execution exceeds this limit, then the Resource Manager terminates the statement. However, the statement may run repeatedly before being terminated, wasting 20 minutes of resources each time it is executed.
Starting in Oracle Database 19c, if a statement exceeds the specified resource limit, then the Resource Manager terminates the execution and “quarantines” the plan. To quarantine the plan means to put it on a blacklist of plans that the database will not execute. Note that the plan is quarantined, not the statement itself.'
[Oracle SQL Tuning Guide: 4.7 About Quarantined SQL Plans]
- See also Tim Hall's article: SQL Quarantine in Oracle Database 19c
When an attempt is made to execute a quarantined execution plan an error is produced: ORA-56955: quarantined plan used.
Oracle does not log timed-out or quarantined queries. On V$SQL and V$SQLSTAT, AVOIDED_EXECUTIONS records the number of times a SQL query has been prevented from running. However, this will not stay in the library cache long on a PeopleSoft system, due to the continual parse of dynamically generated SQL statements. As of Oracle 19.20, it is not recorded in AWR on DBA_HIST_SQLSTAT.
If an error condition occurs during a PSQUERY process run on the process scheduler, the process terminates with an error. The SQL statement and the error message are recorded in the Application Engine message log. As demonstrated in the previous blog, we can detect such failures by inspecting the message log of the PSQUERY process that did not finish successfully (ie. it has an end time, but the run status does not indicate success).
Matching Quarantine Directives to Cancelled Queries
Quarantine directives are visible via DBA_SQL_QUARANTINE, including SQL text and execution plan hash value.
It would be useful to know which quarantine directive relates to which query. However, it is not easy to match the SQL in the PeopleSoft message log entries with that in the quarantine entries. The SQL text in the message log can have multiple spaces. These are stripped out in the DBA_SQL_QUARANTINE view where the normalised SQL statement is visible.
The timestamp of creation and last execution of the quarantine directive is stored on it, but matching these to when the query was running can result in false positives.
Also, you cannot tell which quarantine directive was created by which consumer group. The maximum CPU timeout is recorded on DBA_SQL_QUARANTINE. In my example, it is only possible to distinguish the originating consumer group because the two consumer groups happen to have different timeouts.
A method that matches exactly, but only returns partial rows is to:
- Obtain ASH data for queries terminated by the resource manager. It can be matched by timestamp, MODULE, and ACTION (provided that EnableAEMonitoring is enabled).
- Profile the ASH to find the statement that took the longest during each PSQUERY process, and that is almost certain to be the SQL query. Thus obtaining the SQL_ID, SQL Plan Hash Value and consumer group ID. It is also possible to determine the total database time for the query, and the database time spent on CPU.
- The consumer group name can then be obtained from DBA_HIST_RSRC_CONSUMER_GROUP
- Obtain the SQL text for the long-running query. It would also have to be captured by an AWR snapshot. This does often occur because it was a long-running SQL, but it is not certain.
- The signature for the SQL statement (not the force-matching signature) can be derived using the SQLTEXT_TO_SIGNATURE function in DBMS_SQLTUNE. This can be matched to the signature recorded in DBA_SQL_QUARANTINE.
- You can have multiple quarantine directives for the same signature (i.e. the same SQL statement), each with a different plan hash value.
- NB: The plan hash value on DBA_SQL_QUARANTINE is the adaptive plan hash value (with all of its possible plan alternatives), and therefore it matches SQL_FULL_PLAN_HASH_VALUE in the ASH data, and not SQL_PLAN_HASH_VALUE (the plan that actually executed).
- When a query executes until timed-out, producing ORA-00040, you usually can find the SQL statement in the AWR repository and so generate the signature to exactly match the quarantine record.
- When an attempt is made to run a quarantined statement and execution plan, you usually cannot find the SQL statement because it hasn't run for long enough to produce an ASH sample. Even when it has, you also have to rely on the statement having been captured previously by AWR. Those conditions only come together occasionally.
- We can see the quarantine directives that were created when the resource manager cancelled a query, raising error ORA-00040: active time limit exceeded - call aborted.
- However, where quarantine directives have prevented SQL from executing, raising error ORA-56955: quarantined plan used, the ASH data from the event that originally created the directive has since been purged, so we cannot use it to match directives.
Mon Feb 24 page 1
PS/Queries terminated by Resource Manager/quarantined Execution Plan
Public/ ASH
Private Ru Oracle Exec ASH CPU Message Log
P.I. DBNAME OPRID RUNCNTLID Query QRYNAME St Err. # Secs Secs Secs Date/Time Stamp SQL_ID
--------- ------- -------- ------------------------------ ------- ------------------------------ -- --------- ------ ------ ------ ---------------------------- -------------
SQL Plan Full Plan CPU
Hash Value Hash Value Consumer Group Name SIGNATURE Quarantine Name Time Quarantine Created Quarantine Last Executed
----------- ----------- ------------------------- --------------------- ------------------------------------ ----- ---------------------------- ----------------------------
…
31452465 FSPROD USR0001 GBR_JRNL_LN_DETAIL_ACCT Public GBR_JRNL_LN_DETAIL_ACCT 10 ORA-56955 36 10 10 20-FEB-25 06.28.03.578218 PM 0wm9g6xkys12h
4009529842 653370716 PSQUERY_BATCH_GROUP 5584654620166156419
31451318 FSPROD USR0002 GBR_JRNL_LN_DETAIL_ACCT Public GBR_JRNL_LN_DETAIL_ACCT 10 ORA-56955 36 20-FEB-25 02.36.38.590841 PM
31451292 FSPROD USR0002 GBR_JRNL_LN_DETAIL_ACCT Public GBR_JRNL_LN_DETAIL_ACCT 10 ORA-56955 36 20-FEB-25 02.30.51.777351 PM
31438602 FSPROD USR0003 1 Private DK_GBR_GL_DETAIL_NEW 10 ORA-00040 28316 28275 14203 18-FEB-25 11.39.19.502412 PM 5qrbrf775whky
3446094907 3491308607 PSQUERY_BATCH_GROUP 16266909742923016361 SQL_QUARANTINE_f3gxc76u48u59d019243f 14400 18-FEB-25 11.49.33.091081 PM
31437925 FSPROD USR0004 16 Private TB_TEST2 10 ORA-00040 17684 17654 17541 18-FEB-25 06.09.14.060615 PM 06xqrgj18wp05
4256462904 2062199471 PSQUERY_BATCH_GROUP 6341132966559464532 SQL_QUARANTINE_5h01uuscnrg2n7aeaaaaf 14400 18-FEB-25 06.17.20.679769 PM
31437907 FSPROD USR0004 16 Private TB_TEST2 10 ORA-00040 17694 17695 17592 18-FEB-25 06.04.05.942470 PM 4yurn75y2p0t2
3232504707 121066138 PSQUERY_BATCH_GROUP 4966087806133732884 SQL_QUARANTINE_49usqjjc001hn0737529a 14400 18-FEB-25 06.17.24.869185 PM
…
No comments :
Post a Comment