Tuesday, November 26, 2024

PeopleSoft PS/Query: Finding Users' Cartesian Joins

Many PeopleSoft users like its ad hoc query tool because they can write their own queries directly on the system, without having to learn to write structured query language (SQL), or getting a developer to write it for them.   

What is the Problem?

It is easy for users to create poor queries, that either don't work as intended or can run for long periods, even indefinitely, consuming resources without ever producing results.  This can consume significant amounts of CPU, and in the cloud, that is mostly what you pay for!  The effect can be mitigated with the database's resource manager, but it is better not to do it in the first place.

One cause of long-running queries that I come across is missing join criteria leading the database to perform Cartesian Merge Joins.  I should stress that not all Cartesian joins are evil.  For example, in some data warehouse queries (e.g. GL nVision reporting), it can be a very effective strategy to Cartesian join dimension tables before visiting the fact table, especially if you can use Bloom filter a full scan on the fact table.  It works well with parallel query, and on engineered systems this can also be pushed down to the storage cells.

Finding Execution Plans with Cartesian Joins

The following query profiles database time by execution plan from ASH for SQL statements from PS/Queries run via the PSQUERY application engine program on a process scheduler.  It returns the longest-running statement for each execution plan.

The data is generated and processed through several common table expressions.

  • R returns the PSQUERY processes that ran in the time window of interest
  • P returns the execution plans captured by AWR that generate Cartesian products for which the SQL text is also captured.  
  • X returns the ASH data for Cartesian join executions. When P is joined with the ASH data, then we just get the queries that performed Cartesian joins.
  • Y sums and groups the ASH data by statement and process
  • Z sums the data by execution plan and identifies the longest-running SQL statement for that plan.

REM qry_cartesianplans.sql
WITH r as ( /*processes of interest*/
SELECT /*+MATERIALIZE*/ r.oprid, r.prcsinstance, r.prcsname, r.begindttm, r.enddttm
,      DECODE(c.private_query_flag,'Y','Private','N','Public') private_query_flag, c.qryname
FROM   psprcsrqst r
       LEFT OUTER JOIN ps_query_run_cntrl c ON c.oprid = r.oprid AND c.run_cntl_id = r.runcntlid
WHERE prcsname = 'PSQUERY'
AND r.begindttm >= trunc(SYSDATE)-0+8/24
AND r.begindttm <= trunc(SYSDATE)-0+19/24
), p as ( /*known Cartesian plans with SQL text*/
SELECT /*+MATERIALIZE*/ p.plan_hash_value, MAX(p.options) options
FROM   dbA_hist_sql_plan p
,      dba_hist_sqltext t
WHERE  t.sql_id = p.sql_id
AND    (p.id = 0 OR p.options = 'CARTESIAN')
GROUP BY p.plan_hash_Value
), x AS ( /*ASH for processes*/
SELECT /*+materialize leading(r x)*/  r.prcsinstance, r.oprid, r.private_query_flag, r.qryname
,      h.event, x.dbid, h.sample_id, h.sample_time, h.instance_number
,      CASE WHEN h.module IS NULL       THEN REGEXP_SUBSTR(h.program, '[^@]+',1,1)
            WHEN h.module LIKE 'PSAE.%' THEN REGEXP_SUBSTR(h.module, '[^.]+',1,2) 
            ELSE                             REGEXP_SUBSTR(h.module, '[^.@]+',1,1) 
       END AS module
,      h.action
,      NULLIF(h.top_level_sql_id, h.sql_id) top_level_sql_id
,      h.sql_id, h.sql_plan_hash_value, h.force_matching_signature, h.sql_exec_id
,      h.session_id, h.session_serial#, h.qc_instance_id, h.qc_Session_id, h.qc_Session_serial#
,      f.name, p.options
,      NVL(usecs_per_row,1e7) usecs_per_row
,      CASE WHEN p.plan_hash_value IS NOT NULL THEN NVL(usecs_per_row,1e7) ELSE 0 END usecs_per_row2
FROM   dba_hist_snapshot x
,      dba_hist_active_sess_history h
       LEFT OUTER JOIN p ON p.plan_hash_value = h.sql_plan_hash_value
       LEFT OUTER JOIN dba_sql_profiles f ON h.force_matching_signature = f.signature
,      r
,      sysadm.psprcsque q
WHERE  h.SNAP_id = X.SNAP_id
AND    h.dbid = x.dbid
AND    h.instance_number = x.instance_number
AND    x.end_interval_time >= r.begindttm
AND    x.begin_interval_time <= NVL(r.enddttm,SYSDATE)
AND    h.sample_time BETWEEN r.begindttm AND NVL(r.enddttm,SYSDATE)
AND    q.prcsinstance = r.prcsinstance
AND    (  (h.module = r.prcsname AND h.action like 'PI='||r.prcsinstance||':Processing')
       OR  h.module like 'PSAE.'||r.prcsname||'.'||q.sessionidnum)
), y as( /*profile time by statement/process*/
SELECT prcsinstance, oprid, private_query_flag, qryname, sql_plan_hash_value, sql_id, force_matching_signature, name
,      dbid, module, action, top_level_sql_id
,      count(distinct qc_session_id||qc_session_serial#||sql_id||sql_exec_id) execs
,      sum(usecs_per_row)/1e6 ash_Secs
,      sum(usecs_per_Row2)/1e6 awr_secs
,      avg(usecs_per_row)/1e6*count(distinct sample_time) elapsed_secs
,      count(distinct instance_number||session_id||session_serial#) num_procs
,      max(options) options
FROM   x 
GROUP BY prcsinstance, oprid, private_query_flag, qryname, sql_plan_hash_value, sql_id, force_matching_signature, name
,      dbid, module, action, top_level_sql_id, qc_instance_id, qc_session_id, qc_session_serial#
), z as ( /*find top statement per plan and sum across all executions*/
SELECT row_number() over (partition by force_matching_signature, sql_plan_hash_value order by awr_secs desc) plan_seq
,      prcsinstance, oprid, name, private_query_flag, NVL(qryname,action) qryname, options
,      sql_id, sql_plan_hash_Value, force_matching_signature
,      count(distinct sql_id) over (partition by force_matching_signature, sql_plan_hash_value) sql_ids
,      sum(execs) over (partition by force_matching_signature, sql_plan_hash_value) plan_execs
,      sum(ash_Secs) over (partition by force_matching_signature, sql_plan_hash_value) plan_ash_secs
,      sum(awr_Secs) over (partition by force_matching_signature, sql_plan_hash_value) plan_awr_secs
,      sum(elapsed_Secs) over (partition by force_matching_signature, sql_plan_hash_value) elap_secs
,      sum(num_procs) over (partition by force_matching_signature, sql_plan_hash_value) max_procs
FROM   y
)
Select z.*, z.plan_ash_secs/z.elap_secs eff_para
from   z
where  plan_seq = 1
and    sql_id is not null
and    plan_ash_secs >= 300
ORDER BY plan_ash_secs DESC
FETCH FIRST 50 ROWS ONLY
/
There are two SQL statements for the same private query. XXX_GL_BJU run by user BXXXXXX that exhibited a Cartesian join.

Plan    Plan
     Process                                          Private                                                           SQL Plan        Force Matching  SQL   Plan     ASH     AWR Elapsed  Max  Eff.
 #  Instance OPRID     NAME                           Query   QRYNAME                        OPTIONS   SQL_ID         Hash Value             Signature  IDs  Execs    Secs    Secs Seconds  Prc  Para
-- --------- --------- ------------------------------ ------- ------------------------------ --------- ------------- ----------- --------------------- ---- ------ ------- ------- ------- ---- -----
 1  12344342 NXXXXXX                                  Public  XXX_TRIAL_BALANCE_BY_BU_XXX_V2           c4zfcub2bnju8  2128864041   4468535744829993986    4      4  103473  103473  103473    4   1.0
 1  12344471 FXXXXXX                                  Public  XXXAM_FIN_GL_AP                          d8jnxzmgx20mq  4189069557  16033793374717384734    1      1   32599   32599   32599    1   1.0
 1  12344448 VXXXXXX                                  Private XXX1_LEDGERBAL1_UPRDAC_XXXX1             ftn7nz1xafh5z           0  15193759933860031914    2      2   20615   20615   20615    2   1.0
 1  12345574 BXXXXXX                                  Private XXX_GL_BJU                     CARTESIAN ab2v91h9zj3hv   603930234   4189289347608449750    1      1   16862   16862   16862    1   1.0
 1  12345681 BXXXXXX                                  Private XXX_GL_BJU                     CARTESIAN 05tphb379fu8j   603930234   6203431496815450503    1      1   15452   15452   15452    1   1.0
 1  12345852 WXXXXXX                                  Public  XXXINSOLVENTS_JRNL_DETAIL                51aw4ahxba0gq  3918624993  11145663850623390044    1      1   13435   13435   13435    1   1.0
 1  12345863 CXXXXXX                                  Public  XXX_COMMUTATIONS_JRNL_DTL                7q9kt75bh35dg           0  11985643849566057390    1      1   13283   13283   13283    1   1.0
 1  12344773 WXXXXXX                                  Private XXX_COMMUTATION_JRNL_DETAIL_2            361gck3w3mak7           0  18367721225324700858    1      2   12883   12883   12883    2   1.0
 1  12344682 DXXXXXX                                  Private COMBINED_JE_DETAIL_DV                    2gchgaf465ku5           0   5375582220398622005    1      1    9279    9279    9279    1   1.0
 1  12345618 DXXXXXX                                  Private COMBINED_JE_DETAIL_DV_NO_AFF             2q2faj9c6003u           0  15355473744647942117    1      1    5079    5079    5079    1   1.0
…
The SQL statement and execution plan can be extracted from AWR using DBMS_XPLAN.DISPLAY_WORKLOAD_REPSITORY.  

SELECT * FROM table(dbms_xplan.display_workload_repository('ab2v91h9zj3hv',603930234,'ADVANCED +ADAPTIVE'));

In this example, there are two similar SQL statements, with different force matching signatures, that produce the same execution plan.  The difference is that one has an IN list of 3 accounts, and the other has an equi-join to just one account.  This is enough to produce a different force matching signature.  This is why I often group ASH data by execution plan hash value.  Even if the SQL statement is different, if the execution plan is the same, then the issues and solutions tend to be the same.

The statements have been reformated to make them easier to read.  Both are just joins between two objects.  There are criteria on PS_JRNL_DRILL_VW (a view on PS_JRNL_LN), but there are no join criteria between it and its parent table JRNL_HEADER, thus a meaningless Cartesian product that joins every journal line to every journal header was created and sorted.

SQL_ID ab2v91h9zj3hv 
--------------------
SELECT A.BUSINESS_UNIT, A.JOURNAL_ID,
TO_CHAR(A.JOURNAL_DATE,'YYYY-MM-DD'), B.DESCR254, A.ACCOUNT,
A.LINE_DESCR, SUM( A.MONETARY_AMOUNT), A.LEDGER, B.ACCOUNTING_PERIOD,
B.SOURCE, B.OPRID, A.PRODUCT, A.CLASS_FLD, A.PROGRAM_CODE,
A.CHARTFIELD1, A.CHARTFIELD3, A.CURRENCY_CD, A.FOREIGN_CURRENCY 
FROM PS_JRNL_DRILL_VW A, PS_JRNL_HEADER B 
WHERE ( A.BUSINESS_UNIT IN('12341','12347') 
AND A.LEDGER IN ('CORE','LOCAL_ADJ','LOCAL_ADJ2') 
AND A.ACCOUNT IN ('1234510040','1234510000','1234510060') 
AND A.ACCOUNTING_PERIOD BETWEEN 1 AND 12 AND A.FISCAL_YEAR = 2023) 
GROUP BY A.BUSINESS_UNIT, A.JOURNAL_ID, A.JOURNAL_DATE, B.DESCR254, A.ACCOUNT,
A.LINE_DESCR, A.LEDGER, B.ACCOUNTING_PERIOD, B.SOURCE, B.OPRID,
A.PRODUCT, A.CLASS_FLD, A.PROGRAM_CODE, A.CHARTFIELD1, A.CHARTFIELD3,
A.CURRENCY_CD, A.FOREIGN_CURRENCY ORDER BY 11

SQL_ID 05tphb379fu8j
--------------------
SELECT A.BUSINESS_UNIT, A.JOURNAL_ID,
TO_CHAR(A.JOURNAL_DATE,'YYYY-MM-DD'), B.DESCR254, A.ACCOUNT,
A.LINE_DESCR, SUM( A.MONETARY_AMOUNT), A.LEDGER, B.ACCOUNTING_PERIOD,
B.SOURCE, B.OPRID, A.PRODUCT, A.CLASS_FLD, A.PROGRAM_CODE,
A.CHARTFIELD1, A.CHARTFIELD3, A.CURRENCY_CD, A.FOREIGN_CURRENCY 
FROM PS_JRNL_DRILL_VW A, PS_JRNL_HEADER B 
WHERE ( A.BUSINESS_UNIT IN('12341','12347') 
AND A.LEDGER IN ('CORE','LOCAL_ADJ','LOCAL_ADJ2') 
AND A.ACCOUNT = '1234510000' 
AND A.ACCOUNTING_PERIOD BETWEEN 1 AND 12 
AND A.FISCAL_YEAR = 2023) 
GROUP BY A.BUSINESS_UNIT, A.JOURNAL_ID, A.JOURNAL_DATE, B.DESCR254, A.ACCOUNT
, A.LINE_DESCR, A.LEDGER, B.ACCOUNTING_PERIOD, B.SOURCE, B.OPRID
, A.PRODUCT, A.CLASS_FLD,A.PROGRAM_CODE, A.CHARTFIELD1, A.CHARTFIELD3
, A.CURRENCY_CD, A.FOREIGN_CURRENCY ORDER BY 11
Line 2 of the execution plan reports a MERGE JOIN CARTESIAN operation that feeds into the SORT GROUP operation at line 1.

Plan hash value: 603930234

---------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                        | Name           | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop | Inst   |IN-OUT|
---------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                 |                |       |       | 84648 (100)|          |       |       |        |      |
|   1 |  SORT GROUP BY                                   |                |    57 |  9063 | 84648   (1)| 00:00:04 |       |       |        |      |
|   2 |   MERGE JOIN CARTESIAN                           |                |    57 |  9063 | 84647   (1)| 00:00:04 |       |       |        |      |
|   3 |    NESTED LOOPS                                  |                |     1 |   145 |  1636   (0)| 00:00:01 |       |       |        |      |
|   4 |     VIEW                                         | PS_JRNL_HEADER |   112 |  4032 |  1188   (0)| 00:00:01 |       |       |        |      |
|   5 |      UNION-ALL                                   |                |       |       |            |          |       |       |        |      |
|   6 |       REMOTE                                     | PS_JRNL_HEADER |    76 |  5624 |    18   (0)| 00:00:01 |       |       | FSARC~ | R->S |
|   7 |       INLIST ITERATOR                            |                |       |       |            |          |       |       |        |      |
|   8 |        TABLE ACCESS BY INDEX ROWID BATCHED       | PS_JRNL_HEADER | 16679 |   586K| 11634   (1)| 00:00:01 |       |       |        |      |
|*  9 |         INDEX RANGE SCAN                         | PSEJRNL_HEADER | 16679 |       |   347   (0)| 00:00:01 |       |       |        |      |
|  10 |     VIEW                                         | PS_JRNL_LN     |     1 |   109 |     4   (0)| 00:00:01 |       |       |        |      |
|  11 |      UNION-ALL PARTITION                         |                |       |       |            |          |       |       |        |      |
|* 12 |       FILTER                                     |                |       |       |            |          |       |       |        |      |
|  13 |        REMOTE                                    | PS_JRNL_LN     |     1 |   217 |     5   (0)| 00:00:01 |       |       | FSARC~ | R->S |
|* 14 |       FILTER                                     |                |       |       |            |          |       |       |        |      |
|  15 |        PARTITION RANGE SINGLE                    |                |     1 |   109 |     5   (0)| 00:00:01 |   KEY |   KEY |        |      |
|* 16 |         TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PS_JRNL_LN     |     1 |   109 |     5   (0)| 00:00:01 |   KEY |   KEY |        |      |
|* 17 |          INDEX RANGE SCAN                        | PS_JRNL_LN     |     1 |       |     4   (0)| 00:00:01 |   KEY |   KEY |        |      |
|  18 |    BUFFER SORT                                   |                |  7749K|   103M| 84644   (1)| 00:00:04 |       |       |        |      |
|  19 |     VIEW                                         | PS_JRNL_HEADER |  7749K|   103M| 83011   (1)| 00:00:04 |       |       |        |      |
|  20 |      UNION-ALL                                   |                |       |       |            |          |       |       |        |      |
|  21 |       REMOTE                                     | PS_JRNL_HEADER |  5698K|  1880M| 50467   (1)| 00:00:02 |       |       | FSARC~ | R->S |
|  22 |       TABLE ACCESS STORAGE FULL                  | PS_JRNL_HEADER |  2050K|    86M| 32544   (1)| 00:00:02 |       |       |        |      |
---------------------------------------------------------------------------------------------------------------------------------------------------
A profile of database time for that execution plan by event and plan line ID shows that most of the time is spent on line 1, sorting the output of the Cartesian product.
                                                                                                                    Stmt    Stmt
   SQL Plan SQL Plan                                                                  H   E I     ASH                ASH    Elap
 Hash Value  Line ID EVENT                                                            P P x M    Secs  ELAP_SECS    Secs    Secs
----------- -------- ---------------------------------------------------------------- - - - - ------- ---------- ------- -------
  603930234        1 CPU+CPU Wait                                                     N N Y N  217091 23405.3608  299088   32314
                  18 direct path read temp                                            N N Y N   64395 7034.44748  299088   32314
                  18 CPU+CPU Wait                                                     N N Y N   16998 1812.39445  299088   32314
                   1 ASM IO for non-blocking poll                                     N N Y N     195 20.4802032  299088   32314
                  21 CPU+CPU Wait                                                     N N Y N     195   20.47995  299088   32314
                  16 CPU+CPU Wait                                                     N N Y N     113   10.24021  299088   32314
                     CPU+CPU Wait                                                     N N Y N     103   10.25244  299088   32314
This query never finished because the Cartesian product was so large.  The time recorded was spent in two executions that were eventually cancelled by system operators.

The answer in this particular case is to fix the code.  We have to go back to the user, explain why it is necessary to join parent and child tables and get them to correct their PS/Query.

Finding PS/Queries Without Joins on Related Records

In PeopleSoft, the parent of a child record is recorded on PSRECDEFN in the column PARENTRECNAME.  However, this does not translate into a foreign key relationship in any database supported by PeopleSoft.  This is part of PeopleSoft's original platform-agnosticism.  Not all databases previously supported by PeopleSoft supported database enforced referential integrity.  Therefore it never became part of the implementation, and there is no guarantee that the applications were written in such a way to honour foreign-key constraints (i.e. insert parents before children, delete children before parents etc.).

The below query looks at pairs of parent-child records in each select block of each PS/Query and counts the number of key columns for which there are criteria on the child record that are joined to the parent record.  It is restricted to just the journal header/line tables and views.

It returns rows where no joined key columns are found.  These queries are therefore suspected of being faulty.  However, there may be false positives where child records are joined to grandparents rather than immediate parents.  Such an approach in SQL is perfectly valid, and can even result in better performance.  

WITH x as (
SELECT r1.oprid, r1.qryname, r1.selnum
, r1.rcdnum rcdnum1, r1.recname recname1, r1.corrname corrname1
, r2.rcdnum rcdnum2, r2.recname recname2, r2.corrname corrname2
, (SELECT count(*) 
   FROM psqryfield qf1 --INNER JOIN psrecfielddb f1 ON f1.recname = r1.recname AND f1.fieldname = qf1.fieldname
   ,    psqryfield qf2 INNER JOIN psrecfielddb f2 ON f2.recname = r2.recname AND f2.fieldname = qf2.fieldname 
                                                 AND MOD(f2.useedit,2)=1 /*key fields only*/
   , psqrycriteria c
   WHERE qf1.oprid = r1.oprid AND qf1.qryname = r1.qryname AND qf1.selnum = r1.selnum AND qf1.recname = r1.recname AND qf1.fldrcdnum = r1.rcdnum
   AND   qf2.oprid = r2.oprid AND qf2.qryname = r2.qryname AND qf2.selnum = r2.selnum AND qf2.recname = r2.recname AND qf2.fldrcdnum = r2.rcdnum
   AND    c.oprid = r1.oprid AND c.qryname = r1.qryname AND  c.selnum = r1.selnum 
   AND   (  (c.lcrtselnum = r1.selnum AND c.lcrtfldnum = qf1.fldnum AND c.r1crtselnum = r2.selnum AND c.r1crtfldnum = qf2.fldnum)
         OR (c.lcrtselnum = r2.selnum AND c.lcrtfldnum = qf2.fldnum AND c.r1crtselnum = r1.selnum AND c.r1crtfldnum = qf1.fldnum))
-- AND rownum = 1
  ) num_key_fields
FROM psrecdefn r
, psqryrecord r1
  INNER JOIN psqryrecord r2 ON r1.oprid = r2.oprid AND r1.qryname = r2.qryname AND r1.selnum = r2.selnum AND r1.rcdnum != r2.rcdnum --AND r1.corrname < r2.corrname
WHERE r.recname = r2.recname AND r.parentrecname = r1.recname
)
SELECT x.* FROM x
WHERE num_key_fields = 0
AND recname1 IN('JRNL_HEADER')
AND recname2 IN('JRNL_LN','JRNL_DRILL_VW')
ORDER BY 1,2,3
/

However, these queries may not have been run recently.  Users tend to write queries, save a modification as a new version, and then abandon the old version.

                                          Sel Rec1                    Cor Rec2                    Cor #Key
OPRID     QRYNAME                           #    # Record 1           #1     # Record             #2  Flds
--------- ------------------------------ ---- ---- ------------------ --- ---- ------------------ --- ----
          2_XX_CHI_JOURNAL_MES2_RE          1    1 JRNL_HEADER        A      2 JRNL_LN            B      0
          12300_GL_ACCOUNT_DETAIL           1    1 JRNL_HEADER        A      2 JRNL_LN            B      0
          123_DK                            1    1 JRNL_HEADER        A      2 JRNL_LN            B      0
          123_NEW                           1    1 JRNL_HEADER        A      2 JRNL_LN            B      0
          12345_ACCRUAL_JE_DETAILS          1    1 JRNL_HEADER        A      2 JRNL_LN            C      0
          12345_ACCRUAL_JE_DETAILS_V2       1    1 JRNL_HEADER        A      2 JRNL_LN            C      0
          12345_ACCRUAL_JE_DETAILS_V3       1    1 JRNL_HEADER        A      2 JRNL_LN            C      0
          12345_HARDSOFT_JE_DETAILS_V3      1    1 JRNL_HEADER        A      2 JRNL_LN            C      0
          12345_BM_CURR_ACTIVITY2           1    1 JRNL_HEADER        A      2 JRNL_LN            B      0
          AAIC_CBP_POOLS                    1    1 JRNL_HEADER        A      2 JRNL_LN            B      0
…

I demonstrated how to identify long-running PS/Queries on the process schedulers in an earlier blog post. The following query merges in that query, so that it only considers queries that have run on a process scheduler within the purge period, for which join criteria may be missing.  They are sorted by descending execution time.

REM qry_missingjoins.sql
WITH q1 as (
SELECT r.prcsinstance
, r.oprid runoprid, r.runcntlid
, DECODE(c.private_query_flag,'Y','Private','N','Public') private_query_flag
, DECODE(c.private_query_flag,'Y',r.oprid,' ') oprid
, c.qryname
, CAST(begindttm AS DATE) begindttm
, CAST(enddttm AS DATE) enddttm
, runstatus
, (CAST(NVL(enddttm,SYSDATE) AS DATE)-CAST(begindttm AS DATE))*86400 exec_Secs
FROM psprcsrqst r
  LEFT OUTER JOIN ps_query_run_cntrl c ON c.oprid = r.oprid AND c.run_cntl_id = r.runcntlid
WHERE prcsname = 'PSQUERY'
AND dbname IN(select DISTINCT dbname from ps.psdbowner)
--AND r.begindttm >= trunc(SYSDATE)-2+8/24
--AND r.begindttm <= trunc(SYSDATE)-2+19/24
), q as (
Select /*+MATERIALIZE*/ oprid, qryname
, SUM(exec_secs) exec_secs
, COUNT(*) num_execs
, COUNT(DECODE(runstatus,'9',1,NULL)) complete_execs
, COUNT(DISTINCT runoprid) runoprids
FROM q1
GROUP BY oprid, qryname
), x as (
SELECT r1.oprid, r1.qryname, r1.selnum
, r1.rcdnum rcdnum1, r1.recname recname1, r1.corrname corrname1
, r2.rcdnum rcdnum2, r2.recname recname2, r2.corrname corrname2
, (SELECT count(*) 
   FROM psqryfield qf1 --INNER JOIN psrecfielddb f1 ON f1.recname = r1.recname AND f1.fieldname = qf1.fieldname
   ,    psqryfield qf2 INNER JOIN psrecfielddb f2 ON f2.recname = r2.recname AND f2.fieldname = qf2.fieldname AND MOD(f2.useedit,2)=1
   , psqrycriteria c
   WHERE qf1.oprid = r1.oprid AND qf1.qryname = r1.qryname AND qf1.selnum = r1.selnum AND qf1.recname = r1.recname AND qf1.fldrcdnum = r1.rcdnum
   AND   qf2.oprid = r2.oprid AND qf2.qryname = r2.qryname AND qf2.selnum = r2.selnum AND qf2.recname = r2.recname AND qf2.fldrcdnum = r2.rcdnum
   AND    c.oprid = r1.oprid AND  c.qryname = r1.qryname AND  c.selnum = r1.selnum 
   AND   (  (c.lcrtselnum = r1.selnum AND c.lcrtfldnum = qf1.fldnum AND c.r1crtselnum = r2.selnum AND c.r1crtfldnum = qf2.fldnum)
         OR (c.lcrtselnum = r2.selnum AND c.lcrtfldnum = qf2.fldnum AND c.r1crtselnum = r1.selnum AND c.r1crtfldnum = qf1.fldnum))
   AND rownum = 1
  ) num_key_fields
FROM psrecdefn r
, psqryrecord r1
  INNER JOIN psqryrecord r2 ON r1.oprid = r2.oprid AND r1.qryname = r2.qryname AND r1.selnum = r2.selnum AND r1.rcdnum != r2.rcdnum --AND r1.corrname < r2.corrname
WHERE r.recname = r2.recname AND r.parentrecname = r1.recname
)
SELECT /*+LEADING(Q)*/ q.*, x.selnum
, x.rcdnum1, x.recname1, x.corrname1
, x.rcdnum2, x.recname2, x.corrname2, x.num_key_fields
FROM x
  INNER JOIN q ON q.oprid = x.oprid AND q.qryname = x.qryname
WHERE num_key_fields = 0
AND exec_secs >= 600
ORDER BY exec_secs desc

/

Now I have a list of candidate queries that have been used recently and may be missing joins that I investigate further.

                                                                                          Sel Rec1                    Cor Rec2                    Cor #Key
OPRID     QRYNAME                         EXEC_SECS  NUM_EXECS COMPLETE_EXECS  RUNOPRIDS    #    # Record 1           #1     # Record 2           #2  Flds
--------- ------------------------------ ---------- ---------- -------------- ---------- ---- ---- ------------------ --- ---- ------------------ --- ----
UKXXXXXXX AR_VENDOR_LOCATION_DETAILB         264317        361            360          1    1    1 VENDOR             A      8 VNDR_LOC_SCROL     H      0
          XX_COL_MOV_ALT_ACCT2_PERIO         193692       2096           2051         14    1    1 JRNL_HEADER        A      3 OPEN_ITEM_GL       C      0
          APC_123_LEDGER_ACTIVITY_BY_BU      151438       2959           2938         73    2    1 JRNL_HEADER        B      2 JRNL_LN            C      0
MXXXXXX   MT_AUSTRALIA_TAX_PMTS              137471         36             28          1    1    1 JRNL_HEADER        A      2 JRNL_LN            B      0
          XX_PAN_ASIA_JOURNALS_REF           135825         48             47          4    1    1 JRNL_HEADER        A      5 JRNL_OPENITM_VW    E      0
          XXX_STKCOMP_LIFE                   120537        526            523          1    1    1 JRNL_HEADER        A      2 JRNL_LN            B      0
          XXX_123_TB_LEDGER_BAL_BU           100848       2093           2044         17    3    1 JRNL_HEADER        B      2 JRNL_LN            C      0
KXXXXXX   XXX_JRNL_LIST_AUDIT_KL              99843        489            482          1    1    2 JRNL_HEADER        B      1 JRNL_DRILL_VW      A      0
          XXX_JE_ID_QUERY                     86106        156            151          1    1    1 JRNL_HEADER        A      2 JRNL_LN            C      0
          XXX_ACTIVITY_DETAILS_2              85356        336            302          5    1    1 JRNL_HEADER        A      2 JRNL_LN            B      0
…

Anyone can inspect any public queries, but you must be logged in as the owner of a private query to be able to see it.

The scripts in this article can be downloaded from GitHub davidkurtz/psscripts.

Monday, November 25, 2024

PeopleSoft PS/Query: Identify Long Running Queries (on Process Schedulers)

This is the first of a series in which I will share some of my PeopleSoft scripts, and explain how they work.
Many PeopleSoft users like its ad hoc query tool because they can write their own queries directly on the system, without having to learn to write structured query language (SQL), or getting a developer to write it for them.  

What is the Problem?

This tool is disliked and even feared by database administrators (DBAs) and system administrators, because it is easy for users to create poor queries, that either don't work as intended or can run for long periods, sometimes indefinitely, without even producing results.  This can consume significant amounts of CPU.

Managing Queries Scheduled on the Process Scheduler

The PSQUERY application engine program runs queries on the process scheduler. Users should be encouraged to use this rather than running them online.  
Queries run online via the PeopleSoft Internet Architecture (PIA) cannot be managed.  
  • There is no limit to the number of queries that users can initiate concurrently.  
  • The number that can actually execute concurrently is limited by the number of PSQRYSRV processes in each application server domain.  Any additional requests will simply queue up in Tuxedo.
  • It is possible to set maximum execution times in the PeopleSoft configuration, on the ICQuery service on the PSQRYSRV server in the application server.  
It is easier to manage and monitor the queries run in PSQUERY processes on the process scheduler.  They don't put any load on the PIA, but they put load on the database.
  • A system-wide maximum number of concurrently executing instances of the application engine program can be set on the process definition.
  • A maximum number of concurrently executing instances of the application engine program per process scheduler can be set (by using a process class).
  • The application engine, or its process class, can be given a lower priority so that other queued processes are run in preference.

PS/Queries run either in the PIA or on the process scheduler can be mapped to low-priority consumer groups in an Oracle database resource manager plan so that they do not starve the rest of the system of CPU (see PeopleSoft DBA Blog: PSFT_PLAN: A Sample Oracle Database Resource Manager Plan for PeopleSoft).
A maximum run time, or maximum estimated run time, can be defined for a consumer group.  If the limit is breached an Oracle error is raised: ORA-00040: active time limit exceeded - call aborted. In the PIA, the error message is simply presented to the user.  The scheduled PSQUERY application engine process will terminate and the error will be logged.  In both cases, the user has to recognise the error message and understand what it means.  Otherwise, they will raise the issue with support.
The various methods of setting maximum execution time limits are quite blunt instruments.  They are essentially one-size-fits-all approaches.  Typically, some queries are expected to run for a long time, and then the limits must be set to accommodate them. 

Queries Scheduled on the Process Scheduler

I can query who has run which queries, and how long they ran for.  Simply outer join the run control record for the PSQUERY application engine (PS_QUERY_RUN_CNTL) to the process scheduler request table (PSPRCSRQST).

In this case, I am interested in 
the top 50 PS/Queries by cumulative execution
with a cumulative execution time of over 5 minutes (300s)
that were scheduled yesterday between 8am and 7pm
REM qry_missingjoins.sql
WITH x as (
SELECT r.prcsinstance, r.oprid, r.runcntlid
,      DECODE(c.private_query_flag,'Y','Private','N','Public') private_query_flag, c.qryname
,      CAST(begindttm AS DATE) begindttm
,      CAST(enddttm AS DATE) enddttm
,      runstatus
,      (CAST(NVL(enddttm,SYSDATE) AS DATE)-CAST(begindttm AS DATE))*86400 exec_Secs
FROM   psprcsrqst r
  LEFT OUTER JOIN ps_query_run_cntrl c ON c.oprid = r.oprid AND c.run_cntl_id = r.runcntlid
WHERE  prcsname = 'PSQUERY'
AND    r.begindttm >= TRUNC(SYSDATE)-0+8/24 /*from 8am*/
AND    r.begindttm <= TRUNC(SYSDATE)-0+19/24 /*to 7pm*/
)
SELECT x.* FROM x
WHERE  exec_Secs >= 300 /*Over 5 minutes*/
ORDER BY exec_secs desc /*descending order of elapsed time*/
FETCH FIRST 50 ROWS ONLY /*top 50 ROWS ONLY*/
/
I now have a profile of top queries that I can use to direct further investigation.
  Process                                           Private                                                                    Run     Exec
 Instance OPRID      RUNCNTLID                      Query   QRYNAME                        BEGINDTTM         ENDDTTM           Stat    Secs
--------- ---------- ------------------------------ ------- ------------------------------ ----------------- ----------------- ---- -------
 12344471 F******    ***AM_FIN_GL_AP                Public  ***AM_FIN_GL_AP                10:06:21 19.**.** 19:08:52 19.**.** 8      32551
 12344342 N******    ownxxxxxxxxxxxx                Public  ***_TRIAL_BALANCE_BY_BU_***_V2 09:41:58 19.**.** 18:20:09 19.**.** 10     31091
 12344336 N******    ojnxxxxxxxxxx                  Public  ***_TRIAL_BALANCE_BY_BU_***    09:40:27 19.**.** 16:51:11 19.**.** 10     25844
 12345209 N******    eowxxxxxxxxxxxxx               Public  ***_TRIAL_BALANCE_BY_BU_***    12:41:17 19.**.** 19:08:30 19.**.** 8      23233
 12345213 N******    iwoxxxxxxxxxxxxx               Public  ***_TRIAL_BALANCE_BY_BU_***_V2 12:41:53 19.**.** 19:08:56 19.**.** 8      23223
 12345574 B******    gl                             Private ***_GL_BJU                     14:27:32 19.**.** 19:08:59 19.**.** 8      16887
 12345681 B******    gl                             Private ***_GL_BJU                     14:51:06 19.**.** 19:09:02 19.**.** 8      15476
 12345852 W******    insolvents                     Public  ***INSOLVENTS_JRNL_DETAIL      15:24:41 19.**.** 19:09:04 19.**.** 8      13463
…
                                                                                                                                    -------
sum                                                                                                                                  268112
Notes: 
  • Some details have been redacted from this real-world example.
  • The result is not guaranteed to be completely accurate.  A user might have reused a run control record and can only get the current value.
  • This and other scripts can be downloaded from GitHub davidkurtz/psscripts.