Wednesday, July 09, 2014

Introducing the Analytic Keep Clause for Effective-Dated/Sequence Queries in PeopleSoft

Those of us who work with PeopleSoft, and especially the HCM product, are all too familiar with the concept of effect-dated data, and the need to find data that was effective at a particular date.  PeopleSoft products have always made extensive use of correlated sub-queries to determine the required rows from an effective-dated record.
The JOB record is the heart of HCM. It is both effective-dated and effective sequenced. I will use it for the demonstrations in this article. I am going to suggest an alternative, although Oracle-specific, SQL construction.
 Let's start by looking at the job data for an employee in the demo database. Employee KF0018 has 17 rows of data two concurrent jobs.  The question I am going to ask is "What was the annual salary for this employee on 11 February 1995?".  Therefore, I am interested in the rows marked below with the asterisks. 
column annual_rt format 999,999
SELECT  emplid, empl_rcd, effdt, effseq, action, deptid, currency_cd, annual_rt
FROM	ps_job j
WHERE	j.emplid = 'KF0018'
ORDER BY 1,2,3,4
/ 
EMPLID        EMPL_RCD EFFDT         EFFSEQ ACT DEPTID     CUR ANNUAL_RT
----------- ---------- --------- ---------- --- ---------- --- ---------
KF0018               0 12-JUN-83          0 HIR 13000      FRF   120,000
KF0018               0 01-JAN-84          0 PAY 13000      FRF   123,600
KF0018               0 01-JAN-85          0 PAY 13000      FRF   127,308
KF0018               0 01-JAN-86          0 PAY 13000      FRF   131,764
KF0018               0 01-JAN-87          0 PAY 13000      FRF   136,376
KF0018               0 01-JAN-88          0 PAY 13000      FRF   140,467
KF0018               0 01-JAN-89          0 PAY 13000      FRF   147,490
KF0018               0 22-JAN-95          0 PRO 13000      FRF   147,490
KF0018               0 22-JAN-95          1 PAY 13000      FRF   294,239 *
KF0018               0 22-JAN-96          0 PAY 13000      FRF   318,575
KF0018               0 01-JAN-98          0 PAY 13000      FRF   346,156
KF0018               0 01-JAN-00          0 DTA 13000      FRF   346,156
KF0018               0 01-JAN-02          0 PAY 13000      EUR    52,771
KF0018               1 01-NOV-89          0 ASG 21300      GBP    22,440
KF0018               1 31-DEC-93          0 ASC 21300      GBP    22,440
KF0018               1 01-JAN-94          0 ASG 12000      GBP    22,440 *
KF0018               1 31-DEC-95          0 ASC 10000      GBP    22,440
I will set the statistics level to ALL so I can obtain detailed information about how the SQL statements execute:
ALTER SESSION SET statistics_level = ALL;
I extracted the execution plans and execution statistics with the following command
select * from table(dbms_xplan.display_cursor(null,null,'IOSTATS')) 

Typical PeopleSoft Platform Agnostic Construction

This is the usual way to construct the query in PeopleSoft. It is also valid on all databases platforms supported by PeopleSoft, not just Oracle. 
SELECT  emplid, empl_rcd, effdt, effseq, action, deptid, currency_cd, annual_rt
FROM	ps_job j
WHERE	j.effdt = ( 
        SELECT MAX (j1.effdt) FROM ps_job j1 
        WHERE j1.emplid = j.emplid 
        AND j1.empl_rcd = j.empl_rcd 
        AND j1.effdt <= TO_DATE('19950211','YYYYMMDD')) 
AND j.effseq = ( 
        SELECT MAX (j2.effseq) FROM ps_job j2 
        WHERE j2.emplid = j.emplid 
        AND j2.empl_rcd = j.empl_rcd 
        AND j2.effdt = j.effdt)
AND j.emplid = 'KF0018'
ORDER BY 1,2,3,4
/
EMPLID        EMPL_RCD EFFDT         EFFSEQ ACT DEPTID     CUR ANNUAL_RT
----------- ---------- --------- ---------- --- ---------- --- ---------
KF0018               0 22-JAN-95          1 PAY 13000      FRF   294,239
KF0018               1 01-JAN-94          0 ASG 12000      GBP    22,440
This required three access of indexes on the PS_JOB table, and two accesses of the table, using 26 consistent reads.
Plan hash value: 2299825310
----------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |         |      1 |        |      2 |00:00:00.01 |      26 |      2 |
|   1 |  SORT ORDER BY                    |         |      1 |      1 |      2 |00:00:00.01 |      26 |      2 |
|   2 |   NESTED LOOPS                    |         |      1 |      1 |      2 |00:00:00.01 |      26 |      2 |
|   3 |    NESTED LOOPS                   |         |      1 |      1 |      3 |00:00:00.01 |      21 |      2 |
|   4 |     VIEW                          | VW_SQ_1 |      1 |      1 |      2 |00:00:00.01 |      14 |      2 |
|*  5 |      FILTER                       |         |      1 |        |      2 |00:00:00.01 |      14 |      2 |
|   6 |       HASH GROUP BY               |         |      1 |      1 |      2 |00:00:00.01 |      14 |      2 |
|   7 |        TABLE ACCESS BY INDEX ROWID| PS_JOB  |      1 |      1 |     12 |00:00:00.01 |      14 |      2 |
|*  8 |         INDEX RANGE SCAN          | PS_JOB  |      1 |      1 |     12 |00:00:00.01 |       2 |      2 |
|   9 |     TABLE ACCESS BY INDEX ROWID   | PS_JOB  |      2 |      1 |      3 |00:00:00.01 |       7 |      0 |
|* 10 |      INDEX RANGE SCAN             | PSAJOB  |      2 |      1 |      3 |00:00:00.01 |       4 |      0 |
|* 11 |    VIEW PUSHED PREDICATE          | VW_SQ_2 |      3 |      1 |      2 |00:00:00.01 |       5 |      0 |
|* 12 |     FILTER                        |         |      3 |        |      3 |00:00:00.01 |       5 |      0 |
|  13 |      SORT AGGREGATE               |         |      3 |      1 |      3 |00:00:00.01 |       5 |      0 |
|* 14 |       FILTER                      |         |      3 |        |      5 |00:00:00.01 |       5 |      0 |
|* 15 |        INDEX RANGE SCAN           | PSAJOB  |      3 |      1 |      5 |00:00:00.01 |       5 |      0 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter("J1"."EMPLID"='KF0018')
   8 - access("J1"."EMPLID"='KF0018' AND "J1"."SYS_NC00164$">=HEXTORAW('883CFDF4FEF8FEFAFF') )
       filter(SYS_OP_UNDESCEND("J1"."SYS_NC00164$")<=TO_DATE(' 1995-02-11 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
  10 - access("J"."EMPLID"='KF0018' AND "ITEM_2"="J"."EMPL_RCD" AND
              "J"."SYS_NC00164$"=SYS_OP_DESCEND("MAX(J1.EFFDT)"))
       filter(SYS_OP_UNDESCEND("J"."SYS_NC00164$")="MAX(J1.EFFDT)")
  11 - filter(SYS_OP_UNDESCEND("J"."SYS_NC00165$")="MAX(J2.EFFSEQ)")
  12 - filter(COUNT(*)>0)
  14 - filter('KF0018'="J"."EMPLID")
  15 - access("J2"."EMPLID"='KF0018' AND "J2"."EMPL_RCD"="J"."EMPL_RCD" AND
              "J2"."SYS_NC00164$"=SYS_OP_DESCEND(SYS_OP_UNDESCEND("J"."SYS_NC00164$")))
       filter(SYS_OP_UNDESCEND("J2"."SYS_NC00164$")=SYS_OP_UNDESCEND("J"."SYS_NC00164$"))
This construction is also the reason you are required to set
_UNNEST_SUBQUERY=FALSE
on all PeopleSoft systems

Analytic Function and In-Line View/Sub-query Factor

I have seen people use a combination of analytic functions and in-line views to avoid having to use the correlated sub-query construction. This has been possible since Oracle 9i.
WITH X AS (
SELECT  emplid, empl_rcd, effdt, effseq, action, deptid, currency_cd, annual_rt
,	ROW_NUMBER() OVER (PARTITION BY emplid, empl_rcd 
                           ORDER BY effdt DESC, effseq DESC) myrowseq
FROM	ps_job j
WHERE	j.effdt <= TO_DATE('19950211','YYYYMMDD')
AND	j.emplid = 'KF0018'
)
SELECT 	emplid, empl_rcd, effdt, effseq, action, deptid, currency_cd, annual_rt
FROM 	x
WHERE	myrowseq = 1
ORDER BY 1,2,3,4
/
EMPLID        EMPL_RCD EFFDT         EFFSEQ ACT DEPTID     CUR ANNUAL_RT
----------- ---------- --------- ---------- --- ---------- --- ---------
KF0018               0 22-JAN-95          1 PAY 13000      FRF   294,239
KF0018               1 01-JAN-94          0 ASG 12000      GBP    22,440
We get the same result, but now the index is scanned just once and we only need 14 consistent reads, so it produces a significant improvement. However, it still includes a sort operation in addition to the window function. We have to create a sequence number field in the in-line view and filter by that in the final query.
Plan hash value: 1316906785
---------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |        |      1 |        |      2 |00:00:00.01 |      14 |
|   1 |  SORT ORDER BY                 |        |      1 |      1 |      2 |00:00:00.01 |      14 |
|*  2 |   VIEW                         |        |      1 |      1 |      2 |00:00:00.01 |      14 |
|*  3 |    WINDOW NOSORT               |        |      1 |      1 |     12 |00:00:00.01 |      14 |
|   4 |     TABLE ACCESS BY INDEX ROWID| PS_JOB |      1 |      1 |     12 |00:00:00.01 |      14 |
|*  5 |      INDEX RANGE SCAN          | PSAJOB |      1 |      1 |     12 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("MYROWSEQ"=1)
   3 - filter(ROW_NUMBER() OVER ( PARTITION BY "EMPLID","EMPL_RCD" ORDER BY
              "J"."SYS_NC00164$","J"."SYS_NC00165$")<=1)
   5 - access("J"."EMPLID"='KF0018' AND "J"."SYS_NC00164$">=HEXTORAW('883CFDF4FEF8FEFAFF')
              )
       filter(SYS_OP_UNDESCEND("J"."SYS_NC00164$")<=TO_DATE(' 1995-02-11 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))

Analytic Function Keep Clause

This form of the analytic functions is documented for the first time in 12c, but is available in 10g (my thanks to Tony Hasler for introducing me to it). It works by effectively keeping a running maximum value of the columns in the order by clause within in group.
SELECT emplid, empl_rcd
, MAX(effdt) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq) AS effdt
, MAX(effseq) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq) AS effseq
, MAX(action) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq) AS action
, MAX(deptid) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq) AS deptid
, MAX(currency_cd) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq) AS currency_cd
, MAX(annual_rt) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq) AS annual_rt
FROM ps_job j
WHERE	j.effdt <= TO_DATE('19950211','YYYYMMDD')
AND	j.emplid = 'KF0018'
GROUP BY emplid, empl_rcd
/

EMPLID        EMPL_RCD EFFDT         EFFSEQ ACT DEPTID     CUR ANNUAL_RT
----------- ---------- --------- ---------- --- ---------- --- ---------
KF0018               0 22-JAN-95          1 PAY 13000      FRF   294,239
KF0018               1 01-JAN-94          0 ASG 12000      GBP    22,440
Although this construction uses an additional consistent read, it has the advantage of not using either an inline view or a window function and does not sort the data.
Plan hash value: 1550496807
-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |      1 |        |      2 |00:00:00.01 |      15 |
|   1 |  SORT GROUP BY NOSORT        |        |      1 |      1 |      2 |00:00:00.01 |      15 |
|   2 |   TABLE ACCESS BY INDEX ROWID| PS_JOB |      1 |      1 |     12 |00:00:00.01 |      15 |
|*  3 |    INDEX RANGE SCAN          | PS_JOB |      1 |      1 |     12 |00:00:00.01 |       3 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("J"."EMPLID"='KF0018' AND "J"."SYS_NC00164$">=HEXTORAW('883CFDF3FEF8FEFAFF'
              ) )
       filter(SYS_OP_UNDESCEND("J"."SYS_NC00164$")<=TO_DATE(' 1995-02-12 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))
I think this construction could be useful in PeopleSoft.  At first glance, the SQL appears more complicated, but in this example, it removed two correlated sub-queries. 

Using Analytic Functions in PS/Query

Of course, you can code it anywhere where you can simply enter SQL as text.  However, it also has the advantage over the other analytic function construction that it can be coded in the PS/Query tool.  The analytic functions in the select clause should be created in PS/Query expressions with the aggregate expression checkbox ticked.
Analytic 'Keep' function in PS/Query Aggregate Expression
Analytic Function in Aggregated Expression in Windows Client version of PS/Query 
The analytic functions can be selected in the PS/Query, and their lengths and titles can be tidied up.
Analytic PS/Query
PS/Query with Analytic 'Keep' Functions

This is the resulting SQL which is the same as before (with row-level security added by PS/Query) and produces the same results.
SELECT A.EMPLID, A.EMPL_RCD, MAX(effdt) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq)
, MAX(effseq) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq), MAX(action) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq), MAX(deptid) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq), MAX(currency_cd) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq), MAX(annual_rt) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq)					
  FROM PS_JOB A, PS_EMPLMT_SRCH_QRY A1
  WHERE ( A.EMPLID = A1.EMPLID
    AND A.EMPL_RCD = A1.EMPL_RCD
    AND A1.OPRID = 'PS'
    AND ( A.EFFDT <= TO_DATE('1995-02-11','YYYY-MM-DD')
     AND A.EMPLID = 'KF0018' ) )
  GROUP BY  A.EMPLID,  A.EMPL_RCD