UKOUG Applications Conference 2014

Wednesday, September 03, 2014

Who is using this index?

Or, to put it another way, I want to change or drop this index, who and what will I impact?

The Challenge 

The problem that I am going to outline is certainly not exclusive to PeopleSoft, but I am going to illustrate it with examples from PeopleSoft. I often find tables with far more indexes than are good for them.
  • The Application Designer tool makes it very easy for developers to add indexes to tables. Sometimes, too easy!
  • Sometimes, DBAs are too quick to unquestioningly follow the advice of the Oracle tuning advisor to add indexes.
Recently, I have been working on 3 different PeopleSoft Financials systems where I have found major tables with a host of indexes.

There are several concerns:
  • Indexes are maintained during data modification. The more indexes you have, the greater the overhead. 
  • The more indexes you have, particularly if they lead on the same columns, the more likely Oracle is to use the wrong one, resulting in poorer performance.
  • There is of course also a space overhead for each index, but this is often of less concern. 
If you can get rid of an index, Oracle doesn't store, maintain or use it. 

In some cases, I have wanted to remove unnecessary indexes, and in others to adjust indexes. However, this immediately raises the question of where are these indexes used, and who will be impacted by the change. Naturally, I turn to the Active Session History (ASH) to help me find the answers. 

Index Maintenance Overhead during DDL 

ASH reports the object number, file number, block number and (from 11g) row number within the block being accessed by physical file operations. However, the values reported in v$active_session_history (and later other views) are not reliable for other events because they are merely left over from the previous file event that reported them. So, we can profile the amount of time spent on physical I/O on different tables and indexes, but not for other forms of DB Time, such as CPU time, spent accessing the blocks in the buffer cache.

Let me take an extreme example from PeopleSoft Global Payroll. The table PS_GP_RSLT_ACUM is one of the principal result tables. It has only a single unique index (with the same name). The table is populated with the simplest of insert statements.
INSERT /*GPPRDMGR_I_ACUM*/ INTO PS_GP_RSLT_ACUM
(CAL_RUN_ID ,EMPLID ,EMPL_RCD ,GP_PAYGROUP ,CAL_ID ,RSLT_SEG_NUM ,PIN_NUM ,EMPL_RCD_ACUM ,ACM_FROM_DT ,ACM_THRU_DT ,USER_KEY1 ,USER_KEY2 ,USER_KEY3 ,USER_KEY4 ,USER_KEY5 ,USER_KEY6 ,SLICE_BGN_DT ,SLICE_END_DT ,COUNTRY ,ACM_TYPE ,ACM_PRD_OPTN ,CALC_RSLT_VAL ,CALC_VAL ,USER_ADJ_VAL ,PIN_PARENT_NUM ,CORR_RTO_IND ,ORIG_CAL_RUN_ID ,SEQ_NUM8 ,VALID_IN_SEG_IND ,CALLED_IN_SEG_IND )
VALUES
(:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,:14,:15,:16,:17,:18,:19,:20,:21,:22,:23,:24,:25,:26,:27,:28,:29,:30)
I can profile the ASH data for just this statement over the last week on a production system. Note that DBA_OBJECTS and DBA_DATA_FILES are outer joined to the ASH data and only matched for events like 'db file%'
SELECT o.object_type, o.object_name
,      f.tablespace_name, NVL(h.event,'CPU+CPU Wait') event
,      SUM(10) ash_Secs
FROM dba_hist_Active_sess_history h
  LEFT OUTER JOIN dba_objects o
    ON o.object_id = h.current_obj#
   AND h.event like 'db file%'
  LEFT OUTER JOIN dba_data_files f
    ON f.file_id = h.current_file#
   AND h.event like 'db file%'
WHERE h.sql_id = '4ru0618dswz3y'
AND   h.sample_time >= sysdate-7
GROUP BY o.object_type, o.object_name, h.event, f.tablespace_name
ORDER BY ash_secs DESC
/
A full payroll calculation inserts over 3 million rows on this particular system. The calculation is run incrementally several times per week during which old rows are deleted and newly recalculated rows inserted.  Looking at just this insert statement:
  • 30% of the time is spent on CPU operations, we cannot profile that time further with ASH.
  • 38% of the time is spent reading from the table and index, yet this is a simple INSERT … VALUES statement.
OBJECT_TYPE         OBJECT_NAME        TABLESPACE_NAME EVENT                      ASH_SECS
------------------- ------------------ --------------- ------------------------ ----------
                                                       CPU+CPU Wait                   1040
                                       UNDOTBS1        db file sequential read         900
INDEX SUBPARTITION  PS_GP_RSLT_ACUM    GP201408IDX     db file sequential read         750
TABLE SUBPARTITION  PS_GP_RSLT_ACUM    GP201408TAB     db file sequential read         550
                                                       gc current grant 2-way           70
                                                       cursor: pin S wait on X          60
                                                       db file sequential read          10
                                                       buffer exterminate               10
                                                       row cache lock                   10
                                                                                ----------
                                                                                      3400 
More time is spent reading the index than the table.  That is not a surprise.  When you insert a row into a table, you also insert it into the index. Rows in index leaf blocks are ordered by the key columns, and the new entry must go into the right place, so you have to read down the index from the root block, through the branch blocks, to find the correct leaf block for the new entry.
[Digression: Counter-intuitively index compression can improve DML performance. It does for this index.  The overhead of the compression processing can be outweighed by the saving in physical I/O.  It depends.]

Profile Physical I/O by Object 

I can twist this query around and profile DB_TIME by object for 'db file%' events
SELECT o.object_type, o.object_name, sum(10) ash_secs
FROM   dba_hist_active_sess_history h
,      dba_objects o
WHERE  o.object_id = h.current_obj#
AND    h.event LIKE 'db file%'
AND    h.sample_time > sysdate-7
GROUP  BY o.object_type, o.object_name
ORDER BY ash_Secs DESC
Now I can see upon which objects the most time is spent on physical I/O.
OBJECT_TYP OBJECT_NAME          ASH_SECS
---------- ------------------ ----------
TABLE      PS_ITEM                101130
INDEX      PS_WS_ITEM              98750
TABLE      PS_PROJ_RESOURCE        97410
TABLE      PS_BI_LINE              85040
INDEX      PSAPSAPMSGSUBCON        75070
TABLE      PS_BI_HDR               37230
TABLE      PS_RS_ASSIGNMENT        29460
INDEX      PS_PSAPMSGPUBHDR        23230
INDEX      PS_BI_ACCT_ENTRY        21490
TABLE      PS_VOUCHER              21330
TABLE      PS_VCHR_ACCTG_LINE      21250
TABLE      PS_BI_ACCT_ENTRY        18860
…
                              ----------
sum                              1382680
This is a worthwhile exercise, it shows the sources of physical I/O in an application.

However, if you want to find where an index is used, then this query will also identify SQL_IDs where the index is either used in the query or maintained by DML. If I am interested in looking for places where changing or deleting an index could have an impact then I am only interested in SQL query activity. ASH samples which relate to index maintenance are a false positive. Yet, I cannot simply eliminate ASH samples where the SQL_OPNAME is not SELECT because the index may be used in a query within the DML statement.

Another problem with this method is that it matches SQL to ASH by object ID. If someone has rebuilt an index, then its object number changes.

A different approach is required.

Index Use from SQL Plans Captured by AWR 

During an AWR snapshot the top-n SQL statements by each SQL criteria in the AWR report (Elapsed Time, CPU Time, Parse Calls, Shareable Memory, Version Count) , see dbms_workload_repository. The SQL plans are exposed by the view DBA_HIST_SQL_PLAN.

On PeopleSoft systems, I generally recommend decreasing the snapshot interval from the default of 60 minutes to 15. The main reason is that SQL gets aged out of the library cache very quickly in PeopleSoft systems. They generate lots of dynamic code, often with literal values rather than bind variables. Cursor sharing is not recommended for PeopleSoft, so different bind variables result in different SQL_IDs. The dynamic code also results in different SQL IDs even with cursor sharing. Therefore, increasing the snapshot frequency means that will capture more SQL statement. This will increase total volume of the AWR repository simply because there are more snapshots. However, the overall volume of ASH data captured does not change, it just gets copied to the repository earlier.

On DBA_HIST_SQL_PLAN the object ID, owner, type and name are recorded, so I can find the plans which referenced a particular object. I am going to carry on with the example from a PeopleSoft Financials system, and look at indexes on the PS_PROJ_RESOURCE table.

These are some of the indexes on PS_PROJ_RESOURCE. We have 4 indexes that all lead on PROCESS_INSTANCE. I suspect that not all are essential, but I need to work out what is using them, and which one I should retain.
                          Col
INDEX_NAME                Pos COLUMN_NAME          COLUMN_EXPRESSION
------------------ ---------- -------------------- ----------------------------------
…
PSJPROJ_RESOURCE            1 PROCESS_INSTANCE
                            2 BUSINESS_UNIT_GL
                            3 BUSINESS_UNIT
                            4 PROJECT_ID
                            5 ACTIVITY_ID
                            6 CUST_ID

PSLPROJ_RESOURCE            1 PROCESS_INSTANCE
                            2 EMPLID
                            3 EMPL_RCD
                            4 TRANS_DT

PSMPROJ_RESOURCE            1 PROCESS_INSTANCE
                            2 BUSINESS_UNIT
                            3 PROJECT_ID
                            4 ACTIVITY_ID
                            5 RESOURCE_ID

PSNPROJ_RESOURCE            1 PROCESS_INSTANCE
                            2 BUSINESS_UNIT
                            3 TIME_RPTG_CD
…
I find it easier to extract the ASH data to my own working storage table. For each index on PS_PROJ_RESOURCE, I am going to extract a distinct list of plan hash values. I will then extract all ASH data for those plans. Note, that I have not joined the SQL_ID on DBA_HIST_SQL_PLAN. That is because different SQL_IDs can produce the same execution plan. The plan is equally valid for all SQL_IDs that produce the plan, not just the one where the SQL_ID also matches.
DROP TABLE my_ash purge
/
CREATE TABLE my_ash COMPRESS AS
WITH p AS (
  SELECT DISTINCT p.plan_hash_value, p.object#, p.object_owner, p.object_type, p.object_name
  FROM      dba_hist_sql_plan p
  WHERE     p.object_name like 'PS_PROJ_RESOURCE'
 AND p.object_type LIKE 'INDEX%'
  AND p.object_owner = 'SYSADM'
 )
SELECT p.object# object_id, p.object_owner, p.object_type, p.object_name
,      h.*
FROM   dba_hist_active_sess_history h
,      p
WHERE  h.sql_plan_hash_value = p.plan_hash_value
/ 
I am fortunate that PeopleSoft is a well instrumented application. Module and Action are set to fairly sensible values that will tell me whereabouts in the application the ASH sample relates. In the following query, I have omitted any ASH data generated by SQL*Plus, Toad, or SQL Developer, and also any generated by Oracle processes to prevent statistics collection jobs being included.
Set pages 999 lines 150 trimspool on
break on object_name skip 1 
compute sum of ash_secs on object_name
column ash_secs heading 'ASH|Secs' format 9999999
column module format a20
column action format a32
column object_name format a18
column max_sample_time format a19 heading 'Last|Sample'
column sql_plans heading 'SQL|Plans' format 9999
column sql_execs heading 'SQL|Execs' format 99999
WITH h AS (
  SELECT   object_name
  ,  CASE WHEN h.module IS NULL THEN REGEXP_SUBSTR(h.program,'[^.@]+',1,1)
                 WHEN h.module LIKE 'PSAE.%' THEN REGEXP_SUBSTR(h.module,'[^.]+',1,2)
                         ELSE REGEXP_SUBSTR(h.program,'[^.@]+',1,1)
   END as module
  ,  CASE WHEN h.action LIKE 'PI=%' THEN NULL
        ELSE h.action
                    END as action
  , CAST(sample_time AS DATE) sample_time
  ,  sql_id, sql_plan_hash_value, sql_exec_id
  FROM    my_ash h
)
SELECT  object_name, module, action
, sum(10) ash_secs
, COUNT(DISTINCT sql_plan_hash_value) sql_plans
, COUNT(DISTINCT sql_id||sql_plan_hash_value||sql_exec_id) sql_execs
, MAX(sample_time) max_sample_time
FROM     h
WHERE NOT LOWER(module) IN('oracle','toad','sqlplus','sqlplusw')
AND       NOT LOWER(module) LIKE 'sql%'
GROUP BY object_name, module, action
ORDER BY SUBSTR(object_name,4), object_name, ash_Secs desc
/
Spool off
I now have a profile of how much each index is used. In this particular case I found something using every index.  It is possible that you will not find anything that uses some indexes.
                                                                             ASH   SQL    SQL Last
OBJECT_NAME        MODULE               ACTION                              Secs Plans  Execs Sample
------------------ -------------------- -------------------------------- ------- ----- ------ -------------------
…
PSJPROJ_RESOURCE   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step24.S        7300     1     66 06:32:57 27/08/2014
                   PC_PRICING           GFCPBINT_AE.CallmeA.Step24.S          40     1      2 08:38:57 22/08/2014
******************                                                       -------
sum                                                                         7340

PSLPROJ_RESOURCE   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step28.S        1220     1     53 06:33:17 27/08/2014
******************                                                       -------
sum                                                                         1220

PSMPROJ_RESOURCE   PC_TL_TO_PC          GFCPBINT_AE.XxBiEDM.Step07.S          60     2      6 18:35:18 20/08/2014
******************                                                       -------
sum                                                                           60

PSNPROJ_RESOURCE   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step26.S        6720     1     49 18:53:58 26/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step30.S        3460     1     60 06:33:27 27/08/2014
                   GFCOA_CMSN           GFCOA_CMSN.01INIT.Step01.S          2660     1     47 19:19:40 26/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step06.S        1800     1     52 18:53:28 26/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeG.Step01.S        1740     1     61 06:34:17 27/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step02.S        1680     1     24 18:53:18 26/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step10.S        1460     1     33 17:26:26 22/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step08.S         920     1     26 17:26:16 22/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step36.S         460     1     18 18:26:38 20/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step09.S         420     1     16 06:33:07 27/08/2014
                   PC_PRICING           GFCPBINT_AE.CallmeG.Step01.S         200     1     10 08:09:55 22/08/2014
                   PC_AP_TO_PC          GFCPBINT_AE.CallmeH.Step00A.S        170     1     17 21:53:26 21/08/2014
                   PC_PRICING           GFCPBINT_AE.CallmeA.Step36.S          20     1      1 08:02:46 05/08/2014
                   PC_PRICING           GFCPBINT_AE.CallmeA.Step30.S          20     1      1 13:42:48 04/08/2014
                   PC_PRICING           GFCPBINT_AE.CallmeA.Step06.S          20     1      1 15:58:35 28/07/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Pseudo.S          20     1      1 19:45:11 06/08/2014
******************                                                       -------
sum                                                                        21770
…
The next stage is to look at individual SQL statements This query looks for which SQL statement is using a particular index on PROJ_RESOURCE. If I can't find the SQL which cost the most time, then just choose another SQL with the same plan
  • I have found that sometimes a plan is captured by AWR, but the SQL statement is not. Personally, I think that is a bug. Working around it has made the following query quite complicated.
Break on object_name skip 1 
column ash_secs heading 'ASH|Secs' format 9999999
Set long 50000
Column cmd Format a200
Spool dmk

WITH h AS (
  SELECT  h.object_name
  ,       CASE WHEN h.module IS NULL THEN REGEXP_SUBSTR(h.program,'[^.@]+',1,1)
               WHEN h.module LIKE 'PSAE.%' THEN REGEXP_SUBSTR(h.module,'[^.]+',1,2)
               ELSE REGEXP_SUBSTR(h.program,'[^.@]+',1,1)
          END as module
  ,       CASE WHEN h.action LIKE 'PI=%' THEN NULL
               ELSE h.action
          END as action
  ,       h.sql_id, h.sql_plan_hash_value
  ,       t.command_type –-not null if plan and statement captured
  FROM    my_ash h
  LEFT OUTER JOIN (
    SELECT t1.*
    FROM dba_hist_sqltext t1
    ,    dba_hist_sql_plan p1
   WHERE t1.sql_id = p1.sql_id
    AND p1.id = 1
    ) t
  ON   t.sql_id = h.sql_id
   AND  t.dbid = h.dbid
  WHERE   h.object_name IN('PSMPROJ_RESOURCE')
  AND     h.object_Type = 'INDEX'
  AND     h.object_owner = 'SYSADM'
  And     NOT LOWER(module) IN('oracle','toad','sqlplus','sqlplusw')
  AND     NOT LOWER(module) LIKE 'sql%'
), x AS ( --aggregate DB time by object and statement
SELECT    object_name, sql_id, sql_plan_hash_value
, sum(10) ash_secs
,  10*COUNT(command_type) sql_secs  --DB time for captured statements only
FROM      h
WHERE NOT LOWER(module) IN('oracle','toad','sqlplus','sqlplusw')
AND       NOT LOWER(module) LIKE 'sql%'
GROUP BY  object_name, sql_id, sql_plan_hash_value
), y AS ( --rank DB time per object and plan
SELECT  object_name, sql_id, sql_plan_hash_value
,  ash_secs
, SUM(ash_secs) OVER (PARTITION BY object_name, sql_plan_hash_value) plan_ash_secs
, row_number() OVER (PARTITION BY object_name, sql_plan_hash_value ORDER BY sql_Secs DESC) ranking
FROM x
), z AS (
SELECT object_name
, CASE WHEN t.sql_text IS NOT NULL THEN y.sql_id
       ELSE (SELECT t1.sql_id
             FROM   dba_hist_sqltext t1
             ,      dba_hist_sql_plan p1
             WHERE  t1.sql_id = p1.sql_id
             AND    p1.plan_hash_value = y.sql_plan_hash_value
             AND    rownum = 1) --if still cannot find statement just pick any one
  END AS sql_id
, y.sql_plan_hash_value, y.plan_ash_secs
, CASE WHEN t.sql_text IS NOT NULL THEN t.sql_text 
       ELSE (SELECT t1.sql_Text
             FROM   dba_hist_sqltext t1
             ,      dba_hist_sql_plan p1
             WHERE  t1.sql_id = p1.sql_id
             AND    p1.plan_hash_value = y.sql_plan_hash_value
             AND    rownum = 1) --if still cannot find statement just pick any one
  END AS sql_text
from y
 left outer join dba_hist_sqltext t
 on t.sql_id = y.sql_id
WHERE ranking = 1 --captured statement with most time
)
SELECT *
--'SELECT * FROM table(dbms_xplan.display_awr('''||sql_id||''','||sql_plan_hash_value||',NULL,''ADVANCED''))/*'||object_name||':'||plan_ash_Secs||'*/;' cmd
FROM z
ORDER BY object_name, plan_ash_secs DESC
/
Spool off
So now I can see the individual SQL statements.
PSJPROJ_RESOURCE   f02k23bqj0xc4          3393167302          7340 UPDATE PS_PROJ_RESOURCE C SET (C.Operating_Unit, C.CHARTFIELD1, C.PRODUCT, C.CLA
                                                                   SS_FLD, C.CHARTFIELD2, C.VENDOR_ID, C.contract_num, C.contract_line_num, …

PSLPROJ_RESOURCE   2fz0gcb2774y0           821236869          1220 UPDATE ps_proj_resource p SET p.deptid = NVL (( SELECT j.deptid FROM ps_job j WH
                                                                   ERE j.emplid = p.emplid AND j.empl_rcd = p.empl_rcd AND j.effdt = ( SELECT MAX (…

PSMPROJ_RESOURCE   96cdkb7jyq863           338292674            50 UPDATE PS_GFCBI_EDM_TA04 a SET a.GFCni_amount = ( SELECT x.resource_amount FROM
                                                                   PS_PROJ_RESOURCE x WHERE x.process_instance = …

                   1kq9rfy8sb8d4          4135884683            10 UPDATE PS_GFCBI_EDM_TA04 a SET a.GFCni_amount = ( SELECT x.resource_amount FROM
                                                                   PS_PROJ_RESOURCE x WHERE x.process_instance = …

PSNPROJ_RESOURCE   ga2x2u4jw9p0x          2282068749          6760 UPDATE PS_PROJ_RESOURCE P SET (P.RESOURCE_TYPE, P.RESOURCE_SUB_CAT) = …

                   9z5qsq6wrr7zp          3665912247          3500 UPDATE PS_PROJ_RESOURCE P SET P.TIME_SHEET_ID = …
If I replace the last select clause with the commented line, then I can generate the commands to extract the statement and plan from the AWR repository.
SELECT * FROM table(dbms_xplan.display_awr('45ggt0yfrh5qp',3393167302,NULL,'ADVANCED'))/*PSJPROJ_RESOURCE:7340*/;

SELECT * FROM table(dbms_xplan.display_awr('8ntxj3694r6kg',821236869,NULL,'ADVANCED'))/*PSLPROJ_RESOURCE:1220*/;

SELECT * FROM table(dbms_xplan.display_awr('96cdkb7jyq863',338292674,NULL,'ADVANCED'))/*PSMPROJ_RESOURCE:50*/;

SELECT * FROM table(dbms_xplan.display_awr('1kq9rfy8sb8d4',4135884683,NULL,'ADVANCED'))/*PSMPROJ_RESOURCE:10*/;

SELECT * FROM table(dbms_xplan.display_awr('ga2x2u4jw9p0x',2282068749,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:6760*/;
SELECT * FROM table(dbms_xplan.display_awr('9z5qsq6wrr7zp',3665912247,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:3500*/;
SELECT * FROM table(dbms_xplan.display_awr('b28btd6k3x8jt',1288409804,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:3060*/;
SELECT * FROM table(dbms_xplan.display_awr('avs70c19khxmw',2276811880,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:2660*/;
SELECT * FROM table(dbms_xplan.display_awr('b78qhsch85g4a',1019599680,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:1960*/;
SELECT * FROM table(dbms_xplan.display_awr('65kq2v1ubybps',3138703971,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:1820*/;
SELECT * FROM table(dbms_xplan.display_awr('1dj17ra70c801',1175874548,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:1460*/;
SELECT * FROM table(dbms_xplan.display_awr('3w71v896s7m5d',3207074729,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:500*/;
SELECT * FROM table(dbms_xplan.display_awr('35mz5bw7p5ubw',2447377432,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:170*/;
Ultimately, I have needed to look through the SQL plans that use an index to decide whether I need to keep that index or decide whether the statement would perform adequately using another index. In this case, on this particular system, I think the index PSMPROJ_RESOURCE would be adequate for this statement, and I would consider dropping PSLPROJ_RESOURCE.
>SELECT * FROM table(dbms_xplan.display_awr('8ntxj3694r6kg',821236869,NULL,'ADVANCED'))/*PSLPROJ_RESOURCE:1220*/;
--------------------
UPDATE ps_proj_resource p SET p.deptid = NVL (( SELECT j.deptid FROM
ps_job j WHERE j.emplid = p.emplid AND j.empl_rcd = p.empl_rcd AND
j.effdt = ( SELECT MAX (j1.effdt) FROM ps_job j1 WHERE j1.emplid =
j.emplid AND j1.empl_rcd = j.empl_rcd AND j1.effdt <= p.trans_dt) AND
j.effseq = ( SELECT MAX (j2.effseq) FROM ps_job j2 WHERE j2.emplid =
j.emplid AND j2.empl_rcd = j.empl_rcd AND j2.effdt = j.effdt)),
p.deptid ) 
WHERE p.process_instance = … 
AND EXISTS ( SELECT
j.deptid FROM ps_job j WHERE j.emplid = p.emplid AND j.empl_rcd =
p.empl_rcd AND j.effdt = ( SELECT MAX (j1.effdt) FROM ps_job j1 WHERE
j1.emplid = j.emplid AND j1.empl_rcd = j.empl_rcd AND j1.effdt <=
p.trans_dt) AND j.effseq = ( SELECT MAX (j2.effseq) FROM ps_job j2
WHERE j2.emplid = j.emplid AND j2.empl_rcd = j.empl_rcd AND j2.effdt =
j.effdt))

Plan hash value: 821236869

-----------------------------------------------------------------------------------------
| Id  | Operation            | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT     |                  |       |       | 63104 (100)|          |
|   1 |  UPDATE              | PS_PROJ_RESOURCE |       |       |            |          |
|   2 |   INDEX RANGE SCAN   | PSLPROJ_RESOURCE |   365 | 11315 |    22   (0)| 00:00:01 |
|   3 |    INDEX RANGE SCAN  | PSAJOB           |     1 |    23 |     3   (0)| 00:00:01 |
|   4 |     SORT AGGREGATE   |                  |     1 |    20 |            |          |
|   5 |      INDEX RANGE SCAN| PSAJOB           |     1 |    20 |     3   (0)| 00:00:01 |
|   6 |     SORT AGGREGATE   |                  |     1 |    23 |            |          |
|   7 |      INDEX RANGE SCAN| PSAJOB           |     1 |    23 |     3   (0)| 00:00:01 |
|   8 |   INDEX RANGE SCAN   | PSAJOB           |     1 |    29 |     3   (0)| 00:00:01 |
|   9 |    SORT AGGREGATE    |                  |     1 |    20 |            |          |
|  10 |     INDEX RANGE SCAN | PSAJOB           |     1 |    20 |     3   (0)| 00:00:01 |
|  11 |    SORT AGGREGATE    |                  |     1 |    23 |            |          |
|  12 |     INDEX RANGE SCAN | PSAJOB           |     1 |    23 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
…
I carried on with examination of SQL statements and execution plans to determine whether each index is really needed or another index (or even no index at all) would do as well.  This decision also requires some background knowledge about the application. Eventually, I decided that I want drop the J, L and N indexes on PROJ_RESOURCE and just keep M. 

Limitations of Method

AWR does not capture all SQLs, nor all SQL plans. First the SQL has to be in the library cache and then it must be one of the top-n. A SQL that is efficient because it uses an appropriate index may not be captured, and will not be detected by this approach. This might lead you to erronously believe that the index could be dropped.
ASH data is purged after a period of time, by default 31 days. If an index is only used by a process that has not run within the retention period, then it will not be detected by this approach. This is another reason to retain ASH and AWR in a repository for a longer period. I have heard 400 days suggested, so that you have ASH for a year and a month.
  • However, this also causes the SYSAUX tablespace to be become very large, so I would suggest regularly moving the data to a separate database. I know one customer who has built a central AWR repository for all their production and test databases and automated regular transfer of data. That repository has been of immense diagnostic value.
[Update] This analysis will not detect index use in support constraint validation (PeopleSoft doesn't use database referential integrity constraints).  As Mark Farnham points out below, that may be a reason for retaining a particular index.

Getting Rid of Indexes 

Obviously any index changes need to be tested carefully in all the places that reference the index, but on the other hand it is not viable to do a full regression test every time you want to change an index.
Therefore, if all the testing is successful and you decide to go ahead and drop the index in production, you might prefer to make it invisible first for a while before actually dropping it. It is likely that the indexes you choose to examine are large and will take time to rebuild. An invisible index will not be used by the optimizer, but it will continue to be maintained during DML. If there are any unfortunate consequences, you can immediately make the index visible without having to rebuild it.

Monday, August 25, 2014

To Hint or not to hint (Application Engine), that is the question

Over the years Oracle has provided a number of plan stability technologies to control how SQL statements are executed.  At the risk of over simplification, Outlines (deprecated in 11g), Profiles, Baselines and Patches work by injecting a set of hints into a SQL statement at parse time.  There is quite a lot of advice from Oracle to use these technologies to fix errant execution plans rather than hint the application.  I think it is generally good advice, however, there are times when this approach does not work well with PeopleSoft, and that is due to the behaviour and structure of PeopleSoft rather than the Oracle database.

It is possible to produce a SQL profile from a plan captured by AWR.  A part of distribution for the SQLT Diagnostic Tool (Doc ID 215187.1) is a script called coe_xfr_sql_profile.sql written by Carlos Sierra.
The only thing I would change in the delivered script, (for use with PeopleSoft and as suggested in a comment) is to create the profile with FORCE_MATCHING so that similar statements with different literal values still match. 

The Slings and Arrows of outrageous execution plans

Let's take an example of a statement (from the vanilla Financials product that has not been customised) that performed poorly because it didn't generate a good execution plan (although I have cut out most of the statement for readability.  Note, that it references instance 5 of PeopleTools temporary record CA_SUM_TAO.
INSERT INTO PS_CA_SUM_RC_TAO5 (…) SELECT
…
FROM PS_CA_PR_SUMM A, PS_CA_SUM_TAO5 B , PS_CA_SUM_IN_USE C WHERE
…
B.PROCESS_INSTANCE = 51381955 AND C.IN_USE_FLAG = 'Y'

Plan hash value: 2039212279
----------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                |                  |       |       | 14424 (100)|          |
|   1 |  LOAD TABLE CONVENTIONAL        |                  |       |       |            |          |
|   2 |   NESTED LOOPS                  |                  |       |       |            |          |
|   3 |    NESTED LOOPS                 |                  |     1 |   430 | 14424   (1)| 00:02:54 |
|   4 |     NESTED LOOPS                |                  |     1 |   318 | 14421   (1)| 00:02:54 |
|   5 |      TABLE ACCESS FULL          | PS_CA_SUM_IN_USE |     1 |    85 | 14420   (1)| 00:02:54 |
|   6 |      TABLE ACCESS BY INDEX ROWID| PS_CA_SUM_TAO5   |     1 |   233 |     1   (0)| 00:00:01 |
|   7 |       INDEX UNIQUE SCAN         | PS_CA_SUM_TAO5   |     1 |       |     0   (0)|          |
|   8 |     INDEX RANGE SCAN            | PSACA_PR_SUMM    |     1 |       |     2   (0)| 00:00:01 |
|   9 |    TABLE ACCESS BY INDEX ROWID  | PS_CA_PR_SUMM    |     1 |   112 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
However, below is the plan we get on instance 4.  We get this plan because there is already a profile that has been applied in the past, but now we are on a different non-shared instance of the temporary table, so the profile cannot match because we are on different objects, and we get the same problem, but on different non-shared instances of the temporary record.  Different literal values, such as those for Process Instance can be handled by FORCE_MATCHING, but not different tables.  This is a totally different SQL statement.
SQL_ID 5gtxdqbx0d0c3
--------------------
INSERT INTO PS_CA_SUM_RC_TAO4 (…) SELECT 
…
FROM PS_CA_PR_SUMM A, PS_CA_SUM_TAO4 B , PS_CA_SUM_IN_USE C WHERE
…
B.PROCESS_INSTANCE = 51377796 AND C.IN_USE_FLAG = 'Y'

Plan hash value: 3552771247

---------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT               |                  |       |       | 36361 (100)|          |
|   1 |  LOAD TABLE CONVENTIONAL       |                  |       |       |            |          |
|   2 |   HASH JOIN                    |                  |     1 |   430 | 36361   (3)| 00:07:17 |
|   3 |    TABLE ACCESS FULL           | PS_CA_SUM_IN_USE |     1 |    85 | 14347   (1)| 00:02:53 |
|   4 |    NESTED LOOPS                |                  |       |       |            |          |
|   5 |     NESTED LOOPS               |                  |     1 |   345 | 22014   (3)| 00:04:25 |
|   6 |      TABLE ACCESS FULL         | PS_CA_PR_SUMM    |  5268K|   562M| 21539   (1)| 00:04:19 |
|   7 |      INDEX UNIQUE SCAN         | PS_CA_SUM_TAO4   |     1 |       |     0   (0)|          |
|   8 |     TABLE ACCESS BY INDEX ROWID| PS_CA_SUM_TAO4   |     1 |   233 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Note
-----
   - SQL profile "coe_gn3n77gs6xj2a_3552771247" used for this statement
Of course, the statement on instance 4 had a profile because it was added as a short term fix and then left in situ long term.  It worked fine until a process errored, left the non-shared instance of the temporary record allocated to that process instance, and so PeopleSoft allocated instance 5 on the next execution.
So we could just create another profile using the coe_xfr_sql_profile.sql script
SPO coe_xfr_sql_profile_5gtxdqbx0d0c3_3552771247.log;
SET ECHO ON TERM ON LIN 2000 TRIMS ON NUMF 99999999999999999999;
REM
REM $Header: 215187.1 coe_xfr_sql_profile_5gtxdqbx0d0c3_3552771247.sql 11.4.1.4 2014/08/13 csierra $
REM
REM Copyright (c) 2000-2010, Oracle Corporation. All rights reserved.
REM
REM AUTHOR
REM   carlos.sierra@oracle.com
REM
REM SCRIPT
REM   coe_xfr_sql_profile_5gtxdqbx0d0c3_3552771247.sql
REM
REM DESCRIPTION
REM   This script is generated by coe_xfr_sql_profile.sql
REM   It contains the SQL*Plus commands to create a custom
REM   SQL Profile for SQL_ID 5gtxdqbx0d0c3 based on plan hash
REM   value 3552771247.
REM   The custom SQL Profile to be created by this script
REM   will affect plans for SQL commands with signature
REM   matching the one for SQL Text below.
REM   Review SQL Text and adjust accordingly.
REM
REM PARAMETERS
REM   None.
REM
REM EXAMPLE
REM   SQL> START coe_xfr_sql_profile_5gtxdqbx0d0c3_3552771247.sql;
REM
REM NOTES
REM   1. Should be run as SYSTEM or SYSDBA.
REM   2. User must have CREATE ANY SQL PROFILE privilege.
REM   3. SOURCE and TARGET systems can be the same or similar.
REM   4. To drop this custom SQL Profile after it has been created:
REM      EXEC DBMS_SQLTUNE.DROP_SQL_PROFILE('coe_5gtxdqbx0d0c3_3552771247');
REM   5. Be aware that using DBMS_SQLTUNE requires a license
REM      for the Oracle Tuning Pack.
REM
WHENEVER SQLERROR EXIT SQL.SQLCODE;
REM
VAR signature NUMBER;
REM
DECLARE
sql_txt CLOB;
h       SYS.SQLPROF_ATTR;
BEGIN
sql_txt := q'[
INSERT INTO PS_CA_SUM_RC_TAO4 (PROCESS_INSTANCE, BUSINESS_UNIT, PROJECT_ID, ACTIVITY_ID, ANALYSIS_TYPE, RESOURCE_TYPE, RESOURCE_CATEGORY, RESOURCE_SUB_CAT, BI_DISTRIB_STATUS, GL_DISTRIB_STATUS, FOREIGN_CURRENCY, CONTRACT_CURRENCY, CONTRACT_NUM, CONTRACT_LINE_NUM, CA_FEE_STATUS, RESOURCE_QUANTITY, FOREIGN_AMOUNT_BSE, FOREIGN_AMOUNT_INC, FOREIGN_AMOUNT, CONTRACT_AMT_BSE, CONTRACT_AMT_INC, CONTRACT_AMT, MIN_TRANS_DT, MAX_TRANS_DT, CAND_MIN_TRANS_DT, CAND_MAX_TRANS_DT) SELECT B.PROCESS_INSTANCE, A.BUSINESS_UNIT, A.PROJECT_ID, A.ACTIVITY_ID, A.ANALYSIS_TYPE, A.RESOURCE_TYPE, A.RESOURCE_CATEGORY, A.RESOURCE_SUB_CAT, A.BI_DISTRIB_STATUS, A.GL_DISTRIB_STATUS,
A.FOREIGN_CURRENCY, A.CONTRACT_CURRENCY, A.CONTRACT_NUM, A.CONTRACT_LINE_NUM, A.CA_FEE_STATUS, (A.RESOURCE_QUANTITY+B.RESOURCE_QUANTITY), A.FOREIGN_AMOUNT, B.FOREIGN_AMOUNT, (A.FOREIGN_AMOUNT+B.FOREIGN_AMOUNT), A.CONTRACT_AMT, B.CONTRACT_AMT, (A.CONTRACT_AMT+B.CONTRACT_AMT), A.MIN_TRANS_DT, A.MAX_TRANS_DT, B.CAND_MIN_TRANS_DT, B.CAND_MAX_TRANS_DT FROM PS_CA_PR_SUMM A, PS_CA_SUM_TAO4 B , PS_CA_SUM_IN_USE C WHERE B.BUSINESS_UNIT = C.BUSINESS_UNIT AND B.PROJECT_ID = C.PROJECT_ID AND B.ACTIVITY_ID = C.ACTIVITY_ID AND B.ANALYSIS_TYPE = C.ANALYSIS_TYPE AND B.RESOURCE_TYPE = C.RESOURCE_TYPE AND B.RESOURCE_CATEGORY = C.RESOURCE_CATEGORY AND B.RESOURCE_SUB_CAT =
C.RESOURCE_SUB_CAT AND B.BI_DISTRIB_STATUS = C.BI_DISTRIB_STATUS AND B.GL_DISTRIB_STATUS = C.GL_DISTRIB_STATUS AND B.FOREIGN_CURRENCY = C.FOREIGN_CURRENCY AND B.CONTRACT_CURRENCY = C.CONTRACT_CURRENCY AND B.CONTRACT_NUM = C.CONTRACT_NUM AND B.CONTRACT_LINE_NUM = C.CONTRACT_LINE_NUM AND B.CA_FEE_STATUS = C.CA_FEE_STATUS AND A.BUSINESS_UNIT = B.BUSINESS_UNIT AND A.PROJECT_ID = B.PROJECT_ID AND A.ACTIVITY_ID = B.ACTIVITY_ID AND A.ANALYSIS_TYPE = B.ANALYSIS_TYPE AND A.RESOURCE_TYPE = B.RESOURCE_TYPE AND A.RESOURCE_CATEGORY = B.RESOURCE_CATEGORY AND A.RESOURCE_SUB_CAT = B.RESOURCE_SUB_CAT AND A.BI_DISTRIB_STATUS = B.BI_DISTRIB_STATUS AND A.GL_DISTRIB_STATUS =
B.GL_DISTRIB_STATUS AND A.FOREIGN_CURRENCY = B.FOREIGN_CURRENCY AND A.CONTRACT_CURRENCY = B.CONTRACT_CURRENCY AND A.CONTRACT_NUM = B.CONTRACT_NUM AND A.CONTRACT_LINE_NUM = B.CONTRACT_LINE_NUM AND A.CA_FEE_STATUS = B.CA_FEE_STATUS AND B.PROCESS_INSTANCE = 51377796 AND C.IN_USE_FLAG = 'Y'
]';
h := SYS.SQLPROF_ATTR(
q'[BEGIN_OUTLINE_DATA]',
q'[IGNORE_OPTIM_EMBEDDED_HINTS]',
q'[OPTIMIZER_FEATURES_ENABLE('11.2.0.3')]',
q'[DB_VERSION('11.2.0.3')]',
q'[OPT_PARAM('_unnest_subquery' 'false')]',
q'[OPT_PARAM('optimizer_dynamic_sampling' 4)]',
q'[OPT_PARAM('_optimizer_cost_based_transformation' 'off')]',
q'[ALL_ROWS]',
q'[OUTLINE_LEAF(@"SEL$1")]',
q'[OUTLINE_LEAF(@"INS$1")]',
q'[FULL(@"INS$1" "PS_CA_SUM_RC_TAO4"@"INS$1")]',
q'[FULL(@"SEL$1" "A"@"SEL$1")]',
q'[INDEX(@"SEL$1" "B"@"SEL$1" ("PS_CA_SUM_TAO4"."PROCESS_INSTANCE" "PS_CA_SUM_TAO4"."BUSINESS_UNIT" "PS_CA_SUM_TAO4"."PROJECT_ID" "PS_CA_SUM_TAO4"."ACTIVITY_ID" "PS_CA_SUM_TAO4"."ANALYSIS_TYPE" "PS_CA_SUM_TAO4"."RESOURCE_TYPE" "PS_CA_SUM_TAO4"."RESOURCE_CATEGORY" "PS_CA_SUM_TAO4"."RESOURCE_SUB_CAT" "PS_CA_SUM_TAO4"."BI_DISTRIB_STATUS" "PS_CA_SUM_TAO4"."GL_DISTRIB_STATUS" "PS_CA_SUM_TAO4"."FOREIGN_CURRENCY" "PS_CA_SUM_TAO4"."CONTRACT_CURRENCY" "PS_CA_SUM_TAO4"."CONTRACT_NUM" ]',
q'[    "PS_CA_SUM_TAO4"."CONTRACT_LINE_NUM" "PS_CA_SUM_TAO4"."CA_FEE_STATUS"))]',
q'[FULL(@"SEL$1" "C"@"SEL$1")]',
q'[LEADING(@"SEL$1" "A"@"SEL$1" "B"@"SEL$1" "C"@"SEL$1")]',
q'[USE_NL(@"SEL$1" "B"@"SEL$1")]',
q'[NLJ_BATCHING(@"SEL$1" "B"@"SEL$1")]',
q'[USE_HASH(@"SEL$1" "C"@"SEL$1")]',
q'[SWAP_JOIN_INPUTS(@"SEL$1" "C"@"SEL$1")]',
q'[END_OUTLINE_DATA]');
:signature := DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE(sql_txt);
DBMS_SQLTUNE.IMPORT_SQL_PROFILE (
sql_text    => sql_txt,
profile     => h,
name        => 'coe_5gtxdqbx0d0c3_3552771247',
description => 'coe 5gtxdqbx0d0c3 3552771247 '||:signature||'',
category    => 'DEFAULT',
validate    => TRUE,
replace     => TRUE,
force_match => TRUE /* TRUE:FORCE (match even when different literals in SQL). FALSE:EXACT (similar to CURSOR_SHARING) */ );
END;
/
WHENEVER SQLERROR CONTINUE
SET ECHO OFF;
PRINT signature
PRO
PRO ... manual custom SQL Profile has been created
PRO
SET TERM ON ECHO OFF LIN 80 TRIMS OFF NUMF "";
SPO OFF;
PRO
PRO COE_XFR_SQL_PROFILE_5gtxdqbx0d0c3_3552771247 completed
But then we must manually change the table and index names from 4 to 5.
DECLARE
sql_txt CLOB;
h       SYS.SQLPROF_ATTR;
BEGIN
sql_txt := q'[
INSERT INTO PS_CA_SUM_RC_TAO5 (PROCESS_INSTANCE, BUSINESS_UNIT, PROJECT_ID, ACTIVITY_ID, ANALYSIS_TYPE, RESOURCE_TYPE, RESOURCE_CATEGORY, RESOURCE_SUB_CAT, BI_DISTRIB_STATUS, GL_DISTRIB_STATUS, FOREIGN_CURRENCY, CONTRACT_CURRENCY, CONTRACT_NUM, CONTRACT_LINE_NUM, CA_FEE_STATUS, RESOURCE_QUANTITY, FOREIGN_AMOUNT_BSE, FOREIGN_AMOUNT_INC, FOREIGN_AMOUNT, CONTRACT_AMT_BSE, CONTRACT_AMT_INC, CONTRACT_AMT, MIN_TRANS_DT, MAX_TRANS_DT, CAND_MIN_TRANS_DT, CAND_MAX_TRANS_DT) SELECT B.PROCESS_INSTANCE, A.BUSINESS_UNIT, A.PROJECT_ID, A.ACTIVITY_ID, A.ANALYSIS_TYPE, A.RESOURCE_TYPE, A.RESOURCE_CATEGORY, A.RESOURCE_SUB_CAT, A.BI_DISTRIB_STATUS, A.GL_DISTRIB_STATUS,
A.FOREIGN_CURRENCY, A.CONTRACT_CURRENCY, A.CONTRACT_NUM, A.CONTRACT_LINE_NUM, A.CA_FEE_STATUS, (A.RESOURCE_QUANTITY+B.RESOURCE_QUANTITY), A.FOREIGN_AMOUNT, B.FOREIGN_AMOUNT, (A.FOREIGN_AMOUNT+B.FOREIGN_AMOUNT), A.CONTRACT_AMT, B.CONTRACT_AMT, (A.CONTRACT_AMT+B.CONTRACT_AMT), A.MIN_TRANS_DT, A.MAX_TRANS_DT, B.CAND_MIN_TRANS_DT, B.CAND_MAX_TRANS_DT FROM PS_CA_PR_SUMM A, PS_CA_SUM_TAO5 B , PS_CA_SUM_IN_USE C WHERE B.BUSINESS_UNIT = C.BUSINESS_UNIT AND B.PROJECT_ID = C.PROJECT_ID AND B.ACTIVITY_ID = C.ACTIVITY_ID AND B.ANALYSIS_TYPE = C.ANALYSIS_TYPE AND B.RESOURCE_TYPE = C.RESOURCE_TYPE AND B.RESOURCE_CATEGORY = C.RESOURCE_CATEGORY AND B.RESOURCE_SUB_CAT =
C.RESOURCE_SUB_CAT AND B.BI_DISTRIB_STATUS = C.BI_DISTRIB_STATUS AND B.GL_DISTRIB_STATUS = C.GL_DISTRIB_STATUS AND B.FOREIGN_CURRENCY = C.FOREIGN_CURRENCY AND B.CONTRACT_CURRENCY = C.CONTRACT_CURRENCY AND B.CONTRACT_NUM = C.CONTRACT_NUM AND B.CONTRACT_LINE_NUM = C.CONTRACT_LINE_NUM AND B.CA_FEE_STATUS = C.CA_FEE_STATUS AND A.BUSINESS_UNIT = B.BUSINESS_UNIT AND A.PROJECT_ID = B.PROJECT_ID AND A.ACTIVITY_ID = B.ACTIVITY_ID AND A.ANALYSIS_TYPE = B.ANALYSIS_TYPE AND A.RESOURCE_TYPE = B.RESOURCE_TYPE AND A.RESOURCE_CATEGORY = B.RESOURCE_CATEGORY AND A.RESOURCE_SUB_CAT = B.RESOURCE_SUB_CAT AND A.BI_DISTRIB_STATUS = B.BI_DISTRIB_STATUS AND A.GL_DISTRIB_STATUS =
B.GL_DISTRIB_STATUS AND A.FOREIGN_CURRENCY = B.FOREIGN_CURRENCY AND A.CONTRACT_CURRENCY = B.CONTRACT_CURRENCY AND A.CONTRACT_NUM = B.CONTRACT_NUM AND A.CONTRACT_LINE_NUM = B.CONTRACT_LINE_NUM AND A.CA_FEE_STATUS = B.CA_FEE_STATUS AND B.PROCESS_INSTANCE = 51377796 AND C.IN_USE_FLAG = 'Y'
]';
h := SYS.SQLPROF_ATTR(
q'[BEGIN_OUTLINE_DATA]',
q'[IGNORE_OPTIM_EMBEDDED_HINTS]',
q'[OPTIMIZER_FEATURES_ENABLE('11.2.0.3')]',
q'[DB_VERSION('11.2.0.3')]',
q'[OPT_PARAM('_unnest_subquery' 'false')]',
q'[OPT_PARAM('optimizer_dynamic_sampling' 4)]',
q'[OPT_PARAM('_optimizer_cost_based_transformation' 'off')]',
q'[ALL_ROWS]',
q'[OUTLINE_LEAF(@"SEL$1")]',
q'[OUTLINE_LEAF(@"INS$1")]',
q'[FULL(@"INS$1" "PS_CA_SUM_RC_TAO5"@"INS$1")]',
q'[FULL(@"SEL$1" "A"@"SEL$1")]',
q'[INDEX(@"SEL$1" "B"@"SEL$1" ("PS_CA_SUM_TAO5"."PROCESS_INSTANCE" "PS_CA_SUM_TAO5"."BUSINESS_UNIT" "PS_CA_SUM_TAO5"."PROJECT_ID" "PS_CA_SUM_TAO5"."ACTIVITY_ID" "PS_CA_SUM_TAO5"."ANALYSIS_TYPE" "PS_CA_SUM_TAO5"."RESOURCE_TYPE" "PS_CA_SUM_TAO5"."RESOURCE_CATEGORY" "PS_CA_SUM_TAO5"."RESOURCE_SUB_CAT" "PS_CA_SUM_TAO5"."BI_DISTRIB_STATUS" "PS_CA_SUM_TAO5"."GL_DISTRIB_STATUS" "PS_CA_SUM_TAO5"."FOREIGN_CURRENCY" "PS_CA_SUM_TAO5"."CONTRACT_CURRENCY" "PS_CA_SUM_TAO5"."CONTRACT_NUM" ]',
q'[    "PS_CA_SUM_TAO5"."CONTRACT_LINE_NUM" "PS_CA_SUM_TAO5"."CA_FEE_STATUS"))]',
q'[FULL(@"SEL$1" "C"@"SEL$1")]',
q'[LEADING(@"SEL$1" "A"@"SEL$1" "B"@"SEL$1" "C"@"SEL$1")]',
q'[USE_NL(@"SEL$1" "B"@"SEL$1")]',
q'[NLJ_BATCHING(@"SEL$1" "B"@"SEL$1")]',
q'[USE_HASH(@"SEL$1" "C"@"SEL$1")]',
q'[SWAP_JOIN_INPUTS(@"SEL$1" "C"@"SEL$1")]',
q'[END_OUTLINE_DATA]');
:signature := DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE(sql_txt);
DBMS_SQLTUNE.IMPORT_SQL_PROFILE (
sql_text    => sql_txt,
profile     => h,
name        => 'coe_5gtxdqbx0d0c3_3552771247',
description => 'coe 5gtxdqbx0d0c3 3552771247 '||:signature||'',
category    => 'DEFAULT',
validate    => TRUE,
replace     => TRUE,
force_match => TRUE /* TRUE:FORCE (match even when different literals in SQL). FALSE:EXACT (similar to CURSOR_SHARING) */ );
END;
/

Or to take Arms against a Sea of statements,

The profile has advantage that it can be applied quickly without a code change.  It is the perfect tool for the DBA with a production performance problem. However, there are some other considerations.
  • If applying to statement that references a PS temp record then we need to apply the profile to all instances of the record (both non-shared instances and the shared instance).
  • We were lucky that we referenced instance 5 of two temporary records. However, you could get a situation where a statement references different instances of different temporary records.  So perhaps instance 5 of one table and instance 6 of another.  In which case, you might also get instance 6 of the first table and instance 5 of the other.  A SQL profile could be needed for each permutation.
  • Bear in mind also that some areas of PeopleSoft use dynamically generated SQL.  So you get similar SQL statements which are sufficiently different for the profile not to match.  
  • Any changes to the expansion of Application Engine and PeopleCode MetaSQL on upgrading PeopleTools, or potentially even patching, will also prevent matching.
  • There is also the challenge of dealing with code changes as the system is upgraded, patched and customised.  A small code change, perhaps just an extra field in the select clause, can result in a performance regression because the profile stops matching. Of course, this challenge is not limited to PeopleSoft systems! 
Profiles are likely to be effective if there are no PeopleSoft temporary records present.  So you can generally use them in COBOL and SQR processes and the on-line application (other than in on-line Application Engine processes).

Aye, there's the rub,

I would use a profile (or a set of profiles) as a short-term temporary fix that is easier to introduce into production, and then add hints to the source code and so fix all instances of the code, not just the ones that have been profiled. Of course, that does entail a code change, and everything that goes with that.  One strong argument against making code change is that you have to change the code again to remove or change the hint if it becomes unnecessary at some time in future after a significant change, such as an Oracle upgrade.  However, on balance, I think it is better than the scenario where the profile stops working one day without warning.

The rest is silence.

Unless you add a comment.

Wednesday, July 09, 2014

Introducing the Analytic Keep Clause for Effective-Dated/Sequence Queries in PeopleSoft

Those of us who work with PeopleSoft, and especially the HCM product, are all too familiar with the concept of effect-dated data, and the need to find data that was effective at a particular date.  PeopleSoft products have always made extensive use of correlated sub-queries to determine the required rows from an effective-dated record.

The JOB record is a the heart of HCM. It is both effective-dated and effective sequenced. I will use it for the demonstrations in this article. I am going to suggest an alternative, although Oracle-specific, SQL construction.

 Let's start by looking at the job data for an employee in the demo database. Employee KF0018 has 17 rows of data two concurrent jobs.  The question I am going to ask is "What was the annual salary for this employee on 11 February 1995?".  Therefore, I am interested in the rows marked below with the asterisks. 
column annual_rt format 999,999
SELECT  emplid, empl_rcd, effdt, effseq, action, deptid, currency_cd, annual_rt
FROM	ps_job j
WHERE	j.emplid = 'KF0018'
ORDER BY 1,2,3,4
/ 

EMPLID        EMPL_RCD EFFDT         EFFSEQ ACT DEPTID     CUR ANNUAL_RT
----------- ---------- --------- ---------- --- ---------- --- ---------
KF0018               0 12-JUN-83          0 HIR 13000      FRF   120,000
KF0018               0 01-JAN-84          0 PAY 13000      FRF   123,600
KF0018               0 01-JAN-85          0 PAY 13000      FRF   127,308
KF0018               0 01-JAN-86          0 PAY 13000      FRF   131,764
KF0018               0 01-JAN-87          0 PAY 13000      FRF   136,376
KF0018               0 01-JAN-88          0 PAY 13000      FRF   140,467
KF0018               0 01-JAN-89          0 PAY 13000      FRF   147,490
KF0018               0 22-JAN-95          0 PRO 13000      FRF   147,490
KF0018               0 22-JAN-95          1 PAY 13000      FRF   294,239 *
KF0018               0 22-JAN-96          0 PAY 13000      FRF   318,575
KF0018               0 01-JAN-98          0 PAY 13000      FRF   346,156
KF0018               0 01-JAN-00          0 DTA 13000      FRF   346,156
KF0018               0 01-JAN-02          0 PAY 13000      EUR    52,771
KF0018               1 01-NOV-89          0 ASG 21300      GBP    22,440
KF0018               1 31-DEC-93          0 ASC 21300      GBP    22,440
KF0018               1 01-JAN-94          0 ASG 12000      GBP    22,440 *
KF0018               1 31-DEC-95          0 ASC 10000      GBP    22,440

I will set statistics level to ALL so I can obtain detailed information about how the SQL statements execute:
ALTER SESSION SET statistics_level = ALL;

I extracted the execution plans and execution statistics with the following command
select * from table(dbms_xplan.display_cursor(null,null,'IOSTATS')) 

Typical PeopleSoft Platform Agnostic Construction

This is the usual way to construct the query in PeopleSoft. It is also valid on all databases platforms supported by PeopleSoft, not just Oracle. 
SELECT  emplid, empl_rcd, effdt, effseq, action, deptid, currency_cd, annual_rt
FROM	ps_job j
WHERE	j.effdt = ( 
        SELECT MAX (j1.effdt) FROM ps_job j1 
        WHERE j1.emplid = j.emplid 
        AND j1.empl_rcd = j.empl_rcd 
        AND j1.effdt <= TO_DATE('19950211','YYYYMMDD')) 
AND j.effseq = ( 
        SELECT MAX (j2.effseq) FROM ps_job j2 
        WHERE j2.emplid = j.emplid 
        AND j2.empl_rcd = j.empl_rcd 
        AND j2.effdt = j.effdt)
AND j.emplid = 'KF0018'
ORDER BY 1,2,3,4
/

EMPLID        EMPL_RCD EFFDT         EFFSEQ ACT DEPTID     CUR ANNUAL_RT
----------- ---------- --------- ---------- --- ---------- --- ---------
KF0018               0 22-JAN-95          1 PAY 13000      FRF   294,239
KF0018               1 01-JAN-94          0 ASG 12000      GBP    22,440

This required three access of indexes on the PS_JOB table, and two accesses of the table, using 26 consistent reads.
Plan hash value: 2299825310
----------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |         |      1 |        |      2 |00:00:00.01 |      26 |      2 |
|   1 |  SORT ORDER BY                    |         |      1 |      1 |      2 |00:00:00.01 |      26 |      2 |
|   2 |   NESTED LOOPS                    |         |      1 |      1 |      2 |00:00:00.01 |      26 |      2 |
|   3 |    NESTED LOOPS                   |         |      1 |      1 |      3 |00:00:00.01 |      21 |      2 |
|   4 |     VIEW                          | VW_SQ_1 |      1 |      1 |      2 |00:00:00.01 |      14 |      2 |
|*  5 |      FILTER                       |         |      1 |        |      2 |00:00:00.01 |      14 |      2 |
|   6 |       HASH GROUP BY               |         |      1 |      1 |      2 |00:00:00.01 |      14 |      2 |
|   7 |        TABLE ACCESS BY INDEX ROWID| PS_JOB  |      1 |      1 |     12 |00:00:00.01 |      14 |      2 |
|*  8 |         INDEX RANGE SCAN          | PS_JOB  |      1 |      1 |     12 |00:00:00.01 |       2 |      2 |
|   9 |     TABLE ACCESS BY INDEX ROWID   | PS_JOB  |      2 |      1 |      3 |00:00:00.01 |       7 |      0 |
|* 10 |      INDEX RANGE SCAN             | PSAJOB  |      2 |      1 |      3 |00:00:00.01 |       4 |      0 |
|* 11 |    VIEW PUSHED PREDICATE          | VW_SQ_2 |      3 |      1 |      2 |00:00:00.01 |       5 |      0 |
|* 12 |     FILTER                        |         |      3 |        |      3 |00:00:00.01 |       5 |      0 |
|  13 |      SORT AGGREGATE               |         |      3 |      1 |      3 |00:00:00.01 |       5 |      0 |
|* 14 |       FILTER                      |         |      3 |        |      5 |00:00:00.01 |       5 |      0 |
|* 15 |        INDEX RANGE SCAN           | PSAJOB  |      3 |      1 |      5 |00:00:00.01 |       5 |      0 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter("J1"."EMPLID"='KF0018')
   8 - access("J1"."EMPLID"='KF0018' AND "J1"."SYS_NC00164$">=HEXTORAW('883CFDF4FEF8FEFAFF') )
       filter(SYS_OP_UNDESCEND("J1"."SYS_NC00164$")<=TO_DATE(' 1995-02-11 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
  10 - access("J"."EMPLID"='KF0018' AND "ITEM_2"="J"."EMPL_RCD" AND
              "J"."SYS_NC00164$"=SYS_OP_DESCEND("MAX(J1.EFFDT)"))
       filter(SYS_OP_UNDESCEND("J"."SYS_NC00164$")="MAX(J1.EFFDT)")
  11 - filter(SYS_OP_UNDESCEND("J"."SYS_NC00165$")="MAX(J2.EFFSEQ)")
  12 - filter(COUNT(*)>0)
  14 - filter('KF0018'="J"."EMPLID")
  15 - access("J2"."EMPLID"='KF0018' AND "J2"."EMPL_RCD"="J"."EMPL_RCD" AND
              "J2"."SYS_NC00164$"=SYS_OP_DESCEND(SYS_OP_UNDESCEND("J"."SYS_NC00164$")))
       filter(SYS_OP_UNDESCEND("J2"."SYS_NC00164$")=SYS_OP_UNDESCEND("J"."SYS_NC00164$"))

This construction is also the reason you are required to set
_UNNEST_SUBQUERY=FALSE
on all PeopleSoft systems

Analytic Function and In-LineView/Sub-query Factor

I have seen people use a combination of analytic functions and in-line views to avoid having to use the correlated sub-query construction. This has been possible since Oracle 9i.
WITH X AS (
SELECT  emplid, empl_rcd, effdt, effseq, action, deptid, currency_cd, annual_rt
,	ROW_NUMBER() OVER (PARTITION BY emplid, empl_rcd 
                           ORDER BY effdt DESC, effseq DESC) myrowseq
FROM	ps_job j
WHERE	j.effdt <= TO_DATE('19950211','YYYYMMDD')
AND	j.emplid = 'KF0018'
)
SELECT 	emplid, empl_rcd, effdt, effseq, action, deptid, currency_cd, annual_rt
FROM 	x
WHERE	myrowseq = 1
ORDER BY 1,2,3,4
/

EMPLID        EMPL_RCD EFFDT         EFFSEQ ACT DEPTID     CUR ANNUAL_RT
----------- ---------- --------- ---------- --- ---------- --- ---------
KF0018               0 22-JAN-95          1 PAY 13000      FRF   294,239
KF0018               1 01-JAN-94          0 ASG 12000      GBP    22,440

We get the same result, but now the index is scanned just once and we only need 14 consistent reads, so it produces a significant improvement. However, it still includes a sort operation in addition to the window function. We have to create a sequence number field in the in-line view and filter by that in the final query.
Plan hash value: 1316906785
---------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |        |      1 |        |      2 |00:00:00.01 |      14 |
|   1 |  SORT ORDER BY                 |        |      1 |      1 |      2 |00:00:00.01 |      14 |
|*  2 |   VIEW                         |        |      1 |      1 |      2 |00:00:00.01 |      14 |
|*  3 |    WINDOW NOSORT               |        |      1 |      1 |     12 |00:00:00.01 |      14 |
|   4 |     TABLE ACCESS BY INDEX ROWID| PS_JOB |      1 |      1 |     12 |00:00:00.01 |      14 |
|*  5 |      INDEX RANGE SCAN          | PSAJOB |      1 |      1 |     12 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("MYROWSEQ"=1)
   3 - filter(ROW_NUMBER() OVER ( PARTITION BY "EMPLID","EMPL_RCD" ORDER BY
              "J"."SYS_NC00164$","J"."SYS_NC00165$")<=1)
   5 - access("J"."EMPLID"='KF0018' AND "J"."SYS_NC00164$">=HEXTORAW('883CFDF4FEF8FEFAFF')
              )
       filter(SYS_OP_UNDESCEND("J"."SYS_NC00164$")<=TO_DATE(' 1995-02-11 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))

Analytic Function Keep Clause

This form of the analytic functions is documented for the first time in 12c, but is available in 10g (my thanks to Tony Hasler for introducing me to it). It works by effectively keeping a running maximum value of the columns in the order by clause within in group.
SELECT emplid, empl_rcd
, MAX(effdt) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq) AS effdt
, MAX(effseq) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq) AS effseq
, MAX(action) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq) AS action
, MAX(deptid) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq) AS deptid
, MAX(currency_cd) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq) AS currency_cd
, MAX(annual_rt) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq) AS annual_rt
FROM ps_job j
WHERE	j.effdt <= TO_DATE('19950211','YYYYMMDD')
AND	j.emplid = 'KF0018'
GROUP BY emplid, empl_rcd
/

EMPLID        EMPL_RCD EFFDT         EFFSEQ ACT DEPTID     CUR ANNUAL_RT
----------- ---------- --------- ---------- --- ---------- --- ---------
KF0018               0 22-JAN-95          1 PAY 13000      FRF   294,239
KF0018               1 01-JAN-94          0 ASG 12000      GBP    22,440

Although this construction uses an additional consistent read, it has the advantage of not using either an inline view or a window function and does not sort the data.
Plan hash value: 1550496807
-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |      1 |        |      2 |00:00:00.01 |      15 |
|   1 |  SORT GROUP BY NOSORT        |        |      1 |      1 |      2 |00:00:00.01 |      15 |
|   2 |   TABLE ACCESS BY INDEX ROWID| PS_JOB |      1 |      1 |     12 |00:00:00.01 |      15 |
|*  3 |    INDEX RANGE SCAN          | PS_JOB |      1 |      1 |     12 |00:00:00.01 |       3 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("J"."EMPLID"='KF0018' AND "J"."SYS_NC00164$">=HEXTORAW('883CFDF3FEF8FEFAFF'
              ) )
       filter(SYS_OP_UNDESCEND("J"."SYS_NC00164$")<=TO_DATE(' 1995-02-12 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))

I think this construction could be useful in PeopleSoft.  At first glance the SQL appears more complicated, but it in this example it removed two correlated sub-queries. 

Using Analytic Functions in PS/Query

Of course you can code it anywhere where you can simply enter SQL as text.  However, it also has the advantage over the other analytic function construction that it can be coded in the PS/Query tool.  The analytic functions in the select caluse should be created in PS/Query expressions with the aggregate expression checkbox ticked.
Analytic 'Keep' function in PS/Query Aggregate Expression
Analytic Function in Aggregated Expression in Windows Client version of PS/Query 
The analytic functions can be selected in the PS/Query, and their lengths and titles can be tidied up.
Analytic PS/Query
PS/Query with Analytic 'Keep' Functions

This is the resulting SQL which is the same as before (with row level security added by PS/Query) and produces the same results.
SELECT A.EMPLID, A.EMPL_RCD, MAX(effdt) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq)
, MAX(effseq) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq), MAX(action) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq), MAX(deptid) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq), MAX(currency_cd) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq), MAX(annual_rt) KEEP (DENSE_RANK LAST ORDER BY effdt, effseq)					
  FROM PS_JOB A, PS_EMPLMT_SRCH_QRY A1
  WHERE ( A.EMPLID = A1.EMPLID
    AND A.EMPL_RCD = A1.EMPL_RCD
    AND A1.OPRID = 'PS'
    AND ( A.EFFDT <= TO_DATE('1995-02-11','YYYY-MM-DD')
     AND A.EMPLID = 'KF0018' ) )
  GROUP BY  A.EMPLID,  A.EMPL_RCD

Thursday, January 30, 2014

Implementing Deferred Segment Creation After an Upgrade

I have written previously about Deferred Segment Creation. Each empty table and its indexes use only 64Kb (assuming an 8Kb blocksize and locally managed tablespaces), but in a PeopleSoft there can be be tens of thousands of such tables and that adds up to a saving worth making.

If you are upgrading your database to 11gR2 , you might want to make sure that you are using it.  Deferred segment creation was introduced in Oracle 11.2.0.2 and it became the default in 11.2.0.3.  However, any table created in a previous version will have a physical segment.

This problem could affect any system, but it also manifests itself in PeopleSoft in a particular way.

When you run the alter scripts in PeopleTools a table may be recreated.  If it is a regular table (record type 0) then the CREATE TABLE command will not specify a segment creation clause and so the segment creation will be deferred until rows are inserted.

CREATE TABLE PS_TL_IPT1 (PROCESS_INSTANCE DECIMAL(10) NOT NULL,
   EMPLID VARCHAR2(11) NOT NULL,
...
   INITIAL_SEQ_NBR DECIMAL(15) NOT NULL) TABLESPACE TLWORK STORAGE
 (INITIAL 40000 NEXT 100000 MAXEXTENTS UNLIMITED PCTINCREASE 0)
 PCTFREE 10 PCTUSED 80
/
However, from PeopleTools 8.51, Application Designer uses the Oracle delivered DBMS_METADATA package to extract the DDL to recreate the object from the actual object.  However, this behaviour only occurs for Temporary working storage tables (record type 7).  Yes, these are exactly the tables that would benefit most from defered segment creation because in many systems there are many unused temporary table instances.  If table was created under a version of the database prior to 11.2.0.2 then the segment will exist and DBMS_METADATA will generate the DDL with the SEGMENT CREATION IMMEDIATE clause. 

-- Create temporary table 
CREATE TABLE PSYPERSON (EMPLID VARCHAR2(11) NOT NULL,
...
   LAST_CHILD_UPDDTM tIMESTAMP) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 80 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 40960 NEXT 106496 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "HRLARGE"
/
You can use DBMS_SPACE_ADMIN.DROP_EMPTY_SEGMENTS to remove the segments for any empty tables (and their indexes) for which the segment has been previously created.  There would be no harm in simply running this program for every table in the system.  If there are rows then DBMS_SPACE_ADMIN will take no action.

The following script identifies candidate tables where the statistics suggest that there are no rowa or where there are no statistics.  I am indebted to Tim Hall for the idea for this tip.

set serveroutput on 
BEGIN
 FOR i IN (
  SELECT owner, table_name
  FROM   all_tables
  WHERE  owner = 'SYSADM'
  AND    segment_created = 'YES'
  AND    temporary = 'N'
  AND   (num_rows = 0 OR num_rows IS NULL)
 ) LOOP
  dbms_output.put_line(i.owner||'.'||i.table_name);
  dbms_space_admin.drop_empty_segments (
    schema_name    => i.owner,
    table_name     => i.table_name);
 END LOOP;
END;
/
As this package drops the empty segments, the SEGMENT_CREATED column on USER_TABLES changes to NO and if you were to extract the DDL with DBMS_METADATA the SEGMENT CREATION clause would have changed to DEFERRED.

As soon as any data is inserted, the segment is created, SEGMENT_CREATED changes to YES and the DDL generated by DBMS_METADATA would have SEGMENT CREATION IMMEDIATE.

The result is that 64Kb of space (assuming a block size of 8Kb) will be freed up for each empty table and index segment that is dropped. Your mileage may vary, but in my demo HR database that is over 20000 tables and 25000 indexes. 2.7Gb isn't a vast amount these days, but it is an easy win.

Added 1.2.2014:
To answer Noons' question below.  So long as the table or partition doesn't have any rows, the segment will be dropped it will as if the segment creation had been deferred.  You don't have to do anything special to the table.  There is no problem applying this to any empty tables create with their segments.  Here is a simple test with my results on 11.2.0.3:

I will create a table and I have explicitly created the segment immediately, then I insert a row, commit the insert and delete the row.  I haven't even bothered to commit the delete.
SQL> create table t(a number) segment creation immediate;
SQL> insert into t values(42);
SQL> commit;
SQL> delete from t;
SQL> select segment_type, segment_name, tablespace_name from user_Segments where segment_name = 'T';

SEGMENT_TYPE       SEGMENT_NAME    TABLESPACE_NAME
------------------ --------------- ------------------------------
TABLE              T               PSDEFAULT

SQL> select table_name, segment_created from user_tables where table_name = 'T';

TABLE_NAME                     SEG
------------------------------ ---
T                              YES

SQL> select dbms_metadata.get_ddl('TABLE','T') from dual;

DBMS_METADATA.GET_DDL('TABLE','T')
--------------------------------------------------------------------------------
  CREATE TABLE "SYSADM"."T"
   ( "A" NUMBER
   ) SEGMENT CREATION IMMEDIATE


So, at the moment the segment exists, it has had rows in it, but they have been deleted and the table is empty. If I run DBMS_SPACE_ADMIN.DROP_EMPTY_SEGMENTS the segment is dropped.

SQL> execute dbms_space_admin.drop_empty_segments (user,'T');
SQL> select segment_type, segment_name, tablespace_name from user_Segments where segment_name = 'T';

no rows selected

SQL> select table_name, segment_created from user_tables where table_name = 'T';

TABLE_NAME                     SEG
------------------------------ ---
T                              NO

SQL> select dbms_metadata.get_ddl('TABLE','T') from dual;

DBMS_METADATA.GET_DDL('TABLE','T')
--------------------------------------------------------------------------------
  CREATE TABLE "SYSADM"."T"
   ( "A" NUMBER
   ) SEGMENT CREATION DEFERRED

Friday, January 03, 2014

Date to Timestamp Conversion during PeopleTools Upgrade

This blog posting describes a script to convert Oracle date columns to Timestamps as used from PeopleTools 8.50 but only rebilding those indexes that reference those columns, rather than drop and recreate every index in the system, thus producing a significant saving of time during the upgrade.

(A longer version of this arcticle is available on my website)

I am working on a PeopleSoft upgrade project.  We are going from PeopleTools 8.49 to 8.53.  One of the things that happens is that some date columns in the Oracle database become timestamps.

Timestamps were introduced by Oracle in version 10g of the database, and provide the ability to store times accurate to the nanosecond (although the default is microsecond). Dates are accurate to the whole second.

There are 3 types of temporal column in PeopleSoft as defined on PSDBFIELD.  Prior to PeopleTools 8.50 they all become Oracle data columns in the database.  However, from PeopleTools 8.50; Time and DateTime fields are built as TimeStamp columns.if bit 5 of PSSTATUS.DATABASE_OPTIONS (value 32) is set.

PeopleTools Field Type
Database Column Type
PeopleTools <= 8.49 PeopleTools >= 8.50
4 Date DATE DATE
5 Time DATE TIMESTAMP
6 DateTime DATE TIMESTAMP

Timestamps must be handled differently to dates in SQL.  Some date arithmetic must be done differently, in particular the difference between two timestamps is a timestamp rather than a number of days.  Therefore this setting also controls how PeopleCode date macros expand on Oracle.

During the upgrade, PeopleSoft Change Assistant simply alters all the Time and DateTime columns from dates to timestamps.  This generally works well.  The data value doesn't appear to get longer, so the block doesn't run out of room leading to row migratition, and so it isn't necessary to rebuild every table that is affected.

However, there are some limitations. If the column being converted to a timestamp falls into one of the following categories you will get an error.
  • The column is a key column in a function-based index.
  • The table or index is partitioned by the column.
The functional key index issue has not affected many customers because the upgrade template drops all the indexes before altering the tables and rebuilding them again.

However, dropping and recreating all these indexes can be very time consuming and increases the duration of the outage required to perform the upgrade.  This has been my incentive to find a better way.

Function-Based Indexes
PeopleSoft Application Desginer defines some key and search fields as descending.  The rows in components and the results of search dialogue are sorted on the key fields in the order specified.  Application Designer then indexes these columns in descending order (prior to PeopleTools 8 and since PeopleTools 8.47).  If any column in an Oracle index is in decending order the index is created as a function-based index.  Consequently, there can be a lot of descending indexes in a PeopleSoft system!  HR systems are particularly affected because many tables are effective-dated, and the field EFFDT is usually a descending key field.

It is not possible to alter a column to a timestamp if it appears anywhere in a function-based index. You get the following error:
ORA-30556: functional index is defined on the column to be modified 

Partitioning
Partitioning is not something that you usually encounter in a vanilla PeopleSoft system, but it can be added by customisation.  You generate the necessary DDL yourself if you want to use it. However, from PeopleTools 8.51 Application Designer will preserve existing partitioning.

In the system on which I am working, when partitioned the audit tables by AUDIT_STAMP which is a DateTime field.
ORA-14060: data type or length of an table partitioning column may not be changed 
We have had no alternative but to rebuild these tables and repopulate the data. This has also dealt with all locally partitioned indexes.

We have also found that we have one global index partitioned on a timestamp.
ORA-14061: data type or length of an index partitioning column may not be changed 
We also have had to drop this index in order to alter the table.

My Approach
We have had no alternative but to rebuild and repopulate our partitioned audit tables which are partitioned by a DateTime field.  However, that is what we did when we first partitioned them.  The scripts are very similar to those generated by Application Designer.  The table is renamed, a new one is built, and the data is copied.  In our case these scripts are built with a PL/SQL utility.  This also addressed the need to rebuild the locally partitioned indexes..

To minimize the number of indexes which must be rebuilt I have written a PL/SQL script (http://www.go-faster.co.uk/scripts/gfc_desc_timestamp_index.sql) that:
  • identifies the indexes that need to be dropped.
  • captures the DDL to recreate the indexes using DBMS_METADATA and stores it in a table,
  • drops the indexes,
  • alters the columns that cannot be altered with the index in place,
  • recreates the index.
The script successfully handles partitioned function-based indexes.

On this particular HR system we only rebuilt about 400 indexes instead of over 10000. Now the standard PeopleSoft upgrade template can be run without dropping or recreating any further indexes.