Friday, November 10, 2023

Prioritising Scheduled Processes by Operator ID/Run Control

Batch processing is like opera (and baseball) - "It ain't over till the fat lady sings".  Users care about when it starts and when it finishes.  If the last process finishes earlier, then that is an improvement in performance. 

This note describes a method of additionally prioritising processes queued to run on the process scheduler in PeopleSoft by their requesting operator ID and run control.  Where processing consists of more instances of the same process than can run concurrently, it can be used to make the process scheduler run longer-running processes before shorter-running processes that were scheduled earlier, thus completing batch processing earlier.

In PeopleSoft, without customisation, it is only possible to prioritise processes queued to run on the process scheduler by assigning a priority to the process definition or their process category.  Higher priority processes are selected to be run in preference to lower priorities.  Otherwise, processes are run in the order of the time at which they are requested to run.

Problem Statement

During an overnight batch, many nVision report books are scheduled to run on the Windows process schedulers by one of several specific batch operator IDs.  Many more reports are scheduled than can run concurrently, so some execute while others queue.  Inevitably, the reports have widely varying execution times.  The maximum concurrency of the nVision report book (RPTBOOK) process definition has been set, and the Oracle database resource manager has also been configured, to prevent too many of these processes from overloading the database.

CPU Utilisation of BatchThis chart shows the database activity when the batch runs.  We often see what has come to be called the 'long tail' while we wait for just a few long-running processes to complete.


The next chart shows the processing time of each nVision report process.  The blue bars run from when the started to when it ended, ordered by start time.   The clear boxes below run from the time when it was requested to when it started, thus showing the period for which the processes were queued on a process scheduler, but were blocked because the maximum number of processes were already processing.
Process Map (without prioritisation)
All these processes run with the same priority because they are the same process definition.  Some long-running jobs execute earlier in the batch simply because they were scheduled earlier, but others that started later, run on beyond the end of the batch.  
It would be better if the longest-running processes were executed earlier, irrespective of the order in which they were requested.  Thus the shorter processes can run later as slots on the scheduler become free, and thus all processes should finished both closer together and earlier.
There is nothing delivered in the PeopleSoft process scheduler configuration that will let you assign different priorities to different executions of the same process.  In PeopleSoft, only three priorities are defined PRCSPRIORITY (1=Low, 5=Medium, 9=High) on the process definition and the server category.  These priorities are transferred to the process request queue (PSPRCSQUE.PRCSPRTY).  
If we could put our own priority into that column we could control the priority of the request.  

Solution

Introduce a database trigger that fires on insert into PSPRCSQUE and sets a priority specified on a new metadata table.  PRCSPRTY is not validated by PeopleSoft, and therefore any value can be specified.  
The files are available in a Github repository davidkurtz/psprcspty. The exact metadata will vary with the use case and requirements, but I provided some examples of how it might be generated.

Metadata Table: PS_XX_GFCPRCSPRTY

We need a table that will hold the priority for each combination of process type, process name, operation ID, and run control ID.  A corresponding record should be created using the Application Designer project in the GitHub repository.

create table sysadm.ps_xx_gfcprcsprty
(prcstype  VARCHAR2(30 CHAR) NOT NULL
,prcsname  VARCHAR2(12 CHAR) NOT NULL
,oprid     VARCHAR2(30 CHAR) NOT NULL
,runcntlid VARCHAR2(30 CHAR) NOT NULL
,prcsprty  NUMBER NOT NULL
--------------------optional columns
,avg_duration NUMBER NOT NULL
,med_duration NUMBER NOT NULL
,max_duration NUMBER NOT NULL
,cum_duration NUMBER NOT NULL
,tot_duration NUMBER NOT NULL
,num_samples  NUMBER NOT NULL
) tablespace ptapp;

create unique index sysadm.ps_xx_gfcprcsprty
on sysadm.ps_xx_gfcprcsprty(prcstype, prcsname, oprid, runcntlid) 
tablespace psindex compress 3;

Trigger Before Insert into PSPRCSQUE

As processes are scheduled in PeopleSoft, a row is inserted into the process scheduler queue table PSPRCSQUE.  A trigger will be created on this table that fires after the insert.  It will look for a matching row on the metadata table, PS_XX_GFCPRCSPRTY for the combination of process type, process name, operator ID, and run control ID.  If found, the trigger will assign the specified priority to the process request.  Otherwise, it will take no action.

CREATE OR REPLACE TRIGGER sysadm.psprcsque_set_prcsprty
BEFORE INSERT ON sysadm.psprcsque
FOR EACH ROW
WHEN (new.prcsname = 'RPTBOOK')
DECLARE
  l_prcsprty NUMBER;
BEGIN
  SELECT prcsprty
  INTO   l_prcsprty
  FROM   ps_xx_gfcprcsprty
  WHERE  prcstype = :new.prcstype
  AND    prcsname = :new.prcsname
  AND    oprid = :new.oprid
  AND    runcntlid = :new.runcntlid;
 
  :new.prcsprty := l_prcsprty;
EXCEPTION
  WHEN no_data_found THEN NULL;
  WHEN others THEN NULL;
END;
/
show errors

In this case, I am only assigning priorities to RPTBOOK processes, so I have added a when clause to the trigger so that it only fires for RPTBOOK process requests.  This can either be changed for other processes or removed entirely.

Priority Metadata

How the priorities should be defined will depend on the specific use case.  In some cases, you may choose to create a set of metadata that remains unchanged.

In this case, the objective is that the processes to take the longest to run should be executed first.  Therefore, I decided that the priority of each nVision report book process (by operator ID and run control ID) will be determined by the median elapsed execution time in the last two months.  The priorities are allocated such that the sum of the median execution times for each priority will be as even as possible.  

I have created a PL/SQL procedure GFCPRCSPRIORITY to truncate the metadata table and then repopulate it using a query on the process scheduler table (although, an Application Engine program could have been written to do this instead).  Some details of that query will vary with the exact use case. The procedure is executed daily, thus providing a feedback loop so if the run time varies over time, or new processes are added to the batch, it will be reflected in the priorities.

REM nvision_prioritisation_by_cumulative_runtime.sql

set serveroutput on
create or replace procedure sysadm.gfcprcspriority as
  PRAGMA AUTONOMOUS_TRANSACTION; --to prevent truncate in this procedure affecting calling session
  l_hist INTEGER := 61 ; --consider nVision processes going back this many days
begin
  EXECUTE IMMEDIATE 'truncate table ps_xx_gfcprcsprty';

--populate priorty table with known nVision processes 
insert into ps_xx_gfcprcsprty
with r as (
select r.prcstype, r.prcsname, r.prcsinstance, r.oprid, r.runcntlid, r.runstatus, r.servernamerun
, CAST(r.rqstdttm AS DATE) rqstdttm
, CAST(r.begindttm AS DATE) begindttm
, CAST(r.enddttm AS DATE) enddttm
from t, psprcsrqst r
  inner join ps.psdbowner p on r.dbname = p.dbname -- in test exclude any history copied from another database
where r.prcstype like 'nVision%' --limit to nVision processes
and r.prcsname like 'RPTBOOK' -- limit to report books
and r.enddttm>r.begindttm  --it must have run to completion
and r.oprid IN('NVISION','NVISION2','NVISION3','NVISION4')  --limit to overnight batch operator IDs
and r.begindttm >= TRUNC(SYSDATE)+.5-l_hist --consider process going back l_hist days from midday today
and r.runstatus = '9' --limit to successful processes
and r.begindttm BETWEEN ROUND(r.begindttm)-5/24 AND ROUND(r.begindttm)+5/24 --started between 7pm and 5am
), x as (
select r.*, CEIL((enddttm-begindttm)*1440) duration -–rounded up to the next minute
from r
), y as (
select prcstype, prcsname, oprid, runcntlid
, AVG(duration) avg_duration
, MEDIAN(CEIL(duration)) med_duration 
, MAX(duration) max_duration
, SUM(CEIL(duration)) sum_duration
, COUNT(*) num_samples
from x
group by prcstype, prcsname, oprid, runcntlid
), z as (
select y.* 
, sum(med_duration) over (order by med_duration rows between unbounded preceding and current row) cum_duration 
, sum(med_duration) over () tot_duration
from y
)
select prcstype, prcsname, oprid, runcntlid 
, avg_duration, med_duration, max_duration, cum_duration, tot_duration, num_samples
--, CEIL(LEAST(tot_duration,cum_duration)/tot_duration*3)*4-3 prcsprty  --3 priorities
, CEIL(LEAST(tot_duration,cum_duration)/tot_duration*9) prcsprty  --9 priorities
--, DENSE_RANK() OVER (order by med_duration) prcsprty --unlimited priorities
from z
order by prcsprty, cum_duration;

  dbms_output.put_line(sql%rowcount||' rows inserted');
  commit;

end gfcprcspriority;
/
show errors
In testing, I found that using just the 3 delivered levels of priority was not sufficiently granular to prioritise the jobs adequately, so I chose to use 9 levels (1 to 9).  The process priority on PRCSQUE is not validated, so I can use any value.  I also found I could just rank the processes from 1 to n by duration, and that would also work.

It is possible to create additional priority levels for process categories (see also More Process Priority Levels for the Process Scheduler), but that still only works for prioritising different processes over each other.

Metadata

This is the metadata produced on a test system by the above query.  It will vary depending on what has been run recently, and how it performed.  There are more, shorter processes in the lower priority groups, and fewer, longer processes in the higher priority groups.  

PRCSTYPE                       PRCSNAME     OPRID        RUNCNTLID                        PRCSPRTY
------------------------------ ------------ ------------ ------------------------------ ----------
…
nVision-ReportBook             RPTBOOK      NVISION2     NVS_RPTBK_XXX1                          1
nVision-ReportBook             RPTBOOK      NVISION2     NVS_RPTBK_XXX3                          1
nVision-ReportBook             RPTBOOK      NVISION2     NVS_RPTBK_LLLL8                         1
…
nVision-ReportBook             RPTBOOK      NVISION2     NVS_RPTBK_LLLL9                         2
nVision-ReportBook             RPTBOOK      NVISION      NVS_RPTBOOK_STAT8                       2
nVision-ReportBook             RPTBOOK      NVISION      NVS_RPTBOOK_STAT1                       2
…
nVision-ReportBook             RPTBOOK      NVISION2     NVS_RPTBK_TEMPXX                        6
nVision-ReportBook             RPTBOOK      NVISION      NVS_RPTBOOK_3                           6
nVision-ReportBook             RPTBOOK      NVISION      NVS_RPTBOOK_17                          6
nVision-ReportBook             RPTBOOK      NVISION      NVS_RPTBOOK_STAT4                       7
nVision-ReportBook             RPTBOOK      NVISION      NVS_RPTBOOK_28                          7
nVision-ReportBook             RPTBOOK      NVISION2     NVS_RPTBK_INCXXX                        8
nVision-ReportBook             RPTBOOK      NVISION2     NVS_RPTBK_MORYYY1                       8
nVision-ReportBook             RPTBOOK      NVISION      NVS_RPTBOOK_24                          9
nVision-ReportBook             RPTBOOK      NVISION      NVS_RPTBOOK_16                          9

A Test Script

This test script inserts some dummy rows into PSPRCSQUE to check whether a priority is assigned by the trigger. The insert is then rolled back.

INSERT INTO psprcsque (prcsinstance, prcstype, prcsname, oprid, runcntlid)
VALUES (-42, 'nVision-ReportBook', 'RPTBOOK', 'NVISION','NVS_RPTBOOK_17');
INSERT INTO psprcsque (prcsinstance, prcstype, prcsname, oprid, runcntlid)
VALUES (-43, 'nVision-ReportBook', 'RPTBOOK', 'NVISION','NVS_RPTBOOK_STAT1');
select prcsinstance, prcstype, prcsname, oprid, runcntlid, prcsprty from psprcsque where prcsinstance IN(-42,-43);
rollback;

You can see that it was successful because priorities 2 and 7 were assigned.

PRCSINSTANCE PRCSTYPE                       PRCSNAME     OPRID        RUNCNTLID                        PRCSPRTY
------------ ------------------------------ ------------ ------------ ------------------------------ ----------
         -43 nVision-ReportBook             RPTBOOK      NVISION      NVS_RPTBOOK_STAT1                       2
         -42 nVision-ReportBook             RPTBOOK      NVISION      NVS_RPTBOOK_17                          7

Monitoring Script

This query in script process_prioritisation_by_cumulative_runtime_report.sql reports on the average, median, and cumulative median execution time for each nVision process that ran to success during the overnight processing window as calculated by the package GFCPRCSPRIORITY and stored in PS_XX_GFCPRCSPRTY.  It also compares that to the priority and last actual run time for that process.

Example Output

                                                                                                      Cum.                                            
                                                                        Average   Median            Median    Total         Last Run   Actual                   
                                                                  Prcs Duration Duration Duration Duration Duration     Num  Process Duration Duration Duration Priorty
PRCSTYPE             PRCSNAME   OPRID        RUNCNTLID            Prty   (mins)   (mins)   (mins)   (mins)   (mins) Samples Priority   (mins)    Diff    % Diff    Diff
-------------------- ---------- ------------ -------------------- ---- -------- -------- -------- -------- -------- ------- -------- -------- -------- -------- -------
nVision-ReportBook   RPTBOOK    NVISION      NVS_RPTBOOK_4           9    90.65      131      209     1834     1997      23        6      189       58       44       3 
nVision-ReportBook   RPTBOOK    NVISION      NVS_RPTBOOK_16          9   159.17      163      209     1997     1997      23        9      177       14        9       0

nVision-ReportBook   RPTBOOK    NVISION      NVS_RPTBOOK_14          8    89.26      127      215     1703     1997      23        6      167       40       31       2
nVision-ReportBook   RPTBOOK    NVISION      NVS_RPTBOOK_24          8   115.87      117      165     1576     1997      23        9      144       27       23      -1

nVision-ReportBook   RPTBOOK    NVISION2     NVS_RPTBK_MORYYY1       7    93.13       85      165     1459     1997      23        8      158       73       86      -1
nVision-ReportBook   RPTBOOK    NVISION      NVS_RPTBOOK_28          7    88.30       80      172     1374     1997      23        8      108       28       35      -1

nVision-ReportBook   RPTBOOK    NVISION2     NVS_RPTBK_INCXXX        6    83.61       79      149     1294     1997      18        7      118       39       49      -1
nVision-ReportBook   RPTBOOK    NVISION      NVS_RPTBOOK_17          6    70.96       69      105     1143     1997      23        7       81       12       17      -1
nVision-ReportBook   RPTBOOK    NVISION      NVS_RPTBOOK_STAT4       6    68.00       72       81     1215     1997       8        7       81        9       13      -1

nVision-ReportBook   RPTBOOK    NVISION2     NVS_RPTBK_MMMMMM        5    52.45       46      119      914     1997      22        5       91       46      100       0
nVision-ReportBook   RPTBOOK    NVISION2     NVS_RPTBK_TEMPXX        5    50.48       49      104      963     1997      23        5       94       45       92       0
nVision-ReportBook   RPTBOOK    NVISION      NVS_RPTBOOK_3           5    55.52       55       99     1018     1997      23        6       79       24       44      -1
nVision-ReportBook   RPTBOOK    NVISION      NVS_RPTBOOK_1           5    47.70       56      137     1074     1997      23        4       56        0        0       1
…

Monitoring Query

The query in prcsmap.sql is used to produce the data for a map of the processes, showing request time, time spent queuing, and time spent executing.  It is the basis of the second chart above. I normally run this in SQL Developer and export the data as an Excel workbook.  There is an example spreadsheet in the Github repository.

9 levels of Prioritisation

With prioritisation, we can see that the long-running jobs with higher priority ran earlier.  

We can also see that some of the higher-priority jobs that are scheduled later are running earlier than those scheduled earlier, and are thus finishing earlier.

There is no longer any tail of processing.  Instead, load drops quickly at the end of the batch, and the batch as a whole finishes earlier.