Friday, June 27, 2025

Optimising Journal Line Queries: 4. Compression

This is the fourth of five articles that examine the challenges posed by typical queries on the journal line table (PS_JRNL_LN).

  1. Problem Statement
  2. Exadata System Statistics
  3. Partitioning
  4. Compression
  5. Conclusion
If it had been possible to referentially partition PS_JRNL_LN by FISCAL_YEAR, then the sample query shown in earlier posts would have been able to prune partitions by fiscal year.  This would have significantly reduced the cost of the full scan.  
Instead, if the sample query has to full scan PS_JRNL_LN, it has to scan all the partitions in PS_JRNL_LN and then filter them against the keys retrieved from PS_JRNL_HEADER.  The only way to reduce the cost of the full scan is to reduce the number of blocks being scanned by purge and compression.  On Exadata, Hybrid Columnar Compression (HCC) will achieve a much higher compression ratio, and unwanted compression units will be eliminated efficiently.
To meet reporting requirements, Financials systems inevitably need to keep journal data going back several years.  Nonetheless, purging unneeded historical journals as aggressively as permissible is the first step in improving or at least preserving query performance.

Compression

Compressing the historical journal line partitions will further reduce the number of blocks in the segments and reduce the cost of the full scan in the query, thus making the optimiser more likely to switch away from the nested loop join to the full scan/Bloom filter/hash join.
In Oracle, basic compression and Hybrid Columnar Compression (HCC) are well suited to data warehouse applications, but they also have application in OLTP systems.  Compression occurs during direct-path insert and segment reorganisation operations and does not apply to anything modified by PeopleSoft processes that use conventional DML.  
Basic compression is available on any Oracle database; essentially, it avoids storing the same data values many times in the same data block.  Hybrid Columnar Compression (HCC) is available on Exadata. Different HCC compression levels use different compression algorithms.  DML on compressed data will decompress it.
I do not recommend using Advanced Compression to compress current periods due to the impact on day-to-day processing.

Which Partitions are Static?

DBA_TAB_MODIFICATIONS shows the number of DML operations in each partition since statistics were last collected.
SELECT p.partition_position, m.*, p.high_value
FROM dba_tab_modifications m
  INNER JOIN dba_tab_partitions p 
    ON p.table_owner = m.table_owner AND p.table_name = m.table_name AND p.partition_name = m.partition_name
WHERE m.table_owner = 'SYSADM' AND m.table_name = 'PS_JRNL_LN'
ORDER BY 1
/
This report was generated in December 2024.  Most of the updates are in the current and previous monthly periods.  There are almost no updates that are older than a year.
Part Table                                                                                              Drop                                                                                 
Pos# Owner    TABLE_NAME   PARTITION_NAME       S   INSERTS   UPDATES   DELETES TIMESTAMP           TRU Segs HIGH_VALUE                                                                      
---- -------- ------------ -------------------- - --------- --------- --------- ------------------- --- ---- --------------------------------------------------------------------------------
…
  40 SYSADM   PS_JRNL_LN   JRNL_LN_202212                 0         0         0 13/10/2024 10:08:56 NO     0 TO_DATE(' 2023-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  41 SYSADM   PS_JRNL_LN   JRNL_LN_202301                 0         0         0 29/09/2024 10:01:16 NO     0 TO_DATE(' 2023-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  42 SYSADM   PS_JRNL_LN   JRNL_LN_202302                 0         0         0 29/09/2024 10:01:16 NO     0 TO_DATE(' 2023-03-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  43 SYSADM   PS_JRNL_LN   JRNL_LN_202303                 0         0         0 29/09/2024 10:01:16 NO     0 TO_DATE(' 2023-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  44 SYSADM   PS_JRNL_LN   JRNL_LN_202304                 0         0         0 29/09/2024 10:01:16 NO     0 TO_DATE(' 2023-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  45 SYSADM   PS_JRNL_LN   JRNL_LN_202305                 0         0         0 29/09/2024 10:01:16 NO     0 TO_DATE(' 2023-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  46 SYSADM   PS_JRNL_LN   JRNL_LN_202306                 0         0         0 29/09/2024 10:01:16 NO     0 TO_DATE(' 2023-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  47 SYSADM   PS_JRNL_LN   JRNL_LN_202307                 0         0         0 29/09/2024 10:01:16 NO     0 TO_DATE(' 2023-08-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  48 SYSADM   PS_JRNL_LN   JRNL_LN_202308                 0         0         0 29/09/2024 10:01:16 NO     0 TO_DATE(' 2023-09-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  49 SYSADM   PS_JRNL_LN   JRNL_LN_202309                 0         0         0 13/10/2024 10:08:56 NO     0 TO_DATE(' 2023-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  50 SYSADM   PS_JRNL_LN   JRNL_LN_202310                 0         0         0 27/10/2024 10:59:45 NO     0 TO_DATE(' 2023-11-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  51 SYSADM   PS_JRNL_LN   JRNL_LN_202311                 0         0         0 29/09/2024 10:01:16 NO     0 TO_DATE(' 2023-12-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  52 SYSADM   PS_JRNL_LN   JRNL_LN_202312                34       193        34 10/12/2024 14:21:38 NO     0 TO_DATE(' 2024-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA

  53 SYSADM   PS_JRNL_LN   JRNL_LN_202401             42374    127736       210 12/12/2024 05:27:31 NO     0 TO_DATE(' 2024-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  54 SYSADM   PS_JRNL_LN   JRNL_LN_202402             34803     92215         0 12/12/2024 05:26:30 NO     0 TO_DATE(' 2024-03-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  55 SYSADM   PS_JRNL_LN   JRNL_LN_202403             54940    166263         0 12/12/2024 05:12:29 NO     0 TO_DATE(' 2024-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  56 SYSADM   PS_JRNL_LN   JRNL_LN_202404              5900     13730         0 13/12/2024 05:29:32 NO     0 TO_DATE(' 2024-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  57 SYSADM   PS_JRNL_LN   JRNL_LN_202405              6151     13869         0 13/12/2024 05:31:06 NO     0 TO_DATE(' 2024-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  58 SYSADM   PS_JRNL_LN   JRNL_LN_202406             18317     58263         0 13/12/2024 16:15:49 NO     0 TO_DATE(' 2024-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  59 SYSADM   PS_JRNL_LN   JRNL_LN_202407           5067792  14937405         0 13/12/2024 16:02:36 NO     0 TO_DATE(' 2024-08-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  60 SYSADM   PS_JRNL_LN   JRNL_LN_202408           5217744  15378822         0 13/12/2024 18:02:57 NO     0 TO_DATE(' 2024-09-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  61 SYSADM   PS_JRNL_LN   JRNL_LN_202409             65389    243360       160 13/12/2024 12:45:25 NO     0 TO_DATE(' 2024-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  62 SYSADM   PS_JRNL_LN   JRNL_LN_202410             44839    152210         0 13/12/2024 00:28:54 NO     0 TO_DATE(' 2024-11-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  63 SYSADM   PS_JRNL_LN   JRNL_LN_202411          28279594  53637873  27478940 13/12/2024 18:18:00 NO     0 TO_DATE(' 2024-12-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  64 SYSADM   PS_JRNL_LN   JRNL_LN_202412          34761590  53485631  27484239 13/12/2024 19:16:11 NO     0 TO_DATE(' 2025-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA

  65 SYSADM   PS_JRNL_LN   JRNL_LN_202501            137138    473452         0 13/12/2024 19:18:09 NO     0 TO_DATE(' 2025-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORI1
  66 SYSADM   PS_JRNL_LN   JRNL_LN_202502                 0         0         0 10/11/2024 10:08:21 NO     0 TO_DATE(' 2025-03-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  67 SYSADM   PS_JRNL_LN   JRNL_LN_202503               466         0         0 13/12/2024 03:59:20 NO     0 TO_DATE(' 2025-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  68 SYSADM   PS_JRNL_LN   JRNL_LN_202504                 0         0         0 17/11/2024 10:03:01 NO     0 TO_DATE(' 2025-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
  69 SYSADM   PS_JRNL_LN   JRNL_LN_202505                 0         0         0 17/11/2024 10:03:01 NO     0 TO_DATE(' 2025-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
…
Therefore, compressing journal lines in the current or previous fiscal years is not viable, as subsequent updates would destroy the compression. However, I can look at explicitly compressing older partitions.  

Attribute Clustering

Attribute Clustering sorts or clusters the rows by certain columns.  Like compression, this is declarative and will only take effect during direct-path load or segment reorganisation (such as a compression operation).  It will not take effect during normal DML.  It can be defined at table level and will be implemented during compression.
I have defined attribute clustering on PS_JRNL_LN to sort the rows by the unique key columns of its parent record, PS_JRNL_HEADER.
ALTER TABLE ps_jrnl_ln ADD CLUSTERING BY LINEAR ORDER (business_unit, journal_id, journal_date, unpost_Seq);
ALTER TABLE ps_jrnl_ln MODIFY CLUSTERING YES ON LOAD YES ON DATA MOVEMENT;
I have found it produces a small (approximately 2-5% of the original size) improvement in the resulting compression, further reducing the cost of the full table scan.  So, it is a marginal gain for no additional cost.

Hybrid Columnar Compression (HCC)

I have chosen to compress partitions older than one complete fiscal year with QUERY LOW, and those than 3 complete fiscal years with QUERY HIGH.
Each historical partition can be compressed using an online compress operation.  This can be done in parallel.  It is important to specify that it maintains the indexes so that they remain valid and usable after the operation.   This approach will not lock PS_JRNL_LN, and so the application can continue to run during this process.
…
ALTER TABLE ps_jrnl_ln MOVE PARTITION jrnl_ln_202012 COMPRESS FOR QUERY HIGH UPDATE INDEXES ONLINE PARALLEL;
ALTER TABLE ps_jrnl_ln MOVE PARTITION jrnl_ln_202101 COMPRESS FOR QUERY LOW UPDATE INDEXES ONLINE PARALLEL;
…
The statistics on the newly compressed partitions should be updated, as well as the global statistics.  

Statistics

Optimizer statistics have to be regathered on the freshly compressed partitions; otherwise, the reduction in the cost of the full scan will not occur.  
On some systems, statistics collection on the very largest tables, including PS_JRNL_LN, may not be completed within the regular maintenance window.  Then, Oracle may repeatedly attempt to collect statistics on these segments, and other statistics can become stale too.  Instead, I usually lock the statistics on such tables (to remove them from the maintenance window job) and create a specific recurring job to collect statistics at a convenient time (e.g. at the weekend).
An interim option is to simply manually update the number of blocks in the partition statistics to the number of blocks in the compressed segment (if it is lower), and recalculate the total number of blocks in the whole table for the global statistics.
set serveroutput on timi on
DECLARE 
  l_table_name VARCHAR2(18) := 'PS_JRNL_LN';
  l_part_update BOOLEAN := FALSE;
BEGIN
  FOR i IN(
    select t.table_name, t.partition_name, t.num_rows, t.blocks stat_blocks, s.blocks seg_blocks
    ,      s.tablespace_name, p.compress_for, t.num_rows/NULLIF(LEAST(t.blocks,s.blocks),0) rpb
    from user_segments s
      inner join user_tab_partitions p ON p.table_name = s.segment_name AND p.partition_name = s.partition_name
      inner join user_tab_statistics t ON s.segment_name = t.table_name AND s.partition_name = t.partition_name and t.blocks>s.blocks
    where s.segment_type = 'TABLE PARTITION' and p.compress_for IS NOT NULL and s.segment_name = l_table_name 
  ) LOOP
    l_part_update := TRUE;
    dbms_output.put_line(i.table_name||' ('||i.partition_name||') '||i.stat_blocks||' => '||i.seg_blocks||' blocks');
    dbms_stats.set_table_stats(ownname=>'SYSADM',tabname=>i.table_name,partname=>i.partition_name,numblks=>i.seg_blocks,force=>TRUE);
  END LOOP;
  IF l_part_update THEN
    FOR i IN (
      SELECT table_name, sum(blocks) blocks
      FROM user_tab_statistics
      WHERE table_name = l_table_name AND partition_name IS NOT NULL 
      GROUP BY table_name
    ) LOOP
      dbms_output.put_line(i.table_name||' = '||i.blocks||' blocks');
      dbms_stats.set_table_stats(ownname=>'SYSADM',tabname=>i.table_name,numblks=>i.blocks,force=>TRUE);
    END LOOP;
  ELSE
    dbms_output.put_line(l_table_name||' - no action required');
  END IF;
END;
/

Orphaned Index Entries & Space Recovery

One side effect of the table segment compression operation (or any other segment reorganisation operation) is that we get orphaned entries in any global indexes.  I could rebuild these indexes.
ALTER INDEX psdjrnl_ln REBUILD ONLINE TABLESPACE psindex PARALLEL;
Or, I could wait for the PMO_DEFERRED_GIDX_MAINT_JOB job to run (scheduled by default during the maintenance window) to clean out orphaned index entries from all currently impacted global indexes. Another alternative is to manually run the dbms_part.cleanup_gidx procedure that is in turn called by this job (see also Richard Foote: 12c Asynchronous Global Index Maintenance Part II)
However, another side effect is that global indexes can grow as they are maintained by the table partition compression operations.  These indexes can be coalesced, cleaned up, and then shrunk.  The shrink implies a coalesce.  ALTER INDEX … SHRINK SPACE COMPACT is equivalent to ALTER INDEX … COALESCE - see Jonathan Lewis's Oracle Scratchpad: Shrinking indexes).  However, the coalesce can be done in parallel, but the shrink cannot.  Therefore, I have chosen to do a parallel coalesce that includes a clean-up of orphaned entries, and then I shrink the segment.
ALTER INDEX psdjrnl_ln COALESCE CLEANUP PARALLEL;
ALTER INDEX psdjrnl_ln SHRINK SPACE;
I also rebuild corresponding partitions in locally partitioned indexes.
…
ALTER INDEX ps_jrnl_ln REBUILD PARTITION jrnl_ln_202012 ONLINE;
ALTER INDEX ps_jrnl_ln REBUILD PARTITION jrnl_ln_202101 ONLINE;
…
At the end of this process, space has been released back to the tablespace, but the free space will be distributed throughout the tablespace, so it probably won't be possible to release space back to the file system.
If you are compressing many partitions, perhaps because you are doing this for the first time, and they are in a dedicated tablespace, then I would suggest completely rebuilding the objects into new tablespaces.  Tablespaces can be renamed as required.

No comments :