Monday, June 30, 2025

Optimising Journal Line Queries: 5. Conclusion

This is the last of five articles that examine the challenges posed by typical queries on the journal line table (PS_JRNL_LN).

  1. Problem Statement
  2. Exadata System Statistics
  3. Partitioning
  4. Compression
  5. Conclusion

After introducing Exadata system statistics, partitioning, and compression and also archiving some historical data, we arrived at the point where the execution plan of the statement changes without needing to use hints.

Original Execution Plan (With Hints)

I have used hints to force the original execution plan.  

ALTER SESSION SET STATISTICS_LEVEL=ALL;
SELECT /*+LEADING(A) USE_NL(B) 
          INDEX(B (PS_JRNL_LN.BUSINESS_UNIT PS_JRNL_LN.JOURNAL_ID PS_JRNL_LN.JOURNAL_DATE PS_JRNL_LN.UNPOST_SEQ PS_JRNL_LN.JOURNAL_LINE PS_JRNL_LN.LEDGER))*/
     A.FISCAL_YEAR, A.ACCOUNTING_PERIOD, B.BUSINESS_UNIT, B.JOURNAL_ID, TO_CHAR(B.JOURNAL_DATE,'YYYY-MM-DD'), B.LEDGER, B.ACCOUNT
, B.PRODUCT, B.PROJECT_ID, B.CHARTFIELD1, B.CHARTFIELD2, B.CURRENCY_CD, B.AFFILIATE, SUM( B.MONETARY_AMOUNT), B.FOREIGN_CURRENCY
, SUM( B.FOREIGN_AMOUNT), A.REVERSAL_CD, A.REVERSAL_ADJ_PER, A.JRNL_HDR_STATUS, TO_CHAR(A.POSTED_DATE,'YYYY-MM-DD'), A.OPRID, A.DESCR254
, B.DEPTID, A.SOURCE, B.ALTACCT, TO_CHAR(CAST((A.DTTM_STAMP_SEC) AS TIMESTAMP),'YYYY-MM-DD-HH24.MI.SS.FF'), B.LINE_DESCR 
FROM PS_JRNL_HEADER A, PS_JRNL_LN B 
WHERE (A.BUSINESS_UNIT = B.BUSINESS_UNIT 
AND A.JOURNAL_ID = B.JOURNAL_ID 
AND A.JOURNAL_DATE = B.JOURNAL_DATE 
AND A.UNPOST_SEQ = B.UNPOST_SEQ 
AND A.JRNL_HDR_STATUS IN('P','V','U') 
AND A.FISCAL_YEAR IN (2024) 
AND A.ACCOUNTING_PERIOD BETWEEN 1 AND 12
AND B.CHARTFIELD1 IN ('1234567','1234568','1234569')
AND B.LEDGER IN ('LEDGER')) 
GROUP BY A.FISCAL_YEAR, A.ACCOUNTING_PERIOD, B.BUSINESS_UNIT, B.JOURNAL_ID, B.JOURNAL_DATE, B.LEDGER, B.ACCOUNT, B.PRODUCT, B.PROJECT_ID, B.CHARTFIELD1, 
B.CHARTFIELD2, B.CURRENCY_CD, B.AFFILIATE, B.FOREIGN_CURRENCY, A.REVERSAL_CD, A.REVERSAL_ADJ_PER, A.JRNL_HDR_STATUS, A.POSTED_DATE, A.OPRID, A.DESCR254, 
B.DEPTID, A.SOURCE, B.ALTACCT, A.DTTM_STAMP_SEC, B.LINE_DESCR
/
select * from table(dbms_xplan.display_cursor(null,null,'ADVANCED +ADAPTIVE'));

Rather than get the execution plan from EXPLAIN PLAN, I have executed the query with STATISTICS_LEVEL set to ALL, and then displayed the cursor.

The execution plan starts with the PS_JRNL_HEADER table and uses a nested loop join with a lookup of the unique index on PS_JRNL_LN.  Although note that the optimizer costs were produced with Exadata system statistics.

Plan hash value: 4030641493

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name           | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time   | Pstart| Pstop | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                |      1 |        |       |       |  1740K(100)|          |       |       |    209K|00:06:41.85 |     238M|     15M|
|   1 |  HASH GROUP BY                      |                |      1 |    498K|   108M|   129M|  1740K  (1)| 00:01:08 |       |       |    209K|00:06:41.85 |     238M|     15M|
|   2 |   NESTED LOOPS                      |                |      1 |    498K|   108M|       |  1722K  (1)| 00:01:08 |       |       |    495K|01:03:03.80 |     238M|     15M|
|   3 |    NESTED LOOPS                     |                |      1 |    498K|   108M|       |  1722K  (1)| 00:01:08 |       |       |    459M|00:11:20.66 |    5549K|   4259K|
|*  4 |     TABLE ACCESS STORAGE FULL       | PS_JRNL_HEADER |      1 |    430K|    41M|       |  1135   (8)| 00:00:01 |       |       |    430K|00:00:00.34 |   88642 |  88637 |
|   5 |     PARTITION RANGE ITERATOR        |                |    430K|      1 |       |       |     3   (0)| 00:00:01 |   KEY |   KEY |    459M|00:10:38.60 |    5460K|   4170K|
|*  6 |      INDEX RANGE SCAN               | PS_JRNL_LN     |    430K|      1 |       |       |     3   (0)| 00:00:01 |   KEY |   KEY |    459M|00:09:55.80 |    5460K|   4170K|
|*  7 |    TABLE ACCESS BY LOCAL INDEX ROWID| PS_JRNL_LN     |    459M|      1 |   127 |       |     4   (0)| 00:00:01 |     1 |     1 |    495K|00:50:25.33 |     233M|     11M|
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
…
Predicate Information (identified by operation id):
---------------------------------------------------

   4 - storage(("A"."FISCAL_YEAR"=2024 AND INTERNAL_FUNCTION("A"."JRNL_HDR_STATUS") AND "A"."ACCOUNTING_PERIOD"<=12 AND "A"."ACCOUNTING_PERIOD">=1))
       filter(("A"."FISCAL_YEAR"=2024 AND INTERNAL_FUNCTION("A"."JRNL_HDR_STATUS") AND "A"."ACCOUNTING_PERIOD"<=12 AND "A"."ACCOUNTING_PERIOD">=1))
   6 - access("A"."BUSINESS_UNIT"="B"."BUSINESS_UNIT" AND "A"."JOURNAL_ID"="B"."JOURNAL_ID" AND "A"."JOURNAL_DATE"="B"."JOURNAL_DATE" AND
              "A"."UNPOST_SEQ"="B"."UNPOST_SEQ" AND "B"."LEDGER"='LEDGER')
       filter("B"."LEDGER"='LEDGER')
   7 - filter(("B"."CHARTFIELD1"='1234567' OR "B"."CHARTFIELD1"='1234568' OR "B"."CHARTFIELD1"='1234569'))

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 3
---------------------------------------------------------------------------

1 -  SEL$1
	           -  LEADING(A)

   6 -  SEL$1 / B@SEL$1
           -  INDEX(B (PS_JRNL_LN.BUSINESS_UNIT PS_JRNL_LN.JOURNAL_ID PS_JRNL_LN.JOURNAL_DATE PS_JRNL_LN.UNPOST_SEQ PS_JRNL_LN.JOURNAL_LINE PS_JRNL_LN.LEDGER))
           -  USE_NL(B)

The cost of this execution plan depends mainly upon how many journal header rows are selected.  There is a cost of 3 per index lookup, plus another 1 for the table access, making a total of 4 per journal header row.  Here we selected 430K rows from PS_JRNL_HEADER, so 430K rows * 4/row = 1720K.  We got an actual cost of 1722K.  The discrepancy is because the 430K was rounded off by the representation of numbers in the execution plan.  Then the cost of the GROUP BY operation is 18K.  Hence, the overall cost is 1740K.

The actual execution time of the query was 63 minutes (78 minutes with time taken to fetch 211K rows across the network to the client).  

New Execution Plan (Without Hints)

Without the hints, the execution plan changes.  It still starts with a full scan of PS_JRNL_HEADER, but then full scans PS_JRNL_LN, filtering the rows by LEDGER and CHARTFIELD1 (operation at line 12) and Bloom filtering by the columns joined to PS_JRNL_HEADER.  PS_JRNL_LN is then hash joined against the rows from PS_JRNL_HEADER to remove false positives returned by the Bloom filter
Plan hash value: 1053505630

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   Id  | Operation                             | Name           | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time   | Pstart| Pstop | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|     0 | SELECT STATEMENT                      |                |      1 |        |       |       |   535K(100)|          |       |       |    209K|00:00:10.06 |      27M|     27M|
|     1 |  HASH GROUP BY                        |                |      1 |    498K|   108M|   129M|   535K (24)| 00:00:21 |       |       |    209K|00:00:10.06 |      27M|     27M|
|  *  2 |   HASH JOIN                           |                |      1 |    498K|   108M|    46M|   517K (24)| 00:00:21 |       |       |    495K|00:00:09.23 |      27M|     27M|
|     3 |    PART JOIN FILTER CREATE            | :BF0000        |      1 |    498K|   108M|       |   517K (24)| 00:00:21 |       |       |    430K|00:00:00.66 |   88642 |  88638 |
|-    4 |     NESTED LOOPS                      |                |      1 |    498K|   108M|    46M|   517K (24)| 00:00:21 |       |       |    430K|00:00:00.30 |   88642 |  88638 |
|-    5 |      NESTED LOOPS                     |                |      1 |        |       |       |            |          |       |       |    430K|00:00:00.24 |   88642 |  88638 |
|-    6 |       STATISTICS COLLECTOR            |                |      1 |        |       |       |            |          |       |       |    430K|00:00:00.19 |   88642 |  88638 |
|  *  7 |        TABLE ACCESS STORAGE FULL      | PS_JRNL_HEADER |      1 |    430K|    41M|       |  1135   (8)| 00:00:01 |       |       |    430K|00:00:00.13 |   88642 |  88638 |
|-    8 |       PARTITION RANGE ITERATOR        |                |      0 |        |       |       |            |          |   KEY |   KEY |      0 |00:00:00.01 |       0 |      0 |
|- *  9 |        INDEX RANGE SCAN               | PS_JRNL_LN     |      0 |        |       |       |            |          |   KEY |   KEY |      0 |00:00:00.01 |       0 |      0 |
|- * 10 |      TABLE ACCESS BY LOCAL INDEX ROWID| PS_JRNL_LN     |      0 |      1 |   127 |       |   515K (24)| 00:00:21 |     1 |     1 |      0 |00:00:00.01 |       0 |      0 |
|    11 |    PARTITION RANGE JOIN-FILTER        |                |      1 |    498K|    60M|       |   515K (24)| 00:00:21 |:BF0000|:BF0000|    815K|00:00:07.65 |      27M|     27M|
|  * 12 |     TABLE ACCESS STORAGE FULL         | PS_JRNL_LN     |     18 |    498K|    60M|       |   515K (24)| 00:00:21 |:BF0000|:BF0000|    815K|00:00:07.55 |      27M|     27M|
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "A"@"SEL$1")
      FULL(@"SEL$1" "B"@"SEL$1")
      LEADING(@"SEL$1" "A"@"SEL$1" "B"@"SEL$1")
      USE_HASH(@"SEL$1" "B"@"SEL$1")
      USE_HASH_AGGREGATION(@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("A"."JOURNAL_DATE"="B"."JOURNAL_DATE" AND "A"."BUSINESS_UNIT"="B"."BUSINESS_UNIT" AND "A"."JOURNAL_ID"="B"."JOURNAL_ID" AND "A"."UNPOST_SEQ"="B"."UNPOST_SEQ")
   7 - storage(("A"."FISCAL_YEAR"=2024 AND INTERNAL_FUNCTION("A"."JRNL_HDR_STATUS") AND "A"."ACCOUNTING_PERIOD"<=12 AND "A"."ACCOUNTING_PERIOD">=1))
       filter(("A"."FISCAL_YEAR"=2024 AND INTERNAL_FUNCTION("A"."JRNL_HDR_STATUS") AND "A"."ACCOUNTING_PERIOD"<=12 AND "A"."ACCOUNTING_PERIOD">=1))
   9 - access("A"."BUSINESS_UNIT"="B"."BUSINESS_UNIT" AND "A"."JOURNAL_ID"="B"."JOURNAL_ID" AND "A"."JOURNAL_DATE"="B"."JOURNAL_DATE" AND "A"."UNPOST_SEQ"="B"."UNPOST_SEQ"
              AND "B"."LEDGER"='LEDGER')
       filter("B"."LEDGER"='LEDGER')
  10 - filter(("B"."CHARTFIELD1"='1234567' OR "B"."CHARTFIELD1"='1234568' OR "B"."CHARTFIELD1"='1234569'))
  12 - storage(("B"."LEDGER"='LEDGER' AND INTERNAL_FUNCTION("B"."CHARTFIELD1")))
       filter(("B"."LEDGER"='LEDGER' AND INTERNAL_FUNCTION("B"."CHARTFIELD1")))
The actual execution time is just 21 seconds (or 13 minutes, including fetches).  So, this is much faster.  
The optimizer cost drops to just 535K.  This is the cheapest plan, and therefore Oracle uses it without further intervention.  515K of the cost comes from the full scan of PS_JRNL_LN, the Bloom filter takes it up by just 2K, and the GROUP BY operation by 18K.  It is an adaptive plan, so Oracle can still switch between the nested loop and the Bloom/hash join at run time on the basis of the statistics collected at run time (at line 6).
The full scan Bloom-Hash filter of PS_JRNL_LN is cheaper than the nested loop with 430K index probes.  In fact, the tipping point will be whenever the optimizer estimates that it gets at least 129K rows from PS_JRNL_HEADER (535K / 4/row =128.75K).

TL;DR

Through a combination of Exadata system stats, archiving, partitioning and compression, the cost of smart-full scanning the PS_JRNL_LN table, that contains several billion rows, Oracle has switched to a more efficient execution plan.  


No comments :