Friday, July 25, 2025

Determining Optimal Index Key Compression Length

 In 2008, Richard Foote wrote this still excellent 4-part guide to index key compression. 

He started with this comment, that I think is just as valid as it was then:

“Index compression is perhaps one of the most under used and neglected index options available. It has the potential to substantially reduce the overall size of non-Unique indexes and multi-column unique indexes, in some scenarios dramatically so... Not only will it potentially save storage, but if the resultant index contains fewer leaf blocks, that’s potentially fewer LIOs and from the Cost Based Optimizer’s point of view, potentially a cheaper execution plan option.”

Index key compression is a highly effective option for reducing index size and improving index performance.  

“Oracle stores each distinct combination of compressed column values found within a specific index leaf block in a ‘Prefix’ table within the leaf block and assigns each combination a unique prefix number.” If the prefix length (the number of leading columns to be compressed) is too great, then the prefix table will contain more entries, ultimately one for every row in the index. The compressed index could end up being larger than the uncompressed index!  If the prefix length is too small, then you might not get as much compression as you might with a longer prefix length.  

In other words, there is a sweet spot where you will achieve optimal compression.  That sweet spot can vary from no compression to compressing all the columns.  It will vary from index to index, from partition to partition, and potentially over time as the data in an index changes.

Test Every Option to Determine Optimal Compression

One way to determine optimal compression is through exhaustive testing.  Each index could be rebuilt at each possible compression prefix length, and the size of the index could be compared, and the performance of application processes could be tested.

The following PL/SQL script (available on GitHub) rebuilds each index on a named table at each possible compression length, collects statistics and stores them in a table.

REM index_compression_test.sql
create table gfc_index_compression_stats
(table_name varchar2(128)
,index_name varchar2(128)
,num_rows number
,last_analyzed date
,prefix_length number 
,blevel number 
,leaf_blocks number 
,avg_leaf_blocks_per_key number 
,avg_data_blocks_per_key number 
,clustering_factor number 
,constraint gfc_index_compression_stats_pk primary key (table_name, index_name, prefix_length)
);

DECLARE
  l_table_name VARCHAR2(128) := 'PSTREENODE';
  l_num_cols INTEGER;
  l_sql CLOB;
  e_invalid_compress_length EXCEPTION;
  PRAGMA EXCEPTION_INIT(e_invalid_compress_length,-25194); 
BEGIN
  FOR i IN (
    SELECT table_name, index_name, column_position prefix_length FROM user_ind_columns
    WHERE table_name = l_table_name
    UNION
    SELECT table_name, index_name, 0 FROM user_indexes
    WHERE table_name = l_table_name
    ORDER BY table_name, index_name, prefix_length DESC
  ) LOOP
   IF i.prefix_length > 0 THEN 
     l_sql := 'ALTER INDEX '||i.index_name||' REBUILD COMPRESS '||i.prefix_length;
   ELSE
     l_sql := 'ALTER INDEX '||i.index_name||' REBUILD NOCOMPRESS';
   END IF;

   BEGIN
     dbms_output.put_line(l_sql);
     EXECUTE IMMEDIATE l_sql;
     dbms_stats.gather_index_stats(user,i.index_name);
   
     MERGE INTO gfc_index_compression_stats u
     USING (SELECT * FROM user_indexes WHERE table_name = i.table_name And index_name = i.index_name) s
     ON (u.table_name = s.table_name AND u.index_name = s.index_name AND u.prefix_length = NVL(s.prefix_length,0))
     WHEN MATCHED THEN UPDATE SET u.num_rows = s.num_rows, u.last_analyzed = s.last_analyzed, u.blevel = s.blevel, u.leaf_blocks = s.leaf_blocks, u.avg_leaf_blocks_per_key = s.avg_leaf_blocks_per_key, u.avg_data_blocks_per_key = s.avg_data_blocks_per_key, u.clustering_factor = s.clustering_factor
     WHEN NOT MATCHED THEN INSERT (table_name, index_name, num_rows, last_analyzed, prefix_length, blevel, leaf_blocks, avg_leaf_blocks_per_key, avg_data_blocks_per_key, clustering_factor)
     VALUES (s.table_name, s.index_name, s.num_rows, s.last_analyzed, NVL(s.prefix_length,0), s.blevel, s.leaf_blocks, s.avg_leaf_blocks_per_key, s.avg_data_blocks_per_key, s.clustering_factor);
   EXCEPTION 
     WHEN e_invalid_compress_length THEN NULL;
   END;  
  END LOOP; 
END;
/
The following chart presents the data collected by the script above for the PSTREENODE table in PeopleSoft.  The number of leaf blocks is graphed against the compression prefix length. The left-hand end of each line shows the uncompressed size of the index. 
For most indexes, the size decreases as the compression prefix length increases until it reaches a minimum.  That is the optimal compression.  Beyond that point, where the prefix columns are (or are almost) unique, the compressed index is larger than the uncompressed index. In my example, only two indexes benefit from the entire key index being compressed.  For all the other indexes, the optimal compression is obtained when some, but not all, of the key columns are compressed.
There are 8 indexes on PSTREENODE, and in all, the script performs 53 index rebuilds.  On a small table such as this, it only takes a few minutes to work through this, but on a larger table, this could easily become time-consuming.

Let Oracle Calculate the Optimal Compression Prefix Length

The alternative is to let Oracle calculate the optimal compression prefix length.
Before Oracle introduced the DBMS_STATS package, we used the ANALYZE command to collect optimizer statistics.  The last remaining use of this command is to validate object structures.
ANALYZE INDEX … VALIDATE STRUCTURE CASCADE;
You can validate an object and all dependent objects (for example, indexes) by including the CASCADE option. The following statement validates the emp table and all associated indexes:
ANALYZE TABLE emp VALIDATE STRUCTURE CASCADE;

For an index, Oracle Database verifies the integrity of each data block in the index and checks for block corruption. This clause does not confirm that each row in the table has an index entry or that each index entry points to a row in the table. You can perform these operations by validating the structure of the table with the CASCADE clause.
Oracle Database also computes compression statistics (optimal prefix compression count) for all normal indexes.
Oracle Database stores statistics about the index in the data dictionary views INDEX_STATS and INDEX_HISTOGRAM.

The following script analyses each index.
INDEX_STATS displays only the results for the last ANALYZE command in the current session, so the script transfers them to a permanent table.  It does not reanalyse indexes for which there is already a result stored.
REM calc_opt_comp.sql
REM (c)Go-Faster Consultancy Ltd. 2014
REM see https://blog.psftdba.com/2016/02/implementing-index-compression-and.html
set serveroutput on autotrace off
clear columns
SPOOL calc_opt_comp

REM DROP TABLE sysadm.gfc_index_stats PURGE;

--create working storage table with same structure as INDEX_STATS
CREATE TABLE sysadm.gfc_index_stats 
AS SELECT * FROM index_stats
WHERE 1=2
/

ALTER TABLE sysadm.gfc_index_stats
MODIFY name NOT NULL
/

CREATE UNIQUE INDEX sysadm.gfc_index_stats
ON sysadm.gfc_index_stats (name, partition_name)
/

undefine table_name
DECLARE
 l_sql        VARCHAR2(100);
 l_owner      VARCHAR2(8) := 'SYSADM';
 l_table_name VARCHAR2(30) := '&&table_name';
BEGIN
 FOR i IN (
  SELECT i.index_name, ip.partition_name
  FROM   all_indexes i
  ,      all_ind_partitions ip
  WHERE  i.index_type like '%NORMAL'
  AND    i.table_owner = l_owner
  AND    i.partitioned = 'YES'
  AND    i.table_name = l_table_name
  AND    ip.index_owner = i.owner
  AND    ip.index_name  = i.index_name
  AND    ip.subpartition_count = 0
  AND    ip.segment_created = 'YES'
  UNION
  SELECT i.index_name, isp.subpartition_name
  FROM   all_indexes i
  ,      all_ind_subpartitions isp
  WHERE  i.index_type like '%NORMAL'
  AND    i.table_owner = l_owner
  AND    i.partitioned = 'YES'
  AND    i.table_name = l_table_name
  AND    isp.index_owner = i.owner
  AND    isp.index_name  = i.index_name
  AND    isp.segment_created = 'YES'
  UNION
  SELECT i.index_name, NULL
  FROM   all_indexes i
  WHERE  i.index_type like '%NORMAL'
  AND    i.table_owner = l_owner
  AND    i.table_name = l_table_name
  AND    i.partitioned = 'NO'
  AND    i.segment_created = 'YES'
  MINUS
  SELECT name, partition_name
  FROM   sysadm.gfc_index_stats
 ) LOOP
  IF i.partition_name IS NULL THEN
    l_sql := 'ANALYZE INDEX '||l_owner||'.'||i.index_name||' VALIDATE STRUCTURE';
  ELSE
    l_sql := 'ANALYZE INDEX '||l_owner||'.'||i.index_name||' PARTITION ('||i.partition_name||') VALIDATE STRUCTURE';
  END IF;

  dbms_output.put_line(l_sql);
  EXECUTE IMMEDIATE l_sql;

  DELETE FROM sysadm.gfc_index_stats g
  WHERE EXISTS(
	SELECT  'x'
	FROM	index_stats i
	WHERE 	i.name = g.name
	AND	(i.partition_name = g.partition_name OR (i.partition_name IS NULL AND g.partition_name IS NULL)));

  INSERT INTO sysadm.gfc_index_stats 
  SELECT i.* FROM index_stats i;
  COMMIT;
 END LOOP;
END;
/
…
The script produces reports of its analysis.  The summary report shows the optimal compression length for each index and lists the columns that are and are not compressed.  We can see that the result of the ANALYZE command agrees with the result of the previous test that rebuilt each index at each compression length and measured the size of the index.
                                                     Summary Report

                                      Opt Comp                         Weighted         Est.                            
                                        Prefix        Num               Average         Comp                            
Table Name         Index Name           Length FREQ Parts       Blocks Saving %       Blocks                            
------------------ ------------------ -------- ---- ----- ------------ -------- ------------                            
Compress Columns                                            Do Not Compress Columns                                     
----------------------------------------------------------- ----------------------------------------------------------- 
PSTREENODE         PSAPSTREENODE             4    1     0        2,048     41.0        1,208                            
SETID, TREE_NAME, EFFDT, TREE_BRANCH                        TREE_NODE, TREE_NODE_NUM, TREE_NODE_NUM_END, TREE_NODE_TYPE 
                                                                                                                        
                   PSBPSTREENODE             8    1     0        1,920     34.0        1,267                            
SETID, TREE_NAME, TREE_BRANCH, TREE_NODE_NUM, TREE_NODE, TR                                                             
EE_NODE_NUM_END, TREE_LEVEL_NUM, TREE_NODE_TYPE                                                                         
                                                                                                                        
                   PSDPSTREENODE             3    1     0        1,280     61.0          499                            
SETID, TREE_NAME, EFFDT                                     PARENT_NODE_NUM                                             
                                                                                                                        
                   PSFPSTREENODE             2    1     0        1,024     67.0          338                            
TREE_NAME, EFFDT                                                                                                        
                                                                                                                        
                   PSGPSTREENODE             2    1     0        2,304     35.0        1,498                            
PARENT_NODE_NAME, TREE_NAME                                 EFFDT, TREE_NODE, SETID                                     
                                                                                                                        
                   PSHPSTREENODE             2    1     0        2,048     24.0        1,556                            
TREE_NODE, TREE_NAME                                        EFFDT, SETID, SETCNTRLVALUE, TREE_NODE_NUM                  
                                                                                                                        
                   PSIPSTREENODE             3    1     0        1,152       .0        1,152                            
SETID, TREE_NAME, EFFDT                                     TREE_NODE, TREE_NODE_NUM, TREE_NODE_NUM_END                 
                                                                                                                        
                   PS_PSTREENODE             4    1     0        1,792     46.0          968                            
SETID, SETCNTRLVALUE, TREE_NAME, EFFDT                      TREE_NODE_NUM, TREE_NODE, TREE_BRANCH                       
                                                                                                                        
******************                                  ----- ------------          ------------                            
                                                                                                                        
                                                                                                                        
sum                                                     0       13,568                 8,486

Compression of Partitioned Indexes

If you partition an index, then the script validates the structure of each physical partition.  The detailed report shows the optimal compression for each partition.  You may find that Oracle determines that the optimal compression is different for different partitions.      
Only a single compression length can be specified for each index.  It is then applied to all the partitions, although compression can be disabled on specific partitions.  A judgement has to be made as to what is the best balance.  
                                                     Detail Report

                                                                     Opt Comp                             Est.          
                                                                       Prefix              Saving         Comp          
Table Name         Index Name         Partition Name                   Length       Blocks      %       Blocks          
------------------ ------------------ ------------------------------ -------- ------------ ------ ------------          
…
                   PSHJRNL_LN         JRNL_LNH201612                        1      143,264  142.0      -60,171          
                                      JRNL_LNH201712                        0       88,192   74.0       22,930          
                                      JRNL_LNH201812                        6       12,240     .0       12,240          
                                      JRNL_LNH201912                        6       11,104     .0       11,104          
…
                                      JRNL_LNH202201                        6       13,752     .0       13,752          
                                      JRNL_LNH202202                        6        5,496     .0        5,496          
                                      JRNL_LNH202203                        6        6,504     .0        6,504          
                                      JRNL_LNH202204                        6        5,920     .0        5,920          
                                      JRNL_LNH202205                        6        6,864     .0        6,864          
                                      JRNL_LNH202206                        6       13,584     .0       13,584          
                                      JRNL_LNH202207                        6       12,408     .0       12,408          
                                      JRNL_LNH202208                        3      212,904  121.0      -44,710          
                                      JRNL_LNH202209                        0      262,472  111.0      -28,872          
                                      JRNL_LNH202210                        3      228,552  102.0       -4,571          
                   ******************                                         ------------        ------------          
                   sum                                                           1,625,328             574,550
NB: ANALYZE INDEX on some partitions predicated a saving greater than 100%, leading to a negative predicted size estimate.  This is obviously impossible. In the past, there was a bug (now long since resolved) that caused this behaviour.  This occurs when the predicted optimal compression length is less than the current compression length in an index that is already compressed.  However, this problem does not occur consistently.  
In the above example, 6 columns of PSHJRLN_LN have already been compressed for all partitions.  The script has validated that, for the majority of partitions, this is optimal and calculates that there is no further space saving available.  However, some partitions require less compression.  
The choice is between either:
  • Choosing to compress the entire index at a shorter compression.  In which case, most of the partitions will be larger, the exception partitions will be small, but the net effect is that the index will be larger. 
  • Disabling compression on these partitions.  Over-compressed indexes are generally only slightly larger than uncompressed indexes, so the benefit is probably only small
  • Leave compression at the length that is optimal for most of the partitions, accepting that a few partitions will be over-compressed.  This usually results in the smallest index overall.
On balance, I tend to favour the last option on the basis that an over-compressed index is only slightly larger than an uncompressed index.