In 2008, Richard Foote wrote this still excellent 4-part guide to index key compression.
- Index Compression Part I (Low)
- Index Compression Part II (Down Is The New Up)
- Index Compression Part III (2+2=5)
- Index Compression Part IV (Packt Like Sardines In a Crushd Tin Box)
He started with this comment, that I think is just as valid as it was then:
“Index compression is perhaps one of the most under used and neglected index options available. It has the potential to substantially reduce the overall size of non-Unique indexes and multi-column unique indexes, in some scenarios dramatically so... Not only will it potentially save storage, but if the resultant index contains fewer leaf blocks, that’s potentially fewer LIOs and from the Cost Based Optimizer’s point of view, potentially a cheaper execution plan option.”
Index key compression is a highly effective option for reducing index size and improving index performance.
“Oracle stores each distinct combination of compressed column values found within a specific index leaf block in a ‘Prefix’ table within the leaf block and assigns each combination a unique prefix number.” If the prefix length (the number of leading columns to be compressed) is too great, then the prefix table will contain more entries, ultimately one for every row in the index. The compressed index could end up being larger than the uncompressed index! If the prefix length is too small, then you might not get as much compression as you might with a longer prefix length.
In other words, there is a sweet spot where you will achieve optimal compression. That sweet spot can vary from no compression to compressing all the columns. It will vary from index to index, from partition to partition, and potentially over time as the data in an index changes.
Test Every Option to Determine Optimal Compression
One way to determine optimal compression is through exhaustive testing. Each index could be rebuilt at each possible compression prefix length, and the size of the index could be compared, and the performance of application processes could be tested.
The following PL/SQL script (available on GitHub) rebuilds each index on a named table at each possible compression length, collects statistics and stores them in a table.
REM index_compression_test.sql
create table gfc_index_compression_stats
(table_name varchar2(128)
,index_name varchar2(128)
,num_rows number
,last_analyzed date
,prefix_length number
,blevel number
,leaf_blocks number
,avg_leaf_blocks_per_key number
,avg_data_blocks_per_key number
,clustering_factor number
,constraint gfc_index_compression_stats_pk primary key (table_name, index_name, prefix_length)
);
DECLARE
l_table_name VARCHAR2(128) := 'PSTREENODE';
l_num_cols INTEGER;
l_sql CLOB;
e_invalid_compress_length EXCEPTION;
PRAGMA EXCEPTION_INIT(e_invalid_compress_length,-25194);
BEGIN
FOR i IN (
SELECT table_name, index_name, column_position prefix_length FROM user_ind_columns
WHERE table_name = l_table_name
UNION
SELECT table_name, index_name, 0 FROM user_indexes
WHERE table_name = l_table_name
ORDER BY table_name, index_name, prefix_length DESC
) LOOP
IF i.prefix_length > 0 THEN
l_sql := 'ALTER INDEX '||i.index_name||' REBUILD COMPRESS '||i.prefix_length;
ELSE
l_sql := 'ALTER INDEX '||i.index_name||' REBUILD NOCOMPRESS';
END IF;
BEGIN
dbms_output.put_line(l_sql);
EXECUTE IMMEDIATE l_sql;
dbms_stats.gather_index_stats(user,i.index_name);
MERGE INTO gfc_index_compression_stats u
USING (SELECT * FROM user_indexes WHERE table_name = i.table_name And index_name = i.index_name) s
ON (u.table_name = s.table_name AND u.index_name = s.index_name AND u.prefix_length = NVL(s.prefix_length,0))
WHEN MATCHED THEN UPDATE SET u.num_rows = s.num_rows, u.last_analyzed = s.last_analyzed, u.blevel = s.blevel, u.leaf_blocks = s.leaf_blocks, u.avg_leaf_blocks_per_key = s.avg_leaf_blocks_per_key, u.avg_data_blocks_per_key = s.avg_data_blocks_per_key, u.clustering_factor = s.clustering_factor
WHEN NOT MATCHED THEN INSERT (table_name, index_name, num_rows, last_analyzed, prefix_length, blevel, leaf_blocks, avg_leaf_blocks_per_key, avg_data_blocks_per_key, clustering_factor)
VALUES (s.table_name, s.index_name, s.num_rows, s.last_analyzed, NVL(s.prefix_length,0), s.blevel, s.leaf_blocks, s.avg_leaf_blocks_per_key, s.avg_data_blocks_per_key, s.clustering_factor);
EXCEPTION
WHEN e_invalid_compress_length THEN NULL;
END;
END LOOP;
END;
/
The following chart presents the data collected by the script above for the PSTREENODE table in PeopleSoft. The number of leaf blocks is graphed against the compression prefix length. The left-hand end of each line shows the uncompressed size of the index. Let Oracle Calculate the Optimal Compression Prefix Length
ANALYZE INDEX … VALIDATE STRUCTURE CASCADE;
ANALYZE TABLE emp VALIDATE STRUCTURE CASCADE;
REM calc_opt_comp.sql
REM (c)Go-Faster Consultancy Ltd. 2014
REM see https://blog.psftdba.com/2016/02/implementing-index-compression-and.html
set serveroutput on autotrace off
clear columns
SPOOL calc_opt_comp
REM DROP TABLE sysadm.gfc_index_stats PURGE;
--create working storage table with same structure as INDEX_STATS
CREATE TABLE sysadm.gfc_index_stats
AS SELECT * FROM index_stats
WHERE 1=2
/
ALTER TABLE sysadm.gfc_index_stats
MODIFY name NOT NULL
/
CREATE UNIQUE INDEX sysadm.gfc_index_stats
ON sysadm.gfc_index_stats (name, partition_name)
/
undefine table_name
DECLARE
l_sql VARCHAR2(100);
l_owner VARCHAR2(8) := 'SYSADM';
l_table_name VARCHAR2(30) := '&&table_name';
BEGIN
FOR i IN (
SELECT i.index_name, ip.partition_name
FROM all_indexes i
, all_ind_partitions ip
WHERE i.index_type like '%NORMAL'
AND i.table_owner = l_owner
AND i.partitioned = 'YES'
AND i.table_name = l_table_name
AND ip.index_owner = i.owner
AND ip.index_name = i.index_name
AND ip.subpartition_count = 0
AND ip.segment_created = 'YES'
UNION
SELECT i.index_name, isp.subpartition_name
FROM all_indexes i
, all_ind_subpartitions isp
WHERE i.index_type like '%NORMAL'
AND i.table_owner = l_owner
AND i.partitioned = 'YES'
AND i.table_name = l_table_name
AND isp.index_owner = i.owner
AND isp.index_name = i.index_name
AND isp.segment_created = 'YES'
UNION
SELECT i.index_name, NULL
FROM all_indexes i
WHERE i.index_type like '%NORMAL'
AND i.table_owner = l_owner
AND i.table_name = l_table_name
AND i.partitioned = 'NO'
AND i.segment_created = 'YES'
MINUS
SELECT name, partition_name
FROM sysadm.gfc_index_stats
) LOOP
IF i.partition_name IS NULL THEN
l_sql := 'ANALYZE INDEX '||l_owner||'.'||i.index_name||' VALIDATE STRUCTURE';
ELSE
l_sql := 'ANALYZE INDEX '||l_owner||'.'||i.index_name||' PARTITION ('||i.partition_name||') VALIDATE STRUCTURE';
END IF;
dbms_output.put_line(l_sql);
EXECUTE IMMEDIATE l_sql;
DELETE FROM sysadm.gfc_index_stats g
WHERE EXISTS(
SELECT 'x'
FROM index_stats i
WHERE i.name = g.name
AND (i.partition_name = g.partition_name OR (i.partition_name IS NULL AND g.partition_name IS NULL)));
INSERT INTO sysadm.gfc_index_stats
SELECT i.* FROM index_stats i;
COMMIT;
END LOOP;
END;
/
…
The script produces reports of its analysis. The summary report shows the optimal compression length for each index and lists the columns that are and are not compressed. We can see that the result of the ANALYZE command agrees with the result of the previous test that rebuilt each index at each compression length and measured the size of the index.
Summary Report
Opt Comp Weighted Est.
Prefix Num Average Comp
Table Name Index Name Length FREQ Parts Blocks Saving % Blocks
------------------ ------------------ -------- ---- ----- ------------ -------- ------------
Compress Columns Do Not Compress Columns
----------------------------------------------------------- -----------------------------------------------------------
PSTREENODE PSAPSTREENODE 4 1 0 2,048 41.0 1,208
SETID, TREE_NAME, EFFDT, TREE_BRANCH TREE_NODE, TREE_NODE_NUM, TREE_NODE_NUM_END, TREE_NODE_TYPE
PSBPSTREENODE 8 1 0 1,920 34.0 1,267
SETID, TREE_NAME, TREE_BRANCH, TREE_NODE_NUM, TREE_NODE, TR
EE_NODE_NUM_END, TREE_LEVEL_NUM, TREE_NODE_TYPE
PSDPSTREENODE 3 1 0 1,280 61.0 499
SETID, TREE_NAME, EFFDT PARENT_NODE_NUM
PSFPSTREENODE 2 1 0 1,024 67.0 338
TREE_NAME, EFFDT
PSGPSTREENODE 2 1 0 2,304 35.0 1,498
PARENT_NODE_NAME, TREE_NAME EFFDT, TREE_NODE, SETID
PSHPSTREENODE 2 1 0 2,048 24.0 1,556
TREE_NODE, TREE_NAME EFFDT, SETID, SETCNTRLVALUE, TREE_NODE_NUM
PSIPSTREENODE 3 1 0 1,152 .0 1,152
SETID, TREE_NAME, EFFDT TREE_NODE, TREE_NODE_NUM, TREE_NODE_NUM_END
PS_PSTREENODE 4 1 0 1,792 46.0 968
SETID, SETCNTRLVALUE, TREE_NAME, EFFDT TREE_NODE_NUM, TREE_NODE, TREE_BRANCH
****************** ----- ------------ ------------
sum 0 13,568 8,486
Compression of Partitioned Indexes
Detail Report
Opt Comp Est.
Prefix Saving Comp
Table Name Index Name Partition Name Length Blocks % Blocks
------------------ ------------------ ------------------------------ -------- ------------ ------ ------------
…
PSHJRNL_LN JRNL_LNH201612 1 143,264 142.0 -60,171
JRNL_LNH201712 0 88,192 74.0 22,930
JRNL_LNH201812 6 12,240 .0 12,240
JRNL_LNH201912 6 11,104 .0 11,104
…
JRNL_LNH202201 6 13,752 .0 13,752
JRNL_LNH202202 6 5,496 .0 5,496
JRNL_LNH202203 6 6,504 .0 6,504
JRNL_LNH202204 6 5,920 .0 5,920
JRNL_LNH202205 6 6,864 .0 6,864
JRNL_LNH202206 6 13,584 .0 13,584
JRNL_LNH202207 6 12,408 .0 12,408
JRNL_LNH202208 3 212,904 121.0 -44,710
JRNL_LNH202209 0 262,472 111.0 -28,872
JRNL_LNH202210 3 228,552 102.0 -4,571
****************** ------------ ------------
sum 1,625,328 574,550
- Choosing to compress the entire index at a shorter compression. In which case, most of the partitions will be larger, the exception partitions will be small, but the net effect is that the index will be larger.
- Disabling compression on these partitions. Over-compressed indexes are generally only slightly larger than uncompressed indexes, so the benefit is probably only small
- Leave compression at the length that is optimal for most of the partitions, accepting that a few partitions will be over-compressed. This usually results in the smallest index overall.