Thursday, July 19, 2012

Gathering Aggregated Cost-Based Optimiser Statistics on Partitioned Objects

Recently, I have been looking into how to gather cost-based optimizer statistics on composite partitioned objects. 

Database partitioning is not used by default in PeopleSoft because it is database specific.  Not all databases support partitioning, and where they do it is done in a platform specific manner.  Application Designer has no capability to create partitioned objects (although from PeopleTools 8.51 it will preserve partitioning in tables and index that are already partitioned in an Oracle database). It is therefore no surprise that it is not used widely by PeopleSoft customers.  However, partitioning is essential in Global Payroll systems that use 'streamed'  processing (many concurrent payroll processes).  By definition, these are the payroll systems that have large volumes of data where collecting statistics on the results tables can quickly become a challenge.  Partitioning can also have application in General Ledger reporting in Financials, particularly where nVision is in use.

It is important that statistics on partitioned tables are both up to date and accurate.  However, collecting global statistics on a large partitioned object can be a time-consuming and resource intensive business as Oracle samples all the physical partitions or sub-partitions. Briefly, if you do not collect global statistics on a partitioned table, Oracle will aggegrate the statistics on the physical partitons or sub-partitions to calculate statistics on the logical table and partition segments.

Oracle 10g makes a number of mistakes in its calculation of these aggregated statistics.  In particular the number of distinct values on columns by which the table is partitioned have impossibly low values.  This is can affect cardinality calculations and so lead the optimizer to choose the wrong execution plan.

I have now published the second of two documents on my website that examine aspects of statistics on partitioned, and in particular composite partitioned tables. The first document examines the problems in 10g, and proposes a procedure to 'correct' the aggregated statistics to at least minimum possible values.
The second document looks at the same issue in 11g and shows that while most of the issues are fixed, one problem remains.