Tuesday, December 11, 2007

Changes in Calculation of Predicate Selectivity in Oracle 10g

Stuff changes! The very first entry I wrote for this blog back in April 2006 (http://blog.psftdba.com/2006/04/currentdatein-metasql-prevents-oracle.html) discussed how the Oracle 9i optimizer calculated the selectivity of expressions, such as the expansion of the %CurrentDateIn macro in PeopleSoft. Recently, I had cause to repeat the test script in this entry on Oracle 10gR2 (10.2.0.1.0 on Windows, 10.2.0.3.0 on HP-UX).
For the simple TRUNC(SYSDATE) predicate, Oracle correctly calculated the cardinality as 4. So no change here since Oracle 9i.
EXPLAIN PLAN FOR
SELECT * FROM t1 WHERE c > TRUNC(SYSDATE);

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     4 |  4096 |    41   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     4 |  4096 |    41   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("C">TRUNC(SYSDATE@!))

But when I use the expansion of the PeopleSoft %CurrentDateIn macro, Oracle 10g now also correctly calculates the cardinality as 4. Oracle 9i didn't calculate this, and instead used a hard-coded assumption of 5% selectivity (50 rows in this example).
EXPLAIN PLAN FOR
SELECT * FROM t1 WHERE c > TO_DATE(TO_CHAR(SYSDATE,'YYYY-MM-DD'),'YYYY-MM-DD');

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     4 |  4096 |    41   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     4 |  4096 |    41   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("C">TO_DATE(TO_CHAR(SYSDATE@!,'YYYY-MM-DD'),'YYYY-MM-DD'))

I have tried other more complex expressions, but it would appear that, where possible, Oracle will calculate the value of the expression and then use that result to calculate the selectivity of that value.
This statement will generate an error when it is executed because the divisor is zero. However, if I just produce an execution plan, Oracle cannot calculate the selectivity, and so has gone back to the 5% assumption.
EXPLAIN PLAN FOR
SELECT * FROM t1 WHERE c > TO_DATE(TO_CHAR(SYSDATE,'YYYY-MM-DD'),'YYYY-MM-DD');

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    50 | 51200 |    41   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    50 | 51200 |    41   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("C">SYSDATE@!+1/0)

While it is wonderful that Oracle optimizer has got a bit cleverer in calculating the cardinality of predicates, this is going to cause execution plans to change on upgrading to 10g. Research into this is on-going, but I am particularly concerned by the effect on the costing of effective-dated sub-queries.

1 comment :

Noons said...

You touch on a very valid point here, David.

I can understand the concept of increasing the "smartness" of the optimizer and Oracle's efforts in this area.

But there is such a thing as "too smart", for lack of a better expression.

One thing that steady state production environments do NOT need is ad-hoc change. Guess what the so-called "smart optimizer" means?


Is it desirable in the long run? All I can say is: as much as possible, I try to turn off all this smartness so that I can end up with a predictable system. It might not run as fast as it could in some cases, but it sure ain't gonna blow up in execution times without telling me!