This is the second of a series of five articles where I look at the challenges presented by typical queries on the journal line table (PS_JRNL_LN).
- Problem Statement
- Exadata System Statistics
- Partitioning
- Compression
- Conclusion
Exadata System Statistics
Many other people have written notes about how Oracle's optimizer costs a full table scan. This is a selection:
- Neil Chandler: Oracle Optimizer System Statistics
- John Brady: Oracle Optimizer Plan Costing - Full Table Scans
- Randolf Geist: Understanding the different modes of System Statistics aka. CPU Costing and the effects of multiple block sizes - part 4
- Martin Widlake: Cost of Full Table Scans
Roughly speaking, the cost calculated by the optimizer that we see in an execution plan is an estimate of the time taken to perform an operation, where the unit of time is the duration of a single block read. Although that statement is an oversimplification. There are various guesses and assumptions built into the optimizer's calculation. The cost-based optimizer looks for the cheapest plan, that ought to be the fastest to execute. However, in many cases, cost does not correspond to execution time.
- See also Jonathan Lewis: Cost is Time
Full Scan Cost
The cost of a full table scan is made up of an I/O cost (the time taken to read the blocks from disk) and a CPU cost (the time taken to process the rows). The I/O cost is the number of multi-block read operations, multiplied by the ratio of the duration of a multi-block read to a single-block read.
- IO Cost = (HWM / MBRC) . (MREADTIM / SREADTIM)
Where
- HWM = the high water mark of the segment expressed as a number of blocks
- MBRC = average multi-block read count for sequential read, in blocks (see parameter DB_FILE_MULTIBLOCK_READ_COUNT).
- MREADTIME = average time to perform a multi-block read at once (sequential read), in milliseconds
- SREADTIME = average time to read a single block (random read), in milliseconds
See PL/SQL Packages and Types Reference: DBMS_STATS
The single and multi-block read times are derived from two system statistics, the block size and the multi-block read count.
- SREADTIM = IOSEEKTIM + DB_BLOCK_SIZE / IOTFRSPEED
- MREADTIM = IOSEEKTIM + (DB_BLOCK_SIZE * MBRC) / IOTFRSPEED
Where
- IOSEEKTIM = Seek time + latency time + operating system overhead time, in milliseconds (default 10ms).
- IOTFRSPEED = I/O transfer speed in bytes per millisecond (or if you prefer KBytes/second)
- DB_BLOCK_SIZE = block size of the segment (usually 8Kb)
System statistics can be gathered based on actual system behaviour using DBMS_STATS, or set to pre-defined values using DBMS_STATS.GATHER_SYSTEM_STATS. Over the years many blogs, forums and presentations have discussed the merits or otherwise of collecting or setting system statistics.
Oracle's position is set out in the Oracle Optimizer Blog: Should You Gather System Statistics? It can be summarised as:
- Do not gather your own system statistics.
- Use the Oracle-provided defaults.
- Except on Exadata, where you can consider using the Exadata defaults, and perhaps not even then on a mixed workload. You will have to test this for yourself.
On any Oracle system, the default system statistics can be reset with
exec DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD');
This sets the system statistics as follows:
- MBRC=8
- IOSEEKTIM=10
- IOTFRSPEED=10
Thus:
SREADTIM = IOSEEKTIM + DB_BLOCK_SIZE / IOTFRSPEED
= 10 + 8192 / 4096
= 12 (ms)
MREADTIM = IOSEEKTIM + (DB_BLOCK_SIZE * MBRC) / IOTFRSPEED
= 10 + (8192 * 8 ) / 4096
= 10 + 16
= 26 (ms)
However, on Exadata, you can set 'system statistics take into account the unique capabilities of Oracle Exadata, such as large I/O size and high I/O throughput'
exec DBMS_STATS.GATHER_SYSTEM_STATS('EXADATA');
Some system statistics are then set differently:
- MBRC=128
- IOSEEKTIM=10
- IOTFRSPEED=204800
Thus
SREADTIM = IOSEEKTIM + DB_BLOCK_SIZE / IOTFRSPEED
= 10 + 8192 / 204800
= 10.04 (ms)
MREADTIM = IOSEEKTIM + (DB_BLOCK_SIZE * MBRC) / IOTFRSPEED
= 10 + (8192 . 128 ) / 204800
= 10 + 5.1200
= 15.12000 (ms)
Now, I can put these numbers back into the formula Oracle uses to calculate the I/O cost of a full scan.
- IO Cost = (HWM / MBRC) . (MREADTIM / SREADTIM)
Let us suppose that we are going to read 100M blocks. The I/O cost of that scan will be very different with Exadata system statistics rather than the normal default system statistics.
Normal IO Cost = (100000000/8) . (26/12)
= 27,083,333.3
Exadata IO Cost = (100000000/128) . (15.12/10.04)
= 1,176,543.8
Thus, introducing Exadata system statistics significantly reduces the cost of the full scan, making the database more likely to use a full scan than index lookups. That may or may not be desirable.
-------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 428K| 93M| | 2834K (8)| 00:01:51 | | |
| 1 | HASH GROUP BY | | 428K| 93M| 111M| 2834K (8)| 00:01:51 | | |
|* 2 | HASH JOIN | | 428K| 93M| 46M| 2819K (8)| 00:01:51 | | |
| 3 | JOIN FILTER CREATE | :BF0001 | 428K| 41M| | 1476 (7)| 00:00:01 | | |
| 4 | PART JOIN FILTER CREATE | :BF0000 | 428K| 41M| | 1476 (7)| 00:00:01 | | |
|* 5 | TABLE ACCESS STORAGE FULL | PS_JRNL_HEADER | 428K| 41M| | 1476 (7)| 00:00:01 | | |
| 6 | JOIN FILTER USE | :BF0001 | 1120K| 136M| | 2817K (8)| 00:01:51 | | |
| 7 | PARTITION RANGE JOIN-FILTER| | 1120K| 136M| | 2817K (8)| 00:01:51 |:BF0000|:BF0000|
|* 8 | TABLE ACCESS STORAGE FULL | PS_JRNL_LN | 1120K| 136M| | 2817K (8)| 00:01:51 |:BF0000|:BF0000|
-------------------------------------------------------------------------------------------------------------------------
If I look at the same example query that I used earlier, then with the Exadata default system statistics, the cost has come down significantly (from 66M to 2817K). It is a significant improvement, but it is still greater than the cost of the nested loop (1730K). Therefore, for this query, I still only get this execution plan if I hint the statement to force it. I still need to make the full scan cheaper.
Different queries will have different costs and will flip between the nested loop and Full scan/Bloom filter/hash join at different points.
Non-Exadata System Statistics
exec DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD');