However, I have recently noticed that the DDL models have been swapped over. I have commented on this elsewhere, but I felt it needed a separate posting.
I am certain that this is a mistake, but it is at least one that can be easily corrected by PeopleSoft customers. The problem is not obvious because the full compute DDL model actually only uses a 1% sample, and the automatic sample size calculated by Oracle is usually within an order of magnitude of this value, though it is often greater than 1%.
So let me be absolutely clear here that:
This can be easily verified. I wrote a simple Application Engine that collected statistics on two tables via the %UpdateStats macro. I implemented the delivery DDL models as specified in ddlora.dms. This is the Application Engine trace file.
...
-- 17.35.40 .(DMK.MAIN.Step01) (SQL)
RECSTATS PSLOCK LOW
/
-- Row(s) affected: 1
/
/
Restart Data CheckPointed
/
COMMIT
/
...
-- 17.35.41 .(DMK.MAIN.Step02) (SQL)
RECSTATS PSVERSION HIGH
/
-- Row(s) affected: 1
/
/
Restart Data CheckPointed
/
COMMIT
/
...
Unfortunately the %UpdateStats macro is not fully traced in the PeopleTools trace either (it also reports the same information as the AE Trace file). The only way I know to find out what is being submitted to the database is to enable Oracle SQL Trace, and look in the trace file.
...
=====================
PARSING IN CURSOR #2 len=155 dep=0 uid=39 oct=47 lid=39 tim=259825298401 hv=1993983003 ad='6c1382b4'
BEGIN DBMS_STATS.GATHER_TABLE_STATS (ownname=> 'SYSADM', tabname=> 'PSLOCK', estimate_percent=> 1, method_opt=> 'FOR ALL COLUMNS SIZE 1', cascade=> TRUE); END;
END OF STMT
PARSE #2:c=31250,e=386098,p=25,cr=105,cu=0,mis=1,r=0,dep=0,og=1,tim=259825298396
...
=====================
PARSING IN CURSOR #2 len=193 dep=0 uid=39 oct=47 lid=39 tim=259826057420 hv=2784637395 ad='6c138098'
BEGIN DBMS_STATS.GATHER_TABLE_STATS (ownname=> 'SYSADM', tabname=>'PSVERSION', estimate_percent=> dbms_stats.auto_sample_size, method_opt=> 'FOR ALL INDEXED COLUMNS SIZE 1', cascade=> TRUE); END;
END OF STMT
PARSE #2:c=0,e=2195,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=259826057415
...
=====================
PARSING IN CURSOR #2 len=155 dep=0 uid=39 oct=47 lid=39 tim=259825298401 hv=1993983003 ad='6c1382b4'
BEGIN DBMS_STATS.GATHER_TABLE_STATS (ownname=> 'SYSADM', tabname=> 'PSLOCK', estimate_percent=> 1, method_opt=> 'FOR ALL COLUMNS SIZE 1', cascade=> TRUE); END;
END OF STMT
PARSE #2:c=31250,e=386098,p=25,cr=105,cu=0,mis=1,r=0,dep=0,og=1,tim=259825298396
...
=====================
PARSING IN CURSOR #2 len=193 dep=0 uid=39 oct=47 lid=39 tim=259826057420 hv=2784637395 ad='6c138098'
BEGIN DBMS_STATS.GATHER_TABLE_STATS (ownname=> 'SYSADM', tabname=>'PSVERSION', estimate_percent=> dbms_stats.auto_sample_size, method_opt=> 'FOR ALL INDEXED COLUMNS SIZE 1', cascade=> TRUE); END;
END OF STMT
PARSE #2:c=0,e=2195,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=259826057415
...
So, you can see that RECSTATS HIGH corresponds to the default estimate, but RECSTATS LOW corresponds to the 1% sample size.
If you look in ddlora.dms you can see that model 4 is the 1% sample and model 5 is the default estimate.
...
4,2,0,0,$long
DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=> [TBNAME], estimate_percent=> , method_opt=> 'FOR ALL COLUMNS SIZE 1', cascade=> TRUE);
//
5,2,0,0,$long
DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=> [TBNAME], estimate_percent=> dbms_stats.auto_sample_size, method_opt=> 'FOR ALL INDEXED COLUMNS SIZE 1', cascade=> TRUE);
//
...
4,2,0,0,$long
DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=> [TBNAME], estimate_percent=> , method_opt=> 'FOR ALL COLUMNS SIZE 1', cascade=> TRUE);
//
5,2,0,0,$long
DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=> [TBNAME], estimate_percent=> dbms_stats.auto_sample_size, method_opt=> 'FOR ALL INDEXED COLUMNS SIZE 1', cascade=> TRUE);
//
...
So you can also see how the DDL models have been confused, which as I have commented I consider to be a typographical error, and that should really have been 100% indicating a full compute. I think it would make sense to change the ddlora.dms script to read as follows:
......
INSERT INTO PSDDLMODEL (
STATEMENT_TYPE,
PLATFORMID,
SIZING_SET,
PARMCOUNT,
MODEL_STATEMENT)
VALUES(
:1,
:2,
:3,
:4,
:5)
\
$DATATYPES NUMERIC,NUMERIC,NUMERIC,NUMERIC,CHARACTER
...
4,2,0,0,$long
DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=> [TBNAME], estimate_percent=>dbms_stats.auto_sample_size, method_opt=> 'FOR ALL COLUMNS SIZE AUTO', cascade=> TRUE);
//
5,2,0,0,$long
DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=> [TBNAME], estimate_percent=>, method_opt=> 'FOR ALL COLUMNS SIZE AUTO', cascade=> TRUE);
//
/
...
INSERT INTO PSDDLMODEL (
STATEMENT_TYPE,
PLATFORMID,
SIZING_SET,
PARMCOUNT,
MODEL_STATEMENT)
VALUES(
:1,
:2,
:3,
:4,
:5)
\
$DATATYPES NUMERIC,NUMERIC,NUMERIC,NUMERIC,CHARACTER
...
4,2,0,0,$long
DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=> [TBNAME], estimate_percent=>dbms_stats.auto_sample_size, method_opt=> 'FOR ALL COLUMNS SIZE AUTO', cascade=> TRUE);
//
5,2,0,0,$long
DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=> [TBNAME], estimate_percent=>, method_opt=> 'FOR ALL COLUMNS SIZE AUTO', cascade=> TRUE);
//
/
...
Or, if you use the wrapper SQL that I proposed in %UpdateStats() -v- Optimizer Dynamic Sampling
...
4,2,0,0,$long
wrapper.ps_stats(p_ownname=> [DBNAME], p_tabname=> [TBNAME], p_estimate_percent=> 0);
//
5,2,0,0,$long
wrapper.ps_stats(p_ownname=> [DBNAME], p_tabname=> [TBNAME], p_estimate_percent=> 1);
//
...
4,2,0,0,$long
wrapper.ps_stats(p_ownname=> [DBNAME], p_tabname=> [TBNAME], p_estimate_percent=> 0);
//
5,2,0,0,$long
wrapper.ps_stats(p_ownname=> [DBNAME], p_tabname=> [TBNAME], p_estimate_percent=> 1);
//
...
I have kept the 1% sample size for the compute model, but there is no reason why you could not choose a larger value. If you wanted %UpdateStats([table],HIGH) to continue to mean a full compute, then the value really should be 100%. It is really a matter of how long you want to spend analysing statistics on tables during batch programs. However, a higher sample size will not necessarily produce statistics that will lead to a better execution plan!
Exactly how Oracle calculates the default sample size is not published. Values in the range 0.5% to 10% are typical. In a perverse sense, a 1% sample size will usually be smaller sample than the Oracle default sample size, so in the delivered DDL models, %UpdateStats([table],HIGH) will usually use a larger sample size that %UpdateStats([table],LOW)! However, I simply cannot believe that this is what was intended.
You can see the value that Oracle calculates for auto_sample_size by tracing the dbms_stats call, and looking for the sample clause in the recursive SQL. Eg.
... from "SYSADM"."PSPCMNAME" sample ( .9490170771) t
1 comment :
Thanks for posting this. I've struggled with this problem for a couple years now. I'd gotten excited about the prospect of dynamic sampling after a conversation with Wolfgang at Hotsos a couple years back, but then ran into the ORA-20005 and was told to get my nose out of the developers' business.
Are you really sure 4 and 5 have been swapped?
in 8.46, 4 is analyze estimate and 5 is analyze compute.
In the ddlora.dms I've got in front of me for 8.49, 4 is estimate 1%, and 5 is estimate with auto sample size. I figured they're using auto sample to do better than 1% but avoid the hit of a full compute.
Am I just muddled?
Post a Comment