Tuesday, January 08, 2008

Oracle/PeopleSoft have mixed up DDL Models used by %UpdateStats from PeopleTools 8.48

Last May, I wrote about Changes to DDL Models in PeopleTools 8.48. DDL models 4 and 5 are used by the %UpdateStats PeopleCode macro. Previously, PeopleSoft had delivered these models with ANALYZE TABLE commands. Now, in line with long standing Oracle RDBMS guidance, they call DBMS_STATS (see $PS_HOME/script/ddlora.dms). I certainly welcome that change.

However, I have recently noticed that the DDL models have been swapped over. I have commented on this elsewhere, but I felt it needed a separate posting.

I am certain that this is a mistake, but it is at least one that can be easily corrected by PeopleSoft customers. The problem is not obvious because the full compute DDL model actually only uses a 1% sample, and the automatic sample size calculated by Oracle is usually within an order of magnitude of this value, though it is often greater than 1%.

So let me be absolutely clear here that:

  • Model 4 is used by %UpdateStats([table],LOW);


  • Model 5 is used bt %UpdateStats([table],HIGH);


  • This can be easily verified. I wrote a simple Application Engine that collected statistics on two tables via the %UpdateStats macro. I implemented the delivery DDL models as specified in ddlora.dms. This is the Application Engine trace file.

    ...
    -- 17.35.40 .(DMK.MAIN.Step01) (SQL)
    RECSTATS PSLOCK LOW
    /
    -- Row(s) affected: 1
    /
    /
    Restart Data CheckPointed
    /
    COMMIT
    /
    ...
    -- 17.35.41 .(DMK.MAIN.Step02) (SQL)
    RECSTATS PSVERSION HIGH
    /
    -- Row(s) affected: 1
    /
    /
    Restart Data CheckPointed
    /
    COMMIT
    /
    ...


    Unfortunately the %UpdateStats macro is not fully traced in the PeopleTools trace either (it also reports the same information as the AE Trace file). The only way I know to find out what is being submitted to the database is to enable Oracle SQL Trace, and look in the trace file.

    ...
    =====================
    PARSING IN CURSOR #2 len=155 dep=0 uid=39 oct=47 lid=39 tim=259825298401 hv=1993983003 ad='6c1382b4'
    BEGIN DBMS_STATS.GATHER_TABLE_STATS (ownname=> 'SYSADM', tabname=> 'PSLOCK', estimate_percent=> 1, method_opt=> 'FOR ALL COLUMNS SIZE 1', cascade=> TRUE); END;
    END OF STMT
    PARSE #2:c=31250,e=386098,p=25,cr=105,cu=0,mis=1,r=0,dep=0,og=1,tim=259825298396
    ...
    =====================
    PARSING IN CURSOR #2 len=193 dep=0 uid=39 oct=47 lid=39 tim=259826057420 hv=2784637395 ad='6c138098'
    BEGIN DBMS_STATS.GATHER_TABLE_STATS (ownname=> 'SYSADM', tabname=>'PSVERSION', estimate_percent=> dbms_stats.auto_sample_size, method_opt=> 'FOR ALL INDEXED COLUMNS SIZE 1', cascade=> TRUE); END;
    END OF STMT
    PARSE #2:c=0,e=2195,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=259826057415
    ...


    So, you can see that RECSTATS HIGH corresponds to the default estimate, but RECSTATS LOW corresponds to the 1% sample size.
    If you look in ddlora.dms you can see that model 4 is the 1% sample and model 5 is the default estimate.

    ...
    4,2,0,0,$long
    DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=> [TBNAME], estimate_percent=> , method_opt=> 'FOR ALL COLUMNS SIZE 1', cascade=> TRUE);
    //
    5,2,0,0,$long
    DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=> [TBNAME], estimate_percent=> dbms_stats.auto_sample_size, method_opt=> 'FOR ALL INDEXED COLUMNS SIZE 1', cascade=> TRUE);
    //
    ...


    So you can also see how the DDL models have been confused, which as I have commented I consider to be a typographical error, and that should really have been 100% indicating a full compute. I think it would make sense to change the ddlora.dms script to read as follows:

    ......
    INSERT INTO PSDDLMODEL (
    STATEMENT_TYPE,
    PLATFORMID,
    SIZING_SET,
    PARMCOUNT,
    MODEL_STATEMENT)
    VALUES(
    :1,
    :2,
    :3,
    :4,
    :5)
    \
    $DATATYPES NUMERIC,NUMERIC,NUMERIC,NUMERIC,CHARACTER
    ...
    4,2,0,0,$long
    DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=> [TBNAME], estimate_percent=>dbms_stats.auto_sample_size, method_opt=> 'FOR ALL COLUMNS SIZE AUTO', cascade=> TRUE);
    //
    5,2,0,0,$long
    DBMS_STATS.GATHER_TABLE_STATS (ownname=> [DBNAME], tabname=> [TBNAME], estimate_percent=>, method_opt=> 'FOR ALL COLUMNS SIZE AUTO', cascade=> TRUE);
    //
    /
    ...


    Or, if you use the wrapper SQL that I proposed in %UpdateStats() -v- Optimizer Dynamic Sampling

    ...
    4,2,0,0,$long
    wrapper.ps_stats(p_ownname=> [DBNAME], p_tabname=> [TBNAME], p_estimate_percent=> 0);
    //
    5,2,0,0,$long
    wrapper.ps_stats(p_ownname=> [DBNAME], p_tabname=> [TBNAME], p_estimate_percent=> 1);
    //
    ...


    I have kept the 1% sample size for the compute model, but there is no reason why you could not choose a larger value. If you wanted %UpdateStats([table],HIGH) to continue to mean a full compute, then the value really should be 100%. It is really a matter of how long you want to spend analysing statistics on tables during batch programs. However, a higher sample size will not necessarily produce statistics that will lead to a better execution plan!

    Exactly how Oracle calculates the default sample size is not published. Values in the range 0.5% to 10% are typical. In a perverse sense, a 1% sample size will usually be smaller sample than the Oracle default sample size, so in the delivered DDL models, %UpdateStats([table],HIGH) will usually use a larger sample size that %UpdateStats([table],LOW)! However, I simply cannot believe that this is what was intended.

    You can see the value that Oracle calculates for auto_sample_size by tracing the dbms_stats call, and looking for the sample clause in the recursive SQL. Eg.

    ... from "SYSADM"."PSPCMNAME" sample ( .9490170771) t

    1 comment :

    Anonymous said...

    Thanks for posting this. I've struggled with this problem for a couple years now. I'd gotten excited about the prospect of dynamic sampling after a conversation with Wolfgang at Hotsos a couple years back, but then ran into the ORA-20005 and was told to get my nose out of the developers' business.

    Are you really sure 4 and 5 have been swapped?
    in 8.46, 4 is analyze estimate and 5 is analyze compute.
    In the ddlora.dms I've got in front of me for 8.49, 4 is estimate 1%, and 5 is estimate with auto sample size. I figured they're using auto sample to do better than 1% but avoid the hit of a full compute.

    Am I just muddled?