Monday, December 04, 2006
When talking about performance tuning, I often tell developers that they need to use Oracle SQL*Trace and TKPROF because that is an absolutely reliable way to find out how their SQL executed, how long it took, and hence where they should focus their attention to improve performance. However, that requires access to the trace files in the USER_DUMP_DEST directory on the database server, which is something that database administrators are justifiably reluctant to provide. When I am on a customer site, I often experience the same difficulties.
Even if developers can log on to a development database server, on Unix the trace files are only readable by members of the DBA group unless _trace_file_public is set to true.
Another option is to retrieve trace files from the user_dump_dest directory via an external table. The idea is not mine. I've seen this technique demonstrated by Tanel Poder, there is an article by Jared Still that demonstrates how to retrieve the alert log, and there is also a posting on Ask Tom using utl_dir. However, I couldn't find a script to do trace files, so I wrote one.
The script assumes the trace file is created by the current session. The first thing it does is to create a database directory that maps to the USER_DUMP_DEST directory, and an external table that corresponds to the trace file. Then you can just query the trace file in SQL*Plus and spool the output to a local file.
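A minimal sketch of the idea follows. All names (the directory path, the trace file name, the table name) are illustrative; the real script works out the trace file name for the current session dynamically.

-- Assumes a suitably privileged user; directory path and file name are examples only.
CREATE OR REPLACE DIRECTORY user_dump_dest AS '/u01/app/oracle/admin/ORCL/udump';

CREATE TABLE my_trace_file (
  trace_line VARCHAR2(4000)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY user_dump_dest
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    NOBADFILE NOLOGFILE
    FIELDS MISSING FIELD VALUES ARE NULL
    (trace_line CHAR(4000))
  )
  LOCATION ('orcl_ora_12345.trc')
)
REJECT LIMIT UNLIMITED;

-- Then, in SQL*Plus:
-- SPOOL mytrace.trc
-- SELECT trace_line FROM my_trace_file;
-- SPOOL OFF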
Of course, this also shows that external tables can be a significant security risk. The privilege to create and read from directories must be carefully controlled.
Wednesday, November 01, 2006
Here is an example. I will create two tables, and replicate them with Materialized Views. One by primary key, and the other by ROWID.
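The setup looks something like this (table and Materialized View names are invented for the sketch; the actual test script may differ):

-- Source tables, one replicated by primary key, one by ROWID
CREATE TABLE t_pk (id NUMBER PRIMARY KEY, val VARCHAR2(10));
CREATE TABLE t_rowid (id NUMBER, val VARCHAR2(10));

CREATE MATERIALIZED VIEW LOG ON t_pk WITH PRIMARY KEY;
CREATE MATERIALIZED VIEW LOG ON t_rowid WITH ROWID;

CREATE MATERIALIZED VIEW mv_pk REFRESH FAST WITH PRIMARY KEY
AS SELECT * FROM t_pk;

CREATE MATERIALIZED VIEW mv_rowid REFRESH FAST WITH ROWID
AS SELECT * FROM t_rowid;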
So the data has replicated and everything seems to be working fine. Now let's truncate the source tables and put some new data in. One row has the same key value as before, one does not.
The fast refresh of the Materialized View replicated by primary key appears to work: the new rows are inserted into the Materialized View and existing rows with the same key values are replaced, but the rows removed by the truncate remain.
The fast refresh of the Materialized View by ROWID fails, and the old data remains in place.
But rows 1 & 2 are in the Materialized View but not in the source table.
However, a full refresh corrects the discrepancy.
It seems to me that it would be appropriate to prevent a truncate command from executing if there is a Materialized View Log on the table, so I have written such a trigger.
The trigger MVTRUNC_LOCK can be downloaded from the Go-Faster website.
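A simplified sketch of the approach is below; the downloadable MVTRUNC_LOCK trigger is more thorough, so treat this only as an illustration of the mechanism.

-- DDL triggers can fire on TRUNCATE; ora_dict_obj_name identifies the target table.
CREATE OR REPLACE TRIGGER mvtrunc_lock_demo
BEFORE TRUNCATE ON SCHEMA
DECLARE
  l_count INTEGER;
BEGIN
  SELECT COUNT(*) INTO l_count
  FROM   user_mview_logs
  WHERE  master = ora_dict_obj_name;
  IF l_count > 0 THEN
    RAISE_APPLICATION_ERROR(-20000,
      'Materialized View Log exists on '||ora_dict_obj_name||
      ' - truncate would break fast refresh');
  END IF;
END;
/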
Wednesday, October 25, 2006
This concerns database objects that are related to tables maintained by PeopleTools, but which are not themselves maintained by PeopleTools. Such objects are often the result of performance tuning activities, where you choose to use features of the Oracle database that PeopleSoft do not use because they are not available on other database platforms.
- Function-Based Indexes: In Oracle it is possible to implement an index on a function. A typical example would be an index on UPPER(NAME) on PS_NAMES to facilitate case-insensitive searching.
- Record-based auditing can be done with a database DML trigger, instead of the default functionality of the Application Server. This was implemented by PeopleSoft to improve the performance of the auditing, and is a rare example of PeopleSoft coding platform-specific code, because the trigger DDL is slightly different on each platform.
In Oracle it is possible to build DDL triggers. Just as DML triggers fire when data is changed, DDL triggers fire when particular DDL commands are issued. I have created a trigger called PSFT_DDL_LOCK (available from the Go-Faster website) that fires when an object is altered or dropped. In certain cases the trigger raises an error, causing the original DDL command to fail and thus preventing the loss of the unmanaged objects. If the table related to the object being dropped or altered is not managed by PeopleSoft (that is, it cannot be found in PSRECDEFN), the trigger does not raise any error. Otherwise:
- If a trigger is being dropped or altered, and the name of that trigger does not start with PSU, then an error is raised. Triggers named PSU% are created by Application Designer for use with Mobile Agents.
- If an index is dropped or altered, the trigger checks that it is defined in PeopleTools. Indexes that correspond to Unique, Duplicate and Alternate Keys in Application Designer (where the index name is 'PS', followed by either a digit or an underscore, followed by the record name) are ignored.
- If a table is dropped or altered, the DDL trigger checks that there are no user indexes or triggers not defined in PeopleSoft, nor any primary key constraints, materialized views or materialized view logs on the table. It also checks that the table or index is not partitioned, clustered, global temporary or index-organised.
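The style of check the trigger performs can be sketched as follows. This fragment is simplified and illustrative only; the real PSFT_DDL_LOCK trigger runs several such queries against the PeopleTools and Oracle catalogue views.

-- Simplified demonstration of one check: block DDL on a table with an MV log.
CREATE OR REPLACE TRIGGER psft_ddl_lock_demo
BEFORE ALTER OR DROP ON SCHEMA
DECLARE
  l_msg VARCHAR2(200);
BEGIN
  IF ora_dict_obj_type = 'TABLE' THEN
    BEGIN
      SELECT 'Materialized View Log exists on '||ora_dict_obj_name
      INTO   l_msg
      FROM   user_mview_logs
      WHERE  master = ora_dict_obj_name
      AND    ROWNUM = 1;
      RAISE_APPLICATION_ERROR(-20001, l_msg);
    EXCEPTION
      WHEN NO_DATA_FOUND THEN NULL; -- no MV log, so allow the DDL to proceed
    END;
  END IF;
END;
/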
When an error is generated by this trigger during development or migration activities, it usually indicates that there is another database object that you need to consider before issuing the command that errored. It is not simply a matter of disabling the trigger and trying again.
The trigger does have a couple of side effects.
- There are a number of SQL statements run by the trigger, and this impacts the performance of DDL commands. If you are dropping all the tables in a schema, it would be advisable to disable the trigger. The trigger definitely needs the following function-based index to be created on PSRECDEFN, because it needs to look up the PeopleSoft record from the table name.
ON psrecdefn (DECODE(sqltablename,' ','PS_'||recname,sqltablename))
TABLESPACE PSINDEX PCTFREE 0;
- If a user index is removed from the PeopleTools definition before it is dropped, the trigger will raise an error. However, in such cases it would be better to keep the definition of the index in Application Designer and set the platform radio button to 'None' so that PeopleTools does not build it. That way a comment can be preserved to explain why the index is no longer necessary.
The behaviour can be re-enabled like this:
The previous version of this trigger (called T_LOCK) did not have this capability, and so I suggested disabling the trigger, but this affected all sessions.
Thursday, October 19, 2006
But it is not completely true!
Long Character fields in PeopleSoft that are not defined as 'required' in Application Designer are nullable in the database. Long Character fields are usually created as LONG columns in the database, unless their length is defined as between 1 and 2000 characters, in which case they are created as VARCHAR2 columns.
So a Long Character field of between 1 and 2000 characters becomes a nullable VARCHAR2 column in the database (it has taken me 10 years to find this, and it's yet another piece of errata in my book!).
The edit and key options in PeopleTools are restricted on a long field, but nonetheless this opens some interesting possibilities, including sparse indexes on nullable VARCHAR2 columns: you identify rows for a query with a non-null flag, and set the flag back to null afterwards, thus removing the row from the index.
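The sparse index technique can be sketched like this. The table, index and column names are entirely hypothetical; the point is that a fully null key value does not appear in a B-tree index, so the index stays small and cheap to scan.

-- Hypothetical worklist table: process_flag is a nullable VARCHAR2 column.
CREATE TABLE ps_x_worklist
(emplid       VARCHAR2(11) NOT NULL
,process_flag VARCHAR2(1));

CREATE INDEX ps0x_worklist ON ps_x_worklist (process_flag);

-- Queue a row for processing: it enters the index.
UPDATE ps_x_worklist SET process_flag = 'Y' WHERE emplid = 'K0G001';

-- Find queued rows cheaply via the small sparse index.
SELECT emplid FROM ps_x_worklist WHERE process_flag = 'Y';

-- Mark the row as processed: setting the flag to NULL removes it from the index.
UPDATE ps_x_worklist SET process_flag = NULL WHERE emplid = 'K0G001';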
Wednesday, October 11, 2006
The story happens to be about a PeopleSoft system, but it could have happened to any application. It starts with a pair of ERP systems. Changes in the HR system generate cases in the CRM system that the back office HR helpdesk processes. The HR system sends a message to the CRM system, where it may in turn be translated into one or more workflow messages. Some HR actions can give rise to many hundreds or even thousands of CRM workflows. For each workflow, a run control record is created on a table keyed on the operator ID and a Run Control ID (an arbitrary ID to identify the control record).
The vanilla version of this code used a sequence number generated, in the usual PeopleSoft platform-agnostic way, from a table. In extreme situations a batch HR process can generate many thousands of workflows. These messages are processed by the subscription servers of an application server domain. This system is busy enough to have several of these domains, each with several subscription handler processes. So it wasn't long before contention on the row-level lock taken on the sequence table during number generation became an issue. It was noticed that adding additional subscription handlers didn't increase throughput, and that database locking also impeded on-line users who execute the same code.
So a developer changed the code: now the run control is a string derived by concatenating the operator ID (originally just its first three characters) and the current time. In case this run control already exists (usually because a message generated several workflows), the code goes into a loop, generating a new run control from the operator and time on each iteration. Adding more subscription handlers still doesn't increase message throughput - if anything it makes it worse.
Three more facts to bear in mind:
- The time is in the format HH24:MI:SS.
- The publish and subscribe application server processes all connect as the operator PSAPPS.
- The run control table is not purged.
So if yesterday user FRED generated a workflow at 10.27.42, the run control was 'FRED10.27.42'. If today he generates another workflow at 10.27.42, the code will loop until 10.27.43 before it can generate a run control that has not been used. In fact, Fred can never generate a run control at 10.27.42 ever again. Also, if a single operation generates 3 run controls, it will take at least 2 extra seconds to allocate the second and third.
The situation for the message processing application server is even more severe. It can generate hundreds or even thousands of workflows in an evening. The subscription handlers execute the loop in the code until they reach a time of day at which they have not processed a message before.
On a day that I was on this site, I heard that a worker in the HR call centre had had nothing to do all day, and then at 5.30 in the evening, as she was about to go home, 15 new cases appeared on her list and she burst into tears (call centre workers are measured on the number of cases they process and the length of their queues). Sure enough, we found that there were over 50,000 records for user PSAPPS - 3,600 per hour, continuously, from 3am to 5.30pm. When we reached 86,400 records, processing of workflows from inbound messages would have stopped completely! We also found that some operators had generated over 40 messages in a given minute, and so the save time of some of their operations was starting to be seriously affected.
The correct solution to all of this would be to have used an Oracle sequence. You get a unique number every time, and there is no contention on its allocation. In this application there is no objection to any gaps in the sequence.
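The fix is a one-liner (the sequence name is invented for the sketch): each caller gets a distinct number from the shared sequence cache without taking a row-level lock, and the gaps left when cached values are discarded simply don't matter here.

-- Allocate run control IDs from an Oracle sequence instead of a table.
CREATE SEQUENCE run_cntl_id_seq CACHE 100;

-- Each subscription handler would then do something like:
SELECT run_cntl_id_seq.NEXTVAL FROM dual;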
The original code suffered from database locking problems that effectively serialised run control processing. The code change worked well in test and initially worked fairly well in production, although the message-processing application servers could only produce a maximum of one workflow per second each before they would start to execute the loop, which they could do over 100 times per second. Thus database locking was replaced with high CPU consumption in the application server as it looped, and additional SQL execution and logical I/O in the database, leading to increased latch contention and increasing database CPU consumption. As the problem became more severe, other parts of the application also suffered as users waited for their requests to be processed by an available process in the application server. It all amounts to a vicious downward spiral.
- Oracle recognised the problems of sequence generation, so they created Sequences. You've paid good money for them, so use them.
- While date and time functions can appear to produce unique values, you need to think carefully about what will happen in a busy multi-user system.
Tuesday, October 10, 2006
Another back door into PeopleSoft: I've been working on nVision recently. In PeopleTools 8.x, the nVision performance options are stored on the tree definition table, PSTREEDEFN (in previous versions, they were stored in the layout). The performance options should be set as follows
- Use Literal Values: The SQL generated by nVision uses a list of literal values instead of joining to one or more tree selector tables. This simplifies the query by reducing the number of tables queried, although the query text will be much longer because there will be many criteria on literal values selected from those same tree selector tables. The fewer tables in the from clause, the fewer ways the optimizer has to execute the statement incorrectly!
- Use Static Selectors: nVision queries data from the permanent tree selector tables rather than building an extract of part of the trees. Sometimes nVision doesn't clear out this temporary data, usually when it errors for some reason. Hence, debris builds up in the tree selector tables over time, and this degrades the performance of the queries.
- Use Between on the Selection Criteria: This doesn't improve performance, but merely shortens the length of the SQL statement.
Friday, October 06, 2006
It works by permitting the process to insert the data directly into new data blocks at the end of the object, rather than using space currently allocated to the object. So the new data is appended to the table. This is much faster because it permits Oracle to bypass some of the processing that occurs for each row inserted, such as index maintenance, which only occurs when the insert is committed. Significantly, the hint also reduces the volume of redo written.
Every time a piece of data is changed in an Oracle database, the new data is written to the redo logs. This information can then be used to reapply changes to a database during media recovery, or it can be applied to a standby database as part of a disaster recovery solution (Oracle call this technique 'Physical DataGuard'). Every change, even to working storage tables and including indexes, generates redo information.
The reduction in redo can be very significant in systems using DataGuard across a WAN. Bandwidth constraints can cause a backlog of redo information waiting to be transmitted. In most systems the majority of redo is generated by overnight batch processes and it can take time during the day for that backlog to clear and for the DR system to catch up with production. Hence, reducing redo is important.
Therefore the APPEND hint should be used in insert statements when both of the following criteria are met:
- The table has been created or truncated prior to the insert. It is not sufficient to delete all the rows because the empty blocks remain allocated to the table.
- The table should be populated in a single monolithic insert statement. Multiple statements should be combined using the UNION ALL operator between the queries. Otherwise most of the benefit will be lost during multiple index maintenance phases.
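Putting the two criteria together, a qualifying statement looks something like this (table and column names are invented for the sketch):

-- Direct-path insert into a freshly truncated working storage table.
TRUNCATE TABLE ps_xx_wrk;

INSERT /*+ APPEND */ INTO ps_xx_wrk
SELECT emplid, effdt FROM ps_xx_src_a
UNION ALL
SELECT emplid, effdt FROM ps_xx_src_b;

COMMIT; -- must commit before the table can be queried again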
The session statistics show the savings in a number of areas.
Fewer blocks are changed because they are not written and then rewritten as each row is inserted. Table blocks are only written once during the insert and the index is only maintained when the insert is committed.
- When using the APPEND hint, the insert should be committed immediately. It must be committed before the table can be referenced, otherwise an error will be generated.
- The data in the table will have been completely replaced by this point, so it is almost certainly appropriate to refresh the optimiser statistics. In Application Engine use the %UpdateStats MetaSQL.
- The APPEND hint cannot be used in conjunction with the %InsertSelect MetaSQL because the hint MUST be placed between the 'insert' and 'into' keywords. NB: the statement will not fail if you get the hint wrong.
Thursday, August 17, 2006
This is because PSINDEXDEFN still contains flags for platforms that are no longer supported by PeopleSoft/Oracle and can no longer be set individually in Application Designer.
- PLATFORM_SBS: SQL Base
- PLATFORM_ALB: Allbase
- PLATFORM_DB4: DB2/AS400
However, Application Designer still checks the values of these flags, and if any flag has a different value, the SOME radio button is selected - even when all six supported platform flags have the same value.
This is a nuisance, because PeopleSoft releases indexes for specific platforms, and some indexes are suppressed on certain platforms. It is not easy to see from the indexes view of Application Designer whether a given index will be built. It would be easier if it said ALL or NONE when it means that.
I have fixed this by updating PeopleTools tables as follows. The first query reports on the indexes where the supported platform flags all have the same value and one of the unsupported platform flags are different.
The following SQL commands set the SQL flags for the unsupported platforms to the value for the supported platforms. The version number on the record definition is updated so that PeopleSoft recaches the object.
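An illustrative version of the correction is below. It assumes the six supported platform flag columns are PLATFORM_DB2, PLATFORM_DBX, PLATFORM_INF, PLATFORM_ORA, PLATFORM_SYB and PLATFORM_MSS; verify the column names against your PeopleTools release before running anything like this, and take a backup of the table first.

-- Where the supported flags agree, force the unsupported flags to match them.
UPDATE psindexdefn
SET    platform_sbs = platform_ora
,      platform_alb = platform_ora
,      platform_db4 = platform_ora
WHERE  platform_ora = platform_db2
AND    platform_ora = platform_dbx
AND    platform_ora = platform_inf
AND    platform_ora = platform_syb
AND    platform_ora = platform_mss
AND   (platform_sbs != platform_ora
   OR  platform_alb != platform_ora
   OR  platform_db4 != platform_ora);
-- The version number on the record definition must also be incremented so
-- that PeopleSoft recaches the object (not shown here).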
The platform flags now say 'ALL'. Not a tremendous change, but at least I can immediately see that these indexes do build on all platforms without having to open each one.
Thursday, June 15, 2006
Application Engine batch timings are very useful metrics when tuning processes, but they are only written when the program ends, and they are only written to the timings tables in the database when a program completes successfully. I am repeatedly restarting the same process, gradually moving towards the end, so I will never get a complete timings report.
However, Application Engine programs write a status string to the table PSPRCSRQSTSTRNG, which contains the step about to be executed. The value of this string can be seen in Process Monitor on the Process Detail page.
I created a table to which a trigger logs the steps as they are executed. I have used a sequence number to provide a reliable primary key on that table, because values in the string will repeat when Application Engine is in a loop, or when a process is restarted, in which case it keeps the same process instance number. The sequence number is generated by an Oracle sequence, to avoid the locking problems caused by using a table to generate a sequence (as PeopleSoft generally does). I am not worried by gaps in the sequence.
A trigger on PSPRCSRQSTSTRNG writes new values for PRCSRQSTSTRING to the log table. I have used an autonomous transaction in case the update rolls back after an error, and so that I can see the log record before Application Engine commits. I am also storing two timestamps when the audit record is written. The Oracle date type is only accurate to the second, but I have also used the newer timestamp datatype. On Oracle 9.2 (on Windows) I get millisecond accuracy, but on Oracle 10.2 I get microsecond accuracy.
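A simplified sketch of the log table and trigger follows. The log table and sequence names are invented, and the PSPRCSRQSTSTRNG column names are from memory, so check them against your PeopleTools release.

-- Log table with both DATE and TIMESTAMP columns, keyed by a sequence.
CREATE TABLE prcsrqststrng_log
(log_seq        NUMBER        NOT NULL
,prcsinstance   NUMBER        NOT NULL
,prcsrqststring VARCHAR2(254)
,log_date       DATE          DEFAULT SYSDATE
,log_ts         TIMESTAMP     DEFAULT SYSTIMESTAMP
,CONSTRAINT prcsrqststrng_log_pk PRIMARY KEY (log_seq));

CREATE SEQUENCE prcsrqststrng_log_seq;

CREATE OR REPLACE TRIGGER prcsrqststrng_log_trg
AFTER INSERT OR UPDATE ON psprcsrqststrng
FOR EACH ROW
DECLARE
  PRAGMA AUTONOMOUS_TRANSACTION; -- survives a rollback, visible before commit
BEGIN
  INSERT INTO prcsrqststrng_log
  (log_seq, prcsinstance, prcsrqststring)
  VALUES
  (prcsrqststrng_log_seq.NEXTVAL, :new.prcsinstance, :new.prcsrqststring);
  COMMIT;
END;
/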
Unfortunately, the status record is not always updated or deleted when the Application Engine program ends. Therefore, another trigger is required to delete the process record when the application updates its status to no longer processing.
This table can be used in various ways.
i) A simple report can be written to show the last 10 steps for each process instance, and their duration. I can run this while the program is executing to see how far it has got, and how long the current step has been executing.
ii) I can also aggregate the time for all the steps to determine what steps account for the longest execution time.
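The first report (i) might be sketched like this, assuming a log table prcsrqststrng_log keyed by a sequence log_seq and holding prcsinstance, prcsrqststring and a timestamp log_ts (names as in my logging trigger; adjust to taste):

-- Last 10 logged steps per process instance, with the gap since the prior step.
SELECT prcsinstance, prcsrqststring, log_ts,
       log_ts - LAG(log_ts) OVER (PARTITION BY prcsinstance
                                  ORDER BY log_seq) AS step_duration
FROM  (SELECT l.*,
              ROW_NUMBER() OVER (PARTITION BY prcsinstance
                                 ORDER BY log_seq DESC) rn
       FROM   prcsrqststrng_log l)
WHERE  rn <= 10
ORDER BY prcsinstance, log_seq;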
This method, although useful, is not foolproof. The status string is only updated if Application Engine restart is enabled, and then not for every step followed by a COMMIT. I haven't yet been able to work out all the criteria for this.
Tuesday, May 16, 2006
I started on the basis that this is a performance issue, and took the minimalist approach of only suppressing lookups that were consuming a significant amount of time. So I looked at whether it was possible to use Performance Monitor data to identify long-running lookups. When you capture a PMU, either through sampling or a performance trace, you get a hierarchy of the following transactions when you perform a lookup.
- 401:Entry and exit for Component search and processing on AppSrv
- 403:Modal component/secondary page/lookup page from a Level 1
- 408:All SQL calls excluding PMUs 406 and 407
Performance Monitor transactions are held in the table PSPMTRANSHIST. This table can become rather large, so I found it easier to extract the transaction types that are of interest into another table, and then analyse that data.
I have constructed two queries to identify component look-ups that are accounting for a large amount of response time. The first query simply sums the duration and counts the number of executions of each lookup record. Details of the component and page are also available on the table.
The other query additionally retrieves the SQL that was run, but this data is only available if the PMU trace was at Verbose level on the PSAPPSRV process.
ttitle 'SQL run by lookup - only with verbose or debug level tracing'
Suppressing the automatic lookup saves the user from waiting for an unnecessary query. The question is: what is the criterion for adding a record to the lookup exclusion table?
- Certainly if a lookup record returns more than 300 rows it should not be executed automatically, because only the first 300 rows will be returned, and then the user will probably not have the row they are looking for and they will have to add search criteria and repeat the search.
- There is a good case for suppressing the automatic lookup when it returns more than 100 rows. The results will return up to 3 pages of 100 rows per page. Then the user may well have to navigate to a different page in order to find what they are looking for.
- Having discussed this issue with a few people, there is some support for the suggestion that the automatic search should be suppressed for records with more than 25-50 rows. If there are more rows than this, the user is likely to have to scroll down the page to find the desired row.
Then all the records with lookup exclusion fields will be copied into our new working storage table.
Now, the table LUX_CAND contains a list of all the tables and views used as lookups, and the number of rows that they return (I cannot do anything about PeopleSoft dynamic views). The next thing is to add any record that returns too many rows to the lookup exclusion table.
Don't worry about any negative effects of having a large number of records in the lookup exclusion table. This information is cached by the application server. Although there is no version number on PSRECXL, when you save an update to the lookup exclusion table, component PeopleCode updates the SYS and PPC version numbers, which causes the application server to recache this table, so changes take effect immediately. So when updating PSRECXL with this script, it is also necessary to increment the version numbers.
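The population step might look like this. The 100-row threshold is one of the options discussed above, and the NUM_ROWS column on LUX_CAND is an assumption; the version-number logic follows the usual PeopleTools pattern, but verify it against your release.

-- Add every candidate over the threshold that is not already excluded.
INSERT INTO psrecxl (recname)
SELECT recname FROM lux_cand
WHERE  num_rows > 100
AND    recname NOT IN (SELECT recname FROM psrecxl);

-- Increment the cache version numbers so application servers recache PSRECXL.
UPDATE psversion SET version = version + 1 WHERE objecttypename IN ('SYS','PPC');
UPDATE pslock    SET version = version + 1 WHERE objecttypename IN ('SYS','PPC');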
On my Financials 8.4 demo system I found 4089 records being used as lookup records. Of those 557 have over 300 rows, 1058 have over 100 rows, and 1711 records return over 25 rows.
On balance, I think I would be prepared to add all 1711 records to the lookup exclusion table. It would save unnecessary searches.
Saturday, May 06, 2006
This applies to PeopleSoft as much as anything else. In this posting I want to draw attention to the Lookup exclusion table. Lots of fields in PeopleSoft applications have lookups associated with them, indicated by the magnifying glass icon.
When you click on that icon you are taken to a lookup dialogue, where normally the search fires automatically. Sometimes the operator must wait a while for the SQL queries to return the data. If there is a lot of data, only 300 rows are retrieved into the component buffer, and only the first 100 rows of that set are shown. Here is an example from Application Hire in HR, illustrated with the demo database.
These results are usually useless to the operator who must enter criteria into the lookup search dialogue and search again. It would be better if the first automatic search could be suppressed. Fortunately, this can be done by putting the prompt table associated with the field on the lookup exclusion table.
Now the user goes straight to a waiting page and is prompted to enter search criteria to find the relevant data. Even better, the benefit is immediate: you don't have to restart any servers for this configuration setting to be recognised and for the behaviour of the lookup to change.
The lookup exclusion table is a very simple and quick way of improving performance by cutting out unnecessary work. These changes are also much easier to implement because they are only configuration, rather than customisation.
Thursday, April 27, 2006
One of the things I am using this blog for is for addenda to the book. When I wrote the chapter about performance monitor I had never used it in a real production scenario. Since publication I have learnt a lot more about it. One of the things that I missed was transaction 116, 'Redirected round trip time (network latency)'.
When you log into the PIA, the browser is redirected to another page. I understand this transaction to measure the time taken for the login page with the redirection tag to go from the web server to the browser, to be processed on the browser, and for the browser to request the page to which it was redirected. Thus it measures the latency of a return trip on the network. This and PeopleSoft Ping are the only metrics PeopleSoft generates that indicate the performance of the network between the web server and a user's workstation.
This transaction holds various pieces of information about the user's PC, including its IP address (although that might be the address of a firewall or load balancer), the user's Session ID on the webserver, and the user agent information, from which you can determine the browser and version, and the operating system of their workstation.
So now, if you know where a user is physically located, you may be able to detect a correlation between location and round-trip duration. One limitation is that this transaction is only collected when a user logs in, so you may not collect data points when you need them.
Wednesday, April 26, 2006
Here is a simple tip, but I (probably because I don't often migrate PeopleSoft projects myself) think it is rather significant.
Ideally, if you are setting non-default storage options such as PCTFREE on tables or indexes, or if you are compressing indexes, then you should use DDL Overrides in PeopleSoft Application Designer to manage these options, so that when you generate a build script it contains the correct options. Otherwise, these settings could be lost when a build script is run.
However, by default Application Designer does not migrate these DDL overrides. When you are about to copy a project, select the copy options, and under the 'General Options' tab select the 'Take DDL from Source' radio button.
Unfortunately, Application Designer does not remember this option, and you need to remember to set it every time you migrate a project.
So now you can put DDL overrides into the development environment and migrate them through into production.
Tuesday, April 25, 2006
You have to configure it to self-monitor, ignoring all the warning messages.
The component trace indicates which analytic queries take the most time.
The Performance Monitor tables (PSPMTRANSHIST, PSPMTRANSARCH, PSPMEVENTHIST, PSPMEVENTARCH) are never updated. The performance collator application server process (PSPPMSRV) inserts data into the performance monitor tables, and later the performance monitor archive Application Engine process (PSPM_ARCHIVE) inserts the rows into the %ARCH tables and deletes them from the %HIST tables. Therefore, it is sensible to pack the data by minimising free space in these tables, and so reduce both logical and physical I/O. So I have reduced the free space in the data blocks (PCTFREE) to 0% on tables and indexes, and increased PCTUSED to 99%.
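For one of the tables, the change might look like this (the index name is illustrative; repeat for the other Performance Monitor tables and their indexes):

-- New settings only apply to future blocks; MOVE/REBUILD repacks existing rows.
ALTER TABLE pspmtranshist PCTFREE 0 PCTUSED 99;
ALTER TABLE pspmtranshist MOVE;
ALTER INDEX ps_pspmtranshist REBUILD PCTFREE 0;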
I have also found it beneficial to collect histograms on certain columns
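For example, something like the following, where the choice of column is illustrative (pick columns with skewed data distributions in your own system):

-- Collect a height-balanced histogram on a skewed column.
BEGIN
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'PSPMTRANSHIST',
    method_opt => 'FOR COLUMNS PM_TRANS_DEFN_ID SIZE 254');
END;
/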
If you have performance problems with Performance Monitor, remember that you can also use Performance Monitor to analyse its own analytics.
- Additional indexes and function-based indexes: During the course of performance tuning it is often necessary to build additional indexes. Ideally these indexes should be added via Application Designer. However, a DBA might add an index directly to the production environment, and it may take time to get a project migrated into that environment. The other scenario is that Application Designer cannot build function-based indexes. These are particularly useful for building upper-case indexes on columns, to support case-insensitive searching.
- PeopleSoft delivers a mechanism to build triggers to perform DML auditing. However, if you then rebuild the underlying table, the trigger will be lost, and there is nothing in PeopleSoft to warn you of this, or to audit the situation.
My solution to both scenarios is to create a DDL trigger to prevent accidental ALTER and DROP commands on objects not defined by PeopleTools. When you do want to alter these objects you can disable the trigger, so you now have a way of controlling when you can and can't drop additional triggers and indexes. You still have to remember to switch the trigger back on again!
- The trigger performs a number of test queries on PeopleTools or Oracle Catalogue views. There are different queries depending upon the type of object being ALTERed or DROPped. If a query returns a row then there is a problem, and an exception is raised. The queries also generate part of the error message that is returned.
- The trigger only affects operations on tables that are specified in PSRECDEFN as type 0 or 7 records. I have not added any handling for the additional instances of a PeopleSoft temporary table.
- The ora_% variables are described in the RDBMS documentation in the Application Developer's Guide - Fundamentals. In the 9.2 documentation this is in Chapter 16, 'Working with System Events', under 'Event Attribute Functions'.
So here are a few tests. I have created a function-based index and two triggers on PS_RT_RATE_TBL. I have chosen this table for the example because it also has a PeopleSoft-generated trigger for Mobile Agents.
So the t_lock trigger prevents me from dropping or altering either the function-based index or the table.
But I can drop any other index on the table that is maintained by PeopleTools.
I can't alter any trigger on the table except the PSU% trigger that PeopleTools maintains.
Tuesday, April 18, 2006
The SQL statement comes from a SQLExec in the DERIVEDRCSEARCH.SEARCH_BUTTON.FieldChange PeopleCode, but it contains the %FirstRows MetaSQL, which controls how the rows are fetched.
The %FirstRows MetaSQL introduces the /*+ FIRST_ROWS */ hint on Oracle. On Microsoft SQL Server it evaluates to TOP(n), causing the query to return only the first n rows of the data set. This is a change in behaviour introduced in PeopleTools 8.46, prior to which this macro evaluated to a blank string when PeopleSoft runs on an Oracle database.
The solution to my performance problem was simply to remove the hint. The following table shows timings in SQL*Plus for the statement with and without the hints.
|Hint||Execution Time (UAT system)||Execution Time (PRD system)|
|No hint, any FIRST_ROWS(n), or ALL_ROWS||5s||<1s|
This hint was deprecated in Oracle 9i, having been replaced by FIRST_ROWS(n). FIRST_ROWS uses a mix of cost and heuristics to find a best plan for fast delivery of the first few rows. Oracle's Performance Tuning and Planning manual states that "Using heuristics sometimes leads the CBO to generate a plan with a cost that is significantly larger than the cost of a plan without applying the heuristic. FIRST_ROWS is available for backward compatibility and plan stability".
The problem is that the FIRST_ROWS hint includes some rules that override the usual cost-based behaviour, including one that allows an index to be used to avoid a sort operation, no matter how expensive that access path may be. In most situations FIRST_ROWS is simply an inappropriate hint in Oracle 9i.
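The difference can be illustrated with a pair of hinted queries (the table and column names here are my own illustration, not taken from the application in question):

```sql
-- Deprecated FIRST_ROWS hint: heuristic rules apply, e.g. an index on the
-- ORDER BY columns may be used to avoid the sort, however costly the scan.
SELECT /*+ FIRST_ROWS */ *
FROM   ps_job
ORDER BY emplid, empl_rcd;

-- FIRST_ROWS(n): purely cost-based optimization for fast delivery of the
-- first n rows, with no heuristic overrides.
SELECT /*+ FIRST_ROWS(100) */ *
FROM   ps_job
ORDER BY emplid, empl_rcd;
```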
I have found two support cases (200991709 and 200769258) where this hint was causing a problem, and it is probably in response to these that there is now a workaround available in PeopleTools 8.46. A new parameter, OracleDisableFirstRowsHint, has been added to the Database Options section of both the Application Server configuration file (psappsrv.cfg) and the Process Scheduler configuration file (psprcs.cfg). This flag defaults to 0, and should be set to 1 in order to suppress this hint.
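In psappsrv.cfg (and similarly in psprcs.cfg) the setting would look something like this; the surrounding section is abbreviated:

```
[Database Options]
; Suppress the FIRST_ROWS hint generated by the %FirstRows metaSQL
OracleDisableFirstRowsHint=1
```

The application server (or process scheduler) must be restarted for the change to take effect.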
However, PeopleTools 8.46 is not certified on any release of Oracle prior to 188.8.131.52.0, and the FIRST_ROWS hint was deprecated in Oracle 9i when it was replaced by FIRST_ROWS(n). MetaSQLs exist to produce appropriate platform-specific code. It would be much better if this metaSQL generated a FIRST_ROWS(n) hint in the first place.
Wednesday, April 12, 2006
%CurrentDateIn MetaSQL prevents Oracle CBO from correctly evaluating selectivity of predicate (up to Oracle 9i)
Update 6.2.2009: The problem described in this posting is not an issue in Oracle 10g - see Changes in Calculation of Predicate Selectivity in Oracle 10g
I recently discovered the Oracle Cost Based Optimizer choosing a poor execution plan for a particular critical SQL statement in a CRM system, because the expansion of the %CurrentDateIn macro is excessively complicated. The problem occurs in a delivered view, PS_RBC_PACKAGE_VW, and a custom view, PS_XX_RBCPKCLTR_VW. Both views contain a pair of date conditions, which are coded in line with PeopleSoft standards.
On an Oracle RDBMS, this expands to
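The expansion is of roughly this form (quoted from memory, so treat the exact string as an assumption; the column names are also mine):

```sql
-- %CurrentDateIn wraps the current date in TO_DATE(TO_CHAR(...)) rather
-- than the simple TRUNC(SYSDATE)
WHERE  start_dt <= TO_DATE(TO_CHAR(SYSDATE, 'YYYY-MM-DD'), 'YYYY-MM-DD')
AND    end_dt   >= TO_DATE(TO_CHAR(SYSDATE, 'YYYY-MM-DD'), 'YYYY-MM-DD')
```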
As a result, this statement took over 15 seconds to execute. However, if the view is recoded as follows
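The recoding simply replaces the macro expansion with TRUNC(SYSDATE) (the column names are again hypothetical):

```sql
WHERE  start_dt <= TRUNC(SYSDATE)
AND    end_dt   >= TRUNC(SYSDATE)
```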
Then the execution time fell to less than 1 second.
To explain why, I shall use a very simple example that is easy to reproduce. In the following script, I have created a table with 1000 rows and a few columns. Column B is just padding so that the rows are not unrealistically small. Columns C, D and E contain dates. The data is evenly distributed in C and D. In E the distribution is deliberately uneven: there are more dates further in the past.
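The original script is not reproduced here, but a test setup along these lines matches the description. The population logic is my own reconstruction, so exact row counts may differ slightly from those quoted below:

```sql
CREATE TABLE t1
(a NUMBER
,b VARCHAR2(200)   -- padding so the rows are a realistic size
,c DATE            -- evenly distributed dates
,d DATE            -- evenly distributed, narrower range
,e DATE            -- skewed: more dates further in the past
);

INSERT INTO t1
SELECT ROWNUM
,      RPAD('x', 200, 'x')
,      TRUNC(SYSDATE) - MOD(ROWNUM, 250)
,      TRUNC(SYSDATE) - MOD(ROWNUM, 25)
,      TRUNC(SYSDATE) - TRUNC(SQRT(ROWNUM))
FROM   dual
CONNECT BY LEVEL <= 1000;

EXEC DBMS_STATS.GATHER_TABLE_STATS(user, 'T1')
```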
Let's look at the execution plans of a few simple SQL statements. Both of the following queries return 4 rows, but the calculated cardinality is very different. In the first statement I have used just the simple TRUNC(SYSDATE), and the optimizer has correctly worked out that the query will return 4 rows. However, in the second I have used the expansion of the %CurrentDateIn macro. Because the predicate contains a function, the optimizer falls back to a hard-coded guess that the selectivity of the condition is 5% of the table.
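The pair of queries can be illustrated as follows (t1 and column c are names of my own choosing):

```sql
-- Plain TRUNC(SYSDATE): the optimizer can use the column statistics
-- and estimates the cardinality correctly.
SELECT * FROM t1
WHERE  c = TRUNC(SYSDATE);

-- %CurrentDateIn expansion: the function in the predicate defeats the
-- selectivity calculation (up to 9i), so the optimizer guesses 5%,
-- i.e. a cardinality of 50 on a 1000-row table.
SELECT * FROM t1
WHERE  c = TO_DATE(TO_CHAR(SYSDATE, 'YYYY-MM-DD'), 'YYYY-MM-DD');
```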
If I repeat the queries on column D, the calculated cardinality of the TRUNC(SYSDATE) version goes up to 41 (the query in fact returns 42 rows, but that is close), while the calculated cardinality of the %CurrentDateIn version remains 50.
If I now try column E, the calculated cardinality of the TRUNC(SYSDATE) version is 115, although the queries actually return only 22 rows. The optimizer makes this mistake because the data values are not evenly distributed between the high and low values of the column.
So this test illustrates that the expansion of %CurrentDateIn prevents Oracle’s CBO from evaluating the selectivity correctly, and that this causes it to use a guess that is sometimes too high, and sometimes too low, but always incorrect. In some cases this will lead to an inappropriate execution plan. If TRUNC(SYSDATE) were used, the optimizer would have better information about the SQL and be able to make better decisions.
I say that this is a bug in PeopleTools. There is, of course, a simple workaround. Simply code TRUNC(SYSDATE) in the application as a customisation where necessary. However, there is no reason why the expansion of this metaSQL could not be changed, and delivered as a PeopleTools patch.