Thursday, June 26, 2008

Discrepancy in Timings between Application Engine and Oracle SQL Trace

I would like to draw attention to the risk of uncritically accepting performance information from either a tool or an instrumented program. It is always necessary to consider whether you can believe what you are being told, or whether something is lying to you.

In this case, I want to highlight in the Application Engine Timings report. Take the following example from Time and Labor. The AE Timings report claims that the process spends 91.5% of the execution time in SQL. This would lead you to believe that this process is affected by the performance of the SQL, and that you need to do some SQL performance tuning.

---------------------------------------------------------------------------------
Total run time : 1411.6
Total time in application SQL : 1291.7 Percent time in application SQL : 91.5%
Total time in PeopleCode : 119.4 Percent time in PeopleCode : 8.5%
Total time in cache : 0.1 Number of calls to cache : 3
---------------------------------------------------------------------------------

However, all is not quite as it appears. Faced with a potential SQL problem, I usually turn to Oracle SQL Trace. The following is taken from a TKPROF analysis of the trace file for the same process.

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 153883 1.80 2.41 0 4 0 0
Execute 192806 429.94 464.96 8398 276000 588543 123140
Fetch 52740 2.75 3.55 590 202419 0 15265
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 399429 434.49 470.93 8988 478423 588543 138405

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 196521 0.00 0.13
SQL*Net message from client 196521 0.32 849.98
...
SQL*Net more data to client 114884 0.00 1.91
SQL*Net more data from client 250825 0.05 1.88
...

The database reports 849.98s for the idle event SQL*Net message from client, and only 470.93s on non-idle activity. So only 36% of the total elapsed time is spent in the database, the other 64% is time that the Application Engine process is busy executing code.

Why the discrepancy?

This is certainly a very chatty process. There are nearly 200,000 round-trip SQL*net messages between AE and Oracle. It could be a network latency problem. However, very little time is spent on the SQL*Net more data events so it is unlikely that there is a significant network component.

Let's look at the statements that have the longest times in AE timing report .

                         PeopleSoft Application Engine Timings
(All timings in seconds)

C o m p i l e E x e c u t e F e t c h Total
SQL Statement Count Time Count Time Count Time Time
------------------------------ ------- -------- ------- -------- ------- -------- --------
PeopleCode

DELETE PS_SCH_ADHOC_DTL 36540 60.6 0 0.0 60.6
INSERT PS_SCH_ADHOC_DTL 71887 885.6 0 0.0 885.6
Record.SelectByKey SCH_ADHOC_DTL 36596 118.0 36596 0.0 118.0
--------
1291.7

In less than 24 minutes, some statement have been executed very frequently. So, there were lots of SQL*Net messages, because there were lots of SQL statements. These SQL statements are submitted from PeopleCode. It is very likely we are in some sort of procedural loop, and therein lies the problem.

PeopleSoft programs can only measure the time taken to execute SQL from the perspective of the client program. In this case the timings are calculated by instrumentation inside Application Engine and PeopleCode. Meanwhile, the timings in the Oracle trace are obtained from instrumentation within the Oracle kernel.

There is still a fair amount of code in PeopleSoft that executes during the time counted as execution of SQL Statements. This is not just the SQL*Net libraries. I think that the timings include most of the time it takes for sqlexec() and Row set functions to execute. The discrepancy with the Oracle time mounts with every SQL operation.

In this case, there is very little to be gained by tuning the SQL. We can't do much about the PeopleCode because it is delivered vanilla functionality. All we can do is to bring more CPU to bear upon the problem by running multiple concurrent instances of the process.

2 comments :

Unknown said...

You can use Java object as well to do file processing. I saw good tips at www.itwisesolutions.com/PsftTips.html website. Here is the snippet, how we can use Java

Local JavaObject &javaFile;&javaFile = CreateJavaObject("java.io.File", "c:\temp\mytextfile.txt");
&javaFile.delete();

-larry

d said...

interesting article