Saturday, October 27, 2007

Record Locator Dialogue Behaviour

A little while ago I was asked whether Business Units need to be 5 characters for good performance. PeopleSoft used to recommend that SETIDs and Business Units should be defined as five characters.

I blithely responded that it was because if you put five characters into the search criteria of a five character field on a Record Locator Dialogue, then the resulting SQL would result in an equality condition rather than a LIKE.

If I search for a string shorter than the field, it generates a LIKE:

SELECT DISTINCT SETID ... WHERE SETID LIKE 'ABCD' || '%' ESCAPE '\' ...
ORDER BY SETID ...

If you search for a string equal in length to the field you get an equality (unless you specify something else in advanced search).

SELECT DISTINCT SETID ... WHERE SETID = 'ABCDE' ... ORDER BY SETID ...

That is how it was in PeopleTools 8.44, or at least that is what I wrote in my book (page 94).

But things change. I checking the behaviour in PeopleTools 8.48, and this is what I found:

Lets start with the Basic Search mode in the Record Locator Dialogue

The default operation in the basic search is 'begins with'. The search string is shorter than the field (11 characters in this case) and so we get a LIKE criteria, but there is a change here. The % has been put into the search string.

SELECT DISTINCT EMPLID, NAME, LAST_NAME_SRCH, SECOND_LAST_SRCH, NAME_AC
FROM PS_PERALL_SEC_SRCH
WHERE OPRID=:1 AND EMPLID LIKE '1%'
ORDER BY EMPLID
Bind-1 type=2 length=2 value=PS


However, if I fill the search field with 11 characters I still get a LIKE, and most significantly I no longer get an equality condition.

SELECT DISTINCT EMPLID, NAME, LAST_NAME_SRCH, SECOND_LAST_SRCH, NAME_AC
FROM PS_PERALL_SEC_SRCH
WHERE OPRID=:1 AND EMPLID LIKE '11111111111%'
ORDER BY EMPLID
Bind-1 type=2 length=2 value=PS


However, in Advanced Search, I get exactly what I ask for.
'Begins with' always results in a LIKE condition, irrespective of the length of the search string.

SELECT DISTINCT EMPLID, NAME, LAST_NAME_SRCH, SECOND_LAST_SRCH, NAME_AC
FROM PS_PERALL_SEC_SRCH
WHERE OPRID=:1 AND EMPLID LIKE '11111111111%'
ORDER BY EMPLID
Bind-1 type=2 length=2 value=PS


'Contains' always produces a LIKE condition, with % symbols at both ends, again irrespective of the length of the search string.

SELECT DISTINCT EMPLID, NAME, LAST_NAME_SRCH, SECOND_LAST_SRCH, NAME_AC
FROM PS_PERALL_SEC_SRCH
WHERE OPRID=:1 AND EMPLID LIKE '%11111111111%'
ORDER BY EMPLID
Bind-1 type=2 length=2 value=PS


If I specify '=', then I get an equality condition in the SQL, but the search criteria also becomes a bind variable in the SQL. This is different from all other cases, where the search criteria appears in the SQL as a literal value.

SELECT DISTINCT EMPLID, NAME, LAST_NAME_SRCH, SECOND_LAST_SRCH, NAME_AC
FROM PS_PERALL_SEC_SRCH
WHERE OPRID=:1 AND EMPLID=:2
ORDER BY EMPLID
Bind-1 type=2 length=2 value=PS
Bind-2 type=2 length=1 value=1


Only on a mixed case field, does PeopleSoft append the % in a separate string, because both the column and the search string are forced into upper case.

SELECT DISTINCT EMPLID, NAME, LAST_NAME_SRCH, SECOND_LAST_SRCH, NAME_AC
FROM PS_PERALL_SEC_SRCH
WHERE OPRID=:1
AND UPPER(NAME) LIKE UPPER('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa') || '%' ESCAPE '\'
ORDER BY NAME, EMPLID
Bind-1 type=2 length=2 value=PS


Implications

So does it matter that PeopleTools uses a LIKE instead of an equality? Sometimes it could, but not always. Here is a very simple test where I search for a single employee by the unique key columns, first with an equality
SELECT emplid FROM ps_personal_data WHERE emplid = 'SFCH00034'
/

--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 1 (0)| 00:00:01 |
|* 1 | INDEX UNIQUE SCAN| PS_PERSONAL_DATA | 1 | 8 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("EMPLID"='SFCH00034')

Statistics
----------------------------------------------------------
2 consistent gets

And here is the same search with a LIKE criteria
SELECT emplid FROM ps_personal_data WHERE emplid LIKE 'SFCH00034%'
/

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 2 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| PS_PERSONAL_DATA | 1 | 8 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("EMPLID" LIKE 'SFCH00034%')
filter("EMPLID" LIKE 'SFCH00034%')

Statistics
----------------------------------------------------------
3 consistent gets

In this case, although both queries return the same result, it made the difference between a index unique scan and a single fetch operation, and an index range scan and a second fetch, requiring another consistent read. Although, I can also produce tests which produce the same execution plan and number of consistent reads for = and LIKE.
The LIKE operation does cause an additional filter operation, but this is only a small CPU overhead.

However, this does mean that the advice about having 5 character SETIDs and Business Units is no longer valid.

Tuesday, October 02, 2007

Dirty Reads in PeopleTools 8.49

In PeopleTools 8.49 you can configure the application server to use Dirty Reads on SQL Server and DB2/OS390. Let's start with a quick review of database isolation levels. If you run PeopleSoft on an Oracle database, you hardly ever have to worry about isolation levels, because Oracle defaults to Read Committed, where all transactions are transactionally or read consistent, and that doesn't bring any scalability problems. (In writing this note, I have spent a fair amount of time reading various Oracle and SQL Server websites. It has been an excellent revision!) What is read consistency? Well, if I start a long-running query at say 09.00, and then I update some data at 09.01 and commit that update, and then at 09.02 the query comes to the data that was changed at 09.01, it will report the data as at 09.00. So the set of data that is retrieved by the long-running query is consistent with itself. On Oracle, this is done by keeping information to reverse the change, essentially the old data values, in the undo (or rollback) segments from which it can construct, in memory, a read consistent version of data blocks as the start of the transaction. So at 9.02, Oracle creates a copy of the block in the buffer cache and uses the undo information to roll it back to the state it was in when the query began at 9.00, and then it reads the data from that read consistent copy. Thus, Oracle is capable of maintaining read consistency without locking, so that writers never block readers, and readers never block writers. Oracle also supports two other isolation levels:
  • Serializable: queries cannot see changes committed after the transaction began, but before the current query began.
  • Read-Only: as Serializable but no INSERT, UPDATE or DELETE statements are permitted.
Oracle has not explicitly implemented Repeatable Read as an isolation level, but the same effect can be achieved if you use SELECT ... FOR UPDATE. However, the other database platforms that are certified for PeopleSoft have additional isolation levels that can be set at database and session-level (and hence in the application). In SQL Server the isolation modes are:
  • Read Uncommitted (also known as Dirty Read): Queries simply read whatever is in the data blocks, so it is possible to read changes in the database that have not been committed and that may potentially be rolled back!
  • Read Committed (SQL Server default): Uncommitted changes cannot be read, queries can be blocked by shared locks created by updates unless using Read Committed Snapshot, or other transaction in higher isolation levels.
  • Read Committed Snapshot (available SQL Server 2005): row versioning is used to produce a transactionally consistent snapshot of the data. This is effectively equivalent to Read Committed on Oracle.
  • Repeatable Read: Locks are placed on all data that is read.
  • Serializable: Locks are placed on all data that is read, and rows cannot be inserted ranges of data queried by other statements.
  • Snapshot: Changes made after the start of the transaction cannot be seen.
PeopleSoft running on Microsoft SQL Server 2000 and Sybase has always had scalability problems with high concurrency because read consistency is achieved by the use of locking. This is addressed in SQL Server 2005 with the READ_COMMITTED_SNAPSHOT option, but this version of the database is not certified below PeopleTools 8.47.
Dirty Reads in PeopleTools 8.49 
Dirty-read mode can be specified for the application server:
[PSQRYSRV]
;=========================================================================
; Settings for PSQRYSRV
;=========================================================================
...
; Use dirty-read(uncommitted read) for PSQRYSRV only on DB2/OS390 or SQL Server
Use dirty-read=0
It can also be set for scheduled queries run by Application Engine in the Process Scheduler:
[PSAESRV]
...
;-----------------------------------------------------------------------------------------
; Setting to control dirty read for Scheduled PSQueries.
ScheduledQuery-DirtyRead = 0
This is what Enterprise PeopleTools 8.49 PeopleBook: System and Server Administration says:

"Use Dirty-Read: Enter 1 to enable the application server to read uncommitted data from a table. Enter 0 to disable dirty reads.

This parameter can be used for general reporting and PeopleSoft Query. You can run dirty read queries through the application server, the Process Scheduler, and in a two-tier connection. The Use Dirty-Read setting in the application server configuration controls the behaviour of PSAPPSRV, PSQCKSRV, and PSQRYSRV.

Note. Dirty reads are not recommended if you are reading data and doing subsequent processing based on the disposition of the data at the time that it is read. Between the time the data is read by a subsequent process and the time the unit of work is completed by the first process, any activity affecting the table data at the time a subsequent process read could be rolled back, invalidating the accuracy of the data that a subsequent process read." However, the parameter only appears in the query server section (PSQRYSRV) of the application server configuration file (psappsrv.cfg), so I suspect that it only applies to PS/Query queries (and therefore Crystal and BI Publisher reports) and nVision reports. Although, I haven't yet found confirmation of that in PeopleBooks or other documentation. The delivered comment in the configuration file says that dirty-reads apply to DB2/OS390 and SQL Server, but the documentation doesn't mention which databases it applies to. Dirty-reads do not occur in Oracle, but this isolation level is also available on Sybase and Informix. 
Conclusion You can get around the locking problems in SQL Server 2000 and DB2, but at a price. Your reports may not show you entirely accurate data! But, if you are running PeopleTools 8.49 on SQL Server 2000, I would question why you are not running SQL Server 2005.

Monday, October 01, 2007

PeopleTools 8.48: Descending Indexes are back!

In Application Designer, you can mark a key field as descending. When that record is used in a scroll on a page, PeopleTools automatically orders the query generated to populate the scroll in key order, and that includes specifying descending order for descending keys. The JOB record in HR is a good example.
On entering the JOB component, the PIA generates the SQL to retrieve the job history for a particular employee. The two known keys are specified in the where clause, but all the keys were used to generate the ORDER BY clause. EFFDT and EFFSEQ are descending keys on the JOB record, so the query is in descending order on those columns.

SELECT EMPLID,...
FROM PS_JOB
WHERE EMPLID=:1
AND EMPL_RCD=:2
ORDER BY EMPLID, EMPL_RCD, EFFDT DESC, EFFSEQ DESC

The descending key attribute has another purpose. Prior to PeopleTools 8.14, if you specified a key field as descending in Application Designer, that column would also be indexed in descending order where it appeared in indexes. However, PeopleSoft removed this functionality because of Oracle Bug 869177 which in most versions of Oracle 8.1 could cause ORA-3113 when accessing the index.

Now that Oracle has fixed the bug in the database, PeopleTools once again builds indexes with descending columns on Oracle RDBMS from PeopleTools 8.48.

However, as always, there is a snag (or I wouldn't have bothered to write this note). The name of the descending column is not reported in catalogue views. Instead the column name is replaced with a system generated column name.

SELECT table_name, index_name, column_position
,      column_name, descend
FROM   dba_ind_columns
WHERE  index_name = 'PS_JOB'

TABLE_NAME INDEX_NAME COLUMN_POSITION COLUMN_NAME  DESCEND
---------- ---------- --------------- ------------ ------------
PS_JOB     PS_JOB                   1 EMPLID       ASC
PS_JOB     PS_JOB                   2 EMPL_RCD     ASC
PS_JOB     PS_JOB                   3 SYS_NC00163$ DESC
PS_JOB     PS_JOB                   4 SYS_NC00164$ DESC

This could cause problems with SQL scripts that use the catalogue views. For example, I have had to change my DDL Trigger to protect database objects not managed by Application Designer.

So how do you find the name of the underlying column? You have to look in the sys owned catalogue tables. If you look at the query in $ORACLE_HOME/rdbms/admin/catalog.sql at the code behind the view, you find that the column names are stored on sys.col$. There is another column on that table called DEFAULT$, which stores the original column name.

COLUMN property        FORMAT 999999
COLUMN table_owner     FORMAT a8  HEADING 'Table|Owner'
COLUMN default$        FORMAT a8
COLUMN column_name     FORMAT a12 HEADING 'Column|Name'
COLUMN index_name      FORMAT a6  HEADING 'Index|Name'
COLUMN table_name      FORMAT a6  HEADING 'Table|Name'
COLUMN column_position FORMAT 99  HEADING 'Column|Position'
COLUMN column_length   FORMAT 999 HEADING 'Column|Length'
COLUMN char_length     FORMAT 999 HEADING 'Char|Length'
COLUMN descend         FORMAT a7

select /*i.obj#
,      io.name   index_owner
,      */idx.name  index_name
,      bo.name   table_owner
,      base.name table_name
,      decode(bitand(c.property, 1024), 1024,
(select decode(bitand(tc.property, 1), 1, ac.name, tc.name)
 from   sys.col$ tc, attrcol$ ac
 where  tc.intcol# = c.intcol#-1
   and  tc.obj# = c.obj#
   and  tc.obj# = ac.obj#(+
   and  tc.intcol# = ac.intcol#(+)),
decode(ac.name, null, c.name, ac.name)) column_name
,      c.property, c.default$
,      ic.pos#  column_position
,      c.length column_length
,      c.spare3 char_length
,      decode(bitand(c.property, 131072), 131072, 'DESC', 'ASC') descend
from sys.col$ c, sys.obj$ idx, sys.obj$ base, sys.icol$ ic,
sys.user$ io, sys.user$ bo, sys.ind$ i, sys.attrcol$ ac
where ic.bo# = c.obj#
and decode(bitand(i.property,1024),0,ic.intcol#,ic.spare2) = c.intcol#
and ic.bo# = base.obj#
and io.user# = idx.owner#
and bo.user# = base.owner#
and ic.obj# = idx.obj#
and idx.obj# = i.obj#
and i.type# in (1, 2, 3, 4, 6, 7, 9)
and c.obj# = ac.obj#(+)
and c.intcol# = ac.intcol#(+)
and io.name = 'SYSADM'
and idx.name = 'PS_JOB'
and bo.name = 'SYSADM'
and base.name = 'PS_JOB'
/

Index  Table    Table  Column                           Column Column   Char
Name   Owner    Name   Name         PROPERTY DEFAULT$ Position Length Length DESCEND
------ -------- ------ ------------ -------- -------- -------- ------ ------ -------
PS_JOB SYSADM   PS_JOB EMPLID              0                 1     11     11 ASC
PS_JOB SYSADM   PS_JOB EMPL_RCD            0                 2     22      0 ASC
PS_JOB SYSADM   PS_JOB SYS_NC00163$   196904 "EFFDT"         3     12      0 DESC
PS_JOB SYSADM   PS_JOB SYS_NC00164$   196904 "EFFSEQ"        4     34      0 DESC

My thanks to Noons for pointing out that you can get the underlying column or expression from DBA_IND_EXPRESSIONS.

Using v$session_longops to find long running SQL statements

v$session_longops is a dynamic performance view in which Oracle reports on the status of operations that run for more than 6 (elapsed) seconds. Various operations, including queries are automatically reported in this view. Applications can also mark operations as long running with DBMS_APPLICATION_INFO.SET_SESSION_LONGOPS.

I recently spent a day on a site (not a PeopleSoft system) watching a single update statement that eventually ran for 27 hours. I used this view to predict when the SQL statement would complete.

You can see incomplete long running operations by querying long operations where the work done so far is less than the total amount of work.

SELECT * FROM v$session_longops WHERE sofar < totalwork;

SID SERIAL# OPNAME
---------- ---------- -----------------------------------------------------------
TARGET
----------------------------------------------------------------
TARGET_DESC SOFAR TOTALWORK UNITS
-------------------------------- ---------- ---------- --------------------------
START_TIME LAST_UPDATE_TIME TIME_REMAINING ELAPSED_SECONDS CONTEXT
------------------- ------------------- -------------- --------------- ----------
MESSAGE
---------------------------------------------------------------------------------
USERNAME SQL_ADDRESS SQL_HASH_VALUE QCSID
------------------------------ ---------------- -------------- ----------
24 12722 Table Scan
R2LLMS.TBL_TMX_REGTRANSACTION
14407 24935 Blocks
01:25:40 28/09/2007 17:35:44 28/09/2007 42533 58204 0
Table Scan: R2LLMS.TBL_TMX_REGTRANSACTION: 14407 out of 24935 Blocks done
R2LLMS C00000012FE2A170 861482451 0

The operations remain in this view long after they have completed, so you can use it to get a feel for what long operations occur on a sytem. But there are a fixed number of slots for v$session_longops, determined by ksulomux_. The entry for a completed operation remains in the view until the slots is reused. They are not reused in order of allocation, instead the array element to be reused is "probably determined by a MOD function" (My thanks to Tanel Poder for his clarification). So v$session_longops contains both current incomplete long operations and an arbitrary sample of completed long operations.

You can use the SQL_ADDRESS and SQL_HASH_VALUE to get the SQL statement from the library cache.

SELECT l.sid, l.sofar, l.totalwork, l.start_time, l.last_update_time, s.sql_text
FROM v$session_longops l
LEFT OUTER JOIN v$sql s ON s.hash_value = l.sql_hash_value
AND s.address = l.sql_address
AND s.child_number = 0

SID SOFAR TOTALWORK START_TIME LAST_UPDATE_TIME
------ ---------- ---------- ------------------- -------------------
SQL_TEXT
--------------------------------------------------------------------
90 19014 26171 17:48:25 28/09/2007 17:48:37 28/09/2007
SELECT DISTINCT (PROCESSID) FROM PSAPMSGPUBHDR WHERE MACHINENAME=:1
AND PUBSTATUS IN (:"SYS_B_0", :"SYS_B_1") ORDER BY PROCESSID

Aggregate total amount of time spent on SQL statements. But there is a problem, SQL statements can produce multiple long operations that may or may not occur concurrently.

SELECT  l.sql_hash_value
, SUM(l.last_update_time-l.start_time)*86400 secs
, COUNT(*) execs
FROM v$session_longops l
GROUP BY l.sql_hash_value
ORDER BY secs

But what if the statement isn’t in the library cache we can try and get it out of statspack! This SQL function looks up the SQL statement by hash value and assembles the pieces up to 4000 characters

CREATE OR REPLACE FUNCTION perfstat.get_sqltext(p_hash_value NUMBER)
RETURN VARCHAR2 IS
l_sql_text VARCHAR2(32767) := '';
l_sql_left NUMBER := 4000;
BEGIN
FOR i IN(SELECT * FROM perfstat.stats$sqltext
WHERE hash_value = p_hash_value ORDER BY piece
) LOOP
IF l_sql_left > 64 THEN
l_sql_text := l_sql_text || i.sql_text;
ELSIF l_sql_left > 0 THEN
l_sql_text := l_sql_text || SUBSTR(i.sql_text,1,l_sql_left);
END IF;
l_sql_left := l_sql_left - LENGTH(i.sql_text);
END LOOP;
RETURN l_sql_text;
END get_sqltext;
/
show errors
GRANT EXECUTE ON perfstat.get_sqltext TO PUBLIC;

This version gets the SQL statement from library cache, and if it isn't there it looks it up from the statspack data.

spool longops
SELECT l.*, NVL(s.sql_text
, perfstat.get_sqltext(l.sql_hash_value)) sql_text
FROM (
SELECT l.target, l.operation, l.sql_hash_value
, SUM(secs) secs, SUM(execs) execs
FROM (
SELECT l.sid, l.serial#, l.sql_address, l.sql_hash_value
, l.target, l.operation
, MAX(l.last_update_time-l.start_time)*86400 secs
, COUNT(*) execs
, SUM(totalwork) totalwork
FROM (
SELECT l.*
, SUBSTR(l.message,1,instr(l.message,':',1,1)-1) operation
FROM v$session_longops l) l
GROUP BY l.sid, l.serial#, l.sql_address
, l.sql_hash_value, l.target, l.operation
) l
GROUP BY l.target, l.operation, l.sql_hash_value
) l
LEFT OUTER JOIN v$sql s ON s.hash_value = l.sql_hash_value
--AND s.address = l.sql_address
AND s.child_number = 0
ORDER BY secs desc
/
spool off

This is some sample output. This particular statement has been submitted by the publication servers 25 times with a cumulative execution time of 878 seconds.

TARGET
----------------------------------------------------------------
OPERATION
------------------------------------------------------------------
SQL_HASH_VALUE SECS EXECS
-------------- ---------- ----------
SQL_TEXT
------------------------------------------------------------------
SYSADM.PSAPMSGPUBCON
Table Scan
1297842315 878 25
SELECT :"SYS_B_0", MIN(B.MSGNAME), MIN(A.PUBCONSTATUS), COUNT(*)
FROM PSAPMSGPUBCON A, PSAPMSGPUBHDR B WHERE A.PUBID = B.PUBID AND
A.PUBNODE = B.PUBNODE AND A.CHNLNAME = B.CHNLNAME AND A.CHNLNAME
IN (SELECT S.CHNLNAME FROM PS_AMM_CHNL_SECVW S WHERE S.OPRID =
:"SYS_B_1") AND A.LASTUPDDTTM >= TO_DATE(SUBSTR(:"SYS_B_2",
:"SYS_B_3", :"SYS_B_4"), :"SYS_B_5") GROUP BY B.MSGNAME,
A.PUBCONSTATUS ORDER BY MIN(B.MSGNAME), MIN(A.PUBCONSTATUS)