This post draws together all the recommendations for managing ad hoc user queries in PeopleSoft PS/Query. Some points relate to the database, some to the configuration of the middleware (Application servers and process schedulers), and some are human issues rather than technical ones.
Different timeouts are set in different places, but some of these timeouts need to align.
Long-running queries mostly consume CPU on the database, and in the cloud, most of what you pay for is CPU. Nobody would want to allow an unconstrained query load to lead to an increase in the cloud subscription requirements!
Recommendations
- Do not increase the PIA inactivity timeout or PS/Query timeouts. They are both delivered set at 20 minutes. They are primarily there to protect users and the system as a whole.
- If you have already increased them, you may need to bring them down in stages, allowing users to get used to the new limits and handle exceptional queries.
- If you permit users to use PS/Query, then the PIA inactivity timeout should be the same ICQuery Tuxedo service timeout set on the PSQRYSRV server (default 20 minutes)
- If you do not use online PS/Query, then the PIA timeout can be reduced as low as the PSAPPSRV timeout (default 5 minutes).
- It is not possible to regulate when users run queries in the PIA. Users, quite legitimately, also need to run queries that may take longer than the timeouts. Encourage users to schedule such queries on the process scheduler.
- You can restrict the number of online queries that execute concurrently on the database by restricting the maximum number of PSQRYSRV processes. There is still nothing to prevent users from starting queries which then just queue up on the application server. Users cannot distinguish between a query that is queuing in the application server or executing on the database. The service timeout starts from when the service request is queued in the application server.
- However, you still don’t want even scheduled queries to run for hours on end. Set a maximum processing time in the process definition for the PSQUERY application engine. Say 4 hours. The process scheduler will terminate the PSQUERY process after this amount of time.
- A common cause of long-running PS/Queries is a mistake in the query definition, resulting in missing or incorrect joins in the SQL.
- A SQL query can continue to run on the database server after the client-side process, the PSQRYSRV server process or the PSQUERY application engine process, has terminated, at least until the current fetch returns. A common cause is that a large sort or hash operation must complete before the first fetch returns.
- Enable dead connect detection by setting SQLNET.EXPIRE_TIME in SQLNET.ORA. The idle Oracle shadow process will periodically poll the client process to check that it is still responsive. If a PSQUERY process has been terminated by the process scheduler, this mechanism should also terminate the session if it is still running on the database.
- Use the database resource manager to allocate and manage CPU allocation and limit run time.
- The resource manager only intervenes when the system runs out of free CPU.
- Define a hierarchy of resource manager consumer groups. Map database sessions to consumer groups using session instrumentation. In general, online processing is higher priority than scheduled processes, which are higher priority than ad-hoc queries.
- Set a maximum CPU/runtime limit for scheduled queries. Using different consumer groups for online and scheduled batch queries.
- For scheduled queries, match the consumer group timeout to the maximum runtime for the PSQUERY application engine.
- For online queries, match the consumer group timeout to the PIA inactivity timeout (which should also be the same as the ICQuery Tuxedo service timeout).
- See A Resource Manager CPU Time Limit for PS/Queries Executed Online in the PIA
- NB: Resource Manager is an Oracle Enterprise Edition Feature.
- On Exadata, use SQL Quarantine to identify queries with a track record of exceeding the consumer group timeout.
- Enable SQL Quarantine for scheduled queries only using a trigger on PSPRQSRQST – see Github: set_prcs_sess_parm_trg.sql
- Quarantined queries are immediately terminated with error ORA-00040: active time limit exceeded - call aborted. For scheduled queries, this error can be found in the process scheduler message log.
- Enable SQL Quarantine for SQL Developer, etc, using an on-logon trigger – see Github: set_sess_parm_trg.sql
- Add run control logging for scheduled PS/Queries
- Whenever a scheduled PS/Query, run via PSQUERY, terminates other than successfully, the SQL query can be found in the message log. However, bind variable values are not recorded.
- Bind variable values may be changed between executions of the same query using the same run control by the same user. Enable run control logging so that it is possible to reconstruct the query and bind variables run by the user.
- There is nothing to stop the user from altering the PS/Query between executions, in which case you may need to get the SQL of a long-running query that completes successfully from AWR.
- It is not possible to allocate PS/Queries to a different temporary tablespace because that is determined by the database user ID. Everything in PeopleSoft runs as the Owner ID (usually SYSADM). Unless you are faking a reporting database with a second schema pointed to by a second entry in PS.PSDBOWNER.
No comments :
Post a Comment