Wednesday, October 28, 2009

Database Links and PeopleSoft

I have seen Oracle database links used in conjunction with PeopleSoft at a number of customer sites over the years. In some scenarios it is reasonable to use a database link; in others it is not.
In Oracle RDBMS, a database link (see also the Oracle Concepts Manual) specifies how a database server can create a session on another database in order to perform a remote operation. The remote database may be another Oracle database, or another type of database. The remote operation may be a query, or it may modify remote data (DML operation), in which case a two-phase commit mechanism ensures transaction integrity across the databases.
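As a minimal sketch of the mechanism, the DDL below creates a link and uses it for a remote query and for remote DML. The link name fin_link, the credentials, and the TNS alias FINDB are all hypothetical; the @link suffix is the standard Oracle syntax for directing a table reference to the remote database.

```sql
-- Hypothetical link from the HR database to the Financials database.
CREATE DATABASE LINK fin_link
  CONNECT TO fin_user IDENTIFIED BY fin_password
  USING 'FINDB';  -- TNS alias for the remote database

-- Remote query: the @fin_link suffix sends this reference to FINDB.
SELECT business_unit, ledger
FROM   ps_ledger@fin_link
WHERE  fiscal_year = 2009;

-- Remote DML is protected by the two-phase commit mechanism.
INSERT INTO ps_hr_acctg_line@fin_link
SELECT * FROM ps_hr_acctg_line;
COMMIT;
```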

PeopleSoft does not use database links between databases because they are a database-platform-specific technology. Instead, the PeopleSoft methodology is to replicate data between databases with an application message. This works well with small data sets, and with larger data sets that change slowly. However, there are scenarios with very large data volumes where Application Messaging will struggle to keep up, and Oracle RDBMS-specific replication technologies are an appropriate alternative.

Data can be replicated between databases with Materialised Views (formerly known as Snapshots). This is a SQL-based technology. Materialised View Logs on the source database track changes in the source table. An incremental or full refresh process is run on the target, usually as a scheduled job.
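A minimal sketch of this arrangement, assuming a hypothetical link hr_link from the target back to the source, and an hourly incremental refresh:

```sql
-- On the source (HR) database: the log tracks changes to the base table.
CREATE MATERIALIZED VIEW LOG ON ps_personal_data
  WITH PRIMARY KEY;

-- On the target (Financials) database: a fast (incremental) refresh
-- is scheduled every hour via the link.
CREATE MATERIALIZED VIEW ps_personal_data
  REFRESH FAST
  START WITH SYSDATE NEXT SYSDATE + 1/24
  AS SELECT * FROM ps_personal_data@hr_link;
```

Note that with this pattern only the scheduled refresh job touches the database link; local processes read the Materialised View as an ordinary table.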

Oracle introduced an alternative replication technology in Oracle 9i called ‘Streams’. This uses supplementary redo logging on the source database which can then be applied to the target. Additional database server processes transmit, receive and apply this redo stream. This technology is aimed at large-volume replication for data warehouses. I know of one site where several hundred database tables are replicated from PeopleSoft HR to a data warehouse.
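A full Streams configuration involves capture, propagation and apply processes, but the shape of the source-side setup is roughly as follows. The streams_name and queue_name are hypothetical; this is only a sketch of one table rule, not a working configuration.

```sql
-- Supplemental redo logging must be enabled on the source database.
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

-- Register a capture rule for one table with the Streams admin package.
BEGIN
  DBMS_STREAMS_ADM.ADD_TABLE_RULES(
    table_name   => 'SYSADM.PS_JOB',        -- source table to capture
    streams_type => 'capture',
    streams_name => 'hr_capture',           -- hypothetical capture process
    queue_name   => 'strmadmin.hr_queue');  -- hypothetical staging queue
END;
/
```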

I think the clearest way to explain the good and bad use of links is by example.

At two different sites with HR and Finance, I have seen similar examples of both good and bad use of database links (one was on PeopleTools 7.0, before Application Messaging was available).
  • GL data needs to be sent from Payroll in the HR database to GL in the Financials database. The PeopleSoft-delivered mechanism was to generate and then reload an interface file. This customer replaced the interface table in HR with a view that referenced a table of the same name and structure on the Financials system via a link. The payroll GL extract process then inserted directly into the table on Financials. I think this is a perfectly reasonable use of a database link. It was simple. It performed better than the interface file, and would certainly have been faster than application messaging. 
  • However, the same customer also used the employee expenses module in Financials. This required employee personal data. So they made the PS_PERSONAL_DATA table in Financials a view of the corresponding table in HR. The result was that every time somebody opened the expenses component in Financials, the application server process referenced the database link to HR. There is a limit on the maximum number of database links that a session can concurrently open (OPEN_LINKS) and that the whole database instance can concurrently open (OPEN_LINKS_PER_INSTANCE). Both default to 4, which gives an indication of how Oracle expects you to use this feature. This system ran out of database links. No Oracle errors were generated; instead, sessions waited to be allowed to make the connection to the remote database. There are specific database link wait events that report the amount of time lost. Eventually you reach the point where Tuxedo services time out, and then users receive errors. I think this is an example of the wrong way to use a database link. Instead, I think that the data should have been replicated from HR to Financials. In a modern PeopleSoft system this should be done with an application message (on the 7.0 system a Materialised View could have been used). 
  • Not only were database links used to provide data to components, but the same views, that referenced remote objects, were used in queries and reports, resulting in complex distributed queries. When multiple remote objects are referenced in a single SQL statement, I have seen Oracle decide to copy the whole of one or more of these objects to the local temporary tablespace. This does not result in good performance. Remember also that even query-only operations via a database link create a transaction on the remote database that remains open until a commit or rollback is issued.
  • I worked on another site where Materialised Views were used to incrementally replicate data from CRM and HR to EPM databases. Processes in the EPM database then referenced the Materialised Views. In this system, there was never any question of EPM processes referencing objects via the links. The data volumes were such that Application Messaging would never have coped. In this case only the Materialised View refresh process references the database links, and so the DBA can manage the usage of links. This is a system where (after upgrade to Oracle 10g) Streams could also have been used to replicate the data.
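When diagnosing the kind of link exhaustion described above, the limits and the time lost can both be inspected from the data dictionary. A minimal sketch, assuming access to the V$ views:

```sql
-- The initialisation parameters limiting concurrent open links
-- (both default to 4).
SELECT name, value
FROM   v$parameter
WHERE  name IN ('open_links', 'open_links_per_instance');

-- Time lost on database links appears under dedicated wait events,
-- e.g. 'SQL*Net message from dblink'.
SELECT event, total_waits, time_waited
FROM   v$system_event
WHERE  event LIKE '%dblink%';
```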

My view is that database links should not be used to directly support functionality in either on-line components or reporting. When this is done in reports and queries, it presents the problem of tuning distributed queries, and of having to decide whether the local or the remote database should drive the query. Instead, data should be replicated to the local database, preferably by PeopleSoft messaging, but if necessary by Oracle replication. However, I think that it can be reasonable to use a database link in an occasional batch process that moves data between systems.


Anonymous said...

I think you need to consider the cost of development and maintenance if you're going to recommend building messages everywhere in order to support ad hoc queries. You can end up having a ton of integration points that need to be supported for management reporting... I don't think that is very realistic... If it's an online page that is heavily used I completely agree that the data needs to be in the local db...

Unknown said...

I would like to ask for a small clarification. The Oracle Grid Monitor throws up some of the queries run internally by PeopleSoft as bad queries. Is there any way where we can go and edit these queries to improve the performance?

David Kurtz said...

It depends. Some queries come from within the application and can be changed via the PeopleSoft Application Designer, some come from the ad-hoc PS/Query tool and can be changed on-line, and some queries are coded deep within PeopleTools, and you cannot change them.

You need to work out where the query came from. It is a subject that I addressed in the book, but this presentation on my website, Fingerprints, might give you some help working out where these problem queries are coming from.

Unknown said...

Thanks a tonne. The presentation really gave me some useful insights. But unfortunately for me I found that the queries are coming from deep within PeopleTools. These queries usually don't use bind variables, and my DBAs claim that this is the reason why the query is slow. Since there is no way for me to edit the query I am pretty much helpless now :(.

David Kurtz said...

I'd be interested to see what is causing you problems. But let's move this off the blog. Can you either mail me direct, or post to PeopleSoft DBA forum (

Anonymous said...

Hi David,

Instead of using EIP integration points for our HR and CRM applications, we are planning to use the DB2 views for full data load as well as incremental load. Seeing the volume of our organisational data, we are thinking that a full data load using EIP can cause issues around performance and service availability, but at the same time, is it possible to use the EIP business logic with DB2 views to get data from HCM tables on a real-time basis for our CRM instance?


Tyler van Vierzen said...

Thanks for this concise set of best practices, we follow the same guidelines but still repeatedly have the "why can't we just have a dblink?" conversation (argument). It's helpful to have something to point to. We do exactly as you say -- try our best to keep it at the application level, and if that's not possible (due to frequent updates on a large data set, for example), we will replicate it around via oracle goldengate. Preferably to a central data store so that everyone can take advantage of the co-located data, rather than just from PSFT database to PSFT database. We reserve dblinks for non-production / troubleshooting when at all possible. They're just too hard to support (performance, etc) and they multiply like rabbits if you open the door to them in Production. They're too "easy" to implement, but they are costly down the road in terms of support and tuning.