In day to day life, when one says I need to “lookup my planner” to make an appointment, unless they are a celebrity whose every second is bought and very closely managed, they refer to a handful of things from which they need to lookup. While there are some of us who may note the birthdays or anniversaries of our close friends and relatives, some of us “cache” this information in their memory for instant access.
Most ETL tools come with transformations to perform lookup. In a “perfect” dimensional model, the ETL process will perform lookups on dimensional tables to lookup dimensional data. This is the traditional use of a lookup. A further and advanced definition and use of lookups is in detecting what the “update strategy” of an incoming data should be. While these are very common uses of lookup transformations, they aren’t always ideal.
Most lookup transformations support database table/view/synonym lookup, or flat file lookups – don’t quite know if any do on XML files yet. For tables/files with small foot print, lookups are very performance friendly. Most lookup implementations provide facilities to cache data persistently or in a static way. On the face of this, caching sounds very useful to improve performance of the ETL process and is easily bought as a means to improve performance. But as is the case with most features used in-appropriately, here is where lies a potential mine-field of performance bottlenecks.
Purely because of how easy they are to use and also because of some developers preferring it against an extra join in SQL, lookups are used not just to perform on smallish sets of data but on tables that would generally not qualify as lookups. This often happens when the developer is running against tight deadlines or because he or she does not have enough appreciation of the overall application and its performance requirements.
While there is no sure shot number of records for a table to be lookup-able, a general rule of thumb is to use lookups on fairly static data (non-transactional) e.g. location of a store and resist against any transactional type of table.
Use Undraleu® to perform an Automated Code Review of your Informatica PowerCenter Code and get a best practices perspective on the health of your ETL code