Earlier this week I had to write an agent to read a CSV file and create Notes documents based on the content. Nothing difficult, this is something most of us do all the time. However, each entry in the CSV file had a producer code. I needed to do a lookup into the database and get the matching producer (also called agency) document and read a particular field from it. This field had to be stored in the new document I was creating, to tie the new entries to an existing producer/agency.
There were 3686 producers in the database, and 47731 entries in the CSV file. In other words, there were several entries per producer.
One solution is to load all agency documents into a list, then perform the lookup against the list using IsElement(). Since the number of producers are much fewer than the number of entries in the CSV file, I figured that I would save some substantial time vs. performing separate lookups for each entry.
I wrote three different version of the importer, as identical as possible.
The first version performed a view lookup for each entry in the CSV file to get the producer. It finished in 520 seconds.
The second version used a NotesViewEntryCollection to loop through all producers and load them into a list, then processed the CSV and used the list to get the producer documents. This one finished in 257 seconds.
The third version is using a combination. I do not pre-load the list with producer documents, instead I use IsElement() on each entry in the CSV file to check if the list already contains an element for the producer. If not, I perform the lookup and store the document into the list, to use next time the same producer is needed. This agent finished in 260 seconds.
The third version is the approach I have been using in the past. The benefit with that approach is that I only load the producers where I actually have data. Suppose only 1000 different producer codes are listed in the CSV file, I will just perform 1000 lookups, instead of 3686.
I use TeamStudio Profiler to time and profile my code, and I noticed something interesting. When the document was loaded into the list in program two and three, the time it took to read the value and store it into the new document was much shorter than when I had just performed the lookup, even if the code was identical. You can see it in the screenshots below. I hope André Guirard or someone else that knows more about the internals of Lotusscript can explain this. See update at the end of this article!
Version two – Preloading all producers before import (257 seconds to execute):
Version three – Lookup once, store in list for later use (260 seconds to execute):
Update: I just talked to André Guirard about this. I isolated the slow code to doc.GetItemValue(), and André explained that the slowness I see is because of how Notes documents are handled. The note is not "cracked open" until an item is needed. So when I store the note in the list (example 2 and 3), it is opened once (which is slow), and then the following times I access the same document it is very fast, as it already is open. In the code with individual lookups (example 1), the document has to be "cracked open" over and over again, as the object is recreated for every entry.