Idempotency – a key to better code
Recently I found a term that intrigued me: Idempotency.
I found this definition on the web and liked it: In computing, an idempotent operation is one that has no additional effect if it is called more than once with the same input parameters.
Much of my computing experience has been dedicated to the transformation of data. I have written countless routines that transform data in one way or another, from constructing schedule applications to assigning inventory to orders. I support many applications that manage data in various ways, with various syntaxes and data structures. Invariably, the sections of code that keep me up at night and cause me the most angst in support are those that were developed without this concept of Idempotency incorporated into their design. When you write a routine that operates on a data set and do not consider what happens if that routine runs again, it will eventually cause you grief in a support call.
I learned long ago that operations of this sort must be designed so that they do not accumulate their effect on the data. They must be ‘smart’ enough to not cause errors when they are run again, because in complex systems the timing of when the routines run may not be under the control of the programmer (think jobs or a multi-user UI), and even if it is, the programmer may end up calling the routine again for other reasons. If the routine does not adhere to the concept of Idempotency, it can be very tricky to understand how the data got into the state it is in when the user calls. Often my most difficult troubleshooting issues are these types of problems. So I read with keen interest about this concept, which is defined well enough to help me keep it in the forefront when designing new applications.
Some examples where Idempotency is critical are: netting inventory, re-scheduling activities, parsing address data into fields, and in some cases, adding records to a record set. In all these examples, the code needs to be aware of whether the transformation operation was already performed.
Adding Records to a data set: Let’s say you are accumulating records in a data set from various data sources, like names from each department’s employee databases. If you have already appended the finance department’s data to the master table, then appending it again will cause duplicates. Obviously there are many techniques to prevent duplicates in a database, but let’s explore how Idempotency can help. If the appending routine is designed with Idempotency in mind, it can be run at any time, and as many times as you like, without adverse effects (like duplicates). To incorporate this into the append routine, ensure your data set has a text field to hold the name of the source of the data. I usually put in the name of the action query or stored procedure that creates the records. Then the first part of the routine can query the data set to see if this action has been run previously, and if so, either terminate or remove those records before executing the append of new records. In this way, running the routine multiple times for the finance department will simply replace the finance department’s names in the master table.
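The "remove, then append" variant described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the master table, its name and source columns, the source label append_finance, and the sample names are all hypothetical, and an in-memory SQLite database stands in for whatever database actually holds the data.

```python
import sqlite3

def append_from_source(conn, source, names):
    """Replace every row tagged with `source`, then append `names`.

    `source` plays the role of the text field that records which
    action query or stored procedure created the rows.
    """
    with conn:  # one transaction: the delete and the inserts commit together
        conn.execute("DELETE FROM master WHERE source = ?", (source,))
        conn.executemany(
            "INSERT INTO master (name, source) VALUES (?, ?)",
            [(name, source) for name in names],
        )

# Hypothetical master table and finance data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE master (name TEXT, source TEXT)")

finance_names = ["Ada", "Grace"]
append_from_source(conn, "append_finance", finance_names)
append_from_source(conn, "append_finance", finance_names)  # safe to repeat

count = conn.execute("SELECT COUNT(*) FROM master").fetchone()[0]
```

Running the append twice leaves two rows rather than four, because the second run first removes what the first run added. Wrapping both steps in one transaction is a design choice in this sketch, so that a failure between the delete and the insert does not leave the table without the finance rows.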
Netting Inventory: When dealing with inventory, I typically read the value from the system of record, and netting happens there. However, let’s say you need to carry a book inventory value in your local system, and net that inventory as the user enters adjustments to it every day. The netting logic can be complex. It begins with the starting inventory, and adjustments are accumulated and applied to become a new starting inventory value. If the adjustments are applied to the starting inventory more than once, the value will drift away from reality, making it unusable. To prevent this, and apply the concept of Idempotency, I carry three inventory fields: Inventory (both the starting inventory and the resulting adjusted inventory), Original Inventory, and Adjustments to Inventory. When the Adjustment field changes via the UI, I replace the Original Inventory field with the contents of the Inventory field. After this I can apply the transformation (repeatedly) to the entire data set to calculate Inventory = Original + Adjustment. Additionally, I time-stamp the record when the transformation is applied, and when the Adjustment is entered. The UI can compare the Adjustment time-stamp to the Transformation time-stamp to see how to treat a new Adjustment, either as a replacement or as an accumulation. If the Adjustment time-stamp is later than the Transformation time-stamp, the Transformation has not yet been run to use this Adjustment. In this case, the UI might accumulate any new user adjustment into the field. If the Transformation has already been run, then the UI would replace the Adjustment.
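The three-field approach with its two time-stamps can be sketched as follows. This is a minimal Python illustration, not the actual application code: the InventoryRecord class, the function names, and the use of a simple integer as a stand-in for the time-stamps are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class InventoryRecord:
    inventory: float          # starting value, and the result after netting
    original: float           # snapshot the netting calculation starts from
    adjustment: float = 0.0   # adjustment entered by the user
    adjusted_at: int = 0      # when the adjustment was last entered
    transformed_at: int = 0   # when the netting transformation last ran

def new_record(quantity):
    """Hypothetical helper: start with Original equal to Inventory."""
    return InventoryRecord(inventory=quantity, original=quantity)

def enter_adjustment(rec, amount, now):
    """UI side: decide between accumulating and replacing the adjustment."""
    if rec.adjusted_at > rec.transformed_at:
        # The pending adjustment has not been netted yet, so accumulate.
        # Inventory has not changed since Original was last captured.
        rec.adjustment += amount
    else:
        # The previous adjustment was already netted: capture the current
        # Inventory as the new Original and replace the adjustment.
        rec.original = rec.inventory
        rec.adjustment = amount
    rec.adjusted_at = now

def net_inventory(rec, now):
    """Transformation: safe to run any number of times."""
    rec.inventory = rec.original + rec.adjustment
    rec.transformed_at = now

rec = new_record(100)
enter_adjustment(rec, 5, now=1)
net_inventory(rec, now=2)
net_inventory(rec, now=3)   # repeated run, Inventory stays at 105
```

Because the transformation always recomputes Inventory from Original and Adjustment, instead of adding the Adjustment to the current Inventory, running it twice gives the same value as running it once.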
Aspen SCM (MIMI) Scheduling routines: Another area where this concept of Idempotency is important is when using some scheduling techniques in Aspen SCM Plant Scheduler. Sometimes it is useful to move all the activities off one or more reactor facilities to a temporary holding place (similar to a queue), so that they can be re-scheduled one by one on the best reactor at the most appropriate time. This is a powerful technique that allows the routine to prioritize activities to meet customer demand and maximize the capacity utilization on the reactors. However, if Idempotency is not considered during the design of this routine, the results can be devastating to the quality of the schedule. Let’s say the routine fails during its re-scheduling portion. The reactors are partially filled, and the temporary holding place is loaded with activities. Since multiple reactors are the source of the activities, the temporary holding facility would be overloaded in time, having activities that extend beyond the end of the scheduling horizon. Executing the routine again when starting in this state would erase all of the activities on the temporary holding place, thus erasing much of the schedule. Incorporating Idempotency into the routine would mean considering the path to recovering these activities in the case of a failure or of re-running the routine.
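One possible recovery path is for the routine to never erase the holding place, and to remove an activity from it only after that activity has been placed on a reactor. The sketch below shows the idea in plain Python. It does not use Aspen SCM or its rule language; the dictionaries, the reschedule function, and the least_loaded placement rule are hypothetical stand-ins chosen for illustration.

```python
def reschedule(reactors, holding, place):
    """Move activities to `holding`, then place them one by one.

    reactors: dict of reactor name -> list of activities
    holding:  list that survives between runs (never cleared here)
    place:    function(activity, reactors) -> reactor name; may raise
    """
    # Step 1: move activities to the holding place, keeping anything
    # a previous, failed run left there.
    for name in reactors:
        holding.extend(reactors[name])
        reactors[name] = []
    holding.sort()  # stand-in for a real priority order

    # Step 2: an activity leaves the holding place only after it has
    # been placed successfully.
    while holding:
        activity = holding[0]
        target = place(activity, reactors)
        reactors[target].append(activity)
        holding.pop(0)

def least_loaded(activity, reactors):
    """Hypothetical placement rule: pick the reactor with the fewest activities."""
    return min(reactors, key=lambda name: (len(reactors[name]), name))

def fails_on_third_call():
    """Build a placement rule that simulates a failure on its third call."""
    calls = {"n": 0}
    def place(activity, reactors):
        calls["n"] += 1
        if calls["n"] == 3:
            raise RuntimeError("simulated failure")
        return least_loaded(activity, reactors)
    return place

reactors = {"R1": ["A", "B"], "R2": ["C", "D"]}
holding = []
try:
    reschedule(reactors, holding, fails_on_third_call())
except RuntimeError:
    pass  # reactors are partially filled; C and D are still in holding

reschedule(reactors, holding, least_loaded)  # re-run recovers everything
```

After the failed run and the re-run, every activity is on a reactor exactly once and the holding place is empty, which is the same result a single clean run would have produced in this small example.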
It turns out there are several other related terms that are interesting as well. Again, from the web:
NULLIPOTENT: If an operation has no side effects, like purely displaying information on a web page without any change in a database (in other words you are only reading the database), we say the operation is NULLIPOTENT. All GETs should be nullipotent. Otherwise, use POST.
IDEMPOTENT: A message in an email messaging system is opened and marked as “opened” in the database. One can open the message many times but this repeated action will only ever result in that message being in the “opened” state. This is an idempotent operation.
NON-IDEMPOTENT: If an operation always causes a change in state, like POSTing the same message to a user over and over, resulting in a new message sent and stored in the database every time, we say that the operation is NON-IDEMPOTENT.
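The three terms can be seen side by side in a small Python sketch of the email example. The Mailbox class and its method names are hypothetical and exist only to illustrate the definitions above.

```python
class Mailbox:
    def __init__(self):
        # Each message is a dict: {"body": str, "opened": bool}
        self.messages = []

    def read(self, index):
        """Nullipotent: returns data and changes nothing."""
        return self.messages[index]["body"]

    def mark_opened(self, index):
        """Idempotent: the first call changes state, repeats add nothing."""
        self.messages[index]["opened"] = True

    def post(self, body):
        """Non-idempotent: every call stores another message."""
        self.messages.append({"body": body, "opened": False})

box = Mailbox()
box.post("hello")
box.post("hello")      # same input, yet a second message is stored
box.mark_opened(0)
box.mark_opened(0)     # message 0 is still simply "opened"
body = box.read(0)     # no change to the mailbox
```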
Reading about and exploring these terms has reinforced and put a name to a concept that through experience I have come to understand has major consequences. Now that I can name the concept, hopefully I can be more concise in explaining to others the need this concept addresses, and write better code too.
Jim Piermarini – Profit Point Inc.