Mostly-transparent memoization in Ruby

Here’s an easy technique for automatically memoizing the results of method calls in Ruby. Let’s say that we’re interested in looking up instances of the Employee class by their employee ID numbers via a class method called find_first_by_empid. Furthermore, this lookup might have to hit the disk or the network, so we want to invoke find_first_by_empid at most once for each employee ID.

We might want to just store employees in a hash, indexed by employee ID. If the hash had an entry for a given ID, we’d know that we had already looked up the employee with that ID. If not, we’d know that we needed to look up the employee with that ID and then place the employee object in the hash. This solution is straightforward, but could prove tedious to implement in practice. Fortunately, Ruby provides a mechanism to make this essentially transparent.

When you create a Hash in Ruby, you can create it in one of several ways. You can just call the constructor Hash.new (or {}), in which case attempting to look up a key that is not in the table will return nil. You can provide a single default object as an argument to the constructor, which will be returned instead of nil as the value for keys not in the table. Finally, you can provide a block that describes how to calculate the value for a missing key. This last option is interesting for the memoization application, since it mimics a straightforward implementation of call-by-need.

employees = Hash.new { |hash, key| hash[key] = Employee.find_first_by_empid(key) }

The above code snippet shows an example of this technique. When we ask employees for an employee ID that it doesn’t already contain, it will invoke the block with itself and the missing key as arguments. In executing the block, employees will insert a key-value pair in itself corresponding to the missing key and the result of the method call with that key as an argument.

There are some caveats here. First, you need to make sure that your arguments are hashable. It is also somewhat clunkier to deal with multiple arguments, since they must be in a list. Also, it must be safe to memoize the function: that is, as long as the hash exists, the memoized method must return the same value every time it is invoked with a given argument. This technique isn’t perfect (and almost certainly isn’t novel), but it is a pretty slick way to keep from repeatedly invoking expensive methods and depends on nothing more than Ruby’s standard library.