Implementing type translation
In earlier posts we introduced the concepts of type widening and type translation, discussed support for these in existing database systems, and presented a general approach to implementing type widening. In this post, we’ll extend our simple interpreter with support for type translations.
Adding functions to SimpleInterpreter
While we could implement type translation for AddExpression, we’d want to do so in such a way as to resolve the ambiguity inherent in adding a StringType and an IntType: should we coerce the StringType to an integer value or convert the IntType to a string representation? Put another way, should "12" + 3 evaluate to "123" or 15?^{1} While this is an interesting aesthetic question, we’ll not treat it further in this post.
Instead, we’ll extend our simple interpreter trait with support for functions:
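The original listing isn’t reproduced in this excerpt, so here is a minimal sketch of what such an extension might look like. The Type and Value encodings, and helper names like repr and eval, are my assumptions rather than the post’s actual code; only the BinaryFunction name and its non-case-class status come from the text.

```scala
// Hypothetical sketch: the interpreter's actual encodings aren't shown in
// this excerpt, so Type, Value, repr, and eval here are assumptions.
sealed trait Type
class IntType extends Type

// a value tagged at the type level with its interpreter type
case class Value[T <: Type](repr: Any)

// One class represents both a function and its application to two operands.
// Deliberately NOT a case class, so other case classes may inherit from it.
class BinaryFunction[L <: Type, R <: Type, Res <: Type](
    val fn: (Value[L], Value[R]) => Value[Res],
    val left: Value[L],
    val right: Value[R]) {
  def eval: Value[Res] = fn(left, right)
}

object BinaryFunction {
  // mimic some case-class conveniences: construction without `new`...
  def apply[L <: Type, R <: Type, Res <: Type](
      fn: (Value[L], Value[R]) => Value[Res],
      left: Value[L],
      right: Value[R]) =
    new BinaryFunction(fn, left, right)
  // ...and matching/deconstruction
  def unapply[L <: Type, R <: Type, Res <: Type](bf: BinaryFunction[L, R, Res]) =
    Some((bf.fn, bf.left, bf.right))
}
```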
Notice that we don’t declare BinaryFunction as a case class, since we might want to declare other case classes that inherit from it.^{2} We could thus use BinaryFunction directly, like this:
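Again, the post’s listing is missing here; direct use of BinaryFunction might look like the following sketch, under the same assumed encodings as above:

```scala
// Same hypothetical encodings as in the previous sketch
sealed trait Type
class IntType extends Type
case class Value[T <: Type](repr: Any)

class BinaryFunction[L <: Type, R <: Type, Res <: Type](
    val fn: (Value[L], Value[R]) => Value[Res],
    val left: Value[L],
    val right: Value[R]) {
  def eval: Value[Res] = fn(left, right)
}
object BinaryFunction {
  def apply[L <: Type, R <: Type, Res <: Type](
      fn: (Value[L], Value[R]) => Value[Res], l: Value[L], r: Value[R]) =
    new BinaryFunction(fn, l, r)
}

// using BinaryFunction directly, supplying the function as a lambda
val sum = BinaryFunction[IntType, IntType, IntType](
  (l, r) => Value[IntType](l.repr.asInstanceOf[Int] + r.repr.asInstanceOf[Int]),
  Value[IntType](2),
  Value[IntType](3))

sum.eval  // Value(5)
```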
Alternatively, we could implement subclasses of BinaryFunction with baked-in function parameters, as follows:
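A sketch of that alternative, with a hypothetical IntPlus subclass (the name and encodings are my assumptions):

```scala
// Same hypothetical encodings as in the previous sketches
sealed trait Type
class IntType extends Type
case class Value[T <: Type](repr: Any)

class BinaryFunction[L <: Type, R <: Type, Res <: Type](
    val fn: (Value[L], Value[R]) => Value[Res],
    val left: Value[L],
    val right: Value[R]) {
  def eval: Value[Res] = fn(left, right)
}

// a subclass with the function parameter baked in; only the operands vary
class IntPlus(l: Value[IntType], r: Value[IntType])
    extends BinaryFunction[IntType, IntType, IntType](
      (a, b) => Value[IntType](a.repr.asInstanceOf[Int] + b.repr.asInstanceOf[Int]),
      l, r)

new IntPlus(Value[IntType](4), Value[IntType](5)).eval  // Value(9)
```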
Note that, in this case (with no type coercion), we use a single class representing both a function and a function application. We’ll be changing that shortly.
Combining widening and function application
We’ll first produce an interpreter that combines both type widening and function application, using our “unzipped” widening interpreter, which finds widenings one parameter at a time, as a starting point.^{3} In the following example, we’ve created separate classes representing function definitions (BinaryFunction) and function application expressions (BinaryFunctionAppl). Each function definition has formal parameter types (that is, the parameter types it expects) and a result type; each function application has actual parameter types, which are the types of the expressions supplied as actual parameters. In a function application, we denote the formal parameter types as Lf and Rf and the actual parameter types as La and Ra. (As usual, these types can be equal, but they need not be.) Just as Apache Hive does when automatically wrapping UDF functions in the GenericUDF interface, we define the formal parameter types based on the formal parameter types of the underlying function implementation.
In order to have a valid function application, we need two witness functions (here called lconv and rconv) to convert values of the actual parameter types La and Ra to values of the formal parameter types Lf and Rf. We can see how this works in practice by defining a maximum function over DoubleType values and applying it to IntType values:
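The original example isn’t included in this excerpt; the following sketch shows one way the pieces described above could fit together. The witness encoding (a hypothetical Widens trait) and the value representation are my assumptions; BinaryFunction, BinaryFunctionAppl, lconv, rconv, and the type-parameter names come from the text.

```scala
// Hypothetical sketch of widening combined with function application
sealed trait Type
class IntType extends Type
class DoubleType extends Type

case class Value[T <: Type](repr: Any)

// a function definition: formal parameter types Lf and Rf, result type Res
class BinaryFunction[Lf <: Type, Rf <: Type, Res <: Type](
    val fn: (Value[Lf], Value[Rf]) => Value[Res])

// witnesses that an actual parameter type widens to a formal parameter
// type, found one parameter at a time ("unzipped")
trait Widens[A <: Type, F <: Type] {
  def convert(v: Value[A]): Value[F]
}
implicit def widensRefl[A <: Type]: Widens[A, A] =
  new Widens[A, A] { def convert(v: Value[A]) = v }
implicit val widensIntDouble: Widens[IntType, DoubleType] =
  new Widens[IntType, DoubleType] {
    def convert(v: Value[IntType]) =
      Value[DoubleType](v.repr.asInstanceOf[Int].toDouble)
  }

// a function application: actual parameter types La and Ra, plus the
// witness functions lconv and rconv taking actuals to formals
case class BinaryFunctionAppl[Lf <: Type, Rf <: Type, La <: Type, Ra <: Type, Res <: Type](
    f: BinaryFunction[Lf, Rf, Res], left: Value[La], right: Value[Ra])(
    implicit lconv: Widens[La, Lf], rconv: Widens[Ra, Rf]) {
  def eval: Value[Res] = f.fn(lconv.convert(left), rconv.convert(right))
}

// a maximum function over DoubleType values...
val maximum = new BinaryFunction[DoubleType, DoubleType, DoubleType]((l, r) =>
  Value[DoubleType](math.max(l.repr.asInstanceOf[Double], r.repr.asInstanceOf[Double])))

// ...applied to IntType actuals, which the witnesses widen to doubles
BinaryFunctionAppl(maximum, Value[IntType](3), Value[IntType](7)).eval  // Value(7.0)
```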
Note that the problem of widening actuals to formals is even more straightforward than finding the least upper bound type of two operands in an addition, since we know what the formals are expected to be and thus know immediately whether we have relevant widenings or not.
Adding translation
The final enhancement to our simple interpreter is to add type translation. Since we can treat type widening as a limited case of type translation, our approach will handle both. For the sake of a straightforward example, we’ll simply allow converting from string values to doubles (à la Hive and other untyped database systems) as well as converting from a value to one of a wider type.
We’ll define a trait called ↪ and declare witness instances A ↪ B if there is a way to translate from a Value[A] to a Value[B]. Since we can’t statically guarantee that type translations will succeed in general, A ↪ B will be implemented as a partial function from Value[A] to Value[B].
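A sketch of that trait follows. The partial-function encoding and the ↪ name come from the text; the convert method name and the numeric-parse guard are my assumptions.

```scala
// Hypothetical sketch of the translation witness trait
sealed trait Type
class DoubleType extends Type
class StringType extends Type

case class Value[T <: Type](repr: Any)

// A ↪ B witnesses that a Value[A] may translate to a Value[B]; since
// translation can fail in general, the witness is a partial function
trait ↪[A <: Type, B <: Type] {
  def convert: PartialFunction[Value[A], Value[B]]
}

// widening is the limiting case of translation: it always succeeds
implicit def widensRefl[A <: Type]: A ↪ A =
  new (A ↪ A) { def convert = { case v => v } }

// strings translate to doubles only when they actually parse as numbers
implicit val stringToDouble: StringType ↪ DoubleType =
  new (StringType ↪ DoubleType) {
    def convert = {
      case Value(s: String) if scala.util.Try(s.toDouble).isSuccess =>
        Value[DoubleType](s.toDouble)
    }
  }
```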
Once we’ve implemented the partial functions to convert strings to doubles, our interpreter is very similar to the type-widening interpreter. We can see that it works by attempting to take the maximum of a double and a string representation of a double:
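The listing is missing from this excerpt; a self-contained sketch of the combined behavior might look like this (same assumed encodings as the earlier sketches):

```scala
// Hypothetical sketch: translation witnesses driving function application
sealed trait Type
class DoubleType extends Type
class StringType extends Type

case class Value[T <: Type](repr: Any)

trait ↪[A <: Type, B <: Type] {
  def convert: PartialFunction[Value[A], Value[B]]
}
implicit def widensRefl[A <: Type]: A ↪ A =
  new (A ↪ A) { def convert = { case v => v } }
implicit val stringToDouble: StringType ↪ DoubleType =
  new (StringType ↪ DoubleType) {
    def convert = {
      case Value(s: String) if scala.util.Try(s.toDouble).isSuccess =>
        Value[DoubleType](s.toDouble)
    }
  }

class BinaryFunction[Lf <: Type, Rf <: Type, Res <: Type](
    val fn: (Value[Lf], Value[Rf]) => Value[Res])

case class BinaryFunctionAppl[Lf <: Type, Rf <: Type, La <: Type, Ra <: Type, Res <: Type](
    f: BinaryFunction[Lf, Rf, Res], left: Value[La], right: Value[Ra])(
    implicit lconv: La ↪ Lf, rconv: Ra ↪ Rf) {
  // an undefined translation raises a MatchError at evaluation time
  def eval: Value[Res] = f.fn(lconv.convert(left), rconv.convert(right))
}

val maximum = new BinaryFunction[DoubleType, DoubleType, DoubleType]((l, r) =>
  Value[DoubleType](math.max(l.repr.asInstanceOf[Double], r.repr.asInstanceOf[Double])))

// the maximum of a double and a string representation of a double
BinaryFunctionAppl(maximum, Value[DoubleType](2.5), Value[StringType]("7.5")).eval
// Value(7.5)
```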
In the remainder of this post, we’ll sketch a couple of simple extensions to this basic approach.
Dealing with failures
Note that our interpreter will crash if asked to evaluate a function application for which type translation fails. Failure is probably fine in these cases for general-purpose programming languages (although we’d probably want to model failure in the interpreted language or otherwise do something nicer than terminating with a Scala MatchError because a partial function isn’t defined for its input). However, if we’re interested in emulating database query languages, we have another option for dealing with translation failures: simply return a null value.
Since most database query language functions – both built-in and user-defined – must handle null values appropriately, this wouldn’t necessarily complicate function implementations any further. It would require some tweaks to the way we’ve ascribed types to this program; in particular, translation functions and function applications would have to return either a value of the translated type or null (if translation failed).^{4}
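As a sketch of that alternative (following the Option-based suggestion in footnote 4; the function names here are my assumptions), translation could return an Option, with None standing in for a SQL-style null:

```scala
// Hypothetical sketch: translation returns Some(value) on success and None
// (standing in for a SQL-style null) on failure, instead of crashing
sealed trait Type
class DoubleType extends Type
class StringType extends Type

case class Value[T <: Type](repr: Any)

def translateToDouble(v: Value[StringType]): Option[Value[DoubleType]] =
  scala.util.Try(v.repr.asInstanceOf[String].toDouble)
    .toOption.map(d => Value[DoubleType](d))

// a null-tolerant maximum: any failed translation nulls out the whole
// result, just as most database functions propagate null arguments
def maxOrNull(l: Option[Value[DoubleType]],
              r: Option[Value[DoubleType]]): Option[Value[DoubleType]] =
  for { lv <- l; rv <- r } yield
    Value[DoubleType](math.max(lv.repr.asInstanceOf[Double],
                               rv.repr.asInstanceOf[Double]))

maxOrNull(Some(Value[DoubleType](2.5)),
          translateToDouble(Value[StringType]("7.5")))     // Some(Value(7.5))
maxOrNull(Some(Value[DoubleType](2.5)),
          translateToDouble(Value[StringType]("potato")))  // None, i.e. null
```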
Supporting polymorphism
One interesting remaining question is this: can we extend our interpreter to allow encoding simple generic functions? By “simple” generic functions, I mean those whose parameter types may consist of either concrete types or type variables, but whose output type is either not a type variable or is the same as one of its input types. As a concrete example, let’s say we wanted a function pow[X](base: X, exp: X): X where X could be either a double-precision floating-point value or an arbitrary-precision decimal – but, in either case, the result type would be the same as the actual parameter types.
Assuming the existence of appropriate object-language encodings for type constraints, subtyping, and substitutability relations between interpreter types, we could proceed via iterative application of widening and translation until we reached a fixed point. (This process would terminate so long as the widening and translation relations imposed a complete partial order on the set of interpreted-language types.)
However, we’d likely not need such generality to implement polymorphic functions in a query language. Apache Hive, for example, allows coercing strings to either DECIMAL or DOUBLE values. So a simple approach would be to find the narrowest of those two types that string arguments could be coerced to (call this type T_{Lub}) and then widen all other arguments of type X to T_{Lub}.^{5}
A parting thought
The interpreters presented in these posts took care to get things implemented safely (OK, as safely as practical) in Scala’s type system. There are almost certainly better ways to statically capture the safety properties we might care about. However, if we were in a more dynamic context – such as in an untyped language, or in a trivially-typed fragment of a Scala program^{6} – many of these approaches would admit terser implementations.

1. Some related questions: should "12" + 3 evaluate to the same thing as 12 + "3"? Should both be legal? Java allows both, evaluating each as string concatenation, with the 3 converted to its string representation. ↩
2. We mimic some (but not all) case class functionality here. We have declared an apply method on the companion object, to allow constructing BinaryFunction instances without new, and an unapply method to allow matching and deconstructing BinaryFunction instances. ↩
4. Given NullType extending Type, we’d have a few options: we could return an instance of Scala’s Either[A <: Value[_], Value[NullType]]; we could convert all interpreter eval methods to return an Option[Value[_]]; we could relax static typing on function return values; etc. ↩
5. As a further practical consideration, many of Hive’s numeric functions that accept either DECIMAL or DOUBLE values only operate on double-precision values and thus narrow DECIMAL arguments internally at evaluation time. ↩
6. Consider, for example, representing all interpreter values as Option[Any] and using explicit casts. ↩