In earlier posts we introduced the concepts of type widening and type translation, discussed support for these in existing database systems, and presented a general approach to implementing type widening. In this post, we’ll extend our simple interpreter with support for type translations.
Adding functions to our interpreter
While we could implement type translation for AddExpression, we’d want to do so in such a way as to resolve the ambiguity inherent in adding a StringType and an IntType: should we coerce the StringType to an integer value or convert the IntType to a string representation? Put another way, should "12" + 3 evaluate to "123" or to 15?[1] While this is an interesting aesthetic question, we’ll not treat it further in this post. Instead, we’ll extend our simple interpreter trait with support for functions:
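One way such a trait might look is sketched below; the names Value, Lit, and Interpreter, and the exact shape of BinaryFunction, are illustrative assumptions rather than a definitive implementation:

```scala
sealed trait Type
case object IntType extends Type
case object DoubleType extends Type
case object StringType extends Type

sealed abstract class Value(val tpe: Type)
case class IntValue(i: Int) extends Value(IntType)
case class DoubleValue(d: Double) extends Value(DoubleType)
case class StringValue(s: String) extends Value(StringType)

sealed trait Expression
case class Lit(v: Value) extends Expression

// deliberately NOT a case class, so other case classes may extend it;
// the companion object supplies the conveniences we'd otherwise lose
class BinaryFunction(val impl: (Value, Value) => Value,
                     val lhs: Expression,
                     val rhs: Expression) extends Expression

object BinaryFunction {
  def apply(impl: (Value, Value) => Value,
            lhs: Expression, rhs: Expression): BinaryFunction =
    new BinaryFunction(impl, lhs, rhs)
  def unapply(bf: BinaryFunction)
      : Option[((Value, Value) => Value, Expression, Expression)] =
    Some((bf.impl, bf.lhs, bf.rhs))
}

trait Interpreter {
  def eval(expr: Expression): Value = expr match {
    case Lit(v)                  => v
    case BinaryFunction(f, l, r) => f(eval(l), eval(r))
  }
}
```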
Notice that we don’t declare BinaryFunction as a case class, since we might want to declare other case classes that inherit from it.[2] We could thus use BinaryFunction directly, like this:
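A sketch of such direct use, against an assumed minimal model with integer literals only (IntLit and eval are hypothetical names, not necessarily the originals):

```scala
sealed trait Expression
case class IntLit(i: Int) extends Expression

// not a case class, so it can be subclassed later
class BinaryFunction(val impl: (Int, Int) => Int,
                     val lhs: Expression,
                     val rhs: Expression) extends Expression

def eval(expr: Expression): Int = expr match {
  case IntLit(i)          => i
  case bf: BinaryFunction => bf.impl(eval(bf.lhs), eval(bf.rhs))
}

// using BinaryFunction directly, supplying the implementation as a parameter
val plus  = new BinaryFunction((a, b) => a + b, IntLit(12), IntLit(3))
val times = new BinaryFunction((a, b) => a * b, IntLit(6), IntLit(7))
```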
Alternatively, we could implement subclasses of BinaryFunction with baked-in function parameters, as follows:
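One plausible shape for those subclasses, again over an assumed minimal integer-only model (all names are illustrative):

```scala
sealed trait Expression
case class IntLit(i: Int) extends Expression

// not a case class, so the case classes below can extend it
class BinaryFunction(val impl: (Int, Int) => Int,
                     val lhs: Expression,
                     val rhs: Expression) extends Expression

// each subclass bakes in its function parameter, so one class stands
// for both the function and its application
case class Plus(l: Expression, r: Expression)
    extends BinaryFunction((a, b) => a + b, l, r)
case class Max(l: Expression, r: Expression)
    extends BinaryFunction((a, b) => math.max(a, b), l, r)

def eval(expr: Expression): Int = expr match {
  case IntLit(i)          => i
  case bf: BinaryFunction => bf.impl(eval(bf.lhs), eval(bf.rhs))
}
```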
Note that, in this case (with no type coercion), we use a single class representing both a function and a function application. We’ll be changing that shortly.
Combining widening and function application
We’ll first produce an interpreter that combines both type widening and function application, using our “unzipped” widening interpreter, which finds widenings one parameter at a time, as a starting point.[3] In the following example, we’ve created separate classes representing function definitions (BinaryFunction) and function application expressions (BinaryFunctionAppl). Each function definition has formal parameter types (that is, the parameter types it expects) and a result type; each function application has actual parameter types, which are the types of the expressions supplied as actual parameters. In a function application, we denote the formal parameter types as Rf and the actual parameter types as Ra. (As usual, these types can be equal, but they need not be.) Just like Apache Hive does when automatically wrapping UDF functions in the GenericUDF interface, we define the formal parameter types based on the formal parameter types of the underlying function implementation.
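A sketch of one possible encoding, using phantom type parameters and implicit widening witnesses; every name here (Value, Widens, lconv, rconv) and the single-Double payload are assumptions made to keep the example small:

```scala
sealed trait Type
sealed trait IntType extends Type
sealed trait DoubleType extends Type

// interpreter values tagged with a phantom interpreter type
case class Value[T <: Type](repr: Double)

// witness that actual type A widens to formal type F
trait Widens[A <: Type, F <: Type] { def widen(v: Value[A]): Value[F] }
object Widens {
  implicit def refl[T <: Type]: Widens[T, T] =
    new Widens[T, T] { def widen(v: Value[T]) = v }
  implicit val intToDouble: Widens[IntType, DoubleType] =
    new Widens[IntType, DoubleType] {
      def widen(v: Value[IntType]) = Value[DoubleType](v.repr)
    }
}

// a function definition: formal parameter types Lf, Rf and a result type
class BinaryFunction[Lf <: Type, Rf <: Type, Res <: Type](
    val impl: (Value[Lf], Value[Rf]) => Value[Res])

// a function application: actuals La, Ra must widen to the formals
class BinaryFunctionAppl[La <: Type, Ra <: Type,
                         Lf <: Type, Rf <: Type, Res <: Type](
    fn: BinaryFunction[Lf, Rf, Res], left: Value[La], right: Value[Ra])(
    implicit lconv: Widens[La, Lf], rconv: Widens[Ra, Rf]) {
  def eval: Value[Res] = fn.impl(lconv.widen(left), rconv.widen(right))
}
```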
In order to have a valid function application, we need two witness functions (here called lconv and rconv) to convert values of the actual parameter types Ra to values of the formal parameter types Rf. We can see how this works in practice by defining a maximum function over DoubleType values and applying it to arguments that must first be widened:
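A sketch of what this might look like, assuming a phantom-typed Value and implicit widening witnesses (maxDouble and the Widens encoding are illustrative names, not the post’s original code):

```scala
sealed trait Type
sealed trait IntType extends Type
sealed trait DoubleType extends Type

// phantom-typed values; a Double payload keeps the sketch small
case class Value[T <: Type](repr: Double)

trait Widens[A <: Type, F <: Type] { def widen(v: Value[A]): Value[F] }
object Widens {
  implicit def refl[T <: Type]: Widens[T, T] =
    new Widens[T, T] { def widen(v: Value[T]) = v }
  implicit val intToDouble: Widens[IntType, DoubleType] =
    new Widens[IntType, DoubleType] {
      def widen(v: Value[IntType]) = Value[DoubleType](v.repr)
    }
}

class BinaryFunction[Lf <: Type, Rf <: Type, Res <: Type](
    val impl: (Value[Lf], Value[Rf]) => Value[Res])

class BinaryFunctionAppl[La <: Type, Ra <: Type,
                         Lf <: Type, Rf <: Type, Res <: Type](
    fn: BinaryFunction[Lf, Rf, Res], left: Value[La], right: Value[Ra])(
    implicit lconv: Widens[La, Lf], rconv: Widens[Ra, Rf]) {
  def eval: Value[Res] = fn.impl(lconv.widen(left), rconv.widen(right))
}

// a max function whose formal parameter types are both DoubleType
val maxDouble = new BinaryFunction[DoubleType, DoubleType, DoubleType](
  (a, b) => Value[DoubleType](math.max(a.repr, b.repr)))

// the IntType actual is widened to DoubleType by the implicit witness
val result = new BinaryFunctionAppl(
  maxDouble, Value[IntType](3), Value[DoubleType](2.5)).eval
```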
Note that the problem of widening actuals to formals is even more straightforward than finding the least upper bound type of two operands in an addition, since we know what the formals are expected to be and thus know immediately whether we have relevant widenings or not.
The final enhancement to our simple interpreter is to add type translation. Since we can treat type widening as a limited case of type translation, our approach will handle both. For the sake of a straightforward example, we’ll simply allow converting from string values to doubles (à la Hive and other untyped database systems) as well as converting from a value to one of a wider type.
We’ll define a trait called ↪ and declare witness instances A ↪ B if there is a way to translate from a Value[A] to a Value[B]. Since we can’t statically guarantee that type translations will succeed in general, A ↪ B will be implemented as a partial function from Value[A] to Value[B]:
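A sketch of one possible encoding: the witness helper, the untyped payload representation, and the surrounding application class are all assumptions made for the sake of a compact, self-contained example:

```scala
import scala.util.Try

sealed trait Type
sealed trait IntType extends Type
sealed trait DoubleType extends Type
sealed trait StringType extends Type

// values carry an untyped payload plus a phantom interpreter type
case class Value[T <: Type](repr: Any)

// A ↪ B: evidence that values of type A can sometimes be translated to B
trait ↪[A <: Type, B <: Type] {
  def convert: PartialFunction[Value[A], Value[B]]
}

object ↪ {
  private def witness[A <: Type, B <: Type](
      pf: PartialFunction[Value[A], Value[B]]): A ↪ B =
    new (A ↪ B) { val convert = pf }

  // every type translates to itself
  implicit def refl[T <: Type]: T ↪ T = witness { case v => v }

  // widening is a special case of translation
  implicit val intToDouble: IntType ↪ DoubleType =
    witness { case Value(i: Int) => Value[DoubleType](i.toDouble) }

  // strings translate to doubles only when they parse as doubles
  implicit val stringToDouble: StringType ↪ DoubleType =
    witness {
      case Value(s: String) if Try(s.toDouble).isSuccess =>
        Value[DoubleType](s.toDouble)
    }
}

class BinaryFunction[Lf <: Type, Rf <: Type, Res <: Type](
    val impl: (Value[Lf], Value[Rf]) => Value[Res])

class BinaryFunctionAppl[La <: Type, Ra <: Type,
                         Lf <: Type, Rf <: Type, Res <: Type](
    fn: BinaryFunction[Lf, Rf, Res], left: Value[La], right: Value[Ra])(
    implicit lconv: La ↪ Lf, rconv: Ra ↪ Rf) {
  // throws a MatchError if either translation is undefined for its input
  def eval: Value[Res] = fn.impl(lconv.convert(left), rconv.convert(right))
}
```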
Once we’ve implemented the partial functions to convert strings to doubles, our interpreter is very similar to the type-widening interpreter. We can see that it works by attempting to take the maximum of a double and a string representation of a double:
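A sketch of this example, assuming an encoding of the ↪ witnesses along the lines just described (the names and representations are illustrative):

```scala
import scala.util.Try

sealed trait Type
sealed trait DoubleType extends Type
sealed trait StringType extends Type

case class Value[T <: Type](repr: Any)

trait ↪[A <: Type, B <: Type] {
  def convert: PartialFunction[Value[A], Value[B]]
}
object ↪ {
  implicit def refl[T <: Type]: T ↪ T =
    new (T ↪ T) {
      val convert: PartialFunction[Value[T], Value[T]] = { case v => v }
    }
  implicit val stringToDouble: StringType ↪ DoubleType =
    new (StringType ↪ DoubleType) {
      val convert: PartialFunction[Value[StringType], Value[DoubleType]] = {
        case Value(s: String) if Try(s.toDouble).isSuccess =>
          Value[DoubleType](s.toDouble)
      }
    }
}

class BinaryFunction[Lf <: Type, Rf <: Type, Res <: Type](
    val impl: (Value[Lf], Value[Rf]) => Value[Res])

class BinaryFunctionAppl[La <: Type, Ra <: Type,
                         Lf <: Type, Rf <: Type, Res <: Type](
    fn: BinaryFunction[Lf, Rf, Res], left: Value[La], right: Value[Ra])(
    implicit lconv: La ↪ Lf, rconv: Ra ↪ Rf) {
  def eval: Value[Res] = fn.impl(lconv.convert(left), rconv.convert(right))
}

// max over doubles; its formal parameter types are both DoubleType
val maxDouble = new BinaryFunction[DoubleType, DoubleType, DoubleType](
  (a, b) => (a.repr, b.repr) match {
    case (x: Double, y: Double) => Value[DoubleType](math.max(x, y))
  })

// the string actual "2.5" is translated to a double before maxDouble sees it
val result = new BinaryFunctionAppl(
  maxDouble, Value[StringType]("2.5"), Value[DoubleType](2.0)).eval
```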
In the remainder of this post, we’ll sketch a couple of simple extensions to this basic approach.
Dealing with failures
Note that our interpreter will crash if asked to evaluate a function application for which type translation fails. Failure is probably fine in these cases for general-purpose programming languages (although we’d probably want to model failure in the interpreted language or otherwise do something nicer than terminating with a Scala MatchError because a partial function isn’t defined for its input). However, if we’re interested in emulating database query languages, we have another option for dealing with translation failures: simply return a null value.
Since most database query language functions — both built-in and user-defined — must handle null values appropriately, this wouldn’t necessarily complicate function implementations any further. It would require some tweaks to the way we’ve ascribed types to this program; in particular, translation functions and function applications would have to return either a value of the translated type or null (if translation failed).[4]
One interesting remaining question is this: can we extend our interpreter to allow encoding simple generic functions? By “simple” generic functions, I mean those whose parameter types may consist of either concrete types or type variables, but whose output type is either not a type variable or is the same as one of its input types. As a concrete example, let’s say we wanted a function pow[X](base: X, exp: X): X, where X could be either a double-precision floating-point value or an arbitrary-precision decimal — but, in either case, the result type would be the same as the actual parameter types.
Assuming the existence of appropriate object-language encodings for type constraints, subtyping, and substitutability relations between interpreter types, we could proceed via iterative application of widening and translation until we reached a fixed point. (This process would terminate so long as the widening and translation relations imposed a complete partial order on the set of interpreted-language types.)
However, we’d likely not need such generality to implement polymorphic functions in a query language. Apache Hive, for example, allows coercing strings to either DECIMAL or DOUBLE values. So a simple approach would be to find the narrowest of those two types that string arguments could be coerced to (call this type TLub) and then widen all other arguments of type X to TLub.[5]
A parting thought
The interpreters presented in these posts took care to get things implemented safely (OK, as safely as practical) in Scala’s type system. There are almost certainly better ways to statically capture the safety properties we might care about. However, if we were in a more dynamic context — such as in an untyped language, or in a trivially-typed fragment of a Scala program[6] — many of these approaches would admit terser implementations.
[1] Some related questions: should "12" + 3 evaluate to the same thing as 12 + "3"? Should both be legal? Java allows the former (evaluating it as string concatenation by invoking .toString on an automatically-boxed 3), but not the latter.↩
[2] We mimic some (but not all) case class functionality here. We have declared an apply method on the companion object, to allow constructing instances without new, and an unapply method to allow matching and deconstructing instances.↩
[4] …Type, we’d have a few options: we could return an instance of Scala’s Either[A <: Value[_], Value[NullType]]; we could convert all interpreter eval methods to return an Option[Value[_]]; we could relax static typing on function return values; etc.↩
[5] As a further practical consideration, many of Hive’s numeric functions that accept either DECIMAL or DOUBLE values only operate on double-precision values and thus narrow DECIMAL arguments internally at evaluation time.↩
[6] Consider, for example, representing all interpreter values as Option[Any] and using explicit casts.↩