In earlier posts we introduced the concepts of type widening and type translation, discussed support for these in existing database systems, and presented a general approach to implementing type widening. In this post, we’ll extend our simple interpreter with support for type translations.
Adding functions to our interpreter
While we could implement type translation for AddExpression, we’d want to do so in such a way as to resolve the ambiguity inherent in adding a StringType and an IntType: should we coerce the StringType to an integer value or convert the IntType to a string representation? Put another way, should "12" + 3 evaluate to "123" or to 15?[1] While this is an interesting aesthetic question, we’ll not treat it further in this post. Instead, we’ll extend our simple interpreter trait with support for functions:
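One way such a trait might look is sketched below; the names Value, Lit, and Interpreter, and the exact shape of BinaryFunction, are illustrative assumptions rather than a definitive implementation:

```scala
sealed trait Type
case object IntType extends Type
case object DoubleType extends Type
case object StringType extends Type

sealed abstract class Value(val tpe: Type)
case class IntValue(i: Int) extends Value(IntType)
case class DoubleValue(d: Double) extends Value(DoubleType)
case class StringValue(s: String) extends Value(StringType)

sealed trait Expression
case class Lit(v: Value) extends Expression

// deliberately NOT a case class, so other case classes may extend it;
// the companion object supplies the conveniences we'd otherwise lose
class BinaryFunction(val impl: (Value, Value) => Value,
                     val lhs: Expression,
                     val rhs: Expression) extends Expression

object BinaryFunction {
  def apply(impl: (Value, Value) => Value,
            lhs: Expression, rhs: Expression): BinaryFunction =
    new BinaryFunction(impl, lhs, rhs)
  def unapply(bf: BinaryFunction)
      : Option[((Value, Value) => Value, Expression, Expression)] =
    Some((bf.impl, bf.lhs, bf.rhs))
}

trait Interpreter {
  def eval(expr: Expression): Value = expr match {
    case Lit(v)                  => v
    case BinaryFunction(f, l, r) => f(eval(l), eval(r))
  }
}
```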
Notice that we don’t declare BinaryFunction as a case class, since we might want to declare other case classes that inherit from it.[2] We could thus use BinaryFunction directly, like this:
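A sketch of such direct use, against an assumed minimal model with integer literals only (IntLit and eval are hypothetical names, not necessarily the originals):

```scala
sealed trait Expression
case class IntLit(i: Int) extends Expression

// not a case class, so it can be subclassed later
class BinaryFunction(val impl: (Int, Int) => Int,
                     val lhs: Expression,
                     val rhs: Expression) extends Expression

def eval(expr: Expression): Int = expr match {
  case IntLit(i)          => i
  case bf: BinaryFunction => bf.impl(eval(bf.lhs), eval(bf.rhs))
}

// using BinaryFunction directly, supplying the implementation as a parameter
val plus  = new BinaryFunction((a, b) => a + b, IntLit(12), IntLit(3))
val times = new BinaryFunction((a, b) => a * b, IntLit(6), IntLit(7))
```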
Alternatively, we could implement subclasses of BinaryFunction with baked-in function parameters, as follows:
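One plausible shape for those subclasses, again over an assumed minimal integer-only model (all names are illustrative):

```scala
sealed trait Expression
case class IntLit(i: Int) extends Expression

// not a case class, so the case classes below can extend it
class BinaryFunction(val impl: (Int, Int) => Int,
                     val lhs: Expression,
                     val rhs: Expression) extends Expression

// each subclass bakes in its function parameter, so one class stands
// for both the function and its application
case class Plus(l: Expression, r: Expression)
    extends BinaryFunction((a, b) => a + b, l, r)
case class Max(l: Expression, r: Expression)
    extends BinaryFunction((a, b) => math.max(a, b), l, r)

def eval(expr: Expression): Int = expr match {
  case IntLit(i)          => i
  case bf: BinaryFunction => bf.impl(eval(bf.lhs), eval(bf.rhs))
}
```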
Note that, in this case (with no type coercion), we use a single class representing both a function and a function application. We’ll be changing that shortly.
Combining widening and function application
We’ll first produce an interpreter that combines both type widening and function application, using our “unzipped” widening interpreter, which finds widenings one parameter at a time, as a starting point.[3] In the following example, we’ve created separate classes representing function definitions (BinaryFunction) and function application expressions (BinaryFunctionAppl). Each function definition has formal parameter types (that is, the parameter types it expects) and a result type; each function application has actual parameter types, which are the types of the expressions supplied as actual parameters. In a function application, we denote the formal parameter types as Rf and the actual parameter types as Ra. (As usual, these types can be equal, but they need not be.) Just like Apache Hive does when automatically wrapping UDF functions in the GenericUDF interface, we define the formal parameter types based on the formal parameter types of the underlying function implementation.
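A sketch of one possible encoding, using phantom type parameters and implicit widening witnesses; every name here (Value, Widens, lconv, rconv) and the single-Double payload are assumptions made to keep the example small:

```scala
sealed trait Type
sealed trait IntType extends Type
sealed trait DoubleType extends Type

// interpreter values tagged with a phantom interpreter type
case class Value[T <: Type](repr: Double)

// witness that actual type A widens to formal type F
trait Widens[A <: Type, F <: Type] { def widen(v: Value[A]): Value[F] }
object Widens {
  implicit def refl[T <: Type]: Widens[T, T] =
    new Widens[T, T] { def widen(v: Value[T]) = v }
  implicit val intToDouble: Widens[IntType, DoubleType] =
    new Widens[IntType, DoubleType] {
      def widen(v: Value[IntType]) = Value[DoubleType](v.repr)
    }
}

// a function definition: formal parameter types Lf, Rf and a result type
class BinaryFunction[Lf <: Type, Rf <: Type, Res <: Type](
    val impl: (Value[Lf], Value[Rf]) => Value[Res])

// a function application: actuals La, Ra must widen to the formals
class BinaryFunctionAppl[La <: Type, Ra <: Type,
                         Lf <: Type, Rf <: Type, Res <: Type](
    fn: BinaryFunction[Lf, Rf, Res], left: Value[La], right: Value[Ra])(
    implicit lconv: Widens[La, Lf], rconv: Widens[Ra, Rf]) {
  def eval: Value[Res] = fn.impl(lconv.widen(left), rconv.widen(right))
}
```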
In order to have a valid function application, we need two witness functions (here called lconv and rconv) to convert values of the actual parameter types Ra to values of the formal parameter types Rf. We can see how this works in practice by defining a maximum function over DoubleType values and applying it to arguments that must first be widened:
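A sketch of what this might look like, assuming a phantom-typed Value and implicit widening witnesses (maxDouble and the Widens encoding are illustrative names, not the post’s original code):

```scala
sealed trait Type
sealed trait IntType extends Type
sealed trait DoubleType extends Type

// phantom-typed values; a Double payload keeps the sketch small
case class Value[T <: Type](repr: Double)

trait Widens[A <: Type, F <: Type] { def widen(v: Value[A]): Value[F] }
object Widens {
  implicit def refl[T <: Type]: Widens[T, T] =
    new Widens[T, T] { def widen(v: Value[T]) = v }
  implicit val intToDouble: Widens[IntType, DoubleType] =
    new Widens[IntType, DoubleType] {
      def widen(v: Value[IntType]) = Value[DoubleType](v.repr)
    }
}

class BinaryFunction[Lf <: Type, Rf <: Type, Res <: Type](
    val impl: (Value[Lf], Value[Rf]) => Value[Res])

class BinaryFunctionAppl[La <: Type, Ra <: Type,
                         Lf <: Type, Rf <: Type, Res <: Type](
    fn: BinaryFunction[Lf, Rf, Res], left: Value[La], right: Value[Ra])(
    implicit lconv: Widens[La, Lf], rconv: Widens[Ra, Rf]) {
  def eval: Value[Res] = fn.impl(lconv.widen(left), rconv.widen(right))
}

// a max function whose formal parameter types are both DoubleType
val maxDouble = new BinaryFunction[DoubleType, DoubleType, DoubleType](
  (a, b) => Value[DoubleType](math.max(a.repr, b.repr)))

// the IntType actual is widened to DoubleType by the implicit witness
val result = new BinaryFunctionAppl(
  maxDouble, Value[IntType](3), Value[DoubleType](2.5)).eval
```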
Note that the problem of widening actuals to formals is even more straightforward than finding the least upper bound type of two operands in an addition, since we know what the formals are expected to be and thus know immediately whether we have relevant widenings or not.
The final enhancement to our simple interpreter is to add type translation. Since we can treat type widening as a limited case of type translation, our approach will handle both. For the sake of a straightforward example, we’ll simply allow converting from string values to doubles (à la Hive and other untyped database systems) as well as converting from a value to one of a wider type.
We’ll define a trait called ↪ and declare witness instances A ↪ B if there is a way to translate from a Value[A] to a Value[B]. Since we can’t statically guarantee that type translations will succeed in general, A ↪ B will be implemented as a partial function from Value[A] to Value[B]:
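A sketch of one possible encoding: the witness helper, the untyped payload representation, and the surrounding application class are all assumptions made for the sake of a compact, self-contained example:

```scala
import scala.util.Try

sealed trait Type
sealed trait IntType extends Type
sealed trait DoubleType extends Type
sealed trait StringType extends Type

// values carry an untyped payload plus a phantom interpreter type
case class Value[T <: Type](repr: Any)

// A ↪ B: evidence that values of type A can sometimes be translated to B
trait ↪[A <: Type, B <: Type] {
  def convert: PartialFunction[Value[A], Value[B]]
}

object ↪ {
  private def witness[A <: Type, B <: Type](
      pf: PartialFunction[Value[A], Value[B]]): A ↪ B =
    new (A ↪ B) { val convert = pf }

  // every type translates to itself
  implicit def refl[T <: Type]: T ↪ T = witness { case v => v }

  // widening is a special case of translation
  implicit val intToDouble: IntType ↪ DoubleType =
    witness { case Value(i: Int) => Value[DoubleType](i.toDouble) }

  // strings translate to doubles only when they parse as doubles
  implicit val stringToDouble: StringType ↪ DoubleType =
    witness {
      case Value(s: String) if Try(s.toDouble).isSuccess =>
        Value[DoubleType](s.toDouble)
    }
}

class BinaryFunction[Lf <: Type, Rf <: Type, Res <: Type](
    val impl: (Value[Lf], Value[Rf]) => Value[Res])

class BinaryFunctionAppl[La <: Type, Ra <: Type,
                         Lf <: Type, Rf <: Type, Res <: Type](
    fn: BinaryFunction[Lf, Rf, Res], left: Value[La], right: Value[Ra])(
    implicit lconv: La ↪ Lf, rconv: Ra ↪ Rf) {
  // throws a MatchError if either translation is undefined for its input
  def eval: Value[Res] = fn.impl(lconv.convert(left), rconv.convert(right))
}
```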
Once we’ve implemented the partial functions to convert strings to doubles, our interpreter is very similar to the type-widening interpreter. We can see that it works by attempting to take the maximum of a double and a string representation of a double:
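A sketch of this example, assuming an encoding of the ↪ witnesses along the lines just described (the names and representations are illustrative):

```scala
import scala.util.Try

sealed trait Type
sealed trait DoubleType extends Type
sealed trait StringType extends Type

case class Value[T <: Type](repr: Any)

trait ↪[A <: Type, B <: Type] {
  def convert: PartialFunction[Value[A], Value[B]]
}
object ↪ {
  implicit def refl[T <: Type]: T ↪ T =
    new (T ↪ T) {
      val convert: PartialFunction[Value[T], Value[T]] = { case v => v }
    }
  implicit val stringToDouble: StringType ↪ DoubleType =
    new (StringType ↪ DoubleType) {
      val convert: PartialFunction[Value[StringType], Value[DoubleType]] = {
        case Value(s: String) if Try(s.toDouble).isSuccess =>
          Value[DoubleType](s.toDouble)
      }
    }
}

class BinaryFunction[Lf <: Type, Rf <: Type, Res <: Type](
    val impl: (Value[Lf], Value[Rf]) => Value[Res])

class BinaryFunctionAppl[La <: Type, Ra <: Type,
                         Lf <: Type, Rf <: Type, Res <: Type](
    fn: BinaryFunction[Lf, Rf, Res], left: Value[La], right: Value[Ra])(
    implicit lconv: La ↪ Lf, rconv: Ra ↪ Rf) {
  def eval: Value[Res] = fn.impl(lconv.convert(left), rconv.convert(right))
}

// max over doubles; its formal parameter types are both DoubleType
val maxDouble = new BinaryFunction[DoubleType, DoubleType, DoubleType](
  (a, b) => (a.repr, b.repr) match {
    case (x: Double, y: Double) => Value[DoubleType](math.max(x, y))
  })

// the string actual "2.5" is translated to a double before maxDouble sees it
val result = new BinaryFunctionAppl(
  maxDouble, Value[StringType]("2.5"), Value[DoubleType](2.0)).eval
```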
In the remainder of this post, we’ll sketch a couple of simple extensions to this basic approach.
Dealing with failures
Note that our interpreter will crash if asked to evaluate a function application for which type translation fails. Failure is probably fine in these cases for general-purpose programming languages (although we’d probably want to model failure in the interpreted language or otherwise do something nicer than terminating with a Scala MatchError because a partial function isn’t defined for its input). However, if we’re interested in emulating database query languages, we have another option for dealing with translation failures: simply return a null value.
Since most database query language functions — both built-in and user-defined — must handle null values appropriately, this wouldn’t necessarily complicate function implementations any further. It would require some tweaks to the way we’ve ascribed types to this program; in particular, translation functions and function applications would have to return either a value of the translated type or null (if translation failed).[4]
One interesting remaining question is this: can we extend our interpreter to allow encoding simple generic functions? By “simple” generic functions, I mean those whose parameter types may consist of either concrete types or type variables, but whose output type is either not a type variable or is the same as one of its input types. As a concrete example, let’s say we wanted a function pow[X](base: X, exp: X): X, where X could be either a double-precision floating-point value or an arbitrary-precision decimal — but, in either case, the result type would be the same as the actual parameter types.
Assuming the existence of appropriate object-language encodings for type constraints, subtyping, and substitutability relations between interpreter types, we could proceed via iterative application of widening and translation until we reached a fixed point. (This process would terminate so long as the widening and translation relations imposed a complete partial order on the set of interpreted-language types.)
However, we’d likely not need such generality to implement polymorphic functions in a query language. Apache Hive, for example, allows coercing strings to either DECIMAL or DOUBLE values. So a simple approach would be to find the narrowest of those two types that string arguments could be coerced to (call this type TLub) and then widen all other arguments of type X to TLub.[5]
A parting thought
The interpreters presented in these posts took care to get things implemented safely (OK, as safely as practical) in Scala’s type system. There are almost certainly better ways to statically capture the safety properties we might care about. However, if we were in a more dynamic context — such as in an untyped language, or in a trivially-typed fragment of a Scala program[6] — many of these approaches would admit terser implementations.
[1] Some related questions: should "12" + 3 evaluate to the same thing as 12 + "3"? Should both be legal? Java allows the former (evaluating it as string concatenation by invoking .toString on an automatically-boxed 3), but not the latter.↩
[2] We mimic some (but not all) case class functionality here. We have declared an apply method on the companion object, to allow constructing instances without new, and an unapply method to allow matching and deconstructing instances.↩
[4] …Type, we’d have a few options: we could return an instance of Scala’s Either[A <: Value[_], Value[NullType]]; we could convert all interpreter eval methods to return an Option[Value[_]]; we could relax static typing on function return values; etc.↩
[5] As a further practical consideration, many of Hive’s numeric functions that accept either DECIMAL or DOUBLE values only operate on double-precision values and thus narrow DECIMAL arguments internally at evaluation time.↩
[6] Consider, for example, representing all interpreter values as Option[Any] and using explicit casts.↩