Type coercions for untyped query languages

type systems
sql
hive
type coercions
Published

August 18, 2014

In this post, we’re going to introduce two kinds of implicit type conversions that are common in database query languages:

  1. Type widening, in which a value of type T is converted to a value of some wider type T’, where T ⊆ T’. As an example, given the expression b / d, where b holds an unsigned 8-bit value and d holds a double-precision floating-point value, a type-widening conversion would convert the value of b to a double-precision value and use this value as the left-hand operand of a floating-point division.
  2. Type translation, in which a value is transformed to produce a value of some unrelated type. As an example, consider a programming language that evaluates the expression sd / d as floating-point division where d is a double-precision floating-point value and sd is a string value that represents a number.

Type widening is common in typed general-purpose programming languages. For example, in C or Java, a / b is integer division with an integer result if both a and b are integer values but floating-point division – evaluating to the wider of the types of its operands – if either a or b is a floating-point value. Similarly, it is generally possible in such languages to assign a value of a narrower type to a variable of a wider one.

However, type translation (by necessity) only appears in untyped1 languages. This is the case because types are static properties but we cannot in general statically determine whether or not type translations will succeed. By allowing type translation, we are thus necessarily deferring decisions about whether or not a program fragment makes sense in some type system until it executes, and trading some flexibility for the possibility of runtime errors. Some programming communities regard this as an acceptable tradeoff.

Several extant database systems, including Hive, PostgreSQL, SQLite (see also here), and Microsoft SQL Server) support both type widening and type translations, so that one can, for example, take the cosine of the string representing a double-precision floating point value.

In subsequent posts, we’ll look at what type coercions some of these systems support and present general approaches to implementing support for type widening and type coercion in language interpreters, with specific techniques for realizing these general approaches in interpreters implemented in Scala.

Footnotes

  1. It may also appear in languages in which all programs are trivially well-typed and in which the type system thus cannot provide useful static guarantees about runtime behavior. An example such language is Tcl, where everything is a string. (By most definitions of typedness, though, “trivially typed” languages are untyped.)↩︎