Automating the sbt REPL

Tags: sbt, scala, automation

Published September 3, 2014

If you’re like me, you often find yourself pasting transcripts into sbt console sessions in order to interactively test out new app functionality. These transcripts often contain a great deal of boilerplate and can be dramatically simplified. Fortunately, sbt provides a facility to let you specify what commands should run at the beginning of a console session: the project-specific initialCommands setting.1 sbt also provides a cleanupCommands setting to specify commands to run when your REPL exits, so if you’re testing anything that needs cleanup code to run before the JVM terminates, you can have that done automatically as well. (This is also useful for avoiding ugly stack traces when you’re developing Spark applications and quit the console before stopping your SparkContext.) Finally, since sbt build definitions are just Scala code, you can conditionalize these command sets, for example, to load test fixtures only some of the time.
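If you haven’t used these settings before, here’s a minimal sketch of what the pair looks like in an sbt 0.13-era build definition. (The package, class, and method names here are hypothetical placeholders; substitute whatever setup and teardown your own project needs.)

// run at console startup: bring the app's namespace into scope and
// construct an object we'd otherwise have to build by hand every session
initialCommands in console := """
  |import com.example.myapp._
  |val service = new ExampleService()
""".stripMargin

// run at console exit: release whatever resources the setup acquired
cleanupCommands in console := "service.shutdown()"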

Here’s what this sort of automation looks like in a real build definition. The following excerpt from the Build.scala file of a (not-yet-merged) feature branch on my sur-la-plaque project shows how I added automation to the REPL for the analysis subproject:2

def optionallySetupFixtures = {
  sys.env.get("SLP_FIXTURES_FROM") match {
    case Some(dir: String) => s"""
      |val data = app.processFiles(SLP.listFilesInDir("$dir"))
      |data.registerAsTable("trackpoints")
    """.stripMargin
    case _ => ""
  }
}

def analysisSettings = baseSettings ++ sparkSettings ++ breezeSettings ++ dispatchSettings ++ testSettings ++ Seq(
  initialCommands in console :=
    """
      |import org.apache.spark.SparkConf
      |import org.apache.spark.SparkContext
      |import org.apache.spark.rdd.RDD
      |import com.freevariable.surlaplaque.importer._
      |import com.freevariable.surlaplaque.data._
      |import com.freevariable.surlaplaque.app._
      |
      |val conf = new SparkConf().setMaster("local[8]").setAppName("console").set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      |val sc = new SparkContext(conf)
      |val sqlContext = new org.apache.spark.sql.SQLContext(sc)
      |val app = new SLP(sc)
      |import sqlContext._
      |
    """.stripMargin + optionallySetupFixtures,
  cleanupCommands in console := "app.stop"
)

First, I’ve declared a simple optionallySetupFixtures function that generates code to load test data and register it with Spark SQL, but only if SLP_FIXTURES_FROM is set in the environment to the name of a directory containing activity files. The analysisSettings function returns a Seq of settings for the analysis subproject, first combining common settings, test settings, and library-specific settings for its dependencies (all declared elsewhere in the file). To this combination, we then add

  1. an initialCommands setting to ensure that our REPL session imports Spark and sur-la-plaque libraries and sets up SparkContext and SQLContext instances, and
  2. a cleanupCommands setting to gracefully shut down the SparkContext when we exit the REPL (via the stop method in the SLP application class).

Note that the initialCommands setting is the result of appending the output of optionallySetupFixtures to the static setup code (our imports and variable declarations); depending on the environment, that appended text is either code to load and register our data or nothing at all.
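For concreteness, if SLP_FIXTURES_FROM were set to /data/rides (a hypothetical path), the commands appended to the console session would be:

val data = app.processFiles(SLP.listFilesInDir("/data/rides"))
data.registerAsTable("trackpoints")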

This functionality makes it easy to develop custom REPL environments or just save a lot of time while interactively experimenting with new techniques in your project. Even better, the investment required is absolutely minimal compared to the payoff of not having to paste or type boilerplate code into every REPL session.
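To give a flavor of the payoff, here’s a sketch of how a console session in the analysis project might begin once these settings are in place (a hypothetical transcript, with the query output elided; it assumes SLP_FIXTURES_FROM pointed at a directory of activity files, so that the trackpoints table exists):

scala> // no setup needed: sc, sqlContext, and app are already defined
scala> sc.parallelize(1 to 10).count()
res0: Long = 10

scala> sql("SELECT COUNT(*) FROM trackpoints").collect()
...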

Footnotes

  1. This feature is mentioned in the documentation and used by Apache Spark, so I was familiar with it, but – for whatever reason – I hadn’t thought to apply it to my own projects until recently. I’m mentioning it here in case you didn’t think of it either!

  2. sur-la-plaque is a collection of applications dedicated to making sense of bicycling activity data; you can read more about some of the tools it includes here. The code is currently structured as two sbt projects: one for analysis code that actually processes data, and one that provides a web-based viewer for analysis results.