If you’re like me, you often find yourself pasting transcripts into sbt console
sessions in order to interactively test out new app functionality. A lot of times, these transcripts have a great deal of boilerplate and can be dramatically simplified. Fortunately, sbt
provides a facility to let you specify what commands should run at the beginning of a console session: the project-specific initialCommands
setting.1 sbt
also provides a cleanupCommands
setting to specify commands to run when your REPL exits, so if you’re testing anything that needs to have some cleanup code run before the JVM terminates, you can have that done automatically as well. (This is also useful to avoid ugly stack traces when developing Spark applications and quitting the console before stopping your SparkContext
.) Finally, since sbt
build definitions are just Scala code, you can conditionalize these command sets, for example, to only load test fixtures sometimes.
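Before the real example below, here's a minimal sketch of what these two settings look like in a simple build.sbt. The setting keys are real, but the imported package, the Service class, and its shutdown method are just placeholders for whatever your own project provides:

```scala
// A minimal, hypothetical build.sbt excerpt: "myapp" and "Service" stand in
// for whatever your project actually exposes.
initialCommands in console := """
  |import myapp._
  |val service = new Service()
""".stripMargin

// Runs just before the REPL's JVM exits.
cleanupCommands in console := "service.shutdown()"
```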
Here’s what this sort of automation looks like in a real build definition, with an excerpt of the Build.scala
file from a (not-yet-merged) feature branch on my sur-la-plaque
project, showing how I added automation to the REPL for the analysis
subproject:2
```scala
def optionallySetupFixtures = {
  sys.env.get("SLP_FIXTURES_FROM") match {
    case Some(dir: String) => s"""
      |val data = app.processFiles(SLP.listFilesInDir("$dir"))
      |data.registerAsTable("trackpoints")
    """.stripMargin
    case _ => ""
  }
}

def analysisSettings = baseSettings ++ sparkSettings ++ breezeSettings ++ dispatchSettings ++ testSettings ++ Seq(
  initialCommands in console := """
    |import org.apache.spark.SparkConf
    |import org.apache.spark.SparkContext
    |import org.apache.spark.rdd.RDD
    |import com.freevariable.surlaplaque.importer._
    |import com.freevariable.surlaplaque.data._
    |import com.freevariable.surlaplaque.app._
    |
    |val conf = new SparkConf().setMaster("local[8]").setAppName("console").set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    |val sc = new SparkContext(conf)
    |val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    |val app = new SLP(sc)
    |import sqlContext._
    |
    """.stripMargin + optionallySetupFixtures,

  cleanupCommands in console := "app.stop"
)
```
First, I’ve declared a simple optionallySetupFixtures
function that generates code to load test data and register it with Spark SQL, but only if SLP_FIXTURES_FROM
is set in the environment with the name of a directory containing activity files. The analysisSettings
function returns a Seq
of settings for the analysis
subproject, first combining common settings, test settings, and library-specific settings for its dependencies (these are all declared elsewhere in the file). To this combination of common settings, we then add
- an initialCommands setting to ensure that our REPL session imports Spark and sur-la-plaque libraries and sets up SparkContext and SQLContext instances, and
- a cleanupCommands setting to gracefully shut down the SparkContext when we exit the REPL (via the stop method in the SLP application class).
Note that the initialCommands setting is the result of concatenating the static commands (our imports and variable declarations) with the result of calling optionallySetupFixtures, which will be either code to load and register our data or nothing at all, depending on our environment.
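For example (assuming the subproject id is analysis, and with a stand-in path), a session with fixtures preloaded might be started like this:

```
$ SLP_FIXTURES_FROM=/path/to/activity-files sbt
> project analysis
> console
```

Leaving the variable unset gets you the same imports and contexts, just without the fixture data.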
This functionality makes it easy to develop custom REPL environments or just save a lot of time while interactively experimenting with new techniques in your project. Even better, the investment required is absolutely minimal compared to the payoff of not having to paste or type boilerplate code into every REPL session.
Footnotes
1. This feature is mentioned in the documentation and used by Apache Spark, so I was familiar with it, but – for whatever reason – I hadn’t thought to apply it to my own projects until recently. I’m mentioning it here in case you didn’t think of it either!↩︎
2. sur-la-plaque is a collection of applications dedicated to making sense of bicycling activity data; you can read more about some of the tools it includes here. The code is currently structured in two sbt projects: one for analysis code that actually processes data, and one that provides a web-based viewer of analysis results.↩︎