The Spark project is an actively developed open-source engine for data analytics on clusters using Scala, Python, or Java. It offers map, filter, and reduce operations over in-memory collections, data from local files, or data stored in HDFS. Unlike standard map-reduce frameworks, however, it can cache intermediate results across the cluster, and can thus offer orders-of-magnitude improvements over standard map-reduce when implementing iterative algorithms. I’ve been using it lately and have been really impressed. However, as with many cool projects in the “big data” space, the chain of dependencies to get a working installation can be daunting.
In this post, we’ll walk through setting up Spark to run on a stock Fedora 18 installation. We’ll also build the Mesos cluster manager so that we can run Spark jobs under Mesos, and we’ll build Hadoop with support for Mesos (so that we’ll have the option to run standard Hadoop MapReduce jobs under Mesos as well). By following these steps, you should be up and running with Spark quickly and painlessly.
Recall that Wallaby applies partial configurations to groups of nodes. Groups can be either explicit (that is, named subsets of nodes created by the user) or special groups that are built into Wallaby; each node’s group memberships have a priority ordering, so that an individual node’s configuration can favor the partial configuration from one group over another. There are two kinds of special groups: the default group, which contains every node, and identity groups, each of which contains only a single node. In addition, Wallaby includes a skeleton group, which combines aspects of explicit and special groups: while it can be managed like an explicit group, all newly-created nodes automatically become members of the skeleton group. The default group is always the lowest-priority membership for a node and its identity group is always the highest-priority membership; a node’s skeleton group membership can be reprioritized or removed as necessary.
The default group is an appealing target for common pool configuration, since it is guaranteed to be applied to every node. However, because Wallaby’s configuration model is additive, it is likely not the best place for every configured feature or parameter setting that you might initially consider applying to the whole pool. For example, if a group’s partial configuration installs a feature, every node that is a member of that group will install that feature (there is no way to say “this group’s configuration removes feature F, if it happens to be installed already”). Similarly, if a parameter is set on a group, every node that is a member of that group will have that parameter set; individual node configurations can override the value that the parameter is set to within the group, but there is no way to unset the parameter altogether. Therefore, if you need to enable a feature on almost every node, the default group is not the right place to install that feature. (Indeed, the default group is not the right place to put a putatively universal parameter setting or feature installation even if you can imagine a future exception to its universality.) The default group is also not a great place to put configuration that you expect to take priority over other possible group configurations, since it will always be at the lowest priority.
At this point, you may be asking yourself why you’d want to put any configuration in the default group. While I tend towards a minimal default group configuration myself, I absolutely see several use cases for the default group:
Setting configuration parameters that are actually uniform across the whole pool (and will require a new value, not the absence of a value, if they change). One such parameter is FILESYSTEM_DOMAIN, which you can set to an arbitrary string in order to identify machines that access the same shared filesystem. If you were to extend your pool with machines that couldn’t access that same filesystem, you’d provide a new value for this parameter.
Installing Wallaby features that you actually want installed on every node. I’d include features like Master, which ensures that the condor_master daemon is running (since Wallaby’s configuration daemon runs under the condor_master; a node can’t be configured if it isn’t running) and NodeAccess, which controls access to the pool (although the specific policy parameters required by NodeAccess may change in pool subsets).
Rapidly prototyping configurations for homogeneous or small, experimental pools. When you’re first using Wallaby, the default group is a convenient way to get things running. Similarly, if most of your nodes are execute nodes with the same policies, you may be able to put most of your configuration in the default group, especially if your submit and central manager configurations are generally a superset of your execute node configuration. Fortunately, Wallaby makes it straightforward to move configuration from the default group to an explicit group if you should require more flexibility.
I’m sure there are other cases in which using the default group makes good sense, but in general, you should strongly consider using an explicit group or the skeleton group for almost all of your nearly-universal configuration. If you haven’t used the skeleton group before, read more about it!
Rob Rati and I gave a tutorial on highly-available job queues at Condor Week this year. While it was not a Wallaby-specific tutorial, we did point out that configuring highly-available job queues is easier for users who manage and deploy their configurations with Wallaby; compare the manual and automated approaches in the following slides:
Configuring highly-available central managers (HA CMs) is rather more involved than configuring highly-available job queues. Here’s what a successful HA CM setup requires:
all hosts that serve as candidate central managers (CMs) must be included in the CONDOR_HOST variable across the pool
the had and replication daemons must be set up to run on candidate CMs
the HAD_LIST and REPLICATION_LIST configuration variables must include a list of candidate CMs and the ports on which the had and replication daemons are running on these hosts
various tunable settings related to shared-state and failure detection must be set
Wallaby includes HACentralManager, a ready-to-install feature with sensible defaults for setting up a candidate CM. The tedious work of constructing lists of hostnames and ports (and ensuring that these are set everywhere they must be) is a perfect candidate for Wallaby’s scriptability. At the bottom of this post is a simple Wallaby shell command that sets up a highly-available central manager with several nodes serving as candidate CMs. To use it, download it and place it in your WALLABY_COMMAND_DIR (review how to install Wallaby shell command extensions if necessary). Then invoke it with
wallaby setup-ha-cms fred barney wilma betty
The above invocation will set up fred, barney, wilma, and betty as candidate CMs, place the candidate CMs in the PotentialCMs group (creating this group if necessary), and configure Wallaby’s default group to use the highly-available CM cluster. (The setup-ha-cms command takes options to put candidate CMs in a different group or apply this configuration to some subset of the pool; invoke it with --help for more information.)
Once you’ve set up your candidate CMs, be sure to activate the new configuration:

wallaby activate
Of course, wallaby activate will alert you to any problems that prevent your configuration from taking effect. Correct any errors that come up and activate again, if necessary. The setup-ha-cms command is a pretty simple example of automating configuration, but it saves a lot of repetitive and error-prone effort!
UPDATE: The command will now remove all nodes from the candidate CM group before adding any nodes to it. This ensures that if the command is run multiple times with different candidate CM node sets, only the most recent set will receive the candidate CM configuration. (The command as initially posted would apply the candidate CM configuration to every node that was in the candidate CM group at invocation time, but only those nodes that were named in its most recent invocation would actually become candidate CMs.) Thanks to Rob Rati for the observation.
Wallaby 0.16.0, which updates the Wallaby API version to 20101031.6, includes support for authorizing broker users with various roles that can interact with Wallaby in different ways. This post will explain how the authorization support works and show how to get started using it. If you just want to get started using Wallaby with authorization support as quickly as possible, skip ahead to the section titled “Getting Started” below. Detailed information about which role is required for each Wallaby API method is after the jump.
Users must authenticate to the AMQP broker before using Wallaby (although some installations may allow users to authenticate as “anonymous”), but previous versions of Wallaby implicitly authorized any user who had authenticated to the broker to perform any action. Wallaby now includes a database mapping from user names to roles, which allows installations to define how each broker user can interact with Wallaby. Each method is annotated with the role required to invoke it, and each method invocation is checked to ensure that the currently-authenticated user is authorized to assume the role required by the method. The roles Wallaby recognizes are NONE, READ, WRITE, or ADMIN, where each role includes all of the capabilities of the role that preceded it.
If WALLABY_USERDB_NAME is set in the Wallaby agent’s environment upon startup and represents a valid pathname, Wallaby will use that as the location of the user-role database. If this variable is set to a valid pathname but no file exists at that pathname, the Wallaby user-role database will be created upon agent startup. If WALLABY_USERDB_NAME is not set, the user-role database will be initialized in memory only and thus will not persist across agent restarts.
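For example, to make the user-role database persistent (the path below is an example, not a required location):

```shell
# Persist Wallaby's user-role database across agent restarts;
# set this in the agent's environment before starting it
export WALLABY_USERDB_NAME=/var/lib/wallaby/userdb.db
```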
When Wallaby is about to service an API request, it:
checks the role required to invoke the method.
checks the authorization level specified for the user. There are several possibilities under which a user could be authorized to invoke a method:
the user is explicitly authorized for a role that includes the required role (e.g. the user has an ADMIN role but the method only requires READ);
the user is implicitly authorized for a role that includes the required role (e.g. there is an entry for the wildcard user * giving it READ access and the method requires READ access)
the role database is empty, in which case all authenticated users are implicitly authorized for all actions (this is the same behavior as in older versions of Wallaby)
the invocation is of a user-role database maintenance method and the client is authorized via shared secret (see below)
if none of the conditions in the above step hold, the method invocation is unauthorized and fails with an API-level error. If the API method is invoked via the Ruby client library, it will raise an exception. If it is invoked via a wallaby shell command-line tool, it will print a human-readable error message and exit with a nonzero status.
if the user is authorized to invoke the method, invocation proceeds normally.
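The decision procedure above can be sketched in a few lines of Ruby. This is an illustration of the logic, not Wallaby's actual implementation; the names `authorized?` and `userdb` are invented for the example.

```ruby
# Roles in increasing order of capability; each includes those before it.
ROLES = [:NONE, :READ, :WRITE, :ADMIN]

# userdb maps broker user names (including the wildcard "*") to roles.
# secret_ok indicates a user-role database maintenance call that has been
# authorized via the shared secret.
def authorized?(userdb, user, required, secret_ok = false)
  return true if userdb.empty?   # empty db: legacy behavior, allow everyone
  return true if secret_ok       # shared-secret override
  role = userdb[user] || userdb["*"] || :NONE
  ROLES.index(role) >= ROLES.index(required)
end
```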
Authorization with secret-based authentication
This version of the Wallaby API introduces three new methods: Store#set_user_privs, Store#del_user, and Store#users. These enable updating and reading the user-role database; the first two require ADMIN access, while the last requires READ access. Because changes in the user-role database may result in an administrator inadvertently removing administrator rights from his or her broker user, Wallaby provides another mechanism to authorize access to these methods. Each of these three methods supports a special secret option in its options argument. When the Wallaby service starts up, it loads a secret string from a file. Clients that supply the correct secret as an option to one of these calls will be authorized to invoke these calls, even if the broker user making the invocation is not authorized by the user-role database.
The pathname to the secret file is given by the environment variable WALLABY_SECRET_FILE. If this variable is unset upon agent startup, Wallaby will not use a shared secret (and secret-based authorization will not be available to API clients). If this variable is set and names an existing file that the Wallaby agent user can read, the Wallaby shared secret will be set to the entire contents of that file. If it is set and names a nonexistent file in a directory that does exist, Wallaby will create a file at that path upon startup containing a randomly-generated secret (consisting of a digest hash of some data read from /dev/urandom). If it is set to a pathname that includes nonexistent directory components, the Wallaby agent will raise an error. If you create your own secret file, ensure that it is readable only by the UNIX user that the Wallaby agent runs as (typically wallaby).
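For example, you could create a secret file by hand like this (the path is a typical choice, not a requirement):

```shell
umask 077   # ensure the new file is readable only by its owner
head -c 32 /dev/urandom | sha256sum | cut -d' ' -f1 > /var/lib/wallaby/secret
export WALLABY_SECRET_FILE=/var/lib/wallaby/secret
```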
The Wallaby agent’s authorization support is designed to prevent broker users from altering Condor pool configurations in excess of their authority. It is not intended to keep all configuration data strictly confidential. (This is not as bad as it might sound, since Wallaby-generated configurations are available for inspection by Condor users.) Furthermore, due to technical limitations, it is not possible to protect object property accesses over the API with the same authorization support that we use for API method invocations. Therefore, if concealing configuration data from some subset of users is important for your installation, you should prevent these users from authenticating to the broker that the Wallaby agent runs on.
Here is a quick overview of how to get started with auth-enabled Wallaby:
Stop your running Wallaby and restart your broker before starting the new Wallaby (this is necessary to pick up the new API methods). Set WALLABY_USERDB_NAME in your environment to a path where you can store the user-role database. Install and start your new Wallaby.
If you’re using the RPM package, it will create a “secret file” for you in /var/lib/wallaby/secret. If not, you will need to set WALLABY_SECRET_FILE in the environment to specify a location for this secret file and then restart Wallaby. The Wallaby secret is a special token that can be passed to certain API methods (specifically, those related to user database management) in order to authorize users who aren’t authorized in the user database.
Try using some of the new shell commands: wallaby set-user-role, wallaby list-users, and wallaby delete-user.
Make sure that you have a secret in your secret file, and make a note of it. Try setting the role for your current broker user to READ or NONE (e.g. “wallaby set-user-role anonymous NONE”) and then see what happens when you try to run some other Wallaby shell commands. You can recover from this by passing the Wallaby secret to “wallaby set-user-role”; see its online help for details.
The user-role database is initially empty, which results in the same behavior as older versions of Wallaby (viz., all actions are available to all broker users). As soon as a user role is added, however, every action must be explicitly or implicitly authorized.
Many Condor users are interested in high-availability (HA) services: they don't want their compute resources to become unavailable due to the failure of a single machine that is running an important Condor daemon. (See this talk that Rob Rati and I gave at Condor Week this year for a couple of solutions to HA with the Condor schedd.) So it's only natural that Condor users who are interested in configuring their pools with Wallaby might wonder how Wallaby responds in the face of failure.
Some of the technologies that the current version of Wallaby is built upon do not lend themselves to traditional active-active high-availability solutions. The good news, however, is that due to Wallaby's architecture, a failure of the Wallaby service (or of the machine it runs on) will leave almost all running Condor nodes unaffected: nodes that have already checked in with Wallaby will retain their latest activated configurations as of their last check-in. The only limitations in the event of service failure are:
new nodes will not be able to check in and get the default configuration;
it will not be possible to access older activated configurations; and
it will not be possible to alter, activate, or deploy the configuration.
These limitations, of course, disappear when the service is restored. For most users, temporarily losing the ability to update or deploy configurations during a service failure (while keeping their deployed configurations and normal pool operation intact) represents an acceptable risk. Some installations, especially those that aggressively exploit Wallaby's scripting interface or versioning capability, may want a more robust solution: these users might want access to older versions of their activated configurations even while Wallaby is down, or a mechanism to speed recovery by starting a replica of the service on another machine. In the remainder of this post, we'll discuss some approaches that provide more access to Wallaby data when Wallaby is down.
Accessing older versioned configurations
If you need to access historical versioned configurations, the easiest way to do it is to set up a cron job on the machine running Wallaby that periodically runs wallaby vc-export on your snapshot database and outputs versioned configurations to a shared filesystem. wallaby vc-export, which is documented in this post, exports all historical snapshots to plain-text files, so you can access the configuration for foo.example.com at version 1234 in a file called something like 1234/nodes/foo.example.com. This cron job needs to be able to access the filesystem path of the Wallaby snapshot database; furthermore, to run it efficiently, you'll probably want to limit the number of snapshots it processes each time; see wallaby vc-export's online help for more details.
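As a concrete sketch, a crontab entry like the following would export the ten most recent snapshots every half hour; the database and output paths here are assumptions that you should check against your installation:

```
# m   h  dom mon dow   command
*/30  *   *   *   *    wallaby vc-export /var/lib/wallaby/snapshot.db -o /mnt/shared/wallaby-snapshots --latest 10
```

The -o and --latest options are described in the vc-export usage summary later in this post.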
Exporting Wallaby state to a file
If you're just interested in the state of the Wallaby service (including possibly unactivated changes), you can periodically run wallaby dump over the network. This produces a YAML file containing the serialized state of the service; you can later load this file with the wallaby load command, possibly against another Wallaby agent.
Backing up the raw database files
The easiest way to pick up and recover from a Wallaby node failure is to start a new Wallaby service with the same databases as the failed node. In turn, the easiest way to do this is by periodically copying these database files from their locations on the Wallaby node (typically in /var/lib/wallaby) to some location on a shared filesystem. The following Ruby script will safely copy the SQLite files that Wallaby uses even while Wallaby is running:
Wallaby 0.15.0 includes a new feature called the skeleton group. (This feature was available in earlier versions of Wallaby, too, but it was experimental and had some rough edges.) Find out how the skeleton group makes configuration more flexible by reading all about it.
Often, if you're trying to reproduce a problem someone else is having with Condor, you'll need their configuration. Likewise, if you're trying to help someone reproduce a problem you're having, you'll want to send along your configuration to aid them in replicating your setup. For installations that use legacy flat-file configurations (optionally with a local configuration directory), this can be a pain, since you'll need to copy several files from site to site (ensuring that you've included all the files necessary to replicate your configuration, perhaps across multiple machines on the site experiencing the problem).
If everyone involved uses Wallaby for configuration management, things can be a lot simpler: the site experiencing the problem can use wallaby dump to save the state of their configuration for an entire pool to a flat file, which the troubleshooting site can then inspect or restore with wallaby load. If the problem appears in some configuration snapshots but not in others, the reporting site can use wallaby vc-export to generate a directory of all of their configurations over time, so that the troubleshooting site can attempt to pinpoint the differences between what worked and what didn't.
(Thanks to Matt for pointing out the value of versioned semantic configuration management in reproducing problems!)
From the very beginning of the project, we've developed Wallaby and its stack in Ruby 1.8 and not paid much attention to Ruby 1.9. We had done so for a few reasons, but primary among these is that we had internal Ruby 1.8 experience and that we needed to support Wallaby on 1.8.6 for the foreseeable future (and thus wouldn't be writing code depending on 1.9 language or library features). All of that changed, of course, once we began packaging Wallaby for Fedora. Because Fedora 17 defaults to Ruby 1.9.3, we needed to port the code to work with both Ruby 1.9.3 and Ruby 1.8.6. Most major Ruby projects work with 1.9 these days, but I suspect a lot of code that is intended to be deployed on Ruby 1.8 is in the same boat we were. In this post, we'll cover some of the pitfalls we ran into while boldly bringing our code base into 2009; we've kept an eye towards solutions that will work in 1.8 and 1.9, since we will be continuing to support Wallaby on Ruby 1.8.6.
Standard library interface changes
In Ruby 1.8, Module#instance_methods returned an Array of Strings; in Ruby 1.9, it returns an Array of Symbols. The return type of Module#constants has also changed similarly. Module#method_defined? and Module#const_defined? still accept String or Symbol parameters, though; when checking for the existence of a method or constant, prefer these. If you need to iterate through or grep for a substring in an Array of method or constant names, map each value to the result of calling to_s on itself first.
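A few illustrative cases (the 1.8 results in the comments are from the older return types):

```ruby
# On 1.9 instance_methods contains Symbols; on 1.8 it contains Strings
String.instance_methods.include?(:upcase)    # true on 1.9, false on 1.8

# Portable existence checks: method_defined? accepts either form
String.method_defined?(:upcase)              # true on both
String.method_defined?("upcase")             # true on both

# Portable search: normalize to Strings before grepping
String.instance_methods.map { |m| m.to_s }.grep(/case/)  # works on both
```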
The sort of String returned by the to_s method on collection classes has also changed in Ruby 1.9; it now matches the return value of the inspect method.
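A reconstruction of the behavior on each version:

```ruby
[1, 2, 3].to_s      # Ruby 1.8: "123"
[1, 2, 3].to_s      # Ruby 1.9: "[1, 2, 3]" (same as [1, 2, 3].inspect)
[1, 2, 3].join("")  # "123" on both versions
```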
If you have code that depends on the old to_s behavior (we did, but I intend to conceal the identity of the developer responsible in order to protect the guilty), you can approximate it in several backwards-compatible ways: if you're only worried about Arrays containing String elements, the easiest thing to do is simply to call join("") on the Array.
Semantic changes in the Ruby language
The big change that affected us (and that will probably affect you, too, if you do a lot of metaprogramming or module trickery) involves block scoping. In some cases in our code, define_method blocks that referred to free variables exhibited different behavior on Ruby 1.8 and Ruby 1.9. The solution in these cases is ugly but straightforward: use a Proc object as you would use a let in Scheme, like this:
free_var = :something

# invoking the Proc binds fv to free_var's current value, like a Scheme "let"
Proc.new do |fv|
  define_method "my_method" do
    fv
  end
end.call(free_var)
A lot of people have written about Ruby 1.9's support for multiple text encodings. If you're using native extension libraries that create strings and haven't been explicitly vetted for 1.9 compatibility, you'll want to make sure that the Ruby String objects are created with the appropriate encoding metadata. In our case, the QMF library was returning UTF-8 strings that had encoding type BINARY in Ruby (i.e. raw bytes). Consider two String objects with identical sequences of bytes; one is tagged BINARY and the other is tagged UTF-8: these will be indistinguishable if you print them to a terminal or write them to a file, but Ruby's comparison operators will not find them identical. We submitted a patch to QMF to ensure that strings returned from QMF are either tagged as the default external encoding or as UTF-8 (if no external encoding is specified).
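A small demonstration of that pitfall:

```ruby
utf8   = "café"                       # tagged UTF-8 (in a UTF-8 source file)
binary = utf8.dup.force_encoding(Encoding::BINARY)

# Identical bytes, but different encoding tags:
binary.bytes == utf8.bytes            # true
binary == utf8                        # false!

# Retagging (not transcoding) the raw bytes restores equality:
binary.force_encoding(Encoding::UTF_8) == utf8   # true
```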
This isn't an exhaustive list of all of the changes between Ruby 1.8 and Ruby 1.9, of course (see here for that), but it is a set of problems that folks developing networked services and infrastructure for these might need to worry about. The standard-library interface changes were pretty minor; the other issues were listed roughly in order of increasing frustration. The important news, of course, is that Wallaby and its dependencies now work with Ruby 1.9.3 and are thus readily available to Fedora 17 users. Go forth, install, and configure!
Wallaby stores versioned configurations in a database. Wallaby API clients can access older versions of a node’s configuration by supplying the version option to the Node#getConfig method. Sometimes, though, we’d like to inspect individual configurations in greater detail than the API currently allows.
The Wallaby git repository now contains a command to export versioned configurations from a database to flat text files. Clone the repository or just download the file and then place cmd_versioned_config_export.rb somewhere in your WALLABY_SHELL_COMMAND_PATH, and you’ll be able to invoke it like this:
Usage: wallaby vc-export DBFILE
exports versioned configurations from DBFILE to plain text files
-h, --help displays this message
--verbose outputs information about command progress
-o, --output-dir DIR set output directory to DIR
--earliest NUM output only the earliest NUM configurations
--latest NUM output only the most recent NUM configurations
--since DATE output only configurations since the given date
It will create a directory (called snapshots by default) with subdirectories for each versioned configuration; each of these will be timestamped with the time of the configuration. Within each version directory, there will be directories for node configurations and stored group configurations. (If you’re using an older version of Wallaby, the only stored group configuration will be for the default group. Versioned configurations generated with a fairly recent version of Wallaby, on the other hand, will have stored configurations for more groups and should also have useful information about node memberships in the configuration file.)