Ad-hoc, informally-specified, bug-ridden operating system distributions

Greenspun’s tenth rule of programming states that

Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.

Expressive high-level languages with powerful runtimes are far more common now than they were in 1993, but the general insight behind Greenspun’s rule remains undeniable – lower-level environments may seem desirable because they’re unfettered by certain kinds of complexity and lack the (perceived) baggage of richer ones, but this baggage often turns out to be necessary to get real work done and winds up getting reinvented poorly.¹

Linux containers present the illusion of a relatively baggage-free environment for software distribution, and it’s wonderful that people have built workflows to let you go from a commit passing CI to an immutable deployment. But the fantastic developer experience that container tooling offers has also inspired a lot of people to do unsafe things in production, because there’s effectively no barrier to entry; building containers essentially turns everyone into a Linux distribution vendor; and being a Linux distribution vendor is not a part of most people’s skill set.²

Even if we just consider security (and ignore issues of legality and stability, among others), there are many places where these ad-hoc distributions can go off the rails. Just think of how many Dockerfiles (or similar image recipes) do things like

running services as root,
pulling down random binaries or tarballs from the public internet,
building static binaries against an environment defined by an unvetted image downloaded from a public registry,
building static binaries without any machine-readable or human-auditable representation of their dependencies, or
relying on alternative C library implementations that are designed to save code size and are only ever deployed in containers.
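
To make a few of these patterns concrete, here is a minimal Python sketch of how they might be flagged mechanically. The function name `lint_dockerfile` and its rules are hypothetical and not taken from any real tool. The checks are crude text heuristics: a base image may already set a non-root user, which this cannot see, and pinning an image by digest makes it reproducible but does not make it vetted.

```python
import re


def lint_dockerfile(text: str) -> list[str]:
    """Return human-readable warnings for a few risky Dockerfile patterns.

    Illustrative heuristics only; this is not a substitute for a real audit.
    """
    # Join backslash continuations so each instruction is one logical line.
    logical = re.sub(r"\\[ \t]*\n", " ", text)

    instructions = []
    for raw in logical.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        instructions.append((parts[0].upper(), parts[1] if len(parts) > 1 else ""))

    findings = []
    stages = set()  # names introduced by "FROM ... AS name"
    for keyword, args in instructions:
        if keyword == "FROM":
            # Drop flags such as --platform=... before reading the image name.
            tokens = [t for t in args.split() if not t.startswith("--")]
            if not tokens:
                continue
            image = tokens[0]
            if image != "scratch" and image not in stages and "@sha256:" not in image:
                findings.append(f"base image not pinned by digest: {image}")
            if len(tokens) >= 3 and tokens[1].upper() == "AS":
                stages.add(tokens[2])
        elif keyword == "RUN":
            if re.search(r"\b(curl|wget)\b", args) and re.search(r"https?://", args):
                findings.append("RUN downloads content from a URL")
        elif keyword == "ADD":
            if re.search(r"https?://", args):
                findings.append("ADD fetches a remote URL")

    # Only USER instructions in the final stage affect the running service.
    last_from = max(
        (i for i, (k, _) in enumerate(instructions) if k == "FROM"), default=-1
    )
    users = [a.strip() for k, a in instructions[last_from + 1:] if k == "USER"]
    if not users or users[-1].split(":")[0] in ("root", "0"):
        findings.append("final stage does not switch to a non-root USER")

    return findings
```

Calling `lint_dockerfile(open("Dockerfile").read())` returns a list of warnings, empty if none of these particular patterns appear. A check like this catches only the most visible symptoms; it says nothing about what the binaries in the image actually depend on, which is where distribution guidelines do most of their work.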

I’ve had many conversations in the last five years in which someone has asserted that container tooling obviates other packaging mechanisms.³ But this assumes that the hard part of packaging, e.g., an RPM for Fedora, is in using the Fedora release tooling to get binaries into an RPM-shaped container. The hard part, of course, is in satisfying the guidelines that the Fedora project has put in place to make it more likely that Fedora will be stable, secure, legal, and usable. Since the issue is not the shape of the package but rather what it contains, saying that you don’t need to know how to make, e.g., an RPM if you have containers misses the point: it’s like saying “I know how to encode an audio stream as an MP3 file, so I could have produced this MP3 of Palestrina’s ‘Sicut cervus.’”⁴

Container tooling makes it very easy to produce ad-hoc systems software distributions that don’t offer any of the value of traditional systems software distributions but still have many of their potential liabilities. Indeed, one might say that any containerized software distribution of sufficient complexity includes an ad-hoc, informally-specified, bug-ridden, and probably legally dubious implementation of half of the Fedora packaging guidelines.

(I’ve been meaning to write this post for a while; thanks to Paul Snively for inspiring me to finally get it done!)

Footnotes

¹ There’s a corollary to Greenspun’s rule for distributed systems and Erlang, naturally.
² Indeed, the concerns of distributing systems software aren’t even particularly obvious to people who haven’t spent time in this world.
³ This conversation has even happened with people who work in the business of making open-source software consumable and supportable (and should probably know better).
⁴ The analogy with Palestrina’s contrapuntal style, governed as it is by rules and constraints, is deliberate.