fedmsg talk at Spark Summit

I’m speaking at Spark Summit today about using Spark to analyze operational data from the Fedora project. Here are some links to further resources related to my talk:

My talk slides are online; the online deck includes some extra slides that I skipped in the live presentation
An earlier post provides some background on fedmsg and Spark
You may also be interested in a higher-level discussion of issues with schema inference from the perspective of type theory
Here’s the annotated source code for the ML pipeline transformer I discussed in my talk

You should also check out my team’s Silex library, which contains useful code factored out of real Spark applications we’ve built in Red Hat’s Emerging Technology group. It includes a lot of cool functionality, but the part I mentioned in the talk is a handy interface for preprocessing JSON data before Spark infers a schema from it.