Cloud-native machine learning systems at day two and beyond

It was a lot of fun to co-present with Sophie Watson at KubeCon today on machine learning systems and MLOps. Kubernetes is an obvious choice for building machine learning systems in 2020, but as you build these systems, you will face several non-obvious choices. In this talk, we sought to distill many of the things we’ve learned while supporting machine learning systems and workflows on Kubernetes over the years, and to help make the road ahead straighter and smoother for practitioners and operators who are just getting started.

We had a wonderfully engaged audience, and Q&A was a lot of fun, both during the talk and on Slack afterwards. Several attendees were interested in our slides, which are available here, and in the MLOps “tube map,” which is available here, along with a number of links to other useful resources.

Sophie and I have been collaborating in this space for quite a while, and we’ve produced some really cool work. Here are links to some other materials of interest:

Our OSCON 2019 talk “Kubernetes for Machine Learning: Productivity over Primitives” shows how Kubernetes provides the basis for machine learning system solutions — and how to build solutions that ML practitioners will actually want to use,
our nachlass framework demonstrates how to publish pipeline services directly from unmodified Jupyter notebooks and CI/CD pipelines in Kubernetes using source-to-image builders, and
our two interactive workshops (here and here) show how to do entire end-to-end ML lifecycles – discovery, training, CI/CD, inference, and monitoring – all on Kubernetes.