It’s an honor to present at Red Hat Summit again this year! I’m giving a brief introduction to machine learning concepts for developers. Of course, one can’t do justice to such a broad topic in a forty-minute session, but I have some materials for people who’d like to experiment with some fundamental ML techniques on their own time.
These materials are all presented as Jupyter notebooks, which combine code, narrative explanations, and output, so you can inspect the code, run it, change it, and experiment with it. The main thing to know about Jupyter is that notebooks are made up of cells, and pressing shift+enter runs the cell you're currently on and moves to the next one. If you get stuck, go to the "Kernel" menu and select "Restart & Clear Output."
First up, this notebook can be run directly in your browser through the mybinder.org service. It presents an introduction to the scalable analytic techniques I mentioned at the beginning of the session.
If you’d like to dive deeper into specific machine learning techniques, you’ll need to fire up OpenShift:
- log in to an OpenShift cluster, or create a temporary one on your local machine with:
    oc cluster up
- create a pod to serve some more example notebooks with:
    oc new-app radanalyticsio/workshop-notebook -e JUPYTER_NOTEBOOK_PASSWORD=developer
- expose a route to that pod's service with:
    oc expose svc/workshop-notebook
When you visit the route for the Jupyter pod, you'll need to log in; the password is developer. After you log in, you'll be presented with a list of notebook files. Here's what each of them contains:
- ml-basics.ipynb contains visual explanations and examples of clustering, classification, and regression using Apache Spark;
- pyspark.ipynb introduces data engineering and data cleaning using Apache Spark and shows you how to train a natural language model on a data set from an open-source project; and
- var.ipynb shows you how to model data and run Monte Carlo simulations with Apache Spark using an example from the financial domain.
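The notebooks run their simulations with Apache Spark. As a much smaller, single-machine illustration of the Monte Carlo idea, here is a sketch in plain Python that estimates value at risk (the loss exceeded on only the worst 5% of simulated days) for a hypothetical two-asset portfolio. It assumes each asset's daily return is an independent normal random variable; the portfolio, the function names, and the assumption that var.ipynb computes something like value at risk are illustrative, not taken from the notebook itself.

```python
import random
import statistics


def simulate_portfolio_returns(weights, means, stdevs, trials, seed=0):
    """Simulate `trials` one-day portfolio returns.

    Hypothetical model: each asset's daily return is an independent normal
    random variable, and the portfolio return is their weighted sum.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    returns = []
    for _ in range(trials):
        # Draw one return per asset and combine them by portfolio weight.
        day = sum(w * rng.gauss(mu, sigma) for w, mu, sigma in zip(weights, means, stdevs))
        returns.append(day)
    return returns


def value_at_risk(returns, confidence=0.95):
    """Empirical value at risk: the negated (1 - confidence) quantile of returns."""
    ordered = sorted(returns)
    # Position of the lower-tail return, clamped to a valid index.
    index = min(int((1 - confidence) * len(ordered)), len(ordered) - 1)
    return -ordered[index]  # report a loss as a positive number


if __name__ == "__main__":
    # Hypothetical portfolio: 60% in a volatile asset, 40% in a calmer one.
    simulated = simulate_portfolio_returns(
        weights=[0.6, 0.4], means=[0.001, 0.0005], stdevs=[0.02, 0.01], trials=10_000
    )
    print(f"mean simulated return: {statistics.mean(simulated):.5f}")
    print(f"95% one-day value at risk: {value_at_risk(simulated):.5f}")
```

Each trial is independent of the others, which is why this kind of simulation spreads naturally across a cluster when the number of trials or assets grows.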
Finally, be sure to visit radanalytics.io to see examples of intelligent applications on OpenShift and strimzi.io to learn how to enable Apache Kafka on OpenShift.
You’re at the beginning of a really exciting journey! I hope these resources are helpful as you get started.